History log of /freebsd-current/sys/sys/pcpu.h
Revision Date Author Comments
# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 5440e701 28-Jul-2023 Dmitry Chagin <dchagin@FreeBSD.org>

i386: Don't use static DPCPU and VNET defines in i386 modules

As of c84617e8 a similar to 4802a2cb and b6ea4c5a fix should be
applied to i386 too.

Reviewed by:
Differential Revision: https://reviews.freebsd.org/D41195


# b5d43972 21-Mar-2023 Mateusz Guzik <mjg@FreeBSD.org>

vfs: decouple freevnodes from vnode batching

In principle one cpu can keep vholding vnodes, while another vdrops
them. In this case it may be the local count will keep growing in an
unbounded manner. Roll it up after a threshold instead.

While here move it out of dpcpu into struct pcpu.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D39195


# f1f98706 18-Apr-2021 Warner Losh <imp@FreeBSD.org>

Minor style cleanup

We prefer 'while (0)' to 'while(0)' according to grep and stlye(9)'s
space after keyword rule. Remove a few stragglers of the latter.
Many of these usages were inconsistent within the file.

MFC After: 3 days
Sponsored by: Netflix


# dfff3776 12-Apr-2021 Mark Johnston <markj@FreeBSD.org>

Rename struct device to struct _device

types.h defines device_t as a typedef of struct device *. struct device
is defined in subr_bus.c and almost all of the kernel uses device_t.
The LinuxKPI also defines a struct device, so type confusion can occur.

This causes bugs and ambiguity for debugging tools. Rename the FreeBSD
struct device to struct _device.

Reviewed by: gbe (man pages)
Reviewed by: rpokala, imp, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29676


# bee115bc 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Dedup zpcpu assertions into one macro and guard the rest with #ifndef

Sponsored by: The FreeBSD Foundation


# 3acb6572 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Store offset into zpcpu allocations in the per-cpu area.

This shorten zpcpu_get and allows more optimizations.

Reviewed by: jeff
Differential Revision: https://reviews.freebsd.org/D23570


# d1e55387 10-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Tidy up zpcpu_replace*

- only compute the target address once
- remove spurious type casting, zpcpu_get already return the correct type

While here add missing newlines to other routines.


# c77649d8 07-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Add zpcpu_{set,add,sub}_protected.

The _protected suffix follows counter(9).


# 98d97cde 17-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

Convert zpcpu_* inlines to macros and add zpcpu_replace.

This allows them to do basic type casting internally, effectively relieving
consumers from having to cast on their own.


# 4cace859 16-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: convert struct mount counters to per-cpu

There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before: 852393 ops/s
after: 76682077 ops/s

Reviewed by: kib, jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21637


# a2a0f906 29-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Centralize __pcpu definitions.

Many extern struct pcpu <something>__pcpu declarations were
copied/pasted in sources. The issue is that the definition is MD, but
it cannot be provided by machine/pcpu.h due to actual struct pcpu
defined in sys/pcpu.h later than the inclusion of machine/pcpu.h.
This forced the copying when other code needed direct access to
__pcpu. There is no way around it, due to machine/pcpu.h supplying
part of struct pcpu fields.

To work around the problem, add a new machine/pcpu_aux.h header, which
should fill any needed MD definitions after struct pcpu definition is
completed. This allows to remove copies of __pcpu spread around the
source. Also on x86 it makes it possible to remove work arounds like
OFFSETOF_CURTHREAD or clang specific warnings supressions.

Reported and tested by: lwhsu, bcran
Reviewed by: imp, markj (previous version)
Discussed with: jhb
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21418


# 018ff686 12-Aug-2019 Jeff Roberson <jeff@FreeBSD.org>

Move scheduler state into the per-cpu area where it can be allocated on the
correct NUMA domain.

Reviewed by: markj, gallatin
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D19315


# e2edff41 25-Jun-2019 Leandro Lupori <luporl@FreeBSD.org>

[PowerPC64] Don't mark module data as static

Fixes panic when loading ipfw.ko and if_epair.ko built with modern compiler.

Similar to arm64 and riscv, when using a modern compiler (!gcc4.2), code
generated tries to access data in the wrong location, causing kernel panic
(data storage interrupt trap) when loading if_epair and ipfw.

Issue was reproduced with kernel/module compiled using gcc8 and clang8. It
affects both ELFv1 and ELFv2 ABI environments.

PR: 232387
Submitted by: alfredo.junior_eldorado.org.br
Reported by: Mark Millard
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D20461


# e2e050c8 19-May-2019 Conrad Meyer <cem@FreeBSD.org>

Extract eventfilter declarations to sys/_eventfilter.h

This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h"
in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header
pollution substantially.

EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c
files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).

As a side effect of reduced header pollution, many .c files and headers no
longer contain needed definitions. The remainder of the patch addresses
adding appropriate includes to fix those files.

LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by
sys/mutex.h since r326106 (but silently protected by header pollution prior
to this change).

No functional change (intended). Of course, any out of tree modules that
relied on header pollution for sys/eventhandler.h, sys/lock.h, or
sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.


# 86c59375 12-Sep-2018 Ruslan Bukin <br@FreeBSD.org>

Don't mark module data as static on RISC-V.

Similar to arm64, riscv compiler uses PC-relative loads/stores,
and with static data compiler does not emit relocations.
In result, kernel module linker has nothing to fix and data accessed
from the wrong location.

Approved by: re (gjb)
Sponsored by: DARPA, AFRL


# b6ea4c5a 30-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

As with DPCPU_DEFINE_STATIC make VNET_DEFINE_STATIC non-static on arm64 in
modules. It also fails in the same way, we are unable to relocate static
variables as the compiler uses PC-relative loads with nothing for the
kernel linker to relocate.

Sponsored by: DARPA, AFRL


# 4802a2cb 16-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Don't use the static keyword with DPCPU defines in arm64 modules.

On arm64 compiler will create PC-relative loads and stores for static data.
This means it doesn't emit a relocation. Unfortunately the in-kernel linker
expects there to be one for DPCPU defines so it can modify its value so the
code will use the correct DPCPU region.

To workaround the lack of a relocation with static data remove it when
building modules on arm64. The kernel is unaffected as it doesn't rely on
modifying these relocations to find the data.

PR: 225684
Reported by: Johannes Lundberg <johalun0@gmail.com>
Reported by: Jose Luis Duran <jlduran@gmail.com>
Reported by: Greg V <greg@unrelenting.technology>
Reviewed by: bz
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D16145


# 53dec71d 06-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

Expand x86 struct pcpus to UMA_PCPU_ALLOC_SIZE AKA PAGE_SIZE.

This restores counters(9) operation.
Revert r336024. Improve assert of pcpu size on x86.

Reviewed by: mmacy
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D16163


# fb0a2811 06-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

Revert to recommit with the proper message.


# 16147166 06-Jul-2018 Konstantin Belousov <kib@FreeBSD.org>

Save a call to pmap_remove() if entry cannot have any pages mapped.

Due to the way rtld creates mappings for the shared objects, each dso
causes unmap of at least three guard map entries. For instance, in
the buildworld load, this change reduces the amount of pmap_remove()
calls by 1/5.

Profiled by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D16148


# ab3059a8 05-Jul-2018 Matt Macy <mmacy@FreeBSD.org>

Back pcpu zone with domain correct pages

- Change pcpu zone consumers to use a stride size of PAGE_SIZE.
(defined as UMA_PCPU_ALLOC_SIZE to make future identification easier)

- Allocate page from the correct domain for a given cpu.

- Don't initialize pc_domain to non-zero value if NUMA is not defined
There are some misconceptions surrounding this field. It is the
_VM_ NUMA domain and should only ever correspond to valid domain
values as understood by the VM.

The former slab size of sizeof(struct pcpu) was somewhat arbitrary.
The new value is PAGE_SIZE because that's the smallest granularity
which the VM can allocate a slab for a given domain. If you have
fewer than PAGE_SIZE/8 counters on your system there will be some
memory wasted, but this is obviously something where you want the
cache line to be coming from the correct domain.

Reviewed by: jeff
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D15933


# 2bf95012 05-Jul-2018 Andrew Turner <andrew@FreeBSD.org>

Create a new macro for static DPCPU data.

On arm64 (and possible other architectures) we are unable to use static
DPCPU data in kernel modules. This is because the compiler will generate
PC-relative accesses, however the runtime-linker expects to be able to
relocate these.

In preparation to fix this create two macros depending on if the data is
global or static.

Reviewed by: bz, emaste, markj
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D16140


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 83c9dea1 17-Apr-2017 Gleb Smirnoff <glebius@FreeBSD.org>

- Remove 'struct vmmeter' from 'struct pcpu', leaving only global vmmeter
in place. To do per-cpu stats, convert all fields that previously were
maintained in the vmmeters that sit in pcpus to counter(9).
- Since some vmmeter stats may be touched at very early stages of boot,
before we have set up UMA and we can do counter_u64_alloc(), provide an
early counter mechanism:
o Leave one spare uint64_t in struct pcpu, named pc_early_dummy_counter.
o Point counter(9) fields of vmmeter to pcpu[0].pc_early_dummy_counter,
so that at early stages of boot, before counters are allocated we already
point to a counter that can be safely written to.
o For sparc64 that required a whole dummy pcpu[MAXCPU] array.

Further related changes:
- Don't include vmmeter.h into pcpu.h.
- vm.stats.vm.v_swappgsout and vm.stats.vm.v_swappgsin changed to 64-bit,
to match kernel representation.
- struct vmmeter hidden under _KERNEL, and only vmstat(1) is an exclusion.

This is based on benno@'s 4-year old patch:
https://lists.freebsd.org/pipermail/freebsd-arch/2013-July/014471.html

Reviewed by: kib, gallatin, marius, lidl
Differential Revision: https://reviews.freebsd.org/D10156


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 650b9693 09-Aug-2016 Jean-Sébastien Pédron <dumbbell@FreeBSD.org>

sys/pcpu.h: Revert change introduced in r303890

`device_t` is not defined outside the kernel but this header is used by
eg. libkvm or vmstat(8). Thus, r303890 broke the build.

So let's restore `struct device` here until a longer term solution is
found.

Reported by: Michael Butler <imb@protected-networks.net>, Jenkins
MFC after: 3 days
MFC with: r303890


# bd937497 09-Aug-2016 Jean-Sébastien Pédron <dumbbell@FreeBSD.org>

Consistently use `device_t`

Several files use the internal name of `struct device` instead of
`device_t` which is part of the public API. This patch changes all
`struct device *` to `device_t`.

The remaining occurrences of `struct device` are those referring to the
Linux or OpenBSD version of the structure, or the code is not built on
FreeBSD and it's unclear what to do.

Submitted by: Matthew Macy <mmacy@nextbsd.org> (previous version)
Approved by: emaste, jhibbits, sbruno
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7447


# 7c6f639b 04-May-2016 Enji Cooper <ngie@FreeBSD.org>

Revert r299096

The change broke buildworld when building lib/libkvm

This change likely needs to be run through a ports -exp run as a sanity
check, as it might break downstream consumers.

Pointyhat to: adrian
Reported by: kargl (confirmed on $work workstation)
Sponsored by: EMC / Isilon Storage Division


# 1b34262b 04-May-2016 Adrian Chadd <adrian@FreeBSD.org>

s/struct device */device_t/g

Submitted by: kmacy


# c25fabea 27-Aug-2015 Mark Johnston <markj@FreeBSD.org>

Remove weighted page handling from vm_page_advise().

This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED). Its objective was to ensure that the page daemon
would eventually reclaim other unreferenced pages (i.e., unreferenced pages
not touched by madvise()) from the active queue.

Now that the pagedaemon performs steady scanning of the active page queue,
this weighted handling is unnecessary. Instead, always "cache" clean pages
by moving them to the head of the inactive page queue. This simplifies the
implementation of vm_page_advise() and eliminates the fragmentation that
resulted from the distribution of pages among multiple queues.

Suggested by: alc
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3401


# 9188e408 10-Feb-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Add zpcpu_get_cpu() that converts base pointer of UMA_ZPCPU_ZONE
to a pointer private to a given cpuid.

Sponsored by: Nginx, Inc.


# 17dece86 08-Apr-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Merge from projects/counters:

Pad struct pcpu so that its size is denominator of PAGE_SIZE. This
is done to reduce memory waste in UMA_PCPU_ZONE zones.

Sponsored by: Nginx, Inc.


# 6a612df1 17-Sep-2012 Attilio Rao <attilio@FreeBSD.org>

Remove namespace pollution in _rmlock.h by defining rm_queue structure
directly in _rmlock.h and then including it (and its dependencies)
in pcpu.h. This leads to few _*.h headers to be included in pcpu.h
but this is not considered a big deal.

Really pc_rm_queue should be implemented as a dynamic member with
DPCPU interface, but we really want to keep the read acquisition as
fast as possible, so even the further pc_dynamic indirection should be
avoided, and the pollution is dealt like this.

Discussed with: jhb
MFC after: 1 week


# 6338c579 19-Jul-2011 Attilio Rao <attilio@FreeBSD.org>

Remove explicit MAXCPU usage from sys/pcpu.h avoiding a namespace
pollution. That is a step further in the direction of building correct
policies for userland and modules on how to deal with the number of
maxcpus at runtime.

Reported by: jhb
Reviewed and tested by: pluknet
Approved by: re (kib)


# edf26ab8 19-Jul-2011 Attilio Rao <attilio@FreeBSD.org>

Remove pc_name member of struct pcpu.
pc_name is only included when KTR option is and it does introduce a
subdle KBI breakage that totally breaks vmstat when world and kernel are
not in sync.
Besides, it is not used somewhere.

In collabouration with: pluknet
Reviewed by: jhb
Approved by: re (kib)


# 2a292868 09-Jul-2011 Marius Strobl <marius@FreeBSD.org>

Fix the definition for PCPU_NAME_LEN, which is intended to fit
("CPU %d", cpuid) where cpuid <= MAXCPU.

1. sizeof(__XSTRING(MAXCPU) + 1) is a typo: typeof(__XSTRING(...) + 1)
is 'char *', so sizeof() will return the size of the pointer, not
the size of the string contents. The proper expression should be
'sizeof(__XSTRING(MAXCPU)) + 1'.

2. One should not add one, but substract it: sizeof() accounts for the
trailing '\0' and we have two sizeof's, so the size of one '\0'
should be substracted -- this will give the maximal string buffer
length for CPU with its number, no less, no more.

Submitted by: rea


# a2f4e284 04-Jul-2011 Attilio Rao <attilio@FreeBSD.org>

Completely remove now unused pc_other_cpus, pc_cpumask.

Tested by: pluknet


# d098f930 31-May-2011 Nathan Whitehorn <nwhitehorn@FreeBSD.org>

On multi-core, multi-threaded PPC systems, it is important that the threads
be brought up in the order they are enumerated in the device tree (in
particular, that thread 0 on each core be brought up first). The SLIST
through which we loop to start the CPUs has all of its entries added with
SLIST_INSERT_HEAD(), which means it is in reverse order of enumeration
and so AP startup would always fail in such situations (causing a machine
check or RTAS failure). Fix this by changing the SLIST into an STAILQ,
and inserting new CPUs at the end.

Reviewed by: jhb


# d5880f9c 27-May-2011 Attilio Rao <attilio@FreeBSD.org>

In the near future cpuset_t objects in struct pcpu will be axed out, but
as long as this does not happen, we need to fix interfaces to userland
in order to not break run-time accesses to the structure.

Reviwed by: kib
Tested by: pluknet


# 71a19bdc 05-May-2011 Attilio Rao <attilio@FreeBSD.org>

Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primirally done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno


# 3e288e62 22-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

After some off-list discussion, revert a number of changes to the
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.

Changes reverted:

------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.

------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines

Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.

------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# c3adda9f 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.


# 47d46d92 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.


# 5f67450d 14-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Similar to sys/net/vnet.h, define the linker set name for sys/sys/pcpu.h
as a macro, and use it instead of literal strings.


# 4403994d 11-Nov-2010 Dimitry Andric <dim@FreeBSD.org>

Use the same treatment as in linker_set.h for the __start and __stop
symbols of the set_vnet and set_pcpu sections, so those symbols will
always be emitted in kernel modules, if they use vnet.h or pcpu.h.

Also, for pcpu.h, make the __(start|stop)_set_pcpu declarations, and
associated macros invisible to userland, to prevent it picking up these
symbols.

Reviewed by: kib


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# d9481554 15-Sep-2010 Andriy Gapon <avg@FreeBSD.org>

sys/pcpu.h: remove a workaround for a fixed ld bug

The workaround was incorrectly documented as having something to do with
set_pcpu section's progbits, but in fact it was for incorrect placement
of __start_set_pcpu because of the bug in ld.
The bug was fixed in r210245, see commit message for details.

A side-effect of the workaround was that a zero-size set_pcpu section was
produced for modules, source code of which included pcpu.h but didn't
actually define any dynamic per-cpu variables.
This commit should remove the side-effect.

The same workaround is present sys/net/vnet.h, has an analogous side-effect
and can be removed as well.

An UPDATING entry that warns about a need for recent ld is following.

MFC after: 1 month


# a3870a18 27-Jul-2010 John Baldwin <jhb@FreeBSD.org>

Very rough first cut at NUMA support for the physical page allocator. For
now it uses a very dumb first-touch allocation policy. This will change in
the future.
- Each architecture indicates the maximum number of supported memory domains
via a new VM_NDOMAIN parameter in <machine/vmparam.h>.
- Each cpu now has a PCPU_GET(domain) member to indicate the memory domain
a CPU belongs to. Domain values are dense and numbered from 0.
- When a platform supports multiple domains, the default freelist
(VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain.
The MD code is required to populate an array of mem_affinity structures.
Each entry in the array defines a range of memory (start and end) and a
domain for the range. Multiple entries may be present for a single
domain. The list is terminated by an entry where all fields are zero.
This array of structures is used to split up phys_avail[] regions that
fall in VM_FREELIST_DEFAULT into per-domain freelists.
- Each memory domain has a separate lookup-array of freelists that is
used when fulfulling a physical memory allocation. Right now the
per-domain freelists are listed in a round-robin order for each domain.
In the future a table such as the ACPI SLIT table may be used to order
the per-domain lookup lists based on the penalty for each memory domain
relative to a specific domain. The lookup lists may be examined via a
new vm.phys.lookup_lists sysctl.
- The first-touch policy is implemented by using PCPU_GET(domain) to
pick a lookup list when allocating memory.

Reviewed by: alc


# 96f2f82a 16-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Unbreak DPCPU_SUM() by dereferencing the pointer returned by DPCPU_ID_PTR().

MFC after: 3 days


# 7d1d80fd 13-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

- The sum variable used in DPCPU_SUM needs to be of the same type as the
DPCPU variable, rather than a pointer to the type.

- Zero # bytes equivalent to sizeof(object), not sizeof(ptr_to_object).

- Remove an unnecessary __typeof.

Sponsored by: FreeBSD Foundation
Submitted by: jmallet
MFC after: 3 days


# aac762c2 13-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Macro to simplify zeroing DPCPU variables.

Sponsored by: FreeBSD Foundation
MFC after: 3 days


# 07de7d5f 13-Jul-2010 Lawrence Stewart <lstewart@FreeBSD.org>

- Rename DPCPU_SUM to DPCPU_VARSUM to better reflect the fact it operates on
member variables of a DPCPU struct.

- Add DPCPU_SUM which sums a DPCPU variable.

Sponsored by: FreeBSD Foundation
MFC after: 3 days


# cb0bd51d 18-Jun-2010 Lawrence Stewart <lstewart@FreeBSD.org>

- Rename the internal for loop iterator to "_i" to avoid potential shadowing of
external variables named "i". The "_" prefix is reserved for infrastructure
type code and is therefore not expected to be used by normal code likely to
call DPCPU_SUM(). [1]

- Change DPCPU_SUM to return the sum rather than calculate and assign it
internally. Usage is now: "sum = DPCPU_SUM(dpcpu_var, member_to_sum);" [2]

- Fix some style nits. [3]

Sponsored by: FreeBSD Foundation
Suggested by: bde [3], mdf [1], kib [1,2], pjd [1,3]
Reviewed by: kib
MFC after: 1 week (instead of r209119)


# 5ad333cf 12-Jun-2010 Lawrence Stewart <lstewart@FreeBSD.org>

Add a utility macro to simplify calculating an aggregate sum from a DPCPU
counter variable.

Sponsored by: FreeBSD Foundation
Reviewed by: jhb, rpaulo, rwatson (previous version of patch)
MFC after: 1 week


# 567e51e1 24-May-2010 Alan Cox <alc@FreeBSD.org>

Roughly half of a typical pmap_mincore() implementation is machine-
independent code. Move this code into mincore(), and eliminate the
page queues lock from pmap_mincore().

Push down the page queues lock into pmap_clear_modify(),
pmap_clear_reference(), and pmap_is_modified(). Assert that these
functions are never passed an unmanaged page.

Eliminate an inaccurate comment from powerpc/powerpc/mmu_if.m:
Contrary to what the comment says, pmap_mincore() is not simply an
optimization. Without a complete pmap_mincore() implementation,
mincore() cannot return either MINCORE_MODIFIED or MINCORE_REFERENCED
because only the pmap can provide this information.

Eliminate the page queues lock from vfs_setdirty_locked_object(),
vm_pageout_clean(), vm_object_page_collect_flush(), and
vm_object_page_clean(). Generally speaking, these are all accesses
to the page's dirty field, which are synchronized by the containing
vm object's lock.

Reduce the scope of the page queues lock in vm_object_madvise() and
vm_page_dontneed().

Reviewed by: kib (an earlier version)


# e80b9043 30-Mar-2010 John Baldwin <jhb@FreeBSD.org>

Use CACHE_LINE_SIZE alignment for 'struct pcpu' rather than hardcoding 128.

Reviewed by: jeff


# 495072de 30-Mar-2010 John Baldwin <jhb@FreeBSD.org>

Various and sundry style, whitespace, and comment fixes.

Submitted by: bde (mostly)


# 33962e6d 10-Mar-2010 John Baldwin <jhb@FreeBSD.org>

Typo.


# 28a2b3dd 12-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r196118:
Put minimum alignment on the dpcpu and vnet section so that ld
when adding the __start_ symbol knows the expected section alignment
and can place the __start_ symbol correctly.

These sections will not support symbols with super-cache line alignment
requirements.

For full details, see posting to freebsd-current, 2009-08-10,
Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.

Debugging and testing patches by:
Kamigishi Rei (spambox haruhiism.net),
np, lstewart, jhb, kib, rwatson
Tested by: Kamigishi Rei, lstewart
Reviewed by: kib

Approved by: re


# 1b501e53 12-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Put minimum alignment on the dpcpu and vnet section so that ld
when adding the __start_ symbol knows the expected section alignment
and can place the __start_ symbol correctly.

These sections will not support symbols with super-cache line alignment
requirements.

For full details, see posting to freebsd-current, 2009-08-10,
Message-ID: <20090810133111.C93661@maildrop.int.zabbadoz.net>.

Debugging and testing patches by:
Kamigishi Rei (spambox haruhiism.net),
np, lstewart, jhb, kib, rwatson
Tested by: Kamigishi Rei, lstewart
Reviewed by: kib
Approved by: re


# eddfbb76 14-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)


# 50c202c5 23-Jun-2009 Jeff Roberson <jeff@FreeBSD.org>

Implement a facility for dynamic per-cpu variables.
- Modules and kernel code alike may use DPCPU_DEFINE(),
DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined
PCPU_*. Requires only one extra instruction more than PCPU_* and is
virtually the same as __thread for builtin and much faster for shared
objects. DPCPU variables can be initialized when defined.
- Modules are supported by relocating the module's per-cpu linker set
over space reserved in the kernel. Modules may fail to load if there
is insufficient space available.
- Track space available for modules with a one-off extent allocator.
Free may block for memory to allocate space for an extent.

Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas


# 869c29a7 05-Jun-2009 John Baldwin <jhb@FreeBSD.org>

Trim old remnants of per-CPU KTR buffers.

Submitted by: Eygene Ryabinkin


# d4b5cae4 01-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Reimplement the netisr framework in order to support parallel netisr
threads:

- Support up to one netisr thread per CPU, each processings its own
workstream, or set of per-protocol queues. Threads may be bound
to specific CPUs, or allowed to migrate, based on a global policy.

In the future it would be desirable to support topology-centric
policies, such as "one netisr per package".

- Allow each protocol to advertise an ordering policy, which can
currently be one of:

NETISR_POLICY_SOURCE: packets must maintain ordering with respect to
an implicit or explicit source (such as an interface or socket).

NETISR_POLICY_FLOW: make use of mbuf flow identifiers to place work,
as well as allowing protocols to provide a flow generation function
for mbufs without flow identifers (m2flow). Falls back on
NETISR_POLICY_SOURCE if now flow ID is available.

NETISR_POLICY_CPU: allow protocols to inspect and assign a CPU for
each packet handled by netisr (m2cpuid).

- Provide utility functions for querying the number of workstreams
being used, as well as a mapping function from workstream to CPU ID,
which protocols may use in work placement decisions.

- Add explicit interfaces to get and set per-protocol queue limits, and
get and clear drop counters, which query data or apply changes across
all workstreams.

- Add a more extensible netisr registration interface, in which
protocols declare 'struct netisr_handler' structures for each
registered NETISR_ type. These include name, handler function,
optional mbuf to flow ID function, optional mbuf to CPU ID function,
queue limit, and ordering policy. Padding is present to allow these
to be expanded in the future. If no queue limit is declared, then
a default is used.

- Queue limits are now per-workstream, and raised from the previous
IFQ_MAXLEN default of 50 to 256.

- All protocols are updated to use the new registration interface, and
with the exception of netnatm, default queue limits. Most protocols
register as NETISR_POLICY_SOURCE, except IPv4 and IPv6, which use
NETISR_POLICY_FLOW, and will therefore take advantage of driver-
generated flow IDs if present.

- Formalize a non-packet based interface between interface polling and
the netisr, rather than having polling pretend to be two protocols.
Provide two explicit hooks in the netisr worker for start and end
events for runs: netisr_poll() and netisr_pollmore(), as well as a
function, netisr_sched_poll(), to allow the polling code to schedule
netisr execution. DEVICE_POLLING still embeds single-netisr
assumptions in its implementation, so for now if it is compiled into
the kernel, a single and un-bound netisr thread is enforced
regardless of tunable configuration.

In the default configuration, the new netisr implementation maintains
the same basic assumptions as the previous implementation: a single,
un-bound worker thread processes all deferred work, and direct dispatch
is enabled by default wherever possible.

Performance measurement shows a marginal performance improvement over
the old implementation due to the use of batched dequeue.

An rmlock is used to synchronize use and registration/unregistration
using the framework; currently, synchronized use is disabled
(replicating current netisr policy) due to a measurable 3%-6% hit in
ping-pong micro-benchmarking. It will be enabled once further rmlock
optimization has taken place. However, in practice, netisrs are
rarely registered or unregistered at runtime.

A new man page for netisr will follow, but since one doesn't currently
exist, it hasn't been updated.

This change is not appropriate for MFC, although the polling shutdown
handler should be merged to 7-STABLE.

Bump __FreeBSD_version.

Reviewed by: bz


# 0d2cf837 25-Jan-2009 Jeff Roberson <jeff@FreeBSD.org>

- Use __XSTRING where I want the define to be expanded. This resulted in
sizeof("MAXCPU") being used to calculate a string length rather than
something more reasonable such as sizeof("32"). This shouldn't have
caused any ill effect until we run on machines with 1000000 or more
cpus.


# 8f51ad55 17-Jan-2009 Jeff Roberson <jeff@FreeBSD.org>

- Implement generic macros for producing KTR records that are compatible
with src/tools/sched/schedgraph.py. This allows developers to quickly
create a graphical view of ktr data for any resource in the system.
- Add sched_tdname() and the pcpu field 'name' for quickly and uniformly
identifying records associated with a thread or cpu.
- Reimplement the KTR_SCHED traces using the new generic facility.

Obtained from: attilio
Discussed with: jhb
Sponsored by: Nokia


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 70d12a18 19-Aug-2008 John Baldwin <jhb@FreeBSD.org>

Export 'struct pcpu' to userland w/o requiring _KERNEL. A few ports
already define _KERNEL to get to this and I'm about to add hooks to
libkvm to access per-CPU data.

MFC after: 1 week


# b75e2d0b 06-Mar-2008 Marcel Moolenaar <marcel@FreeBSD.org>

Move the PCPU_MD_FIELDS last in struct pcpu. While this header is
private to the kernel, some ports define _KERNEL and include this
header. While arguably this is wrong, it's also reality. By having
the MD fields last, architectures that have CPU-specific variations
of PCPU_MD_FIELDS will at least have the MI fields at a constant
offset. Of course, having all MI fields first helps kernel debugging
as well, so this is not a change without some benefits to us.

This change does not result in an ABI breakage, because this header
is not part of the ABI. Recompilation of lsof is required though :-)


# 1ad19157 14-Dec-2007 David E. O'Brien <obrien@FreeBSD.org>

Add comment to pc_cp_time.


# 7628402b 28-Nov-2007 Peter Wemm <peter@FreeBSD.org>

Move the shared cp_time array (counts %sys, %user, %idle etc) to the
per-cpu area. cp_time[] goes away and a new function creates a merged
cp_time-like array for things like linprocfs, sysctl etc. The
atomic ops for updating cp_time[] in statclock go away, and the scope
of the thread lock is reduced.

sysctl kern.cp_time returns a backwards compatible cp_time[] array.
A new kern.cp_times sysctl returns the individual per-cpu stats.

I have pending changes to make top and vmstat optionally show per-cpu
stats.

I'm very aware that there are something like 5 or 6 other versions "out
there" for doing this - but none were handy when I needed them.

I did merge my changes with John Baldwin's, and ended up replacing a
few chunks of my stuff with his, and stealing some other code.

Reviewed by: jhb
Partly obtained from: jhb


# f53d15fe 08-Nov-2007 Stephan Uphoff <ups@FreeBSD.org>

Initial checkin for rmlock (read mostly lock) a multi reader single writer
lock optimized for almost exclusive reader access. (see also rmlock.9)

TODO:
Convert to per cpu variables linkerset as soon as it is available.
Optimize UP (single processor) case.


# 42ce445f 06-Jun-2007 David Xu <davidxu@FreeBSD.org>

Backout experimental adaptive-spin umtx code.


# c640357f 10-Mar-2007 Alan Cox <alc@FreeBSD.org>

Push down the implementation of PCPU_LAZY_INC() into the machine-dependent
header file. Reimplement PCPU_LAZY_INC() on amd64 and i386 making it
atomic with respect to interrupts.

Reviewed by: bde, jhb


# 4e32b7b3 19-Dec-2006 David Xu <davidxu@FreeBSD.org>

Add a lwpid field into per-cpu structure, the lwpid represents current
running thread's id on each cpu. This allow us to add in-kernel adaptive
spin for user level mutex. While spinning in user space is possible,
without correct thread running state exported from kernel, it hardly
can be implemented efficiently without wasting cpu cycles, however
exporting thread running state unlikely will be implemented soon as
it has to design and stablize interfaces. This implementation is
transparent to user space, it can be disabled dynamically. With this
change, mutex ping-pong program's performance is improved massively on
SMP machine. performance of mysql super-smack select benchmark is increased
about 7% on Intel dual dual-core2 Xeon machine, it indicates on systems
which have bunch of cpus and system-call overhead is low (athlon64, opteron,
and core-2 are known to be fast), the adaptive spin does help performance.

Added sysctls:
kern.threads.umtx_dflt_spins
if the sysctl value is non-zero, a zero umutex.m_spincount will
cause the sysctl value to be used a spin cycle count.
kern.threads.umtx_max_spins
the sysctl sets upper limit of spin cycle count.

Tested on: Athlon64 X2 3800+, Dual Xeon 5130


# e0b65125 29-Nov-2006 John Birrell <jb@FreeBSD.org>

Turn console printf buffering into a kernel option and only on
by default for sun4v where it is absolutely required.

This change moves the buffer from struct pcpu to the stack to avoid
using the critical section which created a LOR in a couple of cases
due to interaction with the tty code and kqueue. The LOR can't be
fixed with the critical section and the pcpu buffer can't be used
without the critical section.

Putting the buffer on the stack was my initial solution, but it was
pointed out that the stress on the stack might cause problems
depending on the call path. We don't have a way of creating tests
for those possible cases, so it's best to leave this as an option
for the time being. In time we may get enough data to enable this
option more generally.


# 3d068827 31-Oct-2006 John Birrell <jb@FreeBSD.org>

Add a cnputs() function to write a string to the console with
a lock to prevent interspersed strings written from different CPUs
at the same time.

To avoid putting a buffer on the stack or having to malloc one,
space is incorporated in the per-cpu structure. The buffer
size if 128 bytes; chosen because it's the next power of 2 size
up from 80 characters.

String writes to the console are buffered up the end of the line
or until the buffer fills. Then the buffer is flushed to all
console devices.

Existing low level console output via cnputc() is unaffected by
this change. ithread calls to log() are also unaffected to avoid
blocking those threads.

A minor change to the behaviour in a panic situation is that
console output will still be buffered, but won't be written to
a tty as before. This should prevent interspersed panic output
as a number of CPUs panic before we end up single threaded
running ddb.

Reviewed by: scottl, jhb
MFC after: 2 weeks


# 5b1a8eb3 07-Feb-2006 Poul-Henning Kamp <phk@FreeBSD.org>

Modify the way we account for CPU time spent (step 1)

Keep track of time spent by the cpu in various contexts in units of
"cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t
only when somebody wants to inspect the numbers.

For now "cputicks" are still derived from the current timecounter
and therefore things should by definition remain sensible also on
SMP machines. (The main reason for this first milestone commit is
to verify that hypothesis.)

On slower machines, the avoided multiplications to normalize timestams
at every context switch, comes out as a 5-7% better score on the
unixbench/context1 microbenchmark. On more modern hardware no change
in performance is seen.


# 0e56f395 26-Apr-2005 John Baldwin <jhb@FreeBSD.org>

Drop the CURPROC, curkse, and curksegrp aliases as they aren't used
anywhere.


# 7849b49a 26-Apr-2005 Robert Watson <rwatson@FreeBSD.org>

Add 'curcpu', a shortcut to the current CPU ID, similar to curthread,
curproc, et al. Useful for indexing into per-CPU data structures.

MFC after: 2 weeks


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# b2ae7ed7 27-Mar-2004 Marcel Moolenaar <marcel@FreeBSD.org>

Change the type of the various CPU masks to cpumask_t. Note that as
long as there are still explicit uses of int, whether in types or
in function names (such as atomic_set_int() in sched_ule.c), we can
not change cpumask_t to be anything other than u_int. See also the
commit log for sys/sys/types.h, revision 1.84.


# 29f5b9a8 08-Mar-2004 Nate Lawson <njl@FreeBSD.org>

Hook CPUs up to newbus. CPUs will ultimately be a bus driver so that
multiple CPU-specific drivers can attach. This is a work in progress
so children aren't supported yet.

Help from: jhb


# de8e370e 20-Nov-2003 Peter Wemm <peter@FreeBSD.org>

Allow the MD backend to provide an alternative to
#define curthread PCPU_GET(curthread)
since its so heavily used in the kernel and ripe for compile-speed
optimization on some platforms.


# 696058c3 09-Dec-2002 Julian Elischer <julian@FreeBSD.org>

Unbreak the KSE code. Keep track of zobie threads using the Per-CPU storage
during the context switch. Rearrange thread cleanups
to avoid problems with Giant. Clean threads when freed or
when recycled.

Approved by: re (jhb)


# 004998bc 20-Aug-2002 John Baldwin <jhb@FreeBSD.org>

Whitespace and style fixes.

Submitted by: bde


# 80f5c8bf 04-Apr-2002 Matthew Dillon <dillon@FreeBSD.org>

Embed a struct vmmeter in the per-cpu structure and add a macro,
PCPU_LAZY_INC() which increments elements in it for cases where we
can afford the occassional inaccuracy. Use of per-cpu stats counters
avoids significant cache stalls in various critical paths that would
otherwise severely limit our cpu scaleability.

Adjust all sysctl's accessing cnt.* elements to now use a procedure
which aggregates the requested field for all cpus and for the global
vmmeter.

The global vmmeter is retained, since some stats counters, like v_free_min,
cannot be made per-cpu. Also, this allows us to convert counters from
the global vmmeter to the per-cpu vmmeter in a piecemeal fashion, so
have at it!


# 8d0747c9 20-Mar-2002 John Baldwin <jhb@FreeBSD.org>

Document that MD pcpu fields are defined in PCPU_MD_FIELDS in
machine/pcpu.h.

Requested by: dillon


# 181df8c9 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

revert last commit temporarily due to whining on the lists.


# f96ad4c2 26-Feb-2002 Matthew Dillon <dillon@FreeBSD.org>

STAGE-1 of 3 commit - allow (but do not require) interrupts to remain
enabled in critical sections and streamline critical_enter() and
critical_exit().

This commit allows an architecture to leave interrupts enabled inside
critical sections if it so wishes. Architectures that do not wish to do
this are not effected by this change.

This commit implements the feature for the I386 architecture and provides
a sysctl, debug.critical_mode, which defaults to 1 (use the feature). For
now you can turn the sysctl on and off at any time in order to test the
architectural changes or track down bugs.

This commit is just the first stage. Some areas of the code, specifically
the MACHINE_CRITICAL_ENTER #ifdef'd code, is strictly temporary and will
be cleaned up in the STAGE-2 commit when the critical_*() functions are
moved entirely into MD files.

The following changes have been made:

* critical_enter() and critical_exit() for I386 now simply increment
and decrement curthread->td_critnest. They no longer disable
hard interrupts. When critical_exit() decrements the counter to
0 it effectively calls a routine to deal with whatever interrupts
were deferred during the time the code was operating in a critical
section.

Other architectures are unaffected.

* fork_exit() has been conditionalized to remove MD assumptions for
the new code. Old code will still use the old MD assumptions
in regards to hard interrupt disablement. In STAGE-2 this will
be turned into a subroutine call into MD code rather then hardcoded
in MI code.

The new code places the burden of entering the critical section
in the trampoline code where it belongs.

* I386: interrupts are now enabled while we are in a critical section.
The interrupt vector code has been adjusted to deal with the fact.
If it detects that we are in a critical section it currently defers
the interrupt by adding the appropriate bit to an interrupt mask.

* In order to accomplish the deferral, icu_lock is required. This
is i386-specific. Thus icu_lock can only be obtained by mainline
i386 code while interrupts are hard disabled. This change has been
made.

* Because interrupts may or may not be hard disabled during a
context switch, cpu_switch() can no longer simply assume that
PSL_I will be in a consistent state. Therefore, it now saves and
restores eflags.

* FAST INTERRUPT PROVISION. Fast interrupts are currently deferred.
The intention is to eventually allow them to operate either while
we are in a critical section or, if we are able to restrict the
use of sched_lock, while we are not holding the sched_lock.

* ICU and APIC vector assembly for I386 cleaned up. The ICU code
has been cleaned up to match the APIC code in regards to format
and macro availability. Additionally, the code has been adjusted
to deal with deferred interrupts.

* Deferred interrupts use a per-cpu boolean int_pending, and
masks ipending, spending, and fpending. Being per-cpu variables
it is not currently necessary to lock; bus cycles modifying them.

Note that the same mechanism will enable preemption to be
incorporated as a true software interrupt without having to
further hack up the critical nesting code.

* Note: the old critical_enter() code in kern/kern_switch.c is
currently #ifdef to be compatible with both the old and new
methodology. In STAGE-2 it will be moved entirely to MD code.

Performance issues:

One of the purposes of this commit is to enhance critical section
performance, specifically to greatly reduce bus overhead to allow
the critical section code to be used to protect per-cpu caches.
These caches, such as Jeff's slab allocator work, can potentially
operate very quickly making the effective savings of the new
critical section code's performance very significant.

The second purpose of this commit is to allow architectures to
enable certain interrupts while in a critical section. Specifically,
the intention is to eventually allow certain FAST interrupts to
operate rather then defer.

The third purpose of this commit is to begin to clean up the
critical_enter()/critical_exit()/cpu_critical_enter()/
cpu_critical_exit() API which currently has serious cross pollution
in MI code (in fork_exit() and ast() for example).

The fourth purpose of this commit is to provide a framework that
allows kernel-preempting software interrupts to be implemented
cleanly. This is currently used for two forward interrupts in I386.
Other architectures will have the choice of using this infrastructure
or building the functionality directly into critical_enter()/
critical_exit().

Finally, this commit is designed to greatly improve the flexibility
of various architectures to manage critical section handling,
software interrupts, preemption, and other highly integrated
architecture-specific details.


# 1cbb9c3b 22-Feb-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Convert p->p_runtime and PCPU(switchtime) to bintime format.


# ab8061d8 05-Jan-2002 Peter Wemm <peter@FreeBSD.org>

Add a per-cpu variable, cpumask, the preshifted equivalent of 1 << cpuid.
We use this around the place a lot.


# 0bbc8826 11-Dec-2001 John Baldwin <jhb@FreeBSD.org>

Overhaul the per-CPU support a bit:

- The MI portions of struct globaldata have been consolidated into a MI
struct pcpu. The MD per-CPU data are specified via a macro defined in
machine/pcpu.h. A macro was chosen over a struct mdpcpu so that the
interface would be cleaner (PCPU_GET(my_md_field) vs.
PCPU_GET(md.md_my_md_field)).
- All references to globaldata are changed to pcpu instead. In a UP kernel,
this data was stored as global variables which is where the original name
came from. In an SMP world this data is per-CPU and ideally private to each
CPU outside of the context of debuggers. This also included combining
machine/globaldata.h and machine/globals.h into machine/pcpu.h.
- The pointer to the thread using the FPU on i386 was renamed from
npxthread to fpcurthread to be identical with other architectures.
- Make the show pcpu ddb command MI with a MD callout to display MD
fields.
- The globaldata_register() function was renamed to pcpu_init() and now
init's MI fields of a struct pcpu in addition to registering it with
the internal array and list.
- A pcpu_destroy() function was added to remove a struct pcpu from the
internal array and list.

Tested on: alpha, i386
Reviewed by: peter, jake


# ba228f6d 10-May-2001 John Baldwin <jhb@FreeBSD.org>

- Split out the support for per-CPU data from the SMP code. UP kernels
have per-CPU data and gdb on the i386 at least needs access to it.
- Clean up includes in kern_idle.c and subr_smp.c.

Reviewed by: jake