History log of /freebsd-current/sys/sys/cpuset.h
Revision Date Author Comments
# cd4bd975 27-Apr-2024 Jake Freeland <jfree@FreeBSD.org>

bitset: Add ORNOT macros

Macros to ANDNOT a bitset currently exist, but there are no ORNOT
equivalents. Introduce ORNOT macros for bitset(9), cpuset(9), and
domainset(9).

Approved by: markj (mentor)
Reviewed by: markj
MFC after: 1 week
Sponsored by: NIKSUN, Inc.
Differential Revision: https://reviews.freebsd.org/D44976


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# c21b080f 29-Jan-2023 Dmitry Chagin <dchagin@FreeBSD.org>

cpuset: Fix sched_[g|s]etaffinity() for better compatibility with Linux.

Under Linux to sched_[g|s]etaffinity() functions the value returned from a call
to gettid(2) (thread id) can be passed in the argument pid. Specifying pid as 0
will set the attribute for the calling thread, and passing the value returned
from a call to getpid(2) (process id) will set the attribute for the main thread
of the thread group.

Native cpuset(2) family of system calls has "which" argument to determine how
the value of id argument is interpreted, i.e., CPU_WHICH_TID is used to pass
a thread id and CPU_WHICH_PID - to pass a process id.

For now native sched_[g|s]etaffinity() implementation is wrong as uses "which"
CPU_WHICH_PID to pass both (process and thread id) to the kernel. To fix this
adding a new "which" CPU_WHICH_TIDPID intended to handle both id's.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D38209
MFC after: 1 week


# 01f74ccd 29-Jan-2023 Dmitry Chagin <dchagin@FreeBSD.org>

libthr: Fix pthread_attr_[g|s]etaffinity_np to match it's manual and the kernel.

Since f35093f8 semantics of a thread affinity functions is changed to be a
compatible with Linux:

In case of getaffinity(), the minimum cpuset_t size that the kernel permits is
the maximum CPU id, present in the system, / NBBY bytes, the maximum size is not
limited.
In case of setaffinity(), the kernel does not limit the size of the user-provided
cpuset_t, internally using only the meaningful part of the set, where the upper
bound is the maximum CPU id, present in the system, no larger than the size of
the kernel cpuset_t.

To match pthread_attr_[g|s]etaffinity_np checks of the user-provided cpusets to
the kernel behavior export the minimum cpuset_t size allowed by running kernel
via new sysctl kern.sched.cpusetsizemin and use it in checks.

Reviewed by:
Differential Revision: https://reviews.freebsd.org/D38112
MFC after: 1 week


# 02f7670e 29-Jan-2023 Dmitry Chagin <dchagin@FreeBSD.org>

sched.h: Fix _S macros for better compatibility with glibc.

In e2650af157 was added "_S" macros for compatibility with glibc, but it's still
incompatible as under glibc the macros whose names end with "_S" operate on the
dynamically allocated CPU set(s) whose size is in bytes, not in bits.

While here remove limiting ifdef to non-kernel case.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D38110
MFC after: 1 week


# 4a3e5133 20-May-2022 Mark Johnston <markj@FreeBSD.org>

cpuset: Fix the KASAN and KMSAN builds

Rename the "copyin" and "copyout" fields of struct cpuset_copy_cb to
something less generic, since sanitizers define interceptors for
copyin() and copyout() using #define.

Reported by: syzbot+2db5d644097fc698fb6f@syzkaller.appspotmail.com
Fixes: 47a57144af25 ("cpuset: Byte swap cpuset for compat32 on big endian architectures")
Sponsored by: The FreeBSD Foundation


# 47a57144 12-May-2022 Justin Hibbits <jhibbits@FreeBSD.org>

cpuset: Byte swap cpuset for compat32 on big endian architectures

Summary:
BITSET uses long as its basic underlying type, which is dependent on the
compile type, meaning on 32-bit builds the basic type is 32 bits, but on
64-bit builds it's 64 bits. On little endian architectures this doesn't
matter, because the LSB is always at the low bit, so the words get
effectively concatenated moving between 32-bit and 64-bit, but on
big-endian architectures it throws a wrench in, as setting bit 0 in
32-bit mode is equivalent to setting bit 32 in 64-bit mode. To
demonstrate:

32-bit mode:

BIT_SET(foo, 0): 0x00000001

64-bit sees: 0x0000000100000000

cpuset is the only system interface that uses bitsets, so solve this
by swapping the integer sub-components at the copyin/copyout points.

Reviewed by: kib
MFC after: 3 days
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D35225


# 5650d340 31-Dec-2021 Stefan Eßer <se@FreeBSD.org>

sys/cpuset.h: fix macro definition

The _s parameter was missing in the paramater list.

Reported by: gljennjohn at gmail.com (Gary Jennejohn)


# cb65d443 31-Dec-2021 Stefan Eßer <se@FreeBSD.org>

sys/cpuset.h: add 3 more macros provided by GLIBC

The lang/python* ports failed since they expected CPU_COUNT_S() to be
provided by sys/cpuset.h. Add this function plus 2 more in a way that
is compatible with GLIBC.

Reported by: ler at lerctr.org (Larry Rosenman)


# e2650af1 29-Dec-2021 Stefan Eßer <se@FreeBSD.org>

Make CPU_SET macros compliant with other implementations

The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.

Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.

The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).

The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.

This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.

One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.

Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.

The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.

Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451


# 5e04571c 05-Dec-2021 Stefan Eßer <se@FreeBSD.org>

sys/bitset.h: reduce visibility of BIT_* macros

Add two underscore characters "__" to names of BIT_* and BITSET_*
macros to move them to the implementation name space and to prevent
a name space pollution due to BIT_* macros in 3rd party programs with
conflicting parameter signatures.

These prefixed macro names are used in kernel header files to define
macros in e.g. sched.h, sys/cpuset.h and sys/domainset.h.

If C programs are built with either -D_KERNEL (automatically passed
when building a kernel or kernel modules) or -D_WANT_FREENBSD_BITSET
(or this macros is defined in the source code before including the
bitset macros), then all macros are made visible with their previous
names, too. E.g., both __BIT_SET() and BIT_SET() are visible with
either of _KERNEL or _WANT_FREEBSD_BITSET defined.

The main reason for this change is that some 3rd party sources
including sched.h have been found to contain conflicting BIT_*
macros.

As a work-around, parts of shed.h have been made conditional and
depend on _WITH_CPU_SET_T being set when sched.h is included.
Ports that expect the full functionality provided by sched.h need
to be built with -D_WITH_CPU_SET_T. But this leads to conflicts if
BIT_* macros are defined in that program, too.

This patch set makes all of sched.h visible again without this
parameter being passed and without any name space pollution due
to BIT_* macros becoming visible when sched.h is included.

This patch set will be backported to the STABLE branches, but ports
will need to use -D_WITH_CPU_SET_T as long as there are supported
releases that do not contain these patches.

Reviewed by: kib, markj
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33235


# 43e6f07b 30-Oct-2021 Konstantin Belousov <kib@FreeBSD.org>

sched.h: add CPU_EQUAL() for better compatibility with Linux

Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32901


# de855429 21-Sep-2021 Mark Johnston <markj@FreeBSD.org>

cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it

This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.

This is a re-application of commit
9068f6ea697b1b28ad1326a4c7a9ba86f08b985e.

Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029


# bcdc599d 21-Sep-2021 Mark Johnston <markj@FreeBSD.org>

Revert "cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it"

This reverts commit 9068f6ea697b1b28ad1326a4c7a9ba86f08b985e.

The underlying macro needs to be reworked to avoid problems with control
flow statements.

Reported by: rlibby


# 9068f6ea 21-Sep-2021 Mark Johnston <markj@FreeBSD.org>

cpuset(9): Add CPU_FOREACH_IS(SET|CLR) and modify consumers to use it

This implementation is faster and doesn't modify the cpuset, so it lets
us avoid some unnecessary copying as well. No functional change
intended.

Reviewed by: cem, kib, jhb
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32029


# ca7005f1 25-Apr-2021 Patrick Kelsey <pkelsey@FreeBSD.org>

iflib: Improve mapping of TX/RX queues to CPUs

iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache. The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs. When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs. See the comment on get_cpuid_for_queue() for the
entire matrix.

The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset (existing)
dev.<device>.<unit>.iflib.separate_txrx (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)

The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu

When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.

Reviewed by: kbowling
Tested by: olivier, pkelsey
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D24094


# 9f1e5783 16-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

cpuset: reorder so that cs_mask does not share cacheline with cs_ref


# 26a3bf76 21-Sep-2020 D Scott Phillips <scottph@FreeBSD.org>

bitset: expand bit index type to `long`

An upcoming patch to use the bitset macros for tracking vm page
dump information could conceivably need more than INT_MAX bits.
Expand the bit type to long so that the extra range is available
on 64-bit platforms where it would most likely be needed.

CPUSET_COUNT and DOMAINSET_COUNT are also modified to remain of
type `int`.

Reviewed by: kib, markj
Approved by: scottl (implicit)
MFC after: 1 week
Sponsored by: Ampere Computing, Inc.
Differential Revision: https://reviews.freebsd.org/D26190


# 9825eadf 13-Dec-2019 Ryan Libby <rlibby@FreeBSD.org>

bitset: rename confusing macro NAND to ANDNOT

s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too. The actual
implementation is "and not" (or "but not"), i.e. A but not B.
Fortunately this does appear to be what all existing callers want.

Don't supply a NAND (not (A and B)) operation at this time.

Discussed with: jeff
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D22791


# e5818a53 28-Mar-2018 Jeff Roberson <jeff@FreeBSD.org>

Implement several enhancements to NUMA policies.

Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.

Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user. This
also eliminates the need for the complicated interrupt binding code.

Add a sysctl API for viewing and manipulating domainsets. Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both. This probably belongs in a dedicated subr file.

Attempt to improve the include situation.

Reviewed by: kib
Discussed with: jhb (cpuset parts)
Tested by: pho (before review feedback)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14839


# 3f289c3f 12-Jan-2018 Jeff Roberson <jeff@FreeBSD.org>

Implement 'domainset', a cpuset based NUMA policy mechanism. This allows
userspace to control NUMA policy administratively and programmatically.

Implement domainset based iterators in the page layer.

Remove the now legacy numa_* syscalls.

Cleanup some header polution created by having seq.h in proc.h.

Reviewed by: markj, kib
Discussed with: alc
Tested by: pho
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D13403


# c4e20cad 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 29dfb631 03-May-2017 Conrad Meyer <cem@FreeBSD.org>

Extend cpuset_get/setaffinity() APIs

Add IRQ placement-only and ithread-only API variants. intr_event_bind
has been extended with sibling methods, as it has many more callsites in
existing code.

Reviewed by: kib@, adrian@ (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D10586


# bdc29f65 14-Jul-2016 John Baldwin <jhb@FreeBSD.org>

Move nested include of <sys/queue.h> inside _KERNEL.

This removes namespace pollution for userland brought in by r299122.

PR: 210319
Submitted by: knu
MFC after: 1 week


# f8e81aa5 05-May-2016 John Baldwin <jhb@FreeBSD.org>

Fix <sys/_bitset.h> and <sys/_cpuset.h> to not require <sys/param.h>.

- Hardcode '8' instead of NBBY in _BITSET_BITS.
- Define a private version of 'howmany' for use in __bitset_words().
- While here, move a few more things out of _bitset.h and _cpuset.h to
bitset.h and cpuset.h, respectively. The only things left in
_bitset.h and _cpuset.h are the bits needed to define a bitset
structure.

Reviewed by: bde, kib (ish)


# 39b160b3 09-Jul-2015 Ed Schouten <ed@FreeBSD.org>

Add forward declaration of struct thread.

This structure is used in some of the functions in this header, but we
don't depend on any header that pulls it i.


# 5bbb2169 25-Jun-2015 Adrian Chadd <adrian@FreeBSD.org>

Un-static cpuset_which() - it's useful in other contexts, such as some
CPU set operations in my upcoming NUMA work.

Tested/compiled:

* i386 (run)
* amd64 (run)
* mips (run)
* mips64 (run)
* armv6 (built)

Sponsored by: Norse Corp, Inc.


# 070b4903 09-Feb-2015 John Baldwin <jhb@FreeBSD.org>

Use __builtin_popcnt() to implement a BIT_COUNT() operation for bitsets and
use this to implement CPU_COUNT() to count the number of CPUs in a cpuset.

MFC after: 2 weeks


# c0ae6688 08-Jan-2015 John Baldwin <jhb@FreeBSD.org>

Create a cpuset mask for each NUMA domain that is available in the
kernel via the global cpuset_domain[] array. To export these to userland,
add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a
specific domain. Add a -d flag to cpuset(1) that can be used to fetch
the mask for a given domain.

Differential Revision: https://reviews.freebsd.org/D1232
Submitted by: jeff (kernel bits)
Reviewed by: adrian, jeff


# e011dc96 20-Oct-2014 Neel Natu <neel@FreeBSD.org>

Merge from projects/bhyve_svm all the changes outside vmm.ko or bhyve utilities:

Add support for AMD's nested page tables in pmap.c:
- Provide the correct bit mask for various bit fields in a PTE (e.g. valid bit)
for a pmap of type PT_RVI.
- Add a function 'pmap_type_guest(pmap)' that returns TRUE if the pmap is of
type PT_EPT or PT_RVI.

Add CPU_SET_ATOMIC_ACQ(num, cpuset):
This is used when activating a vcpu in the nested pmap. Using the 'acquire'
variant guarantees that the load of the 'pm_eptgen' will happen only after
the vcpu is activated in 'pm_active'.

Add defines for various AMD-specific MSRs.

Submitted by: Anish Gupta (akgupt3@gmail.com)


# 7f7528fc 15-Sep-2014 Adrian Chadd <adrian@FreeBSD.org>

Modify cpuset_setithread() to take a CPU ID as an integer, not a char.

We're going to end up having > 254 CPUs at some point.


# a2684814 06-Sep-2014 Neel Natu <neel@FreeBSD.org>

Do proper ASID management for guest vcpus.

Prior to this change an ASID was hard allocated to a guest and shared by all
its vcpus. The meant that the number of VMs that could be created was limited
to the number of ASIDs supported by the CPU. It was also inefficient because
it forced a TLB flush on every VMRUN.

With this change the number of guests that can be created is independent of
the number of available ASIDs. Also, the TLB is flushed only when a new ASID
is allocated.

Discussed with: grehan
Reviewed by: Anish Gupta (akgupt3@gmail.com)


# 81198539 22-Jun-2014 Alexander V. Chernikov <melifaro@FreeBSD.org>

Permit changing cpu mask for cpu set 1 in presence of drivers
binding their threads to particular CPU.

Changing ithread cpu mask is now performed by special cpuset_setithread().
It creates additional cpuset root group on first bind invocation.

No objection: jhb
Tested by: hiren
MFC after: 2 weeks
Sponsored by: Yandex LLC


# a0887a4c 30-Aug-2013 Konstantin Belousov <kib@FreeBSD.org>

Add BIT_AND_ATOMIC() and CPU_AND_ATOMIC().

Sponsored by: The FreeBSD Foundation
Reviewed by: alc
Tested by: pho, bf


# cd32bd7a 25-Jun-2013 John Baldwin <jhb@FreeBSD.org>

Several improvements to rmlock(9). Many of these are based on patches
provided by Isilon.
- Add an rm_assert() supporting various lock assertions similar to other
locking primitives. Because rmlocks track readers the assertions are
always fully accurate unlike rw_assert() and sx_assert().
- Flesh out the lock class methods for rmlocks to support sleeping via
condvars and rm_sleep() (but only while holding write locks), rmlock
details in 'show lock' in DDB, and the lc_owner method used by
dtrace.
- Add an internal destroyed cookie so that API functions can assert
that an rmlock is not destroyed.
- Make use of rm_assert() to add various assertions to the API (e.g.
to assert locks are held when an unlock routine is called).
- Give RM_SLEEPABLE locks their own lock class and always use the
rmlock's own lock_object with WITNESS.
- Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping
while holding a read lock on an rmlock.

Submitted by: andre
Obtained from: EMC/Isilon


# 17a27377 13-Jun-2013 Jeff Roberson <jeff@FreeBSD.org>

- Add a BIT_FFS() macro and use it to replace cpusetffs_obj()

Discussed with: attilio
Sponsored by: EMC / Isilon Storage Division


# 75363312 08-May-2013 Attilio Rao <attilio@FreeBSD.org>

Generalize the bitset operations, present in cpuset and offer a KPI to
redefine such operations for different consumers.
This will be used when NUMA support will be finished and numaset
will need to be used.

Sponsored by: EMC / Isilon storage division
Obtained from: jeff
Reviewed by: alc


# 3907c073 12-Mar-2012 Alexander Motin <mav@FreeBSD.org>

Tune cpuset macros to optimize cases when CPU_SETSIZE fits into single
machine word. For example, it turns CPU_SET() into expected shift and OR,
removing two extra shifts and additional index on memory access.

Generated code checked for kernel (optimized) and user-level (unoptimized)
cases with GCC and CLANG.

Reviewed by: attilio
MFC after: 2 weeks


# 678d0423 03-Jul-2011 Attilio Rao <attilio@FreeBSD.org>

- Remove the now unused CPU_NAND_ATOMIC()
- Add a comment explaining that CPU_OR_ATOMIC() and
CPU_COPY_STORE_REL() are special wrappers used to cater particular
cases.


# e3709597 31-May-2011 Attilio Rao <attilio@FreeBSD.org>

Fix KTR_CPUMASK in order to accept a string representing a cpuset_t.
This introduce all the underlying support for making this possible (via
the function cpusetobj_strscan() and keeps ktr_cpumask exported. sparc64
implements its own assembly primitives for tracing events and needs to
properly check it. Anyway the sparc64 logic is not implemented yet due
to lack of knowledge (by me) and time (by marius), but it is just a
matter of using ktr_cpumask when possible.

Tested and fixed by: pluknet
Reviewed by: marius


# d0984adc 31-May-2011 Attilio Rao <attilio@FreeBSD.org>

Revert a change that crept in during MFC.


# 8e8b0e46 29-May-2011 Attilio Rao <attilio@FreeBSD.org>

Remove the unnecessary _KERNEL protection


# 217e1c0e 23-May-2011 Attilio Rao <attilio@FreeBSD.org>

Revert a patch that unvolountary sneaked in while I was MFCing.


# 71a19bdc 05-May-2011 Attilio Rao <attilio@FreeBSD.org>

Commit the support for removing cpumask_t and replacing it directly with
cpuset_t objects.
That is going to offer the underlying support for a simple bump of
MAXCPU and then support for number of cpus > 32 (as it is today).

Right now, cpumask_t is an int, 32 bits on all our supported architecture.
cpumask_t on the other side is implemented as an array of longs, and
easilly extendible by definition.

The architectures touched by this commit are the following:
- amd64
- i386
- pc98
- arm
- ia64
- XEN

while the others are still missing.
Userland is believed to be fully converted with the changes contained
here.

Some technical notes:
- This commit may be considered an ABI nop for all the architectures
different from amd64 and ia64 (and sparc64 in the future)
- per-cpu members, which are now converted to cpuset_t, needs to be
accessed avoiding migration, because the size of cpuset_t should be
considered unknown
- size of cpuset_t objects is different from kernel and userland (this is
primirally done in order to leave some more space in userland to cope
with KBI extensions). If you need to access kernel cpuset_t from the
userland please refer to example in this patch on how to do that
correctly (kgdb may be a good source, for example).
- Support for other architectures is going to be added soon
- Only MAXCPU for amd64 is bumped now

The patch has been tested by sbruno and Nicholas Esborn on opteron
4 x 12 pack CPUs. More testing on big SMP is expected to came soon.
pluknet tested the patch with his 8-ways on both amd64 and i386.

Tested by: pluknet, sbruno, gianni, Nicholas Esborn
Reviewed by: jeff, jhb, sbruno


# 86702289 04-May-2011 Attilio Rao <attilio@FreeBSD.org>

Revert comment introduction. It seems I'm the only one who didn't know
of the size mismatch.


# 3d9f29d0 03-May-2011 Attilio Rao <attilio@FreeBSD.org>

Add a comment explaining the discrepancy in size between kernel and
userland of cpuset_t.

Discussed with: jhb


# 6106275b 03-May-2011 Attilio Rao <attilio@FreeBSD.org>

Make CPU_MAXSETSIZE dependant by MAXCPU as well.

Discussed with: jhb


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# c8eb786c 23-Jun-2009 Jeff Roberson <jeff@FreeBSD.org>

- Add a new cpuset macro, CPU_FILL(), for setting the set to all 1s.


# 0304c731 27-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


# 413628a7 29-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 9b33b154 10-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

- Add the interrupt vector number to intr_event_create so MI code can
lookup hard interrupt events by number. Ignore the irq# for soft intrs.
- Add support to cpuset for binding hardware interrupts. This has the
side effect of binding any ithread associated with the hard interrupt.
As per restrictions imposed by MD code we can only bind interrupts to
a single cpu presently. Interrupts can be 'unbound' by binding them
to all cpus.

Reviewed by: jhb
Sponsored by: Nokia


# 3bc8c68d 03-Apr-2008 Jeff Roberson <jeff@FreeBSD.org>

- Add a Nokia copyright to cpuset to reflect their generous
contribution to this work.


# a03ee000 30-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Consistently return EDEADLK when presented with a new set that is
incompatible with existing bindings.
- Try to copyout the setid in cpuset() before migrating the proc to the
setid in case the user has supplied a bad buffer.
- Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to
be more descriptive and free cpuset_root to be used as a different
type of symbol.
- Make cpuset_root the cpuset_t set of all cpus in the system. This
should contain the same bitmask as all_cpus presently.
- Add a CPU_CMP() macro to compare two sets.


# 7f64829a 25-Mar-2008 Ruslan Ermilov <ru@FreeBSD.org>

Fixed type of the fourth argument of cpuset_{get,set}affinity(2) to be size_t.

Prodded by: davidxu


# 83660cf9 12-Mar-2008 David Xu <davidxu@FreeBSD.org>

Add const qualifier to cpuset mask's pointer, since the cpuset mask should
be not changed by the system call.


# 73c40187 04-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

- Verify that when a user supplies a mask that is bigger than the kernel
mask none of the upper bits are set.
- Be more careful about enforcing the boundaries of masks and child sets.
- Introduce a few more CPU_* macros for implementing these tests.
- Change the cpusetsize argument to be bytes rather than bits to match
other apis.

Sponsored by: Nokia


# d7f687fc 02-Mar-2008 Jeff Roberson <jeff@FreeBSD.org>

Add cpuset, an api for thread to cpu binding and cpu resource grouping
and assignment.
- Add a reference to a struct cpuset in each thread that is inherited from
the thread that created it.
- Release the reference when the thread is destroyed.
- Add prototypes for syscalls and macros for manipulating cpusets in
sys/cpuset.h
- Add syscalls to create, get, and set new numbered cpusets:
cpuset(), cpuset_{get,set}id()
- Add syscalls for getting and setting affinity masks for cpusets or
individual threads: cpuid_{get,set}affinity()
- Add types for the 'level' and 'which' parameters for the cpuset. This
will permit expansion of the api to cover cpu masks for other objects
identifiable with an id_t integer. For example, IRQs and Jails may be
coming soon.
- The root set 0 contains all valid cpus. All thread initially belong to
cpuset 1. This permits migrating all threads off of certain cpus to
reserve them for special applications.

Sponsored by: Nokia
Discussed with: arch, rwatson, brooks, davidxu, deischen
Reviewed by: antoine