History log of /freebsd-current/sys/kern/kern_cpuset.c
Revision	Date	Author	Comments
# 96c8b3e5	06-Apr-2024	Jake Freeland <jfree@FreeBSD.org>	ktrace: Record cpuset violations with KTR_CAPFAIL Report Capsicum violations in the cpuset namespace with CAPFAIL_CPUSET. Reviewed by: markj Approved by: markj (mentor) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D40677
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 4d846d26	10-May-2023	Warner Losh <imp@FreeBSD.org>	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
# 2058f075	30-Jan-2023	Dmitry Chagin <dchagin@FreeBSD.org>	cpuset: Handle CPU_WHICH_TIDPID wherever cpuset_which() is called. cpuset_which() resolves the argument pair which and id and returns references to the appropriate resources. To avoid leaking resources or accessing unresolved references, handle the new which value CPU_WHICH_TIDPID wherever cpuset_which() is called. To avoid code duplication, cpuset_which2() has been added. Reported by: syzbot+331e8402e0f7347f0f2a@syzkaller.appspotmail.com Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D38272 MFC after: 2 weeks
# c21b080f	29-Jan-2023	Dmitry Chagin <dchagin@FreeBSD.org>	cpuset: Fix sched_[g|s]etaffinity() for better compatibility with Linux. Under Linux, the sched_[g|s]etaffinity() functions accept in the pid argument the value returned from a call to gettid(2) (thread id). Specifying pid as 0 will set the attribute for the calling thread, and passing the value returned from a call to getpid(2) (process id) will set the attribute for the main thread of the thread group. The native cpuset(2) family of system calls has a "which" argument to determine how the value of the id argument is interpreted, i.e., CPU_WHICH_TID is used to pass a thread id and CPU_WHICH_PID to pass a process id. For now the native sched_[g|s]etaffinity() implementation is wrong, as it uses "which" CPU_WHICH_PID to pass both (process and thread id) to the kernel. To fix this, add a new "which" CPU_WHICH_TIDPID intended to handle both ids. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D38209 MFC after: 1 week
# 01f74ccd	29-Jan-2023	Dmitry Chagin <dchagin@FreeBSD.org>	libthr: Fix pthread_attr_[g|s]etaffinity_np to match its manual and the kernel. Since f35093f8, the semantics of the thread affinity functions have changed to be compatible with Linux: In case of getaffinity(), the minimum cpuset_t size that the kernel permits is the maximum CPU id, present in the system, / NBBY bytes, the maximum size is not limited. In case of setaffinity(), the kernel does not limit the size of the user-provided cpuset_t, internally using only the meaningful part of the set, where the upper bound is the maximum CPU id, present in the system, no larger than the size of the kernel cpuset_t. To match pthread_attr_[g|s]etaffinity_np checks of the user-provided cpusets to the kernel behavior, export the minimum cpuset_t size allowed by the running kernel via the new sysctl kern.sched.cpusetsizemin and use it in checks. Reviewed by: Differential Revision: https://reviews.freebsd.org/D38112 MFC after: 1 week
# db79bf75	03-Oct-2022	Alfredo Dal'Ava Junior <alfredo@FreeBSD.org>	powerpc: cpuset: add local functions for copyin/copyout Add local functions to work around an instruction segment trap (panic) when the indirect functions copyin and copyout are called by an external loadable kernel module (i.e. pfsync, zfs and linuxulator). The crash was triggered by change 47a57144af25a7bd768b29272d50a36fdf2874ba, but a kernel binary linked with LLD 9 works fine. LLVM bisect points out that LLD behavior changed after dc06b0bc9ad055d06535462d91bfc2a744b2f589. This is known to affect powerpc targets only and the final fix is still being discussed with the LLVM community. PR: 266730 Reviewed by: luporl, jhibbits (on IRC, previous version) MFC after: 2 days Sponsored by: Instituto de Pesquisas Eldorado (eldorado.org.br) Differential Revision: https://reviews.freebsd.org/D36234
# c84c5e00	18-Jul-2022	Mitchell Horne <mhorne@FreeBSD.org>	ddb: annotate some commands with DB_CMD_MEMSAFE This is not completely exhaustive, but covers a large majority of commands in the tree. Reviewed by: markj Sponsored by: Juniper Networks, Inc. Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D35583
# d46174cd	28-May-2022	Dmitry Chagin <dchagin@FreeBSD.org>	Finish cpuset_getaffinity() after f35093f8 Split cpuset_getaffinity() into two counterparts, where user_cpuset_getaffinity() is intended to operate on the cpuset_t from user va, while kern_cpuset_getaffinity() expects the cpuset from kernel va. Accordingly, the code that clears the high bits is moved to user_cpuset_getaffinity(). The Linux sched_getaffinity() syscall returns the size of the set copied to user space, and the glibc wrapper then clears the high bits. MFC after: 2 weeks
# 4a3e5133	20-May-2022	Mark Johnston <markj@FreeBSD.org>	cpuset: Fix the KASAN and KMSAN builds Rename the "copyin" and "copyout" fields of struct cpuset_copy_cb to something less generic, since sanitizers define interceptors for copyin() and copyout() using #define. Reported by: syzbot+2db5d644097fc698fb6f@syzkaller.appspotmail.com Fixes: 47a57144af25 ("cpuset: Byte swap cpuset for compat32 on big endian architectures") Sponsored by: The FreeBSD Foundation
# 47a57144	12-May-2022	Justin Hibbits <jhibbits@FreeBSD.org>	cpuset: Byte swap cpuset for compat32 on big endian architectures Summary: BITSET uses long as its basic underlying type, which is dependent on the compile type, meaning on 32-bit builds the basic type is 32 bits, but on 64-bit builds it's 64 bits. On little endian architectures this doesn't matter, because the LSB is always at the low bit, so the words get effectively concatenated moving between 32-bit and 64-bit, but on big-endian architectures it throws a wrench in, as setting bit 0 in 32-bit mode is equivalent to setting bit 32 in 64-bit mode. To demonstrate: 32-bit mode: BIT_SET(foo, 0): 0x00000001 64-bit sees: 0x0000000100000000 cpuset is the only system interface that uses bitsets, so solve this by swapping the integer sub-components at the copyin/copyout points. Reviewed by: kib MFC after: 3 days Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D35225
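The word-order problem described in 47a57144 can be reproduced in portable C. The following is a minimal sketch, not the kernel's code: the helper names `be64_from_words32` and `swap_halves` are hypothetical, and the big-endian memory layout is simulated arithmetically so the example behaves the same on any host.

```c
#include <stdint.h>

/*
 * Simulate how a 64-bit big-endian kernel would read two consecutive
 * 32-bit bitset words written by a 32-bit process: the first word in
 * memory becomes the most significant half.
 * (Hypothetical helper, for illustration only.)
 */
static uint64_t
be64_from_words32(uint32_t first, uint32_t second)
{
	return (((uint64_t)first << 32) | second);
}

/* Swap the two 32-bit halves of a 64-bit word. */
static uint64_t
swap_halves(uint64_t w)
{
	return ((w << 32) | (w >> 32));
}
```

With bit 0 set by the 32-bit process (first word 0x00000001), the simulated 64-bit view is 0x0000000100000000, which is bit 32, and swapping the halves restores bit 0. This matches the example given in the commit message.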
# 586ed321	11-May-2022	Dmitry Chagin <dchagin@FreeBSD.org>	kdump: Decode cpuset_t. Reviewed by: jhb Differential revision: https://reviews.freebsd.org/D34982 MFC after: 2 weeks
# f35093f8	11-May-2022	Dmitry Chagin <dchagin@FreeBSD.org>	Use Linux semantics for the thread affinity syscalls. Linux has more tolerant checks of the user supplied cpuset_t's. Minimum cpuset_t size that the Linux kernel permits in case of getaffinity() is the maximum CPU id, present in the system / NBBY, the maximum size is not limited. For setaffinity(), Linux does not limit the size of the user-provided cpuset_t, internally using only the meaningful part of the set, where the upper bound is the maximum CPU id, present in the system, no larger than the size of the kernel cpuset_t. Unlike FreeBSD, Linux ignores high bits if set in the setaffinity(), so clear it in the sched_setaffinity() and Linuxulator itself. Reviewed by: Pau Amma (man pages) In collaboration with: jhb Differential revision: https://reviews.freebsd.org/D34849 MFC after: 2 weeks
# e2650af1	29-Dec-2021	Stefan Eßer <se@FreeBSD.org>	Make CPU_SET macros compliant with other implementations The introduction of <sched.h> improved compatibility with some 3rd party software, but caused the configure scripts of some ports to assume that they were run in a GLIBC compatible environment. Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being added to ports, but there still were compatibility issues due to invalid assumptions made in autoconfigure scripts. The difference between the FreeBSD versions of macros like CPU_AND, CPU_OR, etc. and the GLIBC versions was in the number of arguments: FreeBSD used a 2-address scheme (one source argument is also used as the destination of the operation), while GLIBC uses a 3-address scheme (2 source operands and a separately passed destination). The GLIBC scheme provides a super-set of the functionality of the FreeBSD macros, since it does not prevent passing the same variable as source and destination arguments. In code that wanted to preserve both source arguments, the FreeBSD macros required a temporary copy of one of the source arguments. This patch set makes it possible to unconditionally provide functions and macros expected by 3rd party software written for GLIBC based systems, but breaks builds of externally maintained sources that use any of the following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR. One contributed driver (contrib/ofed/libmlx5) has been patched to support both the old and the new CPU_OR signatures. If this commit is merged to -STABLE, the version test will have to be extended to cover more ranges. Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT no longer require that option. The FreeBSD version has been bumped to 1400046 to reflect this incompatible change. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D33451
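The difference between the 2-address and 3-address schemes in e2650af1 can be sketched with a toy bitset. Everything here is hypothetical illustration: `toyset_t`, `TOY_OR2` and `TOY_OR3` are stand-ins invented for this example, not the real `cpuset_t` or `CPU_OR` definitions.

```c
#include <stdint.h>

/* Toy one-word bitset standing in for cpuset_t (assumption). */
typedef struct { uint64_t bits; } toyset_t;

/* Old FreeBSD-style 2-address form: d = d | s. */
#define	TOY_OR2(d, s)		((d)->bits |= (s)->bits)

/* GLIBC-style 3-address form: d = s1 | s2. */
#define	TOY_OR3(d, s1, s2)	((d)->bits = (s1)->bits | (s2)->bits)

/* Union that preserves both sources, using the 2-address form. */
static toyset_t
union_2addr(const toyset_t *a, const toyset_t *b)
{
	toyset_t tmp = *a;	/* temporary copy required */

	TOY_OR2(&tmp, b);
	return (tmp);
}

/* Same union with the 3-address form; no copy needed. */
static toyset_t
union_3addr(const toyset_t *a, const toyset_t *b)
{
	toyset_t out;

	TOY_OR3(&out, a, b);
	return (out);
}
```

Passing the same set as destination and first source to `TOY_OR3` reproduces the 2-address behavior, which is why the commit message calls the 3-address scheme a super-set.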
# 29bb6c19	13-Apr-2021	Mark Johnston <markj@FreeBSD.org>	domainset: Define additional global policies Add global definitions for first-touch and interleave policies. The former may be useful for UMA, which implements a similar policy without using domainset iterators. No functional change intended. Reviewed by: mav MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D29104
# 60c4ec80	26-Feb-2021	Kyle Evans <kevans@FreeBSD.org>	jail: allow root to implicitly widen its cpuset to attach The default behavior for attaching processes to jails is that the jail's cpuset augments the attaching processes, so that it cannot be used to escalate a user's ability to take advantage of more CPUs than the administrator wanted them to. This is problematic when root needs to manage jails that have disjoint sets with whatever process is attaching, as this would otherwise result in a deadlock. Therefore, if we did not have an appropriate common subset of cpus/domains for our new policy, we now allow the process to simply take on the jail set if it has the privilege to widen its mask anyways. With the new logic, root can still usefully cpuset a process that attaches to a jail with the desire of maintaining the set it was given pre-attachment while still retaining the ability to manage child jails without jumping through hoops. A test has been added to demonstrate the issue; cpuset of a process down to just the first CPU and attempting to attach to a jail without access to any of the same CPUs previously resulted in EDEADLK and now results in taking on the jail's mask for privileged users. PR: 253724 Reviewed by: jamie (also discussed with) MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D28952
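The attach rule described in 60c4ec80 can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the kernel logic: CPU masks are reduced to a single 64-bit word, the function name `attach_mask` is hypothetical, and the `can_widen` flag stands in for the kernel's privilege check.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the CPU mask a process ends up with when attaching to a jail.
 * Masks are toy 64-bit words; can_widen stands in for the privilege check.
 * Returns 0 and stores the result in *out, or EDEADLK if no CPU is left.
 */
static int
attach_mask(uint64_t proc, uint64_t jail, bool can_widen, uint64_t *out)
{
	uint64_t common = proc & jail;

	if (common != 0) {
		/* The process keeps its mask, restricted to the jail's. */
		*out = common;
		return (0);
	}
	if (can_widen) {
		/* Disjoint sets, but privileged: take on the jail's mask. */
		*out = jail;
		return (0);
	}
	return (EDEADLK);
}
```

In this model, a process pinned to the first CPU that attaches to a jail without that CPU gets EDEADLK when unprivileged and the jail's mask when privileged, which is the scenario the added test is described as covering.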
# 54a837c8	18-Dec-2020	Kyle Evans <kevans@FreeBSD.org>	kern: cpuset: allow jails to modify child jails' roots This partially lifts a restriction imposed by r191639 ("Prevent a superuser inside a jail from modifying the dedicated root cpuset of that jail") that's perhaps beneficial after r192895 ("Add hierarchical jails."). Jails still cannot modify their own cpuset, but they can modify child jails' roots to further restrict them or widen them back to the modifying jails' own mask. As a side effect of this, the system root may once again widen the mask of jails as long as they're still using a subset of the parent jails' mask. This was previously prevented by the fact that cpuset_getroot of a root set will return that root, rather than the root's parent -- cpuset_modify uses cpuset_getroot since it was introduced in r327895, previously it was just validating against set->cs_parent which allowed the system root to widen jail masks. Reviewed by: jamie MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27352
# f1b18a66	08-Dec-2020	Kyle Evans <kevans@FreeBSD.org>	cpuset_set{affinity,domain}: do not allow empty masks cpuset_modify() would not currently catch this, because it only checks that the new mask is a subset of the root set and circumvents the EDEADLK check in cpuset_testupdate(). This change both directly validates the mask coming in since we can trivially detect an empty mask, and it updates cpuset_testupdate to catch stuff like this going forward by always ensuring we don't end up with an empty mask. The check_mask argument has been renamed because the 'check' verbiage does not imply to me that it's actually doing a different operation. We're either augmenting the existing mask, or we are replacing it entirely. Reported by: syzbot+4e3b1009de98d2fabcda@syzkaller.appspotmail.com Discussed with: andrew Reviewed by: andrew, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27511
# b2780e85	08-Dec-2020	Kyle Evans <kevans@FreeBSD.org>	kern: cpuset: resolve race between cpuset_lookup/cpuset_rel The race plays out like so between threads A and B: 1. A ref's cpuset 10 2. B does a lookup of cpuset 10, grabs the cpuset lock and searches cpuset_ids 3. A rel's cpuset 10 and observes the last ref, waits on the cpuset lock while B is still searching and not yet ref'd 4. B ref's cpuset 10 and drops the cpuset lock 5. A proceeds to free the cpuset out from underneath B Resolve the race by only releasing the last reference under the cpuset lock. Thread A now picks up the spinlock and observes that the cpuset has been revived, returning immediately for B to deal with later. Reported by: syzbot+92dff413e201164c796b@syzkaller.appspotmail.com Reviewed by: markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27498
# 9c83dab9	08-Dec-2020	Kyle Evans <kevans@FreeBSD.org>	kern: cpuset: plug a unr leak cpuset_rel_defer() is supposed to be functionally equivalent to cpuset_rel() but with anything that might sleep deferred until cpuset_rel_complete -- this setup is used specifically for cpuset_setproc. Add in the missing unr free to match cpuset_rel. This fixes a leak that was observed when I wrote a small userland application to try and debug another issue, which effectively did: cpuset(&newid); cpuset(&scratch); newid gets leaked when scratch is created; it's off the list, so there's no mechanism for anything else to relinquish it. A more realistic reproducer would likely be a process that inherits some cpuset that it's the only ref for, but it creates a new one to modify. Alternatively, administratively reassigning a process' cpuset that it's the last ref for will have the same effect. Discovered through D27498. MFC after: 1 week
# e07e3fa3	27-Nov-2020	Kyle Evans <kevans@FreeBSD.org>	kern: cpuset: drop the lock to allocate domainsets Restructure the loop a little bit to make it a little more clear how it really operates: we never allocate any domains at the beginning of the first iteration, and it will run until we've satisfied the amount we need or we encounter an error. The lock is now taken outside of the loop to make stuff inside the loop easier to evaluate w.r.t. locking. This fixes it to not try and allocate any domains for the freelist under the spinlock, which would have happened before if we needed any new domains. Reported by: syzbot+6743fa07b9b7528dc561@syzkaller.appspotmail.com Reviewed by: markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D27371
# d431dea5	24-Nov-2020	Kyle Evans <kevans@FreeBSD.org>	kern: cpuset: properly rebase when attaching to a jail The current logic is a fine choice for a system administrator modifying process cpusets or a process creating a new cpuset(2), but not ideal for processes attaching to a jail. Currently, when a process attaches to a jail, it does exactly what any other process does and loses any mask it might have applied in the process of doing so because cpuset_setproc() is entirely based around the assumption that non-anonymous cpusets in the process can be replaced with the new parent set. This approach slightly improves the jail attach integration by modifying cpuset_setproc() callers to indicate if they should rebase their cpuset to the indicated set or not (i.e. cpuset_setproc_update_set). If we're rebasing and the process currently has a cpuset assigned that is not the containing jail's root set, then we will now create a new base set for it hanging off the jail's root with the existing mask applied instead of using the jail's root set as the new base set. Note that the common case will be that the process doesn't have a cpuset within the jail root, but the system root can freely assign a cpuset from a jail to a process outside of the jail with no restriction. We assume that that may have happened or that it could happen due to a race when we drop the proc lock, so we must recheck both within the loop to gather up sufficient freed cpusets and after the loop. To recap, here's how it worked before in all cases:

    0     4 <-- jail              0      4 <-- jail / process
    |                             |
    1                     ->      1
    |
    3 <-- process

Here's how it works now:

    0     4 <-- jail              0      4 <-- jail
    |                             |      |
    1                     ->      1      5 <-- process
    |
    3 <-- process

or

    0     4 <-- jail              0      4 <-- jail / process
    |                             |
    1 <-- process         ->      1

More importantly, in both cases, the attaching process still retains the mask it had prior to attaching or the attach fails with EDEADLK if it's left with no CPUs to run on or the domain policy is incompatible. 
The author of this patch considers this almost a security feature, because a MAC policy could grant PRIV_JAIL_ATTACH to an unprivileged user that's restricted to some subset of available CPUs the ability to attach to a jail, which might lift the user's restrictions if they attach to a jail with a wider mask. In most cases, it's anticipated that admins will use this to be able to, for example, `cpuset -c -l 1 jail -c path=/ command=/long/running/cmd`, and avoid the need for contortions to spawn a command inside a jail with a more limited cpuset than the jail. Reviewed by: jamie MFC after: 1 month (maybe) Differential Revision: https://reviews.freebsd.org/D27298
# 30b7c6f9	24-Nov-2020	Kyle Evans <kevans@FreeBSD.org>	kern: cpuset: rename _cpuset_create() to cpuset_init() cpuset_init() is a better descriptor for what the function actually does. The name was previously taken by a sysinit that set up cpuset_zero's mask from all_cpus; it was removed in r331698 before stable/12 branched. A comment referencing the removed sysinit has now also been removed, since the setup previously done was moved into cpuset_thread0(). Suggested by: markj MFC after: 1 week
# 29d04ea8	24-Nov-2020	Kyle Evans <kevans@FreeBSD.org>	kern: cpuset: allow cpuset_create() to take an allocated setp Currently, it must always allocate a new set to be used for passing to _cpuset_create, but it doesn't have to. This is purely kern_cpuset.c internal and it's sparsely used, so just change it to use setp if it's not-NULL and modify the two consumers to pass in the address of a NULL cpuset. This paves the way for consumers that want the unr allocation without the possibility of sleeping as long as they've done their due diligence to ensure that the mask will properly apply atop the supplied parent (i.e. avoiding the free_unr() in the last failure path). Reviewed by: jamie, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D27297
# dac521eb	22-Nov-2020	Kyle Evans <kevans@FreeBSD.org>	cpuset_setproc: use the appropriate parent for new anonymous sets As far as I can tell, this has been the case since initially committed in 2008. cpuset_setproc is the executor of cpuset reassignment; note this excerpt from the description: * 1) Set is non-null. This reparents all anonymous sets to the provided * set and replaces all non-anonymous td_cpusets with the provided set. However, reviewing cpuset_setproc_setthread() for some jail related work unearthed the error: if tdset was not anonymous, we were replacing it with `set`. If it was anonymous, then we'd rebase it onto `set` (i.e. copy the thread's mask over and AND it with `set`) but give the new anonymous set the original tdset as the parent (i.e. the base of the set we're supposed to be leaving behind). The primary visible consequences were that: 1.) cpuset_getid() following such assignment returns the wrong result, the setid that we left behind rather than the one we joined. 2.) When a process attached to the jail, the base set of any anonymous threads was a set outside of the jail. This was initially bundled in D27298, but it's a minor fix that's fairly easy to verify the correctness of. A test is included in D27307 ("badparent"), which demonstrates the issue with, effectively: osetid = cpuset_getid() newsetid = cpuset() cpuset_setaffinity(thread) cpuset_setid(osetid) cpuset_getid(thread) -> observe that it matches newsetid instead of osetid. MFC after: 1 week
# 1a7bb896	16-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	cpuset: refcount-clean
# 6fed89b1	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	kern: clean up empty lines in .c and .h files
# 69b565d7	06-Jul-2020	Mark Johnston <markj@FreeBSD.org>	Allow accesses of the caller's CPU and domain sets in capability mode. cpuset_(get|set)(affinity|domain)(2) permit a get or set of the calling thread or process' CPU and domain set in capability mode, but only when the thread or process ID is specified as -1. Extend this to cover the case where the ID actually matches the caller's TID or PID, since some code, such as our pthread_attr_get_np() implementation, always provides an explicit ID. It was not and still is not permitted to access CPU and domain sets for other threads in the same process when the process is in capability mode. This might change in the future. Submitted by: Greg V <greg@unrelenting.technology> (original version) MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D25552
# 9eb997cb	06-Jul-2020	Mark Johnston <markj@FreeBSD.org>	Lift cpuset Capsicum checks into a subroutine. Otherwise the same checks are duplicated across four different system call implementations, cpuset_(get|set)(affinity|domain)(). No functional change intended. MFC after: 1 week Sponsored by: The FreeBSD Foundation
# 9825eadf	13-Dec-2019	Ryan Libby <rlibby@FreeBSD.org>	bitset: rename confusing macro NAND to ANDNOT s/BIT_NAND/BIT_ANDNOT/, and for CPU and DOMAINSET too. The actual implementation is "and not" (or "but not"), i.e. A but not B. Fortunately this does appear to be what all existing callers want. Don't supply a NAND (not (A and B)) operation at this time. Discussed with: jeff Reviewed by: cem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22791
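The distinction behind the rename in 9825eadf is easy to show on plain integers. This is a minimal sketch with hypothetical helper names (`toy_andnot`, `toy_nand`); it does not reproduce the actual `BIT_ANDNOT` macro, only the two operations the commit message contrasts.

```c
#include <stdint.h>

/* "A but not B": the operation the commit message says is implemented. */
static uint64_t
toy_andnot(uint64_t a, uint64_t b)
{
	return (a & ~b);
}

/* True NAND, not (A and B): the operation the commit chose not to supply. */
static uint64_t
toy_nand(uint64_t a, uint64_t b)
{
	return (~(a & b));
}
```

For a = 0xC and b = 0xA, the "and not" result keeps only the bits of a that are absent from b, while NAND sets every bit except those common to both, so the two results differ even within the low four bits.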
# 45cdd437	12-Sep-2019	Mark Johnston <markj@FreeBSD.org>	Remove a redundant NULL pointer check in cpuset_modify_domain(). cpuset_getroot() is guaranteed to return a non-NULL pointer. Reported by: Mark Millard <marklmi@yahoo.com> MFC after: 1 week Sponsored by: The FreeBSD Foundation
# d57cd5cc	05-Sep-2019	Stephen J. Kiernan <stevek@FreeBSD.org>	Bump up the low range of cpuset numbers to account for the kernel cpuset. Reviewed by: jeff Obtained from: Juniper Networks, Inc.
# 63cdd18e	01-Sep-2019	Mark Johnston <markj@FreeBSD.org>	Restrict the input domain set in cpuset_setdomain(2) to all_domains. To permit larger values of MAXMEMDOM, which is currently 8 on amd64, cpuset_setdomain(2) accepts a mask of size 256. In the kernel, domain set masks are 64 bits wide, but can only represent a set of MAXMEMDOM domains due to the use of the ds_order table. Domain sets passed to cpuset_setdomain(2) are restricted to a subset of their parent set, which is typically the root set, but before this happens we modify the input set to exclude empty domains. domainset_empty_vm() and other code which manipulates domain sets expect the mask to be a subset of all_domains, so enforce that when performing validation of cpuset_setdomain(2) parameters. Reported and tested by: pho Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21477
# 44e4def7	27-Aug-2019	Mark Johnston <markj@FreeBSD.org>	Remove an extraneous + 1 in _domainset_create(). DOMAINSET_FLS, like our fls(), is 1-indexed. Reported by: alc MFC after: 1 week Sponsored by: The FreeBSD Foundation
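The off-by-one fixed in 44e4def7 follows from the 1-indexed convention of fls(). The sketch below is a portable stand-in, assuming only that convention as stated in the commit message; `toy_fls` is a hypothetical name and is not the kernel's `DOMAINSET_FLS`.

```c
#include <stdint.h>

/*
 * Portable "find last set": 1-based index of the most significant
 * set bit, or 0 when no bit is set. Mirrors the fls() convention.
 */
static int
toy_fls(uint64_t mask)
{
	int pos = 0;

	while (mask != 0) {
		pos++;
		mask >>= 1;
	}
	return (pos);
}
```

For a mask covering domains 0 through 3 (0xF), the result is already 4, the number of slots needed to index the highest domain, so adding 1 would overshoot by one.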
# 8e697504	27-Aug-2019	Mark Johnston <markj@FreeBSD.org>	Fix several logic issues in domainset_empty_vm(). - Don't add 1 to the result of DOMAINSET_FLS. - Do not modify domainsets containing only empty domains. - Always flatten a _PREFER policy to _ROUNDROBIN if the preferred domain is empty. Previously we were doing this only when ds_cnt > 1. These bugs could cause hangs during boot if a VM domain is empty. Tested by: hselasky Reviewed by: hselasky, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21420
# 9978bd99	30-Oct-2018	Mark Johnston <markj@FreeBSD.org>	Add malloc_domainset(9) and _domainset variants to other allocator KPIs. Remove malloc_domain(9) and most other _domain KPIs added in r327900. The new functions allow the caller to specify a general NUMA domain selection policy, rather than specifically requesting an allocation from a specific domain. The latter policy tends to interact poorly with M_WAITOK, resulting in situations where a caller is blocked indefinitely because the specified domain is depleted. Most existing consumers of the _domain KPIs are converted to instead use a DOMAINSET_PREF() policy, in which we fall back to other domains to satisfy the allocation request. This change also defines a set of DOMAINSET_FIXED() policies, which only permit allocations from the specified domain. Discussed with: gallatin, jeff Reported and tested by: pho (previous version) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17418
# 920239ef	30-Oct-2018	Mark Johnston <markj@FreeBSD.org>	Fix some problems that manifest when NUMA domain 0 is empty. - In uma_prealloc(), we need to check for an empty domain before the first allocation attempt, not after. Fix this by switching uma_prealloc() to use a vm_domainset iterator, which addresses the secondary issue of using a signed domain identifier in round-robin iteration. - Don't automatically create a page daemon for domain 0. - In domainset_empty_vm(), recompute ds_cnt and ds_order after excluding empty domains; otherwise we may frequently specify an empty domain when calling in to the page allocator, wasting CPU time. Convert DOMAINSET_PREF() policies for empty domains to round-robin. - When freeing bootstrap pages, don't count them towards the per-domain total page counts for now: some vm_phys segments are created before the SRAT is parsed and are thus always identified as being in domain 0 even when they are not. Then, when bootstrap pages are freed, they are added to a domain that we had previously thought was empty. Until this is corrected, we simply exclude them from the per-domain page count. Reported and tested by: Rajesh Kumar <rajfbsd@gmail.com> Reviewed by: gallatin MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17704
# b61f3142	22-Oct-2018	Mark Johnston <markj@FreeBSD.org>	Make it possible to disable NUMA support with a tunable. This provides a chicken switch for anyone negatively impacted by enabling NUMA in the amd64 GENERIC kernel configuration. With NUMA disabled at boot-time, information about the NUMA topology is not exposed to the rest of the kernel, and all of physical memory is viewed as coming from a single domain. This method still has some performance overhead relative to disabling NUMA support at compile time. PR: 231460 Reviewed by: alc, gallatin, kib MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17439
# 662e7fa8	20-Oct-2018	Mark Johnston <markj@FreeBSD.org>	Create some global domainsets and refactor NUMA registration. Pre-defined policies are useful when integrating the domainset(9) policy machinery into various kernel memory allocators. The refactoring will make it easier to add NUMA support for other architectures. No functional change intended. Reviewed by: alc, gallatin, jeff, kib Tested by: pho (part of a larger patch) MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D17416
# 30c5525b	01-Oct-2018	Andrew Gallatin <gallatin@FreeBSD.org>	Allow empty NUMA memory domains to support Threadripper2 The AMD Threadripper 2990WX is basically a slightly crippled Epyc. Rather than having 4 memory controllers, one per NUMA domain, it has only 2 memory controllers enabled. This means that only 2 of the 4 NUMA domains can be populated with physical memory, and the others are empty. Add support to FreeBSD for empty NUMA domains by: - creating empty memory domains when parsing the SRAT table, rather than failing to parse the table - not running the pageout daemon threads in empty domains - adding defensive code to UMA to avoid allocating from empty domains - adding defensive code to cpuset to avoid binding to an empty domain Thanks to Jeff for suggesting this strategy. Reviewed by: alc, markj Approved by: re (gjb@) Differential Revision: https://reviews.freebsd.org/D1683
# 70d66bcf	26-May-2018	Eric van Gyzen <vangyzen@FreeBSD.org>	kern_cpuset: fix small leak on error path The "mask" was leaked on some error paths. Reported by: Coverity CID: 1384683 Sponsored by: Dell EMC
# a6c7423a	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	cpuset: revert and annotate instead
# 39eef2f4	18-May-2018	Matt Macy <mmacy@FreeBSD.org>	cpuset_thread0: avoid unused assignment on non debug build
# e5818a53	28-Mar-2018	Jeff Roberson <jeff@FreeBSD.org>	Implement several enhancements to NUMA policies. Add a new "interleave" allocation policy which stripes pages across domains with a stride or width keeping contiguity within a multi-page region. Move the kernel to the dedicated numbered cpuset #2 making it possible to assign kernel threads and memory policy separately from user. This also eliminates the need for the complicated interrupt binding code. Add a sysctl API for viewing and manipulating domainsets. Refactor some of the cpuset_t manipulation code using the generic bitset type so that it can be used for both. This probably belongs in a dedicated subr file. Attempt to improve the include situation. Reviewed by: kib Discussed with: jhb (cpuset parts) Tested by: pho (before review feedback) Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14839
# 27a3c9d7	28-Mar-2018	Jeff Roberson <jeff@FreeBSD.org>	Restore r331606 with a bugfix to setup cpuset_domain[] earlier on all platforms. Original commit message as follows: Only use CPUs in the domain the device is attached to for default assignment. Device drivers are able to override the default assignment if they bind directly. There are severe performance penalties for handling interrupts on remote CPUs and this should only be done in very controlled circumstances. Reviewed by: jhb, kib Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D14838
# dd51fec3	08-Mar-2018	Brooks Davis <brooks@FreeBSD.org>	Copyout a whole int to cpuset_domain's policy pointer. The previous code only copied 16-bits and corrupted the target int. Reviewed by: kib, markj Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D14611
# 3f289c3f	12-Jan-2018	Jeff Roberson <jeff@FreeBSD.org>	Implement 'domainset', a cpuset based NUMA policy mechanism. This allows userspace to control NUMA policy administratively and programmatically. Implement domainset based iterators in the page layer. Remove the now legacy numa_* syscalls. Clean up some header pollution created by having seq.h in proc.h. Reviewed by: markj, kib Discussed with: alc Tested by: pho Sponsored by: Netflix, Dell/EMC Isilon Differential Revision: https://reviews.freebsd.org/D13403
# 8a36da99	27-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/kern: adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
# a1d0659c	22-Aug-2017	Jung-uk Kim <jkim@FreeBSD.org>	Fix size to copyout(9) for cpuset_getid(2). MFC after: 3 days
# f299c47b	23-May-2017	Allan Jude <allanjude@FreeBSD.org>	Allow cpuset_{get,set}affinity in capabilities mode bhyve was recently sandboxed with capsicum, and needs to be able to control the CPU sets of its vcpu threads Reviewed by: emaste, oshogbo, rwatson MFC after: 2 weeks Sponsored by: ScaleEngine Inc. Differential Revision: https://reviews.freebsd.org/D10170
# 29dfb631	03-May-2017	Conrad Meyer <cem@FreeBSD.org>	Extend cpuset_get/setaffinity() APIs Add IRQ placement-only and ithread-only API variants. intr_event_bind has been extended with sibling methods, as it has many more callsites in existing code. Reviewed by: kib@, adrian@ (earlier version) Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D10586
# 6286dc78	17-Apr-2017	Gleb Smirnoff <glebius@FreeBSD.org>	Remove unneeded include of vm_phys.h.
# 96ee4310	05-Feb-2017	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add kern_cpuset_getaffinity() and kern_cpuset_setaffinity(), and use them in compats instead of their sys_*() counterparts. Reviewed by: kib, jhb, dchagin MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9383
# ea2ebdc1	31-Jan-2017	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add kern_cpuset_getid() and kern_cpuset_setid(), and use them in compat32 instead of their sys_*() counterparts. Reviewed by: jhb@, kib@ MFC after: 2 weeks Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D9382
# 62d70a81	09-Apr-2016	John Baldwin <jhb@FreeBSD.org>	Add more fine-grained kernel options for NUMA support. VM_NUMA_ALLOC is used to enable use of domain-aware memory allocation in the virtual memory system. DEVICE_NUMA is used to enable affinity reporting for devices such as bus_get_domain(). MAXMEMDOM must still be set to a value greater than 1 for any NUMA support to be effective. Note that 'cpuset -gd' always works if MAXMEMDOM is enabled and the system supports NUMA. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D5782
# 5bbb2169	25-Jun-2015	Adrian Chadd <adrian@FreeBSD.org>	Un-static cpuset_which() - it's useful in other contexts, such as some CPU set operations in my upcoming NUMA work. Tested/compiled: * i386 (run) * amd64 (run) * mips (run) * mips64 (run) * armv6 (built) Sponsored by: Norse Corp, Inc.
# 60aa2c85	14-May-2015	Jonathan Anderson <jonathan@FreeBSD.org>	Allow sizeof(cpuset_t) to be queried in capability mode. This allows functions that retrieve and inspect pthread_attr_t objects to work correctly: querying the cpuset_t size is part of querying CPU affinity information, which is part of creating a complete pthread_attr_t. Approved by: rwatson (mentor) Reviewed by: pjd Sponsored by: NSERC
# bbf686ed	08-Jan-2015	John Baldwin <jhb@FreeBSD.org>	Reject attempts to read the cpuset mask of a negative domain ID.
# c0ae6688	08-Jan-2015	John Baldwin <jhb@FreeBSD.org>	Create a cpuset mask for each NUMA domain that is available in the kernel via the global cpuset_domain[] array. To export these to userland, add a CPU_WHICH_DOMAIN level that can be used to fetch the mask for a specific domain. Add a -d flag to cpuset(1) that can be used to fetch the mask for a given domain. Differential Revision: https://reviews.freebsd.org/D1232 Submitted by: jeff (kernel bits) Reviewed by: adrian, jeff
# f0188618	21-Oct-2014	Hans Petter Selasky <hselasky@FreeBSD.org>	Fix multiple incorrect SYSCTL arguments in the kernel: - Wrong integer type was specified. - Wrong or missing "access" specifier. The "access" specifier sometimes included the SYSCTL type, which it should not, except for procedural SYSCTL nodes. - Logical OR where binary OR was expected. - Properly assert the "access" argument passed to all SYSCTL macros, using the CTASSERT macro. This applies to both static- and dynamically created SYSCTLs. - Properly assert the data type for both static and dynamic SYSCTLs. In the case of static SYSCTLs we only assert that the data pointed to by the SYSCTL data pointer has the correct size, hence there is no easy way to assert types in the C language outside a C-function. - Rewrote some code which doesn't pass a constant "access" specifier when creating dynamic SYSCTL nodes, which is now a requirement. - Updated "EXAMPLES" section in SYSCTL manual page. MFC after: 3 days Sponsored by: Mellanox Technologies
# 7f7528fc	15-Sep-2014	Adrian Chadd <adrian@FreeBSD.org>	Modify cpuset_setithread() to take a CPU ID as an integer, not a char. We're going to end up having > 254 CPUs at some point.
# c1d9ecf2	13-Sep-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Fix error handling in cpuset_setithread() introduced in r267716. Noted by: kib MFC after: 1 week
# 81198539	22-Jun-2014	Alexander V. Chernikov <melifaro@FreeBSD.org>	Permit changing cpu mask for cpu set 1 in presence of drivers binding their threads to particular CPU. Changing ithread cpu mask is now performed by special cpuset_setithread(). It creates additional cpuset root group on first bind invocation. No objection: jhb Tested by: hiren MFC after: 2 weeks Sponsored by: Yandex LLC
# cd32bd7a	25-Jun-2013	John Baldwin <jhb@FreeBSD.org>	Several improvements to rmlock(9). Many of these are based on patches provided by Isilon. - Add an rm_assert() supporting various lock assertions similar to other locking primitives. Because rmlocks track readers the assertions are always fully accurate unlike rw_assert() and sx_assert(). - Flesh out the lock class methods for rmlocks to support sleeping via condvars and rm_sleep() (but only while holding write locks), rmlock details in 'show lock' in DDB, and the lc_owner method used by dtrace. - Add an internal destroyed cookie so that API functions can assert that an rmlock is not destroyed. - Make use of rm_assert() to add various assertions to the API (e.g. to assert locks are held when an unlock routine is called). - Give RM_SLEEPABLE locks their own lock class and always use the rmlock's own lock_object with WITNESS. - Use THREAD_NO_SLEEPING() / THREAD_SLEEPING_OK() to disallow sleeping while holding a read lock on an rmlock. Submitted by: andre Obtained from: EMC/Isilon
# 17a27377	13-Jun-2013	Jeff Roberson <jeff@FreeBSD.org>	- Add a BIT_FFS() macro and use it to replace cpusetffs_obj() Discussed with: attilio Sponsored by: EMC / Isilon Storage Division
# c9813d0a	06-Jun-2013	John Baldwin <jhb@FreeBSD.org>	Do not compare the existing mask of a cpuset with a new mask when changing the mask of a cpuset. Also, change the cpuset's mask before updating the masks of all children. Previously changing a cpuset's mask first required setting the mask to a super-set of both the old and new masks and then changing it a second time to the new mask.
# d4a2ab8c	30-Aug-2012	Attilio Rao <attilio@FreeBSD.org>	Post r222812 KTR_CPUMASK started being initialized only as a tunable handler and no longer statically. Unfortunately, it seems that this is not ideal for new platform bringup and low-level boot development (which needs ktr_cpumask to be effective before tunables can be set up). Because of this, add a way to statically initialize cpusets by passing a list of initializers separated by commas. Also, provide a way to enforce an all-set mask for the above-mentioned initializers. This imposes some differences on how KTR_CPUMASK is set up now as a kernel option, and in particular this makes the word specifications backward wrt. what is currently in -CURRENT. In order to avoid mismatches between the KTR_CPUMASK definition and the other ways to set up the mask (tunable, sysctl) and to print it, change the order in which cpusetobj_print() and cpusetobj_scan() acquire the words belonging to the set. Please take a look at sys/conf/NOTES in order to understand how the new format is supposed to work. Also, the ktr manpages will be updated shortly by gjb, who volunteered for this. This patch won't be merged because it changes a POLA (at least from the theoretical standpoint) and because it proves to be effective only in development environments. Requested by: rpaulo Reviewed by: jeff, rpaulo
# 2b69bb1f	05-Dec-2011	Kevin Lo <kevlo@FreeBSD.org>	Add a missing curly bracket
# 8451d0dd	16-Sep-2011	Kip Macy <kmacy@FreeBSD.org>	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# e3709597	31-May-2011	Attilio Rao <attilio@FreeBSD.org>	Fix KTR_CPUMASK in order to accept a string representing a cpuset_t. This introduces all the underlying support for making this possible (via the function cpusetobj_strscan()) and keeps ktr_cpumask exported. sparc64 implements its own assembly primitives for tracing events and needs to properly check it. Anyway the sparc64 logic is not implemented yet due to lack of knowledge (by me) and time (by marius), but it is just a matter of using ktr_cpumask when possible. Tested and fixed by: pluknet Reviewed by: marius
# d0984adc	31-May-2011	Attilio Rao <attilio@FreeBSD.org>	Revert a change that crept in during MFC.
# 217e1c0e	23-May-2011	Attilio Rao <attilio@FreeBSD.org>	Revert a patch that involuntarily sneaked in while I was MFCing.
# e3071102	22-May-2011	Attilio Rao <attilio@FreeBSD.org>	Merge r221912 from largeSMP project branch: Fix a long-standing bug in cpuset_thread0() where only the first part of cs_mask is set full. Submitted by: anonymous MFC after: 1 week
# 34a1e065	22-May-2011	Attilio Rao <attilio@FreeBSD.org>	Make cpusetobj_strprint() prepare the string so that the least significant cpuset_t word is printed at the rightmost part of the string (farthest from its beginning). This follows the natural layout of the bit representation in the words.
# f27aed53	14-May-2011	Attilio Rao <attilio@FreeBSD.org>	Fix a longstanding bug where only the first part of the cpumask was correctly set full. Submitted by: anonymous
# faa0e911	14-May-2011	Attilio Rao <attilio@FreeBSD.org>	Simplify the code here. Submitted by: jhb
# 71a19bdc	05-May-2011	Attilio Rao <attilio@FreeBSD.org>	Commit the support for removing cpumask_t and replacing it directly with cpuset_t objects. That is going to offer the underlying support for a simple bump of MAXCPU and then support for a number of cpus > 32 (as it is today). Right now, cpumask_t is an int, 32 bits on all our supported architectures. cpuset_t on the other hand is implemented as an array of longs, and is easily extendible by definition. The architectures touched by this commit are the following: - amd64 - i386 - pc98 - arm - ia64 - XEN while the others are still missing. Userland is believed to be fully converted with the changes contained here. Some technical notes: - This commit may be considered an ABI nop for all the architectures other than amd64 and ia64 (and sparc64 in the future) - per-cpu members, which are now converted to cpuset_t, need to be accessed avoiding migration, because the size of cpuset_t should be considered unknown - the size of cpuset_t objects differs between kernel and userland (this is primarily done in order to leave some more space in userland to cope with KBI extensions). If you need to access kernel cpuset_t from the userland please refer to the examples in this patch on how to do that correctly (kgdb may be a good source, for example). - Support for other architectures is going to be added soon - Only MAXCPU for amd64 is bumped now The patch has been tested by sbruno and Nicholas Esborn on opteron 4 x 12 pack CPUs. More testing on big SMP is expected to come soon. pluknet tested the patch with his 8-ways on both amd64 and i386. Tested by: pluknet, sbruno, gianni, Nicholas Esborn Reviewed by: jeff, jhb, sbruno
# e84c2db1	08-Mar-2011	John Baldwin <jhb@FreeBSD.org>	When constructing a new cpuset, apply the parent cpuset's mask to the new set's mask rather than the root mask. This was causing the root mask to be modified incorrectly. Reviewed by: jeff MFC after: 1 week
# 444528c0	31-Oct-2010	David Xu <davidxu@FreeBSD.org>	Use integer for size of cpuset, as it won't be bigger than INT_MAX. This is requested by bge. Also move the sysctl into file kern_cpuset.c, because it should always be there; it is independent of thread scheduler.
# 4a547870	27-Oct-2010	David Xu <davidxu@FreeBSD.org>	- Revert r214409. - Use long word to figure out sizeof kernel cpuset, hope it works.
# 1676b425	26-Oct-2010	David Xu <davidxu@FreeBSD.org>	If input parameter cpusetsize is zero, give userland size of cpuset mask kernel is using.
# 42fe684c	25-Oct-2010	David Xu <davidxu@FreeBSD.org>	Use function tdfind() to find a thread.
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 86855bf5	26-Oct-2009	John Baldwin <jhb@FreeBSD.org>	Another nit that both I and ispell missed. Submitted by: Ben Kaduk minimarmot of gmail
# 93902625	26-Oct-2009	John Baldwin <jhb@FreeBSD.org>	Fix some spelling nits.
# 399645e1	23-Jun-2009	Jamie Gritton <jamie@FreeBSD.org>	Remove unnecessary/redundant includes. Approved by: bz (mentor)
# 0304c731	27-May-2009	Jamie Gritton <jamie@FreeBSD.org>	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
# 6aaa0b3c	28-Apr-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Prevent a superuser inside a jail from modifying the dedicated root cpuset of that jail. Processes inside the jail will still be able to change child sets. A superuser outside of a jail will still be able to change the jail cpuset and thus limit the number of cpus available to the jail. Problem reported by: 000.fbsd@quip.cz (Miroslav Lachman) PR: kern/134050 Reviewed by: jeff MFC after: 3 weeks X-MFC: backout r191596
# 47479a8c	22-Apr-2009	Bjoern A. Zeeb <bz@FreeBSD.org>	Correct a comment: the function name given had never existed in any (relevant) version of this file or any of my patches. MFC after: 1 month
# 413628a7	29-Nov-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addition to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. 
- My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# dea0ed66	07-Jul-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Add a `show cpusets' DDB command to print numbered root and assigned CPU affinity sets. Reviewed by: brooks
# 7a8f695a	07-Jul-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Move cpuset_refroot and cpuset_refbase functions up, grouping the cpuset_ref* functions together. Will make it easier to read and add code without forward declarations. No functional changes.
# ba931c08	29-Jun-2008	Bjoern A. Zeeb <bz@FreeBSD.org>	Add a new priv 'PRIV_SCHED_CPUSET' to check if manipulating cpusets is allowed and replace the suser() call. Do not allow it in jails. Reviewed by: rwatson
# 887aedc6	26-May-2008	Konstantin Belousov <kib@FreeBSD.org>	Take into account possible overflow when multiplying. The casualty is the malloc call later, panicking the kernel due to the oversized allocation. Reported by: pho Reviewed by: jeff
# 9b33b154	10-Apr-2008	Jeff Roberson <jeff@FreeBSD.org>	- Add the interrupt vector number to intr_event_create so MI code can lookup hard interrupt events by number. Ignore the irq# for soft intrs. - Add support to cpuset for binding hardware interrupts. This has the side effect of binding any ithread associated with the hard interrupt. As per restrictions imposed by MD code we can only bind interrupts to a single cpu presently. Interrupts can be 'unbound' by binding them to all cpus. Reviewed by: jhb Sponsored by: Nokia
# 3bc8c68d	03-Apr-2008	Jeff Roberson <jeff@FreeBSD.org>	- Add a Nokia copyright to cpuset to reflect their generous contribution to this work.
# a03ee000	30-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Consistently return EDEADLK when presented with a new set that is incompatible with existing bindings. - Try to copyout the setid in cpuset() before migrating the proc to the setid in case the user has supplied a bad buffer. - Rename cpuset_root() and cpuset_base() to cpuset_ref{root,base} to be more descriptive and free cpuset_root to be used as a different type of symbol. - Make cpuset_root the cpuset_t set of all cpus in the system. This should contain the same bitmask as all_cpus presently. - Add a CPU_CMP() macro to compare two sets.
# 7f64829a	25-Mar-2008	Ruslan Ermilov <ru@FreeBSD.org>	Fixed type of the fourth argument of cpuset_{get,set}affinity(2) to be size_t. Prodded by: davidxu
# 374ae2a3	19-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Relax requirements for p_numthreads, p_threads, p_swtick, and p_nice from requiring the per-process spinlock to only requiring the process lock. - Reflect these changes in the proc.h documentation and consumers throughout the kernel. This is a substantial reduction in locking cost for these fields and was made possible by recent changes to threading support.
# c6440f72	06-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Add a missing unlock to cpuset_setaffinity(CPU_LEVEL_CPUSET, CPU_WHICH_PID) Found by: gallatin
# 8bd75bdd	05-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Don't overwrite the recently allocated 'nset' in cpuset_setthread() by passing it to cpuset_which(). Pass in 'set' instead. This argument is not used but for convenience cpuset_which() nulls all incoming parameters. Submitted by: davidxu
# 73c40187	04-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Verify that when a user supplies a mask that is bigger than the kernel mask none of the upper bits are set. - Be more careful about enforcing the boundaries of masks and child sets. - Introduce a few more CPU_* macros for implementing these tests. - Change the cpusetsize argument to be bytes rather than bits to match other apis. Sponsored by: Nokia
# d7f687fc	02-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	Add cpuset, an api for thread to cpu binding and cpu resource grouping and assignment. - Add a reference to a struct cpuset in each thread that is inherited from the thread that created it. - Release the reference when the thread is destroyed. - Add prototypes for syscalls and macros for manipulating cpusets in sys/cpuset.h - Add syscalls to create, get, and set new numbered cpusets: cpuset(), cpuset_{get,set}id() - Add syscalls for getting and setting affinity masks for cpusets or individual threads: cpuset_{get,set}affinity() - Add types for the 'level' and 'which' parameters for the cpuset. This will permit expansion of the api to cover cpu masks for other objects identifiable with an id_t integer. For example, IRQs and Jails may be coming soon. - The root set 0 contains all valid cpus. All threads initially belong to cpuset 1. This permits migrating all threads off of certain cpus to reserve them for special applications. Sponsored by: Nokia Discussed with: arch, rwatson, brooks, davidxu, deischen Reviewed by: antoine