History log of /freebsd-current/sys/sys/priv.h
Revision Date Author Comments
# 7974ca1c 17-Aug-2023 Olivier Certner <olce.freebsd@certner.fr>

cr_canseejailproc(): New privilege, no direct check for UID 0

Use priv_check_cred() with a new privilege (PRIV_SEEJAILPROC) instead of
explicitly testing for UID 0 (the former has been the rule for almost 20
years).

As a consequence, cr_canseejailproc() now abides by the
'security.bsd.suser_enabled' sysctl and MAC policies.

Update the MAC policies Biba and LOMAC, and prison_priv_check() so that
they don't deny this privilege. This preserves the existing behavior
(the 'root' user is not restricted, even when jailed, unless
'security.bsd.suser_enabled' is not 0) and is consistent with what is
done for the related policies/privileges (PRIV_SEEOTHERGIDS,
PRIV_SEEOTHERUIDS).

Reviewed by: emaste (earlier version), mhorne
MFC after: 2 weeks
Sponsored by: Kumacom SAS
Differential Revision: https://reviews.freebsd.org/D40626


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 8512d82e 02-Apr-2023 Steve Kiernan <stevek@juniper.net>

veriexec: Additional functionality for MAC/veriexec

Ensure veriexec opens the file before doing any read operations.

When the MAC_VERIEXEC_CHECK_PATH_SYSCALL syscall is requested, veriexec
needs to open the file before calling mac_veriexec_check_vp. This is to
ensure any set up is done by the file system. Most file systems do not
explicitly need an open, but some (e.g. virtfs) require initialization
of access tokens (file identifiers, etc.) before doing any read or write
operations.

The evaluate_fingerprint() function needs to ensure it has an open file
for reading in order to evaluate the fingerprint. The ideal solution is
to have a hook after the VOP_OPEN call in vn_open. For now, we open the
file for reading, envaluate the fingerprint, and close the file. While
this leaves a potential hole that could possibly be taken advantage of
by a dedicated aversary, this code path is not typically visited often
in our use cases, as we primarily encounter verified mounts and not
individual files. This should be considered a temporary workaround until
discussions about the post-open hook have concluded and the hook becomes
available.

Add MAC_VERIEXEC_GET_PARAMS_PATH_SYSCALL and
MAC_VERIEXEC_GET_PARAMS_PID_SYSCALL to mac_veriexec_syscall so we can
fetch and check label contents in an unconstrained manner.

Add a check for PRIV_VERIEXEC_CONTROL to do ioctl on /dev/veriexec

Make it clear that trusted process cannot be debugged. Attempts to debug
a trusted process already fail, but the failure path is very obscure.
Add an explicit check for VERIEXEC_TRUSTED in
mac_veriexec_proc_check_debug.

We need mac_veriexec_priv_check to not block PRIV_KMEM_WRITE if
mac_priv_gant() says it is ok.

Reviewed by: sjg
Obtained from: Juniper Networks, Inc.


# 6ae8d576 29-Jul-2019 Simon J. Gerraty <sjg@juniper.net>

mac_veriexec: add mac_priv_grant check for NODEV

Allow other MAC modules to override some veriexec checks.

We need two new privileges:
PRIV_VERIEXEC_DIRECT process wants to override 'indirect' flag
on interpreter
PRIV_VERIEXEC_NOVERIFY typically associated with PRIV_VERIEXEC_DIRECT
allow override of O_VERIFY

We also need to check for PRIV_VERIEXEC_NOVERIFY override
for FINGERPRINT_NODEV and FINGERPRINT_NOENTRY.
This will only happen if parent had PRIV_VERIEXEC_DIRECT override.

This allows for MAC modules to selectively allow some applications to
run without verification.

Needless to say, this is extremely dangerous and should only be used
sparingly and carefully.

Obtained from: Juniper Networks, Inc.

Reviewers: sjg
Subscribers: imp, dab

Differential Revision: https://reviews.freebsd.org/D39537


# 4819e5ae 15-Apr-2023 Stephen J. Kiernan <stevek@FreeBSD.org>

Add new privilege PRIV_KDB_SET_BACKEND

Summary:
Check for PRIV_KDB_SET_BACKEND before allowing a thread to change
the KDB backend.

Obtained from: Juniper Networks, Inc.
Reviewers: sjg, emaste
Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D39538


# 744bfb21 28-Oct-2022 John Baldwin <jhb@FreeBSD.org>

Import the WireGuard driver from zx2c4.com.

This commit brings back the driver from FreeBSD commit
f187d6dfbf633665ba6740fe22742aec60ce02a2 plus subsequent fixes from
upstream.

Relative to upstream this commit includes a few other small fixes such
as additional INET and INET6 #ifdef's, #include cleanups, and updates
for recent API changes in main.

Reviewed by: pauamma, gbe, kevans, emaste
Obtained from: git@git.zx2c4.com:wireguard-freebsd @ 3cc22b2
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36909


# 43f8c763 15-Oct-2022 Zhenlei Huang <zlei.huang@gmail.com>

if_me: Use dedicated network privilege

Separate if_me privileges from if_gif.

Reviewed by: kp
Differential Revision: https://reviews.freebsd.org/D36691


# ab91feab 22-Feb-2022 Kristof Provost <kp@FreeBSD.org>

ovpn: Introduce OpenVPN DCO support

OpenVPN Data Channel Offload (DCO) moves OpenVPN data plane processing
(i.e. tunneling and cryptography) into the kernel, rather than using tap
devices.
This avoids significant copying and context switching overhead between
kernel and user space and improves OpenVPN throughput.

In my test setup throughput improved from around 660Mbit/s to around
2Gbit/s.

Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34340


# a20a2450 09-Dec-2021 Florian Walpen <dev@submerge.ch>

Add PRIV_SCHED_IDPRIO

The privilege allows the holder to assign idle priority type to thread
or process.

MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D33338


# f187d6df 15-Mar-2021 Kyle Evans <kevans@FreeBSD.org>

base: remove if_wg(4) and associated utilities, manpage

After length decisions, we've decided that the if_wg(4) driver and
related work is not yet ready to live in the tree. This driver has
larger security implications than many, and thus will be held to
more scrutiny than other drivers.

Please also see the related message sent to the freebsd-hackers@
and freebsd-arch@ lists by Kyle Evans <kevans@FreeBSD.org> on
2021/03/16, with the subject line "Removing WireGuard Support From Base"
for additional context.


# 74ae3f3e 14-Mar-2021 Kyle Evans <kevans@FreeBSD.org>

if_wg: import latest fixup work from the wireguard-freebsd project

This is the culmination of about a week of work from three developers to
fix a number of functional and security issues. This patch consists of
work done by the following folks:

- Jason A. Donenfeld <Jason@zx2c4.com>
- Matt Dunwoodie <ncon@noconroy.net>
- Kyle Evans <kevans@FreeBSD.org>

Notable changes include:
- Packets are now correctly staged for processing once the handshake has
completed, resulting in less packet loss in the interim.
- Various race conditions have been resolved, particularly w.r.t. socket
and packet lifetime (panics)
- Various tests have been added to assure correct functionality and
tooling conformance
- Many security issues have been addressed
- if_wg now maintains jail-friendly semantics: sockets are created in
the interface's home vnet so that it can act as the sole network
connection for a jail
- if_wg no longer fails to remove peer allowed-ips of 0.0.0.0/0
- if_wg now exports via ioctl a format that is future proof and
complete. It is additionally supported by the upstream
wireguard-tools (which we plan to merge in to base soon)
- if_wg now conforms to the WireGuard protocol and is more closely
aligned with security auditing guidelines

Note that the driver has been rebased away from using iflib. iflib
poses a number of challenges for a cloned device trying to operate in a
vnet that are non-trivial to solve and adds complexity to the
implementation for little gain.

The crypto implementation that was previously added to the tree was a
super complex integration of what previously appeared in an old out of
tree Linux module, which has been reduced to crypto.c containing simple
boring reference implementations. This is part of a near-to-mid term
goal to work with FreeBSD kernel crypto folks and take advantage of or
improve accelerated crypto already offered elsewhere.

There's additional test suite effort underway out-of-tree taking
advantage of the aforementioned jail-friendly semantics to test a number
of real-world topologies, based on netns.sh.

Also note that this is still a work in progress; work going further will
be much smaller in nature.

MFC after: 1 month (maybe)


# a459a6cf 25-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: respect PRIV_VFS_LOOKUP in vaccess_smr

Reported by: novel


# 78957abd 19-Jul-2020 Adrian Chadd <adrian@FreeBSD.org>

[net80211] missing from last commit, le whoops

Differential Revision:https://reviews.freebsd.org/D25630


# 63619b6d 04-Jun-2020 Kyle Evans <kevans@FreeBSD.org>

vfs: add restrictions to read(2) of a directory [2/2]

This commit adds the priv(9) that waters down the sysctl to make it only
allow read(2) of a dirfd by the system root. Jailed root is not allowed, but
jail policy and superuser policy will abstain from allowing/denying it so
that a MAC module can fully control the policy.

Such a MAC module has been written, and can be found at:
https://people.freebsd.org/~kevans/mac_read_dir-0.1.0.tar.gz

It is expected that the MAC module won't be needed by many, as most only
need to do such diagnostics that require this behavior as system root
anyways. Interested parties are welcome to grab the MAC module above and
create a port or locally integrate it, and with enough support it could see
introduction to base. As noted in mac_read_dir.c, it is released under the
BSD 2 clause license and allows the restrictions to be lifted for only
jailed root or for all unprivileged users.

PR: 246412
Reviewed by: mckusick, kib, emaste, jilles, cy, phk, imp (all previous)
Reviewed by: rgrimes (latest version)
Differential Revision: https://reviews.freebsd.org/D24596


# 7b2ff0dc 13-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Partially decompose priv_check by adding priv_check_cred_vfs_generation

During buildkernel there are very frequent calls to priv_check and they
all are for PRIV_VFS_GENERATION (coming from stat/fstat).

This results in branching on several potential privileges checking if
perhaps that's the one which has to be evaluated.

Instead of the kitchen-sink approach provide a way to have commonly used
privs directly evaluated.


# cc426dd3 11-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove unused argument to priv_check_cred.

Patch mostly generated with cocinnelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by: The FreeBSD Foundation


# f1379734 27-Mar-2018 Konstantin Belousov <kib@FreeBSD.org>

Allow to specify PCP on packets not belonging to any VLAN.

According to 802.1Q-2014, VLAN tagged packets with VLAN id 0 should be
considered as untagged, and only PCP and DEI values from the VLAN tag
are meaningful. See for instance
https://www.cisco.com/c/en/us/td/docs/switches/connectedgrid/cg-switch-sw-master/software/configuration/guide/vlan0/b_vlan_0.html.

Make it possible to specify PCP value for outgoing packets on an
ethernet interface. When PCP is supplied, the tag is appended, VLAN
id set to 0, and PCP is filled by the supplied value. The code to do
VLAN tag encapsulation is refactored from the if_vlan.c and moved into
if_ethersubr.c.

Drivers might have issues with filtering VID 0 packets on
receive. This bug should be fixed for each driver.

Reviewed by: ae (previous version), hselasky, melifaro
Sponsored by: Mellanox Technologies
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D14702


# c4e20cad 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 456a73ef 22-Oct-2017 Konstantin Belousov <kib@FreeBSD.org>

Remove the support for mknod(S_IFMT), which created dummy vnodes with
VBAD type.

FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not
check for the type and their behaviour is really undefined. The
comment claims that this support was done for 'badsect' to flag bad
sectors, we do not have such facility in kernel anyway.

Reported by: Dmitry Vyukov <dvyukov@google.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 2ccbbd06 06-Jun-2016 Marcelo Araujo <araujo@FreeBSD.org>

Add support to priority code point (PCP) that is an 3-bit field
which refers to IEEE 802.1p class of service and maps to the frame
priority level.

Values in order of priority are: 1 (Background (lowest)),
0 (Best effort (default)), 2 (Excellent effort),
3 (Critical applications), 4 (Video, < 100ms latency),
5 (Video, < 10ms latency), 6 (Internetwork control) and
7 (Network control (highest)).

Example of usage:
root# ifconfig em0.1 create
root# ifconfig em0.1 vlanpcp 3

Note:
The review D801 includes the pf(4) part, but as discussed with kristof,
we won't commit the pf(4) bits for now.
The credits of the original code is from rwatson.

Differential Revision: https://reviews.freebsd.org/D801
Reviewed by: gnn, adrian, loos
Discussed with: rwatson, glebius, kristof
Tested by: many including Matthew Grooms <mgrooms__shrew.net>
Obtained from: pfSense
Relnotes: Yes


# 7f417bfa 03-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/sys: minor spelling fixes.

While the changes are minor, these headers are very visible.

MFC after: 2 weeks


# 0815129d 21-Jan-2016 Kirk McKusick <mckusick@FreeBSD.org>

Update comment to note the function, prison_priv_check(), that needs to
be updated in kern_jail.c when a new priviledge is added.


# 677258f7 18-Jan-2015 Konstantin Belousov <kib@FreeBSD.org>

Add procctl(2) PROC_TRACE_CTL command to enable or disable debugger
attachment to the process. Note that the command is not intended to
be a security measure, rather it is an obfuscation feature,
implemented for parity with other operating systems.

Discussed with: jilles, rwatson
Man page fixes by: rwatson
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 007054f0 20-Oct-2014 Bryan Venteicher <bryanv@FreeBSD.org>

Add vxlan interface

vxlan creates a virtual LAN by encapsulating the inner Ethernet frame in
a UDP packet. This implementation is based on RFC7348.

Currently, the IPv6 support is not fully compliant with the specification:
we should be able to receive UPDv6 packets with a zero checksum, but we
need to support RFC6935 first. Patches for this should come soon.

Encapsulation protocols such as vxlan emphasize the need for the FreeBSD
network stack to support batching, GRO, and GSO. Each frame has to make
two trips through the network stack, and each frame will be at most MTU
sized. Performance suffers accordingly.

Some latest generation NICs have begun to support vxlan HW offloads that
we should also take advantage of. VIMAGE support should also be added soon.

Differential Revision: https://reviews.freebsd.org/D384
Reviewed by: gnn
Relnotes: yes


# 7527624e 14-Mar-2014 Robert Watson <rwatson@FreeBSD.org>

Several years after initial development, merge prototype support for
linking NIC Receive Side Scaling (RSS) to the network stack's
connection-group implementation. This prototype (and derived patches)
are in use at Juniper and several other FreeBSD-using companies, so
despite some reservations about its maturity, merge the patch to the
base tree so that it can be iteratively refined in collaboration rather
than maintained as a set of gradually diverging patch sets.

(1) Merge a software implementation of the Toeplitz hash specified in
RSS implemented by David Malone. This is used to allow suitable
pcbgroup placement of connections before the first packet is
received from the NIC. Software hashing is generally avoided,
however, due to high cost of the hash on general-purpose CPUs.

(2) In in_rss.c, maintain authoritative versions of RSS state intended
to be pushed to each NIC, including keying material, hash
algorithm/ configuration, and buckets. Provide software-facing
interfaces to hash 2- and 4-tuples for IPv4 and IPv6 using both
the RSS standardised Toeplitz and a 'naive' variation with a hash
efficient in software but with poor distribution properties.
Implement rss_m2cpuid()to be used by netisr and other load
balancing code to look up the CPU on which an mbuf should be
processed.

(3) In the Ethernet link layer, allow netisr distribution using RSS as
a source of policy as an alternative to source ordering; continue
to default to direct dispatch (i.e., don't try and requeue packets
for processing on the 'right' CPU if they arrive in a directly
dispatchable context).

(4) Allow RSS to control tuning of connection groups in order to align
groups with RSS buckets. If a packet arrives on a protocol using
connection groups, and contains a suitable hardware-generated
hash, use that hash value to select the connection group for pcb
lookup for both IPv4 and IPv6. If no hardware-generated Toeplitz
hash is available, we fall back on regular PCB lookup risking
contention rather than pay the cost of Toeplitz in software --
this is a less scalable but, at my last measurement, faster
approach. As core counts go up, we may want to revise this
strategy despite CPU overhead.

Where device drivers suitably configure NICs, and connection groups /
RSS are enabled, this should avoid both lock and line contention during
connection lookup for TCP. This commit does not modify any device
drivers to tune device RSS configuration to the global RSS
configuration; patches are in circulation to do this for at least
Chelsio T3 and Intel 1G/10G drivers. Currently, the KPI for device
drivers is not particularly robust, nor aware of more advanced features
such as runtime reconfiguration/rebalancing. This will hopefully prove
a useful starting point for refinement.

No MFC is scheduled as we will first want to nail down a more mature
and maintainable KPI/KBI for device drivers.

Sponsored by: Juniper Networks (original work)
Sponsored by: EMC/Isilon (patch update and merge)


# 1b7bdb29 14-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Add placeholders for IPX/SPX and AppleTalk priveledges, to avoid
number clashes with any future constants.

Suggested by: rwatson


# 45c203fc 14-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove AppleTalk support.

AppleTalk was a network transport protocol for Apple Macintosh devices
in 80s and then 90s. Starting with Mac OS X in 2000 the AppleTalk was
a legacy protocol and primary networking protocol is TCP/IP. The last
Mac OS X release to support AppleTalk happened in 2009. The same year
routing equipment vendors (namely Cisco) end their support.

Thus, AppleTalk won't be supported in FreeBSD 11.0-RELEASE.


# 2c284d93 13-Mar-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove IPX support.

IPX was a network transport protocol in Novell's NetWare network operating
system from late 80s and then 90s. The NetWare itself switched to TCP/IP
as default transport in 1998. Later, in this century the Novell Open
Enterprise Server became successor of Novell NetWare. The last release
that claimed to still support IPX was OES 2 in 2007. Routing equipment
vendors (e.g. Cisco) discontinued support for IPX in 2011.

Thus, IPX won't be supported in FreeBSD 11.0-RELEASE.


# 39ca489e 03-Aug-2013 Edward Tomasz Napierala <trasz@FreeBSD.org>

Fix typo.


# 1e7df843 05-Jul-2013 Jamie Gritton <jamie@FreeBSD.org>

Make the comments a little more clear about PRIV_KMEM_*, explicitly
referring to /dev/[k]mem and noting it's about opening the files rather
than actually reading and writing.

Reviewed by: jmallett


# fd311def 05-Jul-2013 Jamie Gritton <jamie@FreeBSD.org>

Bump up _PRIV_HIGHEST to account for PRIV_KMEM_READ/WRITE.

Submitted by: mdf


# c71e3362 05-Jul-2013 Jamie Gritton <jamie@FreeBSD.org>

Add new privileges, PRIV_KMEM_READ and PRIV_KMEM_WRITE, used in opening
/dev/kmem and /dev/mem (in addition to traditional file permission checks).
PRIV_KMEM_READ is different from other PRIV_* checks in that it's allowed
by default.

Reviewed by: kib, mckusick


# 35fd7bc0 02-Jul-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Add infrastructure to allow all frames/packets received on an interface
to be assigned to a non-default FIB instance.

You may need to recompile world or ports due to the change of struct ifnet.

Submitted by: cjsp
Submitted by: Alexander V. Chernikov (melifaro ipfw.ru)
(original versions)
Reviewed by: julian
Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru)
MFC after: 2 weeks
X-MFC: use spare in struct ifnet


# 415896e3 10-Apr-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Rename a misnamed structure field (hr_loginclass), and reorder priv(9)
constants to match the order and naming of syscalls. No functional changes.


# ec125fbb 30-Mar-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add rctl. It's used by racct to take user-configurable actions based
on the set of rules it maintains and the current resource usage. It also
privides userland API to manage that ruleset.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# 2bfc50bc 04-Mar-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add two new system calls, setloginclass(2) and getloginclass(2). This makes
it possible for the kernel to track login class the process is assigned to,
which is required for RCTL. This change also make setusercontext(3) call
setloginclass(2) and makes it possible to retrieve current login class using
id(1).

Reviewed by: kib (as part of a larger patch)


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 4e3eab52 20-Jul-2010 Rui Paulo <rpaulo@FreeBSD.org>

Fix typo in comment.


# a5a931b3 25-Feb-2010 Xin LI <delphij@FreeBSD.org>

MFC 203052:

Add interface description capability as inspired by OpenBSD. Thanks for
rwatson@, jhb@, brooks@ and others for feedback to the old implementation!

Sponsored by: iXsystems, Inc.


# 215940b3 26-Jan-2010 Xin LI <delphij@FreeBSD.org>

Revised revision 199201 (add interface description capability as inspired
by OpenBSD), based on comments from many, including rwatson, jhb, brooks
and others.

Sponsored by: iXsystems, Inc.
MFC after: 1 month


# 1a9d4dda 12-Nov-2009 Xin LI <delphij@FreeBSD.org>

Revert revision 199201 for now as it has introduced a kernel vulnerability
and requires more polishing.


# 41c8c6e8 11-Nov-2009 Xin LI <delphij@FreeBSD.org>

Add interface description capability as inspired by OpenBSD.

MFC after: 3 months


# 4b0026b9 30-Jul-2009 Robert Watson <rwatson@FreeBSD.org>

Add two new privileges for use by OpenAFS, which will be supported for
FreeBSD 8.x.

MFC after: 3 days
Submitted by: Benjamin Kaduk <kaduk at MIT.EDU>
Approved by: re (kib)


# cebc7fb1 01-Jul-2009 John Baldwin <jhb@FreeBSD.org>

Improve the handling of cpuset with interrupts.
- For x86, change the interrupt source method to assign an interrupt source
to a specific CPU to return an error value instead of void, thus allowing
it to fail.
- If moving an interrupt to a CPU fails due to a lack of IDT vectors in the
destination CPU, fail the request with ENOSPC rather than panicing.
- For MSI interrupts on x86 (but not MSI-X), only allow cpuset to be used
on the first interrupt in a group. Moving the first interrupt in a group
moves the entire group.
- Use the icu_lock to protect intr_next_cpu() on x86 instead of the
intr_table_lock to fix a LOR introduced in the last set of MSI changes.
- Add a new privilege PRIV_SCHED_CPUSET_INTR for using cpuset with
interrupts. Previously, binding an interrupt to a CPU only performed a
privilege check if the interrupt had an interrupt thread. Interrupts
without a thread could be bound by non-root users as a result.
- If an interrupt event's assign_cpu method fails, then restore the original
cpuset mask for the associated interrupt thread.

Approved by: re (kib)


# 5387b227 24-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Remove kernel SLIP and PPP privileges, since they are no longer used.

Suggested by: bz


# 3364c323 23-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Implement global and per-uid accounting of the anonymous memory. Add
rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved
for the uid.

The accounting information (charge) is associated with either map entry,
or vm object backing the entry, assuming the object is the first one
in the shadow chain and entry does not require COW. Charge is moved
from entry to object on allocation of the object, e.g. during the mmap,
assuming the object is allocated, or on the first page fault on the
entry. It moves back to the entry on forks due to COW setup.

The per-entry granularity of accounting makes the charge process fair
for processes that change uid during lifetime, and decrements charge
for proper uid when region is unmapped.

The interface of vm_pager_allocate(9) is extended by adding struct ucred *,
that is used to charge appropriate uid when allocation if performed by
kernel, e.g. md(4).

Several syscalls, among them is fork(2), may now return ENOMEM when
global or per-uid limits are enforced.

In collaboration with: pho
Reviewed by: alc
Approved by: re (kensmith)


# 148747c2 20-Jun-2009 Ed Schouten <ed@FreeBSD.org>

Add placeholder to prevent reuse of privilege 254.

Requested by: rwatson


# f8f61460 20-Jun-2009 Ed Schouten <ed@FreeBSD.org>

Improve nested jail awareness of devfs by handling credentials.

Now that we start to use credentials on character devices more often
(because of MPSAFE TTY), move the prison-checks that are in place in the
TTY code into devfs.

Instead of strictly comparing the prisons, use the more common
prison_check() function to compare credentials. This means that
pseudo-terminals are only visible in devfs by processes within the same
jail and parent jails.

Even though regular users in parent jails can now interact with
pseudo-terminals from child jails, this seems to be the right approach.
These processes are also capable of interacting with the jailed
processes anyway, through signals for example.

Reviewed by: kib, rwatson (older version)


# 679e1390 15-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Manage vnets via the jail system. If a jail is given the boolean
parameter "vnet" when it is created, a new vnet instance will be created
along with the jail. Networks interfaces can be moved between prisons
with an ioctl similar to the one that moves them between vimages.
For now vnets will co-exist under both jails and vimages, but soon
struct vimage will be going away.

Reviewed by: zec, julian
Approved by: bz (mentor)


# dbe59260 07-Jun-2009 Hiroki Sato <hrs@FreeBSD.org>

Fix and add a workaround on an issue of EtherIP packet with reversed
version field sent via gif(4)+if_bridge(4). The EtherIP
implementation found on FreeBSD 6.1, 6.2, 6.3, 7.0, 7.1, and 7.2 had
an interoperability issue because it sent the incorrect EtherIP
packets and discarded the correct ones.

This change introduces the following two flags to gif(4):

accept_rev_ethip_ver: accepts both correct EtherIP packets and ones
with reversed version field, if enabled. If disabled, the gif
accepts the correct packets only. This flag is enabled by
default.

send_rev_ethip_ver: sends EtherIP packets with reversed version field
intentionally, if enabled. If disabled, the gif sends the correct
packets only. This flag is disabled by default.

These flags are stored in struct gif_softc and can be set by
ifconfig(8) on per-interface basis.

Note that this is an incompatible change of EtherIP with the older
FreeBSD releases. If you need to interoperate older FreeBSD boxes and
new versions after this commit, setting "send_rev_ethip_ver" is
needed.

Reviewed by: thompsa and rwatson
Spotted by: Shunsuke SHINOMIYA
PR: kern/125003
MFC after: 2 weeks


# f44270e7 01-Jun-2009 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Rename IP_NONLOCALOK IP socket option to IP_BINDANY, to be more consistent
with OpenBSD (and BSD/OS originally). We can't easly do it SOL_SOCKET option
as there is no more space for more SOL_SOCKET options, but this option also
fits better as an IP socket option, it seems.
- Implement this functionality also for IPv6 and RAW IP sockets.
- Always compile it in (don't use additional kernel options).
- Remove sysctl to turn this functionality on and off.
- Introduce new privilege - PRIV_NETINET_BINDANY, which allows to use this
functionality (currently only unjail root can use it).

Discussed with: julian, adrian, jhb, rwatson, kmacy


# 76ca6f88 29-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


# 9ce13065 22-May-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Add privileges for Capi4BSD to control:
- controller reset/firmware loading.
- controller level tracing and tracing of capi messages of applications
running with different user credentials.

Reviewed by: rwatson
MFC after: 2 weeks


# b38ff370 29-Apr-2009 Jamie Gritton <jamie@FreeBSD.org>

Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2). Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
This replaces jail(2).
* jail_get, to read the parameters of existing jails. This replaces the
security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes. The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by: bz (mentor)


# cd86ae77 28-Feb-2009 Robert Watson <rwatson@FreeBSD.org>

Remove PRIV_ROOT -- all system privileges must now be explicitly named
in support of forthcoming work on a fine-grained privilege mechanism.

Facilitated by: bz, thompsa, rink


# 1ba4a712 17-Nov-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.

This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.

- L2ARC

Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.

- slog

Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).

- vfs.zfs.super_owner

Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

Not all the flags are supported. This still needs work.

- ZFSBoot

Support to boot off of ZFS pool. Not finished, AFAIK.

Submitted by: dfr

- Snapshot properties

- New failure modes

Before if write requested failed, system paniced. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.

- Sparse volumes

ZVOLs that don't reserve space in the pool.

- External attributes

Compatible with extattr(2).

- NFSv4-ACLs

Not sure about the status, might not be complete yet.

Submitted by: trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from: OpenSolaris


# 4edfe662 14-Nov-2008 Ed Schouten <ed@FreeBSD.org>

Per request, keep privilege number 20 reserved.

In my commit that moved uname(), setdomainname() and getdomainname() to
COMPAT_FREEBSD4, I also removed PRIV_SETDOMAINNAME, because it was
already protected by userland_sysctl(). We'd better keep the number 20
reserved, to prevent it from being used again.

Requested by: rwatson


# a1b5a895 09-Nov-2008 Ed Schouten <ed@FreeBSD.org>

Mark uname(), getdomainname() and setdomainname() with COMPAT_FREEBSD4.

Looking at our source code history, it seems the uname(),
getdomainname() and setdomainname() system calls got deprecated
somewhere after FreeBSD 1.1, but they have never been phased out
properly. Because we don't have a COMPAT_FREEBSD1, just use
COMPAT_FREEBSD4.

Also fix the Linuxolator to build without the setdomainname() routine by
just making it call userland_sysctl on kern.domainname. Also replace the
setdomainname()'s implementation to use this approach, because we're
duplicating code with sysctl_domainname().

I wasn't able to keep these three routines working in our
COMPAT_FREEBSD32, because that would require yet another keyword for
syscalls.master (COMPAT4+NOPROTO). Because this routine is probably
unused already, this won't be a problem in practice. If it turns out to
be a problem, we'll just restore this functionality.

Reviewed by: rdivacky, kib


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# cecd8edb 17-Sep-2008 Attilio Rao <attilio@FreeBSD.org>

Remove the suser(9) interface from the kernel. It has been replaced from
years by the priv_check(9) interface and just very few places are left.
Note that compatibility stub with older FreeBSD version
(all above the 8 limit though) are left in order to reduce diffs against
old versions. It is responsibility of the maintainers for any module, if
they think it is the case, to axe out such cases.

This patch breaks KPI so __FreeBSD_version will be bumped into a later
commit.

This patch needs to be credited 50-50 with rwatson@ as he found time to
explain me how the priv_check() works in detail and to review patches.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
Reviewed by: rwatson


# e085f869 08-Aug-2008 Stanislav Sedov <stas@FreeBSD.org>

- Add cpuctl(4) pseudo-device driver to provide access to some low-level
features of CPUs like reading/writing machine-specific registers,
retrieving cpuid data, and updating microcode.
- Add cpucontrol(8) utility, that provides userland access to
the features of cpuctl(4).
- Add subsequent manpages.

The cpuctl(4) device operates as follows. The pseudo-device node cpuctlX
is created for each cpu present in the systems. The pseudo-device minor
number corresponds to the cpu number in the system. The cpuctl(4) pseudo-
device allows a number of ioctl to be preformed, namely RDMSR/WRMSR/CPUID
and UPDATE. The first pair alows the caller to read/write machine-specific
registers from the correspondent CPU. cpuid data could be retrieved using
the CPUID call, and microcode updates are applied via UPDATE.

The permissions are inforced based on the pseudo-device file permissions.
RDMSR/CPUID will be allowed when the caller has read access to the device
node, while WRMSR/UPDATE will be granted only when the node is opened
for writing. There're also a number of priv(9) checks.

The cpucontrol(8) utility is intened to provide userland access to
the cpuctl(4) device features. The utility also allows one to apply
cpu microcode updates.

Currently only Intel and AMD cpus are supported and were tested.

Approved by: kib
Reviewed by: rpaulo, cokane, Peter Jeremy
MFC after: 1 month


# ba931c08 29-Jun-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Add a new priv 'PRIV_SCHED_CPUSET' to check if manipulating cpusets is
allowed and replace the suser() call. Do not allow it in jails.

Reviewed by: rwatson


# 5a14b2bf 15-Feb-2008 Robert Watson <rwatson@FreeBSD.org>

Add privilege PRIV_NNPFS_DEBUG for use with Arla/nnpfs. This privilege
will authorize debugging system calls.

MFC after: 1 month


# 79ba3952 24-Jan-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

Replace the last susers calls in netinet6/ with privilege checks.

Introduce a new privilege allowing to set certain IP header options
(hop-by-hop, routing headers).

Leave a few comments to be addressed later.

Reviewed by: rwatson (older version, before addressing his comments)


# 3a2669e4 25-Dec-2007 Robert Watson <rwatson@FreeBSD.org>

Add a new privilage category for DDB(4), and add PRIV_DDB_CAPTURE to
control access to the DDB capture buffer.


# e41966dc 21-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Add PRIV_VFS_STAT privilege, which will allow overriding policy limits on
the right to stat() a file, such as in mac_bsdextended.

Obtained from: TrustedBSD Project
MFC after: 3 months


# c4f45442 18-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Update comment: kernel privileges are, in fact sorted by subsytem.


# 7251b786 16-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Rather than passing SUSER_RUID into priv_check_cred() to specify when
a privilege is checked against the real uid rather than the effective
uid, instead decide which uid to use in priv_check_cred() based on the
privilege passed in. We use the real uid for PRIV_MAXFILES,
PRIV_MAXPROC, and PRIV_PROC_LIMIT. Remove the definition of
SUSER_RUID; there are now no flags defined for priv_check_cred().

Obtained from: TrustedBSD Project


# 32f9753c 11-Jun-2007 Robert Watson <rwatson@FreeBSD.org>

Eliminate now-unused SUSER_ALLOWJAIL arguments to priv_check_cred(); in
some cases, move to priv_check() if it was an operation on a thread and
no other flags were present.

Eliminate caller-side jail exception checking (also now-unused); jail
privilege exception code now goes solely in kern_jail.c.

We can't yet eliminate suser() due to some cases in the KAME code where
a privilege check is performed and then used in many different deferred
paths. Do, however, move those prototypes to priv.h.

Reviewed by: csjp
Obtained from: TrustedBSD Project


# dc472513 21-Apr-2007 Robert Watson <rwatson@FreeBSD.org>

Attempt to rationalize NFS privileges:

- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.

- Use PRIV_NFS_DAEMON in the NFS server.

- In the NFS client, move the privilege check from nfslockdans(), which
occurs every time a write is performed on /dev/nfslock, and instead do it
in nfslock_open() just once. This allows us to avoid checking the saved
uid for root, and just use the effective on open. Use PRIV_NFS_LOCKD.


# 18242d3b 16-Apr-2007 Andrew Thompson <thompsa@FreeBSD.org>

Rename the trunk(4) driver to lagg(4) as it is too similar to vlan trunking.

The name trunk is misused as the networking term trunk means carrying multiple
VLANs over a single connection. The IEEE standard for link aggregation (802.3
section 3) does not talk about 'trunk' at all while it is used throughout IEEE
802.1Q in describing vlans.

The lagg(4) driver provides link aggregation, failover and fault tolerance.

Discussed on: current@


# 6493245d 10-Apr-2007 Robert Watson <rwatson@FreeBSD.org>

Add a new privilege, PRIV_NETINET_REUSEPORT, which will replace superuser
checks to see whether bind() can reuse a port/address combination while
it's already in use (for some definition of use).


# b47888ce 09-Apr-2007 Andrew Thompson <thompsa@FreeBSD.org>

Add the trunk(4) driver for providing link aggregation, failover and fault
tolerance. This driver allows aggregation of multiple network interfaces as
one virtual interface using a number of different protocols/algorithms.

failover - Sends traffic through the secondary port if the master becomes
inactive.
fec - Supports Cisco Fast EtherChannel.
lacp - Supports the IEEE 802.3ad Link Aggregation Control Protocol
(LACP) and the Marker Protocol.
loadbalance - Static loadbalancing using an outgoing hash.
roundrobin - Distributes outgoing traffic using a round-robin scheduler
through all active ports.

This code was obtained from OpenBSD and this also includes 802.3ad LACP support
from agr(4) in NetBSD.


# e726fc7c 05-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add ZFS-specific privileges.


# bb531912 01-Mar-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better
describe the privilege.

OK'ed by: rwatson


# 1d1f5f85 27-Feb-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add a comment for PRIV_NET_SETLLADDR.

OK'ed by: rwatson


# 7ee76f9d 20-Feb-2007 Robert Watson <rwatson@FreeBSD.org>

Remove unnecessary privilege and privilege check for WITNESS sysctl.

Head nod: jhb


# 3bb153ea 19-Feb-2007 Robert Watson <rwatson@FreeBSD.org>

Remove discontinuity in network privilege number space.

Spotted by: emaste (ages ago)


# 95420afe 19-Feb-2007 Robert Watson <rwatson@FreeBSD.org>

Remove unused PRIV_IPC_EXEC. Renumbers System V IPC privilege.


# 95b091d2 19-Feb-2007 Robert Watson <rwatson@FreeBSD.org>

Rename three quota privileges from the UFS privilege namespace to the
VFS privilege namespace: exceedquota, getquota, and setquota. Leave
UFS-specific quota configuration privileges in the UFS name space.

This renumbers VFS and UFS privileges, so requires rebuilding modules
if you are using security policies aware of privilege identifiers.
This is likely no one at this point since none of the committed MAC
policies use the privilege checks.


# 800c9408 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Add a new priv(9) kernel interface for checking the availability of
privilege for threads and credentials. Unlike the existing suser(9)
interface, priv(9) exposes a named privilege identifier to the privilege
checking code, allowing more complex policies regarding the granting of
privilege to be expressed. Two interfaces are provided, replacing the
existing suser(9) interface:

suser(td) -> priv_check(td, priv)
suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags)

A comprehensive list of currently available kernel privileges may be
found in priv.h. New privileges are easily added as required, but the
comments on adding privileges found in priv.h and priv(9) should be read
before doing so.

The new privilege interface exposed sufficient information to the
privilege checking routine that it will now be possible for jail to
determine whether a particular privilege is granted in the check routine,
rather than relying on hints from the calling context via the
SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail
check function, prison_priv_check(), is exposed from kern_jail.c and used
by the privilege check routine to determine if the privilege is permitted
in jail. As a result, a centralized list of privileges permitted in jail
is now present in kern_jail.c.

The MAC Framework is now also able to instrument privilege checks, both
to deny privileges otherwise granted (mac_priv_check()), and to grant
privileges otherwise denied (mac_priv_grant()), permitting MAC Policy
modules to implement privilege models, as well as control a much broader
range of system behavior in order to constrain processes running with
root privilege.

The suser() and suser_cred() functions remain implemented, now in terms
of priv_check() and the PRIV_ROOT privilege, for use during the transition
and possibly continuing use by third party kernel modules that have not
been updated. The PRIV_DRIVER privilege exists to allow device drivers to
check privilege without adopting a more specific privilege identifier.

This change does not modify the actual security policy, rather, it
modifies the interface for privilege checks so changes to the security
policy become more feasible.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>