History log of /freebsd-current/sys/sys/jail.h
Revision Date Author Comments
# ed31b3f4 29-Nov-2023 Jamie Gritton <jamie@FreeBSD.org>

jail: Don't allow jail_set(2) to resurrect dying jails.

Currently, a prison in "dying" state (removed but still holding
resources) can be brought back to alive state via "jail -d", or
the JAIL_DYING flag to jail_set(2). This seemed like a good idea
at the time.

Its main use was to improve support for specifying the jid when
creating a jail, which also seemed like a good idea at the time.
But resurrecting a jail that was partway through thr process of
shutting down is trouble waiting to happen.

This patch deprecates that flag, leaving it as a no-op for creating
jails (but still useful for looking at dying jails). It sill allows
creating a new jail with the same jid as a dying one, but will renumber
the old one in that case. That's imperfect, but allows for current
behavior.

Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D28150


# cb48780d 01-Sep-2023 Shawn Webb <shawn.webb@hardenedbsd.org>

jail: Add the ability to access system-level filesystem extended attributes

Prior to this commit privileged accounts in a jail could not access to the
filesystem extended attributes in the system namespace. To control access to
the system namespace in a per-jail basis add a new configuration parameter
allow.extattr which is off by default.

Reported by: zirias
Tested by: zirias
Obtained from: HardenedBSD
Reviewed by: kevans, jamie
Differential revision: https://reviews.freebsd.org/D41643
MFC after: 1 week
Relnotes: yes


# 95ee2897 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: two-line .h pattern

Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 0b0ae2e4 14-Mar-2023 Mina Galić <freebsd@igalic.co>

jail: convert several functions from int to bool

these functions exclusively return (0) and (1), so convert them to bool

We also convert some networking related jail functions from int to bool
some of which were returning an error that was never used.

Differential Revision: https://reviews.freebsd.org/D29659
Reviewed by: imp, jamie (earlier version)
Pull Request: https://github.com/freebsd/freebsd-src/pull/663


# 88175af8 21-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_export: Add mnt_exjail to control exports done in prisons

If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same file system. This patch adds mnt_exjail
to "struct mount" so that the credentials (and, therefore,
the prison) that did the exports for that file system can
be recorded. If another prison has already exported the
file system, vfs_export() will fail with an error.
If mnt_exjail == NULL, the file system has not been exported.
mnt_exjail is checked by the NFS server, so that exports done
from within a different prison will not be used.

The patch also implements vfs_exjail_destroy(), which is
called from prison_cleanup() to release all the mnt_exjail
credential references, so that the prison can be removed.
Mainly to avoid doing a scan of the mountlist for the case
where there were no exports done from within the prison,
a count of how many file systems have been exported from
within the prison is kept in pr_exportcnt.

Reviewed by: markj
Discussed with: jamie
Differential Revision: https://reviews.freebsd.org/D38371
MFC after: 3 months


# d94e0bdc 04-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

Revert "vfs_export: Add checks for correct prison when updating exports"

This reverts commit 7926a01ed7ae7cefd81ef4cc2142c35b84d81913.

A new patch in D38371 is being considered for doing this.


# 7926a01e 02-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_export: Add checks for correct prison when updating exports

mountd(8) basically does the following:
getmntinfo()
for each mount
delete_exports
using nmount(2) to do the creation/deletion of individual exports.

For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo()
returns all mount points, including ones being used within other prisons.
This can cause confusion if the same file system is specified in the
exports(5) file for multiple prisons.

This patch adds a perminent identifier to each prison
and marks which prison did the exports in a field of
the mount structure called mnt_exjail. This field can
then be compared to the perminent identifier for the
prison that the thread's credentials is in.
Also required was a new function called prison_isalive_permid()
which returns if the prison is alive, so that the check can be
ignored for prisons that have been removed.

This prepares the system to allow mountd(8) to run in multiple
prisons, including prison0.

Future commits will complete the modifications to allow mountd(8)
to run in vnet prisons. Until then, these changes should not affect
semantics.

Reviewed by: markj
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D38144


# bba7a2e8 17-Dec-2022 Rick Macklem <rmacklem@FreeBSD.org>

kern_jail.c: Allow mountd/nfsd to optionally run in a jail

This patch adds "allow.nfsd" to the jail code based on a
new kernel build option VNET_NFSD. This will not work
until future patches fix nmount(2) to allow mountd to
run in a vnet prison and the NFS server code is patched
so that global variables are in a vnet.

The jail(8) man page will be patched in a future commit.

Reviewed by: jamie
MFC after: 4 months
Differential Revision: https://reviews.freebsd.org/D37637


# 5ecb5444 10-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

jail: add process linkage

It allows iteration over processes belonging to given jail instead of
having to walk the entire allproc list.

Note the iteration can miss processes which remains bug-compatible
with previous code.

Reviewed by: jamie (previous version), markj (previous version)
Differential Revision: https://reviews.freebsd.org/D34522


# eb8dcdea 26-Dec-2021 Gleb Smirnoff <glebius@FreeBSD.org>

jail: network epoch protection for IP address lists

Now struct prison has two pointers (IPv4 and IPv6) of struct
prison_ip type. Each points into epoch context, address count
and variable size array of addresses. These structures are
freed with network epoch deferred free and are not edited in
place, instead a new structure is allocated and set.

While here, the change also generalizes a lot (but not enough)
of IPv4 and IPv6 processing. E.g. address family agnostic helpers
for kern_jail_set() are provided, that reduce v4-v6 copy-paste.

The fast-path prison_check_ip[46]_locked() is also generalized
into prison_ip_check() that can be executed with network epoch
protection only.

Reviewed by: jamie
Differential revision: https://reviews.freebsd.org/D33339


# 17db4b52 14-Aug-2021 Gordon Bergling <gbe@FreeBSD.org>

Fix some common typos in source code comments

- s/struture/structure/
- s/structre/structure/

MFC after: 5 days


# 2d741f33 15-Apr-2021 Kyle Evans <kevans@FreeBSD.org>

kern: ether_gen_addr: randomize on default hostuuid, too

Currently, this will still hash the default (all zero) hostuuid and
potentially arrive at a MAC address that has a high chance of collision
if another interface of the same name appears in the same broadcast
domain on another host without a hostuuid, e.g., some virtual machine
setups.

Instead of using the default hostuuid, just treat it as a failure and
generate a random LA unicast MAC address.

Reviewed by: bz, gbe, imp, kbowling, kp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29788


# 811e27fa 22-Feb-2021 Jamie Gritton <jamie@FreeBSD.org>

jail: Add PD_KILL to remove a prison in prison_deref().

Add the PD_KILL flag that instructs prison_deref() to take steps
to actively kill a prison and its descendents, namely marking it
PRISON_STATE_DYING, clearing its PR_PERSIST flag, and killing any
attached processes.

This replaces a similar loop in sys_jail_remove(), bringing the
operation under the same single hold on allprison_lock that it already
has. It is also used to clean up failed jail (re-)creations in
kern_jail_set(), which didn't generally take all the proper steps.

Differential Revision: https://reviews.freebsd.org/D28473


# 1158508a 21-Feb-2021 Jamie Gritton <jamie@FreeBSD.org>

jail: Add pr_state to struct prison

Rather that using references (pr_ref and pr_uref) to deduce the state
of a prison, keep track of its state explicitly. A prison is either
"invalid" (pr_ref == 0), "alive" (pr_uref > 0) or "dying"
(pr_uref == 0).

State transitions are generally tied to the reference counts, but with
some flexibility: a new prison is "invalid" even though it now starts
with a reference, and jail_remove(2) sets the state to "dying" before
the user reference count drops to zero (which was prviously
accomplished via the PR_REMOVE flag).

pr_state is protected by both the prison mutex and allprison_lock, so
it has the same availablity guarantees as the reference counts do.

Differential Revision: https://reviews.freebsd.org/D27876


# f7496dca 21-Feb-2021 Jamie Gritton <jamie@FreeBSD.org>

jail: Change the locking around pr_ref and pr_uref

Require both the prison mutex and allprison_lock when pr_ref or
pr_uref go to/from zero. Adding a non-first or removing a non-last
reference remain lock-free. This means that a shared hold on
allprison_lock is sufficient for prison_isalive() to be useful, which
removes a number of cases of lock/check/unlock on the prison mutex.

Expand the locking in kern_jail_set() to keep allprison_lock held
exclusive until the new prison is valid, thus making invalid prisons
invisible to any thread holding allprison_lock (except of course the
one creating or destroying the prison). This renders prison_isvalid()
nearly redundant, now used only in asserts.

Differential Revision: https://reviews.freebsd.org/D28419
Differential Revision: https://reviews.freebsd.org/D28458


# cc7b7306 16-Feb-2021 Jamie Gritton <jamie@FreeBSD.org>

jail: Handle a possible race between jail_remove(2) and fork(2)

jail_remove(2) includes a loop that sends SIGKILL to all processes
in a jail, but skips processes in PRS_NEW state. Thus it is possible
the a process in mid-fork(2) during jail removal can survive the jail
being removed.

Add a prison flag PR_REMOVE, which is checked before the new process
returns. If the jail is being removed, the process will then exit.
Also check this flag in jail_attach(2) which has a similar issue.

Reported by: trasz
Approved by: kib
MFC after: 3 days


# 6754ae25 20-Jan-2021 Jamie Gritton <jamie@FreeBSD.org>

jail: Use refcount(9) for prison references.

Use refcount(9) for both pr_ref and pr_uref in struct prison. This
allows prisons to held and freed without requiring the prison mutex.
An exception to this is that dropping the last reference will still
lock the prison, to keep the guarantee that a locked prison remains
valid and alive (provided it was at the time it was locked).

Among other things, this honors the promise made in a comment in
crcopy(9), that it will not block, which hasn't been true for two
decades.


# 76ad42ab 18-Jan-2021 Jamie Gritton <jamie@FreeBSD.org>

jail: Add prison_isvalid() and prison_isalive()

prison_isvalid() checks if a prison record can be used at all, i.e.
pr_ref > 0. This filters out prisons that aren't fully created, and
those that are either in the process of being dismantled, or will be
at the next opportunity. While the check for pr_ref > 0 is simple
enough to make without a convenience function, this prepares the way
for other measures of prison validity.

prison_isalive() checks not only validity as far as the useablity of
the prison structure, but also whether the prison is visible to user
space. It replaces a test for pr_uref > 0, which is currently only
used within kern_jail.c, and not often there.

Both of these functions also assert that either the prison mutex or
allprison_lock is held, since it's generally the case that unlocked
prisons aren't guaranteed to remain useable for any length of time.
This isn't entirely true, for example a thread can assume its own
prison is good, but most exceptions will exist inside of kern_jail.c.


# 0fe74ae6 26-Dec-2020 Jamie Gritton <jamie@FreeBSD.org>

jail: Consistently handle the pr_allow bitmask

Return a boolean (i.e. 0 or 1) from prison_allow, instead of the flag
value itself, which is what sysctl expects.

Add prison_set_allow(), which can set or clear a permission bit, and
propagates cleared bits down to child jails.

Use prison_allow() and prison_set_allow() in the various jail.allow.*
sysctls, and others that depend on thoe permissions.

Add locking around checking both pr_allow and pr_enforce_statfs in
prison_priv_check().


# 43c27348 26-Dec-2020 Jamie Gritton <jamie@FreeBSD.org>

jail: Make comments on struct prison locking more precise


# 05e1e482 18-Nov-2020 Mariusz Zaborski <oshogbo@FreeBSD.org>

jail: introduce per jail suser_enabled setting

The suser_enable sysctl allows to remove a privileged rights from uid 0.
This change introduce per jail setting which allow to make root a
normal user.

Reviewed by: jamie
Previous version reviewed by: kevans, emaste, markj, me_igalic.co
Discussed with: pjd
Differential Revision: https://reviews.freebsd.org/D27128


# 08204289 29-Aug-2020 Jamie Gritton <jamie@FreeBSD.org>

Add __BEGIN_DECLS to jail.h to keep C++ happy.

PR: 238928
Reported by: yuri@


# 3f8bc99c 18-Apr-2020 Kristof Provost <kp@FreeBSD.org>

ethersubr: Make the mac address generation more robust

If we create two (vnet) jails and create a bridge interface in each we end up
with the same mac address on both bridge interfaces.
These very often conflicts, resulting in same mac address in both jails.

Mitigate this problem by including the jail name in the mac address.

Reviewed by: kevans, melifaro
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D24383


# 7029da5c 26-Feb-2020 Pawel Biernacki <kaktus@FreeBSD.org>

Mark more nodes as CTLFLAG_MPSAFE or CTLFLAG_NEEDGIANT (17 of many)

r357614 added CTLFLAG_NEEDGIANT to make it easier to find nodes that are
still not MPSAFE (or already are but aren’t properly marked).
Use it in preparation for a general review of all nodes.

This is non-functional change that adds annotations to SYSCTL_NODE and
SYSCTL_PROC nodes using one of the soon-to-be-required flags.

Mark all obvious cases as MPSAFE. All entries that haven't been marked
as MPSAFE before are by default marked as NEEDGIANT

Approved by: kib (mentor, blanket)
Commented by: kib, gallatin, melifaro
Differential Revision: https://reviews.freebsd.org/D23718


# e6081fe8 13-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

Inline jailed().

It is constantly called from priv_check.


# c83dda36 31-Dec-2019 Alexander V. Chernikov <melifaro@FreeBSD.org>

Split gigantic rtsock route_output() into smaller functions.

Amount of changes to the original code has been intentionally minimised
to ease diffing.
The changes are mostly mechanical, with the following exceptions:

* lltable handler is now called directly based of RTF_LLINFO flag presense.
* "report" logic for updating rtm in RTM_GET/RTM_DELETE has been simplified,
fixing several potential use-after-free cases in rt_addrinfo.
* llable asserts has been replaced with error-returning, preventing kernel
crashes when lltable gw af family is invalid (root required).

MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22864


# b3079544 27-Nov-2018 Jamie Gritton <jamie@FreeBSD.org>

In hardened systems, where the security.bsd.unprivileged_proc_debug sysctl
node is set, allow setting security.bsd.unprivileged_proc_debug per-jail.
In part, this is needed to create jails in which the Address Sanitizer
(ASAN) fully works as ASAN utilizes libkvm to inspect the virtual address
space. Instead of having to allow unprivileged process debugging for the
entire system, allow setting it on a per-jail basis.

The sysctl node is still security.bsd.unprivileged_proc_debug and the
jail(8) param is allow.unprivileged_proc_debug. The sysctl code is now a
sysctl proc rather than a sysctl int. This allows us to determine setting
the flag for the corresponding jail (or prison0).

As part of the change, the dynamic allow.* API needed to be modified to
take into account pr_allow flags which may now be disabled in prison0.
This prevents conflicts with new pr_allow flags (like that of vmm(4)) that
are added (and removed) dynamically.

Also teach the jail creation KPI to allow differences for certain pr_allow
flags between the parent and child jail. This can happen when unprivileged
process debugging is disabled in the parent prison, but enabled in the
child.

Submitted by: Shawn Webb <lattera at gmail.com>
Obtained from: HardenedBSD (45b3625edba0f73b3e3890b1ec3d0d1e95fd47e1, deba0b5078cef0faae43cbdafed3035b16587afc, ab21eeb3b4c72f2500987c96ff603ccf3b6e7de8)
Relnotes: yes
Sponsored by: HardenedBSD and G2, Inc
Differential Revision: https://reviews.freebsd.org/D18319


# b19d66fd 17-Oct-2018 Jamie Gritton <jamie@FreeBSD.org>

Add a new jail permission, allow.read_msgbuf. When true, jailed processes
can see the dmesg buffer (this is the current behavior). When false (the
new default), dmesg will be unavailable to jailed users, whether root or
not.

The security.bsd.unprivileged_read_msgbuf sysctl still works as before,
controlling system-wide whether non-root users can see the buffer.

PR: 211580
Submitted by: bz
Approved by: re@ (kib@)
MFC after: 3 days


# c542c43e 16-Aug-2018 Jamie Gritton <jamie@FreeBSD.org>

Revert r337922, except for some documention-only bits. This needs to wait
until user is changed to stop using jail(2).

Differential Revision: D14791


# 284001a2 16-Aug-2018 Jamie Gritton <jamie@FreeBSD.org>

Put jail(2) under COMPAT_FREEBSD11. It has been the "old" way of creating
jails since FreeBSD 7.

Along with the system call, put the various security.jail.allow_foo and
security.jail.foo_allowed sysctls partly under COMPAT_FREEBSD11 (or
BURN_BRIDGES). These sysctls had two disparate uses: on the system side,
they were global permissions for jails created via jail(2) which lacked
fine-grained permission controls; inside a jail, they're read-only
descriptions of what the current jail is allowed to do. The first use
is obsolete along with jail(2), but keep them for the second-read-only use.

Differential Revision: D14791


# ccd6ac9f 28-Jul-2018 Antoine Brodin <antoine@FreeBSD.org>

Add allow.mlock to jail parameters
It allows locking or unlocking physical pages in memory within a jail

This allows running elasticsearch with "bootstrap.memory_lock" inside a jail

Reviewed by: jamie@
Differential Revision: https://reviews.freebsd.org/D16342


# 0a172404 06-Jul-2018 Jamie Gritton <jamie@FreeBSD.org>

Change prison_add_vfs() to the more generic prison_add_allow(), which
can add any dynamic allow.* or allow.*.* parameter. Also keep
prison_add_vfs() as a wrapper.

Differential Revision: D16146


# 0e5c6bd4 04-May-2018 Jamie Gritton <jamie@FreeBSD.org>

Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of. This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by: kib
Differential Revision: D14681


# c4e20cad 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 03af441c 09-Jul-2017 Alexander Leidinger <netchild@FreeBSD.org>

- Extend pr_allow flags visually to 32 bits, to make it more obvious at first look how much flags we still
have available to use in the future.
- Add kmem_access flag as a placeholder (reserve it), not used yet.

Differential Revision: D11451
Reviewed by: jamie
Sponsored by: Hackathon Essen 2017


# e28f9b7d 05-Jun-2017 Allan Jude <allanjude@FreeBSD.org>

Jails: Optionally prevent jailed root from binding to privileged ports

You may now optionally specify allow.noreserved_ports to prevent root
inside a jail from using privileged ports (less than 1024)

PR: 217728
Submitted by: Matt Miller <mattm916@pulsar.neomailbox.ch>
Reviewed by: jamie, cem, smh
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D10202


# 0ce1624d 08-Aug-2016 Stephen J. Kiernan <stevek@FreeBSD.org>

Move IPv4-specific jail functions to new file netinet/in_jail.c
_prison_check_ip4 renamed to prison_check_ip4_locked

Move IPv6-specific jail functions to new file netinet6/in6_jail.c
_prison_check_ip6 renamed to prison_check_ip6_locked

Add appropriate prototypes to sys/sys/jail.h

Adjust kern_jail.c to call prison_check_ip4_locked and
prison_check_ip6_locked accordingly.

Add netinet/in_jail.c and netinet6/in6_jail.c to the list of files that
need to be built when INET and INET6, respectively, are configured in the
kernel configuration file.

Reviewed by: jtl
Approved by: sjg (mentor)
Sponsored by: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D6799


# 73d9e52d 26-Apr-2016 Jamie Gritton <jamie@FreeBSD.org>

Delay revmoing the last jail reference in prison_proc_free, and instead
put it off into the pr_task. This is similar to prison_free, and in fact
uses the same task even though they do something slightly different.

This resolves a LOR between the process lock and allprison_lock, which
came about in r298565.

PR: 48471


# cc5fd8c7 24-Apr-2016 Jamie Gritton <jamie@FreeBSD.org>

Add a new jail OSD method, PR_METHOD_REMOVE. It's called when a jail is
removed from the user perspective, i.e. when the last pr_uref goes away,
even though the jail mail still exist in the dying state. It will also
be called if either PR_METHOD_CREATE or PR_METHOD_SET fail.

PR: 48471
MFC after: 5 days


# 2a549507 24-Apr-2016 Jamie Gritton <jamie@FreeBSD.org>

Remove the PR_REMOVE flag, which was meant as a temporary marker for
a jail that might be seen mid-removal. It hasn't been doing the right
thing since at least the ability to resurrect dying jails, and such
resurrection also makes it unnecessary.


# a63513d7 14-Nov-2015 Edward Tomasz Napierala <trasz@FreeBSD.org>

Doh, commit in a wrong directory. Fix r290857.

MFC after: 1 month
Sponsored by: The FreeBSD Foundation


# f19e47d6 19-Jul-2015 Marcelo Araujo <araujo@FreeBSD.org>

Add support to the jail framework to be able to mount linsysfs(5) and
linprocfs(5).

Differential Revision: D2846
Submitted by: Nikolai Lifanov <lifanov@mail.lifanov.com>
Reviewed by: jamie


# b96bd95b 27-Feb-2015 Ian Lepore <ian@FreeBSD.org>

Allow the kern.osrelease and kern.osreldate sysctl values to be set in a
jail's creation parameters. This allows the kernel version to be reliably
spoofed within the jail whether examined directly with sysctl or
indirectly with the uname -r and -K options.

The values can only be set at jail creation time, to eliminate the need
for any locking when accessing the values via sysctl.

The overridden values are inherited by nested jails (unless the config for
the nested jails also overrides the values).

There is no sanity or range checking, other than disallowing an empty
release string or a zero release date, by design. The system
administrator is trusted to set sane values. Setting values that are
newer than the actual running kernel will likely cause compatibility
problems.

Differential Revision: https://reviews.freebsd.org/D1948
Relnotes: yes


# 464aad14 28-Jan-2015 Jamie Gritton <jamie@FreeBSD.org>

Add allow.mount.fdescfs jail flag.

PR: 192951
Submitted by: ruben@verweg.com
MFC after: 3 days


# 6a3f2779 13-Jan-2015 Jamie Gritton <jamie@FreeBSD.org>

Remove the prison flags PR_IP4_DISABLE and PR_IP6_DISABLE, which have been
write-only for as long as they've existed.


# f15444cc 31-Jan-2014 Jamie Gritton <jamie@FreeBSD.org>

Back out r261266 pending security buy-in.

r261266:
Add a jail parameter, allow.kmem, which lets jailed processes access
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.


# 109ca2d5 29-Jan-2014 Jamie Gritton <jamie@FreeBSD.org>

Add a jail parameter, allow.kmem, which lets jailed processes access
/dev/kmem and related devices (i.e. grants PRIV_IO and PRIV_KMEM_WRITE).
This in conjunction with changing the drm driver's permission check from
PRIV_DRIVER to PRIV_KMEM_WRITE will allow a jailed Xorg server.

Submitted by: netchild
MFC after: 1 week


# 0d168b8d 01-Nov-2013 Gleb Smirnoff <glebius@FreeBSD.org>

prison_check_ip4() can take const arguments.


# 2454886e 23-Aug-2013 Xin LI <delphij@FreeBSD.org>

Allow tmpfs be mounted inside jail.


# 41c0675e 28-Feb-2012 Martin Matuska <mm@FreeBSD.org>

Add procfs to jail-mountable filesystems.

Reviewed by: jamie
MFC after: 1 week


# e7af90ab 26-Feb-2012 Martin Matuska <mm@FreeBSD.org>

Analogous to r232059, add a parameter for the ZFS file system:

allow.mount.zfs:
allow mounting the zfs filesystem inside a jail

This way the permssions for mounting all current VFCF_JAIL filesystems
inside a jail are controlled wia allow.mount.* jail parameters.

Update sysctl descriptions.
Update jail(8) and zfs(8) manpages.

TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel
developers

MFC after: 10 days


# bf3db8aa 23-Feb-2012 Martin Matuska <mm@FreeBSD.org>

To improve control over the use of mount(8) inside a jail(8), introduce
a new jail parameter node with the following parameters:

allow.mount.devfs:
allow mounting the devfs filesystem inside a jail

allow.mount.nullfs:
allow mounting the nullfs filesystem inside a jail

Both parameters are disabled by default (equals the behavior before
devfs and nullfs in jails). Administrators have to explicitly allow
mounting devfs and nullfs for each jail. The value "-1" of the
devfs_ruleset parameter is removed in favor of the new allow setting.

Reviewed by: jamie
Suggested by: pjd
MFC after: 2 weeks


# 0cc207a6 09-Feb-2012 Martin Matuska <mm@FreeBSD.org>

Add support for mounting devfs inside jails.

A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for
mounting devfs inside jails. A value of -1 disables mounting devfs in
jails, a value of zero means no restrictions. Nested jails can only
have mounting devfs disabled or inherit parent's enforcement as jails are
not allowed to view or manipulate devfs(8) rules.

Utilizes new functions introduced in r231265.

Reviewed by: jamie
MFC after: 1 month


# a7ad07bf 03-May-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Change the way rctl interfaces with jails by introducing prison_racct
structure, which acts as a proxy between them. This makes jail rules
persistent, i.e. they can be added before jail gets created, and they
don't disappear when the jail gets destroyed.


# 097055e2 29-Mar-2011 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add racct. It's an API to keep per-process, per-jail, per-loginclass
and per-loginclass resource accounting information, to be used by the new
resource limits code. It's connected to the build, but the code that
actually calls the new functions will come later.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib (earlier version)


# e4cd31dd 21-Mar-2011 Jeff Roberson <jeff@FreeBSD.org>

- Merge changes to the base system to support OFED. These include
a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND,
and other miscellaneous small features.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 3bcceea4 23-Jan-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r202468:

Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primry jail IP they had been used for
years, as well as for people prefering to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by: jamie, hrs (ipv6 part)
Pointed out by: hrs [1]


# 592bcae8 16-Jan-2010 Bjoern A. Zeeb <bz@FreeBSD.org>

Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.

This is intended to be used by people upgrading from single-IP
jails to multi-IP jails but not having to change firewall rules,
application ACLs, ... but to force their connections (unless
otherwise changed) to the primry jail IP they had been used for
years, as well as for people prefering to implement similar policies.

Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could as well, as by the
design of jails. [1]

Reviewed by: jamie, hrs (ipv6 part)
Pointed out by: hrs [1]
MFC After: 2 weeks
Asked for by: Jase Thew (bazerka beardz.net)


# 950cde50 28-Dec-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r200473:

Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.

Discussed with: julian, zec, jamie, rwatson (back in Sept)


# de0bd6f7 13-Dec-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Throughout the network stack we have a few places of
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.

Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.

We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.

Discussed with: julian, zec, jamie, rwatson (back in Sept)
MFC after: 5 days


# da2a30fc 13-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

MFC r196176:

Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.

Reviewed by: rwatson, zec

Approved by: re (kib)


# eb79e1c7 13-Aug-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

Make it possible to change the vnet sysctl variables on jails
with their own virtual network stack. Jails only inheriting a
network stack cannot change anything that cannot be changed from
within a prison.

Reviewed by: rwatson, zec
Approved by: re (kib)


# 7cbf7213 25-Jul-2009 Jamie Gritton <jamie@FreeBSD.org>

Some jail parameters (in particular, "ip4" and "ip6" for IP address
restrictions) were found to be inadequately described by a boolean.
Define a new parameter type with three values (disable, new, inherit)
to handle these and future cases.

Approved by: re (kib), bz (mentor)
Discussed with: rwatson


# ca006477 24-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Clean up struct prison, with the recent fields in more logical places,
and room for future expansion.

Approved by: bz (mentor)


# b97457e2 23-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Add a limit for child jails via the "children.cur" and "children.max"
parameters. This replaces the simple "allow.jails" permission.

Approved by: bz (mentor)


# 679e1390 15-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Manage vnets via the jail system. If a jail is given the boolean
parameter "vnet" when it is created, a new vnet instance will be created
along with the jail. Networks interfaces can be moved between prisons
with an ioctl similar to the one that moves them between vimages.
For now vnets will co-exist under both jails and vimages, but soon
struct vimage will be going away.

Reviewed by: zec, julian
Approved by: bz (mentor)


# c1f19219 13-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by: bz (mentor)


# 7455b100 12-Jun-2009 Jamie Gritton <jamie@FreeBSD.org>

Add counterparts to getcredhostname:
getcreddomainname, getcredhostuuid, getcredhostid

Suggested by: rmacklem
Approved by: bz


# 76ca6f88 29-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


# 73d0971b 27-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Add support for the arbitrary named jail parameters used by jail_set(2)
and jail_get(2). Jail(8) can now create jails using a "name=value"
format instead of just specifying a limited set of fixed parameters; it
can also modify parameters of existing jails. Jls(8) can display all
parameters of jails, or a specified set of parameters. The available
parameters are gathered from the kernel, and not hard-coded into these
programs.

Small patches on killall(1) and jexec(8) to support jail names with
jail_get(2).

Approved by: bz (mentor)


# 0304c731 27-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Add hierarchical jails. A jail may further virtualize its environment
by creating a child jail, which is visible to that jail and to any
parent jails. Child jails may be restricted more than their parents,
but never less. Jail names reflect this hierarchy, being MIB-style
dot-separated strings.

Every thread now points to a jail, the default being prison0, which
contains information about the physical system. Prison0's root
directory is the same as rootvnode; its hostname is the same as the
global hostname, and its securelevel replaces the global securelevel.
Note that the variable "securelevel" has actually gone away, which
should not cause any problems for code that properly uses
securelevel_gt() and securelevel_ge().

Some jail-related permissions that were kept in global variables and
set via sysctls are now per-jail settings. The sysctls still exist for
backward compatibility, used only by the now-deprecated jail(2) system
call.

Approved by: bz (mentor)


# 7ae27ff4 07-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Move the per-prison Linux MIB from a private one-off pointer to the new
OSD-based jail extensions. This allows the Linux MIB to accessed via
jail_set and jail_get, and serves as a demonstration of adding jail support
to a module.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


# 49939083 04-May-2009 Jamie Gritton <jamie@FreeBSD.org>

Add a constant PR_MAXMETHOD to better define the jail/OSD interface.

Reviewed by: dchagin, kib
Approved by: bz (mentor)


# b38ff370 29-Apr-2009 Jamie Gritton <jamie@FreeBSD.org>

Introduce the extensible jail framework, using the same "name=value"
interface as nmount(2). Three new system calls are added:
* jail_set, to create jails and change the parameters of existing jails.
This replaces jail(2).
* jail_get, to read the parameters of existing jails. This replaces the
security.jail.list sysctl.
* jail_remove to kill off a jail's processes and remove the jail.
Most jail parameters may now be changed after creation, and jails may be
set to exist without any attached processes. The current jail(2) system
call still exists, though it is now a stub to jail_set(2).

Approved by: bz (mentor)


# 7074cfa2 29-Apr-2009 Jamie Gritton <jamie@FreeBSD.org>

With the permission of phk@ change the license on remaining jail code
to a 2 clause BSD license.

Approved by: phk
Approved by: bz (mentor)


# 8571af59 27-Mar-2009 Jamie Gritton <jamie@FreeBSD.org>

Whitespace/spelling fixes in advance of upcoming functional changes.

Approved by: bz (mentor)


# 3cb55d4a 17-Feb-2009 Jamie Gritton <jamie@FreeBSD.org>

Remove obsolete prison_service declarations.

Approved by: bz (mentor)


# ca04ba64 05-Feb-2009 Jamie Gritton <jamie@FreeBSD.org>

Don't allow creating a socket with a protocol family that the current
jail doesn't support. This involves a new function prison_check_af,
like prison_check_ip[46] but that checks only the family.

With this change, most of the errors generated by jailed sockets
shouldn't ever occur, at least until jails are changeable.

Approved by: bz (mentor)


# 1cecba0f 25-Jan-2009 Bjoern A. Zeeb <bz@FreeBSD.org>

For consistency with prison_{local,remote,check}_ipN rename
prison_getipN to prison_get_ipN.

Submitted by: jamie (as part of a larger patch)
MFC after: 1 week


# 413628a7 29-Nov-2008 Bjoern A. Zeeb <bz@FreeBSD.org>

MFp4:
Bring in updated jail support from bz_jail branch.

This enhances the current jail implementation to permit multiple
addresses per jail. In addtion to IPv4, IPv6 is supported as well.
Due to updated checks it is even possible to have jails without
an IP address at all, which basically gives one a chroot with
restricted process view, no networking,..

SCTP support was updated and supports IPv6 in jails as well.

Cpuset support permits jails to be bound to specific processor
sets after creation.

Jails can have an unrestricted (no duplicate protection, etc.) name
in addition to the hostname. The jail name cannot be changed from
within a jail and is considered to be used for management purposes
or as audit-token in the future.

DDB 'show jails' command was added to aid debugging.

Proper compat support permits 32bit jail binaries to be used on 64bit
systems to manage jails. Also backward compatibility was preserved where
possible: for jail v1 syscalls, as well as with user space management
utilities.

Both jail as well as prison version were updated for the new features.
A gap was intentionally left as the intermediate versions had been
used by various patches floating around the last years.

Bump __FreeBSD_version for the afore mentioned and in kernel changes.

Special thanks to:
- Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches
and Olivier Houchard (cognet) for initial single-IPv6 patches.
- Jeff Roberson (jeff) and Randall Stewart (rrs) for their
help, ideas and review on cpuset and SCTP support.
- Robert Watson (rwatson) for lots and lots of help, discussions,
suggestions and review of most of the patch at various stages.
- John Baldwin (jhb) for his help.
- Simon L. Nielsen (simon) as early adopter testing changes
on cluster machines as well as all the testers and people
who provided feedback the last months on freebsd-jail and
other channels.
- My employer, CK Software GmbH, for the support so I could work on this.

Reviewed by: (see above)
MFC after: 3 months (this is just so that I get the mail)
X-MFC Before: 7.2-RELEASE if possible


# 1ba4a712 17-Nov-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.

This bring huge amount of changes, I'll enumerate only user-visible changes:

- Delegated Administration

Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.

- L2ARC

Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.

- slog

Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).

- vfs.zfs.super_owner

Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

Not all the flags are supported. This still needs work.

- ZFSBoot

Support to boot off of ZFS pool. Not finished, AFAIK.

Submitted by: dfr

- Snapshot properties

- New failure modes

Before if write requested failed, system paniced. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.

- Sparse volumes

ZVOLs that don't reserve space in the pool.

- External attributes

Compatible with extattr(2).

- NFSv4-ACLs

Not sure about the status, might not be complete yet.

Submitted by: trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from: OpenSolaris


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 2110d913 19-Jun-2008 Xin LI <delphij@FreeBSD.org>

Revert rev. 178124 as requested by kris@. Having jail id not being
reused too frequently is useful for script controlled environment.


# 31c50f53 11-Apr-2008 Xin LI <delphij@FreeBSD.org>

Instead of rolling our own jail number allocation procedure, use
alloc_unr() to do it.

Submitted by: Ed Schouten <ed 80386 nl>
PR: kern/122270
MFC after: 1 month


# dc68a633 05-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement functionality I called 'jail services'.

It may be used for external modules to attach some data to jail's in-kernel
structure.

- Change allprison_mtx mutex to allprison_sx sx(9) lock.
We will need to call external functions while holding this lock, which may
want to allocate memory.
Make use of the fact that this is shared-exclusive lock and use shared
version when possible.
- Implement the following functions:
prison_service_register() - registers a service that wants to be noticed
when a jail is created and destroyed
prison_service_deregister() - deregisters service
prison_service_data_add() - adds service-specific data to the jail structure
prison_service_data_get() - takes service-specific data from the jail
structure
prison_service_data_del() - removes service-specific data from the jail
structure

Reviewed by: rwatson


# 54b369c1 05-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make prison_find() globally accessible.


# 800c9408 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Add a new priv(9) kernel interface for checking the availability of
privilege for threads and credentials. Unlike the existing suser(9)
interface, priv(9) exposes a named privilege identifier to the privilege
checking code, allowing more complex policies regarding the granting of
privilege to be expressed. Two interfaces are provided, replacing the
existing suser(9) interface:

suser(td) -> priv_check(td, priv)
suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags)

A comprehensive list of currently available kernel privileges may be
found in priv.h. New privileges are easily added as required, but the
comments on adding privileges found in priv.h and priv(9) should be read
before doing so.

The new privilege interface exposed sufficient information to the
privilege checking routine that it will now be possible for jail to
determine whether a particular privilege is granted in the check routine,
rather than relying on hints from the calling context via the
SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail
check function, prison_priv_check(), is exposed from kern_jail.c and used
by the privilege check routine to determine if the privilege is permitted
in jail. As a result, a centralized list of privileges permitted in jail
is now present in kern_jail.c.

The MAC Framework is now also able to instrument privilege checks, both
to deny privileges otherwise granted (mac_priv_check()), and to grant
privileges otherwise denied (mac_priv_grant()), permitting MAC Policy
modules to implement privilege models, as well as control a much broader
range of system behavior in order to constrain processes running with
root privilege.

The suser() and suser_cred() functions remain implemented, now in terms
of priv_check() and the PRIV_ROOT privilege, for use during the transition
and possibly continuing use by third party kernel modules that have not
been updated. The PRIV_DRIVER privilege exists to allow device drivers to
check privilege without adopting a more specific privilege identifier.

This change does not modify the actual security policy, rather, it
modifies the interface for privilege checks so changes to the security
policy become more feasible.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 820a0de9 09-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value policy
0 show all mount-points without any restrictions
1 show only mount-points below jail's chroot and show only part of the
mount-point's path (if jail's chroot directory is /jails/foo and
mount-point is /jails/foo/usr/home only /usr/home will be shown)
2 show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with: rwatson


# e69f1fa2 20-Mar-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make prison structure visible from userland if _WANT_PRISON is defined
(simlar to _WANT_UCRED).

Reviewed by: gad
MFC after: 3 days


# 79653046 08-Feb-2005 Colin Percival <cperciva@FreeBSD.org>

Add a new sysctl, "security.jail.chflags_allowed", which controls the
behaviour of chflags within a jail. If set to 0 (the default), then a
jailed root user is treated as an unprivileged user; if set to 1, then
a jailed root user is treated the same as an unjailed root user.

This is necessary to allow "make installworld" to work inside a jail,
since it attempts to manipulate the system immutable flag on certain
files.

Discussed with: csjp, rwatson
MFC after: 2 weeks


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 4bc6b2af 24-Nov-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Correct mutexes names in comment.

Reviewed by: rwatson


# 5a59cefc 26-Apr-2004 Bosko Milekic <bmilekic@FreeBSD.org>

Give jail(8) the feature to allow raw sockets from within a
jail, which is less restrictive but allows for more flexible
jail usage (for those who are willing to make the sacrifice).
The default is off, but allowing raw sockets within jails can
now be accomplished by tuning security.jail.allow_raw_sockets
to 1.

Turning this on will allow you to use things like ping(8)
or traceroute(8) from within a jail.

The patch being committed is not identical to the patch
in the PR. The committed version is more friendly to
APIs which pjd is working on, so it should integrate
into his work quite nicely. This change has also been
presented and addressed on the freebsd-hackers mailing
list.

Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
PR: kern/65800


# f08df373 14-Feb-2004 Robert Watson <rwatson@FreeBSD.org>

By default, when a process in jail calls getfsstat(), only return the
data for the file system on which the jail's root vnode is located.
Previous behavior (show data for all mountpoints) can be restored
by setting security.jail.getfsstatroot_only to 0. Note: this also
has the effect of hiding other mounts inside a jail, such as /dev,
/tmp, and /proc, but errs on the side of leaking less information.


# b3059e09 23-Jan-2004 Robert Watson <rwatson@FreeBSD.org>

Defer the vrele() on a jail's root vnode reference from prison_free()
to a new prison_complete() task run by a task queue. This removes
a requirement for grabbing Giant in crfree(). Embed the 'struct task'
in 'struct prison' so that we don't have to allocate memory from
prison_free() (which means we also defer the FREE()).

With this change, I believe grabbing Giant from crfree() can now be
removed, but need to check the uidinfo code paths.

To avoid header pollution, move the definition of 'struct task'
to _task.h, and recursively include from taskqueue.h and jail.h; much
preferably to all files including jail.h picking up a requirement to
include taskqueue.h.

Bumped into by: sam
Reviewed by: bde, tjr


# fd7a8150 08-Apr-2003 Mike Barcroft <mike@FreeBSD.org>

o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.

Reviewed by: rwatson, tjr


# 607aa34e 05-May-2002 Bruce Evans <bde@FreeBSD.org>

Include <sys/queue.h> so that this file provides its own namespace
pollution which is required for its includes of <sys/_lock.h> and
<sys/_mutex.h> to work.


# 789f12fe 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P


# ad1ff099 27-Feb-2002 Robert Drehmel <robert@FreeBSD.org>

Make getcredhostname() take a buffer and the buffer's size
as arguments. The correct hostname is copied into the buffer
while having the prison's lock acquired in a jailed process'
case.

Reviewed by: jhb, rwatson


# 9484d0c0 27-Feb-2002 Robert Drehmel <robert@FreeBSD.org>

Add a function which returns the correct hostname for a given
credential.

Reviewed by: phk


# 01137630 03-Dec-2001 Robert Watson <rwatson@FreeBSD.org>

o Introduce pr_mtx into struct prison, providing protection for the
mutable contents of struct prison (hostname, securelevel, refcount,
pr_linux, ...)
o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/
so as to enforce these protections, in particular, in kern_mib.c
protection sysctl access to the hostname and securelevel, as well as
kern_prot.c access to the securelevel for access control purposes.
o Rewrite linux emulator abstractions for accessing per-jail linux
mib entries (osname, osrelease, osversion) so that they don't return
a pointer to the text in the struct linux_prison, rather, a copy
to an array passed into the calls. Likewise, update linprocfs to
use these primitives.
o Update in_pcb.c to always use prison_getip() rather than directly
accessing struct prison.

Reviewed by: jhb


# bda63e26 26-Sep-2001 Robert Watson <rwatson@FreeBSD.org>

o Introduce pr_securelevel, which holds a per-jail securelevel.

Obtained from: TrustedBSD Project


# 91421ba2 20-Feb-2001 Robert Watson <rwatson@FreeBSD.org>

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


# cb1f0db9 30-Oct-2000 Robert Watson <rwatson@FreeBSD.org>

o Deny access to System V IPC from within jail by default, as in the
current implementation, jail neither virtualizes the Sys V IPC namespace,
nor provides inter-jail protections on IPC objects.
o Support for System V IPC can be enabled by setting jail.sysvipc_allowed=1
using sysctl.
o This is not the "real fix" which involves virtualizing the System V
IPC namespace, but prevents processes within jail from influencing those
outside of jail when not approved by the administrator.

Reported by: Paulo Fragoso <paulo@nlink.com.br>


# 7cadc266 03-Jun-2000 Robert Watson <rwatson@FreeBSD.org>

o Modify jail to limit creation of sockets to UNIX domain sockets,
TCP/IP (v4) sockets, and routing sockets. Previously, interaction
with IPv6 was not well-defined, and might be inappropriate for some
environments. Similarly, sysctl MIB entries providing interface
information also give out only addresses from those protocol domains.

For the time being, this functionality is enabled by default, and
toggleable using the sysctl variable jail.socket_unixiproute_only.
In the future, protocol domains will be able to determine whether or
not they are ``jail aware''.

o Further limitations on process use of getpriority() and setpriority()
by jailed processes. Addresses problem described in kern/17878.

Reviewed by: phk, jmg


# 83f1e257 12-Feb-2000 Robert Watson <rwatson@FreeBSD.org>

Yet-another-update: rename ``kern.prison'' to a new sysctl root entry,
``jail'', and move the set_hostname_allowed sysctl there, as well as
fixing a bug in the sysctl that resulted in jails being over-limited
(preventing them from reading as well as writing the hostname). Also,
correct some formatting issues, courtesy bde :-).

Reviewed by: phk
Approved by: jkh


# 5bdee2c5 10-Feb-2000 Robert Watson <rwatson@FreeBSD.org>

Fix sysctl namespace for jail: move the kern.jailcansethostname to
kern.prison.set_hostname_allowed, off of the kern.prison node. Future
jail twiddles should be placed in this namespace.


# 664a31e4 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.


# 978f8d93 19-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Add a version number field to the jail(2) argument so that future changes
can be handled intelligently.


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# c6dfea0e 27-Aug-1999 Marcel Moolenaar <marcel@FreeBSD.org>

Add sysctl variables for the Linuxulator. These reside under `compat.linux' as
discussed on current.

The following variables are defined (for now):

osname (defaults to "Linux")
Allow users to change the name of the OS as returned by uname(2),
specially added for all those Linux Netscape users and statistics
maniacs :-) We now have what we all wanted!

osrelease (defaults to "2.2.5")
Allow users to change the version of the OS as returned by uname(2).
Since -current supports glibc2.1 now, change the default to 2.2.5
(was 2.0.36).

oss_version (defaults to 198144 [0x030600])
This one will be used by the OSS_GETVERSION ioctl (PR 12917) which I
can commit now that we have the MIB. The default version number is the
lowest version possible with the current 'encoding'.

A note about imprisoned processes (see jail(2)):
These variables are copy-on-write (as suggested by phk). This means that
imprisoned processes will use the system wide value unless it is written/set
by the process. From that moment on, a copy local to the prison will be
used.

A note about the implementation:
I choose to add a single pointer to struct prison, because I didn't like the
idea of changing struct prison every time I come up with a new variable. As
a side effect, the extra storage is only needed when a variable is set from
within the prison. This also minimizes kernel bloat when the Linuxulator is
not used; both compiled in or as a module.

Reviewed by: bde (first version only) and phk


# d8bd3ac4 16-May-1999 Poul-Henning Kamp <phk@FreeBSD.org>

$ brucify -deblunder


# 75c13541 28-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/