Cross Reference: /freebsd-10.1-release/sys/kern/kern

History log of /freebsd-10.1-release/sys/kern/kern_jail.c
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
# 272461	02-Oct-2014	gjb	Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation /freebsd-10.1-release
# 271622	15-Sep-2014	trasz	MFC r271317: Avoid unlocking unlocked mutex in RCTL jail code. Specific test case is attached to PR. PR: 193457 Approved by: re (kib) Sponsored by: The FreeBSD Foundation
# 259847	24-Dec-2013	ae	MFC r259520: Fix copy/paste typo.
# 258929	04-Dec-2013	peter	MFC: r258718: fix emulated jail_v0 byte order Approved by: re (gjb)
# 256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
# 255316	06-Sep-2013	jamie	Keep PRIV_KMEM_READ permitted inside jails as it is on the outside.
# 254741	23-Aug-2013	delphij	Allow tmpfs be mounted inside jail.
# 250804	19-May-2013	jamie	Refine the "nojail" rc keyword, adding "nojailvnet" for files that don't apply to most jails but do apply to vnet jails. This includes adding a new sysctl "security.jail.vnet" to identify vnet jails. PR: conf/149050 Submitted by: mdodd MFC after: 3 days
# 244404	18-Dec-2012	mjg	prison_racct_detach can be called for not fully initialized jail, so make it check that the jail has racct before doing anything PR: kern/174436 Reviewed by: trasz MFC after: 3 days
# 241896	22-Oct-2012	kib	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 235803	22-May-2012	trasz	Fix use-after-free in kern_jail_set() triggered e.g. by attempts to clear "persist" flag from empty persistent jail, like this: jail -c persist=1 jail -n 1 -m persist=0 Submitted by: Mateusz Guzik <mjguzik at gmail dot com> MFC after: 2 weeks
# 235795	22-May-2012	trasz	Don't leak locks in prison_racct_modify(). Submitted by: Mateusz Guzik <mjguzik at gmail dot com> MFC after: 2 weeks
# 232598	06-Mar-2012	trasz	Make racct and rctl correctly handle jail renaming. Previously they would continue using old name, the one jail was created with. PR: bin/165207
# 232278	28-Feb-2012	mm	Add procfs to jail-mountable filesystems. Reviewed by: jamie MFC after: 1 week
# 232186	26-Feb-2012	mm	Analogous to r232059, add a parameter for the ZFS file system: allow.mount.zfs: allow mounting the zfs filesystem inside a jail This way the permssions for mounting all current VFCF_JAIL filesystems inside a jail are controlled wia allow.mount.* jail parameters. Update sysctl descriptions. Update jail(8) and zfs(8) manpages. TODO: document the connection of allow.mount.* and VFCF_JAIL for kernel developers MFC after: 10 days
# 232059	23-Feb-2012	mm	To improve control over the use of mount(8) inside a jail(8), introduce a new jail parameter node with the following parameters: allow.mount.devfs: allow mounting the devfs filesystem inside a jail allow.mount.nullfs: allow mounting the nullfs filesystem inside a jail Both parameters are disabled by default (equals the behavior before devfs and nullfs in jails). Administrators have to explicitly allow mounting devfs and nullfs for each jail. The value "-1" of the devfs_ruleset parameter is removed in favor of the new allow setting. Reviewed by: jamie Suggested by: pjd MFC after: 2 weeks
# 231267	09-Feb-2012	mm	Add support for mounting devfs inside jails. A new jail(8) option "devfs_ruleset" defines the ruleset enforcement for mounting devfs inside jails. A value of -1 disables mounting devfs in jails, a value of zero means no restrictions. Nested jails can only have mounting devfs disabled or inherit parent's enforcement as jails are not allowed to view or manipulate devfs(8) rules. Utilizes new functions introduced in r231265. Reviewed by: jamie MFC after: 1 month
# 230407	20-Jan-2012	mm	Use separate buffer for global path to avoid overflow of path buffer. Reviewed by: jamie@ MFC after: 3 weeks
# 230143	15-Jan-2012	mm	Fix missing in r230129: kern_jail.c: initialize fullpath_disabled to zero vfs_cache.c: add missing dot in comment Reported by: kib MFC after: 1 month
# 230129	15-Jan-2012	mm	Introduce vn_path_to_global_path() This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month
# 227309	07-Nov-2011	ed	Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs. The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
# 227293	07-Nov-2011	ed	Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs. This means that their use is restricted to a single C file.
# 225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
# 225191	26-Aug-2011	jamie	Delay the recursive decrement of pr_uref when jails are made invisible but not removed; decrement it instead when the child jail actually goes away. This avoids letting the counter go below zero in the case where dying (pr_uref==0) jails are "resurrected", and an associated KASSERT panic. Submitted by: Steven Hartland Approved by: re (bz) MFC after: 1 week
# 224615	02-Aug-2011	mm	Always disable mount and unmount for jails with enforce_statfs==2. A working statfs(2) is required for umount(8) in jail. Reviewed by: pjd, kib Approved by: re (kib) MFC after: 2 weeks
# 224290	24-Jul-2011	mckusick	This update changes the mnt_flag field in the mount structure from 32 bits to 64 bits and eliminates the unused mnt_xflag field. The existing mnt_flag field is completely out of bits, so this update gives us room to expand. Note that the f_flags field in the statfs structure is already 64 bits, so the expanded mnt_flag field can be exported without having to make any changes in the statfs structure. Approved by: re (bz)
# 223735	03-Jul-2011	bz	Add infrastructure to allow all frames/packets received on an interface to be assigned to a non-default FIB instance. You may need to recompile world or ports due to the change of struct ifnet. Submitted by: cjsp Submitted by: Alexander V. Chernikov (melifaro ipfw.ru) (original versions) Reviewed by: julian Reviewed by: Alexander V. Chernikov (melifaro ipfw.ru) MFC after: 2 weeks X-MFC: use spare in struct ifnet
# 221362	03-May-2011	trasz	Change the way rctl interfaces with jails by introducing prison_racct structure, which acts as a proxy between them. This makes jail rules persistent, i.e. they can be added before jail gets created, and they don't disappear when the jail gets destroyed.
# 220163	30-Mar-2011	trasz	Add rctl. It's used by racct to take user-configurable actions based on the set of rules it maintains and the current resource usage. It also privides userland API to manage that ruleset. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
# 220137	29-Mar-2011	trasz	Add racct. It's an API to keep per-process, per-jail, per-loginclass and per-loginclass resource accounting information, to be used by the new resource limits code. It's connected to the build, but the code that actually calls the new functions will come later. Sponsored by: The FreeBSD Foundation Reviewed by: kib (earlier version)
# 219819	21-Mar-2011	jeff	- Merge changes to the base system to support OFED. These include a wider arg2 for sysctl, updates to vlan code, IFT_INFINIBAND, and other miscellaneous small features.
# 219304	05-Mar-2011	trasz	Add two new system calls, setloginclass(2) and getloginclass(2). This makes it possible for the kernel to track login class the process is assigned to, which is required for RCTL. This change also make setusercontext(3) call setloginclass(2) and makes it possible to retrieve current login class using id(1). Reviewed by: kib (as part of a larger patch)
# 217896	26-Jan-2011	dchagin	Add macro to test the sv_flags of any process. Change some places to test the flags instead of explicit comparing with address of known sysentvec structures. MFC after: 1 month
# 216861	31-Dec-2010	bz	Mfp4 CH177924: Add and export constants of array sizes of jail parameters as compiled into the kernel. This is the least intrusive way to allow kvm to read the (sparse) arrays independent of the options the kernel was compiled with. Reviewed by: jhb (originally) MFC after: 1 week Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH
# 212436	10-Sep-2010	jamie	Don't exit kern_jail_set without freeing options when enforce_statfs has an illegal value. MFC after: 3 days
# 211085	08-Aug-2010	jamie	Back out r210974. Any convenience of not typing "persist" is outweighed by the possibility of unintended partially-formed jails.
# 210974	06-Aug-2010	jamie	Implicitly make a new jail persistent if it's set not to attach. MFC after: 3 days
# 208803	04-Jun-2010	cperciva	Declare ip6 as (struct in6_addr ) instead of (struct in_addr ). This is a harmless bug since we never actually use ip6 as anything other than an opaque pointer. Found with: Coverty Prevent(tm) CID: 4319 MFC after: 1 month
# 205014	11-Mar-2010	nwhitehorn	Provide groundwork for 32-bit binary compatibility on non-x86 platforms, for upcoming 64-bit PowerPC and MIPS support. This renames the COMPAT_IA32 option to COMPAT_FREEBSD32, removes some IA32-specific code from MI parts of the kernel and enhances the freebsd32 compatibility code to support big-endian platforms. Reviewed by: kib, jhb
# 203052	26-Jan-2010	delphij	Revised revision 199201 (add interface description capability as inspired by OpenBSD), based on comments from many, including rwatson, jhb, brooks and others. Sponsored by: iXsystems, Inc. MFC after: 1 month
# 202468	17-Jan-2010	bz	Add ip4.saddrsel/ip4.nosaddrsel (and equivalent for ip6) to control whether to use source address selection (default) or the primary jail address for unbound outgoing connections. This is intended to be used by people upgrading from single-IP jails to multi-IP jails but not having to change firewall rules, application ACLs, ... but to force their connections (unless otherwise changed) to the primry jail IP they had been used for years, as well as for people prefering to implement similar policies. Note that for IPv6, if configured incorrectly, this might lead to scope violations, which single-IPv6 jails could as well, as by the design of jails. [1] Reviewed by: jamie, hrs (ipv6 part) Pointed out by: hrs [1] MFC After: 2 weeks Asked for by: Jase Thew (bazerka beardz.net)
# 202123	11-Jan-2010	bz	Change DDB show prison: - name some columns more closely to the user space variables, as we do for host.* or allow.* (in the listing) already. - print pr_childmax (children.max). - prefix hex values with 0x. MFC after: 3 weeks
# 202116	11-Jan-2010	bz	Adjust a comment to reflect reality, as we have proper source address selection, even for IPv4, since r183571. Pointed out by: Jase Thew (bazerka beardz.net) MFC after: 3 days
# 201145	28-Dec-2009	antoine	(S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages. Note: this does not affect generated binaries as this argument is not used. PR: 137213 Submitted by: Eygene Ryabinkin (initial version) MFC after: 1 month
# 200473	13-Dec-2009	bz	Throughout the network stack we have a few places of if (jailed(cred)) left. If you are running with a vnet (virtual network stack) those will return true and defer you to classic IP-jails handling and thus things will be "denied" or returned with an error. Work around this problem by introducing another "jailed()" function, jailed_without_vnet(), that also takes vnets into account, and permits the calls, should the jail from the given cred have its own virtual network stack. We cannot change the classic jailed() call to do that, as it is used outside the network stack as well. Discussed with: julian, zec, jamie, rwatson (back in Sept) MFC after: 5 days
# 199231	12-Nov-2009	delphij	Revert revision 199201 for now as it has introduced a kernel vulnerability and requires more polishing.
# 199201	11-Nov-2009	delphij	Add interface description capability as inspired by OpenBSD. MFC after: 3 months
# 196970	08-Sep-2009	phk	Revert previous commit and add myself to the list of people who should know better than to commit with a cat in the area.
# 196969	08-Sep-2009	phk	Add necessary include.
# 196835	04-Sep-2009	jamie	Allow a jail's name to be the same as its jid (which is the default if no name is specified), but still disallow other numeric names. Reviewed by: zec Approved by: bz (mentor) MFC after: 3 days
# 196592	27-Aug-2009	jamie	Fix a LOR between allprison_lock and vnode locks by releasing allprison_lock before releasing a prison's root vnode. PR: kern/138004 Reviewed by: kib Approved by: bz (mentor) MFC after: 3 days
# 196505	24-Aug-2009	zec	When "jail -c vnet" request fails, the current code actually creates and leaves behind an orphaned vnet. This change ensures that such vnets get released. This change affects only options VIMAGE builds. Submitted by: jamie Discussed with: bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days
# 196176	13-Aug-2009	bz	Make it possible to change the vnet sysctl variables on jails with their own virtual network stack. Jails only inheriting a network stack cannot change anything that cannot be changed from within a prison. Reviewed by: rwatson, zec Approved by: re (kib)
# 196135	12-Aug-2009	bz	Make the kernel compile without IP networking by moving a variable under a proper #ifdef. Approved by: re (rwatson)
# 196019	01-Aug-2009	rwatson	Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h, we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments updated to reflect these changes. Reviewed by: bz Approved by: re (vimage blanket)
# 196002	31-Jul-2009	jamie	Make the "enforce_statfs" default 2 (most restrictive) in jail_set(2), instead of whatever the parent/system has (which is generally 0). This mirrors the old-style default used for jail(2) in conjunction with the security.jail.enforce_statfs sysctl. Approved by: re (kib), bz (mentor)
# 195974	30-Jul-2009	jamie	Remove a LOR, where the the sleepable allprison_lock was being obtained in prison_equal_ip4/6 while an inp mutex was held. Locking allprison_lock can be avoided by making a restriction on the IP addresses associated with jails: Don't allow the "ip4" and "ip6" parameters to be changed after a jail is created. Setting the "ip4.addr" and "ip6.addr" parameters is allowed, but only if the jail was already created with either ip4/6=new or ip4/6=disable. With this restriction, the prison flags in question (PR_IP4_USER and PR_IP6_USER) become read-only and can be checked without locking. This also allows the simplification of a messy code path that was needed to handle an existing prison gaining an IP address list. PR: kern/136899 Reported by: Dirk Meyer Approved by: re (kib), bz (mentor)
# 195945	29-Jul-2009	jamie	Don't allow mixing the "vnet" and "ip4/6" jail parameters, since vnet jails have their own IP stack and don't have access to the parent IP addresses anyway. Note that a virtual network stack forms a break between prisons with regard to the list of allowed IP addresses. Approved by: re (kib), bz (mentor)
# 195944	29-Jul-2009	jamie	Change the default value of the "ip4" and "ip6" jail parameters to "disable", which only allows access to the parent/physical system's IP addresses when specifically directed. Change the default value of "host" to "new", and don't copy the parent host values, to insulate jails from the parent hostname et al. Approved by: re (kib), bz (mentor)
# 195870	25-Jul-2009	jamie	Some jail parameters (in particular, "ip4" and "ip6" for IP address restrictions) were found to be inadequately described by a boolean. Define a new parameter type with three values (disable, new, inherit) to handle these and future cases. Approved by: re (kib), bz (mentor) Discussed with: rwatson
# 195741	17-Jul-2009	jamie	Remove the interim vimage containers, struct vimage and struct procg, and the ioctl-based interface that supported them. Approved by: re (kib), bz (mentor)
# 194923	24-Jun-2009	jamie	Wrap a PR_VNET inside "#ifdef VIMAGE" since that the only place it applies. bz wants the blame for this. Noticed by: rwatson Approved by: bz (mentor)
# 194915	24-Jun-2009	jamie	In case of prisons with their own network stack, permit additional privileges as well as not restricting the type of sockets a user can open. Note: the VIMAGE/vnet fetaure of of jails is still considered experimental and cannot guarantee that privileged users can be kept imprisoned if enabled. Reviewed by: rwatson Approved by: bz (mentor)
# 194762	23-Jun-2009	jamie	Add a limit for child jails via the "children.cur" and "children.max" parameters. This replaces the simple "allow.jails" permission. Approved by: bz (mentor)
# 194251	15-Jun-2009	jamie	Manage vnets via the jail system. If a jail is given the boolean parameter "vnet" when it is created, a new vnet instance will be created along with the jail. Networks interfaces can be moved between prisons with an ioctl similar to the one that moves them between vimages. For now vnets will co-exist under both jails and vimages, but soon struct vimage will be going away. Reviewed by: zec, julian Approved by: bz (mentor)
# 194118	13-Jun-2009	jamie	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)
# 194090	12-Jun-2009	jamie	Add counterparts to getcredhostname: getcreddomainname, getcredhostuuid, getcredhostid Suggested by: rmacklem Approved by: bz
# 193865	09-Jun-2009	jamie	Fix some overflow errors: a signed allocation and an insufficiant array size. Reported by: pho Tested by: pho Approved by: bz (mentor)
# 193511	05-Jun-2009	rwatson	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
# 193066	29-May-2009	jamie	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)
# 192895	27-May-2009	jamie	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
# 192644	23-May-2009	jamie	Delay an error message until the variable it uses gets initialized. Found with: Coverity Prevent(tm) CID: 4316 Reported by: trasz Approved by: bz (mentor)
# 191915	08-May-2009	zec	Introduce a new virtualization container, provisionally named vprocg, to hold virtualized instances of hostname and domainname, as well as a new top-level virtualization struct vimage, which holds pointers to struct vnet and struct vprocg. Struct vprocg is likely to become replaced in the near future with a new jail management API import. As a consequence of this change, change struct ucred to point to a struct vimage, instead of directly pointing to a vnet. Merge vnet / vimage / ucred refcounting infrastructure from p4 / vimage branch. Permit kldload / kldunload operations to be executed only from the default vimage context. This change should have no functional impact on nooptions VIMAGE kernel builds. Reviewed by: bz Approved by: julian (mentor)
# 191896	07-May-2009	jamie	Move the per-prison Linux MIB from a private one-off pointer to the new OSD-based jail extensions. This allows the Linux MIB to accessed via jail_set and jail_get, and serves as a demonstration of adding jail support to a module. Reviewed by: dchagin, kib Approved by: bz (mentor)
# 191673	29-Apr-2009	jamie	Introduce the extensible jail framework, using the same "name=value" interface as nmount(2). Three new system calls are added: * jail_set, to create jails and change the parameters of existing jails. This replaces jail(2). * jail_get, to read the parameters of existing jails. This replaces the security.jail.list sysctl. * jail_remove to kill off a jail's processes and remove the jail. Most jail parameters may now be changed after creation, and jails may be set to exist without any attached processes. The current jail(2) system call still exists, though it is now a stub to jail_set(2). Approved by: bz (mentor)
# 191671	29-Apr-2009	jamie	Some non-functional changes: whitespace, KASSERT strings, declaration order. Approved by: bz (mentor)
# 190466	27-Mar-2009	jamie	Whitespace/spelling fixes in advance of upcoming functional changes. Approved by: bz (mentor)
# 188146	05-Feb-2009	jamie	Don't allow creating a socket with a protocol family that the current jail doesn't support. This involves a new function prison_check_af, like prison_check_ip[46] but that checks only the family. With this change, most of the errors generated by jailed sockets shouldn't ever occur, at least until jails are changeable. Approved by: bz (mentor)
# 188144	05-Feb-2009	jamie	Standardize the various prison_foo_ip[46] functions and prison_if to return zero on success and an error code otherwise. The possible errors are EADDRNOTAVAIL if an address being checked for doesn't match the prison, and EAFNOSUPPORT if the prison doesn't have any addresses in that address family. For most callers of these functions, use the returned error code instead of e.g. a hard-coded EADDRNOTAVAIL or EINVAL. Always include a jailed() check in these functions, where a non-jailed cred always returns success (and makes no changes). Remove the explicit jailed() checks that preceded many of the function calls. Approved by: bz (mentor)
# 187864	28-Jan-2009	ed	Mark most often used sysctl's as MPSAFE. After running a `make buildkernel', I noticed most of the Giant locks in sysctl are only caused by a very small amount of sysctl's: - sysctl.name2oid. This one is locked by SYSCTL_LOCK, just like sysctl.oidfmt. - kern.ident, kern.osrelease, kern.version, etc. These are just constant strings. - kern.arandom, used by the stack protector. It is already protected by arc4_mtx. I also saw the following sysctl's show up. Not as often as the ones above, but still quite often: - security.jail.jailed. Also mark security.jail.list as MPSAFE. They don't need locking or already use allprison_lock. - kern.devname, used by devname(3), ttyname(3), etc. This seems to reduce Giant locking inside sysctl by ~75% in my primitive test setup.
# 187684	25-Jan-2009	bz	For consistency with prison_{local,remote,check}_ipN rename prison_getipN to prison_get_ipN. Submitted by: jamie (as part of a larger patch) MFC after: 1 week
# 186736	04-Jan-2009	bz	Back out r186615; the sanitizing of the pointers in the error case is not needed and seems that it will not be needed either. Pointy hat: mine, mine, mine and not pho's
# 186615	30-Dec-2008	pho	Added missing second part of cleaning j->ip[46] as requested by bz Approved by: kib (mentor) Pointy hat: pho
# 186606	30-Dec-2008	pho	Make sure that unused j->ip[46] are cleared Reviewed by: bz Approved by: kib (mentor)
# 185899	10-Dec-2008	bz	Correctly check the number of prison states to not access anything outside the prison_states array. When checking if there is a name configured for the prison, check the first character to not be '\0' instead of checking if the char array is present, which it always is. Note, that this is different for the *jailname in the syscall. Found with: Coverity Prevent(tm) CID: 4156, 4155 MFC after: 4 weeks (just that I get the mail)
# 185441	29-Nov-2008	bz	Unbreak the no-networks (no INET/6) build that I broke with the commit in r185435. Pointyhat: no, but I could need a ski cap for the winter
# 185435	29-Nov-2008	bz	MFp4: Bring in updated jail support from bz_jail branch. This enhances the current jail implementation to permit multiple addresses per jail. In addtion to IPv4, IPv6 is supported as well. Due to updated checks it is even possible to have jails without an IP address at all, which basically gives one a chroot with restricted process view, no networking,.. SCTP support was updated and supports IPv6 in jails as well. Cpuset support permits jails to be bound to specific processor sets after creation. Jails can have an unrestricted (no duplicate protection, etc.) name in addition to the hostname. The jail name cannot be changed from within a jail and is considered to be used for management purposes or as audit-token in the future. DDB 'show jails' command was added to aid debugging. Proper compat support permits 32bit jail binaries to be used on 64bit systems to manage jails. Also backward compatibility was preserved where possible: for jail v1 syscalls, as well as with user space management utilities. Both jail as well as prison version were updated for the new features. A gap was intentionally left as the intermediate versions had been used by various patches floating around the last years. Bump __FreeBSD_version for the afore mentioned and in kernel changes. Special thanks to: - Pawel Jakub Dawidek (pjd) for his multi-IPv4 patches and Olivier Houchard (cognet) for initial single-IPv6 patches. - Jeff Roberson (jeff) and Randall Stewart (rrs) for their help, ideas and review on cpuset and SCTP support. - Robert Watson (rwatson) for lots and lots of help, discussions, suggestions and review of most of the patch at various stages. - John Baldwin (jhb) for his help. - Simon L. Nielsen (simon) as early adopter testing changes on cluster machines as well as all the testers and people who provided feedback the last months on freebsd-jail and other channels. - My employer, CK Software GmbH, for the support so I could work on this. Reviewed by: (see above) MFC after: 3 months (this is just so that I get the mail) X-MFC Before: 7.2-RELEASE if possible
# 185404	28-Nov-2008	bz	With the permissions of phk@ change the license on kern_jail.c to a 2 clause BSD license.
# 185029	17-Nov-2008	pjd	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris
# 184205	23-Oct-2008	des	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
# 183550	02-Oct-2008	zec	Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit Introduce INIT_VNET_() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs. Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT(). Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.). All the changes are verified to have zero functional impact at this point in time by doing MD5 comparision between pre- and post-change object files(). () netipsec/keysock.c did not validate depending on compile time options. Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
# 181803	17-Aug-2008	bz	Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@). This is the first in a series of commits over the course of the next few weeks. Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only. We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again. Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
# 180357	07-Jul-2008	bz	MFp4 144659: Plug a memory leak with jail services. PR: 125257 Submitted by: Mateusz Guzik <mjguzik gmail.com> MFC after: 6 days
# 180291	05-Jul-2008	rwatson	Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates. Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted. MFC after: 3 weeks
# 179881	19-Jun-2008	delphij	Revert rev. 178124 as requested by kris@. Having jail id not being reused too frequently is useful for script controlled environment.
# 178124	11-Apr-2008	delphij	Instead of rolling our own jail number allocation procedure, use alloc_unr() to do it. Submitted by: Ed Schouten <ed 80386 nl> PR: kern/122270 MFC after: 1 month
# 177785	31-Mar-2008	kib	Add the support for the AT_FDCWD and fd-relative name lookups to the namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
# 175630	24-Jan-2008	bz	Replace the last susers calls in netinet6/ with privilege checks. Introduce a new privilege allowing to set certain IP header options (hop-by-hop, routing headers). Leave a few comments to be addressed later. Reviewed by: rwatson (older version, before addressing his comments)
# 175294	13-Jan-2008	attilio	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# 175202	09-Jan-2008	attilio	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 172930	24-Oct-2007	rwatson	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
# 172860	21-Oct-2007	rwatson	Add PRIV_VFS_STAT privilege, which will allow overriding policy limits on the right to stat() a file, such as in mac_bsdextended. Obtained from: TrustedBSD Project MFC after: 3 months
# 168699	13-Apr-2007	pjd	Fix jails and jail-friendly file systems handling: - We need to allow for PRIV_VFS_MOUNT_OWNER inside a jail. - Move security checks to vfs_suser() and deny unmounting and updating for jailed root from different jails, etc. OK'ed by: rwatson
# 168591	10-Apr-2007	rwatson	Allow PRIV_NETINET_REUSEPORT in jail.
# 168489	08-Apr-2007	pjd	prison_free() can be called with a mutex held. This wasn't a problem until I converted allprison_mtx mutex to allprison_lock sx lock. To fix this LOR, move prison removal to prison_complete() entirely. To ensure that noone will reference this prison before it's beeing removed from the list skip prisons with 'pr_ref == 0' in prison_find() and assert that pr_ref has to greater than 0 in prison_hold(). Reported by: kris OK'ed by: rwatson
# 168487	08-Apr-2007	pjd	Only use prison mutex to protect the fields that need to be protected by it.
# 168483	08-Apr-2007	pjd	pr_list is protected by the allprison_lock.
# 168401	05-Apr-2007	pjd	Implement functionality I called 'jail services'. It may be used for external modules to attach some data to jail's in-kernel structure. - Change allprison_mtx mutex to allprison_sx sx(9) lock. We will need to call external functions while holding this lock, which may want to allocate memory. Make use of the fact that this is shared-exclusive lock and use shared version when possible. - Implement the following functions: prison_service_register() - registers a service that wants to be noticed when a jail is created and destroyed prison_service_deregister() - deregisters service prison_service_data_add() - adds service-specific data to the jail structure prison_service_data_get() - takes service-specific data from the jail structure prison_service_data_del() - removes service-specific data from the jail structure Reviewed by: rwatson
# 168399	05-Apr-2007	pjd	Make prison_find() globally accessible.
# 168396	05-Apr-2007	pjd	Add security.jail.mount_allowed sysctl, which allows to mount and unmount jail-friendly file systems from within a jail. Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user. It is turned off by default. A jail-friendly file system is a file system which driver registers itself with VFCF_JAIL flag via VFS_SET(9) API. The lsvfs(1) command can be used to see which file systems are jail-friendly ones. There currently no jail-friendly file systems, ZFS will be the first one. In the future we may consider marking file systems like nullfs as jail-friendly. Reviewed by: rwatson
# 167354	09-Mar-2007	pjd	Minor simplification.
# 167309	07-Mar-2007	pjd	White space nits.
# 167211	04-Mar-2007	rwatson	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
# 167152	01-Mar-2007	pjd	Rename PRIV_VFS_CLEARSUGID to PRIV_VFS_RETAINSUGID, which seems to better describe the privilege. OK'ed by: rwatson
# 166838	19-Feb-2007	rwatson	Remove unused PRIV_IPC_EXEC. Renumbers System V IPC privilege.
# 166832	19-Feb-2007	rwatson	Rename three quota privileges from the UFS privilege namespace to the VFS privilege namespace: exceedquota, getquota, and setquota. Leave UFS-specific quota configuration privileges in the UFS name space. This renumbers VFS and UFS privileges, so requires rebuilding modules if you are using security policies aware of privilege identifiers. This is likely no one at this point since none of the committed MAC policies use the privilege checks.
# 166831	19-Feb-2007	rwatson	Limit quota privileges in jail to PRIV_UFS_GETQUOTA and PRIV_UFS_SETQUOTA.
# 166827	19-Feb-2007	rwatson	For now, reflect practical reality that Audit system calls aren't allowed in Jail: return a privilege error.
# 164032	06-Nov-2006	rwatson	Add a new priv(9) kernel interface for checking the availability of privilege for threads and credentials. Unlike the existing suser(9) interface, priv(9) exposes a named privilege identifier to the privilege checking code, allowing more complex policies regarding the granting of privilege to be expressed. Two interfaces are provided, replacing the existing suser(9) interface: suser(td) -> priv_check(td, priv) suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags) A comprehensive list of currently available kernel privileges may be found in priv.h. New privileges are easily added as required, but the comments on adding privileges found in priv.h and priv(9) should be read before doing so. The new privilege interface exposed sufficient information to the privilege checking routine that it will now be possible for jail to determine whether a particular privilege is granted in the check routine, rather than relying on hints from the calling context via the SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail check function, prison_priv_check(), is exposed from kern_jail.c and used by the privilege check routine to determine if the privilege is permitted in jail. As a result, a centralized list of privileges permitted in jail is now present in kern_jail.c. The MAC Framework is now also able to instrument privilege checks, both to deny privileges otherwise granted (mac_priv_check()), and to grant privileges otherwise denied (mac_priv_grant()), permitting MAC Policy modules to implement privilege models, as well as control a much broader range of system behavior in order to constrain processes running with root privilege. The suser() and suser_cred() functions remain implemented, now in terms of priv_check() and the PRIV_ROOT privilege, for use during the transition and possibly continuing use by third party kernel modules that have not been updated. The PRIV_DRIVER privilege exists to allow device drivers to check privilege without adopting a more specific privilege identifier. This change does not modify the actual security policy, rather, it modifies the interface for privilege checks so changes to the security policy become more feasible. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
# 163606	22-Oct-2006	rwatson	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
# 162383	17-Sep-2006	rwatson	Declare security and security.bsd sysctl hierarchies in sysctl.h along with other commonly used sysctl name spaces, rather than declaring them all over the place. MFC after: 1 month Sponsored by: nCircle Network Security, Inc.
# 150652	27-Sep-2005	csjp	Push Giant down in jails. Pass the MPSAFE flag to NDINIT, and keep track of whether or not Giant was picked up by the filesystem. Add VFS_LOCK_GIANT macros around vrele as it's possible that this can call in the VOP_INACTIVE filesystem specific code. Also while we are here, remove the Giant assertion. from the sysctl handler, we do not actually require Giant here so we shouldn't assert it. Doing so will just complicate things when Giant is removed from the sysctl framework.
# 147559	23-Jun-2005	pjd	Actually only protect mount-point if security.jail.enforce_statfs is set to 2. If we don't return statistics about requested file systems, system tools may not work correctly or at all. Approved by: re (scottl)
# 147185	09-Jun-2005	pjd	Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs and extend its functionality: value policy 0 show all mount-points without any restrictions 1 show only mount-points below jail's chroot and show only part of the mount-point's path (if jail's chroot directory is /jails/foo and mount-point is /jails/foo/usr/home only /usr/home will be shown) 2 show only mount-point where jail's chroot directory is placed. Default value is 2. Discussed with: rwatson
# 144660	05-Apr-2005	jeff	- Use taskqueue_thread rather than taskqueue_swi since our task is going to vrele, which may vop lock. This is not safe in a software interrupt context.
# 144442	31-Mar-2005	jhb	Drop a bogus mp_fixme(). Adding a lock would do nothing to reduce userland races regarding changing of jail-related sysctls.
# 141543	08-Feb-2005	cperciva	Add a new sysctl, "security.jail.chflags_allowed", which controls the behaviour of chflags within a jail. If set to 0 (the default), then a jailed root user is treated as an unprivileged user; if set to 1, then a jailed root user is treated the same as an unjailed root user. This is necessary to allow "make installworld" to work inside a jail, since it attempts to manipulate the system immutable flag on certain files. Discussed with: csjp, rwatson MFC after: 2 weeks
# 139804	06-Jan-2005	imp	/* -> /*- for copyright notices, minor format tweaks as necessary
# 131177	27-Jun-2004	pjd	Add two missing includes and remove two uneeded. This is quite serious fix, because even with MAC framework compiled in, MAC entry points in those two files were simply ignored.
# 129462	20-May-2004	pjd	Fix sysctl name: security.jail.getfsstate_getfsstatroot_only -> security.jail.getfsstatroot_only. Approved by: rwatson
# 128664	26-Apr-2004	bmilekic	Give jail(8) the feature to allow raw sockets from within a jail, which is less restrictive but allows for more flexible jail usage (for those who are willing to make the sacrifice). The default is off, but allowing raw sockets within jails can now be accomplished by tuning security.jail.allow_raw_sockets to 1. Turning this on will allow you to use things like ping(8) or traceroute(8) from within a jail. The patch being committed is not identical to the patch in the PR. The committed version is more friendly to APIs which pjd is working on, so it should integrate into his work quite nicely. This change has also been presented and addressed on the freebsd-hackers mailing list. Submitted by: Christian S.J. Peron <maneo@bsdpro.com> PR: kern/65800
# 127020	15-Mar-2004	pjd	Remove sysctl security.jail.list_allowed. This functionality was a misfeature, sysctl was added and turned off by default just to check if nobody complains. Reviewed by: rwatson
# 126023	19-Feb-2004	nectar	Rework jail_attach(2) so that an already jailed process cannot hop to another jail. Submitted by: rwatson
# 126004	19-Feb-2004	pjd	Added sysctl security.jail.jailed. It returns 1 is process is inside of jail and 0 if it is not. Information if we are in jail or not is not a secret, there is plenty of ways to discover it. Many people are using own hack to check this and this will be a legal way from now on. It will be great if our starting scripts will take advantage of this sysctl to allow clean "boot" inside jail. Approved by: rwatson, scottl (mentor)
# 125806	14-Feb-2004	rwatson	By default, don't allow processes in a jail to list the set of jails in the system. Previous behavior (allowed) may be restored by setting security.jail.list_allowed=1.
# 125805	14-Feb-2004	rwatson	Fix mismerge in last commit: check that cred->cr_prison is NULL before dereferencing the prison pointer.
# 125804	14-Feb-2004	rwatson	By default, when a process in jail calls getfsstat(), only return the data for the file system on which the jail's root vnode is located. Previous behavior (show data for all mountpoints) can be restored by setting security.jail.getfsstatroot_only to 0. Note: this also has the effect of hiding other mounts inside a jail, such as /dev, /tmp, and /proc, but errs on the side of leaking less information.
# 124882	23-Jan-2004	rwatson	Defer the vrele() on a jail's root vnode reference from prison_free() to a new prison_complete() task run by a task queue. This removes a requirement for grabbing Giant in crfree(). Embed the 'struct task' in 'struct prison' so that we don't have to allocate memory from prison_free() (which means we also defer the FREE()). With this change, I believe grabbing Giant from crfree() can now be removed, but need to check the uidinfo code paths. To avoid header pollution, move the definition of 'struct task' to _task.h, and recursively include from taskqueue.h and jail.h; much preferably to all files including jail.h picking up a requirement to include taskqueue.h. Bumped into by: sam Reviewed by: bde, tjr
# 116182	10-Jun-2003	obrien	Use __FBSDID().
# 114168	28-Apr-2003	mike	style(9)
# 113630	17-Apr-2003	jhb	- The prison mutex cannot possibly protect pointers to the prison it protects, so don't bother locking it while we assign it to a ucred's cr_prison. - Fully construct the new credential for a process before assigning it to p_ucred.
# 113275	09-Apr-2003	mike	o In struct prison, add an allprison linked list of prisons (protected by allprison_mtx), a unique prison/jail identifier field, two path fields (pr_path for reporting and pr_root vnode instance) to store the chroot() point of each jail. o Add jail_attach(2) to allow a process to bind to an existing jail. o Add change_root() to perform the chroot operation on a specified vnode. o Generalize change_dir() to accept a vnode, and move namei() calls to callers of change_dir(). o Add a new sysctl (security.jail.list) which is a group of struct xprison instances that represent a snapshot of active jails. Reviewed by: rwatson, tjr
# 111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
# 109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# 108128	20-Dec-2002	mux	Don't forget to destroy the mutex if an error occurs in the jail() system call. Submitted by: Pawel Jakub Dawidek <nick@garage.freebsd.pl>
# 107850	14-Dec-2002	alfred	remove syscallarg(). Suggested by: peter
# 105354	17-Oct-2002	robert	Use strlcpy() instead of strncpy() to copy NUL terminated strings for safety and consistency.
# 99227	01-Jul-2002	iedowse	The jail syscall calls chroot, which is not mpsafe, so put back a mtx_lock(&Giant) around that call. Reviewed by: arr
# 98832	25-Jun-2002	arr	- Alleviate jail() from having the burden of acquiring Giant by simply removing. We can do this since we no longer need Giant to safely execute jail(). Reviewed by: rwatson, jhb
# 93818	04-Apr-2002	jhb	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 93593	01-Apr-2002	jhb	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
# 91391	27-Feb-2002	robert	Make getcredhostname() take a buffer and the buffer's size as arguments. The correct hostname is copied into the buffer while having the prison's lock acquired in a jailed process' case. Reviewed by: jhb, rwatson
# 91384	27-Feb-2002	robert	Add a function which returns the correct hostname for a given credential. Reviewed by: phk
# 89414	16-Jan-2002	arr	- Attempt to help declutter kern. sysctl by moving security out from beneath it. Reviewed by: rwatson
# 87716	12-Dec-2001	arr	- Move _jail sysctl node underneath _kern_security in order to standardize where our security related sysctl tuneables are located. Also, this will help if/when we move _security node out from under _kern as to help make _kern less cluttered. Approved by: rwatson Review by: rwatson
# 87275	03-Dec-2001	rwatson	o Introduce pr_mtx into struct prison, providing protection for the mutable contents of struct prison (hostname, securelevel, refcount, pr_linux, ...) o Generally introduce mtx_lock()/mtx_unlock() calls throughout kern/ so as to enforce these protections, in particular, in kern_mib.c protection sysctl access to the hostname and securelevel, as well as kern_prot.c access to the securelevel for access control purposes. o Rewrite linux emulator abstractions for accessing per-jail linux mib entries (osname, osrelease, osversion) so that they don't return a pointer to the text in the struct linux_prison, rather, a copy to an array passed into the calls. Likewise, update linprocfs to use these primitives. o Update in_pcb.c to always use prison_getip() rather than directly accessing struct prison. Reviewed by: jhb
# 85844	01-Nov-2001	rwatson	o Move suser() calls in kern/ to using suser_xxx() with an explicit credential selection, rather than reference via a thread or process pointer. This is part of a gradual migration to suser() accepting a struct ucred instead of a struct proc, simplifying the reference and locking semantics of suser(). Obtained from: TrustedBSD Project
# 84828	11-Oct-2001	jhb	- Catch up to the new ucred API. - Add proc locking to the jail() syscall. This mostly involved shuffling a few things around so that blockable things like malloc and copyin were performed before acquiring the lock and checking the existing ucred and then updating the ucred as one "atomic" change under the proc lock.
# 83989	26-Sep-2001	rwatson	o Initialize per-jail securelevel from global securelevel as part of jail creation. Obtained from: TrustedBSD Project
# 83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 82710	01-Sep-2001	dillon	Pushdown Giant for acct(), kqueue(), kevent(), execve(), fork(), vfork(), rfork(), jail().
# 81114	03-Aug-2001	rwatson	Anton kindly pointed out (and fixed) a bug in the Jail handling of the bind() call on IPv4 sockets: Currently, if one tries to bind a socket using INADDR_LOOPBACK inside a jail, it will fail because prison_ip() does not take this possibility into account. On the other hand, when one tries to connect(), for example, to localhost, prison_remote_ip() will silently convert INADDR_LOOPBACK to the jail's IP address. Therefore, it is desirable to make bind() to do this implicit conversion as well. Apart from this, the patch also replaces 0x7f000001 in prison_remote_ip() to a more correct INADDR_LOOPBACK. This is a 4.4-RELEASE "during the freeze, thanks" MFC candidate. Submitted by: Anton Berezin <tobez@FreeBSD.org> Discussed with at some point: phk MFC after: 3 days
# 72786	21-Feb-2001	rwatson	o Move per-process jail pointer (p->pr_prison) to inside of the subject credential structure, ucred (cr->cr_prison). o Allow jail inheritence to be a function of credential inheritence. o Abstract prison structure reference counting behind pr_hold() and pr_free(), invoked by the similarly named credential reference management functions, removing this code from per-ABI fork/exit code. o Modify various jail() functions to use struct ucred arguments instead of struct proc arguments. o Introduce jailed() function to determine if a credential is jailed, rather than directly checking pointers all over the place. o Convert PRISON_CHECK() macro to prison_check() function. o Move jail() function prototypes to jail.h. o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the flag in the process flags field itself. o Eliminate that "const" qualifier from suser/p_can/etc to reflect mutex use. Notes: o Some further cleanup of the linux/jail code is still required. o It's now possible to consider resolving some of the process vs credential based permission checking confusion in the socket code. o Mutex protection of struct prison is still not present, and is required to protect the reference count plus some fields in the structure. Reviewed by: freebsd-arch Obtained from: TrustedBSD Project
# 69781	08-Dec-2000	dwmalone	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
# 68024	30-Oct-2000	rwatson	o Deny access to System V IPC from within jail by default, as in the current implementation, jail neither virtualizes the Sys V IPC namespace, nor provides inter-jail protections on IPC objects. o Support for System V IPC can be enabled by setting jail.sysvipc_allowed=1 using sysctl. o This is not the "real fix" which involves virtualizing the System V IPC namespace, but prevents processes within jail from influencing those outside of jail when not approved by the administrator. Reported by: Paulo Fragoso <paulo@nlink.com.br>
# 61235	04-Jun-2000	rwatson	o Modify jail to limit creation of sockets to UNIX domain sockets, TCP/IP (v4) sockets, and routing sockets. Previously, interaction with IPv6 was not well-defined, and might be inappropriate for some environments. Similarly, sysctl MIB entries providing interface information also give out only addresses from those protocol domains. For the time being, this functionality is enabled by default, and toggleable using the sysctl variable jail.socket_unixiproute_only. In the future, protocol domains will be able to determine whether or not they are ``jail aware''. o Further limitations on process use of getpriority() and setpriority() by jailed processes. Addresses problem described in kern/17878. Reviewed by: phk, jmg
# 57163	12-Feb-2000	rwatson	Yet-another-update: rename ``kern.prison'' to a new sysctl root entry, ``jail'', and move the set_hostname_allowed sysctl there, as well as fixing a bug in the sysctl that resulted in jails being over-limited (preventing them from reading as well as writing the hostname). Also, correct some formatting issues, courtesy bde :-). Reviewed by: phk Approved by: jkh
# 51398	19-Sep-1999	phk	Add a version number field to the jail(2) argument so that future changes can be handled intelligently.
# 50477	27-Aug-1999	peter	$Id$ -> $FreeBSD$
# 46197	30-Apr-1999	phk	Add beer-ware license and $Id$ Noticed by: dillon
# 46194	30-Apr-1999	phk	Make BOOTP to work again. Submitted by: dillon Reviewed by: phk
# 46155	28-Apr-1999	phk	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/