History log of /freebsd-current/sys/kern/vfs_extattr.c
Revision Date Author Comments
# fdafd315 24-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Automated cleanup of cdefs and other formatting

Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.

Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/

Sponsored by: Netflix


# 3555be01 08-Sep-2023 John Baldwin <jhb@FreeBSD.org>

Move kern_extattr_* prototypes to <sys/syscallsubr.h>

All of the kern_* prototypes belong in this header. While here, sort
the prototypes by function name.

Reviewed by: dchagin
Fixes: 6453d4240f6b vfs: Export exattr methods to reuse by Linuxulator
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D41766


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 6453d424 22-Jul-2023 Dmitry Chagin <dchagin@FreeBSD.org>

vfs: Export exattr methods to reuse by Linuxulator

Reviewed by:
Differential revision: https://reviews.freebsd.org/D35543
MFC after: 1 month


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix


# 85dac03e 17-Nov-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: stop using NDFREE

It provides nothing but a branchfest and next to no consumers want it
anyway.

Tested by: pho


# a75d1ddd 17-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: introduce V_PCATCH to stop abusing PCATCH


# bb92cd7b 24-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)


# 7e1d3eef 25-Nov-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the unused thread argument from NDINIT*

See b4a58fbf640409a1 ("vfs: remove cn_thread")

Bump __FreeBSD_version to 1400043.


# 98dae405 10-Oct-2021 Greg V <greg@unrelenting.technology>

O_PATH: allow vfs_extattr syscalls

These calls do operate on vnodes only, not file contents.
This is useful for e.g. the xdg-document-portal fuse filesystem.

Reviewed by: kib, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D32438


# b21ae0ff 13-May-2020 Conrad Meyer <cem@FreeBSD.org>

vfs_extattr: Allow extattr names up to the full max

Extattr names are allowed to be 255 bytes -- not 254 bytes plus trailing
NUL. Provide a 256 buffer so that copyinstr() has room for the trailing
NUL.

Re-enable test for maximal name lengths.

PR: 208965
Reported by: asomers
Reviewed by: asomers
Differential Revision: https://reviews.freebsd.org/D24584


# e126c5a3 14-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use new capsicum helpers


# 3ff65f71 30-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

Remove duplicated empty lines from kern/*.c

No functional changes.


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# e682df53 04-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

extattr_list_vp: Narrow locked section somewhat

Suggested by: mjg
Reviewed by: kib, mjg
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19083


# c3eb848c 04-Feb-2019 Conrad Meyer <cem@FreeBSD.org>

extattr_list_vp: Only take shared vnode lock

List is a 'read'-type operation that does not modify shared state; it's safe
for multiple thread to proceed concurrently. This is reflected in the vnode
operation LISTEXTATTR locking protocol specification, which only requires a
shared lock.

(Similar to previous r248933.)

Reported by: Case van Rij <case.vanrij AT isilon.com>
Reviewed by: kib, mjg
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D19082


# 0a2c60c3 05-Feb-2018 Brooks Davis <brooks@FreeBSD.org>

Reduce duplication in extattr_*_(file|link) syscalls.

Reviewed by: rwatson
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14173


# 02bc058f 05-Feb-2018 Brooks Davis <brooks@FreeBSD.org>

ANSIfy syscall implementations.

Reviewed by: rwatson
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D14172


# 4ba058c0 13-Dec-2017 Fedor Uporov <fsu@FreeBSD.org>

Fix kernel build if MAC is not defined.

Reported by: Ravi Pokala, Andrew Turner
Approved by: pfg (mentor)
MFC after: 1 week


# 61b214f3 12-Dec-2017 Fedor Uporov <fsu@FreeBSD.org>

Move buffer size checks outside of the vnode locks.

Reviewed by: kib, cem, pfg (mentor)
Approved by: pfg (mentor)
MFC after: 1 weeks

Differential Revision: https://reviews.freebsd.org/D13405


# 8a36da99 27-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.


# 4da8456f 16-Jun-2015 Mateusz Guzik <mjg@FreeBSD.org>

Replace struct filedesc argument in getvnode with struct thread

This is is a step towards removal of spurious arguments.


# 4a144410 16-Mar-2014 Robert Watson <rwatson@FreeBSD.org>

Update kernel inclusions of capability.h to use capsicum.h instead; some
further refinement is required as some device drivers intended to be
portable over FreeBSD versions rely on __FreeBSD_version to decide whether
to include capability.h.

MFC after: 3 weeks


# 7008be5b 04-Sep-2013 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define new right the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belongs to the same array's element this way is
correct, but is not advised. It should only be used for aliases definition.

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


# 926cd204 30-Mar-2013 Matthew D Fleming <mdf@FreeBSD.org>

Use a shared lock for VOP_GETEXTATTR, as it is a read-like operation.

MFC after: 1 week


# 5050aa86 22-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# 526d0bd5 20-Feb-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from
the usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


# 8451d0dd 16-Sep-2011 Kip Macy <kmacy@FreeBSD.org>

In order to maximize the re-usability of kernel code in user space this
patch modifies makesyscalls.sh to prefix all of the non-compatibility
calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel
entry points and all places in the code that use them. It also
fixes an additional name space collision between the kernel function
psignal and the libc function of the same name by renaming the kernel
psignal kern_psignal(). By introducing this change now we will ease future
MFCs that change syscalls.

Reviewed by: rwatson
Approved by: re (bz)


# a9d2f8d8 10-Aug-2011 Robert Watson <rwatson@FreeBSD.org>

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 1a996ed1 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

Revert r210225 - turns out I was wrong; the "/*-" is not license-only
thing; it's also used to indicate that the comment should not be automatically
rewrapped.

Explained by: cperciva@


# 805cc58a 18-Jul-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

The "/*-" comment marker is supposed to denote copyrights. Remove non-copyright
occurences from sys/sys/ and sys/kern/.


# 14961ba7 27-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Replace AUDIT_ARG() with variable argument macros with a set more more
specific macros for each audit argument type. This makes it easier to
follow call-graphs, especially for automated analysis tools (such as
fxr).

In MFC, we should leave the existing AUDIT_ARG() macros as they may be
used by third-party kernel modules.

Suggested by: brooks
Approved by: re (kib)
Obtained from: TrustedBSD Project
MFC after: 1 week


# bcf11e8d 05-Jun-2009 Robert Watson <rwatson@FreeBSD.org>

Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with: pjd


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


# 885868cd 10-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


# fefd0ac8 07-Mar-2009 Robert Watson <rwatson@FreeBSD.org>

Remove 'uio' argument from MAC Framework and MAC policy entry points for
extended attribute get/set; in the case of get an uninitialized user
buffer was passed before the EA was retrieved, making it of relatively
little use; the latter was simply unused by any policies.

Obtained from: TrustedBSD Project
Sponsored by: Google, Inc.


# d19b9927 07-Jan-2009 Konstantin Belousov <kib@FreeBSD.org>

Do not call namei() while having another user-controlled vnode
locked. Lookup could attempt to recursively lock that vnode.

Do not call vn_start_write(V_WAIT) while vnode is locked, this may
result in a deadlock with suspension.

vfs_busy() the mountpoint before dropping vnode lock for vnode
that was used to look up the mountpoint, to prevent unmount in
between.

Reported and tested by: pho
Reviewed by: rwatson
MFC after: 3 weeks


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 57b4252e 30-Mar-2008 Konstantin Belousov <kib@FreeBSD.org>

Add the support for the AT_FDCWD and fd-relative name lookups to the
namei(9).

Based on the submission by rdivacky,
sponsored by Google Summer of Code 2007
Reviewed by: rwatson, rdivacky
Tested by: pho


# 22db15c0 13-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# cb05b60a 09-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

vn_lock() is currently only used with the 'curthread' passed as argument.
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.

Manpage and FreeBSD_version will be updated through further commits.

As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.

Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>


# 30d239bc 24-Oct-2007 Robert Watson <rwatson@FreeBSD.org>

Merge first in a series of TrustedBSD MAC Framework KPI changes
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:

mac_<object>_<method/action>
mac_<object>_check_<method/action>

The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.

All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.

Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer


# afae6388 22-Dec-2006 Robert Watson <rwatson@FreeBSD.org>

Update comments to reflect changes in the extattrctl() code.

Clean up comment formatting.

Obtained from: TrustedBSD Project


# 168d0553 22-Dec-2006 Robert Watson <rwatson@FreeBSD.org>

Following a repo-copy of vfs_syscalls.c to vfs_extattr.c, remove
non-extattr functions from vfs_extattr.c, and extattr functions from
vfs_syscalls.c.

Change copyright/license on vfs_extattr.c to my copyright/license on
the extended attribute implementation (from extattr.h).

Clean up includes a bit.

Obtained from: TrustedBSD Project


# acd3428b 06-Nov-2006 Robert Watson <rwatson@FreeBSD.org>

Sweep kernel replacing suser(9) calls with priv(9) calls, assigning
specific privilege names to a broad range of privileges. These may
require some future tweaking.

Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>


# 9a969e62 26-Oct-2006 Konstantin Belousov <kib@FreeBSD.org>

The attempt to rename "." with MAC framework compiled in would cause attempt
to twice unlock the vnode. Check that ni_vp and ni_dvp are different before
doing second unlock.

Reviewed by: rwatson
Approved by: pjd (mentor)
MFC after: 1 week


# aed55708 22-Oct-2006 Robert Watson <rwatson@FreeBSD.org>

Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.

This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA


# a1e363f2 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.


# 5da56ddb 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


# 783deec1 20-Sep-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

There is no need to set 'sp' to NULL anymore.


# 4e59868e 19-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Copy stat information from mount structure before it can change identity.


# 5702e096 17-Sep-2006 Robert Watson <rwatson@FreeBSD.org>

Declare security and security.bsd sysctl hierarchies in sysctl.h along
with other commonly used sysctl name spaces, rather than declaring them
all over the place.

MFC after: 1 month
Sponsored by: nCircle Network Security, Inc.


# 9802d04c 02-Aug-2006 John Baldwin <jhb@FreeBSD.org>

Fix some bugs in the previous revision (1.419). Don't perform extra
vfs_rel() on the mountpoint if the MAC checks fail in kern_statfs() and
kern_fstatfs(). Similarly, don't perform an extra vfs_rel() if we get
a doomed vnode in kern_fstatfs(), and handle the case of mp being NULL
(for some doomed vnodes) by conditionalizing the vfs_rel() in
kern_fstatfs() on mp != NULL.

CID: 1517
Found by: Coverity Prevent (tm) (kern_fstatfs())
Pointy hat to: jhb


# ea175645 27-Jul-2006 John Baldwin <jhb@FreeBSD.org>

Hold the reference on the mountpoint slightly longer in kern_statfs() and
kern_fstatfs() so that it is still held when prison_enforce_statfs() is
called (since that function likes to poke and prod at the mountpoint
structure).

MFC after: 3 days


# 2f198e89 19-Jul-2006 John Baldwin <jhb@FreeBSD.org>

Call change_dir() instead of duplicating the code in fchdir().


# be5747d5 11-Jul-2006 John Baldwin <jhb@FreeBSD.org>

- Add conditional VFS Giant locking to getdents_common() (linux ABIs),
ibcs2_getdents(), ibcs2_read(), ogetdirentries(), svr4_sys_getdents(),
and svr4_sys_getdents64() similar to that in getdirentries().
- Mark ibcs2_getdents(), ibcs2_read(), linux_getdents(), linux_getdents64(),
linux_readdir(), ogetdirentries(), svr4_sys_getdents(), and
svr4_sys_getdents64() MPSAFE.


# 65ee602e 06-Jul-2006 Wayne Salamon <wsalamon@FreeBSD.org>

Audit the remaining parameters to the extattr system calls. Generate
the audit records for those calls.

Obtained from: TrustedBSD Project
Approved by: rwatson (mentor)


# 6e79e6f8 05-Jun-2006 Robert Watson <rwatson@FreeBSD.org>

Audit command, uid arguments for quotactl().
Audit the mode argument to mkfifo().
Audit the target path passed to symlink().

Submitted by: wsalamon
Obtained from: TrustedBSD Project


# 3bbd6d8a 30-Mar-2006 Jeff Roberson <jeff@FreeBSD.org>

- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().

Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.


# 861dab08 28-Mar-2006 John Baldwin <jhb@FreeBSD.org>

Change vn_open() to honor the MPSAFE flag in the passed in nameidata object
and use that instead of testing fdidx against -1 to determine if it should
release Giant if Giant was locked due to the requested file residing on a
non-MPSAFE VFS.

Discussed with: jeff


# 77c79550 21-Mar-2006 Jeff Roberson <jeff@FreeBSD.org>

- Remove explicit calls to lock and unlock Giant and replace them with
VFS_LOCK_GIANT/VFS_UNLOCK_GIANT calls. This completely removes Giant
acquisition in the syscall path for ffs.

Bug fix to kern_fhstatfs from: Todd Miller <Todd.Miller@sparta.com>
Sponsored by: Isilon Systems, Inc.


# 6308f39d 03-Mar-2006 Paul Saab <ps@FreeBSD.org>

use strlcpy in cvtstatfs and copy_statfs instead of bcopy to ensure
the copied strings are properly terminated.

bzero the statfs32 struct in copy_statfs.


# 6815739e 03-Mar-2006 Paul Saab <ps@FreeBSD.org>

Don't truncate f_mntfromname & f_mntonname to 16 characters when
translating statfs into ostatfs. This allows 4.x binaries making
statfs calls to work on 6.x.


# 8febcfb9 22-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- Use vfs_ref/rel to protect a mountpoint from going away while VFS_STATFS
is being called. Be sure to grab the ref before we unlock the vnode to
prevent the mount from disappearing.

Tested by: kris


# bc5504b9 22-Feb-2006 Wayne Salamon <wsalamon@FreeBSD.org>

Add pathname and/or vnode argument auditing for the following system calls:
quotactl, statfs, fstatfs, fchdir, chdir, chroot, open, mknod, mkfifo,
link, symlink, undelete, unlink, access, eaccess, stat, lstat, pathconf,
readlink, chflags, lchflags, fchflags, chmod, lchmod, fchmod, chown,
lchown, fchown, utimes, lutimes, futimes, truncate, ftruncate, fsync,
rename, mkdir, rmdir, getdirentries, revoke, lgetfh, getfh, extattrctl,
extattr_set_file, extattr_set_link, extattr_get_file, extattr_get_link,
extattr_delete_file, extattr_delete_link, extattr_list_file, extattr_list_link.

In many cases the pathname and vnode auditing is done within namei lookup
instead of directly in the system call.

Audit the remaining arguments to these system calls:
fstatfs, fchdir, open, mknod, chflags, lchflags, fchflags, chmod, lchmod,
fchmod, chown, lchown, fchown, futimes, ftruncate, fsync, mkdir,
getdirentries.


# c5dcb840 22-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- Revert r1.406 until a solution can be found that doesn't break nfs. The
statfs handler in nfs will lock vnodes which may lead to deadlock or
recursion.

Found by: kris
Pointy hat to: me


# 05b6a20a 21-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- Hold the vnode used in the statfs related functions until we're done with
the VFS_STATFS call to prevent the mount from disappearing while we're
stating.
- Convert these routines to use MPSAFE namei semantics.

MFC After: 1 week


# 809f984b 06-Feb-2006 John Baldwin <jhb@FreeBSD.org>

Add a kern_eaccess() function and use it to implement xenix_eaccess()
rather than kern_access().

Suggested by: rwatson


# 2f0bca55 06-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- Don't check v_mount for NULL to determine if a vnode has been recycled.
Use the more appropriate VI_DOOMED flag instead.

Sponsored by: Isilon Systems, Inc.
MFC After: 1 week


# 59428b0b 03-Feb-2006 Robert Watson <rwatson@FreeBSD.org>

In fchdir(), Giant must be separately acquired and dropped if the old
vnode is from a file system that is not MPSAFE, as vrele() expects
Giant to be held when it is called on a non-MPSAFE vnode.

Spotted by: kris
Tested by: glebius


# 0ac72424 01-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- chroot and chdir need to lock giant as appropriate for the outgoing vp
as well as the new vp.

Sponsored by: Isilon Systems, Inc.
MFC After: 3 days


# 89b0e109 31-Jan-2006 Jeff Roberson <jeff@FreeBSD.org>

- Reorder calls to vrele() after calls to vput() when the vrele is a
directory. vrele() may lock the passed vnode, which in these cases would
give an invalid lock order of child -> parent. These situations are
deadlock prone although do not typically deadlock because the vrele
is typically not releasing the last reference to the vnode. Users of
vrele must consider it as a call to vn_lock() and order it appropriately.

MFC After: 1 week
Sponsored by: Isilon Systems, Inc.
Tested by: kkenn


# 1dd5fc0f 22-Jan-2006 Don Lewis <truckman@FreeBSD.org>

Tweak previous vfs_lookup.c commit to return an EINVAL error from
lookup() instead of EPERM when a DELETE or RENAME operation is
attempted on "..".

In kern_unlink(), remap EINVAL errors returned from namei() to EPERM
to match existing (and POSIX required) behaviour.

Discussed with: bde
MFC after: 3 days


# c3d78136 04-Jan-2006 Diomidis Spinellis <dds@FreeBSD.org>

Fix style bug.

Prompted by: bde


# f8ccc6ce 03-Jan-2006 Diomidis Spinellis <dds@FreeBSD.org>

Replace tv_usec normalization with the return of EINVAL.
This addresses two objections to the previous behavior,
and unbreaks the alpha tinderbox build.

TODO: update the utimes(2) man page.


# 51339e85 03-Jan-2006 Diomidis Spinellis <dds@FreeBSD.org>

Normalize the tv_usec part of the utimes(2) arguments to ensure
that a file's atime and mtime are only set to correct fractional
second values (0-999999000ns with the current interface).
Prior to this change users could create files with values outside
that range. Moreover, on 32-bit machines tv_usec offsets larger than
4.3s would result in an unnormalized AND wrong timestamp value,
due to overflow.

MFC after: 1 week


# c505fe7a 19-Dec-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Reduce Giant scope a bit, as fdrop() is believed to be MPSAFE.
The purpose of this change is consistency (not performance improvement:)),
as it was hard to tell if fdrop() is MPSAFE or not when I saw it sometimes
under the Giant and sometimes without it.

Glanced at by: ssouhlal, kan


# c47a4d1c 24-Sep-2005 Christian S.J. Peron <csjp@FreeBSD.org>

Implement new world order in VFS locking for extended attributes. This will
remove the unconditional acquisition of Giant for extended attribute related
operations. If the file system is set as being MP safe and debug.mpsafevfs is
1, do not pickup Giant.

Mark the following system calls as being MP safe so we no longer pickup Giant
in the system call handler:

o extattrctl
o extattr_set_file
o extattr_get_file
o extattr_delete_file
o extattr_set_fd
o extattr_get_fd
o extattr_delete_fd
o extattr_set_link
o extattr_get_link
o extattr_delete_link
o extattr_list_file
o extattr_list_link
o extattr_list_fd

-Pass MPSAFE flags to namei(9) lookup and introduce vfslocked variable which
will keep track of any Giant acquisitions.
-Wrap any fd operations which manipulate vnodes in VFS_{UN}LOCK_GIANT
-Drop VFS_ASSERT_GIANT into function which operate on vnodes to ensure that
we are sufficiently protected.

I've tested these changes with various TrustedBSD MAC policies which use
extended attribute a lot on SMP and UP systems (thanks to Scott Long for
making some SMP hardware available to me for testing).

Discussed with: jeff
Requested by: jhb, rwatson


# 68ff2a43 15-Sep-2005 Christian S.J. Peron <csjp@FreeBSD.org>

Improve the MP safeness associated with the creation of symbolic
links and the execution of ELF binaries. Two problems were found:

1) The link path wasn't tagged as being MP safe and thus was not properly
protected.
2) The ELF interpreter vnode wasnt being locked in namei(9) and thus was
insufficiently protected.

This commit makes the following changes:

-Sets the MPSAFE flag in NDINIT for symbolic link paths
-Sets the MPSAFE flag in NDINIT and introduce a vfslocked variable which
will be used to instruct VFS_UNLOCK_GIANT to unlock Giant if it has been
picked up.
-Drop in an assertion into vfs_lookup which ensures that if the MPSAFE
flag is NOT set, that we have picked up giant. If not panic (if WITNESS
compiled into the kernel). This should help us find conditions where vnode
operations are in-sufficiently protected.

This is a RELENG_6 candidate.

Discussed with: jeff
MFC after: 4 days


# d8b464e5 01-Sep-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

In case of mac_check_vnode_rename_from() or vn_start_write() failure,
vn_finished_write() should not be called.

Reviewed by: ssouhlal
MFC after: 3 days


# 06a13778 23-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Actually only protect mount-point if security.jail.enforce_statfs is set to 2.
If we don't return statistics about requested file systems, system tools
may not work correctly or at all.

Approved by: re (scottl)


# dbb3ec5c 13-Jun-2005 Jeff Roberson <jeff@FreeBSD.org>

- Remove vnode lock asserts at the end of vfs syscalls. These asserts were
used to ensure that we weren't exiting the syscall with a lock still
held. This wasn't safe, however, because we'd already executed a vput()
and on a loaded system the vnode may have been free'd by the time we
assert. This functionality is also handled by the td_locks assert in
userret, which doesn't tell you what the syscall was, but will at least
panic before you deadlock.

Sponsored by: Isilon Systems, Inc.
Discovred by: Peter Holm
Approved by: re (blanket vfs)


# 65ac438c 12-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Do not allocate memory while holding a mutex.
I introduce a very small race here (some file system can be mounted or
unmounted between 'count' calculation and file systems list creation),
but it is harmless.

Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>


# 3a996d6e 11-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Do not allocate memory based on not-checked argument from userland.
It can be used to panic the kernel by giving too big value.
Fix it by moving allocation and size verification into kern_getfsstat().
This even simplifies kern_getfsstat() consumers, but destroys symmetry -
memory is allocated inside kern_getfsstat(), but has to be freed by the
caller.

Found by: FreeBSD Kernel Stress Test Suite: http://www.holm.cc/stress/
Reported by: Peter Holm <peter@holm.cc>


# 820a0de9 09-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Rename sysctl security.jail.getfsstatroot_only to security.jail.enforce_statfs
and extend its functionality:

value policy
0 show all mount-points without any restrictions
1 show only mount-points below jail's chroot and show only part of the
mount-point's path (if jail's chroot directory is /jails/foo and
mount-point is /jails/foo/usr/home only /usr/home will be shown)
2 show only mount-point where jail's chroot directory is placed.

Default value is 2.

Discussed with: rwatson


# 13a82b96 09-Jun-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Avoid code duplication in serval places by introducing universal
kern_getfsstat() function.

Obtained from: jhb


# 1209e08f 08-Jun-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Initialize uio_iovcnt to 1 in extattr_list_vp() and extattr_get_vp()

PR: kern/79357
Approved by: rwatson


# f8e5f642 28-May-2005 Robert Watson <rwatson@FreeBSD.org>

Acquire Giant explicitly in quotactl() so that the syscalls.master
entry can become MSTD.


# f73e1f57 27-May-2005 Robert Watson <rwatson@FreeBSD.org>

Acquire Giant explicitly in fhopen(), fhstat(), and kern_fhstatfs(),
so that we can start to eliminate the presence of non-MPSAFE system
call entries in syscalls.master.


# 9e9cfdca 27-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove (now) unused argument 'td' from cvtstatfs().


# 2b19a055 27-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Sync locking in freebsd4_getfsstat() with getfsstat().
Giant is probably also needed in kern_fhstatfs().


# fd916868 27-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Use consistent style in functions I want to modify in the near future.


# c95cbf7e 22-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Protect fsid in freebsd4_getfsstat() in simlar way as it is done in
getfsstat().


# a0e96a49 22-May-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

If we need to hide fsid, kern_statfs()/kern_fstatfs() will do it for us,
so do not duplicate the code in cvtstatfs().
Note, that we now need to clear fsid in freebsd4_getfsstat().

This moves all security related checks from functions like cvtstatfs()
and will allow to add more security related stuff (like statfs(2), etc.
protection for jails) a bit easier.


# 836c5b41 11-Apr-2005 Jeff Roberson <jeff@FreeBSD.org>

- vput(tvp) before vrele(tdvp) in kern_rename() to avoid lock order issues.


# 5ef9827c 08-Apr-2005 Jeff Roberson <jeff@FreeBSD.org>

- Remove the namei NOOBJ flag. It is meaningless now.

Sponsored by: Isilon Systems, Inc.


# d830f828 24-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by: Isilon Systems, Inc.


# ae88db8a 23-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- Remove the #ifdef LOOKUP_SHARED from some calls to NDINIT. The
LOCKSHARED flag is simply ignored in namei() if LOOKUP_SHARED is not
enabled.

Sponsored by: Isilon Systems, Inc.


# 9331fd13 13-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- Don't VOP_UNLOCK prior to VOP_REVOKE. The lock is required now.

Sponsored by: Isilon Systems, Inc.


# 88e5b12a 08-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Drag another softupdates tentacle back into FFS: Now that FFS's
vop_fsync is separate from the internal use we can do the full job
there.


# fee4a6af 07-Feb-2005 John Baldwin <jhb@FreeBSD.org>

Implement a kern_pathconf() wrapper for pathconf() which can take the
filename from either a user space or a kernel space pointer.


# 76951d21 07-Feb-2005 John Baldwin <jhb@FreeBSD.org>

- Tweak kern_msgctl() to return a copy of the requested message queue id
structure in the struct pointed to by the 3rd argument for IPC_STAT and
get rid of the 4th argument. The old way returned a pointer into the
kernel array that the calling function would then access afterwards
without holding the appropriate locks and doing non-lock-safe things like
copyout() with the data anyways. This change removes that unsafeness and
resulting race conditions as well as simplifying the interface.
- Implement kern_foo wrappers for stat(), lstat(), fstat(), statfs(),
fstatfs(), and fhstatfs(). Use these wrappers to cut out a lot of
code duplication for freebsd4 and netbsd compatability system calls.
- Add a new lookup function kern_alternate_path() that looks up a filename
under an alternate prefix and determines which filename should be used.
This is basically a more general version of linux_emul_convpath() that
can be shared by all the ABIs thus allowing for further reduction of
code duplication.


# ff05fd5d 02-Feb-2005 Jeff Roberson <jeff@FreeBSD.org>

- Correct a typo in kern_rename. tvfslocked should be initialized from
tond and not fromnd. This could lead us to leak Giant, or unlock it
twice, depending on the filesystems involved. renames within a single
filesystem would not have caused any problems.

Sponsored by: Isilon Systems, Inc.


# 37c15216 01-Feb-2005 Jeff Roberson <jeff@FreeBSD.org>

- Or MPSAFE with the correct set of flags in stat(). This affected only
the LOOKUP_SHARED case.

Spotted by: jhb


# 8516dd18 24-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Don't use VOP_GETVOBJECT, use vp->v_object directly.


# dcff5b14 24-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Don't call VOP_CREATEVOBJECT(), it's the responsibility of the
filesystem which owns the vnode.


# 94a94585 24-Jan-2005 Jeff Roberson <jeff@FreeBSD.org>

- Change all vfs syscalls to use VFS_LOCK_GIANT(), and MPSAFE nds.
- Move Giant acquisition into the few vfs syscalls that weren't already
directly acquiring it.

Sponsored By: Isilon Systems, Inc.


# e39db32a 12-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT()
directly.


# 8df6bac4 11-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 9bb42816 16-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate pointless goto.


# 48ab5b2d 15-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Forgot to remove now unused variable in last commit.


# 136211e5 15-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

It is not necessary to hold vn_start_write/vn_finished_write around VOP_REVOKE.


# 718fe8e2 15-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Next FILEDESC_LOCK properly around FILE_LOCK


# 124e4c3b 13-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce an alias for FILEDESC_{UN}LOCK() with the suffix _FAST.

Use this in all the places where sleeping with the lock held is not
an issue.

The distinction will become significant once we finalize the exact
lock-type to use for this kind of case.


# ef11fbd7 07-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce fdclose() which will clean an entry in a filedesc.

Replace homerolled versions with call to fdclose().

Make fdunused() static to kern_descrip.c


# 56f21b9d 26-Jul-2004 Colin Percival <cperciva@FreeBSD.org>

Rename suser_cred()'s PRISON_ROOT flag to SUSER_ALLOWJAIL. This is
somewhat clearer, but more importantly allows for a consistent naming
scheme for suser_cred flags.

The old name is still defined, but will be removed in a few days (unless I
hear any complaints...)

Discussed with: rwatson, scottl
Requested by: jhb


# f257b7a5 12-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


# 4f3bf9b9 24-Jun-2004 Robert Watson <rwatson@FreeBSD.org>

Don't cuddle else's so much as we removed additional parts of each
block.


# 5e11031e 24-Jun-2004 Robert Watson <rwatson@FreeBSD.org>

Remove temporary API bandage that allowed applications speaking the
older API to list attributes on a file (zero-length attribute name)
to function. extattr_list_*() are now the only available APIs to
use when listing attributes.


# 9260798f 21-Jun-2004 Robert Watson <rwatson@FreeBSD.org>

Acquire Giant in link() so that the system call can be marked
MPSAFE. Don't want to acquire Giant in kern_link() sync linux
compat code performs actions requiring Giant prior to calling
kern_link().


# 694b21cf 21-Jun-2004 Robert Watson <rwatson@FreeBSD.org>

Acquire Giant in link() so that we can mark it as MSTD in
syscalls.master. Don't want to do it in kern_link() since the
Linux emulation code calls kern_link() after performing other
actions requiring Giant.


# d7086f31 19-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Only initialize f_data and f_ops if nobody else did so already.


# 1930e303 11-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Deorbit COMPAT_SUNOS.

We inherited this from the sparc32 port of BSD4.4-Lite1. We have neither
a sparc32 port nor a SunOS4.x compatibility desire these days.


# 79db0f1c 06-Jun-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove unused code.

Submitted by: Bjoern A. Zeeb


# c4d85674 04-Jun-2004 Tim J. Robbins <tjr@FreeBSD.org>

Remove a stale comment.


# f52e2ef2 11-May-2004 Tim J. Robbins <tjr@FreeBSD.org>

Eliminate a memory leak in kern_symlink() that could occur if
vn_start_write() failed.


# 6c0ad4a7 26-Apr-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Always use nd.ni_vp->v_mount as an argument for VFS_QUOTACTL(), just like
in RELENG_4.

Pointed out by: Alex Lyashkov <umka@sevinter.net>


# 0c0c597f 22-Apr-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Look out! vn_start_write() is able to return 0 and NULL 'mp'.

Submitted by: Alex Lyashkov <shadow@psoft.net>


# 295ed752 06-Apr-2004 Bruce Evans <bde@FreeBSD.org>

Removed some less than useful comments:
- don't say what a small subset of the options includes are for.
- don't mark up functions which use all their args with /* ARGSUSED */.
The markup should have been removed when the unused retval parameter
was removed.
- don't comment on what routine suser() checks do. Removed nearby
excessive vertical whitespace.


# 7f8a436f 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 0b0a60fb 05-Apr-2004 Doug Rabson <dfr@FreeBSD.org>

Add lgetfh(2) which is like getfh(2) but doesn't follow symlinks.


# 31c7e8b0 16-Mar-2004 David Malone <dwmalone@FreeBSD.org>

Nudge Giant as far as I can into kern_open(). Mark open() as MPSAFE.
Use kern_open() to implement creat() rather than taking the long route
through open(). Mark creat as MPSAFE.

While I'm at it, mark nosys() (syscall 0) as MPSAFE, for all the
difference it will make.


# dd604e26 08-Mar-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add two new sysctls:

- security.bsd.hardlink_check_uid, when set, means, that unprivileged
users are not permitted to create hard links to files not
owned by them,
- security.bsd.hardlink_check_gid, when set, means, that unprivileged
users are not permitted to create hard links to files owned
by group they don't belong to.

OK'ed by: rwatson


# a1cc6206 16-Feb-2004 David Malone <dwmalone@FreeBSD.org>

Correct a comment.

Reviewed by: alfred, tanimura


# f08df373 14-Feb-2004 Robert Watson <rwatson@FreeBSD.org>

By default, when a process in jail calls getfsstat(), only return the
data for the file system on which the jail's root vnode is located.
Previous behavior (show data for all mountpoints) can be restored
by setting security.jail.getfsstatroot_only to 0. Note: this also
has the effect of hiding other mounts inside a jail, such as /dev,
/tmp, and /proc, but errs on the side of leaking less information.


# a2fe44e8 15-Jan-2004 Dag-Erling Smørgrav <des@FreeBSD.org>

New file descriptor allocation code, derived from similar code introduced
in OpenBSD by Niels Provos. The patch introduces a bitmap of allocated
file descriptors which is used to locate available descriptors when a new
one is needed. It also moves the task of growing the file descriptor table
out of fdalloc(), reducing complexity in both fdalloc() and do_dup().

Debts of gratitude are owed to tjr@ (who provided the original patch on
which this work is based), grog@ (for the gdb(4) man page) and rwatson@
(for assistance with pxeboot(8)).


# 05c3c5c8 11-Jan-2004 Dag-Erling Smørgrav <des@FreeBSD.org>

Mechanical whitespace cleanup; parenthesize return values; other minor
style nits. The #ifdefs in this file give me a headache...


# 69546b2f 24-Dec-2003 Robert Watson <rwatson@FreeBSD.org>

Document that when we are addressing an open()/close() race, the reason
we call vn_close() manually rather than letting fdrop() take care of it
is that we haven't yet hooked up the various 'struct file' fields.


# fde81c7d 12-Nov-2003 Kirk McKusick <mckusick@FreeBSD.org>

Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.

You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Tim Robbins <tjr@freebsd.org>
Reviewed by: Julian Elischer <julian@elischer.org>
Reviewed by: the hoards of <arch@freebsd.org>
Sponsored by: DARPA & NAI Labs.


# e1419c08 19-Oct-2003 David Malone <dwmalone@FreeBSD.org>

falloc allocates a file structure and adds it to the file descriptor
table, acquiring the necessary locks as it works. It usually returns
two references to the new descriptor: one in the descriptor table
and one via a pointer argument.

As falloc releases the FILEDESC lock before returning, there is a
potential for a process to close the reference in the file descriptor
table before falloc's caller gets to use the file. I don't think this
can happen in practice at the moment, because Giant indirectly protects
closes.

To stop the file being completly closed in this situation, this change
makes falloc set the refcount to two when both references are returned.
This makes life easier for several of falloc's callers, because the
first thing they previously did was grab an extra reference on the
file.

Reviewed by: iedowse
Idea run past: jhb


# c096756c 21-Aug-2003 Robert Watson <rwatson@FreeBSD.org>

Add mac_check_vnode_deleteextattr() and mac_check_vnode_listextattr():
explicit access control checks to delete and list extended attributes
on a vnode, rather than implicitly combining with the setextattr and
getextattr checks. This reflects EA API changes in the kernel made
recently, including the move to explicit VOP's for both of these
operations.

Obtained from: TrustedBSD PRoject
Sponsored by: DARPA, Network Associates Laboratories


# b2db7dc6 07-Aug-2003 John Baldwin <jhb@FreeBSD.org>

td_dupfd just needs to be less than 0, it does not have to hold the
negative value of the index of the new file, so just use -1.


# 76bd2355 04-Aug-2003 Ian Dowse <iedowse@FreeBSD.org>

In the mknod(), mkfifo(), link(), symlink() and undelete() syscalls,
use vrele() instead of vput() on the parent directory vnode returned
by namei() in the case where it is equal to the target vnode. This
handles namei()'s somewhat strange (but documented) behaviour of
not locking either vnode when the two vnodes are equal and LOCKPARENT
but not LOCKLEAF is specified.

Note that since a vnode double-unlock is not currently fatal, these
coding errors were effectively harmless.

Spotted by: Juergen Hannken-Illjes <hannken@eis.cs.tu-bs.de>
Reviewed by: mckusick


# 9080ff25 28-Jul-2003 Robert Watson <rwatson@FreeBSD.org>

Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# cf774299 27-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Pass the file descriptor index down to vn_open.

If the method vector was replaced and we got the "special return code"
smile and trust that whatever happened below DTRT.


# 7c89f162 27-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.


# a8d43c90 26-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.


# 1226914c 03-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Use the f_vnode field to tell which file descriptors have a vnode.


# 6b42f0a2 22-Jun-2003 Robert Watson <rwatson@FreeBSD.org>

Prefer the vop_rmextattr() vnode operation for removing extended
attributes from objects over vop_setextattr() with a NULL uio; if
the file system doesn't support the vop_rmextattr() method, fall
back to the vop_setextattr() method.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 3b6d9652 22-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a f_vnode field to struct file.

Several of the subtypes have an associated vnode which is used for
stuff like the f*() functions.

By giving the vnode a speparate field, a number of checks for the specific
subtype can be replaced simply with a check for f_vnode != NULL, and
we can later free f_data up to subtype specific use.

At this point in time, f_data still points to the vnode, so any code I
might have overlooked will still work.


# eaaca5de 20-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Don't (re)initialize f_gcflag to zero.

Move initialization of DTYPE_VNODE specific field f_seqcount into
the DTYPE_VNODE specific code.


# 6084b6c9 18-Jun-2003 Don Lewis <truckman@FreeBSD.org>

FILE_LOCK() uses a pool mutex, as does the vnode v_vnlock. Since pool
mutexes are supposed to only be used as leaf mutexes, and what appear
to be separate pool mutexes could be aliased together, it is bad idea
for a thread to attempt to hold two pool mutexes at the same time.

Slightly rearrange the code in kern_open() so that FILE_UNLOCK() is
called before calling VOP_GETVOBJECT(), which will grab the v_vnlock
mutex.


# 2db4b023 18-Jun-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce a new flag on a file descriptor: DFLAG_SEEKABLE and use that
rather than assume that only DTYPE_VNODE is seekable.


# 677b542e 10-Jun-2003 David E. O'Brien <obrien@FreeBSD.org>

Use __FBSDID().


# 77762179 04-Jun-2003 Robert Watson <rwatson@FreeBSD.org>

If a system call comes in requesting to retrieve an attribute named
"", temporarily map it to a call to extattr_list_vp() to provide
compatibility for older applications using the "" API to retrieve
EA lists.

Use VOP_LISTEXTATTR() to support extattr_list_vp() rather than
VOP_GETEXTATTR(..., "", ...).

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Asssociates Laboratories


# 8bebbb1a 03-Jun-2003 Robert Watson <rwatson@FreeBSD.org>

Implementations of extattr_list_fd(), extattr_list_file(), and
extattr_list_link() system calls, which return a least of extended
attributes defined for a vnode referenced by a file descriptor
or path name. Currently, we just invoke VOP_GETEXTATTR() since
it will convert a request for an empty name into a query for a
name list, which was the old (more hackish) API. At some point
in the near future, we'll push the distinction between get and
list down to the vnode operation layer, but this provides access
to the new API for applications in the short term.

Pointed out by: Dominic Giampaolo <dbg@apple.com>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 67096659 31-May-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused variable(s).

Found by: FlexeLint


# 104a9b7e 29-Apr-2003 Alexander Kabaev <kan@FreeBSD.org>

Deprecate machine/limits.h in favor of new sys/limits.h.
Change all in-tree consumers to include <sys/limits.h>

Discussed on: standards@
Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>


# b6e48e03 23-Apr-2003 Alan Cox <alc@FreeBSD.org>

- Acquire the vm_object's lock when performing vm_object_page_clean().
- Add a parameter to vm_pageout_flush() that tells vm_pageout_flush()
whether its caller has locked the vm_object. (This is a temporary
measure to bootstrap vm_object locking.)


# fd7a8150 08-Apr-2003 Mike Barcroft <mike@FreeBSD.org>

o In struct prison, add an allprison linked list of prisons (protected
by allprison_mtx), a unique prison/jail identifier field, two path
fields (pr_path for reporting and pr_root vnode instance) to store
the chroot() point of each jail.
o Add jail_attach(2) to allow a process to bind to an existing jail.
o Add change_root() to perform the chroot operation on a specified
vnode.
o Generalize change_dir() to accept a vnode, and move namei() calls
to callers of change_dir().
o Add a new sysctl (security.jail.list) which is a group of
struct xprison instances that represent a snapshot of active jails.

Reviewed by: rwatson, tjr


# a184d471 05-Mar-2003 Robert Watson <rwatson@FreeBSD.org>

Move the initialization of the vattr flags field in setfflags() to
before the MAC check so that we pass the flags field into the MAC
check properly initialized. This didn't affect any current MAC
modules since they didn't care what the flags argument was (as
they were primarily interested in the fact that it was a meta-data
write, not the contents of the write), but would be relevant to
future modules relying on that field.

Submitted by: Mike Halderman <mrh@spawar.navy.mil>
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# a163d034 18-Feb-2003 Warner Losh <imp@FreeBSD.org>

Back out M_* changes, per decision of the TRB.

Approved by: trb


# a44009e0 15-Feb-2003 Jeffrey Hsu <hsu@FreeBSD.org>

Remove extraneous FILEDESC_LOCK around atomic read.


# 565211b2 31-Jan-2003 Robert Watson <rwatson@FreeBSD.org>

Correct handling of locking for chroot() and chdir() cases: rather
than having change_dir() release the vnode lock on success, hold the
lock so that we can use it later when invoking MAC checks and
VOP_ACCESS() in the chroot() code. Update the comment to reflect
this calling convention. Update callers to unlock the vnode
lock. Correct a typo regarding vnode naming in the MAC case that
crept in via the previous patch applied.


# 7278944d 31-Jan-2003 Robert Watson <rwatson@FreeBSD.org>

Clean up vnode handling on return from chroot() in certain error
cases: we might multiply vrele() a vnode when certain classes of
failures occur. This appears to stem from earlier Giant/file
descriptor lock pushdown and restructuring.

Submitted by: maxim


# 44956c98 21-Jan-2003 Alfred Perlstein <alfred@FreeBSD.org>

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 48e3128b 12-Jan-2003 Matthew Dillon <dillon@FreeBSD.org>

Bow to the whining masses and change a union back into void *. Retain
removal of unnecessary casts and throw in some minor cleanups to see if
anyone complains, just for the hell of it.


# cd72f218 11-Jan-2003 Matthew Dillon <dillon@FreeBSD.org>

Change struct file f_data to un_data, a union of the correct struct
pointer types, and remove a huge number of casts from code using it.

Change struct xfile xf_data to xun_data (ABI is still compatible).

If we need to add a #define for f_data and xf_data we can, but I don't
think it will be necessary. There are no operational changes in this
commit.


# f0c09328 06-Jan-2003 Jacques Vidrine <nectar@FreeBSD.org>

Correct file descriptor leaks in lseek and do_dup.
The leak in lseek was introduced in vfs_syscalls.c revision 1.218.
The leak in do_dup was introduced in kern_descrip.c revision 1.158.

Submitted by: iedowse


# f97182ac 14-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

unwrap lines made short enough by SCARGS removal


# b80521fe 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

remove syscallarg().

Suggested by: peter


# d1e405c5 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

SCARGS removal take II.


# bc9e75d7 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

Backout removal SCARGS, the code freeze is only "selectively" over.


# 0bbe7292 13-Dec-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove SCARGS.

Reviewed by: md5


# 4e08ccb2 27-Oct-2002 Ian Dowse <iedowse@FreeBSD.org>

Fix a case in kern_rename() where a vn_finished_write() call was
missed. This bug has been present since the vn_start_write() and
vn_finished_write() calls were first added in revision 1.159. When
the case is triggered, any attempts to create snapshots on the
filesystem will deadlock and also prevent further write activity
on that filesystem.


# c7047e52 27-Oct-2002 Garrett Wollman <wollman@FreeBSD.org>

Change the way support for asynchronous I/O is indicated to applications
to conform to 1003.1-2001. Make it possible for applications to actually
tell whether or not asynchronous I/O is supported.

Since FreeBSD's aio implementation works on all descriptor types, don't
call down into file or vnode ops when [f]pathconf() is asked about
_PC_ASYNC_IO; this avoids the need for every file and vnode op to know about
it.


# 7587203c 19-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Hook up most of the MAC entry points relating to file/directory/node
creation, deletion, and rename. There are one or two other stray
cases I'll catch in follow-up commits (such as unix domain socket
creation); this permits MAC policy modules to limit the ability to
perform these operations based on existing UNIX credential / vnode
attributes, extended attributes, and security labels. In the rename
case using MAC, we now have to lock the from directory and file
vnodes for the MAC check, but this is done only in the MAC case,
and the locks are immediately released so that the remainder of the
rename implementation remains the same. Because the create check
takes a vattr to know object type information, we now initialize
additional fields in the VATTR passed to VOP_SYMLINK() in the MAC
case.

Approved by: re
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 2dba710d 10-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Incremental style improvements: more consistently avoid assignments
in conditionals; remove some excess vertical whitespace; remove a
bug in the return handling of the delete_vp() case for MAC.

Spotted by: bde


# b101411b 09-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Explore new heights in alphabetization for _file and _fd variations on
the extended attribute system calls.


# 6f90723c 09-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Implement extattr_{delete,get,set}_link() system calls: extended attribute
operations that do not follow links. Sync to MAC tree.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 197b023b 07-Oct-2002 Ian Dowse <iedowse@FreeBSD.org>

Add back a fdrop() call at the end of kern_open() that got lost in
revision 1.218. This bug caused a "struct file" reference to be
leaked if VOP_ADVLOCK(), vn_start_write(), or mac_check_vnode_write()
failed during the open operation.

PR: kern/43739
Reported by: Arne Woerner <woerner@mediabase-gmbh.de>


# 0a694196 05-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Merge support for mac_check_vnode_link(), a MAC framework/policy entry
point that instruments the creation of hard links. Policy implementations
to follow.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 8c5d0137 02-Oct-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Fix mis-indentation.

Spotted by: FlexeLint


# d40a8125 24-Sep-2002 Jeff Roberson <jeff@FreeBSD.org>

- Properly lock v_vflags in getdirents().


# fa288043 19-Sep-2002 Don Lewis <truckman@FreeBSD.org>

VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()
wasn't doing. Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods. Remove the
locking and and unlocking from the leaf filesystem link methods.

Reviewed by: rwatson, bde (except for the unionfs_link() changes)


# d3a7b5e7 10-Sep-2002 Bruce Evans <bde@FreeBSD.org>

vfs_syscalls.c:
Changed rename(2) to follow the letter of the POSIX spec. POSIX
requires rename() to have no effect if its args "resolve to the same
existing file". I think "file" can only reasonably be read as referring
to the inode, although the rationale and "resolve" seem to say that
sameness is at the level of (resolved) directory entries.

ext2fs_vnops.c, ufs_vnops.c:
Replaced code that gave the historical BSD behaviour of removing one
link name by checks that this code is now unreachable. This fixes
some races. All vnodes needed to be unlocked for the removal, and
locking at another level using something like IN_RENAME was not even
attempted, so it was possible for rename(x, y) to return with both x
and y removed even without any unlink(2) syscalls (one process can
remove x using rename(x, y) and another process can remove y using
rename(y, x)).

Prodded by: alfred
MFC after: 8 weeks
PR: 42617


# 8f19eb88 01-Sep-2002 Ian Dowse <iedowse@FreeBSD.org>

Split out a number of mostly VFS and signal related syscalls into
a kernel-internal kern_*() version and a wrapper that is called via
the syscall vector table. For paths and structure pointers, the
internal version either takes a uio_seg parameter or requires the
caller to copyin() the data to kernel memory as appropiate. This
will permit emulation layers to use these syscalls without having
to copy out translated arguments to the stack gap.

Discussed on: -arch
Review/suggestions: bde, jhb, peter, marcel


# 856d3a05 20-Aug-2002 Jeff Roberson <jeff@FreeBSD.org>

- Hold the vnode lock across unlink() so that the v_vflag check is safe.
- Fix the long broken error handling for VV_ROOT and VDIR.


# 177142e4 19-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Pass active_cred and file_cred into the MAC framework explicitly
for mac_check_vnode_{poll,read,stat,write}(). Pass in fp->f_cred
when calling these checks with a struct file available. Otherwise,
pass NOCRED. All currently MAC policies use active_cred, but
could now offer the cached credential semantic used for the base
system security model.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 7f724f8b 19-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Break out mac_check_vnode_op() into three seperate checks:
mac_check_vnode_poll(), mac_check_vnode_read(), mac_check_vnode_write().
This improves the consistency with other existing vnode checks, and
allows policies to avoid implementing switch statements to determine
what operations they do and do not want to authorize.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# ea6027a8 15-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Make similar changes to fo_stat() and fo_poll() as made earlier to
fo_read() and fo_write(): explicitly use the cred argument to fo_poll()
as "active_cred" using the passed file descriptor's f_cred reference
to provide access to the file credential. Add an active_cred
argument to fo_stat() so that implementers have access to the active
credential as well as the file credential. Generally modify callers
of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which
was redundantly provided via the fp argument. This set of modifications
also permits threads to perform these operations on behalf of another
thread without modifying their credential.

Trickle this change down into fo_stat/poll() implementations:

- badfo_poll(), badfo_stat(): modify/add arguments.
- kqueue_poll(), kqueue_stat(): modify arguments.
- pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to
MAC checks rather than td->td_ucred.
- soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather
than cred to pru_sopoll() to maintain current semantics.
- sopoll(): moidfy arguments.
- vn_poll(), vn_statfile(): modify/add arguments, pass new arguments
to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL()
to maintian current semantics.
- vn_close(): rename cred to file_cred to reflect reality while I'm here.
- vn_stat(): Add active_cred and file_cred arguments to vn_stat()
and consumers so that this distinction is maintained at the VFS
as well as 'struct file' layer. Pass active_cred instead of
td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics.

- fifofs: modify the creation of a "filetemp" so that the file
credential is properly initialized and can be used in the socket
code if desired. Pass ap->a_td->td_ucred as the active
credential to soo_poll(). If we teach the vnop interface about
the distinction between file and active credentials, we would use
the active credential here.

Note that current inconsistent passing of active_cred vs. file_cred to
VOP's is maintained. It's not clear why GETATTR would be authorized
using active_cred while POLL would be authorized using file_cred at
the file system level.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# e6e370a7 04-Aug-2002 Jeff Roberson <jeff@FreeBSD.org>

- Replace v_flag with v_iflag and v_vflag
- v_vflag is protected by the vnode lock and is used when synchronization
with VOP calls is needed.
- v_iflag is protected by interlock and is used for dealing with vnode
management issues. These flags include X/O LOCK, FREE, DOOMED, etc.
- All accesses to v_iflag and v_vflag have either been locked or marked with
mp_fixme's.
- Many ASSERT_VOP_LOCKED calls have been added where the locking was not
clear.
- Many functions in vfs_subr.c were restructured to provide for stronger
locking.

Idea stolen from: BSD/OS


# 18b770b2 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC framework entry points to authorize readdir()
operations in the native ABI.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# f9d0d524 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Include file cleanup; mac.h and malloc.h at one point had ordering
relationship requirements, and no longer do.

Reminded by: bde


# f4d2cfdd 01-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Invoke appropriate MAC entry points to authorize the following
operations:

truncate on open() (write)
access() (access)
readlink() (readlink)
chflags(), lchflags(), fchflags() (setflag)
chmod(), fchmod(), lchmod() (setmode)
chown(), fchown(), lchown() (setowner)
utimes(), lutimes(), futimes() (setutimes)
truncate(), ftrunfcate() (write)
revoke() (revoke)
fhopen() (open)
truncate on fhopen() (write)
extattr_set_fd, extattr_set_file() (setextattr)
extattr_get_fd, extattr_get_file() (getextattr)
extattr_delete_fd(), extattr_delete_file() (setextattr)

These entry points permit MAC policies to enforce a variety of
protections on vnodes. More vnode checks to come, especially in
non-native ABIs.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# b3e13e1c 31-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Instrument chdir() and chroot()-related system calls to invoke
appropriate MAC entry points to authorize the two operations.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# b285e7f9 31-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Improve formatting and variable use consistency in extattr system
calls.

Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 956fc3f8 31-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Simplify the logic to enter VFS_EXTATTRCTL().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 2712d0ee 30-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Introduce support for Mandatory Access Control and extensible
kernel access control.

Implement MAC framework access control entry points relating to
operations on mountpoints. Currently, this consists only of
access control on mountpoint listing using the various statfs()
variations. In the future, it might also be desirable to
implement checks on mount() and unmount().

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# e66c87b7 30-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

When referencing nd_cnp after namei(), always pass SAVENAME into
NDINIT() operation flags.

Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 0b1040cb 21-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Set VAPPEND in open mode when O_APPEND is specified as an argument to
open() of fhopen(). Currently this has no actual affect due to the
treatment of VAPPEND in vaccess() and vaccess_acl() as a subset of
VWRITE, but when MAC comes in, MAC will distinguish the two. Note:
if any file systems are cutting their own permission models, they
may wish to now take this into account.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# fb36a3d8 16-Jul-2002 Kirk McKusick <mckusick@FreeBSD.org>

Change utimes to set the file creation time (for filesystems that
support creation times such as UFS2) to the value of the
modification time if the value of the modification time is older
than the current creation time. See utimes(2) for further details.

Sponsored by: DARPA & NAI Labs.


# faab4e27 16-Jul-2002 Kirk McKusick <mckusick@FreeBSD.org>

Change the name of st_createtime to st_birthtime. This change is
made to reduce confusion between st_ctime and st_createtime.

Submitted by: Eric Allman <eric@sendmail.org>
Sponsored by: DARPA & NAI Labs.


# 03d7a9ff 12-Jul-2002 John Baldwin <jhb@FreeBSD.org>

- Change chroot_refuse_vdir_fds() to require that the passed in struct
filedesc is already locked rather than having chroot() unlock the
filedesc so chroot_refuse_vdir_fds() can immediately relock it.
- Reorder chroot() a bitso that we do the namei lookup before checking
the process's struct filedesc. This closes at least one potential race
and allows us to only acquire the filedsec lock once in chroot().
- Push down Giant slightly into chroot().


# 2b4edb69 02-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Move every code related to mount(2) in a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and help reducing
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by: rwatson
Repo-copy by: peter


# 6bd521df 01-Jul-2002 Ian Dowse <iedowse@FreeBSD.org>

Use indirect function pointer hooks instead of #ifdef SOFTUPDATES
direct calls for the two places where the kernel calls into soft
updates code. Set up the hooks in softdep_initialize() and NULL
them out in softdep_uninitialize(). This change allows soft updates
to function correctly when ufs is loaded as a module.

Reviewed by: mckusick


# a7884425 28-Jun-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove unneeded casts to caddr_t.


# 84b2995b 28-Jun-2002 Ian Dowse <iedowse@FreeBSD.org>

In vn_mkdir(), use vrele() instead of vput() on the parent directory
vnode in the case that the target exists and is the same vnode as
the parent (i.e. "mkdir ."). The namei() call does not leave the
vnode locked in this case even though you might expect it to.

This bug was mostly harmless in practice because unlocking an already
unlocked vnode currently does not trigger any panics or warnings.

Reviewed by: jeff


# d374764f 24-Jun-2002 Kirk McKusick <mckusick@FreeBSD.org>

Use proper size in bzero of stat structure.

Submitted by: Jake Burkholder <jake@locore.ca>
Sponsored by: DARPA & NAI Labs.


# 6524dddc 22-Jun-2002 Kirk McKusick <mckusick@FreeBSD.org>

This patch fixes a size problem with the stat structure for
64-bit architectures that was introduced in the UFS2 code
merge two days ago. The stat structure change that caused
the problem was the addition of the file create time.

Submitted by: Bruce Evans <bde@zeta.org.au>
Sponsored by: DARPA & NAI Labs.


# cacd1c9b 22-Jun-2002 Maxime Henrion <mux@FreeBSD.org>

o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
nmount(2) since we now malloc() them individually and
corrupted iov's could make the kernel crash in malloc()
with "kmem_map too small".

Reviewed by: phk


# 1c85e6a3 21-Jun-2002 Kirk McKusick <mckusick@FreeBSD.org>

This commit adds basic support for the UFS2 filesystem. The UFS2
filesystem expands the inode to 256 bytes to make space for 64-bit
block pointers. It also adds a file-creation time field, an ability
to use jumbo blocks per inode to allow extent like pointer density,
and space for extended attributes (up to twice the filesystem block
size worth of attributes, e.g., on a 16K filesystem, there is space
for 32K of attributes). UFS2 fully supports and runs existing UFS1
filesystems. New filesystems built using newfs can be built in either
UFS1 or UFS2 format using the -O option. In this commit UFS1 is
the default format, so if you want to build UFS2 format filesystems,
you must specify -O 2. This default will be changed to UFS2 when
UFS2 proves itself to be stable. In this commit the boot code for
reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c)
as there is insufficient space in the boot block. Once the size of the
boot block is increased, this code can be defined.

Things to note: the definition of SBSIZE has changed to SBLOCKSIZE.
The header file <ufs/ufs/dinode.h> must be included before
<ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and
ufs_lbn_t.

Still TODO:
Verify that the first level bootstraps work for all the architectures.
Convert the utility ffsinfo to understand UFS2 and test growfs.
Add support for the extended attribute storage. Update soft updates
to ensure integrity of extended attribute storage. Switch the
current extended attribute interfaces to use the extended attribute
storage. Add the extent like functionality (framework is there,
but is currently never used).

Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@freebsd.org>


# 7d2d4409 20-Jun-2002 Maxime Henrion <mux@FreeBSD.org>

Change the way we internally store the mount options to
a linked list. This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by: phk


# 8eb0098f 28-May-2002 Maxime Henrion <mux@FreeBSD.org>

Remove a duplicated vfs_freeopts() that I introduced in last
revision.


# 2274ec99 23-May-2002 Maxime Henrion <mux@FreeBSD.org>

Style nit, no functional changes.


# 9ee6bf71 23-May-2002 Maxime Henrion <mux@FreeBSD.org>

Slightly change the way we pass mount options to the filesystem
VFS_NMOUNT operations.

Reviewed by: phk


# e9e705b0 20-May-2002 Maxime Henrion <mux@FreeBSD.org>

Change two vput() that should have been vrele().

Submitted by: iedowse


# d394511d 16-May-2002 Tom Rhodes <trhodes@FreeBSD.org>

More s/file system/filesystem/g


# 0e2d6cc8 14-May-2002 Jeff Roberson <jeff@FreeBSD.org>

Disable the shared locking namei() code for now. It breaks several stacking
filesystems. This is on hold until the rest of VFS Locking is reviewed and
deemed safe. It can be enabled with 'options LOOKUP_SHARED'.


# 9d997d8b 05-May-2002 Maxime Henrion <mux@FreeBSD.org>

Add the lchflags(2) syscall.

Reviewed by: rwatson


# 576365ba 05-May-2002 Jeff Roberson <jeff@FreeBSD.org>

Move a KASSERT() in open() prior to unlocking the vnode. It's not safe to
call VOP_GETVOBJECT without a lock.


# afd458b0 04-May-2002 Maxime Henrion <mux@FreeBSD.org>

Fix a typo.

Submitted by: dwmalone


# 7a0776e4 22-Apr-2002 Robert Watson <rwatson@FreeBSD.org>

Slightly restructure extattr_get_vp() so that there's only one entry point
to VOP_GETEXTATTR(). This simplifies code flow when inserting MAC hooks.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 05103170 19-Apr-2002 Robert Watson <rwatson@FreeBSD.org>

Improve style consistency of vfs_syscalls.c by converting the style used
in various extattr_*() calls to match the rest of the file. Originally,
these bits at the end looked more like style(9). This patch was submitted
by green by way of the TrustedBSD MAC tree, and I fixed a few problems
with it on the way through. Someone with more time on their hands should
convert the entire file to style(9); this commit is for diff reduction
purposes.

Submitted by: green
Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# df99ca52 16-Apr-2002 Ian Dowse <iedowse@FreeBSD.org>

The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by: simokawa
MFC after: 1 week


# a59f8b9e 08-Apr-2002 Jeff Roberson <jeff@FreeBSD.org>

Turn #ifdef LOOKUP_SHARED into #ifndef LOOKUP_EXCLUSIVE to enable this
behavior by default. Also, change the options line to reflect this.

If there are no problems reported this will become the only behavior and the
knob will be removed in a month or so.

Demanded by: obrien


# a48ca369 08-Apr-2002 Maxime Henrion <mux@FreeBSD.org>

The fourth parameter to copystr() is a size_t, not an int.

Approved by: peter


# 9d835373 07-Apr-2002 Maxime Henrion <mux@FreeBSD.org>

o Change kernel_vmount() interface to be more convenient : pass two
separate strings instead of passing "foo=bar".
o Don't forget to clear the VMOUNT flag on the vnode when vfs_nmount()
fails because the fs doesn't implement VFS_NMOUNT (and in vfs_mount()
when the fs doesn't implement VFS_MOUNT) ; also decrement the vfs
refcount in the !MNT_UPDATE case.


# bcc93175 02-Apr-2002 Maxime Henrion <mux@FreeBSD.org>

Add two forgotten vfs_unbusy() calls, in vfs_mount() and vfs_nmount().

Reviewed by: phk


# 44731cab 01-Apr-2002 John Baldwin <jhb@FreeBSD.org>

Change the suser() API to take advantage of td_ucred as well as do a
general cleanup of the API. The entire API now consists of two functions
similar to the pre-KSE API. The suser() function takes a thread pointer
as its only argument. The td_ucred member of this thread must be valid
so the only valid thread pointers are curthread and a few kernel threads
such as thread0. The suser_cred() function takes a pointer to a struct
ucred as its first argument and an integer flag as its second argument.
The flag is currently only used for the PRISON_ROOT flag.

Discussed on: smp@


# daab5e24 28-Mar-2002 Maxime Henrion <mux@FreeBSD.org>

- Properly sync vfs_nmount() with changes that have be already done
in vfs_mount(), in particular revisions 1.215, 1.227 and 1.240.
- flag2 is a low quality variable name, change it to kern_flag.
- strncpy NUL-terminates f_fstypename and f_mntonname since the strings
have length <= <buffer length> - 1, so the explicit NUL-termination is
bogus.
- M_ZERO'ing space for fstype and fspath is stupid since we never use the
space beyond the end of the string.
- Do various style(9) cleanups in both functions.

Submitted by: bde
Reviewed by: phk


# dcce8874 26-Mar-2002 Andrew R. Reiter <arr@FreeBSD.org>

- Fixup a few style nits:
- return error -> return (error);
- move a declaration to the top of the function.
- become bug for bug compatible with if (error) lines.

Submitted by: bde


# 17594b93 26-Mar-2002 Maxime Henrion <mux@FreeBSD.org>

As discussed in -arch, add the new nmount(2) system call and the
new vfs_getopt()/vfs_copyopt() API. This is intended to be used
later, when there will be filesystems implementing the VFS_NMOUNT
operation. The mount(2) system call will disappear when all
filesystems will be converted to the new API. Documentation will
be committed in a while.

Reviewed by: phk


# 517f30c2 25-Mar-2002 Andrew R. Reiter <arr@FreeBSD.org>

- Recommit the securelevel_gt() calls removed by commits rev. 1.84 of
kern_linker.c and rev. 1.237 of vfs_syscalls.c since these are not the
source of the recent panics occuring around kldloading file system
support modules.

Requested by: rwatson


# fe3240e9 21-Mar-2002 Andrew R. Reiter <arr@FreeBSD.org>

- Back out the commit to make the linker_load_file() securelevel check
made aware in jail environments. Supposedly something is broken, so
this should be backed out until further investigation proves otherwise,
or a proper fix can be provided.


# e85b9ae9 21-Mar-2002 Andrew R. Reiter <arr@FreeBSD.org>

- Fix a logic error in checking the securelevel that was introduced in the
previous commit.

Pointy hats to: arr, rwatson


# c457a440 20-Mar-2002 Andrew R. Reiter <arr@FreeBSD.org>

- Change a check of securelevel to securelevel_gt() call in order to help
against users within a jail attempting to load kernel modules.
- Add a check of securelevel_gt() to vfs_mount() in order to chop some
low hanging fruit for the repair of securelevel checking of linking and
unlinking files from within jails. There is more to be done here.

Reviewed by: rwatson


# c897b813 19-Mar-2002 Jeff Roberson <jeff@FreeBSD.org>

Remove references to vm_zone.h and switch over to the new uma API.

Also, remove maxsockets. If you look carefully you'll notice that the old
zone allocator never honored this anyway.


# 4d77a549 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P.


# 4a950215 18-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Close a race when vfs_syscalls.c:checkdirs() runs.

To do this protect the filedesc pointer in the proc with PROC_LOCK
in both checkdirs() and kern_descrip.c:fdfree().


# 8de00f4a 11-Mar-2002 Jeff Roberson <jeff@FreeBSD.org>

This patch adds the "LOCKSHARED" option to namei which causes it to only acquire shared locks on leafs.
The stat() and open() calls have been changed to make use of this new functionality. Using shared locks in
these cases is sufficient and can significantly reduce their latency if IO is pending to these vnodes. Also,
this reduces the number of exclusive locks that are floating around in the system, which helps reduce the
number of deadlocks that occur.

A new kernel option "LOOKUP_SHARED" has been added. It defaults to off so this patch can be turned on for
testing, and should eventually go away once it is proven to be stable. I have personally been running this
patch for over a year now, so it is believed to be fully stable.

Reviewed by: jake, obrien
Approved by: jake


# 89e1164e 05-Mar-2002 Robert Watson <rwatson@FreeBSD.org>

Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to
be safe.


# b0ad6e20 05-Mar-2002 Robert Watson <rwatson@FreeBSD.org>

The change from td->td_proc->p_ucred to td->td_ucred has shortened some
lines: more agressively line wrap under those circumstances.


# bdd67d48 27-Feb-2002 John Baldwin <jhb@FreeBSD.org>

- Change namei() to use td_ucred instead of p_ucred.
- Change the hack in access() that uses a temporary credential to set
td_ucred to the temp cred instead of p_ucred.


# a854ed98 27-Feb-2002 John Baldwin <jhb@FreeBSD.org>

Simple p_ucred -> td_ucred changes to start using the per-thread ucred
reference.


# 1ea030d8 10-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Make sure to hold vnode lock when calling into VOP_GETATTR().

Discussed with: mckusick, phk


# c0a9dc83 10-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Make sure to grab vnode lock on a vnode before calling VOP_GETATTR()
to perform an ownership test in revoke(). This is also required for
MAC hooks so that the vnode lock is held during a call to the MAC
framework. Release the lock before calling VOP_REVOKE().

Discussed with: phk, mckusick


# 56e04d01 09-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Remove a stray 'const' that slept into extattr_set_vp(), and could
result in compiler warnings.


# 74237f55 09-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Part I: Update extended attribute API and ABI:

o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
as not to use the scatter gather API (which appeared not to be used
by any consumers, and be less portable), rather, accepts 'data'
and 'nbytes' in the style of other simple read/write interfaces.
This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
a size_t. When performing a read, the number of bytes read will
be returned, unless the data pointer is NULL, in which case the
number of bytes of data are returned. This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
argument so as to return the size, if desirable. If set to NULL,
the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable. More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 143bb598 07-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

o Merge various recent fixes from the MAC branch relating to extattrctl():
- Fix null-pointer dereference introduced when snapshotting
was introduced. This occured because unlike the previous code,
vn_start_write() doesn't always return a non-NULL mp, as
filesystems may not support the VOP_GETWRITEMOUNT() call. For
now, rely on two pointers, so that vn_finished_write() works
properly.
- Fix locking problems on exit, introduced at some past time,
some when snapshots came in, where a vnode might not be
unlocked before being vrele'd in various error situations.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 079b7bad 07-Feb-2002 Julian Elischer <julian@FreeBSD.org>

Pre-KSE/M3 commit.
this is a low-functionality change that changes the kernel to access the main
thread of a process via the linked list of threads rather than
assuming that it is embedded in the process. It IS still embeded there
but remove all teh code that assumes that in preparation for the next commit
which will actually move it out.

Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,


# b7184973 01-Feb-2002 Alfred Perlstein <alfred@FreeBSD.org>

Don't recurse on filedesc lock in chroot_refuse_vdir_fds().

Noticed by: Michael Nottebrock <michaelnottebrock@gmx.net>


# a4db4953 13-Jan-2002 Alfred Perlstein <alfred@FreeBSD.org>

Replace ffind_* with fget calls.

Make fget MPsafe.

Make fgetvp and fgetsock use the fget subsystem to reduce code bloat.

Push giant down in fpathconf().


# 426da3bc 13-Jan-2002 Alfred Perlstein <alfred@FreeBSD.org>

SMP Lock struct file, filedesc and the global file list.

Seigo Tanimura (tanimura) posted the initial delta.

I've polished it quite a bit reducing the need for locking and
adapting it for KSE.

Locks:

1 mutex in each filedesc
protects all the fields.
protects "struct file" initialization, while a struct file
is being changed from &badfileops -> &pipeops or something
the filedesc should be locked.

1 mutex in each struct file
protects the refcount fields.
doesn't protect anything else.
the flags used for garbage collection have been moved to
f_gcflag which was the FILLER short, this doesn't need
locking because the garbage collection is a single threaded
container.
could likely be made to use a pool mutex.

1 sx lock for the global filelist.

struct file * fhold(struct file *fp);
/* increments reference count on a file */

struct file * fhold_locked(struct file *fp);
/* like fhold but expects file to locked */

struct file * ffind_hold(struct thread *, int fd);
/* finds the struct file in thread, adds one reference and
returns it unlocked */

struct file * ffind_lock(struct thread *, int fd);
/* ffind_hold, but returns file locked */

I still have to smp-safe the fget cruft, I'll get to that asap.


# 1f493270 09-Jan-2002 Ian Dowse <iedowse@FreeBSD.org>

Change dounmount() to return EBUSY in the non-MNT_FORCE case if we
can't acquire the mnt_lock without blocking. Normally non-forced
unmount attempts return EBUSY quickly if any vnodes are active, so
this just extends that behaviour to cover the per-mount mnt_lock
too.


# 10cc6dff 03-Jan-2002 Stefan Eßer <se@FreeBSD.org>

Return EBADF in case some vnode field has been reset to a NULL pointer.
(There has been some discussion, whether ENOENT or EBADF is more
appropriate. I choose the latter, since the operation is not supported
on the file descriptor at that time, even if it was, immediately before.)

PR: 32681
Reviewed by: dillon, iedowse, ...
Approved by: nectar
MFC after: 3 days
(pending RE approval)


# 751a2cd0 05-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Define a new mount flag "MNT_JAILDEVFS"

Collect the magic combination of flags which can be updated into
a macro in sys/mount.h rather than inlining them (twice!) in
vfs_syscalls.c


# 6b8bd2ef 04-Nov-2001 Matthew Dillon <dillon@FreeBSD.org>

Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather then piecemeal later on. mnt_nvnodelist
currently holds all the vnodes under the mount point. This will eventually
be split into a 'dirty' and 'clean' list. This way we only break kld's once
rather then twice. nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.


# cd778f02 02-Nov-2001 Robert Watson <rwatson@FreeBSD.org>

o Remove the local temporary variable "struct proc *p" from vfs_mount()
in vfs_syscalls.c. Although it did save some indirection, many of
those savings will be obscured with the impending commit of suser()
changes, and the result is increased code complexity. Also, once
p->p_ucred and td->td_ucred are distinguished, this will make
vfs_mount() use the correct thread credential, rather than the
process credential.


# 0bd1a2d0 02-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Argh!

patch added the nmount at the bottom first time around.

Take 3!


# bad69977 02-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Add empty shell for nmount syscall (take 2!)


# 06d133c4 02-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Add nmount() stub function and regenerate the syscall-glue which should
not need to check in generated files.


# a06fe511 24-Oct-2001 Matthew Dillon <dillon@FreeBSD.org>

unwind v_writecount in fhopen() if we are unable to allocate the
descriptor.

MFC after: 3 days


# c72ccd01 22-Oct-2001 Matthew Dillon <dillon@FreeBSD.org>

Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after: 3 days


# c6ab2f6b 01-Oct-2001 Robert Watson <rwatson@FreeBSD.org>

o Complete the migration from suser error checking in the following form
in vfs_syscalls.c:

if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid &&
(error = suser_td(td)) != 0) {
unwrap_lots_of_stuff();
return (error);
}

to:

if (mp->mnt_stat.f_owner != p->p_ucred->cr_uid) {
error = suser_td(td);
if (error) {
unwrap_lots_of_stuff();
return (error);
}
}

This makes the code more readable when complex clauses are in use,
and minimizes conflicts for large outstanding patchsets modifying the
kernel authorization code (of which I have several), especially where
existing authorization and context code are combined in the same if()
conditional.

Obtained from: TrustedBSD Project


# b4799065 21-Sep-2001 Robert Watson <rwatson@FreeBSD.org>

o vpaccess() -> vn_access() -- Peter reminds me that there is already
a convention for vnop helper routines of this sort.

Submitted by: Mr Wemm <peter>


# 9c94f773 21-Sep-2001 Robert Watson <rwatson@FreeBSD.org>

o Introduce eaccess(2), a version of access(2) that uses the effective
credentials rather than the real credentials. This is useful for
implementing GUI's which need to modify icons based on access rights,
but where use of open(2) is too expensive, use of stat(2) doesn't
reflect the file system's real protection model, and use of
access() suffers from real/effective credential confusion. This
implementation provides the same semantics as the call of the same
name on SCO OpenServer. Note: using this call improperly can
leave you subject to some of the same races present in the
access(2) call.
o To implement this, break out the basic logic of access(2) into
vpaccess(), which accepts a passed credential to perform the
invocation of VOP_ACCESS(). Add eaccess(2) to invoke vpaccess(),
and modify access(2) to use vpaccess().

Obtained from: TrustedBSD Project


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 63347f1e 29-Aug-2001 Andrey A. Chernov <ache@FreeBSD.org>

lseek: simplify overflow checks


# c4778eed 26-Aug-2001 Andrey A. Chernov <ache@FreeBSD.org>

Cosmetique & style fixes from bde


# db106eff 23-Aug-2001 Andrey A. Chernov <ache@FreeBSD.org>

lseek: fix check for vattr.va_size overflow. Check suggested by bde simple not
works with unsigned types.


# b82f5b62 23-Aug-2001 Andrey A. Chernov <ache@FreeBSD.org>

Cosmetique: more <sys/*> into one group, separate include families by
blank line


# 383f169d 21-Aug-2001 Andrey A. Chernov <ache@FreeBSD.org>

Make lseek() POSIXed: for non character special files

1) handle off_t overflow with EOVERFLOW
2) handle negative offsets with EINVAL

Reviewed by: arch discussion


# 8774836b 20-Aug-2001 Ian Dowse <iedowse@FreeBSD.org>

Avoid sleeping while holding a mutex in dounmount(). This problem
has existed for a long time, but I made it worse a few months ago
by by adding calls to VFS_ROOT() and checkdirs() in revision 1.179.

Also, remove the LK_REENABLE flag in the lockmgr() call; this flag
has been ignored by the lockmgr code for 4 years. This was the only
remaining mention of it apart from its definition.

Reviewed by: jhb


# a9a8ba3d 10-Aug-2001 Ian Dowse <iedowse@FreeBSD.org>

Arbitrarily limit to 64k the number of bytes that can be read at
a time using the ogetdirentries() compatibility syscall. This is a
hack to ensure that rediculous values don't get passed to MALLOC().

Reviewed by: kris


# f0cc1c6f 09-Jul-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Constify the fstype argument to vfs_mount(). This eliminates at least one
"call discards qualifier" warning (in sys/compat/linux/linux_file.c).


# 0cddd8f0 04-Jul-2001 Matthew Dillon <dillon@FreeBSD.org>

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# c0a0fb85 06-Jun-2001 Thomas Moestl <tmm@FreeBSD.org>

Fix an instance of NDINIT in the extattrctl syscall: LOCKLEAF was or'ed
to the operation parameter, not to the flags as it should be.

Reviewed by: rwatson


# b1fc0ec1 25-May-2001 Robert Watson <rwatson@FreeBSD.org>

o Merge contents of struct pcred into struct ucred. Specifically, add the
real uid, saved uid, real gid, and saved gid to ucred, as well as the
pcred->pc_uidinfo, which was associated with the real uid, only rename
it to cr_ruidinfo so as not to conflict with cr_uidinfo, which
corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing
original macro that pointed.
p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred,
p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo,
cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize
cr_ruidinfo there.
o Restruction many credential modification chunks to always crdup while
we figure out locking and optimizations; generally speaking, this
means moving to a structure like this:
newcred = crdup(oldcred);
...
p->p_ucred = newcred;
crfree(oldcred);
It's not race-free, but better than nothing. There are also races
in sys_process.c, all inter-process authorization, fork, exec, and
exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid;
remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and
use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to
pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and
allocation.
o Clean up ktrcanset() to take into account changes, and move to using
suser_xxx() instead of performing a direct uid==0 comparision.
o Improve commenting in various kern_prot.c credential modification
calls to better document current behavior. In a couple of places,
current behavior is a little questionable and we need to check
POSIX.1 to make sure it's "right". More commenting work still
remains to be done.
o Update credential management calls, such as crfree(), to take into
account new ruidinfo reference.
o Modify or add the following uid and gid helper routines:
change_euid()
change_egid()
change_ruid()
change_rgid()
change_svuid()
change_svgid()
In each case, the call now acts on a credential not a process, and as
such no longer requires more complicated process locking/etc. They
now assume the caller will do any necessary allocation of an
exclusive credential reference. Each is commented to document its
reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes
and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's
questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not
processes and pcreds. Note that this authorization, as well as
CANSIGIO(), needs to be updated to use the p_cansignal() and
p_cansched() centralized authorization routines, as they currently
do not take into account some desirable restrictions that are handled
by the centralized routines, as well as being inconsistent with other
similar authorization instances.
o Update libkvm to take these changes into account.

Obtained from: TrustedBSD Project
Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit


# bdc60f5b 23-May-2001 John Baldwin <jhb@FreeBSD.org>

Don't release Giant around vm_oject_page_clean() in fsync() as the pager
putpages called will need Giant.


# 99d300a1 23-May-2001 Ruslan Ermilov <ru@FreeBSD.org>

- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file
systems were repo-copied from sys/miscfs to sys/fs.

- Renamed the following file systems and their modules:
fdesc -> fdescfs, portal -> portalfs, union -> unionfs.

- Renamed corresponding kernel options:
FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.

- Install header files for the above file systems.

- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland
Makefiles.


# 23955314 18-May-2001 Alfred Perlstein <alfred@FreeBSD.org>

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 60fb0ce3 28-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Revert consequences of changes to mount.h, part 2.

Requested by: bde


# d98dc34f 23-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Correct #includes to work with fixed sys/mount.h.


# fec605c8 31-Mar-2001 Robert Watson <rwatson@FreeBSD.org>

o Introduce extattr_{delete,get,set}_fd() to allow extended attribute
operations on file descriptors, which complement the existing set of
calls, extattr_{delete,get,set}_file() which act on paths. In doing
so, restructure the system call implementation such that the two sets
of functions share most of the relevant code, rather than duplicating
it. This pushes the vnode locking into the shared code, but keeps
the copying in of some arguments in the system call code. Allowing
access via file descriptors reduces the opportunity for race
conditions when managing extended attributes.

Obtained from: TrustedBSD Project


# 1005a129 28-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Convert the allproc and proctree locks from lockmgr locks to sx locks.


# 0abc15fd 20-Mar-2001 Bruce Evans <bde@FreeBSD.org>

Fixed breakage of access() in rev.1.164. Wrong credentials were used for
the final path component.


# 30632071 18-Mar-2001 Robert Watson <rwatson@FreeBSD.org>

o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.

Submitted by: jkh
Obtained from: TrustedBSD Project


# 70f36851 14-Mar-2001 Robert Watson <rwatson@FreeBSD.org>

o Change the API and ABI of the Extended Attribute kernel interfaces to
introduce a new argument, "namespace", rather than relying on a first-
character namespace indicator. This is in line with more recent
thinking on EA interfaces on various mailing lists, including the
posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces
are defined by default, EXTATTR_NAMESPACE_SYSTEM and
EXTATTR_NAMESPACE_USER, where the primary distinction lies in the
access control model: user EAs are accessible based on the normal
MAC and DAC file/directory protections, and system attributes are
limited to kernel-originated or appropriately privileged userland
requests.

o These API changes occur at several levels: the namespace argument is
introduced in the extattr_{get,set}_file() system call interfaces,
at the vnode operation level in the vop_{get,set}extattr() interfaces,
and in the UFS extended attribute implementation. Changes are also
introduced in the VFS extattrctl() interface (system call, VFS,
and UFS implementation), where the arguments are modified to include
a namespace field, as well as modified to advoid direct access to
userspace variables from below the VFS layer (in the style of recent
changes to mount by adrian@FreeBSD.org). This required some cleanup
and bug fixing regarding VFS locks and the VFS interface, as a vnode
pointer may now be optionally submitted to the VFS_EXTATTRCTL()
call. Updated documentation for the VFS interface will be committed
shortly.

o In the near future, the auto-starting feature will be updated to
search two sub-directories to the ".attribute" directory in appropriate
file systems: "user" and "system" to locate attributes intended for
those namespaces, as the single filename is no longer sufficient
to indicate what namespace the attribute is intended for. Until this
is committed, all attributes auto-started by UFS will be placed in
the EXTATTR_NAMESPACE_SYSTEM namespace.

o The default POSIX.1e attribute names for ACLs and Capabilities have
been updated to no longer include the '$' in their filename. As such,
if you're using these features, you'll need to rename the attribute
backing files to the same names without '$' symbols in front.

o Note that these changes will require changes in userland, which will
be committed shortly. These include modifications to the extended
attribute utilities, as well as to libutil for new namespace
string conversion routines. Once the matching userland changes are
committed, a buildworld is recommended to update all the necessary
include files and verify that the kernel and userland environments
are in sync. Note: If you do not use extended attributes (most people
won't), upgrading is not imperative although since the system call
API has changed, the new userland extended attribute code will no longer
compile with old include files.

o Couple of minor cleanups while I'm there: make more code compilation
conditional on FFS_EXTATTR, which should recover a bit of space on
kernels running without EA's, as well as update copyright dates.

Obtained from: TrustedBSD Project


# 2aa33d2f 06-Mar-2001 John Baldwin <jhb@FreeBSD.org>

Check to see if p_fd is NULL before derferencing it in checkdirs(). It's
possible for us to see a process in the early stages of fork before p_fd
has been initialized. Ideally, we wouldn't stick a process on the allproc
list until it was fully created however.


# fbedc117 02-Mar-2001 Adrian Chadd <adrian@FreeBSD.org>

Mismatched MFSNAMELEN and MNAMELEN with fstype / fspath.

Submitted by: Naoki Kobayashi <shibata@geo.titech.ac.jp>


# f3a90da9 01-Mar-2001 Adrian Chadd <adrian@FreeBSD.org>

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since its a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.


# a90ef2ae 28-Feb-2001 Ian Dowse <iedowse@FreeBSD.org>

The kernel did not hold a vnode reference associated with the
`rootvnode' pointer, but vfs_syscalls.c's checkdirs() assumed that
it did. This bug reliably caused a panic at reboot time if any
filesystem had been mounted directly over /.

The checkdirs() function is called at mount time to find any process
fd_cdir or fd_rdir pointers referencing the covered mountpoint
vnode. It transfers these to point at the root of the new filesystem.
However, this process was not reversed at unmount time, so processes
with a cwd/root at a mount point would unexpectedly lose their
cwd/root following a mount-unmount cycle at that mountpoint.

This change should fix both of the above issues. Start_init() now
holds an extra vnode reference corresponding to `rootvnode', and
dounmount() releases this reference when the root filesystem is
unmounted just before reboot. Dounmount() now undoes the actions
taken by checkdirs() at mount time; any process cdir/rdir pointers
that reference the root vnode of the unmounted filesystem are
transferred to the now-uncovered vnode.

Reviewed by: bde, phk


# 91421ba2 20-Feb-2001 Robert Watson <rwatson@FreeBSD.org>

o Move per-process jail pointer (p->pr_prison) to inside of the subject
credential structure, ucred (cr->cr_prison).
o Allow jail inheritence to be a function of credential inheritence.
o Abstract prison structure reference counting behind pr_hold() and
pr_free(), invoked by the similarly named credential reference
management functions, removing this code from per-ABI fork/exit code.
o Modify various jail() functions to use struct ucred arguments instead
of struct proc arguments.
o Introduce jailed() function to determine if a credential is jailed,
rather than directly checking pointers all over the place.
o Convert PRISON_CHECK() macro to prison_check() function.
o Move jail() function prototypes to jail.h.
o Emulate the P_JAILED flag in fill_kinfo_proc() and no longer set the
flag in the process flags field itself.
o Eliminate that "const" qualifier from suser/p_can/etc to reflect
mutex use.

Notes:

o Some further cleanup of the linux/jail code is still required.
o It's now possible to consider resolving some of the process vs
credential based permission checking confusion in the socket code.
o Mutex protection of struct prison is still not present, and is
required to protect the reference count plus some fields in the
structure.

Reviewed by: freebsd-arch
Obtained from: TrustedBSD Project


# c3d7bcdf 16-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Introduce copyinfrom and copyinstrfrom, which can copy data from either
user or kernel space. This will allow layering of os-compat (e.g.: linux)
system calls. Apply the changes to mount.


# 9ed346ba 08-Feb-2001 Bosko Milekic <bmilekic@FreeBSD.org>

Change and clean the mutex lock interface.

mtx_enter(lock, type) becomes:

mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

similarily, for releasing a lock, we now have:

mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN.
We change the caller interface for the two different types of locks
because the semantics are entirely different for each case, and this
makes it explicitly clear and, at the same time, it rids us of the
extra `type' argument.

The enter->lock and exit->unlock change has been made with the idea
that we're "locking data" and not "entering locked code" in mind.

Further, remove all additional "flags" previously passed to the
lock acquire/release routines with the exception of two:

MTX_QUIET and MTX_NOSWITCH

The functionality of these flags is preserved and they can be passed
to the lock/unlock routines by calling the corresponding wrappers:

mtx_{lock, unlock}_flags(lock, flag(s)) and
mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN
locks, respectively.

Re-inline some lock acq/rel code; in the sleep lock case, we only
inline the _obtain_lock()s in order to ensure that the inlined code
fits into a cache line. In the spin lock case, we inline recursion and
actually only perform a function call if we need to spin. This change
has been made with the idea that we generally tend to avoid spin locks
and that also the spin locks that we do have and are heavily used
(i.e. sched_lock) do recurse, and therefore in an effort to reduce
function call overhead for some architectures (such as alpha), we
inline recursion for this case.

Create a new malloc type for the witness code and retire from using
the M_DEV type. The new type is called M_WITNESS and is only declared
if WITNESS is enabled.

Begin cleaning up some machdep/mutex.h code - specifically updated the
"optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN
and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently
need those.

Finally, caught up to the interface changes in all sys code.

Contributors: jake, jhb, jasone (in no particular order)


# c0c25570 12-Dec-2000 Jake Burkholder <jake@FreeBSD.org>

- Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.


# 7cc0979f 08-Dec-2000 David Malone <dwmalone@FreeBSD.org>

Convert more malloc+bzero to malloc+M_ZERO.

Submitted by: josh@zipperup.org
Submitted by: Robert Drehmel <robd@gmx.net>


# 553629eb 22-Nov-2000 Jake Burkholder <jake@FreeBSD.org>

Protect the following with a lockmgr lock:

allproc
zombproc
pidhashtbl
proc.p_list
proc.p_hash
nextpid

Reviewed by: jhb
Obtained from: BSD/OS and netbsd


# 279d7226 18-Nov-2000 Matthew Dillon <dillon@FreeBSD.org>

This patchset fixes a large number of file descriptor race conditions.
Pre-rfork code assumed inherent locking of a process's file descriptor
array. However, with the advent of rfork() the file descriptor table
could be shared between processes. This patch closes over a dozen
serious race conditions related to one thread manipulating the table
(e.g. closing or dup()ing a descriptor) while another is blocked in
an open(), close(), fcntl(), read(), write(), etc...

PR: kern/11629
Discussed with: Alexander Viro <viro@math.psu.edu>


# 1d7e3e42 02-Nov-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Take VBLK devices further out of their missery.

This should fix the panic I introduced in my previous commit on this topic.


# 35e0e5b3 20-Oct-2000 John Baldwin <jhb@FreeBSD.org>

Catch up to moving headers:
- machine/ipl.h -> sys/ipl.h
- machine/mutex.h -> sys/mutex.h


# a18b1f1d 03-Oct-2000 Jason Evans <jasone@FreeBSD.org>

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


# 98d39ed4 14-Sep-2000 Eivind Eklund <eivind@FreeBSD.org>

Add function comments for functions missing them


# 1d95078a 14-Sep-2000 Eivind Eklund <eivind@FreeBSD.org>

Blow away COMPAT_43 support for mount


# 9ff5ce6b 12-Sep-2000 Boris Popov <bp@FreeBSD.org>

Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.
They will be used by nullfs and other stacked filesystems to support full
cache coherency.

Reviewed in general by: mckusick, dillon


# b4d0de58 04-Sep-2000 Robert Watson <rwatson@FreeBSD.org>

o Remove commented out code which modified return values from
extattr_{get,set} syscalls in the face of partial reads or writes.

Obtained from: TrustedBSD Project


# 8577117c 01-Sep-2000 Don Lewis <truckman@FreeBSD.org>

access() shouldn't diddle with the contents of a potentially shared
credential. Create a temporary copy of the current credential and
modify the copy.

Submitted by: tegge


# 3c2498c0 08-Aug-2000 Tor Egge <tegge@FreeBSD.org>

Don't set flags on the mount structure before all permission checks have
been done.

Don't allow multiple mount operations with MNT_UPDATE at the same
time on the same mount point. When the first mount operation
completed, MNT_UPDATE was cleared in the mount structure, causing
the second to complete as if it was a no-update mount operation
with the following bad side effects:

- mount structure inserted multiple times onto the mountlist
- vp->v_mountedhere incorrectly set, causing next namei
operation walking into the mountpoint to crash with
a locking against myself panic.

Plug a vnode leak in case vinvalbuf fails.


# fc3345a4 28-Jul-2000 Robert Watson <rwatson@FreeBSD.org>

o Modify extattr_{set,get}() syscalls so that partial reads and writes
with an error condition such as EINTR, EWOULDBLOCK, and ERESTART,
are reported to the application, not silently conceal. This
behavior was copied from the {read,write}v() syscalls, and is
appropriate there but not here.
o Correct a bug in extattr_delete() wherein the LOCKLEAF flag is
passed to the wrong argument in namei(), resulting in some
unexpected errors during name resolution, and passing in an unlocked
vnode.

Obtained from: TrustedBSD Project


# 3ce7b7aa 26-Jul-2000 Robert Watson <rwatson@FreeBSD.org>

o Lock vnode before calling extattr_* VOP's, and modify vnode spec to
allow for that.
o Remember to call NDFREE() if exiting as a result of a failed
vn_start_write() when snapshotting.

Reviewed by: mckusick
Obtained from: TrustedBSD Project


# aec3bbe1 24-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Do not need vrele(nd.ni_vp) as that is done by NDFREE(&nd, 0);

Submitted by: Peter Holm <pho@freebsd.org>


# f2a2857b 11-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


# e6796b67 03-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Move the truncation code out of vn_open and into the open system call
after the acquisition of any advisory locks. This fix corrects a case
in which a process tries to open a file with a non-blocking exclusive
lock. Even if it fails to get the lock it would still truncate the
file even though its open failed. With this change, the truncation
is done only after the lock is successfully acquired.

Obtained from: BSD/OS


# 3275cf73 03-Jul-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES,
that is way cleaner than using the softupdates_stub stunt, which
should be killed when convenient.

Discussed with: mckusick


# 6c66bbed 29-Jun-2000 Archie Cobbs <archie@FreeBSD.org>

Move the securelevel check before loading KLD's into linker_load_file(),
instead of requiring every caller of linker_load_file() to perform the
check itself. This avoids netgraph loading KLD's when securelevel > 0,
not to mention any future code that may call linker_load_file().

Reviewed by: dfr


# 7c50d772 16-Jun-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Revert part of my bioops change which implemented panic(8).


# a2e7a027 16-Jun-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Virtualizes & untangles the bioops operations vector.

Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@


# 9626b608 05-May-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


# 36e9f877 28-Mar-2000 Matthew Dillon <dillon@FreeBSD.org>

Commit major SMP cleanups and move the BGL (big giant lock) in the
syscall path inward. A system call may select whether it needs the MP
lock or not (the default being that it does need it).

A great deal of conditional SMP code for various deadended experiments
has been removed. 'cil' and 'cml' have been removed entirely, and the
locking around the cpl has been removed. The conditional
separately-locked fast-interrupt code has been removed, meaning that
interrupts must hold the CPL now (but they pretty much had to anyway).
Another reason for doing this is that the original separate-lock for
interrupts just doesn't apply to the interrupt thread mechanism being
contemplated.

Modifications to the cpl may now ONLY occur while holding the MP
lock. For example, if an otherwise MP safe syscall needs to mess with
the cpl, it must hold the MP lock for the duration and must (as usual)
save/restore the cpl in a nested fashion.

This is precursor work for the real meat coming later: avoiding having
to hold the MP lock for common syscalls and I/O's and interrupt threads.
It is expected that the spl mechanisms and new interrupt threading
mechanisms will be able to run in tandem, allowing a slow piecemeal
transition to occur.

This patch should result in a moderate performance improvement due to
the considerable amount of code that has been removed from the critical
path, especially the simplification of the spl*() calls. The real
performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch


# bd5f5da9 09-Jan-2000 Kirk McKusick <mckusick@FreeBSD.org>

Add bwillwrite to all system calls that create things in the filesystem.
Benchmarks that create huge trees of empty files overwhelm the buffer cache.


# 91f37dcb 18-Dec-1999 Robert Watson <rwatson@FreeBSD.org>

Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by: eivind


# 762e6b85 15-Dec-1999 Eivind Eklund <eivind@FreeBSD.org>

Introduce NDFREE (and remove VOP_ABORTOP)


# 3854a87e 11-Dec-1999 Matthew Dillon <dillon@FreeBSD.org>

Remove accidental pollution unrelated to previous commit. The issue
here is real but has not yet been discussed with Eivind.


# 4f79d873 11-Dec-1999 Matthew Dillon <dillon@FreeBSD.org>

Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
madvise().

This feature prevents the update daemon from gratuitously flushing
dirty pages associated with a mapped file-backed region of memory. The
system pager will still page the memory as necessary and the VM system
will still be fully coherent with the filesystem. Modifications made
by other means to the same area of memory, for example by write(), are
unaffected. The feature works on a page-granularity basis.

MAP_NOSYNC allows one to use mmap() to share memory between processes
without incuring any significant filesystem overhead, putting it in
the same performance category as SysV Shared memory and anonymous memory.

Reviewed by: julian, alc, dg


# 0429e37a 20-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967


# 91921bd5 18-Nov-1999 Matthew Dillon <dillon@FreeBSD.org>

Ensure that garbage from the kernel stack does not wind up being
returned to user mode in the spare fields of the stat structure.

PR: kern/14966
Reviewed by: dillon@freebsd.org
Submitted by: Kelly Yancey kbyanc@posi.net


# 1b727751 16-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Commit the remaining part of PR14914:

Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
structures for list operations. This patch makes all list operations
in sys/kern use the queue(3) macros, rather than directly accessing the
*Q_{HEAD,ENTRY} structures.

Reviewed by: phk
Submitted by: Jake Burkholder <jake@checker.org>
PR: 14914


# dd8c04f4 13-Nov-1999 Eivind Eklund <eivind@FreeBSD.org>

Remove WILLRELE from VOP_SYMLINK

Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.


# 020024f3 13-Nov-1999 Eivind Eklund <eivind@FreeBSD.org>

Fix style bugs from last commit


# edfe736d 11-Nov-1999 Eivind Eklund <eivind@FreeBSD.org>

Remove WILLRELE from VOP_RENAME


# 5b42dac8 31-Oct-1999 Julian Elischer <julian@FreeBSD.org>

Most modern OSs have the ability to flag certain mounts as ones to
be ignored by default by the df(1) program. This is used mostly to
avoid stat()-ing entries that do not represent "real" disk mount
points (such as those made by an automounter such as amd.) It is
also useful not to have to stat() these entries because it takes
longer to report them that for other file systems, being that these
mount points are served by a user-level file server and resulting in
several context switches. Worse, if the automounter is down
unexpectedly, a causal df(1) will hang in an interruptible way.

PR: kern/9764
Submitted by: Erez Zadok <ezk@cs.columbia.edu>


# d1f088da 11-Oct-1999 Peter Wemm <peter@FreeBSD.org>

Trim unused options (or #ifdef for undoc options).

Submitted by: phk


# 3b6fb885 02-Oct-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Before we start to mess with the VFS name-cache clean things up a little bit:
Isolate the namecache in its own file, and give it a dedicated malloc type.


# 1b5464ef 29-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Remove v_maxio from struct vnode.

Replace it with mnt_iosize_max in struct mount.

Nits from: bde


# 2fe5bd8b 25-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Fix a hole in jail(2).

Noticed by: Alexander Bezroutchko <abb@zenon.net>


# c24fda81 10-Sep-1999 Alfred Perlstein <alfred@FreeBSD.org>

Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from: NetBSD


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# dbafb366 26-Aug-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Simplify the handling of VCHR and VBLK vnodes using the new dev_t:

Make the alias list a SLIST.

Drop the "fast recycling" optimization of vnodes (including
the returning of a prexisting but stale vnode from checkalias).
It doesn't buy us anything now that we don't hardlimit
vnodes anymore.

Rename checkalias2() and checkalias() to addalias() and
addaliasu() - which takes dev_t and udev_t arg respectively.

Make the revoke syscalls use vcount() instead of VALIASED.

Remove VALIASED flag, we don't need it now and it is faster
to traverse the much shorter lists than to maintain the
flag.

vfs_mountedon() can check the dev_t directly, all the vnodes
point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED;


# af255dc5 22-Aug-1999 John Polstra <jdp@FreeBSD.org>

Go back to using microtime() to get the timestamps for {f,l,}utimes(path,
NULL) for now. Bruce says I jumped the gun with my change in
revision 1.131, or maybe it should use nanotime(), or maybe it
shouldn't be decided in the VFS layer at all. I'm leaving it with
the old behavior until the Trans-Pacific Internet Vulcan Mind Meld
yields fuller understanding.


# 4f2a0d4f 21-Aug-1999 John Polstra <jdp@FreeBSD.org>

Use the new vfs_timestamp() function to create the timestamps used
by utimes(path, NULL). This gives them the same precision as the
timestamps produced by write operations. Do likewise for lutimes()
and futimes().

Suggested by bde.


# f4af31cb 12-Aug-1999 Alfred Perlstein <alfred@FreeBSD.org>

Replace a redundant vfs_object_create() call (already done in vn_open)
with a KASSERT.

Reviewed by: Eivind, Alan Cox


# e32c66c5 04-Aug-1999 Brian Feldman <green@FreeBSD.org>

Fix fd race conditions (during shared fd table usage.) Badfileops is
now used in f_ops in place of NULL, and modifications to the files
are more carefully ordered. f_ops should also be set to &badfileops
upon "close" of a file.

This does not fix other problems mentioned in this PR than the first
one.

PR: 11629
Reviewed by: peter


# 711103c1 03-Aug-1999 Warner Losh <imp@FreeBSD.org>

o Typo in prior version kept it from compiling (blush).

Noticed by: Nobody!

o Add comment about why we restrict chflags to root for devices.
o nit noticed by bde wrt return values.


# e82ef978 03-Aug-1999 Warner Losh <imp@FreeBSD.org>

brucify:
o use suser_xxx rather than suser to support JAIL code.
o KNF comment convention
o use vp->type rather than vaddr.type and eliminate call to
VOP_GETATTR. Bruce says that vp->type is valid at this
point.

Submitted by: bde.

Not fixed:
o return (value)
o Comment needs to be longer and more explicit. It will be after
the advisory.


# f76f09c1 02-Aug-1999 Warner Losh <imp@FreeBSD.org>

Only allow root to set file flags on devices.


# ab533dd0 29-Jul-1999 Brian Feldman <green@FreeBSD.org>

lutimes() bug: FOLLOW should be NOFOLLOW for this one.

Submitted by: Dan Nelson <dnelson@emsphone.com>


# 67452993 26-Jul-1999 Alan Cox <alc@FreeBSD.org>

Add sysctl and support code to allow directories to be VMIO'd. The default
setting for the sysctl is OFF, which is the historical operation.

Submitted by: dillon


# 75c13541 28-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

This Implements the mumbled about "Jail" feature.

This is a seriously beefed up chroot kind of thing. The process
is jailed along the same lines as a chroot does it, but with
additional tough restrictions imposed on what the superuser can do.

For all I know, it is safe to hand over the root bit inside a
prison to the customer living in that prison, this is what
it was developed for in fact: "real virtual servers".

Each prison has an ip number associated with it, which all IP
communications will be coerced to use and each prison has its own
hostname.

Needless to say, you need more RAM this way, but the advantage is
that each customer can run their own particular version of apache
and not stomp on the toes of their neighbors.

It generally does what one would expect, but setting up a jail
still takes a little knowledge.

A few notes:

I have no scripts for setting up a jail, don't ask me for them.

The IP number should be an alias on one of the interfaces.

mount a /proc in each jail, it will make ps more useable.

/proc/<pid>/status tells the hostname of the prison for
jailed processes.

Quotas are only sensible if you have a mountpoint per prison.

There are no privisions for stopping resource-hogging.

Some "#ifdef INET" and similar may be missing (send patches!)

If somebody wants to take it from here and develop it into
more of a "virtual machine" they should be most welcome!

Tools, comments, patches & documentation most welcome.

Have fun...

Sponsored by: http://www.rndassociates.com/
Run for almost a year by: http://www.servetheweb.com/


# f711d546 27-Apr-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Suser() simplification:

1:
s/suser/suser_xxx/

2:
Add new function: suser(struct proc *), prototyped in <sys/proc.h>.

3:
s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

The remaining suser_xxx() calls will be scrutinized and dealt with
later.

There may be some unneeded #include <sys/cred.h>, but they are left
as an exercise for Bruce.

More changes to the suser() API will come along with the "jail" code.


# cc7532aa 23-Mar-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Add a sysctl variable which can help stop chroot(2) escapes.

kern.chroot_allow_open_directories = 0
chroot(2) fails if there are open directories.

kern.chroot_allow_open_directories = 1 (default)
chroot(2) fails if there are open directories and the process
is subject of a previous chroot(2).

kern.chroot_allow_open_directories = anything else
filedescriptors are not checked. (old behaviour).

I'm very interested in reports about software which breaks when
running with the default setting.


# cb11191c 02-Mar-1999 Julian Elischer <julian@FreeBSD.org>

Slight cleanup of code resurected for union mounts..
Submitted by: Tony Finch <dot@dotat.at>


# 1871f6cd 27-Feb-1999 Julian Elischer <julian@FreeBSD.org>

Fix code for union mounts
Accidentally deleted by peter when he extracted the unionfs stuff in 1.109

Submitted by: Tony Finch <dot@dotat.at>


# a5c9bce7 25-Feb-1999 Bruce Evans <bde@FreeBSD.org>

Added a used #include (don't depend on "vnode_if.h" including <sys/buf.h>).


# ce02431f 16-Feb-1999 Doug Rabson <dfr@FreeBSD.org>

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


# 4e48a6bf 29-Jan-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Use suser() to determine super-user-ness.
Collapse some duplicated checks.

Reviewed by: bde


# 697457a1 28-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

Fix warnings related to -Wall -Wcast-qual


# d254af07 27-Jan-1999 Matthew Dillon <dillon@FreeBSD.org>

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 73a6265d 23-Jan-1999 Bruce Evans <bde@FreeBSD.org>

Go back to only supporting revoke() for bdevs and cdevs. It is very
buggy for fifos, and no one seems to have investigated its behaviour
on other types of files. It has been broken since the Lite2 merge
in rev.1.54.

Nagged about by: Brian Feldman (green@unixhelp.org)


# fb116777 05-Jan-1999 Eivind Eklund <eivind@FreeBSD.org>

Remove the 'waslocked' parameter to vfs_object_create().


# 4c016975 12-Dec-1998 Matthew Dillon <dillon@FreeBSD.org>

PR: kern/8965
Obtained from: Stephen Clawson <sclawson@cs.utah.edu>

Wakeup anyone waiting on a mount point prior to returning from umount,
whether an error occurs or not. Fixes a stat/NFS-umount race and other
potential future problems. Fix taken from bug/pr which also indicated
that the same fix has already been applied to OpenBSD and NetBSD.


# 02fc72db 03-Nov-1998 Peter Wemm <peter@FreeBSD.org>

make mount(2) automatically kldload modules if the requested filesystem
isn't present.


# 8c14bf40 03-Nov-1998 Peter Wemm <peter@FreeBSD.org>

Change the #ifdef UNION code into a callable hook. Arrange to have this
set up when unionfs is present, either statically or as a kld module.


# b421db37 31-Oct-1998 Peter Wemm <peter@FreeBSD.org>

The last argument to vm_object_page_clean() are now bit flags, rather than
the old true/false.

While here, have vfs_msync() only call vm_object_page_clean() with
OBJPC_SYNC if called with MNT_WAIT flags. vfs_msync() is called at unmount
time (with MNT_WAIT) and from the syncer process (formerly update).
This should make dirty mmap writebacks a little less nasty.

I have tested this a little with SOFTUPDATES enabled, but I don't normally
use it since I've been badly burned too many times.


# e266594c 24-Sep-1998 Luoqi Chen <luoqi@FreeBSD.org>

Eliminate a race in VOP_FSYNC() when softupdates is enabled.
Submitted by: Kirk McKusick <mckusick@McKusick.COM>
Two minor changes are also included,
1. Remove gratuitious checks for error return from vn_lock with LK_RETRY set,
vn_lock should always succeed in these cases.
2. Back out change rev. 1.36->1.37, which unnecessarily makes async mount
a little more unstable. It also keeps us in sync with other BSDs.
Suggested by: Bruce Evans <bde@zeta.org.au>


# a58915fc 09-Sep-1998 Tor Egge <tegge@FreeBSD.org>

Don't keep the underlying directory locked while performing the file
system specific VFS_MOUNT operation.
PR: 1067


# a23d65bf 14-Jul-1998 Bruce Evans <bde@FreeBSD.org>

Cast pointers to uintptr_t/intptr_t instead of to u_long/long,
respectively. Most of the longs should probably have been
u_longs, but this changes is just to prevent warnings about
casts between pointers and integers of different sizes, not
to fix poorly chosen types.


# e25169f2 02-Jul-1998 David Greenman <dg@FreeBSD.org>

Reset MNT_ASYNC flag if needed if unmount() should fail.
Submitted by: Paul Saab <paul@mu.org>


# 0d3dd8fb 08-Jun-1998 John Dyson <dyson@FreeBSD.org>

Remove some junk left over from a previous commit.
Submitted by: phk


# ecbb00a2 07-Jun-1998 Doug Rabson <dfr@FreeBSD.org>

This commit fixes various 64bit portability problems required for
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.


# 1f562172 10-May-1998 John Dyson <dyson@FreeBSD.org>

Fix the futimes/undelete/utrace conflict with other BSD's. Note that
the only common usage of utrace (the possible problem with this
commit) is with malloc, so this should be a real problem. Add
the various NetBSD syscalls that allow full emulation of their
development environment.


# 7be2d300 06-May-1998 Mike Smith <msmith@FreeBSD.org>

In the words of the submitter:

---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------

Submitted by: Michael Hancock <michaelh@cet.co.jp>


# 59bad7c5 19-Apr-1998 Dag-Erling Smørgrav <des@FreeBSD.org>

Backed out lseek changes.


# 25096724 18-Apr-1998 Dag-Erling Smørgrav <des@FreeBSD.org>

Return EINVAL and do not change file pointer if resulting offset is negative.
PR: kern/6184


# 5ddc8ded 08-Apr-1998 Wolfram Schneider <wosch@FreeBSD.org>

New mount option nosymfollow. If enabled, the kernel lookup()
function will not follow symbolic links on the mounted
file system and return EACCES (Permission denied).


# 006b9b7d 29-Mar-1998 John Dyson <dyson@FreeBSD.org>

Correct a significant problem with the softupdates port. Allow fsync
to work properly within the softupdates framework, and thereby eliminate
some unfortunate panics.


# b1897c19 08-Mar-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: WHistle development tree


# 8f9110f6 07-Mar-1998 John Dyson <dyson@FreeBSD.org>

This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code. These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances. Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code. This code might have been committed seperately, but
almost everything is interrelated.

1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in
kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent
(missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous
code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitious buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK
files in vfs_bio_awrite. When the code is functional, I'll add back
a cleaner version.
7) The page busy count wakeups assocated with the buffer cache usage were
incorrectly cleaned up in a previous commit by me. Revert to the
original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers
more aggressively (without breaking the heuristics) when it is presumed
that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The
delay loop waiting is not useful for filesystem locks, due to the
length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on
the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from
vm_map_clean.
15) Better support the notion of pages being busy but valid, so that
fewer in-transit waits occur. (use p->busy more for pageouts instead
of PG_BUSY.) Since the page is fully valid, it is still usable for
reads.
16) It is possible (in error) for cached pages to be busy. Make the
page allocation code handle that case correctly. (It should probably
be a printf or panic, but I want the system to handle coding errors
robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle
consistancy problems very well, so make the design a little less
lofty. After vm_page_sleep, if it ever blocked, it is still important
to relookup the page (if the object generation count changed), and
verify it's status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and it's descendents to support an int flag
instead of a boolean, so that we can pass down the invalidate bit.


# 9f24f214 14-Feb-1998 John Dyson <dyson@FreeBSD.org>

Make the rootdir handling more consistent. Now, processes always
have a root vnode associated with them, and no special checks for
the null case are needed.
Submitted by: terry@freebsd.org


# 3217023e 07-Feb-1998 John Dyson <dyson@FreeBSD.org>

Fix a problem with vn_lock in fsync.


# 0b08f5f7 05-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Back out DIAGNOSTIC changes.


# 47cfdb16 04-Feb-1998 Eivind Eklund <eivind@FreeBSD.org>

Turn DIAGNOSTIC into a new-style option.


# 95e5e988 05-Jan-1998 John Dyson <dyson@FreeBSD.org>

Make our v_usecount vnode reference count work identically to the
original BSD code. The association between the vnode and the vm_object
no longer includes reference counts. The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also. The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached. The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code. There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.


# 2be70f79 28-Dec-1997 John Dyson <dyson@FreeBSD.org>

Lots of improvements, including restructring the caching and management
of vnodes and objects. There are some metadata performance improvements
that come along with this. There are also a few prototypes added when
the need is noticed. Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.


# 675ea6f0 26-Dec-1997 Bruce Evans <bde@FreeBSD.org>

Unspammed nested include of <vm/vm_zone.h>.


# 5591b823d 16-Dec-1997 Eivind Eklund <eivind@FreeBSD.org>

Make COMPAT_43 and COMPAT_SUNOS new-style options.


# 52aef196 02-Dec-1997 Bruce Evans <bde@FreeBSD.org>

Cleaned up __getcwd(). This should be cosmetic except disabled calls
are now counted.

Reviewed by: phk


# 865737f4 21-Nov-1997 Bruce Evans <bde@FreeBSD.org>

Staticized.

Use OID_AUTO instead of a magic number for the debug.syncprt sysctl.
(This sysctl doesn't actually work. FreeBSD nuked it, but parts
of it were mismerged from Lite2. It is not very good, but better
than nothing.)


# d02601f8 21-Nov-1997 Bruce Evans <bde@FreeBSD.org>

Fixed rev.1.81. mp->mnt_kern_flag was restored in the non-error case of
`mount -u'. This only matters for `mount -u' competing with unmounts.
If I understand the locking correctly: if mount() blocks, then unmount()
may run and set mp->kern_flag for the same mp. Then unmount() blocks
waiting for mount() to finish. When unmount() continues, its MNTK flags
(MNTK_UNMOUNT and MNTK_MWAIT) may have been clobbered.

Didn't fix old bugs:
- restoring mp->mnt_kern_flag is wrong for the same reasons in the error
case.
- the error case of unmount() seems to be broken too:
(a) MNTK_UNMOUNT gets clobbered, although another unmount() may have
set it. Perhaps it shouldn't be set until after the full lock is
aquired.
(b) MNTK_MWAIT isn't honoured.

Fixed a nearby style bug.


# 52bf64c7 12-Nov-1997 Julian Elischer <julian@FreeBSD.org>

Reviewed by: hackers@freebsd.org in general
Obtained from: Whistle Communications tree

Add an option to the way UFS works dependent on the SUID bit of directories
This changes makes things a whole lot simpler on systems running as
fileservers for PCs and MACS. to enable the new code you must
1/ enable option SUIDDIR on the kernel.
2/ mount the filesystem with option suiddir.
hopefully this makes it difficult enough for people to
do this accidentally.
see the new chmod(2) man page for detailed info.


# b1f4a44b 11-Nov-1997 Julian Elischer <julian@FreeBSD.org>

Reviewed by: various.

Ever since I first say the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. it is not exported to userspace. I have moved
some of the non exported flags over to this word. this means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left for another day.


# cb226aaa 06-Nov-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.


# 1315bcf7 28-Oct-1997 Bruce Evans <bde@FreeBSD.org>

Fixed style bugs in open() fix.


# 1c1ff294 23-Oct-1997 KATO Takenori <kato@FreeBSD.org>

Disallow non-root mount. If you want to allow non-root mount, change
vfs.usermount into 1 with sysctl.


# 2094bd73 22-Oct-1997 Joerg Wunsch <joerg@FreeBSD.org>

Reject attempts to call open() with an illegal combination of O_RDONLY,
O_WRONLY, O_RDWR.


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of somthing :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# ad324c88 28-Sep-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Fix handling of nested mountpoints in __getcwd()

Detected by: Simon Shapiro <Shimon@i-Connect.Net>


# 81bca6dd 27-Sep-1997 KATO Takenori <kato@FreeBSD.org>

Clustered read and write are switched at mount-option level.

1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.

Reviewed by: bde


# 00544193 24-Sep-1997 Poul-Henning Kamp <phk@FreeBSD.org>

A couple of handles to tweak, more statistics.


# 99448ed1 20-Sep-1997 John Dyson <dyson@FreeBSD.org>

Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.


# 044839fb 16-Sep-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Don't leak memory, from sef.
Stylistic nits and a blunder, from bde.


# 7874d7a3 15-Sep-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Solve race-condition, return path in normal order.
A couple of stylistic nits from Bruce.

If your libc contains version 1.11 or 1.12 of getcwd.c, (ie: if
you recompiled libc one of the last couple of days):
>>> Recompile LIBC before you boot a new kernel <<<
A new libc will deal with both old and new kernels.


# d56f6402 15-Sep-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Deal more correctly with mountpoints.


# 7822f1c6 14-Sep-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Add a __getcwd() syscall. This is intentionally undocumented, but all
it does is to try to figure the pwd out from the vfs namecache, and
return a reversed string to it. libc:getcwd() is responsible for
flipping it back.


# e4ba6a82 02-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Removed unused #includes.


# f6b4c285 17-Jul-1997 Doug Rabson <dfr@FreeBSD.org>

Merge WebNFS support from NetBSD

Obtained from: NetBSD


# 42146e37 04-Apr-1997 Doug Rabson <dfr@FreeBSD.org>

[Previous comment was incorrect for these files]
Added calls to VFS lock debugging macros to make fixing filesystems' locking
easier.


# de15ef6a 04-Apr-1997 Doug Rabson <dfr@FreeBSD.org>

Add a function vop_sharedlock which a copy of vop_nolock without the
implementation #ifdef out. This can be used for now by NFS. As soon
as all the other filesystems' locking is fixed, this can go away.

Print the vnode address in vprint for easier debugging.


# 57862eed 30-Mar-1997 Peter Wemm <peter@FreeBSD.org>

Code to do lchown(2), copied from chown(2) except it's NOFOLLOW in ND_INIT
instead of FOLLOW.


# 6c14d95d 30-Mar-1997 Peter Wemm <peter@FreeBSD.org>

Treat symlinks as first class citizens with their own uid/gid rather than
as shadows of their containing directory. This should solve the problem
of users not being able to delete their symlinks from /tmp once and for
all.

Symlinks do not have modes though, they are accessable to everything that
can read the directory (as before). They are made to show this fact at
lstat time (they appear as mode 0777 always, since that's how the the
lookup routines in the kernel treat them).

More commits will follow, eg: add a real lchown() syscall and man pages.


# 8f89943e 23-Mar-1997 Guido van Rooij <guido@FreeBSD.org>

Add generation number randomization. Newly created filesystems wil now
automatically have random generation numbers. The kenel way of handling those
also changed. Further it is advised to run fsirand on all your nfs exported
filesystems. the code is mostly copied from OpenBSD, with the randomization
chanegd to use /dev/urandom
Reviewed by: Garrett
Obtained from: OpenBSD


# 3ac4d1ef 22-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place. Most things don't depend on file.h stuff at all.


# 3a558f83 04-Mar-1997 Mike Smith <msmith@FreeBSD.org>

Check that vp->v_mount is non-null in fsync() before dereferencing it to
obtain the mountpoint's MNT_ASYNC flag.

This is a Very Definite Last-Minute 2.2 Bugfix Candidate.

Reviewed by: sef


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 61f84e5b 12-Feb-1997 Mike Pritchard <mpp@FreeBSD.org>

Don't depend on FIFO being defined to enable mkfifo.
It is now always compiled.

Submitted by: bde


# 72a5ee14 12-Feb-1997 Mike Pritchard <mpp@FreeBSD.org>

Add function protypes for the new Lite2 unionfs functions.


# 820d8cf4 11-Feb-1997 Mike Pritchard <mpp@FreeBSD.org>

Comment out a call to the #ifdef DIAGNOSTIC routine
vfs_bufstats(). This routine was not imported in the
Lite2 merge.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# bb65f5a1 19-Dec-1996 Bruce Evans <bde@FreeBSD.org>

Fixed lseek() on named pipes. It always succeeded but should always fail.
Broke locking on named pipes in the same way as locking on non-vnodes
(wrong errno). This will be fixed later.

The fix involves negative logic. Named pipes are now distinguished from
other types of files with vnodes, and there is additional code to handle
vnodes and named pipes in the same way only where that makes sense (not
for lseek, locking or TIOCSCTTY).


# 030e2e9e 19-Sep-1996 Nate Williams <nate@FreeBSD.org>

In sys/time.h, struct timespec is defined as:

/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};

The correct names of the fields are tv_sec and tv_nsec.

Reminded by: James Drobina <jdrobina@infinet.com>


# b71fec07 03-Sep-1996 Bruce Evans <bde@FreeBSD.org>

Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel.
Include it directly in the few places where it is used.

Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or
nothing.


# 9e043042 03-Sep-1996 David Greenman <dg@FreeBSD.org>

Implemented kernel side of MNT_NOATIME mount option. This option disables
the file access time update on reads and can be useful in reducing
filesystem overhead in cases where the access time is not important (like
Usenet news spools).


# 472fe5e4 24-May-1996 Peter Wemm <peter@FreeBSD.org>

Dont allow directories to be link()ed or unlink()ed, even for root
(returns EPERM always, the errno is specified by POSIX).

If you really have a desperate need to link or unlink a directory, you
can use fsdb. :-)

This should stop any chance of ftpd, rdist, "rm -rf", etc from
bugging out and damaging the filesystem structure or loosing races
with malicious users.

Reviewed by: davidg, bde


# d03b4017 10-May-1996 Bruce Evans <bde@FreeBSD.org>

Hide options for emulators and static file systems in opt_dontuse.h.
These options only apply at config time. Using them at compile time
would break the corresponding lkms.


# c128d215 16-Jan-1996 David Greenman <dg@FreeBSD.org>

Make sure the mountpoint is marked busy before doing operations on it.
This fixes a panic that freefall suffered last night.

Obtained partially from 4.4-lite2, but minus the new bug that it introduced


# 692910e6 05-Jan-1996 Garrett Wollman <wollman@FreeBSD.org>

convert FDESC, KERNFS, NULLFS, PORTAL, UMAPFS, and UNION to the new
style of options.


# 27a0b398 17-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Staticize.
Unstaticize a function in scsi/scsi_base that was used, with an undocumented
option.
My last count on the LINT kernel shows:
Total symbols: 3647
unref symbols: 463
undef symbols: 4
1 ref symbols: 1751
2 ref symbols: 485
Approaching the pain threshold now.


# a316d390 10-Dec-1995 John Dyson <dyson@FreeBSD.org>

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


# efeaf95a 06-Dec-1995 David Greenman <dg@FreeBSD.org>

Untangled the vm.h include file spaghetti.


# 75a85811 18-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Fixed the errno returned by rename("dir1", "dir2/."). It was EISDIR
(duh); translate it to EINVAL which is the errno for other renames
to ".".


# 395e6735 14-Nov-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Change some of the debug sysctl vars. The semantics of these will change.


# c8cbd868 13-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Fixed a cast in olseek().

Fixed confusing order of declarations of getvnode()'s args.


# d2d3e875 11-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Included <sys/sysproto.h> to get central declarations for syscall args
structs and prototypes for syscalls.

Ifdefed duplicated decentralized declarations of args structs. It's
convenient to have this visible but they are hard to maintain. Some
are already different from the central declarations. 4.4lite2 puts
them in comments in the function headers but I wanted to avoid the
large changes for that.


# 4dc4e0d2 05-Nov-1995 John Dyson <dyson@FreeBSD.org>

Make MNT_ASYNC more effective for UFS. It should not be too much more
dangerous than the original MNT_ASYNC. There might be some minor
security considerations due to data writes not being posted as promptly
as before. Meta-data operations are still not quite as fast as Linux,
but streaming I/O is still higher.


# 046bc053 04-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Prototype getvnode() in the right place (where ibcs2_stat.c can see it).


# d68a4190 22-Oct-1995 David Greenman <dg@FreeBSD.org>

Moved the filesystem read-only check out of the syscalls and into the
filesystem layer, as was done in lite-2. Merged in some other cosmetic
changes while I was at it. Rewrote most of msdosfs_access() to be more
like ufs_access() and to include the FS read-only check.

Obtained from: partially from 4.4BSD-lite2


# ad7507e2 07-Oct-1995 Steven Wallace <swallace@FreeBSD.org>

Remove prototype definitions from <sys/systm.h>.
Prototypes are located in <sys/sysproto.h>.

Add appropriate #include <sys/sysproto.h> to files that needed
protos from systm.h.

Add structure definitions to appropriate files that relied on sys/systm.h,
right before system call definition, as in the rest of the kernel source.

In kern_prot.c, instead of using the dummy structure "args", create
individual dummy structures named <syscall>_args. This makes
life easier for prototype generation.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadbal modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root of disk still works just fine..
mfs should work but is untested. (tomorrows task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# cf2455a3 17-Aug-1995 Bruce Evans <bde@FreeBSD.org>

The `cred' and `proc' args were missing for some VOP_OPEN() and VOP_CLOSE()
calls.

Found by: gcc -Wstrict-prototypes after I supplied some of the 5000+
missing prototypes. Now I have 9000+ lines of warnings and errors
about bogus conversions of function pointers.


# 628641f8 11-Aug-1995 David Greenman <dg@FreeBSD.org>

Converted mountlist to a CIRCLEQ.

Partially obtained from: 4.4BSD-Lite2


# 47777413 01-Aug-1995 David Greenman <dg@FreeBSD.org>

Removed my special-case hack for VOP_LINK and fixed the problem with the
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.


# 70eec742 30-Jul-1995 Bruce Evans <bde@FreeBSD.org>

Ignore trailing slashes in pathnames that "refer to a directory",
as is required to be POSIXLY_CORRECT and "right". I interpret
"referring to a directory" as being a directory or becoming a
directory. E.g., the trailing slashes in mkdir("/nonesuch/"),
rename("/tmp", /nonesuch/") and link("/tmp", "/root_can_like_dirs/")
are ignored because the target will become a directory if the
syscall succeeds. A trailing slash on a symlink causes the symlink
to be followed (this is a bug if the symlink doesn't point to a
directory; fix later).


# 24a1cce3 13-Jul-1995 David Greenman <dg@FreeBSD.org>

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# aa2cabb9 27-Jun-1995 David Greenman <dg@FreeBSD.org>

1) Converted v_vmdata to v_object.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
after vnode_pager_alloc() calls - the object is already guaranteed to be
persistent.
3) Removed some gratuitous casts.


# 98796526 28-Jun-1995 David Greenman <dg@FreeBSD.org>

Fixed VOP_LINK argument order botch.


# 61f5d510 21-May-1995 David Greenman <dg@FreeBSD.org>

Changes to fix the following bugs:

1) Files weren't properly synced on filesystems other than UFS. In some
cases, this lead to lost data. Most likely would be noticed on NFS.
The fix is to make the VM page sync/object_clean general rather than
in each filesystem.
2) Mixing regular and mmaped file I/O on NFS was very broken. It caused
chunks of files to end up as zeroes rather than the intended contents.
The fix was to fix several race conditions and to kludge up the
"b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention
to page modifications that occurred via the mmapping.

Reviewed by: David Greenman
Submitted by: John Dyson


# 1469eec8 15-May-1995 David Greenman <dg@FreeBSD.org>

Fixed incompleteness that would allow dirty filesystems to get mounted
when the single user shell was terminated. These changes disallow mounting
or R/W upgrading filesystems that are dirty unless "-f" (force) option
is used with mount. /etc/rc has been modified to abort the startup if
one or more non-nfs partitions fail to mount.

Reviewed by: Poul-Henning Kamp, Rod Grimes


# c9ae46b1 02-May-1995 David Greenman <dg@FreeBSD.org>

Removed unused variable caused by last commit.


# beef0195 02-May-1995 David Greenman <dg@FreeBSD.org>

Fix for sync() to close a potential panic with accessing a mount struct
that had been freed.

Submitted by: John Dyson


# 2547597b 29-Mar-1995 David Greenman <dg@FreeBSD.org>

Added a set of braces to make the compiler happy.


# ab828ab8 19-Mar-1995 David Greenman <dg@FreeBSD.org>

Moved call to vnode_pager_uncache in rename() to before the VOP_RENAME.
It was previously after the VOP_RENAME and the reference and lock on
the vnode had already been lost, allowing interesting internel
inconsistencies. This is one of the two reasons why freefall was crashing
every hour or two (the other being nullfs bugs).
Don't call vnode_pager_uncache in revoke(). revoke() is only allowed on
VCHR and VBLK vnodes.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 519b3d1a 27-Feb-1995 David Greenman <dg@FreeBSD.org>

Do a vnode_pager_uncache after the VOP_RENAME to lose the remaining
reference to the old vnode.

Suggested by: Bruce Evans


# 2655f626 13-Feb-1995 David Greenman <dg@FreeBSD.org>

In sync(), don't dereference the proc pointer if it's NULL. Should fix
most or all of the problems with calling sync() without a curproc (which
can happen in machdep.c during a panic sync).


# c4a7b7e1 04-Nov-1994 David Greenman <dg@FreeBSD.org>

From tim@cs.city.ac.uk (Tim Wilkinson):

Find enclosed a short bugfix to get the union filesystem up and running
in FreeBSD-current. We don't think we've got all the problems yet but
these fixes sort out the major ones (which mostly concert bad locking
of vnodes), no doubt we'll post others as necessary. Known problems
include the inability of the umount command (not the system call) to unmount
unions in certain circumstances (this is due the way "realpath" works),
and the failure of direntries to always get all available files in
unioned subdirectories. We are, as they say, working on it.

Submitted by: tim@cs.city.ac.uk (Tim Wilkinson)


# 091b0456 20-Oct-1994 Garrett Wollman <wollman@FreeBSD.org>

Make my ALLDEVS kernel compile (basically, LINT minus a lot of options).

This involves fixing a few things I broke last time.


# 17b9f9f4 14-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Fix the problem with panics when mounting on nonexistant directories. Probably
my fault in the first place...


# 99ec0d5b 11-Oct-1994 Søren Schmidt <sos@FreeBSD.org>

Removed static declaration of getvnode() (used in ibcs2)


# dcd01eb3 08-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

Cosmetics: added ()'s and fixed prinf-formats to make gcc silent.


# 8e58bf68 05-Oct-1994 David Greenman <dg@FreeBSD.org>

Stuff object into v_vmdata rather than pager. Not important which at
the moment, but will be in the future. Other changes mostly cosmetic,
but are made for future VMIO considerations.

Submitted by: John Dyson


# 797f2d22 02-Oct-1994 Poul-Henning Kamp <phk@FreeBSD.org>

All of this is cosmetic. prototypes, #includes, printfs and so on. Makes
GCC a lot more silent.


# 9abf4d6e 28-Sep-1994 Doug Rabson <dfr@FreeBSD.org>

Make NFS ask the filesystems for directory cookies instead of making them
itself.


# c9b1d604 22-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

More loadable VFS changes:

- Make a number of filesystems work again when they are statically compiled
(blush)

- FIFOs are no longer optional; ``options FIFO'' removed from distributed
config files.


# c901836c 20-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


# 8fceb1ba 02-Sep-1994 David Greenman <dg@FreeBSD.org>

Disallow truncating to negative file sizes. Doing so causes ffs_truncate()
and perhaps other fs truncate's to go crazy and panic the machine or worse.
This fixes the truncate bug reported by Michael Class.


# 2fc62994 01-Sep-1994 David Greenman <dg@FreeBSD.org>

Make olstat() consistent with lstat() - so they both return the same
owner..

Submitted by: Kirk McKusick


# e0e9c421 20-Aug-1994 David Greenman <dg@FreeBSD.org>

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# 26f9a767 25-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources