History log of /freebsd-current/sys/sys/mount.h
Revision Date Author Comments
# 2496fb72 02-Mar-2024 Konstantin Belousov <kib@FreeBSD.org>

sys/mount.h: align values of MNTK_XXX flags

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# f76cb7bd 30-Jan-2024 Konstantin Belousov <kib@FreeBSD.org>

sys/mount.h: use __inline

instead of plain inline, for C89

Reported by: antoine
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 3334a537 26-Dec-2023 Konstantin Belousov <kib@FreeBSD.org>

Convert fsidcmp(9) from macro to inline function

This allows type checking the arguments.

Explicit structure member comparisons are done to avoid introducing
string.h pollution for userspace.

Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D43205
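
As a rough illustration of the conversion described above, a member-by-member
comparison can be written as an inline function. This is a standalone sketch
and not the header's actual code: demo_fsid_t and demo_fsidcmp are hypothetical
names, and the two-word val[] layout is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for fsid_t; the two-word layout is an assumption. */
typedef struct demo_fsid {
	int32_t val[2];
} demo_fsid_t;

/*
 * Returns 0 when the two IDs match and non-zero otherwise.  Being a
 * function, it lets the compiler type check both arguments, and it
 * compares members directly, so it needs no memcmp() and no string.h.
 */
static inline int
demo_fsidcmp(const demo_fsid_t *a, const demo_fsid_t *b)
{
	assert(a != NULL && b != NULL);
	return (a->val[0] != b->val[0] || a->val[1] != b->val[1]);
}
```

With a macro, passing a pointer to some unrelated structure that happens to
have a val member would compile silently; with the function form the compiler
can diagnose the mismatched pointer type.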


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixups to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# f5f27772 23-Nov-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix NFS access to .zfs/snapshot snapshots

When a process attempts to access a snapshot under
/<dataset>/.zfs/snapshot, the snapshot is automounted.
However, without this patch, the automount does not
set mnt_exjail, which results in the snapshot not being
accessible over NFS.

This patch defines a new function called vfs_exjail_clone()
which sets mnt_exjail from another mount point and
then uses that function to set mnt_exjail in the snapshot
automount. A separate patch that is currently a pull request
for OpenZFS, calls this function to fix the problem.

PR: 275200
Reviewed by: markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D42672


# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# 88175af8 21-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_export: Add mnt_exjail to control exports done in prisons

If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same file system. This patch adds mnt_exjail
to "struct mount" so that the credentials (and, therefore,
the prison) that did the exports for that file system can
be recorded. If another prison has already exported the
file system, vfs_export() will fail with an error.
If mnt_exjail == NULL, the file system has not been exported.
mnt_exjail is checked by the NFS server, so that exports done
from within a different prison will not be used.

The patch also implements vfs_exjail_destroy(), which is
called from prison_cleanup() to release all the mnt_exjail
credential references, so that the prison can be removed.
Mainly to avoid doing a scan of the mountlist for the case
where there were no exports done from within the prison,
a count of how many file systems have been exported from
within the prison is kept in pr_exportcnt.

Reviewed by: markj
Discussed with: jamie
Differential Revision: https://reviews.freebsd.org/D38371
MFC after: 3 months
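
The ownership rule described above can be modelled in a few lines. This is a
heavily simplified sketch under stated assumptions: the demo_* structures are
hypothetical, the owner is stored as a prison pointer rather than a credential
reference, and -1 stands in for whatever error the kernel actually returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct demo_prison {
	int exportcnt;			/* models pr_exportcnt */
};

struct demo_mount {
	struct demo_prison *exjail;	/* models mnt_exjail; NULL = not exported */
};

/*
 * Record that prison pr exports mp.  Returns 0 on success and -1 when a
 * different prison already owns the exports (-1 is a placeholder error).
 */
static int
demo_export(struct demo_mount *mp, struct demo_prison *pr)
{
	assert(mp != NULL && pr != NULL);
	if (mp->exjail == NULL) {
		mp->exjail = pr;
		pr->exportcnt++;
		return (0);
	}
	return (mp->exjail == pr ? 0 : -1);
}

/* Drop ownership of every mount exported by pr, as on prison cleanup. */
static void
demo_exjail_destroy(struct demo_mount *mounts, size_t n,
    struct demo_prison *pr)
{
	size_t i;

	assert(mounts != NULL && pr != NULL);
	if (pr->exportcnt == 0)
		return;		/* nothing exported: skip the scan entirely */
	for (i = 0; i < n; i++) {
		if (mounts[i].exjail == pr) {
			mounts[i].exjail = NULL;
			pr->exportcnt--;
		}
	}
}
```

The early return in demo_exjail_destroy() mirrors the stated purpose of the
counter: a prison that never exported anything does not pay for a scan of the
mount list.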


# db565512 04-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_mount.c: Free exports structures in vfs_mount_destroy()

During testing of exporting file systems in jails, I
noticed that the export structures on a mount
were not being free'd when the mount is dismounted.

This bug appears to have been in the system for a
very long time. It would have resulted in a slow memory
leak when exported file systems were dismounted.

Prior to r362158, freeing the structures during dismount
would not have been safe, since VFS_CHECKEXP() returned
a pointer into an export structure, which might still have been
used by the NFS server for an in-progress RPC when the file system
is dismounted. r362158 fixed this, so it should now be safe
to free the structures in vfs_mount_destroy(), which is what
this patch does.

Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D38385


# d94e0bdc 04-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

Revert "vfs_export: Add checks for correct prison when updating exports"

This reverts commit 7926a01ed7ae7cefd81ef4cc2142c35b84d81913.

A new patch in D38371 is being considered for doing this.


# 7926a01e 02-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_export: Add checks for correct prison when updating exports

mountd(8) basically does the following:
    getmntinfo()
    for each mount
        delete_exports
using nmount(2) to do the creation/deletion of individual exports.

For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo()
returns all mount points, including ones being used within other prisons.
This can cause confusion if the same file system is specified in the
exports(5) file for multiple prisons.

This patch adds a permanent identifier to each prison
and marks which prison did the exports in a field of
the mount structure called mnt_exjail. This field can
then be compared to the permanent identifier for the
prison that the thread's credentials are in.
Also required was a new function called prison_isalive_permid()
which returns whether the prison is alive, so that the check can be
ignored for prisons that have been removed.

This prepares the system to allow mountd(8) to run in multiple
prisons, including prison0.

Future commits will complete the modifications to allow mountd(8)
to run in vnet prisons. Until then, these changes should not affect
semantics.

Reviewed by: markj
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D38144


# 521fbb72 23-Nov-2022 Doug Rabson <dfr@FreeBSD.org>

Add support for mounting single files in nullfs

The main use-case for this is to support mounting config files and
secrets into OCI containers. My current workaround copies the files into
the container which is messy and risks secrets leaking into container
images if the cleanup fails.

This adds a VFCF flag to indicate whether the filesystem supports file
mounts and allows fspath to be either a directory or a file if the flag
is set.

Test Plan:
$ sudo mkdir -p /mnt
$ sudo touch /mnt/foo
$ sudo mount -t nullfs /COPYRIGHT /mnt/foo

Reviewed by: mjg, kib
Tested by: pho


# ce00b119 14-Jun-2022 Doug Ambrisko <ambrisko@FreeBSD.org>

mount: revert the active vnode reporting feature

Revert the computing of active vnode reporting since statfs is used
by a lot of tools. Only report the vnodes used.

Reported by: mjg


# 6468cd8e 13-Jun-2022 Doug Ambrisko <ambrisko@FreeBSD.org>

mount: add vnode usage per file system with mount -v

This avoids the need to drop into the ddb to figure out vnode
usage per file system. It helps to see if they are or are not
being freed. Suggestion to report active vnode count was from
kib@

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D35436


# eca39864 01-Apr-2022 Konstantin Belousov <kib@FreeBSD.org>

Add sysctl KERN_LOCKF

reporting the snapshot of the active advisory locks.

A new VFS ops method vfs_report_lockf is provided in the mount point
op table. If it is NULL, as it is currently for all existing
filesystems, the vfs_report_lockf() function is used, which gathers
information from the standard implementation inside kern/kern_lockf.c.

Filesystems implementing their own locking (NFSv4 as example) can provide
a custom implementation.

Reviewed by: markj, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D34756
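
The optional-method pattern described above (use the filesystem's method if it
is set, otherwise fall back to a generic routine) can be sketched as follows.
All demo_* names are hypothetical, and the return values 0 and 1 are arbitrary
markers chosen only to show which path ran.

```c
#include <assert.h>
#include <stddef.h>

struct demo_mount;

/* Hypothetical op table with a single optional method. */
struct demo_vfsops {
	int (*report_lockf)(struct demo_mount *mp);
};

struct demo_mount {
	const struct demo_vfsops *ops;
};

/* Generic fallback, standing in for the standard implementation. */
static int
demo_default_report_lockf(struct demo_mount *mp)
{
	(void)mp;
	return (0);
}

/* Example override, standing in for a filesystem with its own locking. */
static int
demo_custom_report_lockf(struct demo_mount *mp)
{
	(void)mp;
	return (1);
}

/* Use the filesystem's method when present, else the generic one. */
static int
demo_report_lockf(struct demo_mount *mp)
{
	assert(mp != NULL && mp->ops != NULL);
	if (mp->ops->report_lockf != NULL)
		return (mp->ops->report_lockf(mp));
	return (demo_default_report_lockf(mp));
}
```

Leaving the slot NULL keeps every existing filesystem working unchanged, which
matches the note that all existing filesystems currently do not set it.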


# eb574ba0 19-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: replace VFS_NOTIFY_UPPER_* macros with an enum


# 93a0ba8f 17-Sep-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: retire the no longer used MNTK_LOOKUP_EXCL_DOTDOT flag

Reviewed by: markj
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D34466


# 1cb0045c 07-Mar-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add MNTK_UNLOCKED_INSMNTQUE

Can be used when the fs at hand can synchronize insmntque with other
means than the vnode lock.

Reviewed by: markj
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D34466


# 4a4b059a 25-Dec-2021 Konstantin Belousov <kib@FreeBSD.org>

Add vfs_remount_ro()

a helper to remount filesystem from rw to ro.

Tested by: pho
Reviewed by: markj, mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D33721


# dd2f6e14 10-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: annotate all unused MNTK_ flags


# 4dd23ae1 10-Dec-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: retire MNTK_NOKNOTE and VV_NOKNOTE

MNTK_NOKNOTE was introduced in 679985d03a64f5dfb4355538ae6e3b70f8347f38
(dated 2005), VV_NOKNOTE in 34cc826ae8999f454dd6cb9c77d17ce83b169f92 a few
months later.

Neither was ever used by anything in the tree.


# 4dcdf398 17-May-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF

This allows us to stop maintaining the VI_TEXT_REF flag and consequently
opens up fully lockless v_writecount adjustment.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D33127


# 8981a100 20-Nov-2021 Robert Wing <rew@FreeBSD.org>

mount: retire kernel_vmount()

The last usage of this function was removed in e3b1c847a4237ad9.

There are no in-tree consumers of kernel_vmount().

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D32607


# 311b95bb 23-Oct-2021 Robert Wing <rew@FreeBSD.org>

sys/mount.h: remove dead prototype

vfs_getrootfsid() was removed in 245efbba4d6a3e60a0d6d16d18d9a5fad6260733

Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D32606


# a8c732f4 07-Aug-2021 Jason A. Harmening <jah@FreeBSD.org>

VFS: add retry limit and delay for failed recursive unmounts

A forcible unmount attempt may fail due to a transient condition, but
it may also fail due to some issue in the filesystem implementation
that will indefinitely prevent successful unmount. In such a case,
the retry logic in the recursive unmount facility will cause the
deferred unmount taskqueue to execute constantly.

Avoid this scenario by imposing a retry limit, with a default value
of 10, beyond which the recursive unmount facility will emit a log
message and give up. Additionally, introduce a grace period, with
a default value of 1s, between successive unmount retries on the
same mount.

Create a new sysctl node, vfs.deferred_unmount, to export the total
number of failed recursive unmount attempts since boot, and to allow
the retry limit and retry grace period to be tuned.

Reviewed by: kib (earlier revision), mckusick
Differential Revision: https://reviews.freebsd.org/D31450
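
The retry limit and grace period described above can be modelled roughly as
below. This is a sketch, not the taskqueue code: the demo_* names are
hypothetical, time is a plain millisecond counter, and the choice to give up
once the tenth attempt has failed is an assumption about how the limit is
counted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Defaults taken from the commit text; the names are hypothetical. */
#define DEMO_RETRY_LIMIT	10
#define DEMO_RETRY_DELAY_MS	1000

struct demo_deferred {
	int retries;		/* failed attempts so far */
	long next_try_ms;	/* earliest time of the next attempt */
	bool gave_up;
};

/*
 * Run one pass over a deferred unmount at time now_ms.  try_unmount
 * returns true on success.  The function returns true when the entry is
 * finished, either because the unmount succeeded or the limit was hit.
 */
static bool
demo_deferred_pass(struct demo_deferred *d, long now_ms,
    bool (*try_unmount)(void))
{
	assert(d != NULL && try_unmount != NULL);
	if (now_ms < d->next_try_ms)
		return (false);		/* still inside the grace period */
	if (try_unmount())
		return (true);
	if (++d->retries >= DEMO_RETRY_LIMIT) {
		d->gave_up = true;	/* the kernel would log a message here */
		return (true);
	}
	d->next_try_ms = now_ms + DEMO_RETRY_DELAY_MS;
	return (false);
}

/* Models a filesystem that can never be unmounted. */
static bool
demo_always_fail(void)
{
	return (false);
}
```

Without the limit and the delay, a mount for which demo_always_fail() is the
honest answer would be retried back to back forever, which is the constant
taskqueue execution the commit sets out to avoid.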


# c66e9307 14-Aug-2021 Piotr Pawel Stefaniak <pstef@FreeBSD.org>

mount.h: improve a comment about flags

The comment only specifies MNT_ROOTFS - which is set by the kernel when
mounting its root file system. So it's not clear if any other flags
are not quite right and for what reason.


# 2bc16e8a 17-Jul-2021 Jason A. Harmening <jah@FreeBSD.org>

VFS: remove MNTK_MARKER

We no longer allow upper filesystems to be unregistered from the base
mount while vfs_notify_upper() or any other upper operation is pending.
New upper mounts can still be registered during this period, but they
will be added at the end of the upper mount tailq. We therefore no
longer need to allocate marker nodes during vfs_notify_upper() to keep
our place in the iteration.

Reviewed by: kib, mckusick
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D31016


# c746ed72 12-Jun-2021 Jason A. Harmening <jah@FreeBSD.org>

Allow stacked filesystems to be recursively unmounted

In certain emergency cases such as media failure or removal, UFS will
initiate a forced unmount in order to prevent dirty buffers from
accumulating against the no-longer-usable filesystem. The presence
of a stacked filesystem such as nullfs or unionfs above the UFS mount
will prevent this forced unmount from succeeding.

This change addresses the situation by allowing stacked filesystems to
be recursively unmounted on a taskqueue thread when the MNT_RECURSE
flag is specified to dounmount(). This call will block until all upper
mounts have been removed unless the caller specifies the MNT_DEFERRED
flag to indicate the base filesystem should also be unmounted from the
taskqueue.

To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs
have been combined with the existing 'mnt_uppers' list used by nullfs
and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper().
The format of the mnt_uppers list has also been changed to accommodate
filesystems such as unionfs in which a given mount may be stacked atop
more than one lower mount. Additionally, management of lower FS
reclaim/unlink notifications has been split into a separate list
managed by a separate set of KPIs, as registration of an upper FS no
longer implies interest in these notifications.

Reviewed by: kib, mckusick
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D31016


# 59409cb9 17-May-2021 Jason A. Harmening <jah@FreeBSD.org>

Add a generic mechanism for preventing forced unmount

This is aimed at preventing stacked filesystems like nullfs and unionfs
from "losing" their lower mounts due to forced unmount. Otherwise,
VFS operations that are passed through to the lower filesystem(s) may
crash or otherwise cause unpredictable behavior.

Introduce two new functions, vfs_pin_from_vp() and vfs_unpin(),
which are intended to be called on the lower mount(s) when the stacked
filesystem is mounted and unmounted, respectively.
Much as registration in the mnt_uppers list previously did, pinning
will prevent even forced unmount of the lower FS and will allow the
stacked FS to freely operate on the lower mount either by direct
use of the struct mount* or indirect use through a properly-referenced
vnode's v_mount field.

vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses
the mount interlock coupled with re-checking vp->v_mount to ensure
that it will fail in the face of a pending unmount request, even if
the concurrent unmount fully completes.

Adopt these new functions in both nullfs and unionfs.

Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30401


# a4b07a27 11-May-2021 Jason A. Harmening <jah@FreeBSD.org>

VFS_QUOTACTL(9): allow implementation to indicate busy state changes

Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Also, add stdbool.h to libprocstat modules which #define _KERNEL
before including sys/mount.h. Otherwise they'll pull in sys/types.h
before defining _KERNEL and therefore won't have the bool definition
they need for mp_busy.

Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30556
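
The in/out parameter convention described above can be sketched like this.
The demo_* names, the command enum and the busy counter are hypothetical
simplifications; only the idea of the implementation reporting back through
mp_busy is taken from the commit text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical command codes; only the on/off distinction matters here. */
enum demo_quota_cmd { DEMO_QUOTAON, DEMO_QUOTAOFF, DEMO_GETQUOTA };

struct demo_mount {
	int busy_refs;		/* models the busy count on the mount */
};

/*
 * Filesystem side.  *mp_busy is true on entry because the caller holds
 * the mount busy.  If the implementation has to unbusy the mount, it
 * does so and clears *mp_busy to tell the caller.
 */
static int
demo_fs_quotactl(struct demo_mount *mp, enum demo_quota_cmd cmd,
    bool *mp_busy)
{
	assert(mp != NULL && mp_busy != NULL && *mp_busy);
	if (cmd == DEMO_QUOTAON || cmd == DEMO_QUOTAOFF) {
		mp->busy_refs--;
		*mp_busy = false;
	}
	return (0);
}

/* Caller side: unbusy only if the implementation did not already. */
static int
demo_quotactl(struct demo_mount *mp, enum demo_quota_cmd cmd)
{
	bool mp_busy = true;
	int error;

	assert(mp != NULL);
	mp->busy_refs++;
	error = demo_fs_quotactl(mp, cmd, &mp_busy);
	if (mp_busy)
		mp->busy_refs--;
	return (error);
}
```

An implementation that never needs to unbusy the mount simply leaves the flag
alone, so the caller's cleanup stays correct for every command.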


# 271fcf1c 29-May-2021 Jason A. Harmening <jah@FreeBSD.org>

Revert commits 6d3e78ad6c11 and 54256e7954d7

Parts of libprocstat like to pretend they're kernel components for the
sake of including mount.h, and including sys/types.h in the _KERNEL
case doesn't fix the build for some reason. Revert both the
VFS_QUOTACTL() change and the follow-up "fix" for now.


# 54256e79 29-May-2021 Jason A. Harmening <jah@FreeBSD.org>

Fix userspace build after commit 6d3e78ad6c11

Reported by: jenkins


# 6d3e78ad 11-May-2021 Jason A. Harmening <jah@FreeBSD.org>

VFS_QUOTACTL(9): allow implementation to indicate busy state changes

Instead of requiring all implementations of vfs_quotactl to unbusy
the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param
to VFS_QUOTACTL(9). The implementation may then indicate to the caller
whether it needed to unbusy the mount.

Reviewed By: kib, markj
Differential Revision: https://reviews.freebsd.org/D30218


# f784da88 17-May-2021 Konstantin Belousov <kib@FreeBSD.org>

Move mnt_maxsymlinklen into appropriate fs mount data structures

Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-MFC-Note: struct mount layout
Differential revision: https://reviews.freebsd.org/D30325


# 9a2fac6b 16-May-2021 Kirk McKusick <mckusick@FreeBSD.org>

Fix handling of embedded symbolic links (and history lesson).

The original filesystem release (4.2BSD) had no embedded symlinks.
Historically symbolic links were just a different type of file, so
the content of the symbolic link was contained in a single disk block
fragment. We observed that most symbolic links were short enough that
they could fit in the area of the inode that normally holds the block
pointers. So we created embedded symlinks where the content of the
link was held in the inode's pointer area thus avoiding the need to
seek and read a data fragment and reducing the pressure on the block
cache. At the time we had only UFS1 with 32-bit block pointers,
so the test for a fastlink was:

di_size < (NDADDR + NIADDR) * sizeof(daddr_t)

(where daddr_t would be ufs1_daddr_t today).
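
To make the size limit concrete, the test above can be worked through in a
few lines of C. This is a sketch under assumptions: the pointer counts are
taken to be the traditional 12 direct and 3 indirect, and the DEMO_ and demo_
names are hypothetical. With 4-byte block pointers the pointer area is
(12 + 3) * 4 = 60 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 12 direct and 3 indirect block pointers per inode. */
#define DEMO_NDADDR	12
#define DEMO_NIADDR	3

/* Bytes available in the pointer area for a given pointer width. */
static size_t
demo_maxsymlinklen(size_t ptr_size)
{
	assert(ptr_size > 0);
	return ((DEMO_NDADDR + DEMO_NIADDR) * ptr_size);
}

/* The fastlink test from the text: the link fits in the pointer area. */
static int
demo_is_fastlink(uint64_t di_size, size_t ptr_size)
{
	return (di_size < demo_maxsymlinklen(ptr_size));
}
```

Under these assumptions a 59-byte link target is stored in the inode, while a
60-byte one spills into a data fragment because the comparison is strict.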

When embedded symlinks were added, a spare field in the superblock
with a known zero value became fs_maxsymlinklen. New filesystems
set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded
symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus
filesystems that preceded this change always read from blocks
(since fs->fs_maxsymlinklen == 0) and newer ones used embedded
symlinks if they fit. Similarly symlinks created on pre-embedded
symlink filesystems always spill into blocks while newer ones will
embed if they fit.

At the same time that the embedded symbolic links were added, the
on-disk directory structure was changed splitting the former
u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen.
Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can
be used to distinguish old directory formats. In retrospect that
should have just been an added flag, but we did not realize we
needed to know about that change until it was already in production.

Code was split into ufs/ffs so that the log structured filesystem could
use ufs functionality while doing its own disk layout. This meant
that no ffs superblock fields could be used in the ufs code. Thus
ffs superblock fields that were needed in ufs code had to be copied
to fields in the mount structure. Since ufs_readlink needed to know
if a link was embedded, fs_maxsymlinklen gets copied to mnt_maxsymlinklen.

The kernel panic that led to this fix was triggered when a
disk error created an inode of type symlink with no allocated data
blocks but a large size. When readlink was called the uiomove was
attempted which segment faulted.

static int
ufs_readlink(ap)
	struct vop_readlink_args /* {
		struct vnode *a_vp;
		struct uio *a_uio;
		struct ucred *a_cred;
	} */ *ap;
{
	struct vnode *vp = ap->a_vp;
	struct inode *ip = VTOI(vp);
	doff_t isize;

	isize = ip->i_size;
	if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
	    DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
		return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
	}
	return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
}

The second part of the "if" statement that adds

DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */

is problematic. It never appeared in BSD released by Berkeley because
as noted above mnt_maxsymlinklen is 0 for old format filesystems, so
will always fall through to the VOP_READ as it should. I had to dig
back through `git blame' to find that Rodney Grimes added it as
part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.''
He must have brought it across from an earlier FreeBSD. Unfortunately
the source-control logs for FreeBSD up to the merger with the
AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the
agreement to let FreeBSD remain unencumbered, so I cannot pin-point
where that line got added on the FreeBSD side.

The one change needed here is that mnt_maxsymlinklen is declared as
an `int' and should be changed to be `u_int64_t'.

This discovery led us to check out the code that deletes symbolic
links. Specifically

	if (vp->v_type == VLNK &&
	    (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
	     datablocks == 0)) {
		if (length != 0)
			panic("ffs_truncate: partial truncate of symlink");
		bzero(SHORTLINK(ip), (u_int)ip->i_size);
		ip->i_size = 0;
		DIP_SET(ip, i_size, 0);
		UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE);
		if (needextclean)
			goto extclean;
		return (ffs_update(vp, waitforupdate));
	}

Here too our broken symlink inode with no data blocks allocated
and a large size will segment fault as we are incorrectly using the
test that we have no data blocks to decide that it is an embedded
symbolic link and attempting to bzero past the end of the inode.
The test for datablocks == 0 is unnecessary as the test for
ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right
thing in all cases.
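
The difference between the two tests can be seen with a small model of the
damaged inode. This is an illustrative sketch only: struct demo_inode and both
predicates are hypothetical, and the limit of 60 bytes used in the note below
is an assumed value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified view of the inode fields the two tests look at. */
struct demo_inode {
	uint64_t size;		/* claimed length of the link */
	uint64_t blocks;	/* allocated data blocks */
};

/* Test with the extra clause: zero blocks alone selects the inode path. */
static bool
demo_embedded_old(const struct demo_inode *ip, uint64_t maxsymlinklen)
{
	assert(ip != NULL);
	return (ip->size < maxsymlinklen || ip->blocks == 0);
}

/* Test without it: only the size decides. */
static bool
demo_embedded_fixed(const struct demo_inode *ip, uint64_t maxsymlinklen)
{
	assert(ip != NULL);
	return (ip->size < maxsymlinklen);
}
```

For an inode with a size of 100000 and no blocks, and a limit of 60, the first
predicate says "embedded", so a caller would go on to touch 100000 bytes of a
60-byte area; the second predicate rejects it. For an ordinary short link both
agree.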

The test for datablocks == 0 was added by David Greenman in this commit:

Author: David Greenman <dg@FreeBSD.org>
Date: Tue Aug 2 13:51:05 1994 +0000

Completed (hopefully) the kernel support for old style "fastlinks".

Notes:
svn path=/head/; revision=1821

I am guessing that he likely earlier added the incorrect test in the
ufs_readlink code.

I asked David if he had any recollection of why he made this change.
Amazingly, he still had a recollection of why he had made a one-line
change more than twenty years ago. And unsurprisingly it was because
he had been stuck between a rock and a hard place.

FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code
base. Prior to that, there were three years of development in all
areas of the kernel, including the filesystem code, from the combined
set of people including Bill Jolitz, Patchkit contributors, and
FreeBSD Project members. The compatibility issue at hand was caused
by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite
changes David had to find a way to provide compatibility with both
the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite.
He felt that these changes would provide compatibility with both systems.

In his words:
``My recollection is that the 'FASTLINKS' symlinks support in
FreeBSD-1.x, as implemented by Curt Mayer, worked differently than
4.4BSD. He used a spare field in the inode to duplicately store the
length. When the 4.4BSD-Lite merge was done, the optimized symlinks
support for existing filesystems (those that were initialized in
FreeBSD-1.x) were broken due to the FFS on-disk structure of
4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to
restore the backward compatibility with FreeBSD-1.x filesystems.
I think it was the best that could be done in the somewhat urgent
circumstances of the post Berkeley-USL settlement. Also, regarding
Rod's massive commit with little explanation, some context: John
Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to
the 386 platform in just 10 days. It was by far the most intense
hacking effort of my life. In addition to the porting of tons of
FreeBSD-1 code, I think we wrote more than 30,000 lines of new code
in that time to deal with the missing pieces and architectural
changes of 4.4BSD-Lite. We didn't make many notes along the way.
There was a lot of pressure to get something out to the rest of the
developer community as fast as possible, so detailed discrete commits
didn't happen - it all came as a giant wad, which is why Rod's
commit message was worded the way it was.''

Reported by: Chuck Silvers
Tested by: Chuck Silvers
History by: David Greenman Lawrence
MFC after: 1 week
Sponsored by: Netflix


# 5af1131d 08-Apr-2021 Konstantin Belousov <kib@FreeBSD.org>

struct mount uppers: correct locking annotations

It is all locked by the uppers' interlock.

Noted by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# b5449c92 26-Feb-2021 Konstantin Belousov <kib@FreeBSD.org>

Use atomic_interrupt_fence() instead of bare __compiler_membar()

for the places which definitely use membar to sync with interrupt handlers.

libc and rtld uses of __compiler_membar() seem to want compiler barriers
proper.

The barrier in sched_unpin_lite() after the td_pinned decrement seems not to be
needed and is removed instead of converted.

Reviewed by: markj
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D28956


# d485c77f 18-Feb-2021 Konstantin Belousov <kib@FreeBSD.org>

Remove #define _KERNEL hacks from libprocstat

Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in
userspace, assuming that the consumer has an idea what it is for.
Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h,
sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the
same caveat.

Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h
being unusable in userspace, where it overrode struct buf with its own
definition. Instead, provide struct m_buf and struct m_vnode and adapt
code to use local variants.

Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D28679


# a15f787a 15-Feb-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add vfs_ref_from_vp

This generalizes what vop_stdgetwritemount used to be doing.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28695


# f6dd1aef 09-Nov-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: group mount per-cpu vars into one struct

While here move frequently read stuff into the same cacheline.

This shrinks struct mount by 64 bytes.

Tested by: pho


# f1084587 05-Nov-2020 Konstantin Belousov <kib@FreeBSD.org>

Suspend all writeable local filesystems on power suspend.

This ensures that no writes are pending in memory, either metadata or
user data, but not including dirty pages not yet converted to fs writes.

Only filesystems declared local are suspended.

Note that this does not guarantee absence of the metadata errors or
leaks if resume is not done: for instance, on UFS unlinked but opened
inodes are leaked and require fsck to gc.

Reviewed by: markj
Discussed with: imp
Tested by: imp (previous version), pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D27054


# ad89066a 17-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: annotate mountlist_mtx with __exclusive_cache_line


# c5ce27ba 26-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add MNT_EXTLSxxx flags that will be used for NFS over TLS exports.

These flags are not currently used, but will be used by future commits to
implement exports(5) requirements for the use of NFS over TLS by clients.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D26180


# 08b242ae 19-Aug-2020 Warner Losh <imp@FreeBSD.org>

Move the mount name to bit mapping into sys/mount.h so it can be shared with the
kernel.

Discussed with: kib@
Reviewed by: kirk@ (prior version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D25969


# 17a66c70 04-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add vfs_op_thread_enter/exit _crit variants

and employ them in the namecache. Eliminates all spurious checks for preemption.


# 07d2145a 25-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add the infrastructure for lockless lookup

Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25577


# 9d6fc996 13-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Oops, r362158 committed a duplicate definition of MAXSECFLAVORS.

This patch gets rid of the duplicate.


# 1f7104d7 13-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags.

Since mnt_flags was upgraded to 64bits there has been a quirk in
"struct export_args", since it holds a copy of mnt_flags
in ex_flags, which is an "int" (32bits).
This happens to currently work, since all the flag bits used in ex_flags are
defined in the low order 32bits. However, new export flags cannot be defined.
Also, ex_anon is a "struct xucred", which limits it to 16 additional groups.
This patch revises "struct export_args" to make ex_flags 64bits and replaces
ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a
groups list, so it can be malloc'd up to NGROUPS in size).
This requires that the VFS_CHECKEXP() arguments change, so I also modified the
last "secflavors" argument to be an array pointer, so that the
secflavors could be copied in VFS_CHECKEXP() while the export entry is locked.
(Without this patch VFS_CHECKEXP() returns a pointer to the secflavors
array and then it is used after being unlocked, which is potentially
a problem if the exports entry is changed.
In practice this does not occur when mountd is run with "-S",
but I think it is worth fixing.)

This patch also deleted the vfs_oexport_conv() function, since
do_mount_update() does the conversion, as required by the old vfs_cmount()
calls.

Reviewed by: kib, freqlabs
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D25088
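
Why a 32-bit copy of the flags blocks new export flags can be shown with a
short sketch. The structures and the flag values here are hypothetical, and
uint32_t is used in place of "int" so that the narrowing conversion is well
defined in the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags: one in the low 32 bits, one above bit 31. */
#define DEMO_EX_LOWFLAG	(UINT64_C(1) << 8)
#define DEMO_EX_NEWFLAG	(UINT64_C(1) << 32)

/* Old layout: the copy of the 64-bit mount flags is only 32 bits wide. */
struct demo_export_args_old {
	uint32_t ex_flags;
};

/* Revised layout: wide enough for every mount flag. */
struct demo_export_args_new {
	uint64_t ex_flags;
};

static_assert(sizeof(struct demo_export_args_new) == sizeof(uint64_t),
    "ex_flags must hold all 64 mount flag bits");

/* Returns non-zero if flag survives being stored in the old layout. */
static int
demo_old_keeps(uint64_t flag)
{
	struct demo_export_args_old a;

	a.ex_flags = (uint32_t)flag;	/* bits 32 and up are dropped */
	return ((a.ex_flags & flag) != 0);
}

/* Returns non-zero if flag survives being stored in the new layout. */
static int
demo_new_keeps(uint64_t flag)
{
	struct demo_export_args_new a;

	a.ex_flags = flag;
	return ((a.ex_flags & flag) != 0);
}
```

This matches the observation in the text: existing flags in the low-order 32
bits happen to work with the narrow field, but any flag defined above them
would be lost.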


# 693d10a2 03-Jun-2020 Ryan Moeller <freqlabs@FreeBSD.org>

tmpfs: Preserve alignment of struct fid fields

On 64-bit platforms, the two short fields in `struct tmpfs_fid` are padded to
the 64-bit alignment of the long field. This pushes the offsets of the
subsequent fields by 4 bytes and makes `struct tmpfs_fid` bigger than
`struct fid`. `tmpfs_vptofh()` casts a `struct fid *` to `struct tmpfs_fid *`,
causing 4 bytes of adjacent memory to be overwritten when the struct fields are
set. Through several layers of indirection and embedded structs, the adjacent
memory for one particular call to `tmpfs_vptofh()` happens to be the stack
canary for `nfsrvd_compound()`. Half of the canary ends up being clobbered,
going unnoticed until eventually the stack check fails when `nfsrvd_compound()`
returns and a panic is triggered.

Instead of duplicating fields of `struct fid` in `struct tmpfs_fid`, narrow the
struct to cover only the unique fields for tmpfs and assert at compile time
that the struct fits in the allotted space. This way we don't have to
replicate the offsets of `struct fid` fields, we just use them directly.

Reviewed by: kib, mav, rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D25077
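
The approach described above (keep only the filesystem-specific fields and
assert at compile time that they fit) can be sketched as follows. All demo_*
names are hypothetical, the 16-byte payload size is an assumption, and using
the payload size as the length field is a choice made for this example only.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXFIDSZ	16	/* assumed size of the opaque payload */

/* Stand-in for the generic handle: two shorts, then opaque bytes. */
struct demo_fid {
	uint16_t fid_len;
	uint16_t fid_data0;
	char	 fid_data[DEMO_MAXFIDSZ];
};

/* Only the filesystem-specific fields, with no duplicated header. */
struct demo_tmpfs_fid_data {
	uint64_t id;
	uint64_t gen;
};

/* Fails the build, rather than smashing memory, if the payload grows. */
static_assert(sizeof(struct demo_tmpfs_fid_data) <= DEMO_MAXFIDSZ,
    "payload does not fit in the generic handle");

/* Store the payload inside the generic handle's opaque area. */
static void
demo_fid_pack(struct demo_fid *fhp, uint64_t id, uint64_t gen)
{
	struct demo_tmpfs_fid_data d = { id, gen };

	fhp->fid_len = sizeof(d);	/* length convention is an assumption */
	fhp->fid_data0 = 0;
	memcpy(fhp->fid_data, &d, sizeof(d));	/* no alignment assumptions */
}

/* Recover the payload from the generic handle. */
static struct demo_tmpfs_fid_data
demo_fid_unpack(const struct demo_fid *fhp)
{
	struct demo_tmpfs_fid_data d;

	memcpy(&d, fhp->fid_data, sizeof(d));
	return (d);
}
```

Because the header fields are never redeclared, there is no second layout
whose padding could drift away from the generic structure's.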


# 245bfd34 20-May-2020 Ryan Moeller <freqlabs@FreeBSD.org>

Deduplicate fsid comparisons

Comparing fsid_t objects requires internal knowledge of the fsid structure
and yet this is duplicated across a number of places in the code.

Simplify by creating a fsidcmp function (macro).

Reviewed by: mjg, rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24749


# f15ccf88 06-Mar-2020 Chuck Silvers <chs@FreeBSD.org>

Add a new "mntfs" pseudo file system which provides private device vnodes for
file systems to safely access their disk devices, and adapt FFS to use it.
Also add a new BO_NOBUFS flag to allow enforcing that file systems using
mntfs vnodes do not accidentally use the original devfs vnode to create buffers.

Reviewed by: kib, mckusick
Approved by: imp (mentor)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D23787


# 123c5197 12-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit

In particular on amd64 this eliminates an atomic op in the common case,
trading it for IPIs in the uncommon case of catching CPUs executing the
code while the filesystem is getting suspended or unmounted.


# 8f2b73dc 07-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use newly added zpcpu routines instead of direct access where appropriate


# d3cc5354 17-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: provide F_ISUNIONSTACK as a kludge for libc

Prior to introduction of this op libc's readdir would call fstatfs(2), in
effect unnecessarily copying kilobytes of data just to check fs name and a
mount flag.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D23162


# cc3593fb 12-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: rework vnode list management

The current notion of an active vnode is eliminated.

Vnodes transition between 0<->1 hold counts all the time and the
associated traversal between different lists induces significant
scalability problems in certain workloads.

Introduce a global list containing all allocated vnodes. They get
unlinked only when UMA reclaims memory and are only requeued when
hold count reaches 0.

Sample result from an incremental make -s -j 104 bzImage on tmpfs:
stock: 118.55s user 3649.73s system 7479% cpu 50.382 total
patched: 122.38s user 1780.45s system 6242% cpu 30.480 total

Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22997


# 57083d25 12-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add per-mount vnode lazy list and use it for deferred inactive + msync

This obviates the need to scan the entire active list looking for vnodes
of interest.

msync is handled by adding all vnodes with write count to the lazy list.

deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag.

Vnodes get dequeued from the list when their hold count reaches 0.

Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that
spurious locking is avoided in the common case.

Reviewed by: jeff
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D22995


# c8b3463d 07-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT)

The previous behavior of leaving VI_OWEINACT vnodes on the active list without
a hold count is eliminated. Hold count is kept and inactive processing gets
explicitly deferred by setting the VI_DEFINACT flag. The syncer is then
responsible for vdrop.

Reviewed by: kib (previous version)
Tested by: pho (in a larger patch, previous version)
Differential Revision: https://reviews.freebsd.org/D23036


# 5b87ecc6 22-Oct-2019 Konstantin Belousov <kib@FreeBSD.org>

Assert that vnode_pager_setsize() is called with the vnode exclusively locked

except for filesystems that set the MNTK_VMSETSIZE_BUG flag. Set the flag for ZFS.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D21883


# d1cbf3ee 13-Oct-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add MNTK_NOMSYNC

On many filesystems the traversal is effectively a no-op. Add a way to avoid
the overhead.

Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D22009


# dc20b834 06-Oct-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add optional root vnode caching

Root vnodes are looked up all the time, e.g. when crossing a mount point.
Currently used routines always perform a costly lookup which can be
trivially avoided.

Reviewed by: jeff (previous version), kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21646


# ba7a55d9 22-Sep-2019 Sean Eric Fagan <sef@FreeBSD.org>

Add two options to allow mount to avoid covering up existing mount points.
The two options are

* nocover/cover: Prevent/allow mounting over an existing root mountpoint.
E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local
is already a mountpoint.
* emptydir/noemptydir: Prevent/allow mounting on a non-empty directory.
E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail.

Neither of these options is intended to be a default, for historical and
compatibility reasons.

Reviewed by: allanjude, kib
Differential Revision: https://reviews.freebsd.org/D21458


# b488246b 19-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: group fields used for per-cpu ops in one cacheline

Sponsored by: The FreeBSD Foundation


# 4cace859 16-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: convert struct mount counters to per-cpu

There are 3 counters modified all the time in this structure - one for
keeping the structure alive, one for preventing unmount and one for
tracking active writers. Exact values of these counters are very rarely
needed, which makes them a prime candidate for conversion to a per-cpu
scheme, resulting in much better performance.

Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on
a 104-way 2 socket Skylake system:
before: 852393 ops/s
after: 76682077 ops/s

Reviewed by: kib, jeff
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21637
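
The per-CPU scheme described in the entry above can be sketched in a few lines. This is a simplified model under stated assumptions: `EX_NCPU`, `ex_pcpu_counter` and both helpers are hypothetical names, and the sketch omits what real kernel code would need (pinning to a CPU, cache-line padding, and a way to stabilize the slots before summing).

```c
#include <assert.h>
#include <stdint.h>

#define EX_NCPU 4	/* hypothetical CPU count for the sketch */

/* One slot per CPU; the logical value is the sum of all slots. */
struct ex_pcpu_counter {
	int64_t	slot[EX_NCPU];
};

/* Fast path: touch only the slot of the CPU the caller runs on. */
static inline void
ex_counter_add(struct ex_pcpu_counter *c, unsigned int cpu, int64_t delta)
{
	assert(cpu < EX_NCPU);
	c->slot[cpu] += delta;
}

/* Slow path: the exact value is rarely needed, so summing is acceptable. */
static inline int64_t
ex_counter_fetch(const struct ex_pcpu_counter *c)
{
	int64_t sum = 0;
	unsigned int i;

	for (i = 0; i < EX_NCPU; i++)
		sum += c->slot[i];
	return (sum);
}
```

An individual slot may go negative, for example when a reference is taken on one CPU and dropped on another; only the sum is meaningful.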


# a8c8e44b 16-Sep-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: manage mnt_ref with atomics

A new primitive is introduced to denote sections which can operate locklessly
on aspects of struct mount, but which can also be disabled if necessary.
This provides an opportunity to start scaling common case modifications
while providing stable state of the struct when facing unmount, write
suspension or other events.

mnt_ref is the first counter to start being managed in this manner with
the intent to make it per-cpu.

Reviewed by: kib, jeff
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21425


# 58aa4dbf 04-Sep-2019 Conrad Meyer <cem@FreeBSD.org>

sys/mount.h: Comment on distinction between vfs_{c,}mount

Hope to save someone else a little future effort in ugly duplicated code.

No functional change.


# 25c8d940 23-Aug-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: assert the lock held in MNT_REF/MNT_REL

Sponsored by: The FreeBSD Foundation


# e671edac 23-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Decommission the MNTK_NOINSMNTQ kernel mount flag.

After all the changes, its dynamic scope is the same as for MNTK_UNMOUNT,
but to allow the syncer vnode to be re-installed on unmount failure.
But the case of syncer was already handled by using the VV_FORCEINSMQ
flag for quite some time.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# de4e1aeb 18-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Fix an issue with executing tmpfs binary.

Suppose that a binary was executed from tmpfs mount, and the text
vnode was reclaimed while the binary was still running. It is
possible even during normal operation, since the tmpfs vnode's
vm_object has swap type and no reference on the vnode is held. Also
assume that the text vnode was revived for some reason. Then, on the
process exit or exec, unmapping of the text mapping tries to remove
the text reference from the vnode, but since it went through a
recycle/instantiation cycle, there is no reference kept, and assertion
in VOP_UNSET_TEXT_CHECKED() triggers.

Fix this by keeping a use reference on the tmpfs vnode for each exec
reference. This prevents the vnode reclamation while executable map
entry is active.

Do it by adding per-mount flag MNTK_TEXT_REFS that directs
vop_stdset_text() to add use ref on first vnode text use, and
per-vnode VI_TEXT_REF flag, to record the need on unref in
vop_stdunset_text() on last vnode text use going away. Set
MNTK_TEXT_REFS for tmpfs mounts.

Reported by: bdrewery
Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# daba4da8 01-Jul-2019 Kirk McKusick <mckusick@FreeBSD.org>

Add a new "untrusted" option to the mount command. Its purpose
is to notify the kernel that the file system is untrusted and it
should use more extensive checks on the file-system's metadata
before using it. This option is intended to be used when mounting
file systems from untrusted media such as USB memory sticks or other
externally-provided media.

It will initially be used by the UFS/FFS file system, but should
likely be expanded to be used by other file systems that may appear
on external media like msdosfs, exfat, and ext2fs.

Reviewed by: kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D20786


# d1fd400a 07-Dec-2018 Konstantin Belousov <kib@FreeBSD.org>

Add new file handle system calls.

Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2). The
syscalls are provided for a NFS userspace server (nfs-ganesha).

Submitted by: Jack Halford <jack@gandi.net>
Sponsored by: Gandi.net
Tested by: pho
Feedback from: brooks, markj
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D18359


# 6acf1b20 29-Oct-2018 Konstantin Belousov <kib@FreeBSD.org>

Clarify explanation of VFCF_SBDRY.

Requested by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 8ff7fad1 23-Oct-2018 Konstantin Belousov <kib@FreeBSD.org>

Only call sigdeferstop() for NFS.

Use bypass to catch any NFS VOP dispatch and route it through the
wrapper which does sigdeferstop() and then dispatches original
VOP. NFS does not need a bypass below it, which is not supported.

The vop offset in the vop_vector is added since otherwise it is
impossible to get vop_op_t from the internal table, and I did not
want to create the layered fs only to wrap NFS VOPs.

VFS_OP()s wrap is straightforward.

Requested and reviewed by: mjg (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D17658


# 0e5c6bd4 04-May-2018 Jamie Gritton <jamie@FreeBSD.org>

Make it easier for filesystems to count themselves as jail-enabled,
by doing most of the work in a new function prison_add_vfs in kern_jail.c
Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and
the rest is taken care of. This includes adding a jail parameter like
allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed.
Both of these used to be a static list of known filesystems, with
predefined permission bits.

Reviewed by: kib
Differential Revision: D14681


# 82614df4 31-Dec-2017 Colin Percival <cperciva@FreeBSD.org>

Use the TSLOG framework to record entry/exit timestamps for VFS_MOUNT calls.


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 9a81ba0f 31-May-2017 Stephen J. Kiernan <stevek@FreeBSD.org>

Add MD_VERIFY option to enable O_VERIFY in open for vnode type.
Add -o [no]verify option to mdconfig (and document in man page.)
Implement GEOM attribute MNT::verified to ask md if the backing vnode is
verified.
Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if
the underlying device has been verified.

Reviewed by: rwatson
Approved by: sjg (mentor)
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D2902


# 69921123 23-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in a backward-compatible way, but
there is no general mechanism to handle other sysctl MIBs which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 2f304845 05-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not allocate struct statfs on kernel stack.

Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are undesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# abc15156 27-Nov-2016 Konstantin Belousov <kib@FreeBSD.org>

NFSv4 client tracks opens, and the track records are only dropped when
the vnode is inactivated. This conflicts with the nullfs caching,
which keeps the upper vnode around and, as a consequence, keeps the use
reference to the lower vnode.

Add a filesystem flag to request nullfs to not cache when mounted over
that filesystem, and set the flag for nfs v4 mounts.

Reported by: asomers
Reviewed by: rmacklem
Tested by: asomers, rmacklem
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 5bb81f9b 30-Sep-2016 Mateusz Guzik <mjg@FreeBSD.org>

vfs: batch free vnodes in per-mnt lists

Previously free vnodes would always be directly returned to the global
LRU list. With this change up to mnt_free_list_batch vnodes are collected
first.

syncer runs always return the batch regardless of its size.

While vnodes on per-mnt lists are not counted as free, they can be
returned in case of vnode shortage.

Reviewed by: kib
Tested by: pho


# debc480e 07-Jul-2016 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add new unmount(2) flag, MNT_NONBUSY, to check whether there are
any open vnodes before proceeding. Make autounmountd(8) use this flag.
Without it, even an unsuccessful unmount causes a filesystem flush,
which interferes with normal operation.

Reviewed by: kib@
Approved by: re (gjb@)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7047


# 3a1e5dd8 26-Jun-2016 Konstantin Belousov <kib@FreeBSD.org>

Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible
framework allowing to set the suspension policy for the dynamic block.
Extend the currently possible policies of stopping on interruptible
sleeps and ignoring such sleeps by two more: do not suspend at
interruptible sleeps, but interrupt them with either EINTR or ERESTART.

Reviewed by: jilles
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Approved by: re (gjb)


# 09c837b8 20-Nov-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Remove remnants of the old NFS from vnode pager.

Reviewed by: kib
Sponsored by: Netflix


# 5f34e93c 05-Jul-2015 Mark Johnston <markj@FreeBSD.org>

Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT.
This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough
filesystems like nullfs and unionfs no longer need to inherit this
information from their lower layer(s). This change also restores the
pre-r273336 behaviour of using the presence of a susp_clean VFS method to
request suspension support.

Reviewed by: kib, mjg
Differential Revision: https://reviews.freebsd.org/D2937


# dda11d4a 15-Apr-2015 Rick Macklem <rmacklem@FreeBSD.org>

File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.

Reviewed by: kib
MFC after: 2 weeks


# a25100c5 08-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

Add functions syncer_suspend() and syncer_resume(), which are supposed
to be called before suspension and after resume, correspondingly. The
syncer_suspend() ensures that all filesystems dirty data and metadata
are saved to the permanent storage, and stops kernel threads which
might modify filesystems. The syncer_resume() restores stopped
threads.

For now, only syncer is stopped. This is needed, because each sync
loop causes superblock updates for UFS.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 4fce16e4 20-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Provide vfs suspension support only for filesystems which need it, take
two.

nullfs and unionfs need to request suspension if underlying filesystem(s)
use it. Utilize mnt_kern_flag for this purpose.

This is a fixup for 273271.

No strong objections from: kib
Pointy hat to: mjg
MFC after: 2 weeks


# 020b8f17 19-Oct-2014 Mateusz Guzik <mjg@FreeBSD.org>

Provide vfs suspension support only for filesystems which need it.

Need is expressed by providing vfs_susp_clean function in vfsops.

Differential Revision: D952
Reviewed by: kib (previous version)
MFC after: 2 weeks


# 3914ddf8 17-Aug-2014 Edward Tomasz Napierala <trasz@FreeBSD.org>

Bring in the new automounter, similar to what's provided in most other
UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format,
has proper kernel support, and LDAP integration.

There are still a few outstanding problems; they will be fixed shortly.

Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions)
Phabric: D523
MFC after: 2 weeks
Relnotes: yes
Sponsored by: The FreeBSD Foundation


# 168f4ee0 02-Aug-2014 Konstantin Belousov <kib@FreeBSD.org>

Remove Giant acquisition from the mount and unmount paths.

It could be claimed that two things were reasonably protected by
Giant. One is vfsconf list links, which is converted to the new
dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is
now updated with atomics.

Note that vfc_refcount still has the same races now as it had under
Giant: the unload of filesystem modules can happen while the
module is still in use.

Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# beb199ac 09-Nov-2013 Konstantin Belousov <kib@FreeBSD.org>

Hide MNT_SHARED_WRITES() and MNT_EXTENDED_SHARED() under the #ifdef
_KERNEL braces. Struct mount is only defined for the kernel build.

Reported and tested by: andreast
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 6272798a 09-Nov-2013 Konstantin Belousov <kib@FreeBSD.org>

Both vn_close() and VFS_PROLOGUE() evaluate vp->v_mount twice, without
holding the vnode lock; vp->v_mount is checked first for NULL
equality, and then dereferenced if not NULL. If the vnode is reclaimed
in the meantime, the second evaluation would still give NULL. Change
VFS_PROLOGUE() to evaluate the mp once, convert MNTK_SHARED_WRITES and
MNTK_EXTENDED_SHARED tests into inline functions.

Reviewed by: alc
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
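
The "evaluate the mount pointer once" pattern from the entry above can be sketched as follows. The types, the flag value and the function names are hypothetical stand-ins, and the sketch does not model concurrency: real kernel code would additionally need to ensure the compiler does not reload the pointer between the check and the use.

```c
#include <assert.h>
#include <stddef.h>

#define EX_SHARED_WRITES 0x1u	/* hypothetical flag value */

struct ex_mount {
	unsigned int	kern_flags;
};

struct ex_vnode {
	struct ex_mount	*v_mount;	/* may become NULL on reclaim */
};

/* Test the flag on a pointer the caller has already loaded. */
static inline int
ex_shared_writes(const struct ex_mount *mp)
{
	return (mp != NULL && (mp->kern_flags & EX_SHARED_WRITES) != 0);
}

/* Load vp->v_mount exactly once, then check and use the local copy. */
static inline int
ex_vnode_shared_writes(const struct ex_vnode *vp)
{
	const struct ex_mount *mp;

	assert(vp != NULL);
	mp = vp->v_mount;
	return (ex_shared_writes(mp));
}
```

Moving the test into a function that takes the mount pointer as an argument makes the single load explicit, which a macro expanding `vp->v_mount` twice cannot guarantee.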


# d5814e82 28-Oct-2013 Sergey Kandaurov <pluknet@FreeBSD.org>

G/c unused mountrootfsname.
It was replaced with rootdevnames in r52778.


# 8fe6bddf 01-Sep-2013 Rick Macklem <rmacklem@FreeBSD.org>

Forced dismounts of NFS mounts can fail when thread(s) are stuck
waiting for an RPC reply from the server while holding the mount
point busy (mnt_lockref incremented). This happens because dounmount()
msleep()s waiting for mnt_lockref to become 0, before calling
VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(),
which the NFS client implements as purging RPCs in progress. Making
this call before checking mnt_lockref fixes the problem, by ensuring
that the VOP_xxx() calls will fail and unbusy the mount point.

Reported by: sbruno
Reviewed by: kib
MFC after: 2 weeks


# 477e6ee4 23-Aug-2013 Alfred Perlstein <alfred@FreeBSD.org>

Grow some spares in struct vfsops.

This should hopefully prevent ABI breakage
on adding new vfsops in 10.x.


# 4612275f 10-Jun-2013 Marcel Moolenaar <marcel@FreeBSD.org>

Revert r251590. It unexpectedly broke the build and there were some
questions on locking. As part of commit-bit grooming, I'd like Steve
to handle this, but can't leave things broken in the mean time.


# 8c7ca16f 09-Jun-2013 Marcel Moolenaar <marcel@FreeBSD.org>

Add vfs_mounted and vfs_unmounted events so that components can be informed
about mount and unmount events. This is used by Juniper to implement a more
optimal implementation of NetBSD's veriexec.

Submitted by: stevek@juniper.net
Obtained from: Juniper Networks, Inc


# 0fc6daa7 11-May-2013 Konstantin Belousov <kib@FreeBSD.org>

- Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The
null_hashget() obtains the reference on the nullfs vnode, which must
be dropped.

- Fix a wart which existed from the introduction of the nullfs
caching, do not unlock lower vnode in the nullfs_reclaim_lowervp().
It should be innocent, but now it is also formally safe. Inform the
nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on
nullfs inode.

- Add a callback to the upper filesystems for the lower vnode
unlinking. When inactivating a nullfs vnode, check if the lower
vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC
on the lower vnode, and reclaim upper vnode if so. This allows
nullfs to purge cached vnodes for the unlinked lower vnode, avoiding
excessive caching.

Reported by: Göran Löwkrantz <goran.lowkrantz@ismobile.com>
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# f8c09530 19-Mar-2013 Konstantin Belousov <kib@FreeBSD.org>

A flag for the filesystem to indicate to the upper levels that it accepts
unmapped buffers for the VOP_STRATEGY().

Sponsored by: The FreeBSD Foundation
Tested by: pho


# 593efaf9 21-Feb-2013 John Baldwin <jhb@FreeBSD.org>

Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing. Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed. For now, only the NFS clients set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.

PR: kern/176179
Reported by: Russell Cattelan (sigdeferstop() recursion)
Reviewed by: kib
MFC after: 1 month


# 6cd3574c 24-Jan-2013 Sergey Kandaurov <pluknet@FreeBSD.org>

Update and clarify comments regarding VFS op table initialization
in the man page and its header counterpart.

Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (initial version)
Reviewed and further improved by: bde (previous version)
All bugs are: mine


# d1c5e3f8 03-Jan-2013 Konstantin Belousov <kib@FreeBSD.org>

Remove the deprecated MNT_VNODE_FOREACH interface. Use the
MNT_VNODE_FOREACH_ALL instead.


# 14df601e 14-Dec-2012 Konstantin Belousov <kib@FreeBSD.org>

When mnt_vnode_next_active iterator cannot lock the next vnode and
yields, specify the user priority for the yield. Otherwise, a
higher-priority (kernel) thread could fall into the priority-inversion
with the thread owning the mutex lock.

On single-processor machines or UP kernels, do not loop adaptively
when the next vnode cannot be locked, instead yield unconditionally.

Restructure the iteration initializer and the iterator to remove code
duplication. Put the code to fetch and lock a vnode next to the
current marker, into the mnt_vnode_next_active() function, and use it
instead of repeating the loop.

Reported by: hrs, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 4eea8aea 14-Dec-2012 Konstantin Belousov <kib@FreeBSD.org>

Line up the continuation backslashes.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# bc2258da 09-Nov-2012 Attilio Rao <attilio@FreeBSD.org>

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.


# 5050aa86 22-Oct-2012 Konstantin Belousov <kib@FreeBSD.org>

Remove the support for using non-mpsafe filesystem modules.

In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.

The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.

Conducted and reviewed by: attilio
Tested by: pho


# 102548d1 05-Oct-2012 Andriy Gapon <avg@FreeBSD.org>

mount.h: MNTK_VGONE_UPPER and MNTK_VGONE_WAITER were supposed to be different

... otherwise a waiter is never woken up.

Reported by: swills
Discussed with: jhb
Approved by: kib
MFC after: 3 days
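
A small sketch shows one way two flags sharing a value can lose a wakeup, as the entry above reports. The flag values and `ex_finish_notify()` are hypothetical, and the order of operations (clear the "upper" bit, then test the "waiter" bit) is an assumption made for illustration rather than a copy of the kernel code.

```c
#include <assert.h>

/* Hypothetical, distinct bit values for the two states. */
#define EX_VGONE_UPPER	0x0200u
#define EX_VGONE_WAITER	0x0400u

/* Catch an accidental collision at build time. */
static_assert((EX_VGONE_UPPER & EX_VGONE_WAITER) == 0,
    "upper and waiter flags must use different bits");

/*
 * Finish a notification pass: clear the "in progress" bit, then report
 * whether a waiter was recorded and therefore needs a wakeup.
 */
static inline int
ex_finish_notify(unsigned int *flags, unsigned int upper_bit,
    unsigned int waiter_bit)
{
	*flags &= ~upper_bit;
	if ((*flags & waiter_bit) != 0) {
		*flags &= ~waiter_bit;
		return (1);	/* caller would issue the wakeup */
	}
	return (0);
}
```

If both parameters are passed the same bit, clearing the first also clears the second, the function returns 0, and the sleeping thread is never woken.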


# bcd5bb8e 09-Sep-2012 Konstantin Belousov <kib@FreeBSD.org>

Add a facility for vgone() to inform the set of subscribed mounts
about vnode reclamation. Typical use is for the bypass mounts like
nullfs to get a notification about lower vnode going away.

Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument
lowervp which is reclaimed. It is possible to register several
reclamation event listeners, to correctly handle the case of several
nullfs mounts over the same directory.

For the filesystem not having nullfs mounts over it, the overhead
added is a single mount interlock lock/unlock in the vnode reclamation
path.

In collaboration with: pho
MFC after: 3 weeks


# 84c3cd4f 09-Sep-2012 Konstantin Belousov <kib@FreeBSD.org>

Add MNTK_LOOKUP_EXCL_DOTDOT struct mount flag, which specifies to the
lookup code that dotdot lookups shall override any shared lock
requests with the exclusive one. The flag is useful for filesystems
which sometimes need to upgrade shared lock to exclusive inside the
VOP_LOOKUP or later, which cannot be done safely for dotdot, due to
dvp also locked and causing LOR.

In collaboration with: pho
MFC after: 3 weeks


# 41014d99 30-May-2012 Konstantin Belousov <kib@FreeBSD.org>

vn_io_fault() is a facility to prevent page faults while filesystems
perform copyin/copyout of the file data into the usermode
buffer. A typical filesystem holds the vnode lock and some buffer locks over
the VOP_READ() and VOP_WRITE() operations, and since page fault
handler may need to recurse into VFS to get the page content, a
deadlock is possible.

The facility works by disabling page faults handling for the current
thread and attempting to execute i/o while allowing uiomove() to
access the usermode mapping of the i/o buffer. If all buffer pages are
resident, uiomove() is successful and the request is finished. If EFAULT
is returned from uiomove(), the pages backing i/o buffer are faulted
in and held, and the copyin/out is performed using uiomove_fromphys()
over the held pages for the second attempt of VOP call.

Since pages are held in chunks to prevent large i/o requests from
starving the free pages pool, and since the vnode lock is only taken for
i/o over the current chunk, the vnode lock no longer protects atomicity
of the whole i/o request. Use newly added rangelocks to provide the
required atomicity of i/o regarding other i/o and truncations.

Filesystems need to explicitly opt in to the scheme, by setting the
MNTK_NO_IOPF struct mount flag, and optionally by using
vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or
converting uio into request for uiomove_fromphys().

Reviewed by: bf (comments), mdf, pjd (previous version)
Tested by: pho
Tested by: flo, Gustau Pérez <gperez entel upc edu> (previous version)
MFC after: 2 months


# 11c15f90 18-May-2012 Kirk McKusick <mckusick@FreeBSD.org>

Update comment to document that the vnode free-list mutex needs to be
held when updating mnt_activevnodelist and mnt_activevnodelistsize.


# f257ebbb 20-Apr-2012 Kirk McKusick <mckusick@FreeBSD.org>

This change creates a new list of active vnodes associated with
a mount point. Active vnodes are those with a non-zero use or hold
count, i.e., those vnodes that are not on the free list. Note that
this list is in addition to the list of all the vnodes associated
with a mount point.

To avoid adding another set of linkage pointers to the vnode
structure, the active list uses the existing linkage pointers
used by the free list (previously named v_freelist, now renamed
v_actfreelist).

This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops
over just the active vnodes associated with a mount point (typically
less than 1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


# 71469bb3 17-Apr-2012 Kirk McKusick <mckusick@FreeBSD.org>

Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if it is DOOMED or other possible end-of-life scenarios).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


# e8f8ad72 11-Apr-2012 Kirk McKusick <mckusick@FreeBSD.org>

Whitespace cleanup.


# 0ff93c48 07-Apr-2012 Gleb Kurtsou <gleb@FreeBSD.org>

Add vfs_getopt_size. Support human readable file system options in tmpfs.

Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs.
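
The kind of human-readable size handling mentioned above can be sketched in
userland C. This is only an illustration under stated assumptions: the name
ex_parse_size() and the accepted suffixes (k, m, g, t, powers of two) are
hypothetical, and this is not the kernel's vfs_getopt_size() implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse "512m", "4g", "100" and similar strings
 * into a byte count.  Returns 0 on success or EINVAL on bad input.
 */
static int
ex_parse_size(const char *s, int64_t *out)
{
	char *end;
	long long v;
	int shift = 0;

	assert(s != NULL && out != NULL);
	errno = 0;
	v = strtoll(s, &end, 0);
	if (errno != 0 || end == s || v < 0)
		return (EINVAL);
	switch (*end) {
	case 't': case 'T': shift = 40; break;
	case 'g': case 'G': shift = 30; break;
	case 'm': case 'M': shift = 20; break;
	case 'k': case 'K': shift = 10; break;
	case '\0': break;
	default: return (EINVAL);
	}
	/* Reject trailing characters after the suffix. */
	if (*end != '\0' && end[1] != '\0')
		return (EINVAL);
	/* Reject values that would overflow once scaled. */
	if (v > (INT64_MAX >> shift))
		return (EINVAL);
	*out = (int64_t)v << shift;
	return (0);
}
```

Calling ex_parse_size("512m", &n) in this sketch stores 512 * 2^20 in n.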

Discussed with: delphij
MFC after: 2 weeks


# 38ddb572 08-Mar-2012 Konstantin Belousov <kib@FreeBSD.org>

Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which
allows a filesystem to request VFS to not allow MNTK_ASYNC.

MFC after: 1 week


# cc672d35 16-Jan-2012 Kirk McKusick <mckusick@FreeBSD.org>

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.
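
A minimal sketch of the truncation this change guards against. The flag
value EX_FLAG_HIGH and both helper names are hypothetical and chosen only
for illustration; they are not taken from sys/mount.h.

```c
#include <stdint.h>

/* Hypothetical flag living in the upper 32 bits; not a real MNT_* value. */
#define	EX_FLAG_HIGH	(UINT64_C(1) << 35)

/* Buggy pattern: a 32-bit intermediate silently drops bits 32..63. */
static uint64_t
ex_pass_flags_narrow(uint64_t flags)
{
	uint32_t tmp = (uint32_t)flags;

	return (tmp);
}

/* Fixed pattern: every intermediate is uint64_t, so all bits survive. */
static uint64_t
ex_pass_flags_wide(uint64_t flags)
{
	uint64_t tmp = flags;

	return (tmp);
}
```

In this sketch, passing EX_FLAG_HIGH | 1 through the narrow helper yields
only the low bit, while the wide helper returns the value unchanged.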

MFC after: 2 weeks


# d716efa9 24-Jul-2011 Kirk McKusick <mckusick@FreeBSD.org>

Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flag
so that it is visible to userland programs. This change enables
the `mount' command with no arguments to be able to show if a
filesystem is mounted using journaled soft updates as opposed
to just normal soft updates.

Approved by: re (bz)


# 6beb3bb4 24-Jul-2011 Kirk McKusick <mckusick@FreeBSD.org>

This update changes the mnt_flag field in the mount structure from
32 bits to 64 bits and eliminates the unused mnt_xflag field. The
existing mnt_flag field is completely out of bits, so this update
gives us room to expand. Note that the f_flags field in the statfs
structure is already 64 bits, so the expanded mnt_flag field can
be exported without having to make any changes in the statfs structure.

Approved by: re (bz)


# 694a586a 21-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add a lock flags argument to the VFS_FHTOVP() file system
method, so that callers can indicate the minimum vnode
locking requirement. This will allow some file systems to choose
to return a LK_SHARED locked vnode when LK_SHARED is specified
for the flags argument. This patch only adds the flag. It
does not change any file system to use it and all callers
specify LK_EXCLUSIVE, so file system semantics are not changed.

Reviewed by: kib


# 8dfec4a3 21-Dec-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Enclose the body of the VFS_UNLOCK_GIANT() macro in a do { } while (0) loop,
so it can be used in code like this:

	if (cond)
		VFS_UNLOCK_GIANT(vfslocked);
	else
		; /* Do something else. */

Before the change, the else was applied to the if statement inside the
VFS_UNLOCK_GIANT() macro rather than to the 'if (cond)', because an else
always binds to the nearest if.
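
The effect can be reproduced in a small standalone sketch. The macros
EX_UNLOCK_BARE and EX_UNLOCK_WRAPPED and the helper giant_unlock() are
hypothetical stand-ins, not the kernel definitions. In C an else binds to
the nearest unmatched if, so with the bare form the else attaches to the if
hidden inside the macro (compilers typically warn about this).

```c
#include <stdbool.h>

static int unlock_calls;

static void
giant_unlock(void)
{
	unlock_calls++;
}

/* Hypothetical stand-ins for the macro before and after the change. */
#define	EX_UNLOCK_BARE(locked)		if (locked) giant_unlock()
#define	EX_UNLOCK_WRAPPED(locked)	do { if (locked) giant_unlock(); } while (0)

/* Returns 1 when the else branch ran. */
static int
use_bare(bool cond, bool locked)
{
	int else_taken = 0;

	if (cond)
		EX_UNLOCK_BARE(locked);	/* the else below pairs with the macro's if */
	else
		else_taken = 1;
	return (else_taken);
}

/* Returns 1 when the else branch ran. */
static int
use_wrapped(bool cond, bool locked)
{
	int else_taken = 0;

	if (cond)
		EX_UNLOCK_WRAPPED(locked);	/* one statement; else pairs with if (cond) */
	else
		else_taken = 1;
	return (else_taken);
}
```

With the bare macro the else branch runs when cond is true and locked is
false, and never runs when cond is false. The wrapped form behaves as the
indentation suggests.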


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# d0cc54f3 10-Oct-2010 Konstantin Belousov <kib@FreeBSD.org>

The r184588 changed the layout of struct export_args, causing an ABI
breakage for old mount(2) syscall, since most struct <filesystem>_args
embed export_args. The mount(2) is supposed to provide ABI
compatibility for pre-nmount mount(8) binaries, so restore ABI to
pre-r184588.

Requested and reviewed by: bde
MFC after: 2 weeks


# 418f1e7b 14-Sep-2010 Konstantin Belousov <kib@FreeBSD.org>

Rename the field to not confuse readers. The bytes are actually used.

Discussed with: rmacklem
MFC after: 1 week


# 9a24dc07 11-Sep-2010 Konstantin Belousov <kib@FreeBSD.org>

Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak
when mount and update are executed in parallel.

Encapsulate syncer vnode deallocation into the helper function
vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c.

Found and reviewed by: jh (previous version of the patch)
Tested by: pho
MFC after: 3 weeks


# c87f1ad4 28-Aug-2010 Pawel Jakub Dawidek <pjd@FreeBSD.org>

There is a bug in vfs_allocate_syncvnode() failure handling in mount code.
Actually it is hard to properly handle such a failure, especially in MNT_UPDATE
case. The only reason for the vfs_allocate_syncvnode() function to fail is
getnewvnode() failure. Fortunately it is impossible for current implementation
of getnewvnode() to fail, so we can assert this and make
vfs_allocate_syncvnode() void. This in turn frees us from handling its failures
in the mount code.

Reviewed by: kib
MFC after: 1 month


# 113db2dd 24-Apr-2010 Jeff Roberson <jeff@FreeBSD.org>

- Merge soft-updates journaling from projects/suj/head into head. This
brings in support for an optional intent log which eliminates the need
for background fsck on unclean shutdown.

Sponsored by: iXsystems, Yahoo!, and Juniper.
With help from: McKusick and Peter Holm


# 0718d64d 18-Apr-2010 Edward Tomasz Napierala <trasz@FreeBSD.org>

MFC r200796:

Implement NFSv4 ACL support for UFS.

Reviewed by: rwatson


# 9340fc72 21-Dec-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Implement NFSv4 ACL support for UFS.

Reviewed by: rwatson


# fe1d3f15 28-Jun-2009 Stanislav Sedov <stas@FreeBSD.org>

- Turn the third (islocked) argument of the knote call into flags parameter.
Introduce the new flag KNF_NOKQLOCK to allow event callers to be called
without KQ_LOCK mtx held.
- Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required
for ZFS as its getattr implementation may sleep.

Approved by: re (rwatson)
Reviewed by: kib
MFC after: 2 weeks


# 27bfb741 08-Jun-2009 Paul Saab <ps@FreeBSD.org>

Simplify shared vnode locking and extend it to also include fsync.
Also, in vop_write, no longer assert for exclusive locks on the
vnode.

Reviewed by: jhb, kmacy, jeffr


# a6d545d8 04-Jun-2009 Paul Saab <ps@FreeBSD.org>

Support shared vnode locks for write operations when the offset is
provided on filesystems that support it. This really improves mysql
+ innodb performance on ZFS.

Reviewed by: jhb, kmacy, jeffr


# faef64cc 30-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the now invalid (and possibly unused) debug.mpsafevfs
sysctl/tunable.

Reviewed by: emaste
Sponsored by: Sandvine Incorporated


# 61cea482 29-May-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

There is only one spare MNT_ flag left, and I want to use it for NFSv4 ACLs.
Make room for additional filesystem flags now, to avoid breaking ABI later.

Reviewed by: kib@


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal this
situation.


# 33fc3625 11-Mar-2009 John Baldwin <jhb@FreeBSD.org>

Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a
filesystem supports additional operations using shared vnode locks.
Currently this is used to enable shared locks for open() and close() of
read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a
shared vnode lock for the leaf vnode only if the mount point has the
extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but
not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with
O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from
vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since
FIFO's require exclusive vnode locks for their open() and close()
routines. (My recent MPSAFE patches for UDF and cd9660 already included
this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.

Submitted by: ups
Reviewed by: pjd (ZFS bits)
MFC after: 1 month


# f86bce5e 02-Mar-2009 Jamie Gritton <jamie@FreeBSD.org>

Extend the "vfsopt" mount options for more general use. Make struct
vfsopt and the vfs_buildopts function public, and add some new fields
to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and
vfs_opterror.

Further extend the interface to allow reading options from the kernel
in addition to sending them to the kernel, with vfs_setopt and related
functions.

While this allows the "name=value" option interface to be used for more
than just FS mounts (planned use is for jails), it retains the current
"vfsopt" name and <sys/mount.h> requirement.

Approved by: bz (mentor)


# ec48c16f 06-Feb-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add KASSERTs to make it easier to debug problems like the one fixed
in r188141.

Reviewed by: kib,attilio
Approved by: rwatson (mentor)
Tested by: pho
Sponsored by: FreeBSD Foundation


# 4a0f8076 16-Dec-2008 Attilio Rao <attilio@FreeBSD.org>

1) Fix a deadlock in the VFS:
- threadA runs vfs_rel(mp1)
- threadB unmounts the mp1 fs, sets MNTK_UNMOUNT and drops MNT_ILOCK()
- threadA runs vfs_busy(mp1) and, as long as MNTK_UNMOUNT is set, sleeps
waiting for threadB to complete the unmount
- threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting
for the refcount to expire.

Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals that
the unmounter is waiting for mnt_ref to expire.
The vfs_busy contenders are woken up and fail, and if they retry,
MNTK_REFEXPIRE won't allow them to sleep again.

2) Significantly simplify the code of vfs_mount_destroy() by trimming
unnecessary code:
- once all references have gone away, it is no longer possible to have
write ops (primary and secondary) in progress.
- there is no need to drop and reacquire the mount lock.
- filling the structures with dummy values is useless, since the structure
is going to be freed.

Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
Discussed with: kib


# 61791644 29-Nov-2008 Konstantin Belousov <kib@FreeBSD.org>

In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer
to the fs, but before a vnode on the fs is locked, unmount may free fs
structures, causing access to destroyed data and freed memory.

Introduce a vfs_busymp() function that looks up and busies found
fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the
implementation of the handle syscalls.

Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in
sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular,
sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl
handler, that prevents Giant-locked unmount code to interfere with it.

Noted by: tegge
Reviewed by: dfr
Tested by: pho
MFC after: 1 month


# 1ba4a712 17-Nov-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.

This brings a huge amount of changes; I'll enumerate only user-visible changes:

- Delegated Administration

Allows regular users to perform ZFS operations, like file system
creation, snapshot creation, etc.

- L2ARC

Level 2 cache for ZFS - allows to use additional disks for cache.
Huge performance improvements mostly for random read of mostly
static content.

- slog

Allow to use additional disks for ZFS Intent Log to speed up
operations like fsync(2).

- vfs.zfs.super_owner

Allows regular users to perform privileged operations on files stored
on ZFS file systems owned by him. Very careful with this one.

- chflags(2)

Not all the flags are supported. This still needs work.

- ZFSBoot

Support to boot off of ZFS pool. Not finished, AFAIK.

Submitted by: dfr

- Snapshot properties

- New failure modes

Previously, if a write request failed, the system panicked. Now one
can select from one of three failure modes:
- panic - panic on write error
- wait - wait for disk to reappear
- continue - serve read requests if possible, block write requests

- Refquota, refreservation properties

Just quota and reservation properties, but don't count space consumed
by children file systems, clones and snapshots.

- Sparse volumes

ZVOLs that don't reserve space in the pool.

- External attributes

Compatible with extattr(2).

- NFSv4-ACLs

Not sure about the status, might not be complete yet.

Submitted by: trasz

- Creation-time properties

- Regression tests for zpool(8) command.

Obtained from: OpenSolaris


# 30f60d8c 03-Nov-2008 Attilio Rao <attilio@FreeBSD.org>

Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless.
Really, the concept of holdcnt in the struct mount is represented by
the mnt_ref (which prevents the type-stable structure from being
"recycled") handled through vfs_ref() and vfs_rel().
In this light, switch the holdcnt acquisition into an emulated vfs_ref()
(and subsequent release into vfs_rel()).

Discussed with: kib
Tested by: pho


# a9148abd 03-Nov-2008 Doug Rabson <dfr@FreeBSD.org>

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


# 83b3bdbc 02-Nov-2008 Attilio Rao <attilio@FreeBSD.org>

Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the
mnt_lock and using instead a refcount in order to keep track of lock
requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now
useless.
- Due to the change above, vfs_busy() is no longer linked to a lockmgr.
Change its KPI accordingly by removing the interlock argument and defining
2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the
old version (which was unlinked from the lockmgr already) and
MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx
once the mnt interlock is held (ability still desired by most consumers).
- The stub used in vfs_mount_destroy(), which allows overriding the
mnt_ref if running for more than 3 seconds, makes it totally useless.
Remove it as it was only meant to work in older versions.
If a problem of "refcount held never going away" should appear, we will
need to fix it properly instead of trusting such a hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still
leaving the MNTK_MWAIT flag on even if the waiters were actually
woken up. Just a place in vfs_mount_destroy() is left because it is
going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.

This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and
__FreeBSD_version will be modified accordingly.

Discussed with: kib
Tested by: pho


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 6e6049e9 19-Sep-2008 David E. O'Brien <obrien@FreeBSD.org>

Add freebsd32 compat shim for nmount(2).
(and quiet some compiler warnings for vfs_donmount)


# 2814d5ba 16-Sep-2008 Konstantin Belousov <kib@FreeBSD.org>

When an attempt is made to suspend a filesystem that is already suspended,
wait until the current suspension is lifted instead of silently returning
success immediately. The consequences of calling vfs_write_resume() when
not owning the suspension are not well-defined at best.

Add the vfs_susp_clean() mount method to be called from
vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and
stop calling it manually.

Add the thread flag TDP_IGNSUSP that allows bypassing the suspension
point in vn_start_write(). It is intended for use by VFS in
situations where the suspender wants to do some i/o requiring calls to
vn_start_write(), and this i/o cannot be done later.

Reviewed by: tegge
In collaboration with: pho
MFC after: 1 month


# 59d49325 31-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.

Manpages are updated accordingly.

Tested by: Diego Sardina <siarodx at gmail dot com>


# a7053783 09-Jun-2008 Konstantin Belousov <kib@FreeBSD.org>

Provide the mutual exclusion between the nfs export list modifications
and nfs requests processing. Lockmgr lock provides the shared locking for
nfs requests, while exclusive mode is used for modifications. The writer
starvation is handled by lockmgr too.

Reported by: kris, pho, many
Based on the submission by: mohan
Tested by: pho
MFC after: 2 weeks


# 3800322f 26-Apr-2008 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Implement 'show mount' command in DDB. Without argument, it prints short
info about all currently mounted file systems. When an address is given
as an argument, prints detailed info about the given mount point.

MFC after: 2 weeks


# 7fbfba7b 01-Mar-2008 Attilio Rao <attilio@FreeBSD.org>

- Handle buffer lock waiters count directly in the buffer cache instead
of relying on the lockmgr support [1]:
* bump the waiters only if the interlock is held
* let brelvp() return the waiters count
* rely on brelvp() instead of BUF_LOCKWAITERS() in order to check
for the waiters number
- Remove a namespace pollution introduced recently with lockmgr.h
including lock.h by including lock.h directly in the consumers and
making it mandatory for using lockmgr.
- Modify flags accepted by lockinit():
* introduce LK_NOPROFILE which disables lock profiling for the
specified lockmgr
* introduce LK_QUIET which disables ktr tracing for the specified
lockmgr [2]
* disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it
can only be used on a per-instance basis
- Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
used

This patch breaks the KPI, so __FreeBSD_version will be bumped and manpages
updated by further commits. Additionally, the 'struct buf' changes result in
a disturbed ABI as well.

[2] Really, currently there is no ktr tracing in the lockmgr, but it
will be added soon.

[1] Submitted by: kib
Tested by: pho, Andrea Barberio <insomniac at slackware dot it>


# 245b2044 12-Sep-2007 Konstantin Belousov <kib@FreeBSD.org>

When restoring the mount after umount failed, the MNTK_UNMOUNT flag
prevents insmntque() from placing reallocated syncer vnode on mount
list, that causes panic in vfs_allocate_syncvnode().

Introduce MNTK_NOINSMNTQ flag, that marks the period when insmntque() is
not allowed to succeed, instead of MNTK_UNMOUNT. The MNTK_NOINSMNTQ is
set and cleared simultaneously with MNTK_UNMOUNT, except on umount error
path, where it is cleaned just before the syncer vnode is going to be
allocated.

Reported by: Peter Jeremy <peterjeremy optushome com au>
Suggested by: tegge
Approved by: re (rwatson)


# cc479dda 28-Aug-2007 John Baldwin <jhb@FreeBSD.org>

Rework the routines to convert a 5.x+ statfs structure (with fixed-size
64-bit counters) to a 4.x statfs structure (with long-sized counters).
- For block counters, we scale up the block size sufficiently large so
that the resulting block counts fit into the long-sized (long for the
ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs
VOP did this already. This can lie about the block size to 4.x binaries,
but it presents a more accurate picture of the ratios of free and
available space.
- For non-block counters, fix the freebsd32 stats converter to cap the
values at INT32_MAX rather than losing the upper 32-bits to match the
behavior of the 4.x statfs conversion routine in vfs_syscalls.c
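
The two conversions described above can be sketched as follows. This is a
simplified illustration, not the code from the commit: the names
ex_scale_blocks() and ex_cap32() are hypothetical, and only a single block
counter is scaled here.

```c
#include <assert.h>
#include <stdint.h>

struct ex_scaled {
	uint64_t bsize;		/* reported block size, possibly scaled up */
	uint64_t blocks;	/* block count that fits the old counter */
};

/*
 * Double the block size and halve the block count until the count fits
 * into max_count.  The product bsize * blocks stays roughly constant.
 */
static struct ex_scaled
ex_scale_blocks(uint64_t bsize, uint64_t blocks, uint64_t max_count)
{
	struct ex_scaled r = { bsize, blocks };

	assert(max_count > 0);
	while (r.blocks > max_count) {
		r.bsize <<= 1;
		r.blocks >>= 1;
	}
	return (r);
}

/* Non-block counters are capped rather than truncated. */
static int32_t
ex_cap32(uint64_t v)
{
	return (v > INT32_MAX ? INT32_MAX : (int32_t)v);
}
```

For example, 2^33 blocks of 512 bytes become 2^30 blocks of 4096 bytes when
the limit is INT32_MAX, so the total size reported is preserved.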

Approved by: re (kensmith)


# eb542415 22-Apr-2007 Robert Watson <rwatson@FreeBSD.org>

In the MAC Framework implementation, file systems have two per-mountpoint
labels: the mount label (label of the mountpoint) and the fs label (label
of the file system). In practice, policies appear to only ever use one,
and the distinction is not helpful.

Combine mnt_mntlabel and mnt_fslabel into a single mnt_label, and
eliminate extra machinery required to maintain the additional label.
Update policies to reflect removal of extra entry points and label.

Obtained from: TrustedBSD Project
Sponsored by: SPARTA, Inc.


# 7760d840 17-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Export vfs_mount_alloc() as it is used in ZFS.


# f3a8d2f9 05-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add security.jail.mount_allowed sysctl, which allows to mount and
unmount jail-friendly file systems from within a jail.
Precisely it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and
PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user.
It is turned off by default.

A jail-friendly file system is a file system whose driver registers
itself with the VFCF_JAIL flag via the VFS_SET(9) API.
The lsvfs(1) command can be used to see which file systems are
jail-friendly ones.

There are currently no jail-friendly file systems; ZFS will be the first one.
In the future we may consider marking file systems like nullfs as
jail-friendly.

Reviewed by: rwatson


# 4874b3fb 01-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

More style nits.


# daa88cdf 01-Apr-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Style nit.


# 695919ad 31-Mar-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them.


# c146055f 29-Mar-2007 Konstantin Belousov <kib@FreeBSD.org>

Extend rev. 1.210 to avoid dereference NULL mp in VFS_NEEDSGIANT and
VFS_ASSERT_GIANT. Stop using reserved namespace.

Reported and tested by: kris
Reviewed and enhanced by: tegge
MFC after: 1 week


# 2c7b0f41 16-Feb-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Remove VFS_VPTOFH entirely. API is already broken and it is good time to
do it.

Suggested by: rwatson


# 10bcafe9 15-Feb-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs


# 2892f3bb 16-Dec-2006 Craig Rodrigues <rodrigc@FreeBSD.org>

Add a function vfs_deleteopt() which searches through the vfsoptlist
linked list of mount options by name, and deletes the option if it finds it.


# 206ad245 31-Oct-2006 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Add MNT_GJOURNAL flag which indicates, that file system has gjournal
support enabled.
Add mnt_gjprovider field which keeps the name of the gjournal provider
on which the file system is placed. This allows the file system not to be
placed on gjournal directly and allows the gjournal class to pair a gjournal
provider with a file system.

Sponsored by: home.pl


# 30af7119 03-Oct-2006 Konstantin Belousov <kib@FreeBSD.org>

Fix the remaining race in the revs. 1.232 and 1.233 that could occur during
unmount when mp structure is reused while waiting for coveredvp lock.
Introduce struct mount generation count, increment it on each reuse and
compare the generations before and after obtaining the coveredvp lock.
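
The generation check can be sketched in plain C. The structure and function
names below are hypothetical, the wait for the lock is modelled by a
callback, and the EBUSY return value is an assumption made only for this
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct ex_mount {
	unsigned int gen;	/* bumped every time the structure is reused */
};

/* Simulate reuse of the mount structure for a different file system. */
static void
ex_reuse(struct ex_mount *mp)
{
	mp->gen++;
}

/*
 * Record the generation, "sleep" on the lock (modelled by calling
 * while_blocked, which may be NULL), then verify that the structure
 * still describes the same mount.
 */
static int
ex_lock_covered(struct ex_mount *mp, void (*while_blocked)(struct ex_mount *))
{
	unsigned int gen;

	assert(mp != NULL);
	gen = mp->gen;
	if (while_blocked != NULL)
		while_blocked(mp);
	if (mp->gen != gen)
		return (EBUSY);	/* assumption: error value is illustrative */
	return (0);
}
```

If the structure is reused while the caller waits, the generations differ
and the caller can back out instead of operating on the wrong mount.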

Reviewed by: tegge, pjd
Approved by: pjd (mentor)
MFC after: 2 weeks


# 55b4ff0d 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Increase mnt_noasync once in softdep_mount() to disallow async io,
closing a window where a file system using softupdates could be async
for a short while if both MNT_UPDATE and MNT_ASYNC were passed as flags
to nmount(). Add MNTK_SOFTDEP flag to ensure that softdep_mount()
doesn't increase mnt_noasync multiple times.


# a1e363f2 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Add mnt_noasync counter to better handle interleaved calls to nmount(),
sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag
which is set only when MNT_ASYNC is set and mnt_noasync is zero, and
check that flag instead of MNT_ASYNC before initiating async io.
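
A minimal sketch of the counter logic described above, using a hypothetical
structure in place of struct mount. The field names only mirror the roles
of MNT_ASYNC, mnt_noasync and MNTK_ASYNC, and no locking is shown.

```c
#include <assert.h>
#include <stdbool.h>

struct ex_mount {
	bool want_async;	/* plays the role of MNT_ASYNC */
	int noasync;		/* plays the role of mnt_noasync */
	bool async_active;	/* plays the role of MNTK_ASYNC */
};

/* Async i/o is effective only if requested and nobody forbids it. */
static void
ex_update(struct ex_mount *mp)
{
	mp->async_active = mp->want_async && mp->noasync == 0;
}

/* Temporarily forbid async i/o; calls may nest or interleave. */
static void
ex_noasync_enter(struct ex_mount *mp)
{
	mp->noasync++;
	ex_update(mp);
}

/* Drop one prohibition; the user's async request is never lost. */
static void
ex_noasync_exit(struct ex_mount *mp)
{
	assert(mp->noasync > 0);
	mp->noasync--;
	ex_update(mp);
}
```

Because the request (want_async) is kept separate from the effective state
(async_active), interleaved enter and exit calls restore async i/o only once
the last prohibition is dropped.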


# 5da56ddb 25-Sep-2006 Tor Egge <tegge@FreeBSD.org>

Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.
This eliminates a race where MNT_UPDATE flag could be lost when nmount()
raced against sync(), sync_fsync() or quotactl().


# 7d7d9e22 13-Sep-2006 Mohan Srinivasan <mohans@FreeBSD.org>

Fixes up the handling of shared vnode lock lookups in the NFS client,
adds a FS type specific flag indicating that the FS supports shared
vnode lock lookups, adds some logic in vfs_lookup.c to test this flag
and set lock flags appropriately.

- amd on 6.x is a non-starter (without this change). Using amd under
heavy load results in a deadlock (with cascading vnode locks all the
way to the root) very quickly.
- This change should also fix the more general problem of cascading
vnode deadlocks when an NFS server goes down.

Ideally, we wouldn't need these changes, as enabling shared vnode lock
lookups globally would work. Unfortunately, UFS, for example isn't
ready for shared vnode lock lookups, crashing pretty quickly.

This change is the result of discussions with Stephan Uphoff (ups@).

Reviewed by: ups@


# 5ac6cbfd 05-May-2006 Tor Egge <tegge@FreeBSD.org>

Avoid dereferencing NULL pointer.


# 9eee2605 30-Mar-2006 Jeff Roberson <jeff@FreeBSD.org>

- Define mnt_startzero and mnt_endzero as a range that excludes mnt_mtx
and mnt_lock so that the mountpoint can be explicitly zeroed on
creation.

Discussed with: tegge
Tested by: kris
Sponsored by: Isilon Systems, Inc.


# ca2fa807 10-Mar-2006 Tor Egge <tegge@FreeBSD.org>

Block secondary writes while expunging active unlinked files.

Fix detection of active unlinked files by checking VI_OWEINACT and
VI_DOINGINACT in addition to v_usecount.

Defer inactive handling for unlinked files if the file system is mostly
suspended (secondary writes being blocked).

Perform deferred inactive handling after the file system is resumed.


# 791dd2fa 08-Mar-2006 Tor Egge <tegge@FreeBSD.org>

Use vn_start_secondary_write() and vn_finished_secondary_write() as a
replacement for vn_write_suspend_wait() to better account for secondary write
processing.

Close race where secondary writes could be started after ffs_sync() returned
but before the file system was marked as suspended.

Detect if secondary writes or softdep processing occurred during vnode sync
loop in ffs_sync() and retry the loop if needed.


# eb2ea105 01-Mar-2006 Jeff Roberson <jeff@FreeBSD.org>

- Move softdep from using a global worklist to per-mount worklists. This
has many positive effects including improved smp locking, reducing
interdependencies between mounts that can lead to deadlocks, etc.
- Add the softdep worklist and various counters to the ufsmnt structure.
- Add a mount pointer to the workitem and remove mount pointers from the
various structures derived from the workitem as they are now redundant.
- Remove the poor-man's semaphore protecting softdep_process_worklist and
softdep_flushworklist. Several threads may now process the list
simultaneously.
- Add softdep_waitidle() to block the thread until all pending
dependencies being operated on by other threads have been flushed.
- Use softdep_waitidle() in unmount and snapshots to block either
operation until the fs is stable.
- Remove softdep worklist processing from the syncer and move it into the
softdep_flush() thread. This thread processes all softdep mounts
once each second and when it is called via the new softdep_speedup()
when there is a resource shortage. This removes the softdep hook
from the kernel and various hacks in header files to support it.

Reviewed by/Discussed with: tegge, truckman, mckusick
Tested by: kris


# 04f6d3ef 06-Feb-2006 Jeff Roberson <jeff@FreeBSD.org>

- Add a ref count to the mount structure. Sleep for up to 3 seconds in
vfs_mount_destroy waiting for this ref to hit 0. We don't print an
error if we are rebooting as the root mount always retains some references
by init proc.
- Acquire a mnt ref for every vnode allocated to a mount point. Drop this
ref only once vdestroy() has been called and the mount has been freed.
- No longer NULL the v_mount pointer in delmntque() so that we may release
the ref after vgone() has been called. This allows us to guarantee
that the mount point structure will be valid until the last vnode has
lost its last ref.
- Fix a few places that rely on checking v_mount to detect recycling.

Sponsored by: Isilon Systems, Inc.
MFC After: 1 week


# 82be0a5a 09-Jan-2006 Tor Egge <tegge@FreeBSD.org>

Add marker vnodes to ensure that all vnodes associated with the mount point are
iterated over when using MNT_VNODE_FOREACH.

Reviewed by: truckman


# a94d0a9d 18-Dec-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

- Document another spare flag (0x00000010).
- Add a 'XXX' comment about MNT_ACLS and MNT_BYFSID flags collision and
explain why it is harmless.
- Add a colon after 'XXX' for consistency.


# 0430a5e2 13-Dec-2005 Dag-Erling Smørgrav <des@FreeBSD.org>

Eradicate caddr_t from the VFS API.


# 2207c764 28-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Remove MNT_NODEV mount option. In RELENG_6, MNT_NODEV was a no-op.
The presence of MNT_NODEV was confusing the am-utils autoconf scripts.

PR: conf/79715


# 84e69560 07-Nov-2005 Craig Rodrigues <rodrigc@FreeBSD.org>

Add utility function to propagate mount errors as text string messages.

Discussed with: phk


# b133aa18 23-Oct-2005 Pawel Jakub Dawidek <pjd@FreeBSD.org>

MNT_JAILDEVFS is not used anymore. Mark it as spare.

OK'ed by: phk


# 34cc826a 05-Aug-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Holding a vnode doesn't prevent v_mount from disappearing (when the
vnode is inactivated), possibly leading to a NULL dereference when
checking if the mount wants knotes to be activated in the VOP hooks.
So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(),
if necessary, and check it when activating knotes.
Since the flags are not erased when a vnode is being held, we can safely
read them.

Reviewed by: kris@
MFC after: 3 days


# 571dcd15 01-Jul-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Fix the recent panics/LORs/hangs created by my kqueue commit by:

- Introducing the possibility of using locks different than mutexes
for the knlist locking. In order to do this, we add three arguments to
knlist_init() to specify the functions to use to lock, unlock and
check if the lock is owned. If these arguments are NULL, we assume
mtx_lock, mtx_unlock and mtx_owned, respectively.

- Using the vnode lock for the knlist locking, when doing kqueue operations
on a vnode. This way, we don't have to lock the vnode while holding a
mutex, in filt_vfsread.
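
The "NULL means use the mutex functions" convention from the first item can
be sketched as below. This is a hedged illustration, not the kqueue code:
struct sketch_knlist, struct sketch_mtx, sketch_knlist_init() and the
default_* helpers are hypothetical, and the mock mutex only records
ownership instead of synchronizing anything.

```c
#include <assert.h>
#include <stddef.h>

/* Mock mutex: records ownership only, no real synchronization. */
struct sketch_mtx {
	int owned;
};

/* Hypothetical list header carrying its own locking callbacks. */
struct sketch_knlist {
	void (*kl_lock)(void *);
	void (*kl_unlock)(void *);
	int (*kl_locked)(void *);
	void *kl_lockarg;
};

static void
default_lock(void *arg)
{
	((struct sketch_mtx *)arg)->owned = 1;
}

static void
default_unlock(void *arg)
{
	((struct sketch_mtx *)arg)->owned = 0;
}

static int
default_locked(void *arg)
{
	return (((struct sketch_mtx *)arg)->owned);
}

/* NULL callbacks select the mutex defaults, as the commit text describes. */
static void
sketch_knlist_init(struct sketch_knlist *kl, void *lockarg,
    void (*lock)(void *), void (*unlock)(void *), int (*locked)(void *))
{
	assert(kl != NULL && lockarg != NULL);
	kl->kl_lock = (lock != NULL) ? lock : default_lock;
	kl->kl_unlock = (unlock != NULL) ? unlock : default_unlock;
	kl->kl_locked = (locked != NULL) ? locked : default_locked;
	kl->kl_lockarg = lockarg;
}
```

A caller that wants a different lock type (for example a vnode lock) would
pass its own three functions and lock argument; every other caller passes
NULL and keeps the mutex behavior.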

Reviewed by: jmg
Approved by: re (scottl), scottl (mentor override)
Pointyhat to: ssouhlal
Will be happy: everyone


# 679985d0 09-Jun-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by: jeff, bde
Approved by: rwatson, jmg (kqueue changes), grehan (mentor)


# 35f19cdc 24-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- Add a 'flags' parameter to VFS_ROOT(). This is intended to allow
lookup to do shared locks on the root. Filesystems are free to ignore
flags and instead acquire an exclusive lock if they do not support
shared locks.

Sponsored by: Isilon Systems, Inc.


# 78bb3c21 16-Mar-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Add mnt_hashseed to struct mount and initialize it with PRNG bits, use
it to get better hashing in vfs_hash.

In case of an insert collision in vfs_hash_insert(), put the losing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.

Drop the VI_HASHED flag.


# e8ed9330 20-Feb-2005 David Schultz <das@FreeBSD.org>

Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with: mckusick
Reviewed by: phk


# ebbfc2f8 09-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Make various mountpoint related functions static.


# db50d057 24-Jan-2005 Jeff Roberson <jeff@FreeBSD.org>

- Add the mount flag MNTK_MPSAFE which indicates whether or not Giant
must be held when any vnode owned by the filesystem is manipulated.
- Add VFS_LOCK_GIANT and VFS_UNLOCK_GIANT macros which are used to
conditionally lock and unlock Giant based on a particular mountpoint.
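
The conditional locking pattern can be sketched roughly as follows. This is
not the definition of the real macros: sketch_lock_giant(),
sketch_unlock_giant(), struct sketch_mount, the SKETCH_MNTK_MPSAFE bit value
and the counter standing in for the Giant mutex are all assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MNTK_MPSAFE	0x1	/* hypothetical bit value */

/* Hypothetical, cut-down mount structure. */
struct sketch_mount {
	unsigned int mnt_kern_flag;
};

static int giant_held;	/* mock for Giant: counts current holders */

/*
 * Take the global lock only for filesystems not marked MP-safe.
 * Returns nonzero if the lock was taken, so the caller can hand the
 * result to sketch_unlock_giant() later.
 */
static int
sketch_lock_giant(const struct sketch_mount *mp)
{
	if (mp != NULL && (mp->mnt_kern_flag & SKETCH_MNTK_MPSAFE) == 0) {
		giant_held++;
		return (1);
	}
	return (0);
}

/* Release the global lock only if the matching call acquired it. */
static void
sketch_unlock_giant(int locked)
{
	if (locked) {
		assert(giant_held > 0);
		giant_held--;
	}
}
```

Returning the "did we lock" state from the lock call keeps the unlock side
correct even if the mount's flags change in between.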


# 8df6bac4 11-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to be the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


# 60727d8b 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for license, minor formatting changes


# 20a92a18 07-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly
split the conversion of the remaining three filesystems out from the root
mounting changes, so in one go:

cd9660:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

nfs(client):
Convert to nmount (the simple way, mount_nfs(8) is still necessary).
Add omount compat shims.
Drop COMPAT_PRELITE2 mount arg compatibility.

ffs:
Convert to nmount.
Add omount compat shims.
Remove dedicated rootfs mounting code.
Use vfs_mountedfrom()
Rely on vfs_mount.c calling VFS_STATFS()

Remove vfs_omount() method, all filesystems are now converted.

Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem
task, and they all do it now.

Change rootmounting to use DEVFS trampoline:

vfs_mount.c:
Mount devfs on /. Devfs needs no 'from' so this is clean.
symlink /dev to /. This makes it possible to lookup /dev/foo.
Mount "real" root filesystem on /.
Surgically move the devfs mountpoint from under the real root
filesystem onto /dev in the real root filesystem.

Remove now unnecessary getdiskbyname().

kern_init.c:
Don't do devfs mounting and rootvnode assignment here, it was
already handled by vfs_mount.c.

Remove now unused bdevvp(), addaliasu() and addalias(). Put the
few necessary lines in devfs where they belong. This eliminates the
second-last source of bogo vnodes, leaving only the lemming-syncer.

Remove rootdev variable, it has no meaning in a global context and
was not trustworthy anyway. Correct information is provided by
statfs(/).


# 53a05b7c 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add more functions for handling mount arguments in VFS_MOUNT():

vfs_flagopt() for binary/boolean options.
vfs_getopts() for string options
vfs_filteropt() to check for unknown options.
vfs_scanopt() for scanf() like processing of options.

Also add function for setting the stat.f_mntfromname field.


# 5ddb0739 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Change the first argument of vfs_cmount() to a handy struct mntarg* and
call it accordingly.

(No filesystems implement vfs_cmount() yet, so this is a no-op commit)


# 49bfeeb8 06-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add a few convenient functions in the mount_arg() family and collect the
entire family at the end of the source file.


# a804d99c 05-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Make struct vfsopt{list} private to vfs_mount.c


# 74331236 05-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few callers
pass something else. Most of the implementations have grown weeds for this so they
copy some fields from mnt_stat if the passed argument isn't that.

Fix this the cleaner way: Always call the implementation on mnt_stat
and copy that in toto to the VFS_STATFS argument if different.
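
A rough sketch of the "always fill mnt_stat, then copy" wrapper follows. The
names sketch_vfs_statfs(), struct sketch_mount, struct sketch_statfs and
example_fs_statfs() are hypothetical and the structures are cut down to a
few fields; the real VFS_STATFS() takes more arguments.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, heavily trimmed statfs record. */
struct sketch_statfs {
	long f_blocks;
	long f_bfree;
	char f_mntonname[32];
};

struct sketch_mount {
	struct sketch_statfs mnt_stat;
	int (*fs_statfs)(struct sketch_mount *, struct sketch_statfs *);
};

/*
 * The filesystem method always fills mp->mnt_stat.  If the caller
 * supplied a different buffer, the whole structure is copied over, so
 * fields the filesystem never touches (such as the mount path) come
 * along without per-filesystem copy code.
 */
static int
sketch_vfs_statfs(struct sketch_mount *mp, struct sketch_statfs *sbp)
{
	int error;

	assert(mp != NULL && sbp != NULL);
	error = mp->fs_statfs(mp, &mp->mnt_stat);
	if (error != 0)
		return (error);
	if (sbp != &mp->mnt_stat)
		*sbp = mp->mnt_stat;
	return (0);
}

/* Example filesystem method: only sets the fields it owns. */
static int
example_fs_statfs(struct sketch_mount *mp, struct sketch_statfs *sbp)
{
	(void)mp;
	sbp->f_blocks = 1000;
	sbp->f_bfree = 250;
	return (0);
}
```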


# 6c12df5a 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Implement a function, mount_arg() for accumulating a list of mount parameters
to nmount.

Make kernel_mount() accept the output from mount_arg() and know how to
free the malloc'ed space.

Make kernel_vmount() use the new function.


# 7ec0ec06 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Add vfs_cmount() method to vfs_ops, this is to convert old-style mount
args to nmount request.


# a08805c7 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Retire unused vfs_mount() function in the name of nmount migration.


# 32ba8e93 03-Dec-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce vfs_byname_kld() which will try to load the filesystem
as a module if possible.

Use it so we don't have linker magic in the middle of the already
complex mount code.


# 6518a5aa 26-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Eliminate MNT_NODEV usage, it doesn't have any meaning any more.

Keep a #define MNT_NODEV 0 around to avoid dealing with contrib
userland like mount_smbfs.


# de4cbbf5 25-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Integrate the relevant bits of vfs_rootmountalloc() where it matters.


# 996b2c82 29-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Lose vfs_mountedon()


# f9b3b0e6 04-Aug-2004 Maxim Konovalov <maxim@FreeBSD.org>

o Fix a typo in the comment.


# 5e8c582a 30-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Put a version element in the VFS filesystem configuration structure
and refuse initializing filesystems with a wrong version. This will
aid maintenance activities on the 5-stable branch.

s/vfs_mount/vfs_omount/

s/vfs_nmount/vfs_mount/

Name our filesystems mount function consistently.

Eliminate the nameidata argument to both vfs_mount and vfs_omount.
It was originally there to save stack space. A few places abused
it to get hold of some credentials to pass around. Effectively
it is unused.

Reorganize the root filesystem selection code.


# 3dfe213e 27-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Convert the vfsconf list to a TAILQ.

Introduce vfs_byname() function to find things on it.

Staticize vfs_nmount() function under the name vfs_donmount().

Various cleanups.


# 26035074 17-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Fix macro so that we don't get missing initializer warnings.


# f257b7a5 12-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Make VFS_ROOT() and vflush() take a thread argument.
This is to allow filesystems to decide based on the passed thread
which vnode to return.
Several filesystems used curthread, they now use the passed thread.


# 2260fef1 07-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

struct mount->mnt_data has been a qaddr_t since '94 (rev 1.1),
It should be a void *, fix it.


# 81d16e2d 07-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

do the vfsstd thing instead of messing up our VFS_SYSCTL macro.


# ea0104b0 06-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Introduce vfs_suser(), used to test if a user should have special privs
for a mount.


# c713aaae 06-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

NFS mobility PHASE I, II & III (phase VI, and V pending):

Rebind the client socket when we experience a timeout. This fixes
the case where our IP changes for some reason.

Signal a VFS event when NFS transitions from up to down and vice
versa.

Add a placeholder vfs_sysctl where we will put status reporting
shortly.

Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.


# 2d1dca73 04-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Pass the operation in with the fsidctl.
Remove some fsidctls that we will not be using.
Correct prototypes for fs sysctls.


# 94ed9c8a 04-Jul-2004 Alfred Perlstein <alfred@FreeBSD.org>

Introduce a new kevent filter. EVFILT_FS that will be used to signal
generic filesystem events to userspace. Currently only mount and unmount
of filesystems are signalled. Soon to be added, up/down status of NFS.

Introduce a sysctl node used to route requests to/from filesystems
based on filesystem ids.

Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/
entrypoint by the sysctl code to change individual filesystems.


# e3c5a7a4 04-Jul-2004 Poul-Henning Kamp <phk@FreeBSD.org>

When we traverse the vnodes on a mountpoint we need to look out for
our cached 'next vnode' being removed from this mountpoint. If we
find that it was recycled, we restart our traversal from the start
of the list.

Code to do that is in all local disk filesystems (and a few other
places) and looks roughly like this:

    MNT_ILOCK(mp);
  loop:
    for (vp = TAILQ_FIRST(&mp...);
         (vp = nvp) != NULL;
         nvp = TAILQ_NEXT(vp,...)) {
        if (vp->v_mount != mp)
            goto loop;
        MNT_IUNLOCK(mp);
        ...
        MNT_ILOCK(mp);
    }
    MNT_IUNLOCK(mp);

The code which takes vnodes off a mountpoint looks like this:

    MNT_ILOCK(vp->v_mount);
    ...
    TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
    ...
    MNT_IUNLOCK(vp->v_mount);
    ...
    vp->v_mount = something;

(Take a moment and try to spot the locking error before you read on.)

On an SMP system, one CPU could have removed nvp from our mountlist
but not yet gotten to assign a new value to vp->v_mount while another
CPU simultaneously gets to the top of the traversal loop where it
finds that (vp->v_mount != mp) is not true despite the fact that
the vnode has indeed been removed from our mountpoint.

Fix:

Introduce the macro MNT_VNODE_FOREACH() to traverse the list of
vnodes on a mountpoint while taking into account that vnodes may
be removed from the list as we go. This saves approx 65 lines of
duplicated code.

Split the insmntque() which potentially moves a vnode from one mount
point to another into delmntque() and insmntque() which does just
what the names say.

Fix delmntque() to set vp->v_mount to NULL while holding the
mountpoint lock.


# 89c9c53d 16-Jun-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Do the dreaded s/dev_t/struct cdev */
Bump __FreeBSD_version accordingly.


# 4af6b509 11-Apr-2004 Maxime Henrion <mux@FreeBSD.org>

Belatedly remove the getvfsent(3) API. All the consumers were updated
long ago to use getvfsbyname(3) or the vfs.conflist sysctl, except
mount_smbfs(8) which has just been fixed.


# 0bf57301 11-Apr-2004 Maxime Henrion <mux@FreeBSD.org>

Put struct ovfsconf inside BURN_BRIDGES as well.


# 82c6e879 06-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 71529a89 06-Apr-2004 Bruce Evans <bde@FreeBSD.org>

Oops, fixed insertion sort error in the fix for an insertion sort error.

While here, begin fixing dependencies of <sys/mount.h> on normal namespace
pollution (__BSD_VISIBLE) by not using u_int in the prototype for nmount(2),
although it is used in the man page.

While there, begin cleaning up another set of prototypes:
- use u_int in the prototype for the kernel part of nmount().
- consistently don't use parameter names in prototypes in the
"exported vnode operations" set of prototypes, although style(9) says to
use names in the kernel.


# f468f075 06-Apr-2004 Bruce Evans <bde@FreeBSD.org>

Fixed unsorting of prototypes in previous commit and 1.134.


# e2c8a799 05-Apr-2004 Doug Rabson <dfr@FreeBSD.org>

Regen.


# 537370d0 16-Mar-2004 Tim J. Robbins <tjr@FreeBSD.org>

Make vfs_nmount() public. The Linux emulator needs this in order to mount
linprocfs filesystems.


# 2b348f74 11-Mar-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove unused mnt_reservedvnlist field.


# 43a55a72 02-Feb-2004 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Added flag MNT_USER to MNT_UPDATEMASK, it will be used for detecting
file systems mounted by unprivileged users.

Reviewed by: rwatson
Approved by: scottl (mentor)
MFC after: 3 days


# fde81c7d 12-Nov-2003 Kirk McKusick <mckusick@FreeBSD.org>

Update the statfs structure with 64-bit fields to allow
accurate reporting of multi-terabyte filesystem sizes.
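
A small worked calculation shows why wider fields matter. It assumes, purely
for illustration, that the old structure held block counts in a signed 32-bit
field (the commit message does not state the old widths); blocks_for() and
fits_int32() are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Number of blocks needed to describe `bytes` of space. */
static uint64_t
blocks_for(uint64_t bytes, uint64_t bsize)
{
	assert(bsize != 0);
	return (bytes / bsize);
}

/* Would this count survive being stored in a signed 32-bit field? */
static int
fits_int32(uint64_t blocks)
{
	return (blocks <= (uint64_t)INT32_MAX);
}
```

For a 4 TiB filesystem (2^42 bytes) reported in 512-byte blocks the count is
2^33, which exceeds the signed 32-bit maximum of 2^31 - 1. With 4096-byte
blocks the same filesystem needs 2^30 blocks and still fits, but a 16 TiB
filesystem needs 2^32 and does not. A 64-bit count has room for all of these.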

You should build and boot a new kernel BEFORE doing a `make world'
as the new kernel will know about binaries using the old statfs
structure, but an old kernel will not know about the new system
calls that support the new statfs structure. Running an old kernel
after a `make world' will cause programs such as `df' that do a
statfs system call to fail with a bad system call.

Reviewed by: Bruce Evans <bde@zeta.org.au>
Reviewed by: Tim Robbins <tjr@freebsd.org>
Reviewed by: Julian Elischer <julian@elischer.org>
Reviewed by: the hordes of <arch@freebsd.org>
Sponsored by: DARPA & NAI Labs.


# eca8a663 11-Nov-2003 Robert Watson <rwatson@FreeBSD.org>

Modify the MAC Framework so that instead of embedding a (struct label)
in various kernel objects to represent security data, we embed a
(struct label *) pointer, which now references labels allocated using
a UMA zone (mac_label.c). This allows the size and shape of struct
label to be varied without changing the size and shape of these kernel
objects, which become part of the frozen ABI with 5-STABLE. This opens
the door for boot-time selection of the number of label slots, and hence
changes to the bound on the number of simultaneous labeled policies
at boot-time instead of compile-time. This also makes it easier to
embed label references in new objects as required for locking/caching
with fine-grained network stack locking, such as inpcb structures.

This change also moves us further in the direction of hiding the
structure of kernel objects from MAC policy modules, not to mention
dramatically reducing the number of '&' symbols appearing in both the
MAC Framework and MAC policy modules, and improving readability.

While this results in minimal performance change with MAC enabled, it
will observably shrink the size of a number of critical kernel data
structures for the !MAC case, and should have a small (but measurable)
performance benefit (i.e., struct vnode, struct socket) due to memory
conservation and reduced cost of zeroing memory.

NOTE: Users of MAC must recompile their kernel and all MAC modules as a
result of this change. Because this is an API change, third party
MAC modules will also need to be updated to make less use of the '&'
symbol.

Suggestions from: bmilekic
Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 5c957adb 11-Nov-2003 Alexander Kabaev <kan@FreeBSD.org>

1. Consolidate mount struct allocation/destruction into a common code in
vfs_mount_alloc/vfs_mount_destroy functions and take care to completely
destroy the mount point along with its locks. Mount struct has grown in
complexity recently and depending on each failure path to destroy it
completely isn't working anymore.

2. Eliminate largely identical vfs_mount and vfs_unmount question by
moving the code to handle both cases into a newly introduced vfs_domount
function.

3. Simplify nfs_mount_diskless to always expect an allocated mount
struct and never attempt an allocation/destruction itself. The
vfs_allocroot allocation was there to support 'magic' swap space
configuration for diskless clients that was already removed by PHK some
time ago.

4. Include vfs_buildopts cleanups by Peter Edwards to validate the
sanity of nmount parameters passed from userland.

Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com>
Reviewed by: rwatson


# ca430f2e 04-Nov-2003 Alexander Kabaev <kan@FreeBSD.org>

Remove mntvnode_mtx and replace it with per-mountpoint mutex.
Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to
operate on this mutex transparently.

Eventually new mutex will be protecting more fields in
struct mount, not only vnode list.

Discussed with: jeff


# 318f2fb4 01-Jul-2003 Ian Dowse <iedowse@FreeBSD.org>

Add a new mount flag MNT_BYFSID that can be used to unmount a file
system by specifying the file system ID instead of a path. Use this
by default in umount(8). This avoids the need to perform any vnode
operations to look up the mount point, so it makes it possible to
unmount a file system whose root vnode cannot be looked up (e.g.
due to a dead NFS server, or a file system that has become detached
from the hierarchy because an underlying file system was unmounted).
It also provides an unambiguous way to specify which file system is
to be unmounted.

Since the ability to unmount using a path name is retained only for
compatibility, that case now just uses a simple string comparison
of the supplied path against f_mntonname of each mounted file system.
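
The difference between the two lookups can be sketched as below. Everything
here is a hypothetical stand-in (struct sketch_fsid, struct sketch_mount,
find_by_fsid(), find_by_path(), a flat array instead of the kernel's mount
list), and scanning the path lookup from the newest entry is an assumption
of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real fsid_t and struct mount differ. */
struct sketch_fsid {
	int32_t val[2];
};

struct sketch_mount {
	struct sketch_fsid fsid;
	const char *mntonname;
};

/* Unambiguous lookup: a file system ID names exactly one mount. */
static struct sketch_mount *
find_by_fsid(struct sketch_mount *tbl, size_t n, const struct sketch_fsid *id)
{
	assert(id != NULL);
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].fsid.val[0] == id->val[0] &&
		    tbl[i].fsid.val[1] == id->val[1])
			return (&tbl[i]);
	}
	return (NULL);
}

/*
 * Compatibility lookup: plain string comparison against the mount
 * path, with no vnode operations.  Several mounts may share a path.
 */
static struct sketch_mount *
find_by_path(struct sketch_mount *tbl, size_t n, const char *path)
{
	assert(path != NULL);
	for (size_t i = n; i-- > 0;) {
		if (strcmp(tbl[i].mntonname, path) == 0)
			return (&tbl[i]);
	}
	return (NULL);
}
```

When two file systems are stacked on the same path, the path lookup can only
reach one of them, while each remains individually addressable by its ID.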

Discussed on: freebsd-arch
mdoc help from: ru


# 6b080461 26-Mar-2003 Tor Egge <tegge@FreeBSD.org>

Adjust the number of vnodes scanned by vlrureclaim() according to the
size of the vnode list.


# c162e9c2 11-Mar-2003 Alexander Kabaev <kan@FreeBSD.org>

Rename vfs_stdsync function to vfs_stdnosync which matches more
closely what the function really does. Update all existing consumers
to use the new name.

Introduce a new vfs_stdsync function, which iterates over the mount
point's vnodes and calls FSYNC on each one of them in turn.

Make nwfs and smbfs use this new function instead of rolling their
own identical sync implementations.

Reviewed by: jeff


# 72f0679c 10-Mar-2003 Alexander Kabaev <kan@FreeBSD.org>

Remove trailing whitespace.


# 5606ac9e 27-Dec-2002 Robert Watson <rwatson@FreeBSD.org>

Re-add MNT_ACLS to the list of "updateable" mount flags, per our
documentation. Generally, you really shouldn't twiddle the flag,
but there are sensible scenarios where one might.

Obtained from: TrustedBSD Project


# b78beb60 07-Nov-2002 Maxime Henrion <mux@FreeBSD.org>

A bunch of style(9) fixes.

Obtained from: bde


# b65d1ba9 07-Nov-2002 Maxime Henrion <mux@FreeBSD.org>

- Use a better definition for MNAMELEN which doesn't require
to have one #ifdef per architecture.
- Change a space to a tab after a nearby #define.

Obtained from: bde


# a16a92af 14-Oct-2002 Robert Watson <rwatson@FreeBSD.org>

Define MNT_ACLS, which can report on the status of the FS_ACLS flag
used by UFS to administratively enable support for extended ACLs.

While I'm here, remove MNT_MULTILABEL from the list of file system
flags we permit to be updated after the initial mount.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# fee7d450 19-Aug-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Keep a copy of the credential used to mount filesystems around so
we can check and use it later on.

Change the pieces of code which relied on mount->mnt_stat.f_owner
to check which user mounted the filesystem.

This became needed as the EA code needs to be able to allocate
blocks for "system" EA users like ACLs.

There seems to be some half-baked (probably only quarter- actually)
notion that the superuser for a given filesystem is the user who
mounted it, but this has far from been carried through. It is
unclear if it should be.

Sponsored by: DARPA & NAI Labs.


# 01abbb42 13-Aug-2002 Robert Watson <rwatson@FreeBSD.org>

Move to a nested include of _label.h instead of mac.h in sys/sys/*.h
(Most of the places where mac.h was recursively included from another
kernel header file. net/netinet to follow.)

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs
Suggested by: bde


# bf20c7a3 13-Aug-2002 Maxime Henrion <mux@FreeBSD.org>

Forward define struct iovec instead of including
sys/uio.h and polluting the namespace even more.


# 9bf1a756 13-Aug-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce typedefs for the member functions of struct vfsops and employ
these in the main filesystems. This does not change the resulting code
but makes the source a little bit more grepable.

Sponsored by: DARPA and NAI Labs.


# 4033e07e 10-Aug-2002 Maxime Henrion <mux@FreeBSD.org>

Don't #ifdef _KERNEL struct vfsconf, mount_smbfs(8)
still uses it.

Submitted by: jake


# 136be715 10-Aug-2002 Maxime Henrion <mux@FreeBSD.org>

One declaration for struct xvfsconf is enough. I have
no idea how this happened. :-)

Reported by: Norman C. Rice <nrice@emu.sourcee.com>


# 5965373e 10-Aug-2002 Maxime Henrion <mux@FreeBSD.org>

- Introduce a new struct xvfsconf, the userland version of struct vfsconf.
- Make getvfsbyname() take a struct xvfsconf *.
- Convert several consumers of getvfsbyname() to use struct xvfsconf.
- Correct the getvfsbyname.3 manpage.
- Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the
kernel, and rewrite getvfsbyname() to use this instead of the weird
existing API.
- Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist
sysctl.
- Convert a vfsload() call in nfsiod.c to kldload() and remove the useless
vfsisloadable() and endvfsent() calls.
- Add a warning printf() in vfs_sysctl() to tell people they are using
an old userland.

After these changes, it's possible to modify struct vfsconf without
breaking the binary compatibility. Please note that these changes don't
break this compatibility either.

Once bp has updated mount_smbfs(8) with the patch I sent him, there
will be no more consumers of the {set,get,end}vfsent(), vfsisloadable()
and vfsload() API, and I will promptly delete it.


# 3b2e6009 30-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Begin committing support for Mandatory Access Control and extensible
kernel access control. The MAC framework permits loadable kernel
modules to link to the kernel at compile-time, boot-time, or run-time,
and augment the system security policy. This commit includes the
initial kernel implementation, although the interface with the userland
components of the operating system is still under work, and not all
kernel subsystems are supported. Later in this commit sequence,
documentation of which kernel subsystems will not work correctly with
a kernel compiled with MAC support will be added.

Label file system mount points, permitting security information to be
maintained at the granularity of the file system. Two labels are
currently maintained: a security label for the mount itself, and
a default label for objects in the file system (in particular, for
file systems not supporting per-vnode labeling directly).

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# fbedc80b 09-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Remove vfs_stdmount() and vfs_stdunmount(). They are not
really useful and are incompatible with nmount.


# 563af2ec 03-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Remove an unused argument in vfs_mountroot().


# 2b4edb69 02-Jul-2002 Maxime Henrion <mux@FreeBSD.org>

Move all code related to mount(2) into a new file, vfs_mount.c.
The file vfs_conf.c which was dealing with root mounting has
been repo-copied into vfs_mount.c to preserve history.
This makes nmount related development easier, and helps reduce
the size of vfs_syscalls.c, which is still an enormous file.

Reviewed by: rwatson
Repo-copy by: peter


# cacd1c9b 22-Jun-2002 Maxime Henrion <mux@FreeBSD.org>

o Remove the initialization of unused fields in the struct
uio now that we don't use uiomove() anymore.
o Enforce stricter checks on the length of the iov's in
nmount(2) since we now malloc() them individually and
corrupted iov's could make the kernel crash in malloc()
with "kmem_map too small".

Reviewed by: phk


# 7d2d4409 20-Jun-2002 Maxime Henrion <mux@FreeBSD.org>

Change the way we internally store the mount options to
a linked list. This is to allow the merging of the mount
options in the MNT_UPDATE case, as the current data structure
is unsuitable for this.

There are no functional differences in this commit.

Reviewed by: phk


# fe937506 14-Jun-2002 Maxime Henrion <mux@FreeBSD.org>

Change vfs_copyopt() so that the length argument passed to it
must be the exact same size as the mount option. This makes
vfs_copyopt() much more useful.
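
The exact-length rule can be sketched as follows. This is an illustration
only: sketch_copyopt() and struct sketch_opt are hypothetical, the options
live in a flat array here, and the specific error values returned (EINVAL
for a size mismatch, ENOENT for a missing option) are assumptions of the
sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value mount option. */
struct sketch_opt {
	const char *name;
	const void *value;
	size_t len;
};

/*
 * Copy the named option into dest only if the caller's buffer size
 * matches the stored value size exactly.  A mismatch is reported
 * rather than silently truncating or leaving part of dest unset.
 */
static int
sketch_copyopt(const struct sketch_opt *opts, size_t nopts, const char *name,
    void *dest, size_t len)
{
	assert(name != NULL && dest != NULL);
	for (size_t i = 0; i < nopts; i++) {
		if (strcmp(name, opts[i].name) != 0)
			continue;
		if (len != opts[i].len)
			return (EINVAL);
		memcpy(dest, opts[i].value, len);
		return (0);
	}
	return (ENOENT);
}
```

With this rule a filesystem can pass sizeof() of the destination variable and
treat success as proof that userland supplied a value of the expected type
size.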


# cdb5638a 23-May-2002 Maxime Henrion <mux@FreeBSD.org>

Update comments to better match reality.


# d394511d 16-May-2002 Tom Rhodes <trhodes@FreeBSD.org>

More s/file system/filesystem/g


# df99ca52 16-Apr-2002 Ian Dowse <iedowse@FreeBSD.org>

The recent NFS forced unmount improvements introduced a side-effect
where some client operations might be unexpectedly cancelled during
an unsuccessful non-forced unmount attempt. This causes problems
for amd(8), because it periodically attempts a non-forced unmount
to check if the filesystem is still in use.

Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set
only during the operation of a forced unmount. Use this instead of
MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.

Also correct a problem where dounmount() might inadvertently clear
the MNTK_UNMOUNT flag.

Reported by: simokawa
MFC after: 1 week


# 5616879a 26-Mar-2002 Maxime Henrion <mux@FreeBSD.org>

Commit the good prototype for nmount(2).

Reviewed by: phk


# 17594b93 26-Mar-2002 Maxime Henrion <mux@FreeBSD.org>

As discussed in -arch, add the new nmount(2) system call and the
new vfs_getopt()/vfs_copyopt() API. This is intended to be used
later, once filesystems implement the VFS_NMOUNT
operation. The mount(2) system call will disappear once all
filesystems have been converted to the new API. Documentation will
be committed in a while.

Reviewed by: phk


# c58eb46e 23-Mar-2002 Bruce Evans <bde@FreeBSD.org>

Fixed some style bugs in the removal of __P(()). The main ones were
not removing tabs before "__P((", and not outdenting continuation lines
to preserve non-KNF lining up of code with parentheses. Switch to KNF
formatting and/or rewrap the whole prototype in some cases.


# 789f12fe 19-Mar-2002 Alfred Perlstein <alfred@FreeBSD.org>

Remove __P


# a0595d02 16-Mar-2002 Kirk McKusick <mckusick@FreeBSD.org>

Add a flags parameter to VFS_VGET to pass through the desired
locking flags when acquiring a vnode. The immediate purpose is
to allow polling lock requests (LK_NOWAIT) needed by soft updates
to avoid deadlock when enlisting other processes to help with
the background cleanup. For the future it will allow the use of
shared locks for read access to vnodes. This change touches a
lot of files as it affects most filesystems within the system.
It has been well tested on FFS, loopback, and CD-ROM filesystems, but
only lightly on the others, so if you find a problem there, please
let me (mckusick@mckusick.com) know.


# fb92273b 08-Mar-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Move the mount of the root filesystem to happen in the init process before
the exec of /sbin/init.

This allows the scheduler to get started and kthreads a chance to run
before we start filesystem operations.


# fdc6e087 05-Mar-2002 Robert Watson <rwatson@FreeBSD.org>

Reserve a mount flag, MNT_MULTILABEL, used by the MAC subsystem and
individual filesystems to determine whether they should operate in
"file system as a single object" mode, or "file system as a set of objects
with individual labels" mode. Note: in the trustedbsd_mac branch,
this is referred to as "MNT_MULTILEVEL", but the two mean the same thing.
MNT_MULTILABEL is more suggestive of a flexible policy system than one
providing purely hierarchical policies. The need for a reserved flag will
go away once nmount() is done.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 751a2cd0 05-Nov-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Define a new mount flag "MNT_JAILDEVFS"

Collect the magic combination of flags which can be updated into
a macro in sys/mount.h rather than inlining them (twice!) in
vfs_syscalls.c


# 6b8bd2ef 04-Nov-2001 Matthew Dillon <dillon@FreeBSD.org>

Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount
structure changes now rather than piecemeal later on. mnt_nvnodelist
currently holds all the vnodes under the mount point. This will eventually
be split into a 'dirty' and 'clean' list. This way we only break kld's once
rather than twice. nvnodelist will eventually turn into the dirty list
and should remain compatible with the klds.


# c72ccd01 22-Oct-2001 Matthew Dillon <dillon@FreeBSD.org>

Change the vnode list under the mount point from a LIST to a TAILQ
in preparation for an implementation of limiting code for kern.maxvnodes.

MFC after: 3 days


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to the previous -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 9ceb1844 30-Jul-2001 Jake Burkholder <jake@FreeBSD.org>

Machine dependent ifdefs for sparc64.


# f0cc1c6f 09-Jul-2001 Dag-Erling Smørgrav <des@FreeBSD.org>

Constify the fstype argument to vfs_mount(). This eliminates at least one
"call discards qualifier" warning (in sys/compat/linux/linux_file.c).


# fb49f549 09-Jun-2001 Benno Rice <benno@FreeBSD.org>

Changes to sys/ includes to support PowerPC.

Reviewed by: obrien, dfr


# fb919e4d 01-May-2001 Mark Murray <markm@FreeBSD.org>

Undo part of the tangle of having sys/lock.h and sys/mutex.h included in
other "system" header files.

Also help the deprecation of lockmgr.h by making it a sub-include of
sys/lock.h and removing sys/lockmgr.h from kernel .c files.

Sort sys/*.h includes where possible in affected files.

OK'ed by: bde (with reservations)


# a13234bb 25-Apr-2001 Poul-Henning Kamp <phk@FreeBSD.org>

Move the netexport structure from the fs-specific mount structure
to struct mount.

This makes the "struct netexport *" parameter to the vfs_export
and vfs_checkexport interface unneeded.

Consequently, all non-stacking filesystems can use
vfs_stdcheckexp().

At the same time, make it a pointer to a struct netexport
in struct mount, so that we can remove the bogus AF_MAX
and #include <net/radix.h> from <sys/mount.h>


# b186f62c 23-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Back out previous commit.

Requested by: bde


# e84a5d83 23-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Remove bogus #include and duplicate definition of AF_MAX. These were
made necessary by breakage in usr.sbin/pstat and usr.bin/fstat, since
fixed.

Suggested by: phk
Unearthed by: John Hood <jhood@sitaranetworks.com>


# 4c68f41d 22-Apr-2001 Greg Lehey <grog@FreeBSD.org>

Add address families AF_SLOW and AF_SCLUSTER. These are used by the
Sitara QoSworks box.

Obtained from: Sitara Networks Inc.


# 30632071 18-Mar-2001 Robert Watson <rwatson@FreeBSD.org>

o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.

Submitted by: jkh
Obtained from: TrustedBSD Project


# 70f36851 14-Mar-2001 Robert Watson <rwatson@FreeBSD.org>

o Change the API and ABI of the Extended Attribute kernel interfaces to
introduce a new argument, "namespace", rather than relying on a first-
character namespace indicator. This is in line with more recent
thinking on EA interfaces on various mailing lists, including the
posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces
are defined by default, EXTATTR_NAMESPACE_SYSTEM and
EXTATTR_NAMESPACE_USER, where the primary distinction lies in the
access control model: user EAs are accessible based on the normal
MAC and DAC file/directory protections, and system attributes are
limited to kernel-originated or appropriately privileged userland
requests.

o These API changes occur at several levels: the namespace argument is
introduced in the extattr_{get,set}_file() system call interfaces,
at the vnode operation level in the vop_{get,set}extattr() interfaces,
and in the UFS extended attribute implementation. Changes are also
introduced in the VFS extattrctl() interface (system call, VFS,
and UFS implementation), where the arguments are modified to include
a namespace field, as well as modified to avoid direct access to
userspace variables from below the VFS layer (in the style of recent
changes to mount by adrian@FreeBSD.org). This required some cleanup
and bug fixing regarding VFS locks and the VFS interface, as a vnode
pointer may now be optionally submitted to the VFS_EXTATTRCTL()
call. Updated documentation for the VFS interface will be committed
shortly.

o In the near future, the auto-starting feature will be updated to
search two sub-directories to the ".attribute" directory in appropriate
file systems: "user" and "system" to locate attributes intended for
those namespaces, as the single filename is no longer sufficient
to indicate what namespace the attribute is intended for. Until this
is committed, all attributes auto-started by UFS will be placed in
the EXTATTR_NAMESPACE_SYSTEM namespace.

o The default POSIX.1e attribute names for ACLs and Capabilities have
been updated to no longer include the '$' in their filename. As such,
if you're using these features, you'll need to rename the attribute
backing files to the same names without '$' symbols in front.

o Note that these changes will require changes in userland, which will
be committed shortly. These include modifications to the extended
attribute utilities, as well as to libutil for new namespace
string conversion routines. Once the matching userland changes are
committed, a buildworld is recommended to update all the necessary
include files and verify that the kernel and userland environments
are in sync. Note: If you do not use extended attributes (most people
won't), upgrading is not imperative although since the system call
API has changed, the new userland extended attribute code will no longer
compile with old include files.

o Couple of minor cleanups while I'm there: make more code compilation
conditional on FFS_EXTATTR, which should recover a bit of space on
kernels running without EA's, as well as update copyright dates.

Obtained from: TrustedBSD Project


# f3a90da9 01-Mar-2001 Adrian Chadd <adrian@FreeBSD.org>

Reviewed by: jlemon

An initial tidyup of the mount() syscall and VFS mount code.

This code replaces the earlier work done by jlemon in an attempt to
make linux_mount() work.

* the guts of the mount work has been moved into vfs_mount().

* move `type', `path' and `flags' from being userland variables into being
kernel variables in vfs_mount(). `data' remains a pointer into
userspace.

* Attempt to verify the `type' and `path' strings passed to vfs_mount()
aren't too long.

* rework mount() and linux_mount() to take the userland parameters
(besides data, as mentioned) and pass kernel variables to vfs_mount().
(linux_mount() already did this, I've just tidied it up a little more.)

* remove the copyin*() stuff for `path'. `data' still requires copyin*()
since it's a pointer into userland.

* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
filesystem. This variable is generally initialised with `path', and
each filesystem can override it if they want to.

* NOTE: f_mntonname is initialised with "/" in the case of a root mount.
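
The length check on `type' and `path' described above could look roughly
like the following. This is only a sketch: the function name, the helper,
and the two limits are hypothetical and are not taken from vfs_mount().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical buffer sizes (including the terminating NUL). */
#define DEMO_TYPE_MAX 16
#define DEMO_PATH_MAX 80

/* Length of s, but never look at more than max bytes. */
static size_t
demo_bounded_len(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return (n);
}

/* Return 0 if both strings fit their buffers, ENAMETOOLONG otherwise. */
static int
demo_check_mount_strings(const char *type, const char *path)
{
	if (demo_bounded_len(type, DEMO_TYPE_MAX) >= DEMO_TYPE_MAX)
		return (ENAMETOOLONG);
	if (demo_bounded_len(path, DEMO_PATH_MAX) >= DEMO_PATH_MAX)
		return (ENAMETOOLONG);
	return (0);
}
```

A caller would run such a check before copying the strings into fixed-size
kernel buffers, so that an over-long name is rejected instead of truncated.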


# c0511d3b 18-Feb-2001 Brian Feldman <green@FreeBSD.org>

Switch to using a struct xucred instead of a struct ucred when not
actually in the kernel. This structure is a different size than
what is currently in -CURRENT, but should hopefully be the last time
any application breakage is caused there. As soon as any major
inconveniences are removed, the definition of the in-kernel struct
ucred should be conditionalized upon defined(_KERNEL).

This also changes struct export_args to remove dependency on the
constantly-changing struct ucred, as well as limiting the bounds
of the size fields to the correct size. This means: a) mountd and
friends won't break all the time, b) mountd and friends won't crash
the kernel all the time if they don't know what they're doing wrt
actual struct export_args layout.

Reviewed by: bde


# c3d7bcdf 16-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Introduce copyinfrom and copyinstrfrom, which can copy data from either
user or kernel space. This will allow layering of os-compat (e.g.: linux)
system calls. Apply the changes to mount.


# 7a8671e9 04-Dec-2000 Alfred Perlstein <alfred@FreeBSD.org>

remove struct mount from userland visibility


# 6092d187 13-Oct-2000 Bruce Evans <bde@FreeBSD.org>

Fixed namespace pollution in rev.1.78. Don't export <sys/stat.h> to
userland from here; just forward declare struct stat. fhstat.2
(== fhopen.2 == fhstatfs.2) has always specified including
<sys/stat.h> before using any of the fh functions although this is
only necessary for dereferencing the "struct stat *" arg of fhstat(),
so applications should not notice this change.

Fixed unsorting of user prototypes in rev.1.78.


# a18b1f1d 03-Oct-2000 Jason Evans <jasone@FreeBSD.org>

Convert lockmgr locks from using simple locks to using mutexes.

Add lockdestroy() and appropriate invocations, which corresponds to
lockinit() and must be called to clean up after a lockmgr lock is no
longer needed.


# 918c9eec 29-Sep-2000 Doug Rabson <dfr@FreeBSD.org>

Add ia64 support.


# d4c18169 11-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Clean up warning about undeclared function by declaring softdep_fsync
in mount.h instead of ffs_extern.h. The correct solution is to use
an indirect function pointer so that the kernel does not have to be
built with options FFS, but that will be left for another day.


# 22e5a623 03-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Get userland visible flags added for snapshots to give a few days
advance preparation for them to get migrated into place so that
subsequent changes in utilities will not fail to compile for lack
of up-to-date header files in /usr/include.


# 75236818 16-Jun-2000 Poul-Henning Kamp <phk@FreeBSD.org>

ARGH! I have too many source trees :-(

Fix prototype errors in last commit.


# a2e7a027 16-Jun-2000 Poul-Henning Kamp <phk@FreeBSD.org>

Virtualizes & untangles the bioops operations vector.

Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@


# e3975643 25-May-2000 Jake Burkholder <jake@FreeBSD.org>

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 740a1973 23-May-2000 Jake Burkholder <jake@FreeBSD.org>

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 8f073875 18-Jan-2000 Robert Watson <rwatson@FreeBSD.org>

Fix bde'isms in acl/extattr syscall interface, renaming syscalls to
prettier (?) names, adding some const's around here, et al.

Reviewed by: bde


# 664a31e4 28-Dec-1999 Peter Wemm <peter@FreeBSD.org>

Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistent with the other
BSDs who made this change quite some time ago. More commits to come.
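
A minimal sketch of why the rename matters. The DEMO_ macro and the
function are hypothetical; only the KERNEL and _KERNEL names come from the
message above. An application defines KERNEL for its own purposes, and a
header guarded by _KERNEL is unaffected by that.

```c
#include <assert.h>

/* An application is free to use this name for anything it likes. */
#define KERNEL 42

/*
 * Header-style guard. Had this tested KERNEL instead, the application's
 * macro above would have exposed the kernel-only part by accident.
 */
#ifdef _KERNEL
#define DEMO_KERNEL_DECLS_VISIBLE 1
#else
#define DEMO_KERNEL_DECLS_VISIBLE 0
#endif

/* Report whether the kernel-only section was compiled in. */
static int
demo_kernel_decls_visible(void)
{
	return (DEMO_KERNEL_DECLS_VISIBLE);
}
```

Names beginning with an underscore followed by an uppercase letter are
reserved for the implementation in C, which is what makes _KERNEL safe to
test in a public header.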


# 91f37dcb 18-Dec-1999 Robert Watson <rwatson@FreeBSD.org>

Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by: eivind


# 21a9e9a1 02-Dec-1999 Jordan K. Hubbard <jkh@FreeBSD.org>

Define name length differently for alpha in order to preserve
backwards compatibility.

Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
Reviewed by: mckusick


# e9cc4758 30-Nov-1999 Kirk McKusick <mckusick@FreeBSD.org>

Collect read and write counts for filesystems. This new code
drops the counting in bwrite and puts it all in spec_strategy.
I did some tests and verified that the counts collected for writes
in spec_strategy is identical to the counts that we previously
collected in bwrite. We now also get read counts (async reads
come from requests for read-ahead blocks). Note that you need
to compile a new version of mount to get the read counts printed
out. The old mount binary is completely compatible, the only
reason to install a new mount is to get the read counts printed.

Submitted by: Craig A Soules <soules+@andrew.cmu.edu>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>


# 0429e37a 20-Nov-1999 Poul-Henning Kamp <phk@FreeBSD.org>

struct mountlist and struct mount.mnt_list have no business being
a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.

This removes ugly mp != (void*)&mountlist comparisons.

Requested by: phk
Submitted by: Jake Burkholder jake@checker.org
PR: 14967
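
A sketch of the traversal this enables, using the queue(3) macros. The
structure and function names are hypothetical stand-ins, not the real
struct mount. With a CIRCLEQ the walk has to stop when the element pointer
equals the address of the list head (the cast comparison mentioned above);
with a TAILQ it simply stops at NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mount; only the linkage matters. */
struct demo_mount {
	int id;
	TAILQ_ENTRY(demo_mount) mnt_list;
};

TAILQ_HEAD(demo_mountlist, demo_mount);

/* Count the entries on the list; the walk ends at a NULL next pointer. */
static int
demo_count_mounts(struct demo_mountlist *head)
{
	struct demo_mount *mp;
	int n = 0;

	for (mp = TAILQ_FIRST(head); mp != NULL;
	    mp = TAILQ_NEXT(mp, mnt_list))
		n++;
	return (n);
}
```

The caller initialises the head with TAILQ_INIT() and appends entries with
TAILQ_INSERT_TAIL() before walking the list.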


# 5b42dac8 31-Oct-1999 Julian Elischer <julian@FreeBSD.org>

Most modern OSs have the ability to flag certain mounts as ones to
be ignored by default by the df(1) program. This is used mostly to
avoid stat()-ing entries that do not represent "real" disk mount
points (such as those made by an automounter such as amd.) It is
also useful not to have to stat() these entries because it takes
longer to report them than for other file systems, being that these
mount points are served by a user-level file server and resulting in
several context switches. Worse, if the automounter is down
unexpectedly, a casual df(1) will hang in an interruptible way.

PR: kern/9764
Submitted by: Erez Zadok <ezk@cs.columbia.edu>


# 4cf49a43 21-Oct-1999 Julian Elischer <julian@FreeBSD.org>

Whistle's Netgraph link-layer (sometimes more) networking infrastructure.
Been in production for 3 years now. Gives Instant Frame relay to if_sr
and if_ar drivers, and PPPOE support soon. See:
ftp://ftp.whistle.com/pub/archie/netgraph/index.html
for on-line manual pages.

Reviewed by: Doug Rabson (dfr@freebsd.org)
Obtained from: Whistle CVS tree


# 114ae644 14-Oct-1999 Mike Smith <msmith@FreeBSD.org>

Implement pseudo_AF_HDRCMPLT, which controls the state of the 'header
completion' flag. If set, the interface output routine will assume that
the packet already has a valid link-level source address. This defaults
to off (the address is overwritten)

PR: kern/10680
Submitted by: "Christopher N . Harrell" <cnh@mindspring.net>
Obtained from: NetBSD


# 1b5464ef 29-Sep-1999 Poul-Henning Kamp <phk@FreeBSD.org>

Remove v_maxio from struct vnode.

Replace it with mnt_iosize_max in struct mount.

Nits from: bde


# e6f71111 19-Sep-1999 Matthew Dillon <dillon@FreeBSD.org>

Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse
addaliasu() into addalias() (no operational change) and clarify comments
relating to a trick that vclean() uses.

The fix to BOOTP is yet another hack. Actually, rootfsid handling
is already a major hack. The whole thing needs to be cleaned up.

Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>


# c24fda81 10-Sep-1999 Alfred Perlstein <alfred@FreeBSD.org>

Separate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|statfs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from: NetBSD


# 5a5fccc8 07-Sep-1999 Alfred Perlstein <alfred@FreeBSD.org>

All unimplemented VFS ops now have entries in kern/vfs_default.c that return
reasonable defaults.

This avoids confusing and ugly casting to eopnotsupp or making dummy functions.
Bogus casting of filesystem sysctls to eopnotsupp() has been removed.

This should make *_vfsops.c more readable and reduce bloat.

Reviewed by: msmith, eivind
Approved by: phk
Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# e9189611 17-Apr-1999 Peter Wemm <peter@FreeBSD.org>

Well folks, this is it - The second stage of the removal for build support
for LKM's..


# ce02431f 16-Feb-1999 Doug Rabson <dfr@FreeBSD.org>

* Change sysctl from using linker_set to construct its tree using SLISTs.
This makes it possible to change the sysctl tree at runtime.

* Change KLD to find and register any sysctl nodes contained in the loaded
file and to unregister them when the file is unloaded.

Reviewed by: Archie Cobbs <archie@whistle.com>,
Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)


# afbbfd3b 15-Nov-1998 Bruce Evans <bde@FreeBSD.org>

Fixed the type and order of vfs_modevent. This fixes part of a spew of
warnings for the recent change of the type of a module event handler.

Fixed a rotted comment (numeric types of filesystems are not listed here).

Made the function prototype in VFS_SET() more like the corresponding
function definition (don't use extern for prototypes).

Enforce a semicolon after the LKM case of VFS_SET().


# 4e61198e 10-Nov-1998 Peter Wemm <peter@FreeBSD.org>

Make the vnode opv vector construction fully dynamic. Previously we
leaked memory on each unload and were limited to items referenced in
the kernel copy of vnode_if.c. Now a kernel module is free to create
its own VOP_FOO() routines and the rest of the system will happily
deal with it, including passthrough layers like union/umap/etc.

Have VFS_SET() call a common vfs_modevent() handler rather than
inline duplicating the common code all over the place.

Have VNODEOP_SET() have the vnodeops removed at unload time (assuming a
module) so that the vop_t ** vector is reclaimed.

Slightly adjust the vop_t ** vectors so that calling slot 0 is a panic
rather than a page fault. This could happen if VOP_something() was called
without *any* handlers being present anywhere (including in vfs_default.c).
slot 1 becomes the default vector for the vnodeop table.

TODO: reclaim zones on unload (eg: nfs code)


# 7c8faeb3 06-Nov-1998 Peter Wemm <peter@FreeBSD.org>

oops! s/vfs_register/vfs_unregister/ in the unload case..

Mentioned by: dfr


# a429d69f 06-Nov-1998 Peter Wemm <peter@FreeBSD.org>

Remove trailing ';' - use the one supplied by the caller: "VFS_SET(foo);"


# aa855a59 15-Oct-1998 Peter Wemm <peter@FreeBSD.org>

*gulp*. Jordan specifically OK'ed this..

This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a separate commit as samples.
This all still works as a static a.out kernel using LKM's.


# 3f8c4506 15-Sep-1998 Poul-Henning Kamp <phk@FreeBSD.org>

(this is an extract from src/share/examples/atm/README)

===================================
HARP | Host ATM Research Platform
===================================

HARP 3

What is this stuff?
-------------------
The Advanced Networking Group (ANG) at the Minnesota Supercomputer Center,
Inc. (MSCI), as part of its work on the MAGIC Gigabit Testbed, developed
the Host ATM Research Platform (HARP) software, which allows IP hosts to
communicate over ATM networks using standard protocols. It is intended to
be a high-quality platform for IP/ATM research.

HARP provides a way for IP hosts to connect to ATM networks. It supports
standard methods of communication using IP over ATM. A host's standard IP
software sends and receives datagrams via a HARP ATM interface. HARP provides
functionality similar to (and typically replaces) vendor-provided ATM device
driver software.

HARP includes full source code, making it possible for researchers to
experiment with different approaches to running IP over ATM. HARP is
self-contained; it requires no other licenses or commercial software packages.

HARP implements support for the IETF Classical IP model for using IP over ATM
networks, including:

o IETF ATMARP address resolution client
o IETF ATMARP address resolution server
o IETF SCSP/ATMARP server
o UNI 3.1 and 3.0 signalling protocols
o Fore Systems's SPANS signalling protocol

What's supported
----------------
The following are supported by HARP 3:

o ATM Host Interfaces
- FORE Systems, Inc. SBA-200 and SBA-200E ATM SBus Adapters
- FORE Systems, Inc. PCA-200E ATM PCI Adapters
- Efficient Networks, Inc. ENI-155p ATM PCI Adapters

o ATM Signalling Protocols
- The ATM Forum UNI 3.1 signalling protocol
- The ATM Forum UNI 3.0 signalling protocol
- The ATM Forum ILMI address registration
- FORE Systems's proprietary SPANS signalling protocol
- Permanent Virtual Channels (PVCs)

o IETF "Classical IP and ARP over ATM" model
- RFC 1483, "Multiprotocol Encapsulation over ATM Adaptation Layer 5"
- RFC 1577, "Classical IP and ARP over ATM"
- RFC 1626, "Default IP MTU for use over ATM AAL5"
- RFC 1755, "ATM Signaling Support for IP over ATM"
- RFC 2225, "Classical IP and ARP over ATM"
- RFC 2334, "Server Cache Synchronization Protocol (SCSP)"
- Internet Draft draft-ietf-ion-scsp-atmarp-00.txt,
"A Distributed ATMARP Service Using SCSP"

o ATM Sockets interface
- The file atm-sockets.txt contains further information

What's not supported
--------------------
The following major features of the above list are not currently supported:

o UNI point-to-multipoint support
o Driver support for Traffic Control/Quality of Service
o SPANS multicast and MPP support
o SPANS signalling using Efficient adapters

This software was developed under the sponsorship of the Defense Advanced
Research Projects Agency (DARPA).

Reviewed (lightly) by: phk
Submitted by: Network Computing Services, Inc.


# 8994ca3c 07-Sep-1998 Bruce Evans <bde@FreeBSD.org>

Removed statically configured mount type numbers (MOUNT_*) and all
references to them.

The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.


# 500b04a2 05-Sep-1998 Bruce Evans <bde@FreeBSD.org>

Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM. Declare it in a header file. Don't forget to use
it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.


# 3baf1478 02-Sep-1998 Bruce Evans <bde@FreeBSD.org>

Added a vfs_oid pointer and a vfs_uninit() function to struct
vfsops. vfs_oid will be used to attach and detach vfs sysctls
dynamically. vfs_uninit() will be used to clean up before
modunloading vfs LKMs. The nfs LKM needs these features most.


# 53d2eb24 02-Sep-1998 Bruce Evans <bde@FreeBSD.org>

Backed out previous commit. VFS_LKM_NO_DEFAULT_DISPATCH wasn't used for
long, and the ifdef for it broke the forward declaration for the
dispatch function.


# 38bfd69b 25-Jul-1998 Alexander Langer <alex@FreeBSD.org>

Allow VFS LKMs to override the default module dispatch functions if
VFS_LKM_NO_DEFAULT_DISPATCH is defined.


# 79cc756d 05-May-1998 Mike Smith <msmith@FreeBSD.org>

As described by the submitter:

Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by: Michael Hancock <michaelh@cet.co.jp>


# 5ddc8ded 08-Apr-1998 Wolfram Schneider <wosch@FreeBSD.org>

New mount option nosymfollow. If enabled, the kernel lookup()
function will not follow symbolic links on the mounted
file system and return EACCES (Permission denied).


# 8c375f58 27-Mar-1998 Bruce Evans <bde@FreeBSD.org>

Don't export anything from <sys/socket.h> except AF_MAX from here.
This only affects the KERNEL case.

Don't include <sys/radix.h> twice for the KERNEL case. This fixes
a mismerge from Lite2.

Don't include <sys/radix.h> at all for the !KERNEL case. This fixes
a wrong cleanup in Lite2.


# 08637435 28-Mar-1998 Bruce Evans <bde@FreeBSD.org>

Moved some #includes from <sys/param.h> nearer to where they are actually
used.


# b1897c19 08-Mar-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: Whistle development tree


# 34bdbbd0 01-Mar-1998 Mike Smith <msmith@FreeBSD.org>

The intent is to get rid of WILLRELE in vnode_if.src by making
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:

vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink

This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.

A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.

VFS_VRELE implementations were made for the following file systems:

Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs

Custom
union umapfs

Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660

These implementations may change as VOP changes are implemented.

In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.

This will only be done for vnode arguments that are released by the various
fs vop implementations.

Wider use of VFS_VRELE will likely require restructuring of the code.

Reviewed by: phk, dyson, terry et. al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>


# 2a44bbdd 21-Feb-1998 Jordan K. Hubbard <jkh@FreeBSD.org>

MF22: correct comments.


# c60ee1df 21-Feb-1998 Jordan K. Hubbard <jkh@FreeBSD.org>

MF22: CODA entries. They'll have to rework their usage of malloc
somewhat in -current before this will work, but these should at least
serve as place-holders.


# 7d6c26d6 05-Feb-1998 John Dyson <dyson@FreeBSD.org>

Add MNT_LAZY.


# bf49c427 20-Jan-1998 Bruce Evans <bde@FreeBSD.org>

Moved most of the (source-level) compatibility hacks for the vfsconf
interface from sys/mount.h to libc/getvfsent.c The new interface is
now the default.

Sorted the prototypes for the library functions.


# 95802bf8 25-Nov-1997 Julian Elischer <julian@FreeBSD.org>

Shift a few SYSINIT() calls around.
this results in a few functions becoming static, and
the SYSINITs being close to the code they are related to.
setting up the dump device is with dumpsys() and
kicking off the scheduler is with the scheduler.
Mounting root is with the code that does it.

Reviewed by: phk


# f2915552 21-Nov-1997 Bruce Evans <bde@FreeBSD.org>

Fixed some style and contents bugs in comments. Copied comments are
usually wrong.


# 52bf64c7 12-Nov-1997 Julian Elischer <julian@FreeBSD.org>

Reviewed by: hackers@freebsd.org in general
Obtained from: Whistle Communications tree

Add an option to the way UFS works dependent on the SUID bit of directories.
This change makes things a whole lot simpler on systems running as
fileservers for PCs and Macs. To enable the new code you must
1/ enable option SUIDDIR on the kernel.
2/ mount the filesystem with option suiddir.
hopefully this makes it difficult enough for people to
do this accidentally.
see the new chmod(2) man page for detailed info.


# b1f4a44b 11-Nov-1997 Julian Elischer <julian@FreeBSD.org>

Reviewed by: various.

Ever since I first saw the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. it is not exported to userspace. I have moved
some of the non exported flags over to this word. this means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left for another day.
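
The split into two flag words can be pictured as follows. This is an
illustrative sketch only: the structure, the flag names and values, and the
field name mnt_kern_flag are assumptions and are not quoted from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MNT_RDONLY 0x00000001u	/* hypothetical exported flag */
#define DEMO_MNTK_BUSY  0x00000001u	/* hypothetical kernel-only flag */

/* Hypothetical mount structure with separate flag words. */
struct demo_mnt {
	uint32_t mnt_flag;	/* visible to userland */
	uint32_t mnt_kern_flag;	/* internal state, never copied out */
};

/* Hypothetical userland-visible record. */
struct demo_statfs {
	uint32_t f_flags;
};

/* Only the exported word reaches userland. */
static void
demo_export_flags(const struct demo_mnt *mp, struct demo_statfs *sb)
{
	sb->f_flags = mp->mnt_flag;
}
```

Because the two words are independent, a kernel-only flag may reuse a bit
value that is already taken in the exported word, which is how moving the
internal flags out frees bits in mnt_flag.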


# a1c995b6 12-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Last major round (Unless Bruce thinks of something :-) of malloc changes.

Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.

A couple of finer points by: bde


# 81bca6dd 27-Sep-1997 KATO Takenori <kato@FreeBSD.org>

Clustered read and write are switched at mount-option level.

1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_NOCLUSTERR
and D_NOCLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clustered write.

Reviewed by: bde


# f116a277 16-Sep-1997 Bruce Evans <bde@FreeBSD.org>

Drop temporary source-level compatibility for old mount(2) interface.


# 57bf258e 16-Aug-1997 Garrett Wollman <wollman@FreeBSD.org>

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to now malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


# 8b059767 22-Jul-1997 Bruce Evans <bde@FreeBSD.org>

Quick and dirty (?) fix for noatime option. The WebNFS changes
broke it by using the same value for MNT_EXPUBLIC as for MNT_NOATIME.
Just use a different value for MNT_EXPUBLIC.
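
A collision of this kind can be caught mechanically. The checker below is a
hypothetical sketch, not code from the tree; it verifies that each flag in
a table is a single bit and that no bit is used twice. The values passed to
it in a test would be purely illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if every flag is exactly one bit and no two flags share it. */
static int
demo_flags_disjoint(const uint32_t *flags, size_t n)
{
	uint32_t seen = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t f = flags[i];

		if (f == 0 || (f & (f - 1)) != 0)
			return (0);	/* zero, or more than one bit set */
		if ((seen & f) != 0)
			return (0);	/* bit already claimed */
		seen |= f;
	}
	return (1);
}
```

Two flags that were accidentally given the same value make the function
return 0, while distinct single-bit values pass.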


# 2279b5f4 16-Jul-1997 Doug Rabson <dfr@FreeBSD.org>

Merge WebNFS changes from NetBSD.

Obtained from: NetBSD


# 0ddf9be1 06-Apr-1997 Peter Dufault <dufault@FreeBSD.org>

Make MOD_* macros almost consistent:

Use the name argument almost the same in all LKM types. Maintain
the current behavior for the external (e.g., modstat) name for DEV,
EXEC, and MISC types being #name ## "_mod" and SYSCALL and VFS only
#name. This is a candidate for change and I vote just the name without
the "_mod".

Change the DISPATCH macro to MOD_DISPATCH for consistency with the
other macros.

Add an LKM_ANON #define to eliminate the magic -1 and associated
signed/unsigned warnings.

Add MOD_PRIVATE to support wcd.c's poking around in the lkm structure.

Change source in tree to use the new interface.

Reviewed by: Bruce Evans


# 379184c8 03-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Fixed the getvfsbyname macro hack.


# dc91a89e 02-Mar-1997 Bruce Evans <bde@FreeBSD.org>

Restored some pre-Lite2-merge source-level compatibility to the mount()
and getvfsbyname() interfaces. The new interfaces are now hidden from
applications unless _NEW_VFSCONF is defined. The new vfsconf interfaces
don't work yet.


# 6875d254 22-Feb-1997 Peter Wemm <peter@FreeBSD.org>

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 670718e2 12-Feb-1997 Mike Pritchard <mpp@FreeBSD.org>

Remove function prototypes for vfs_mountroot and vgoneall, since
they were removed with the Lite2 merge.

Submitted by: bde


# 724ab195 11-Feb-1997 Mike Pritchard <mpp@FreeBSD.org>

Add function prototypes for most of the new Lite2 functions.
Also made a few of the miscfs routines static to be
consistent. Some modules simply required some additional
#includes to remove -Wall warnings.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 17a6a9e3 17-Oct-1996 Jordan K. Hubbard <jkh@FreeBSD.org>

Some very small changes to support Netcon's TFS filesystem.
These patches were formerly applied by the Netcon installer
before rebuilding your kernel.


# caa05533 11-Sep-1996 Bruce Evans <bde@FreeBSD.org>

Added a struct tag `fsid' for fsid_t so that sysproto.h can declare
prototypes for the lfs syscalls without having to include <sys/mount.h>
and its nested spam.
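
A minimal sketch of why the tag helps, assuming a hypothetical helper named fsid_first_word and an assumed two-word member layout (neither is taken from sysproto.h): once the struct has a tag, a header can forward-declare `struct fsid` and prototype functions that take a pointer to it without pulling in the full definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* What a prototype-only header could contain (hypothetical). */
struct fsid;                                  /* incomplete type: the tag alone */
int fsid_first_word(const struct fsid *id);   /* pointer use needs no definition */

/* What the full header would then provide; the layout is assumed here. */
typedef struct fsid { int32_t val[2]; } fsid_t;

/* The implementation sees the complete type and can access members. */
int
fsid_first_word(const struct fsid *id)
{
    assert(id != NULL);
    return (int)id->val[0];
}
```

An anonymous `typedef struct { ... } fsid_t;` could not be forward-declared this way, since there would be no tag to name before the typedef is visible.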


# 9e043042 03-Sep-1996 David Greenman <dg@FreeBSD.org>

Implemented kernel side of MNT_NOATIME mount option. This option disables
the file access time update on reads and can be useful in reducing
filesystem overhead in cases where the access time is not important (like
Usenet news spools).


# 02e2c406 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[new sys/syscallargs.h file, to be "cvs rm"ed]


# 6c5e9bbd 30-Jan-1996 Mike Pritchard <mpp@FreeBSD.org>

Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.


# 13a6df99 22-Dec-1995 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the now obsolete vfs_sysctl vfsops element.


# e7b632b5 13-Nov-1995 Bruce Evans <bde@FreeBSD.org>

Replaced nosys() by lkm_nullcmd().


# bacc8b16 05-Nov-1995 John Dyson <dyson@FreeBSD.org>

Changes to existing files for ext2fs support. The UFS mods need rework
in the future as they are a bit crufty -- but at least the stuff is in the
tree now.


# 4590fd3a 09-Sep-1995 David Greenman <dg@FreeBSD.org>

Fixed init functions argument type - caddr_t -> void *. Fixed a couple of
compiler warnings.


# 8d7459c5 29-Aug-1995 Bruce Evans <bde@FreeBSD.org>

Declare vfs_mountroot() in the right place.


# 2b14f991 28-Aug-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: julian with quick glances by bruce and others
Submitted by: terry (terry lambert)
This is a composite of 3 patch sets submitted by terry.
they are:
New low-level init code that supports loadable modules better
some cleanups in the namei code to help terry in 16-bit character support
some changes to the mount-root code to make it a little more
modular..

NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able
to test those cases..

certainly mounting root off disk still works just fine..
mfs should work but is untested. (tomorrow's task)

The low level init stuff includes a total rewrite of init_main.c
to make it possible for new modules to have an init phase by simply
adding an entry to a TEXT_SET (or is it DATA_SET) list. Thus a new module can
be added to the kernel without editing any other files other than the
'files' file.


# 825a4d8e 25-Aug-1995 David Greenman <dg@FreeBSD.org>

Killed MNT_NOAUTO.


# d38820bf 23-Aug-1995 Jordan K. Hubbard <jkh@FreeBSD.org>

Damn! As Rod just reminded me, I didn't apply the tweak to make
this obey the mask properly. I had it locally, but not in the diffs
I brought across.. :-( Thanks, Rod.


# e0adc2d3 22-Aug-1995 Jordan K. Hubbard <jkh@FreeBSD.org>

Support for NOAUTO mounts.
Submitted by: "Full Name Not Supplied" <simon@masi.ibp.fr>


# 628641f8 11-Aug-1995 David Greenman <dg@FreeBSD.org>

Converted mountlist to a CIRCLEQ.

Partially obtained from: 4.4BSD-Lite2


# a62dc406 27-Jun-1995 Doug Rabson <dfr@FreeBSD.org>

Changes to support version 3 of the NFS protocol.
The version 2 support has been tested (client+server) against FreeBSD-2.0,
IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support
is stable AFAIK.
The version 3 support has been tested with a loopback mount and minimally
against an IRIX 5.3 server. It needs more testing and may have problems.
I have patched amd to support the new variable length filehandles although
it will still only use version 2 of the protocol.

Before booting a kernel with these changes, nfs clients will need to at least
build and install /usr/sbin/mount_nfs. Servers will need to build and
install /usr/sbin/mountd.

NFS diskless support is untested.

Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>


# 9b2e5354 30-May-1995 Rodney W. Grimes <rgrimes@FreeBSD.org>

Remove trailing whitespace.


# 61f5d510 21-May-1995 David Greenman <dg@FreeBSD.org>

Changes to fix the following bugs:

1) Files weren't properly synced on filesystems other than UFS. In some
cases, this led to lost data. Most likely would be noticed on NFS.
The fix is to make the VM page sync/object_clean general rather than
in each filesystem.
2) Mixing regular and mmaped file I/O on NFS was very broken. It caused
chunks of files to end up as zeroes rather than the intended contents.
The fix was to fix several race conditions and to kludge up the
"b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention
to page modifications that occurred via the mmapping.

Reviewed by: David Greenman
Submitted by: John Dyson


# 999422d7 19-Apr-1995 Julian Elischer <julian@FreeBSD.org>

Reviewed by: no-one yet, but non-intrusive
Submitted by: julian@tfs.com
Obtained from: written from scratch

slight changes to make space for devfs..
(also conditional test code in i386/isa/fd.c)

===================================================================
RCS file: /home/ncvs/src/sys/sys/malloc.h,v
retrieving revision 1.7
diff -r1.7 malloc.h
113a114,117
> #define M_DEVFSMNT 62 /* DEVFS mount structure */
> #define M_DEVFSBACK 63 /* DEVFS Back node */
> #define M_DEVFSFRONT 64 /* DEVFS Front node */
> #define M_DEVFSNODE 65 /* DEVFS node */
184c188,192
< NULL, NULL, NULL, NULL, NULL, \
---
> "DEVFS mount", /* 62 M_DEVFSMNT */ \
> "DEVFS back", /* 63 M_DEVFSBACK */ \
> "DEVFS front", /* 64 M_DEVFSFRONT */ \
> "DEVFS node", /* 65 M_DEVFSNODE */ \
> NULL, \
Index: sys/mount.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/mount.h,v
retrieving revision 1.16
diff -r1.16 mount.h
100c100,101
< #define MOUNT_MAXTYPE 15
---
> #define MOUNT_DEVFS 16 /* existing device Filesystem */
> #define MOUNT_MAXTYPE 16
118a120
> "devfs", /* 15 MOUNT_DEVFS */ \
Index: sys/vnode.h
===================================================================
RCS file: /home/ncvs/src/sys/sys/vnode.h,v
retrieving revision 1.19
diff -r1.19 vnode.h
61c61
< VT_UNION, VT_MSDOSFS
---
> VT_UNION, VT_MSDOSFS, VT_DEVFS


# 3c6bef7e 10-Apr-1995 Garrett Wollman <wollman@FreeBSD.org>

Correct name `cd9660' for MOUNT_CD9660 (but NB that this whole table
is bogus and only exists for the benefit of find(1)). Old name was
`iso9660fs'.

Submitted by: Andrew Atrens <atreand@statcan.ca>


# bbf3a566 16-Mar-1995 Garrett Wollman <wollman@FreeBSD.org>

Add four more filesystem flags:

VFCF_NETWORK (this FS goes over the net)
VFCF_READONLY (read-write mounts do not make any sense)
VFCF_SYNTHETIC (data in this FS is not real)
VFCF_LOOPBACK (this FS aliases something else)

cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is network;
procfs, kernfs, and fdesc are synthetic.
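
A rough sketch of how such flags might be consulted. The bit values, the struct vfs_entry type, the example_table contents, and the helper vfs_is_local_real are all assumptions made for illustration; the log does not give the actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Bit values are illustrative assumptions, not the committed ones. */
#define VFCF_NETWORK   0x01   /* this FS goes over the net */
#define VFCF_READONLY  0x02   /* read-write mounts make no sense */
#define VFCF_SYNTHETIC 0x04   /* data in this FS is not real */
#define VFCF_LOOPBACK  0x08   /* this FS aliases something else */

/* Hypothetical table entry; the real vfsconf structure differs. */
struct vfs_entry {
    const char *name;
    int flags;
};

/* Classification follows the commit message; "ufs" is added for contrast. */
const struct vfs_entry example_table[] = {
    { "cd9660", VFCF_READONLY },
    { "nullfs", VFCF_LOOPBACK },
    { "nfs",    VFCF_NETWORK },
    { "procfs", VFCF_SYNTHETIC },
    { "ufs",    0 },
};

/* Nonzero when the filesystem holds real data on local storage. */
int
vfs_is_local_real(const struct vfs_entry *e)
{
    assert(e != NULL);
    return (e->flags & (VFCF_NETWORK | VFCF_SYNTHETIC | VFCF_LOOPBACK)) == 0;
}
```

A tool such as find(1) could use a check of this kind to skip network or synthetic filesystems without comparing type names.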


# cff19ac2 16-Mar-1995 Garrett Wollman <wollman@FreeBSD.org>

Statically-compiled filesystems now use a VFCF_STATIC flag rather than
abusing the refcount.


# b5e8ce9f 16-Mar-1995 Bruce Evans <bde@FreeBSD.org>

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 03a62940 19-Oct-1994 Garrett Wollman <wollman@FreeBSD.org>

Actually implement the functionality documented in sysctl.h for type CTL_FS.
(Namely, call a filesystem-dependent sysctl function analogous to how it works
for networking and (now) physical devices.)


# c172c3e6 27-Sep-1994 Poul-Henning Kamp <phk@FreeBSD.org>

ktrace.c: added decl of ktrnamei
lkm.h: added decl of lkmdispatch
mount.h: added decl of vfs_busy,vfs_unbusy
syscall: The "created from" changed.


# dff55bb5 21-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

mount.h: Declare getvfs* functions from libc.
vfs_init.c: Fix fs_sysctl() so that getvfs* functions actually work.


# 67bfdf83 21-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Fix a few niggling little bugs:

- set args->lkm_offset correctly so that VFS modules can be unloaded
- initialize _fs_vfsops.vfc_refcount correctly so that VFS modules can
be unloaded
- include kernel.h in a few places to get the correct definition of DATA_SET


# c901836c 20-Sep-1994 Garrett Wollman <wollman@FreeBSD.org>

Implemented loadable VFS modules, and made most existing filesystems
loadable. (NFS is a notable exception.)


# 27a0bc89 19-Sep-1994 Doug Rabson <dfr@FreeBSD.org>

Added msdosfs.

Obtained from: NetBSD


# d8f10c11 15-Sep-1994 Bruce Evans <bde@FreeBSD.org>

Add some prototypes.


# b531a9b1 22-Aug-1994 Bruce Evans <bde@FreeBSD.org>

- Fix warnings in df, etc. caused by misplaced declaration of doumount().
- Fix bogus comments caused by misplaced #endif.


# af9da405 20-Aug-1994 Paul Richards <paul@FreeBSD.org>

Made them all idempotent.
Reviewed by:
Submitted by:


# e0e9c421 20-Aug-1994 David Greenman <dg@FreeBSD.org>

Implemented filesystem clean bit via:

machdep.c:
Changed printf's a little and call vfs_unmountall() if the sync was
successful.

cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c:
Allow dismount of root FS. It is now disallowed at a higher level.

vfs_conf.c:
Removed unused rootfs global.

vfs_subr.c:
Added new routines vfs_unmountall and vfs_unmountroot. Filesystems
are now dismounted if the machine is properly rebooted.

ffs_vfsops.c:
Toggle clean bit at the appropriate places. Print warning if an
unclean FS is mounted.

ffs_vfsops.c, lfs_vfsops.c:
Fix bug in selecting proper flags for VOP_CLOSE().

vfs_syscalls.c:
Disallow dismounting root FS via umount syscall.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources