Cross Reference: /freebsd-current/sys/sys/mount.h

History log of /freebsd-current/sys/sys/mount.h
Revision	Date	Author	Comments
# 2496fb72	02-Mar-2024	Konstantin Belousov <kib@FreeBSD.org>	sys/mount.h: align values of MNTK_XXX flags Sponsored by: The FreeBSD Foundation MFC after: 3 days
# f76cb7bd	30-Jan-2024	Konstantin Belousov <kib@FreeBSD.org>	sys/mount.h: use __inline instead of plain inline, for C89 Reported by: antoine Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 3334a537	26-Dec-2023	Konstantin Belousov <kib@FreeBSD.org>	Convert fsidcmp(9) from macro to inline function This allows type checking the arguments. Explicit structure member comparisons are done to avoid introducing string.h pollution for userspace. Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43205
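The macro-to-inline conversion in 3334a537 can be illustrated with a small userspace sketch. The names `demo_fsid` and `demo_fsidcmp` are hypothetical stand-ins, not the definitions in sys/mount.h; the two-element layout and the "nonzero means different" return convention are assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for fsid_t; the layout is assumed for illustration. */
struct demo_fsid {
	int32_t val[2];
};

/*
 * Compare two ids member by member instead of calling memcmp(), so no
 * <string.h> include is needed. Returns 0 when equal and 1 otherwise.
 * Because this is a function, passing a pointer of the wrong type is
 * diagnosed by the compiler, which a macro would not guarantee.
 */
static __inline int
demo_fsidcmp(const struct demo_fsid *a, const struct demo_fsid *b)
{
	return (a->val[0] != b->val[0] || a->val[1] != b->val[1]);
}
```

A caller would write `if (demo_fsidcmp(&mp1_id, &mp2_id) == 0)` to test for equality, mirroring the memcmp idiom.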
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# f5f27772	23-Nov-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix NFS access to .zfs/snapshot snapshots When a process attempts to access a snapshot under /<dataset>/.zfs/snapshot, the snapshot is automounted. However, without this patch, the automount does not set mnt_exjail, which results in the snapshot not being accessible over NFS. This patch defines a new function called vfs_exjail_clone() which sets mnt_exjail from another mount point and then uses that function to set mnt_exjail in the snapshot automount. A separate patch that is currently a pull request for OpenZFS, calls this function to fix the problem. PR: 275200 Reviewed by: markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D42672
# 2ff63af9	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
# 88175af8	21-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	vfs_export: Add mnt_exjail to control exports done in prisons If there are multiple instances of mountd(8) (in different prisons), there will be confusion if they manipulate the exports of the same file system. This patch adds mnt_exjail to "struct mount" so that the credentials (and, therefore, the prison) that did the exports for that file system can be recorded. If another prison has already exported the file system, vfs_export() will fail with an error. If mnt_exjail == NULL, the file system has not been exported. mnt_exjail is checked by the NFS server, so that exports done from within a different prison will not be used. The patch also implements vfs_exjail_destroy(), which is called from prison_cleanup() to release all the mnt_exjail credential references, so that the prison can be removed. Mainly to avoid doing a scan of the mountlist for the case where there were no exports done from within the prison, a count of how many file systems have been exported from within the prison is kept in pr_exportcnt. Reviewed by: markj Discussed with: jamie Differential Revision: https://reviews.freebsd.org/D38371 MFC after: 3 months
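The ownership rule described in 88175af8 can be sketched as follows. This is a heavily simplified userspace model under stated assumptions: the `demo_*` names are hypothetical, the sketch stores a prison pointer directly where the commit message says credentials are recorded, and `EPERM` is an assumed error value (the message only says vfs_export() "will fail with an error").

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_prison {
	int pr_exportcnt;	/* file systems exported from this prison */
};

struct demo_mount {
	struct demo_prison *mnt_exjail;	/* NULL: not exported */
};

/*
 * Record that prison pr exported mp. Fails if a different prison has
 * already exported the file system. EPERM is an assumed error value.
 */
static int
demo_export(struct demo_mount *mp, struct demo_prison *pr)
{
	if (mp->mnt_exjail != NULL && mp->mnt_exjail != pr)
		return (EPERM);
	if (mp->mnt_exjail == NULL) {
		mp->mnt_exjail = pr;
		pr->pr_exportcnt++;	/* lets cleanup skip the mountlist scan */
	}
	return (0);
}

/* Would a server running in prison pr honor the exports on mp? */
static int
demo_export_usable(const struct demo_mount *mp, const struct demo_prison *pr)
{
	return (pr != NULL && mp->mnt_exjail == pr);
}
```

The per-prison counter models the role the message gives pr_exportcnt: when it is zero, prison cleanup has nothing to release.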
# db565512	04-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	vfs_mount.c: Free exports structures in vfs_destroy_mount() During testing of exporting file systems in jails, I noticed that the export structures on a mount were not being free'd when the mount is dismounted. This bug appears to have been in the system for a very long time. It would have resulted in a slow memory leak when exported file systems were dismounted. Prior to r362158, freeing the structures during dismount would not have been safe, since VFS_CHECKEXP() returned a pointer into an export structure, which might still have been used by the NFS server for an in-progress RPC when the file system is dismounted. r362158 fixed this, so it should now be safe to free the structures in vfs_mount_destroy(), which is what this patch does. Reviewed by: kib MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D38385
# d94e0bdc	04-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	Revert "vfs_export: Add checks for correct prison when updating exports" This reverts commit 7926a01ed7ae7cefd81ef4cc2142c35b84d81913. A new patch in D38371 is being considered for doing this.
# 7926a01e	02-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	vfs_export: Add checks for correct prison when updating exports mountd(8) basically does the following: getmntinfo() for each mount delete_exports using nmount(2) to do the creation/deletion of individual exports. For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo() returns all mount points, including ones being used within other prisons. This can cause confusion if the same file system is specified in the exports(5) file for multiple prisons. This patch adds a permanent identifier to each prison and marks which prison did the exports in a field of the mount structure called mnt_exjail. This field can then be compared to the permanent identifier for the prison that the thread's credentials are in. Also required was a new function called prison_isalive_permid() which returns whether the prison is alive, so that the check can be ignored for prisons that have been removed. This prepares the system to allow mountd(8) to run in multiple prisons, including prison0. Future commits will complete the modifications to allow mountd(8) to run in vnet prisons. Until then, these changes should not affect semantics. Reviewed by: markj MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D38144
# 521fbb72	23-Nov-2022	Doug Rabson <dfr@FreeBSD.org>	Add support for mounting single files in nullfs The main use-case for this is to support mounting config files and secrets into OCI containers. My current workaround copies the files into the container which is messy and risks secrets leaking into container images if the cleanup fails. This adds a VFCF flag to indicate whether the filesystem supports file mounts and allows fspath to be either a directory or a file if the flag is set. Test Plan: $ sudo mkdir -p /mnt $ sudo touch /mnt/foo $ sudo mount -t nullfs /COPYRIGHT /mnt/foo Reviewed by: mjg, kib Tested by: pho
# ce00b119	14-Jun-2022	Doug Ambrisko <ambrisko@FreeBSD.org>	mount: revert the active vnode reporting feature Revert the computing of active vnode reporting since statfs is used by a lot of tools. Only report the vnodes used. Reported by: mjg
# 6468cd8e	13-Jun-2022	Doug Ambrisko <ambrisko@FreeBSD.org>	mount: add vnode usage per file system with mount -v This avoids the need to drop into the ddb to figure out vnode usage per file system. It helps to see if they are or are not being freed. Suggestion to report active vnode count was from kib@ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35436
# eca39864	01-Apr-2022	Konstantin Belousov <kib@FreeBSD.org>	Add sysctl KERN_LOCKF reporting the snapshot of the active advisory locks. A new VFS ops method vfs_report_lockf is provided in the mount point op table. If it is NULL, as it is currently for all existing filesystems, the vfs_report_lockf() function is used, which gathers information from the standard implementation inside kern/kern_lockf.c. Filesystems implementing their own locking (NFSv4 as example) can provide a custom implementation. Reviewed by: markj, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34756
# eb574ba0	19-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: replace VFS_NOTIFY_UPPER_* macros with an enum
# 93a0ba8f	17-Sep-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire the no longer used MNTK_LOOKUP_EXCL_DOTDOT flag Reviewed by: markj Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D34466
# 1cb0045c	07-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add MNTK_UNLOCKED_INSMNTQUE Can be used when the fs at hand can synchronize insmntque with other means than the vnode lock. Reviewed by: markj Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D34466
# 4a4b059a	25-Dec-2021	Konstantin Belousov <kib@FreeBSD.org>	Add vfs_remount_ro() a helper to remount filesystem from rw to ro. Tested by: pho Reviewed by: markj, mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33721
# dd2f6e14	10-Dec-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: annotate all unused MNTK_ flags
# 4dd23ae1	10-Dec-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire MNTK_NOKNOTE and VV_NOKNOTE MNTK_NOKNOTE was introduced in 679985d03a64f5dfb4355538ae6e3b70f8347f38 (dated 2005), VV_NOKNOTE in 34cc826ae8999f454dd6cb9c77d17ce83b169f92 few months later. Neither was ever used by anything in the tree.
# 4dcdf398	17-May-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF This allows to stop maintaining the VI_TEXT_REF flag and consequently opens up fully lockless v_writecount adjustment. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D33127
# 8981a100	20-Nov-2021	Robert Wing <rew@FreeBSD.org>	mount: retire kernel_vmount() The last usage of this function was removed in e3b1c847a4237ad9. There are no in-tree consumers of kernel_vmount(). Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D32607
# 311b95bb	23-Oct-2021	Robert Wing <rew@FreeBSD.org>	sys/mount.h: remove dead prototype vfs_getrootfsid() was removed in 245efbba4d6a3e60a0d6d16d18d9a5fad6260733 Reviewed by: mjg Differential Revision: https://reviews.freebsd.org/D32606
# a8c732f4	07-Aug-2021	Jason A. Harmening <jah@FreeBSD.org>	VFS: add retry limit and delay for failed recursive unmounts A forcible unmount attempt may fail due to a transient condition, but it may also fail due to some issue in the filesystem implementation that will indefinitely prevent successful unmount. In such a case, the retry logic in the recursive unmount facility will cause the deferred unmount taskqueue to execute constantly. Avoid this scenario by imposing a retry limit, with a default value of 10, beyond which the recursive unmount facility will emit a log message and give up. Additionally, introduce a grace period, with a default value of 1s, between successive unmount retries on the same mount. Create a new sysctl node, vfs.deferred_unmount, to export the total number of failed recursive unmount attempts since boot, and to allow the retry limit and retry grace period to be tuned. Reviewed by: kib (earlier revision), mckusick Differential Revision: https://reviews.freebsd.org/D31450
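The retry limit and grace period from a8c732f4 might be modeled as below. This is a sketch, not the kernel code: the `demo_*` names are hypothetical, the constants only mirror the defaults named in the message (10 retries, 1s), and the exact point at which the counter is compared with the limit is an assumption.

```c
#include <stdbool.h>

/* Assumed tunables mirroring the defaults named in the commit message. */
#define DEMO_RETRY_LIMIT	10	/* failed attempts tolerated */
#define DEMO_RETRY_DELAY_MS	1000	/* grace period between attempts */

struct demo_deferred_unmount {
	int	retries;	/* failed attempts so far */
	long	next_try_ms;	/* earliest time of the next attempt */
	bool	gave_up;	/* limit exceeded, no more retries */
};

/*
 * Account for one failed unmount attempt at time now_ms. Returns true if
 * the unmount should be retried after the grace period, false once the
 * retry limit has been exceeded.
 */
static bool
demo_unmount_failed(struct demo_deferred_unmount *du, long now_ms)
{
	du->retries++;
	if (du->retries > DEMO_RETRY_LIMIT) {
		du->gave_up = true;	/* the kernel would log a message here */
		return (false);
	}
	du->next_try_ms = now_ms + DEMO_RETRY_DELAY_MS;
	return (true);
}
```

Bounding the retries keeps a permanently failing unmount from rescheduling the task forever, and the delay keeps transient failures from spinning.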
# c66e9307	14-Aug-2021	Piotr Pawel Stefaniak <pstef@FreeBSD.org>	mount.h: improve a comment about flags The comment only specifies MNT_ROOTFS - which is set by the kernel when mounting its root file system. So it's not clear if any other flags are not quite right and for what reason.
# 2bc16e8a	17-Jul-2021	Jason A. Harmening <jah@FreeBSD.org>	VFS: remove MNTK_MARKER We no longer allow upper filesystems to be unregistered from the base mount while vfs_notify_upper() or any other upper operation is pending. New upper mounts can still be registered during this period, but they will be added at the end of the upper mount tailq. We therefore no longer need to allocate marker nodes during vfs_notify_upper() to keep our place in the iteration. Reviewed by: kib, mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D31016
# c746ed72	12-Jun-2021	Jason A. Harmening <jah@FreeBSD.org>	Allow stacked filesystems to be recursively unmounted In certain emergency cases such as media failure or removal, UFS will initiate a forced unmount in order to prevent dirty buffers from accumulating against the no-longer-usable filesystem. The presence of a stacked filesystem such as nullfs or unionfs above the UFS mount will prevent this forced unmount from succeeding. This change addresses the situation by allowing stacked filesystems to be recursively unmounted on a taskqueue thread when the MNT_RECURSE flag is specified to dounmount(). This call will block until all upper mounts have been removed unless the caller specifies the MNT_DEFERRED flag to indicate the base filesystem should also be unmounted from the taskqueue. To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs have been combined with the existing 'mnt_uppers' list used by nullfs and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper(). The format of the mnt_uppers list has also been changed to accommodate filesystems such as unionfs in which a given mount may be stacked atop more than one lower mount. Additionally, management of lower FS reclaim/unlink notifications has been split into a separate list managed by a separate set of KPIs, as registration of an upper FS no longer implies interest in these notifications. Reviewed by: kib, mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D31016
# 59409cb9	17-May-2021	Jason A. Harmening <jah@FreeBSD.org>	Add a generic mechanism for preventing forced unmount This is aimed at preventing stacked filesystems like nullfs and unionfs from "losing" their lower mounts due to forced unmount. Otherwise, VFS operations that are passed through to the lower filesystem(s) may crash or otherwise cause unpredictable behavior. Introduce two new functions: vfs_pin_from_vp() and vfs_unpin(), which are intended to be called on the lower mount(s) when the stacked filesystem is mounted and unmounted, respectively. Much as registration in the mnt_uppers list previously did, pinning will prevent even forced unmount of the lower FS and will allow the stacked FS to freely operate on the lower mount either by direct use of the struct mount* or indirect use through a properly-referenced vnode's v_mount field. vfs_pin_from_vp() is modeled after vfs_ref_from_vp() in that it uses the mount interlock coupled with re-checking vp->v_mount to ensure that it will fail in the face of a pending unmount request, even if the concurrent unmount fully completes. Adopt these new functions in both nullfs and unionfs. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30401
# a4b07a27	11-May-2021	Jason A. Harmening <jah@FreeBSD.org>	VFS_QUOTACTL(9): allow implementation to indicate busy state changes Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount. Also, add stdbool.h to libprocstat modules which #define _KERNEL before including sys/mount.h. Otherwise they'll pull in sys/types.h before defining _KERNEL and therefore won't have the bool definition they need for mp_busy. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30556
# 271fcf1c	29-May-2021	Jason A. Harmening <jah@FreeBSD.org>	Revert commits 6d3e78ad6c11 and 54256e7954d7 Parts of libprocstat like to pretend they're kernel components for the sake of including mount.h, and including sys/types.h in the _KERNEL case doesn't fix the build for some reason. Revert both the VFS_QUOTACTL() change and the follow-up "fix" for now.
# 54256e79	29-May-2021	Jason A. Harmening <jah@FreeBSD.org>	Fix userspace build after commit 6d3e78ad6c11 Reported by: jenkins
# 6d3e78ad	11-May-2021	Jason A. Harmening <jah@FreeBSD.org>	VFS_QUOTACTL(9): allow implementation to indicate busy state changes Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30218
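The "mp_busy" in/out parameter described in 6d3e78ad can be sketched in userspace as follows. All `demo_*` names are hypothetical, the busy state is reduced to a plain counter, and the behavior of the quota-on/off path is an assumption modeled on the wording of the message.

```c
#include <stdbool.h>

/* Hypothetical command codes and mount state, for illustration only. */
enum demo_quota_cmd { DEMO_QUOTAON, DEMO_QUOTAOFF, DEMO_GETQUOTA };

struct demo_mount {
	int busy_refs;	/* simplified busy count */
};

/*
 * Sketch of a quotactl implementation with an in/out busy flag. On entry
 * *mp_busy is true because the caller busied the mount. If the
 * implementation has to unbusy the mount itself, it clears the flag so
 * the caller knows not to unbusy it a second time.
 */
static int
demo_quotactl(struct demo_mount *mp, enum demo_quota_cmd cmd, bool *mp_busy)
{
	if (cmd == DEMO_QUOTAON || cmd == DEMO_QUOTAOFF) {
		mp->busy_refs--;
		*mp_busy = false;
	}
	return (0);
}

/* Caller side: busy the mount, call, and unbusy only if still busied. */
static int
demo_caller(struct demo_mount *mp, enum demo_quota_cmd cmd)
{
	bool mp_busy = true;
	int error;

	mp->busy_refs++;
	error = demo_quotactl(mp, cmd, &mp_busy);
	if (mp_busy)
		mp->busy_refs--;
	return (error);
}
```

With the flag, implementations that never need to unbusy the mount simply leave `*mp_busy` untouched, and the count stays balanced either way.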
# f784da88	17-May-2021	Konstantin Belousov <kib@FreeBSD.org>	Move mnt_maxsymlinklen into appropriate fs mount data structures Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-Note: struct mount layout Differential revision: https://reviews.freebsd.org/D30325
# 9a2fac6b	16-May-2021	Kirk McKusick <mckusick@FreeBSD.org>	Fix handling of embedded symbolic links (and history lesson). The original filesystem release (4.2BSD) had no embedded symlinks. Historically symbolic links were just a different type of file, so the content of the symbolic link was contained in a single disk block fragment. We observed that most symbolic links were short enough that they could fit in the area of the inode that normally holds the block pointers. So we created embedded symlinks where the content of the link was held in the inode's pointer area thus avoiding the need to seek and read a data fragment and reducing the pressure on the block cache. At the time we had only UFS1 with 32-bit block pointers, so the test for a fastlink was: di_size < (NDADDR + NIADDR) * sizeof(daddr_t) (where daddr_t would be ufs1_daddr_t today). When embedded symlinks were added, a spare field in the superblock with a known zero value became fs_maxsymlinklen. New filesystems set this field to (NDADDR + NIADDR) * sizeof(daddr_t). Embedded symlinks were assumed when di_size < fs->fs_maxsymlinklen. Thus filesystems that preceded this change always read from blocks (since fs->fs_maxsymlinklen == 0) and newer ones used embedded symlinks if they fit. Similarly symlinks created on pre-embedded symlink filesystems always spill into blocks while newer ones will embed if they fit. At the same time that the embedded symbolic links were added, the on-disk directory structure was changed splitting the former u_int16_t d_namlen into u_int8_t d_type and u_int8_t d_namlen. Thus fs_maxsymlinklen <= 0 (as used by the OFSFMT() macro) can be used to distinguish old directory formats. In retrospect that should have just been an added flag, but we did not realize we needed to know about that change until it was already in production. Code was split into ufs/ffs so that the log structured filesystem could use ufs functionality while doing its own disk layout.
This meant that no ffs superblock fields could be used in the ufs code. Thus ffs superblock fields that were needed in ufs code had to be copied to fields in the mount structure. Since ufs_readlink needed to know if a link was embedded, fs_maxsymlinklen gets copied to mnt_maxsymlinklen. The kernel panic that led to this fix was triggered when a disk error created an inode of type symlink with no allocated data blocks but a large size. When readlink was called the uiomove was attempted which segment faulted.

	static int
	ufs_readlink(ap)
		struct vop_readlink_args /* {
			struct vnode *a_vp;
			struct uio *a_uio;
			struct ucred *a_cred;
		} */ *ap;
	{
		struct vnode *vp = ap->a_vp;
		struct inode *ip = VTOI(vp);
		doff_t isize;

		isize = ip->i_size;
		if ((isize < vp->v_mount->mnt_maxsymlinklen) ||
		    DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */
			return (uiomove(SHORTLINK(ip), isize, ap->a_uio));
		}
		return (VOP_READ(vp, ap->a_uio, 0, ap->a_cred));
	}

The second part of the "if" statement that adds

	DIP(ip, i_blocks) == 0) { /* XXX - for old fastlink support */

is problematic. It never appeared in BSD released by Berkeley because as noted above mnt_maxsymlinklen is 0 for old format filesystems, so will always fall through to the VOP_READ as it should. I had to dig back through `git blame' to find that Rodney Grimes added it as part of ``The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.'' He must have brought it across from an earlier FreeBSD. Unfortunately the source-control logs for FreeBSD up to the merger with the AT&T-blessed 4.4BSD-Lite conversion were destroyed as part of the agreement to let FreeBSD remain unencumbered, so I cannot pin-point where that line got added on the FreeBSD side. The one change needed here is that mnt_maxsymlinklen is declared as an `int' and should be changed to be `u_int64_t'. This discovery led us to check out the code that deletes symbolic links.
Specifically

	if (vp->v_type == VLNK &&
	    (ip->i_size < vp->v_mount->mnt_maxsymlinklen ||
	     datablocks == 0)) {
		if (length != 0)
			panic("ffs_truncate: partial truncate of symlink");
		bzero(SHORTLINK(ip), (u_int)ip->i_size);
		ip->i_size = 0;
		DIP_SET(ip, i_size, 0);
		UFS_INODE_SET_FLAG(ip, IN_SIZEMOD | IN_CHANGE | IN_UPDATE);
		if (needextclean)
			goto extclean;
		return (ffs_update(vp, waitforupdate));
	}

Here too our broken symlink inode with no data blocks allocated and a large size will segment fault as we are incorrectly using the test that we have no data blocks to decide that it is an embedded symbolic link and attempting to bzero past the end of the inode. The test for datablocks == 0 is unnecessary as the test for ip->i_size < vp->v_mount->mnt_maxsymlinklen will do the right thing in all cases. The test for datablocks == 0 was added by David Greenman in this commit:

	Author: David Greenman <dg@FreeBSD.org>
	Date: Tue Aug 2 13:51:05 1994 +0000
	Completed (hopefully) the kernel support for old style "fastlinks".
	Notes: svn path=/head/; revision=1821

I am guessing that he likely earlier added the incorrect test in the ufs_readlink code. I asked David if he had any recollection of why he made this change. Amazingly, he still had a recollection of why he had made a one-line change more than twenty years ago. And unsurprisingly it was because he had been stuck between a rock and a hard place. FreeBSD was up to 1.1.5 before the switch to the 4.4BSD-Lite code base. Prior to that, there were three years of development in all areas of the kernel, including the filesystem code, from the combined set of people including Bill Jolitz, Patchkit contributors, and FreeBSD Project members. The compatibility issue at hand was caused by the FASTLINKS patches from Curt Mayer. In merging in the 4.4BSD-Lite changes David had to find a way to provide compatibility with both the changes that had been made in FreeBSD 1.1.5 and with 4.4BSD-Lite.
He felt that these changes would provide compatibility with both systems. In his words: ``My recollection is that the 'FASTLINKS' symlinks support in FreeBSD-1.x, as implemented by Curt Mayer, worked differently than 4.4BSD. He used a spare field in the inode to duplicately store the length. When the 4.4BSD-Lite merge was done, the optimized symlinks support for existing filesystems (those that were initialized in FreeBSD-1.x) were broken due to the FFS on-disk structure of 4.4BSD-Lite differing from FreeBSD-1.x. My commit was needed to restore the backward compatibility with FreeBSD-1.x filesystems. I think it was the best that could be done in the somewhat urgent circumstances of the post Berkeley-USL settlement. Also, regarding Rod's massive commit with little explanation, some context: John Dyson and I did the initial re-port of the 4.4BSD-Lite kernel to the 386 platform in just 10 days. It was by far the most intense hacking effort of my life. In addition to the porting of tons of FreeBSD-1 code, I think we wrote more than 30,000 lines of new code in that time to deal with the missing pieces and architectural changes of 4.4BSD-Lite. We didn't make many notes along the way. There was a lot of pressure to get something out to the rest of the developer community as fast as possible, so detailed discrete commits didn't happen - it all came as a giant wad, which is why Rod's commit message was worded the way it was.'' Reported by: Chuck Silvers Tested by: Chuck Silvers History by: David Greenman Lawrence MFC after: 1 week Sponsored by: Netflix
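The embedded-symlink test discussed in 9a2fac6b can be worked through with concrete numbers. The sketch below is a userspace illustration with hypothetical `demo_*` names; it assumes the usual UFS inode geometry of 12 direct and 3 indirect block pointers, which with 32-bit pointers gives a 60-byte pointer area.

```c
#include <stdbool.h>
#include <stdint.h>

/* Direct and indirect block pointer counts assumed for a UFS inode. */
#define DEMO_NDADDR	12
#define DEMO_NIADDR	3

/* Bytes available in the inode's pointer area with 32-bit pointers. */
#define DEMO_UFS1_MAXSYMLINKLEN \
	((DEMO_NDADDR + DEMO_NIADDR) * sizeof(int32_t))

/*
 * Size-only test, as the commit message recommends: a symlink is treated
 * as embedded only if its length fits in the pointer area. The limit is
 * 0 on pre-embedded-symlink file systems, so those always read blocks.
 */
static bool
demo_is_embedded(uint64_t size, uint64_t maxsymlinklen)
{
	return (size < maxsymlinklen);
}

/* The problematic variant that also trusts "no data blocks allocated". */
static bool
demo_is_embedded_old(uint64_t size, uint64_t maxsymlinklen, uint64_t blocks)
{
	return (size < maxsymlinklen || blocks == 0);
}
```

For a corrupted inode with a size of 4096 and no allocated blocks, the old variant reports an embedded link and the caller would copy far more than the 60 bytes that exist in the pointer area; the size-only test rejects it.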
# 5af1131d	08-Apr-2021	Konstantin Belousov <kib@FreeBSD.org>	struct mount uppers: correct locking annotations It is all locked by the uppers' interlock. Noted by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de> Sponsored by: The FreeBSD Foundation MFC after: 3 days
# b5449c92	26-Feb-2021	Konstantin Belousov <kib@FreeBSD.org>	Use atomic_interrupt_fence() instead of bare __compiler_membar() for the places which definitely use membar to sync with interrupt handlers. libc and rtld uses of __compiler_membar() seem to want compiler barriers proper. The barrier in sched_unpin_lite() after td_pinned decrement seems to be not needed and is removed, instead of converted. Reviewed by: markj MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28956
# d485c77f	18-Feb-2021	Konstantin Belousov <kib@FreeBSD.org>	Remove #define _KERNEL hacks from libprocstat Make sys/buf.h, sys/pipe.h, sys/fs/devfs/devfs*.h headers usable in userspace, assuming that the consumer has an idea what it is for. Unhide more material from sys/mount.h and sys/ufs/ufs/inode.h, sys/ufs/ufs/ufsmount.h for consumption of userspace tools, with the same caveat. Remove unacceptable hack from usr.sbin/makefs which relied on sys/buf.h being unusable in userspace, where it overrode struct buf with its own definition. Instead, provide struct m_buf and struct m_vnode and adapt code to use local variants. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D28679
# a15f787a	15-Feb-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vfs_ref_from_vp This generalizes what vop_stdgetwritemount used to be doing. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28695
# f6dd1aef	09-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: group mount per-cpu vars into one struct While here move frequently read stuff into the same cacheline. This shrinks struct mount by 64 bytes. Tested by: pho
# f1084587	05-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Suspend all writeable local filesystems on power suspend. This ensures that no writes are pending in memory, either metadata or user data, but not including dirty pages not yet converted to fs writes. Only filesystems declared local are suspended. Note that this does not guarantee absence of the metadata errors or leaks if resume is not done: for instance, on UFS unlinked but opened inodes are leaked and require fsck to gc. Reviewed by: markj Discussed with: imp Tested by: imp (previous version), pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D27054
# ad89066a	17-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: annotate mountlist_mtx with __exclusive_cache_line
# c5ce27ba	26-Aug-2020	Rick Macklem <rmacklem@FreeBSD.org>	Add MNT_EXTLSxxx flags that will be used for NFS over TLS exports. These flags are not currently used, but will be used by future commits to implement export(5) requirements for the use of NFS over TLS by clients. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D26180
# 08b242ae	19-Aug-2020	Warner Losh <imp@FreeBSD.org>	Move the mount name to bit mapping into sys/mount.h so it can be shared with the kernel. Discussed with: kib@ Reviewed by: kirk@ (prior version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25969
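A name-to-bit mapping of the kind 08b242ae moves into the header could look roughly like this. The sketch is illustrative only: the `demo_*` structure, the table contents, the flag values, and the lookup helper are all assumptions and are not copied from sys/mount.h.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative flag bits; the names and values are assumptions. */
#define DEMO_MNT_RDONLY	UINT64_C(0x1)
#define DEMO_MNT_NOEXEC	UINT64_C(0x4)
#define DEMO_MNT_NOSUID	UINT64_C(0x8)

/* One table row: a flag bit and its human-readable option name. */
struct demo_optname {
	uint64_t	o_opt;
	const char	*o_name;
};

/* Shared table, terminated by a row with a NULL name. */
static const struct demo_optname demo_optnames[] = {
	{ DEMO_MNT_RDONLY, "read-only" },
	{ DEMO_MNT_NOEXEC, "noexec" },
	{ DEMO_MNT_NOSUID, "nosuid" },
	{ 0, NULL }
};

/* Return the flag bit for name, or 0 if the name is not in the table. */
static uint64_t
demo_name_to_flag(const char *name)
{
	const struct demo_optname *o;

	for (o = demo_optnames; o->o_name != NULL; o++)
		if (strcmp(o->o_name, name) == 0)
			return (o->o_opt);
	return (0);
}
```

Keeping a single table in a shared header means userland tools and the kernel cannot drift apart on which name corresponds to which bit.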
# 17a66c70	04-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vfs_op_thread_enter/exit _crit variants and employ them in the namecache. Eliminates all spurious checks for preemption.
# 07d2145a	25-Jul-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add the infrastructure for lockless lookup Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25577
# 9d6fc996	13-Jun-2020	Rick Macklem <rmacklem@FreeBSD.org>	Oops, r362158 committed a duplicate definition of MAXSECFLAVORS. This patch gets rid of the duplicate.
# 1f7104d7	13-Jun-2020	Rick Macklem <rmacklem@FreeBSD.org>	Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags. Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size). This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088
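The width problem described in 1f7104d7 can be demonstrated with a short sketch. The structures and flag values are hypothetical (`DEMO_EX_HIGHFLAG` stands for any future export flag above bit 31); only the 32-bit versus 64-bit field widths come from the commit message.

```c
#include <stdint.h>

/* Hypothetical flags: one in the low 32 bits, one above bit 31. */
#define DEMO_EX_LOWFLAG		UINT64_C(0x100)
#define DEMO_EX_HIGHFLAG	(UINT64_C(1) << 32)

struct demo_export_args_old {
	int		ex_flags;	/* 32 bits: cannot hold high flags */
};

struct demo_export_args_new {
	uint64_t	ex_flags;	/* same width as the mount flags */
};

/* Store 64-bit mount flags into the old, int-sized field. */
static void
demo_store_old(struct demo_export_args_old *ea, uint64_t mnt_flags)
{
	/* Only the low 32 bits fit; anything above bit 31 is lost. */
	ea->ex_flags = (int)(uint32_t)mnt_flags;
}

/* Store 64-bit mount flags into the widened field without loss. */
static void
demo_store_new(struct demo_export_args_new *ea, uint64_t mnt_flags)
{
	ea->ex_flags = mnt_flags;
}
```

As long as every export flag lives in the low 32 bits the old layout appears to work, which matches the commit's remark that the quirk went unnoticed until new flags were needed.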
# 693d10a2	03-Jun-2020	Ryan Moeller <freqlabs@FreeBSD.org>	tmpfs: Preserve alignment of struct fid fields On 64-bit platforms, the two short fields in `struct tmpfs_fid` are padded to the 64-bit alignment of the long field. This pushes the offsets of the subsequent fields by 4 bytes and makes `struct tmpfs_fid` bigger than `struct fid`. `tmpfs_vptofh()` casts a `struct fid *` to `struct tmpfs_fid *`, causing 4 bytes of adjacent memory to be overwritten when the struct fields are set. Through several layers of indirection and embedded structs, the adjacent memory for one particular call to `tmpfs_vptofh()` happens to be the stack canary for `nfsrvd_compound()`. Half of the canary ends up being clobbered, going unnoticed until eventually the stack check fails when `nfsrvd_compound()` returns and a panic is triggered. Instead of duplicating fields of `struct fid` in `struct tmpfs_fid`, narrow the struct to cover only the unique fields for tmpfs and assert at compile time that the struct fits in the allotted space. This way we don't have to replicate the offsets of `struct fid` fields, we just use them directly. Reviewed by: kib, mav, rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D25077
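The layout issue and the compile-time check from 693d10a2 can be sketched as follows. All `demo_*` names, the 16-byte data area, and the field types are assumptions chosen for illustration; the copy through memcpy() is a choice made for this sketch, not necessarily what tmpfs does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXFIDSZ	16	/* assumed size of the opaque data area */

/* Generic file id: two short header fields, then opaque per-fs data. */
struct demo_fid {
	unsigned short	fid_len;
	unsigned short	fid_data0;
	char		fid_data[DEMO_MAXFIDSZ];
};

/*
 * Problematic layout: it repeats the two short fields, after which the
 * 64-bit member is padded to its alignment on typical 64-bit ABIs, so
 * the struct can end up larger than struct demo_fid.
 */
struct demo_fs_fid_wide {
	unsigned short	len;
	unsigned short	pad;
	uint64_t	id;
	uint64_t	gen;
};

/* Narrowed layout: only the fields unique to the file system. */
struct demo_fs_fid_data {
	uint64_t	id;
	uint64_t	gen;
};

/* Fail the build if the payload ever outgrows the allotted space. */
_Static_assert(sizeof(struct demo_fs_fid_data) <= DEMO_MAXFIDSZ,
    "demo_fs_fid_data must fit in the fid data area");

/* Fill a generic fid using its own header fields plus the payload. */
static void
demo_fill_fid(struct demo_fid *fhp, uint64_t id, uint64_t gen)
{
	struct demo_fs_fid_data d = { id, gen };

	fhp->fid_len = (unsigned short)sizeof(*fhp);
	fhp->fid_data0 = 0;
	memcpy(fhp->fid_data, &d, sizeof(d));
}
```

Because the payload is written only into `fid_data`, nothing outside the generic structure can be touched, and the static assertion turns any future size mismatch into a build failure instead of a clobbered stack canary.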
# 245bfd34	20-May-2020	Ryan Moeller <freqlabs@FreeBSD.org>	Deduplicate fsid comparisons Comparing fsid_t objects requires internal knowledge of the fsid structure and yet this is duplicated across a number of places in the code. Simplify by creating a fsidcmp function (macro). Reviewed by: mjg, rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24749
# f15ccf88	06-Mar-2020	Chuck Silvers <chs@FreeBSD.org>	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787
# 123c5197	12-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: switch to smp_rendezvous_cpus_retry for vfs_op_thread_enter/exit In particular on amd64 this eliminates an atomic op in the common case, trading it for IPIs in the uncommon case of catching CPUs executing the code while the filesystem is getting suspended or unmounted.
# 8f2b73dc	07-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: use newly added zpcpu routines instead of direct access where appropriate
# d3cc5354	17-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: provide F_ISUNIONSTACK as a kludge for libc Prior to introduction of this op libc's readdir would call fstatfs(2), in effect unnecessarily copying kilobytes of data just to check fs name and a mount flag. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D23162
# cc3593fb	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997
# 57083d25	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995
# c8b3463d	07-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT) The previous behavior of leaving VI_OWEINACT vnodes on the active list without a hold count is eliminated. Hold count is kept and inactive processing gets explicitly deferred by setting the VI_DEFINACT flag. The syncer is then responsible for vdrop. Reviewed by: kib (previous version) Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D23036
# 5b87ecc6	22-Oct-2019	Konstantin Belousov <kib@FreeBSD.org>	Assert that vnode_pager_setsize() is called with the vnode exclusively locked, except for filesystems that set the MNTK_VMSETSIZE_BUG flag. Set the flag for ZFS. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883
# d1cbf3ee	13-Oct-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add MNTK_NOMSYNC On many filesystems the traversal is effectively a no-op. Add a way to avoid the overhead. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D22009
# dc20b834	06-Oct-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add optional root vnode caching Root vnodes are looked up all the time, e.g. when crossing a mount point. Currently used routines always perform a costly lookup which can be trivially avoided. Reviewed by: jeff (previous version), kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646
# ba7a55d9	22-Sep-2019	Sean Eric Fagan <sef@FreeBSD.org>	Add two options to allow mount to avoid covering up existing mount points. The two options are * nocover/cover: Prevent/allow mounting over an existing root mountpoint. E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local is already a mountpoint. * emptydir/noemptydir: Prevent/allow mounting on a non-empty directory. E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail. Neither of these options is intended to be a default, for historical and compatibility reasons. Reviewed by: allanjude, kib Differential Revision: https://reviews.freebsd.org/D21458
# b488246b	19-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: group fields used for per-cpu ops in one cacheline Sponsored by: The FreeBSD Foundation
# 4cace859	16-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637
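The per-CPU idea in 4cace859 can be illustrated with a small single-threaded sketch: updates touch only one slot, and the exact value is computed only when needed by summing all slots. The names (`example_pcpu_counter`, `example_pcpu_add`, `example_pcpu_sum`) and the fixed CPU count are hypothetical. The sketch deliberately omits the synchronization a kernel would need to obtain a stable sum.

```c
#include <stdint.h>

#define EXAMPLE_NCPU	4	/* hypothetical CPU count */

/* One slot per CPU; a real kernel would keep slots on separate cache lines. */
struct example_pcpu_counter {
	int64_t	slot[EXAMPLE_NCPU];
};

/* Fast path: touch only the slot of the CPU the caller is running on. */
static void
example_pcpu_add(struct example_pcpu_counter *c, unsigned cpu, int64_t delta)
{
	c->slot[cpu % EXAMPLE_NCPU] += delta;
}

/* Slow path: the exact value is the sum over all slots. */
static int64_t
example_pcpu_sum(const struct example_pcpu_counter *c)
{
	int64_t total = 0;

	for (unsigned i = 0; i < EXAMPLE_NCPU; i++)
		total += c->slot[i];
	return (total);
}
```

An individual slot may go negative when a reference is taken on one CPU and released on another; only the sum is meaningful, which is why this scheme suits counters whose exact value is rarely read.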
# a8c8e44b	16-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: manage mnt_ref with atomics A new primitive is introduced to denote sections which can operate locklessly on aspects of struct mount, but which can also be disabled if necessary. This provides an opportunity to start scaling common case modifications while providing stable state of the struct when facing unmount, write suspension or other events. mnt_ref is the first counter to start being managed in this manner with the intent to make it per-cpu. Reviewed by: kib, jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21425
# 58aa4dbf	04-Sep-2019	Conrad Meyer <cem@FreeBSD.org>	sys/mount.h: Comment on distinction between vfs_{c,}mount Hope to save someone else a little future effort in ugly duplicated code. No functional change.
# 25c8d940	23-Aug-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: assert the lock held in MNT_REF/MNT_REL Sponsored by: The FreeBSD Foundation
# e671edac	23-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	Decommission the MNTK_NOINSMNTQ kernel mount flag. After all the changes, its dynamic scope is the same as for MNTK_UNMOUNT, but to allow the syncer vnode to be re-installed on unmount failure. But the case of syncer was already handled by using the VV_FORCEINSMQ flag for quite some time. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# de4e1aeb	18-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode' vm_object has swap type, and no references on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week
# daba4da8	01-Jul-2019	Kirk McKusick <mckusick@FreeBSD.org>	Add a new "untrusted" option to the mount command. Its purpose is to notify the kernel that the file system is untrusted and it should use more extensive checks on the file-system's metadata before using it. This option is intended to be used when mounting file systems from untrusted media such as USB memory sticks or other externally-provided media. It will initially be used by the UFS/FFS file system, but should likely be expanded to be used by other file systems that may appear on external media like msdosfs, exfat, and ext2fs. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20786
# d1fd400a	07-Dec-2018	Konstantin Belousov <kib@FreeBSD.org>	Add new file handle system calls. Namely, getfhat(2), fhlink(2), fhlinkat(2), fhreadlink(2). The syscalls are provided for a NFS userspace server (nfs-ganesha). Submitted by: Jack Halford <jack@gandi.net> Sponsored by: Gandi.net Tested by: pho Feedback from: brooks, markj MFC after: 1 week Differential revision: https://reviews.freebsd.org/D18359
# 6acf1b20	29-Oct-2018	Konstantin Belousov <kib@FreeBSD.org>	Clarify explanation of VFCF_SBDRY. Requested by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 8ff7fad1	23-Oct-2018	Konstantin Belousov <kib@FreeBSD.org>	Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not want to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658
# 0e5c6bd4	04-May-2018	Jamie Gritton <jamie@FreeBSD.org>	Make it easier for filesystems to count themselves as jail-enabled, by doing most of the work in a new function prison_add_vfs in kern_jail.c Now a jail-enabled filesystem need only mark itself with VFCF_JAIL, and the rest is taken care of. This includes adding a jail parameter like allow.mount.foofs, and a sysctl like security.jail.mount_foofs_allowed. Both of these used to be a static list of known filesystems, with predefined permission bits. Reviewed by: kib Differential Revision: D14681
# 82614df4	31-Dec-2017	Colin Percival <cperciva@FreeBSD.org>	Use the TSLOG framework to record entry/exit timestamps for VFS_MOUNT calls.
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 9a81ba0f	31-May-2017	Stephen J. Kiernan <stevek@FreeBSD.org>	Add MD_VERIFY option to enable O_VERIFY in open for vnode type. Add -o [no]verify option to mdconfig (and document in man page.) Implement GEOM attribute MNT::verified to ask md if the backing vnode is verified. Check for MNT::verified in cd9660 mount to flag the mount as MNT_VERIFIED if the underlying device has been verified. Reviewed by: rwatson Approved by: sjg (mentor) Obtained from: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D2902
# 69921123	23-May-2017	Konstantin Belousov <kib@FreeBSD.org>	Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). 
Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# 2f304845	05-Jan-2017	Konstantin Belousov <kib@FreeBSD.org>	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are undesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# abc15156	27-Nov-2016	Konstantin Belousov <kib@FreeBSD.org>	NFSv4 client tracks opens, and the track records are only dropped when the vnode is inactivated. This contradicts with the nullfs caching which keeps upper vnode around, as consequence keeping the use reference to lower vnode. Add a filesystem flag to request nullfs to not cache when mounted over that filesystem, and set the flag for nfs v4 mounts. Reported by: asomers Reviewed by: rmacklem Tested by: asomers, rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 5bb81f9b	30-Sep-2016	Mateusz Guzik <mjg@FreeBSD.org>	vfs: batch free vnodes in per-mnt lists Previously free vnodes would always be directly returned to the global LRU list. With this change up to mnt_free_list_batch vnodes are collected first. syncer runs always return the batch regardless of its size. While vnodes on per-mnt lists are not counted as free, they can be returned in case of vnode shortage. Reviewed by: kib Tested by: pho
# debc480e	07-Jul-2016	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add new unmount(2) flag, MNT_NONBUSY, to check whether there are any open vnodes before proceeding. Make autounmountd(8) use this flag. Without it, even an unsuccessful unmount causes filesystem flush, which interferes with normal operation. Reviewed by: kib@ Approved by: re (gjb@) MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D7047
# 3a1e5dd8	26-Jun-2016	Konstantin Belousov <kib@FreeBSD.org>	Rewrite sigdeferstop(9) and sigallowstop(9) into more flexible framework allowing to set the suspension policy for the dynamic block. Extend the currently possible policies of stopping on interruptible sleeps and ignoring such sleeps by two more: do not suspend at interruptible sleeps, but interrupt them with either EINTR or ERESTART. Reviewed by: jilles Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Approved by: re (gjb)
# 09c837b8	20-Nov-2015	Gleb Smirnoff <glebius@FreeBSD.org>	Remove remnants of the old NFS from vnode pager. Reviewed by: kib Sponsored by: Netflix
# 5f34e93c	05-Jul-2015	Mark Johnston <markj@FreeBSD.org>	Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT. This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough filesystems like nullfs and unionfs no longer need to inherit this information from their lower layer(s). This change also restores the pre-r273336 behaviour of using the presence of a susp_clean VFS method to request suspension support. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D2937
# dda11d4a	15-Apr-2015	Rick Macklem <rmacklem@FreeBSD.org>	File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache. Reviewed by: kib MFC after: 2 weeks
# a25100c5	08-Dec-2014	Konstantin Belousov <kib@FreeBSD.org>	Add functions syncer_suspend() and syncer_resume(), which are supposed to be called before suspension and after resume, correspondingly. The syncer_suspend() ensures that all filesystems dirty data and metadata are saved to the permanent storage, and stops kernel threads which might modify filesystems. The syncer_resume() restores stopped threads. For now, only syncer is stopped. This is needed, because each sync loop causes superblock updates for UFS. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 4fce16e4	20-Oct-2014	Mateusz Guzik <mjg@FreeBSD.org>	Provide vfs suspension support only for filesystems which need it, take two. nullfs and unionfs need to request suspension if underlying filesystem(s) use it. Utilize mnt_kern_flag for this purpose. This is a fixup for 273271. No strong objections from: kib Pointy hat to: mjg MFC after: 2 weeks
# 020b8f17	19-Oct-2014	Mateusz Guzik <mjg@FreeBSD.org>	Provide vfs suspension support only for filesystems which need it. Need is expressed by providing vfs_susp_clean function in vfsops. Differential Revision: D952 Reviewed by: kib (previous version) MFC after: 2 weeks
# 3914ddf8	17-Aug-2014	Edward Tomasz Napierala <trasz@FreeBSD.org>	Bring in the new automounter, similar to what's provided in most other UNIX systems, eg. MacOS X and Solaris. It uses Sun-compatible map format, has proper kernel support, and LDAP integration. There are still a few outstanding problems; they will be fixed shortly. Reviewed by: allanjude@, emaste@, kib@, wblock@ (earlier versions) Phabric: D523 MFC after: 2 weeks Relnotes: yes Sponsored by: The FreeBSD Foundation
# 168f4ee0	02-Aug-2014	Konstantin Belousov <kib@FreeBSD.org>	Remove Giant acquisition from the mount and unmount paths. It could be claimed that two things were reasonably protected by Giant. One is vfsconf list links, which is converted to the new dedicated sx vfsconf_sx. Another is vfsconf.vfc_refcount, which is now updated with atomics. Note that vfc_refcount still has the same races now as it has under the Giant, the unload of filesystem modules can happen while the module is still in use. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# beb199ac	09-Nov-2013	Konstantin Belousov <kib@FreeBSD.org>	Hide MNT_SHARED_WRITES() and MNT_EXTENDED_SHARED() under the #ifdef _KERNEL braces. Struct mount is only defined for the kernel build. Reported and tested by: andreast Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 6272798a	09-Nov-2013	Konstantin Belousov <kib@FreeBSD.org>	Both vn_close() and VFS_PROLOGUE() evaluate vp->v_mount twice, without holding the vnode lock; vp->v_mount is checked first for NULL equality, and then dereferenced if not NULL. If the vnode is reclaimed in the meantime, the second evaluation can yield NULL even though the first check passed. Change VFS_PROLOGUE() to evaluate the mp once, convert MNTK_SHARED_WRITES and MNTK_EXTENDED_SHARED tests into inline functions. Reviewed by: alc Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
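The "evaluate once" pattern from 6272798a can be sketched as below: the pointer is read into a local exactly once, and both the NULL check and the dereference use that local. The types, the flag value, and the function name are hypothetical stand-ins rather than the kernel's definitions. The sketch also ignores the separate question of keeping the pointed-to structure alive.

```c
#include <stddef.h>

#define EXAMPLE_SHARED_WRITES	0x1u	/* hypothetical flag bit */

struct example_mount {
	unsigned	flags;
};

struct example_vnode {
	/* May be reset to NULL concurrently; volatile forces a real load. */
	struct example_mount *volatile mount;
};

/* Returns 1 if the vnode has a mount with the flag set, 0 otherwise. */
static int
example_shared_writes(struct example_vnode *vp)
{
	struct example_mount *mp = vp->mount;	/* read the pointer once */

	if (mp == NULL)
		return (0);
	/* Use the local copy; re-reading vp->mount here could yield NULL. */
	return ((mp->flags & EXAMPLE_SHARED_WRITES) != 0);
}
```

Wrapping the test in an inline function rather than a macro makes it natural to hold the pointer in a local, which is harder to do safely when a macro argument is expanded more than once.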
# d5814e82	28-Oct-2013	Sergey Kandaurov <pluknet@FreeBSD.org>	G/c unused mountrootfsname. It was replaced with rootdevnames in r52778.
# 8fe6bddf	01-Sep-2013	Rick Macklem <rmacklem@FreeBSD.org>	Forced dismounts of NFS mounts can fail when thread(s) are stuck waiting for an RPC reply from the server while holding the mount point busy (mnt_lockref incremented). This happens because dounmount() msleep()s waiting for mnt_lockref to become 0, before calling VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(), which the NFS client implements as purging RPCs in progress. Making this call before checking mnt_lockref fixes the problem, by ensuring that the VOP_xxx() calls will fail and unbusy the mount point. Reported by: sbruno Reviewed by: kib MFC after: 2 weeks
# 477e6ee4	23-Aug-2013	Alfred Perlstein <alfred@FreeBSD.org>	Grow some spares in struct vfsops. This should hopefully prevent ABI breakage on adding new vfsops in 10.x.
# 4612275f	10-Jun-2013	Marcel Moolenaar <marcel@FreeBSD.org>	Revert r251590. It unexpectedly broke the build and there were some questions on locking. As part of commit-bit grooming, I'd like Steve to handle this, but can't leave things broken in the mean time.
# 8c7ca16f	09-Jun-2013	Marcel Moolenaar <marcel@FreeBSD.org>	Add vfs_mounted and vfs_unmounted events so that components can be informed about mount and unmount events. This is used by Juniper to implement a more optimal implementation of NetBSD's veriexec. Submitted by: stevek@juniper.net Obtained from: Juniper Networks, Inc
# 0fc6daa7	11-May-2013	Konstantin Belousov <kib@FreeBSD.org>	- Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The null_hashget() obtains the reference on the nullfs vnode, which must be dropped. - Fix a wart which existed from the introduction of the nullfs caching, do not unlock lower vnode in the nullfs_reclaim_lowervp(). It should be innocent, but now it is also formally safe. Inform the nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on nullfs inode. - Add a callback to the upper filesystems for the lower vnode unlinking. When inactivating a nullfs vnode, check if the lower vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC on the lower vnode, and reclaim upper vnode if so. This allows nullfs to purge cached vnodes for the unlinked lower vnode, avoiding excessive caching. Reported by: Göran Löwkrantz <goran.lowkrantz@ismobile.com> Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# f8c09530	19-Mar-2013	Konstantin Belousov <kib@FreeBSD.org>	A flag for the filesystem to indicate to the upper levels that it accepts unmapped buffers for the VOP_STRATEGY(). Sponsored by: The FreeBSD Foundation Tested by: pho
# 593efaf9	21-Feb-2013	John Baldwin <jhb@FreeBSD.org>	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients are set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month
# 6cd3574c	24-Jan-2013	Sergey Kandaurov <pluknet@FreeBSD.org>	Update and clarify comments regarding VFS op table initialization in the man page and its header counterpart. Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (initial version) Reviewed and further improved by: bde (previous version) All bugs are: mine
# d1c5e3f8	03-Jan-2013	Konstantin Belousov <kib@FreeBSD.org>	Remove the deprecated MNT_VNODE_FOREACH interface. Use the MNT_VNODE_FOREACH_ALL instead.
# 14df601e	14-Dec-2012	Konstantin Belousov <kib@FreeBSD.org>	When mnt_vnode_next_active iterator cannot lock the next vnode and yields, specify the user priority for the yield. Otherwise, a higher-priority (kernel) thread could fall into the priority-inversion with the thread owning the mutex lock. On single-processor machines or UP kernels, do not loop adaptively when the next vnode cannot be locked, instead yield unconditionally. Restructure the iteration initializer and the iterator to remove code duplication. Put the code to fetch and lock a vnode next to the current marker, into the mnt_vnode_next_active() function, and use it instead of repeating the loop. Reported by: hrs, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
# 4eea8aea	14-Dec-2012	Konstantin Belousov <kib@FreeBSD.org>	Line up the continuation backslashes. Sponsored by: The FreeBSD Foundation MFC after: 3 days
# bc2258da	09-Nov-2012	Attilio Rao <attilio@FreeBSD.org>	Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
# 5050aa86	22-Oct-2012	Konstantin Belousov <kib@FreeBSD.org>	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 102548d1	05-Oct-2012	Andriy Gapon <avg@FreeBSD.org>	mount.h: MNTK_VGONE_UPPER and MNTK_VGONE_WAITER were supposed to be different ... otherwise a waiter is never woken up. Reported by: swills Discussed with: jhb Approved by: kib MFC after: 3 days
# bcd5bb8e	09-Sep-2012	Konstantin Belousov <kib@FreeBSD.org>	Add a facility for vgone() to inform the set of subscribed mounts about vnode reclamation. Typical use is for the bypass mounts like nullfs to get a notification about lower vnode going away. Now, vgone() calls new VFS op vfs_reclaim_lowervp() with an argument lowervp which is reclaimed. It is possible to register several reclamation event listeners, to correctly handle the case of several nullfs mounts over the same directory. For the filesystem not having nullfs mounts over it, the overhead added is a single mount interlock lock/unlock in the vnode reclamation path. In collaboration with: pho MFC after: 3 weeks
# 84c3cd4f	09-Sep-2012	Konstantin Belousov <kib@FreeBSD.org>	Add MNTK_LOOKUP_EXCL_DOTDOT struct mount flag, which specifies to the lookup code that dotdot lookups shall override any shared lock requests with the exclusive one. The flag is useful for filesystems which sometimes need to upgrade shared lock to exclusive inside the VOP_LOOKUP or later, which cannot be done safely for dotdot, due to dvp also locked and causing LOR. In collaboration with: pho MFC after: 3 weeks
# 41014d99	30-May-2012	Konstantin Belousov <kib@FreeBSD.org>	vn_io_fault() is a facility to prevent page faults while filesystems perform copyin/copyout of the file data into the usermode buffer. Typical filesystems hold the vnode lock and some buffer locks over the VOP_READ() and VOP_WRITE() operations, and since the page fault handler may need to recurse into VFS to get the page content, a deadlock is possible. The facility works by disabling page fault handling for the current thread and attempting to execute i/o while allowing uiomove() to access the usermode mapping of the i/o buffer. If all buffer pages are resident, uiomove() is successful and the request is finished. If EFAULT is returned from uiomove(), the pages backing the i/o buffer are faulted in and held, and the copyin/out is performed using uiomove_fromphys() over the held pages for the second attempt of the VOP call. Since pages are held in chunks to prevent large i/o requests from starving the free pages pool, and since the vnode lock is only taken for i/o over the current chunk, the vnode lock no longer protects atomicity of the whole i/o request. Use newly added rangelocks to provide the required atomicity of i/o regarding other i/o and truncations. Filesystems need to explicitly opt in to the scheme, by setting the MNTK_NO_IOPF struct mount flag, and optionally by using the vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or converting uio into a request for uiomove_fromphys(). Reviewed by: bf (comments), mdf, pjd (previous version) Tested by: pho Tested by: flo, Gustau Pérez <gperez entel upc edu> (previous version) MFC after: 2 months
# 11c15f90	18-May-2012	Kirk McKusick <mckusick@FreeBSD.org>	Update comment to document that the vnode free-list mutex needs to be held when updating mnt_activevnodelist and mnt_activevnodelistsize.
# f257ebbb	20-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# 71469bb3	17-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if it is DOOMED or in other possible end-of-life scenarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# e8f8ad72	11-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	Whitespace cleanup.
# 0ff93c48	07-Apr-2012	Gleb Kurtsou <gleb@FreeBSD.org>	Add vfs_getopt_size. Support human readable file system options in tmpfs. Increase maximum tmpfs file system size to 4GB*PAGE_SIZE on 32 bit archs. Discussed with: delphij MFC after: 2 weeks
# 38ddb572	08-Mar-2012	Konstantin Belousov <kib@FreeBSD.org>	Decommission mnt_noasync. Introduce MNTK_NOASYNC mnt_kern_flag which allows a filesystem to request VFS to not allow MNTK_ASYNC. MFC after: 1 week
# cc672d35	16-Jan-2012	Kirk McKusick <mckusick@FreeBSD.org>	Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks
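The hazard cc672d35 guards against can be shown with a short sketch: once flags are 64 bits wide, any 32-bit intermediate silently drops bits 32 and above. `EXAMPLE_FLAG_HIGH` and both helper functions are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical flag that lives above bit 31. */
#define EXAMPLE_FLAG_HIGH	(UINT64_C(1) << 35)

/* Buggy pattern: a 32-bit intermediate loses the top 32 bits. */
static int
example_has_flag_narrow(uint64_t flags)
{
	uint32_t tmp = (uint32_t)flags;		/* truncates to bits 0..31 */

	return ((tmp & EXAMPLE_FLAG_HIGH) != 0);
}

/* Fixed pattern: every intermediate is declared uint64_t. */
static int
example_has_flag_wide(uint64_t flags)
{
	uint64_t tmp = flags;

	return ((tmp & EXAMPLE_FLAG_HIGH) != 0);
}
```

With `EXAMPLE_FLAG_HIGH` set in the input, the narrow variant reports the flag as absent while the wide variant reports it as present.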
# d716efa9	24-Jul-2011	Kirk McKusick <mckusick@FreeBSD.org>	Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flag so that it is visible to userland programs. This change enables the `mount' command with no arguments to be able to show if a filesystem is mounted using journaled soft updates as opposed to just normal soft updates. Approved by: re (bz)
# 6beb3bb4	24-Jul-2011	Kirk McKusick <mckusick@FreeBSD.org>	This update changes the mnt_flag field in the mount structure from 32 bits to 64 bits and eliminates the unused mnt_xflag field. The existing mnt_flag field is completely out of bits, so this update gives us room to expand. Note that the f_flags field in the statfs structure is already 64 bits, so the expanded mnt_flag field can be exported without having to make any changes in the statfs structure. Approved by: re (bz)
# 694a586a	21-May-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
# 8dfec4a3	21-Dec-2010	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Enclose the body of the VFS_UNLOCK_GIANT() macro in a do { } while (0) loop, so it can be used in code like this: if (cond) VFS_UNLOCK_GIANT(vfslocked); else ; /* Do something else. */ Before the change, the compiler couldn't decide on its own if the else should be applied to the 'if (cond)' or to the if statement inside the VFS_UNLOCK_GIANT() macro.
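The effect of the do { } while (0) wrapper can be sketched with a stand-in macro. DEMO_UNLOCK(), demo_unlock() and demo_caller() are hypothetical names used only for illustration; the real macro operates on Giant and is not reproduced here. In C an else pairs with the nearest unmatched if, so an unwrapped macro that expands to a bare if would capture the caller's else.

```c
/* Hypothetical stand-in for releasing a lock; counts calls for inspection. */
static int unlock_calls;

static void
demo_unlock(void)
{
	unlock_calls++;
}

/*
 * Wrapped form: the expansion is a single statement, so an else that
 * follows the macro invocation pairs with the caller's if.
 */
#define DEMO_UNLOCK(locked) do {		\
	if (locked)				\
		demo_unlock();			\
} while (0)

/* Returns 1 if the caller's else branch ran, 0 otherwise. */
static int
demo_caller(int cond, int locked)
{
	int took_else = 0;

	if (cond)
		DEMO_UNLOCK(locked);
	else
		took_else = 1;
	return (took_else);
}
```

With the wrapper, demo_caller(1, 0) leaves took_else at 0. If the macro body were only the inner if, the else would attach to it and the same call would take the else branch.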
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# d0cc54f3	10-Oct-2010	Konstantin Belousov <kib@FreeBSD.org>	The r184588 changed the layout of struct export_args, causing an ABI breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks
# 418f1e7b	14-Sep-2010	Konstantin Belousov <kib@FreeBSD.org>	Rename the field to not confuse readers. The bytes are actually used. Discussed with: rmacklem MFC after: 1 week
# 9a24dc07	11-Sep-2010	Konstantin Belousov <kib@FreeBSD.org>	Protect mnt_syncer with the sync_mtx. This prevents a (rare) vnode leak when mount and update are executed in parallel. Encapsulate syncer vnode deallocation into the helper function vfs_deallocate_syncvnode(), to not externalize sync_mtx from vfs_subr.c. Found and reviewed by: jh (previous version of the patch) Tested by: pho MFC after: 3 weeks
# c87f1ad4	28-Aug-2010	Pawel Jakub Dawidek <pjd@FreeBSD.org>	There is a bug in vfs_allocate_syncvnode() failure handling in mount code. Actually it is hard to properly handle such a failure, especially in MNT_UPDATE case. The only reason for the vfs_allocate_syncvnode() function to fail is getnewvnode() failure. Fortunately it is impossible for the current implementation of getnewvnode() to fail, so we can assert this and make vfs_allocate_syncvnode() void. This in turn frees us from handling its failures in the mount code. Reviewed by: kib MFC after: 1 month
# 113db2dd	24-Apr-2010	Jeff Roberson <jeff@FreeBSD.org>	- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
# 0718d64d	18-Apr-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	MFC r200796: Implement NFSv4 ACL support for UFS. Reviewed by: rwatson
# 9340fc72	21-Dec-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Implement NFSv4 ACL support for UFS. Reviewed by: rwatson
# fe1d3f15	28-Jun-2009	Stanislav Sedov <stas@FreeBSD.org>	- Turn the third (islocked) argument of the knote call into flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without KQ_LOCK mtx held. - Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required for ZFS as its getattr implementation may sleep. Approved by: re (rwatson) Reviewed by: kib MFC after: 2 weeks
# 27bfb741	08-Jun-2009	Paul Saab <ps@FreeBSD.org>	Simplify shared vnode locking and extend it to also include fsync. Also, in vop_write, no longer assert for exclusive locks on the vnode. Reviewed by: jhb, kmacy, jeffr
# a6d545d8	04-Jun-2009	Paul Saab <ps@FreeBSD.org>	Support shared vnode locks for write operations when the offset is provided on filesystems that support it. This really improves mysql + innodb performance on ZFS. Reviewed by: jhb, kmacy, jeffr
# faef64cc	30-May-2009	Attilio Rao <attilio@FreeBSD.org>	Remove the now invalid (and possibly unused) debug.mpsafevfs sysctl/tunable. Reviewed by: emaste Sponsored by: Sandvine Incorporated
# 61cea482	29-May-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	There is only one spare MNT_ flag left, and I want to use it for NFSv4 ACLs. Make room for additional filesystem flags now, to avoid breaking ABI later. Reviewed by: kib@
# dfd233ed	11-May-2009	Attilio Rao <attilio@FreeBSD.org>	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and related parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. The VFS KPI is heavily changed by this commit, so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal such situation.
# 33fc3625	11-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF. Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month
# f86bce5e	02-Mar-2009	Jamie Gritton <jamie@FreeBSD.org>	Extend the "vfsopt" mount options for more general use. Make struct vfsopt and the vfs_buildopts function public, and add some new fields to struct vfsopt (pos and seen), and new functions vfs_getopt_pos and vfs_opterror. Further extend the interface to allow reading options from the kernel in addition to sending them to the kernel, with vfs_setopt and related functions. While this allows the "name=value" option interface to be used for more than just FS mounts (planned use is for jails), it retains the current "vfsopt" name and <sys/mount.h> requirement. Approved by: bz (mentor)
# ec48c16f	06-Feb-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add KASSERTs to make it easier to debug problems like the one fixed in r188141. Reviewed by: kib,attilio Approved by: rwatson (mentor) Tested by: pho Sponsored by: FreeBSD Foundation
# 4a0f8076	16-Dec-2008	Attilio Rao <attilio@FreeBSD.org>	1) Fix a deadlock in the VFS: - threadA runs vfs_rel(mp1) - threadB does unmount the mp1 fs, sets MNTK_UNMOUNT and drops MNT_ILOCK() - threadA runs vfs_busy(mp1) and, as long as MNTK_UNMOUNT is set, sleeps waiting for threadB to complete the unmount - threadB, in vfs_mount_destroy(), finds mnt_lock > 0 and sleeps waiting for the refcount to expire. Fix the deadlock by adding a flag called MNTK_REFEXPIRE which signals the unmounter is waiting for mnt_ref to expire. The vfs_busy contenders are awakened and fail, and if they retry, MNTK_REFEXPIRE won't allow them to sleep again. 2) Significantly simplify the code of vfs_mount_destroy(), trimming unnecessary code: - as long as any reference exited, it is no longer possible to have write-ops (primary and secondary) in progress. - it is not needed to drop and reacquire the mount lock. - filling the structures with dummy values is useless as long as it is going to be freed. Tested by: pho, Andrea Barberio <insomniac at slackware dot it> Discussed with: kib
# 61791644	29-Nov-2008	Konstantin Belousov <kib@FreeBSD.org>	In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls. Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it. Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month
# 1ba4a712	17-Nov-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This brings a huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before, if a write request failed, the system panicked. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris
# 30f60d8c	03-Nov-2008	Attilio Rao <attilio@FreeBSD.org>	Remove the mnt_holdcnt and mnt_holdcntwaiters because they are useless. Really, the concept of holdcnt in the struct mount is represented by the mnt_ref (which prevents the type-stable structure from being "recycled") handled through vfs_ref() and vfs_rel(). In this light, switch the holdcnt acquisition into an emulated vfs_ref() (and subsequent release into vfs_rel()). Discussed with: kib Tested by: pho
# a9148abd	03-Nov-2008	Doug Rabson <dfr@FreeBSD.org>	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. 
You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month
# 83b3bdbc	02-Nov-2008	Attilio Rao <attilio@FreeBSD.org>	Improve VFS locking: - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no longer linked to a lockmgr. So change its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr already) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used in vfs_mount_destroy(), which allows overriding the mnt_ref if running for more than 3 seconds, makes it totally useless. Remove it as it was thought to work in older versions. If a problem of "refcount held never going away" should appear, we will need to fix it properly rather than trust such a hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 6e6049e9	19-Sep-2008	David E. O'Brien <obrien@FreeBSD.org>	Add freebsd32 compat shim for nmount(2). (and quiet some compiler warnings for vfs_donmount)
# 2814d5ba	16-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	When an attempt is made to suspend a filesystem that is already suspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write_resume() when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows bypassing the suspension point in vn_start_write(). It is intended for use by VFS in situations where the suspender wants to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month
# 59d49325	31-Aug-2008	Attilio Rao <attilio@FreeBSD.org>	Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>
# a7053783	09-Jun-2008	Konstantin Belousov <kib@FreeBSD.org>	Provide the mutual exclusion between the nfs export list modifications and nfs requests processing. Lockmgr lock provides the shared locking for nfs requests, while exclusive mode is used for modifications. The writer starvation is handled by lockmgr too. Reported by: kris, pho, many Based on the submission by: mohan Tested by: pho MFC after: 2 weeks
# 3800322f	26-Apr-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Implement 'show mount' command in DDB. Without argument, it prints short info about all currently mounted file systems. When an address is given as an argument, prints detailed info about the given mount point. MFC after: 2 weeks
# 7fbfba7b	01-Mar-2008	Attilio Rao <attilio@FreeBSD.org>	- Handle buffer lock waiters count directly in the buffer cache instead than rely on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreeBSD_version will be bumped and manpages updated by further commits. Additionally, 'struct buf' changes result in a disturbed ABI also. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
# 245b2044	12-Sep-2007	Konstantin Belousov <kib@FreeBSD.org>	When restoring the mount after umount failed, the MNTK_UNMOUNT flag prevents insmntque() from placing the reallocated syncer vnode on the mount list, which causes a panic in vfs_allocate_syncvnode(). Introduce the MNTK_NOINSMNTQ flag, which marks the period when insmntque() is not allowed to succeed, instead of MNTK_UNMOUNT. MNTK_NOINSMNTQ is set and cleared simultaneously with MNTK_UNMOUNT, except on the umount error path, where it is cleared just before the syncer vnode is going to be allocated. Reported by: Peter Jeremy <peterjeremy optushome com au> Suggested by: tegge Approved by: re (rwatson)
# cc479dda	28-Aug-2007	John Baldwin <jhb@FreeBSD.org>	Rework the routines to convert a 5.x+ statfs structure (with fixed-size 64-bit counters) to a 4.x statfs structure (with long-sized counters). - For block counters, we scale up the block size sufficiently large so that the resulting block counts fit into a the long-sized (long for the ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs VOP did this already. This can lie about the block size to 4.x binaries, but it presents a more accurate picture of the ratios of free and available space. - For non-block counters, fix the freebsd32 stats converter to cap the values at INT32_MAX rather than losing the upper 32-bits to match the behavior of the 4.x statfs conversion routine in vfs_syscalls.c Approved by: re (kensmith)
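The block-counter scaling described in this entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the kernel's conversion routine: struct demo_statfs, demo_scale() and demo_cap() are hypothetical names, and the limit is passed in by the caller (INT32_MAX for a 32-bit long).

```c
#include <stdint.h>

/* Hypothetical subset of the 64-bit statistics being converted. */
struct demo_statfs {
	uint64_t bsize;		/* block size in bytes */
	uint64_t blocks;	/* total blocks */
	uint64_t bfree;		/* free blocks */
};

/*
 * Grow the reported block size by powers of two until the largest block
 * counter fits under 'limit'. The ratio of free to total space is kept
 * (up to rounding) even though the reported block size is no longer the
 * real one.
 */
static void
demo_scale(struct demo_statfs *sf, uint64_t limit)
{
	uint64_t max = sf->blocks > sf->bfree ? sf->blocks : sf->bfree;
	unsigned int shift = 0;

	while ((max >> shift) > limit)
		shift++;
	sf->bsize <<= shift;
	sf->blocks >>= shift;
	sf->bfree >>= shift;
}

/* Non-block counters are capped at the limit instead of being truncated. */
static uint64_t
demo_cap(uint64_t value, uint64_t limit)
{
	return (value > limit ? limit : value);
}
```

Capping rather than masking means an oversized counter is reported as the largest representable value instead of a small, misleading remainder.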
# eb542415	22-Apr-2007	Robert Watson <rwatson@FreeBSD.org>	In the MAC Framework implementation, file systems have two per-mountpoint labels: the mount label (label of the mountpoint) and the fs label (label of the file system). In practice, policies appear to only ever use one, and the distinction is not helpful. Combine mnt_mntlabel and mnt_fslabel into a single mnt_label, and eliminate extra machinery required to maintain the additional label. Update policies to reflect removal of extra entry points and label. Obtained from: TrustedBSD Project Sponsored by: SPARTA, Inc.
# 7760d840	17-Apr-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Export vfs_mount_alloc() as it is used in ZFS.
# f3a8d2f9	05-Apr-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add security.jail.mount_allowed sysctl, which allows mounting and unmounting jail-friendly file systems from within a jail. Precisely, it grants PRIV_VFS_MOUNT, PRIV_VFS_UNMOUNT and PRIV_VFS_MOUNT_NONUSER privileges for a jailed super-user. It is turned off by default. A jail-friendly file system is a file system whose driver registers itself with the VFCF_JAIL flag via the VFS_SET(9) API. The lsvfs(1) command can be used to see which file systems are jail-friendly ones. There are currently no jail-friendly file systems, ZFS will be the first one. In the future we may consider marking file systems like nullfs as jail-friendly. Reviewed by: rwatson
# 4874b3fb	01-Apr-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	More style nits.
# daa88cdf	01-Apr-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Style nit.
# 695919ad	31-Mar-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Make vfs_mount_destroy() and vfs_freeopts() non-static, I'd like to use them.
# c146055f	29-Mar-2007	Konstantin Belousov <kib@FreeBSD.org>	Extend rev. 1.210 to avoid dereference NULL mp in VFS_NEEDSGIANT and VFS_ASSERT_GIANT. Stop using reserved namespace. Reported and tested by: kris Reviewed and enhanced by: tegge MFC after: 1 week
# 2c7b0f41	16-Feb-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Remove VFS_VPTOFH entirely. API is already broken and it is good time to do it. Suggested by: rwatson
# 10bcafe9	15-Feb-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
# 2892f3bb	16-Dec-2006	Craig Rodrigues <rodrigc@FreeBSD.org>	Add a function vfs_deleteopt() which searches through the vfsoptlist linked list of mount options by name, and deletes the option if it finds it.
# 206ad245	31-Oct-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add MNT_GJOURNAL flag which indicates, that file system has gjournal support enabled. Add mnt_gjprovider field which keeps gjournal provider's name on which file system is placed on. This allows to not place file system on gjournal directly and allows gjournal class to pair gjournal provider with file system. Sponsored by: home.pl
# 30af7119	03-Oct-2006	Konstantin Belousov <kib@FreeBSD.org>	Fix the remaining race in the revs. 1.232, 1,233 that could occur during unmount when mp structure is reused while waiting for coveredvp lock. Introduce struct mount generation count, increment it on each reuse and compare the generations before and after obtaining the coveredvp lock. Reviewed by: tegge, pjd Approved by: pjd (mentor) MFC after: 2 weeks
# 55b4ff0d	25-Sep-2006	Tor Egge <tegge@FreeBSD.org>	Increase mnt_noasync once in softdep_mount() to disallow async io, closing a window where a file system using softupdates could be async for a short while if both MNT_UPDATE and MNT_ASYNC were passed as flags to nmount(). Add MNTK_SOFTDEP flag to ensure that softdep_mount() doesn't increase mnt_noasync multiple times.
# a1e363f2	25-Sep-2006	Tor Egge <tegge@FreeBSD.org>	Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.
# 5da56ddb	25-Sep-2006	Tor Egge <tegge@FreeBSD.org>	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
# 7d7d9e22	13-Sep-2006	Mohan Srinivasan <mohans@FreeBSD.org>	Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@
# 5ac6cbfd	05-May-2006	Tor Egge <tegge@FreeBSD.org>	Avoid dereferencing NULL pointer.
# 9eee2605	30-Mar-2006	Jeff Roberson <jeff@FreeBSD.org>	- Define mnt_startzero and mnt_endzero as a range that excludes mnt_mtx and mnt_lock so that the mountpoint can be explicitly zeroed on creation. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
# ca2fa807	10-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Block secondary writes while expunging active unlinked files. Fix detection of active unlinked files by checking VI_OWEINACT and VI_DOINGINACT in addition to v_usecount. Defer inactive handling for unlinked files if the file system is mostly suspended (secondary writes being blocked). Perform deferred inactive handling after the file system is resumed.
# 791dd2fa	08-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
# eb2ea105	01-Mar-2006	Jeff Roberson <jeff@FreeBSD.org>	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris
# 04f6d3ef	06-Feb-2006	Jeff Roberson <jeff@FreeBSD.org>	- Add a ref count to the mount structure. Sleep for up to 3 seconds in vfs_mount_destroy waiting for this ref to hit 0. We don't print an error if we are rebooting as the root mount always retains some references by init proc. - Acquire a mnt ref for every vnode allocated to a mount point. Drop this ref only once vdestroy() has been called and the mount has been freed. - No longer NULL the v_mount pointer in delmntque() so that we may release the ref after vgone() has been called. This allows us to guarantee that the mount point structure will be valid until the last vnode has lost its last ref. - Fix a few places that rely on checking v_mount to detect recycling. Sponsored by: Isilon Systems, Inc. MFC After: 1 week
# 82be0a5a	09-Jan-2006	Tor Egge <tegge@FreeBSD.org>	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
# a94d0a9d	18-Dec-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	- Document another spare flag (0x00000010). - Add a 'XXX' comment about MNT_ACLS and MNT_BYFSID flags collision and explain why it is harmless. - Add a colon after 'XXX' for consistency.
# 0430a5e2	13-Dec-2005	Dag-Erling Smørgrav <des@FreeBSD.org>	Eradicate caddr_t from the VFS API.
# 2207c764	28-Nov-2005	Craig Rodrigues <rodrigc@FreeBSD.org>	Remove MNT_NODEV mount option. In RELENG_6, MNT_NODEV was a no-op. The presence of MNT_NODEV was confusing the am-utils autoconf scripts. PR: conf/79715
# 84e69560	07-Nov-2005	Craig Rodrigues <rodrigc@FreeBSD.org>	Add utility function to propagate mount errors as text string messages. Discussed with: phk
# b133aa18	23-Oct-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	MNT_JAILDEVFS is not used anymore. Mark it as spare. OK'ed by: phk
# 34cc826a	05-Aug-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Holding a vnode doesn't prevent v_mount from disappearing (when the vnode is inactivated), possibly leading to a NULL dereference when checking if the mount wants knotes to be activated in the VOP hooks. So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(), if necessary, and check it when activating knotes. Since the flags are not erased when a vnode is being held, we can safely read them. Reviewed by: kris@ MFC after: 3 days
# 571dcd15	01-Jul-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
# 679985d0	09-Jun-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
# 35f19cdc	24-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Add a 'flags' parameter to VFS_ROOT(). This is intended to allow lookup to do shared locks on the root. Filesystems are free to ignore flags and instead acquire an exclusive lock if they do not support shared locks. Sponsored by: Isilon Systems, Inc.
# 78bb3c21	16-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Add mnt_hashseed to struct mount and initialize it with PRNG bits, use it to get better hashing in vfs_hash. In case of an insert collision in vfs_hash_insert(), put the losing vnode on a special list so that vfs_hash_remove() can just assume that it is on a list. Drop the VI_HASHED flag.
# e8ed9330	20-Feb-2005	David Schultz <das@FreeBSD.org>	Remove VFS_START(). Its original purpose involved the mfs filesystem, which is long gone. Discussed with: mckusick Reviewed by: phk
# ebbfc2f8	09-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Make various mountpoint related functions static.
# db50d057	24-Jan-2005	Jeff Roberson <jeff@FreeBSD.org>	- Add the mount flag MNTK_MPSAFE which indicates whether or not Giant must be held when any vnode owned by the filesystem is manipulated. - Add VFS_LOCK_GIANT and VFS_UNLOCK_GIANT macros which are used to conditionally lock and unlock Giant based on a particular mountpoint.
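The conditional locking described in the entry above can be sketched in user-space C. This is only an illustration under stated assumptions: `sketch_mount`, `SKETCH_MNTK_MPSAFE`, `sketch_lock_giant()` and `sketch_unlock_giant()` are hypothetical stand-ins, not the kernel's definitions, and a plain counter plays the role of Giant.

```c
#include <assert.h>

/* Hypothetical flag value; the real MNTK_MPSAFE value is not shown in this log. */
#define SKETCH_MNTK_MPSAFE 0x1u

/* Hypothetical stand-in for struct mount, reduced to its kernel flags. */
struct sketch_mount {
	unsigned int kern_flags;
};

/* Stand-in for Giant: counts how many times the global lock is held. */
static int sketch_giant_depth;

/*
 * Take the global lock only when the filesystem is not marked MP-safe.
 * Returns 1 if the lock was taken, 0 otherwise, so that the caller can
 * hand the result back to sketch_unlock_giant().
 */
static int
sketch_lock_giant(const struct sketch_mount *mp)
{
	assert(mp != 0);
	if ((mp->kern_flags & SKETCH_MNTK_MPSAFE) != 0)
		return (0);
	sketch_giant_depth++;
	return (1);
}

/* Release the global lock only if the matching lock call took it. */
static void
sketch_unlock_giant(int locked)
{
	if (locked)
		sketch_giant_depth--;
}
```

Returning the "was it taken" result from the lock call keeps the unlock side from re-reading the mount flags, which may be the safer choice if the flags can change in between.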
# 8df6bac4	11-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to be the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson
# 60727d8b	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes
# 20a92a18	07-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).
# 53a05b7c	06-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Add more functions for handling mount arguments in VFS_MOUNT(): vfs_flagopt() for binary/boolean options. vfs_getopts() for string options vfs_filteropt() to check for unknown options. vfs_scanopt() for scanf() like processing of options. Also add function for setting the stat.f_mntfromname field.
# 5ddb0739	06-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Change the first argument of vfs_cmount() to a handy struct mntarg* and call it accordingly. (No filesystems implement vfs_cmount() yet, so this is a no-op commit)
# 49bfeeb8	06-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Add a few convenient functions in the mount_arg() family and collect the entire family at the end of the source file.
# a804d99c	05-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Make struct vfsopt{list} private to vfs_mount.c
# 74331236	05-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few cases don't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.
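The "always call the implementation on mnt_stat, then copy" approach from the entry above might look roughly like the following. It is a minimal sketch, not the committed code: `sketch_statfs`, `sketch_mount`, `sketch_vfs_statfs()` and the toy filesystem callback are all hypothetical names with reduced fields.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct statfs. */
struct sketch_statfs {
	long f_blocks;
	long f_bfree;
	char f_mntonname[32];
};

/* Hypothetical stand-in for struct mount with one filesystem method. */
struct sketch_mount {
	struct sketch_statfs mnt_stat;
	int (*fs_statfs)(struct sketch_mount *, struct sketch_statfs *);
};

/* Toy filesystem method: fills in only the fields it knows about. */
static int
sketch_toyfs_statfs(struct sketch_mount *mp, struct sketch_statfs *sbp)
{
	(void)mp;
	sbp->f_blocks = 100;
	sbp->f_bfree = 40;
	return (0);
}

/*
 * Wrapper: the filesystem always works on mp->mnt_stat; if the caller
 * passed a different buffer, the whole structure is copied into it.
 */
static int
sketch_vfs_statfs(struct sketch_mount *mp, struct sketch_statfs *sbp)
{
	int error;

	error = mp->fs_statfs(mp, &mp->mnt_stat);
	if (error != 0)
		return (error);
	if (sbp != &mp->mnt_stat)
		*sbp = mp->mnt_stat;
	return (0);
}
```

With this shape the filesystem method no longer needs to copy fields such as the mount-point name itself, because the caller's buffer receives everything already stored in mnt_stat.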
# 6c12df5a	03-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Implement a function, mount_arg() for accumulating a list of mount parameters to nmount. Make kernel_mount() accept the output from mount_arg() and know how to free the malloc'ed space. Make kernel_vmount() use the new function.
# 7ec0ec06	03-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Add vfs_cmount() method to vfs_ops, this is to convert old-style mount args to nmount request.
# a08805c7	03-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Retire unused vfs_mount() function in the name of nmount migration.
# 32ba8e93	03-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce vfs_byname_kld() which will try to load the filesystem as a module if possible. Use it so we don't have linker magic in the middle of the already complex mount code.
# 6518a5aa	26-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Eliminate MNT_NODEV usage, it doesn't have any meaning any more. Keep a #define MNT_NODEV 0 around to avoid dealing with contrib userland like mount_smbfs.
# de4cbbf5	25-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Integrate the relevant bits of vfs_rootmountalloc() where it matters.
# 996b2c82	29-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Loose vfs_mountedon()
# f9b3b0e6	04-Aug-2004	Maxim Konovalov <maxim@FreeBSD.org>	o Fix a typo in the comment.
# 5e8c582a	30-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activities on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems mount function consistently. Eliminate the nameidata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.
# 3dfe213e	27-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Convert the vfsconf list to a TAILQ. Introduce vfs_byname() function to find things on it. Staticize vfs_nmount() function under the name vfs_donmount(). Various cleanups.
# 26035074	17-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Fix macro so that we don't get missing initializer warnings.
# f257b7a5	12-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
# 2260fef1	07-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	struct mount->mnt_data has been a qaddr_t since '94 (rev 1.1), It should be a void *, fix it.
# 81d16e2d	07-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	do the vfsstd thing instead of messing up our VFS_SYSCTL macro.
# ea0104b0	06-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Introduce vfs_suser(), used to test if a user should have special privs for a mount.
# c713aaae	06-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	NFS mobility PHASE I, II & III (phase VI, and V pending): Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason. Signal a VFS event when NFS transitions from up to down and vice versa. Add a placeholder vfs_sysctl where we will put status reporting shortly. Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.
# 2d1dca73	04-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Pass the operation in with the fsidctl. Remove some fsidctls that we will not be using. Correct prototypes for fs sysctls.
# 94ed9c8a	04-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Introduce a new kevent filter. EVFILT_FS that will be used to signal generic filesystem events to userspace. Currently only mount and unmount of filesystems are signalled. Soon to be added, up/down status of NFS. Introduce a sysctl node used to route requests to/from filesystems based on filesystem ids. Introduce a new vfsop, vfs_sysctl(mp, req) that is used as the callback/ entrypoint by the sysctl code to change individual filesystems.
# e3c5a7a4	04-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.
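The last part of the fix above, clearing v_mount while the mountpoint lock is still held, can be illustrated with a single-threaded sketch. All names (`sketch_vnode`, `sketch_mount`, `sketch_insmntque()`, `sketch_delmntque()`, `sketch_count()`) are hypothetical, a flag stands in for the per-mount mutex, and the sketch does not reproduce MNT_VNODE_FOREACH() itself.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_mount;

/* Hypothetical stand-in for struct vnode, reduced to its list linkage. */
struct sketch_vnode {
	struct sketch_mount *v_mount;	/* owning mount, NULL once detached */
	struct sketch_vnode *v_next;
	struct sketch_vnode *v_prev;
};

/* Hypothetical stand-in for struct mount. */
struct sketch_mount {
	int locked;			/* stand-in for the per-mount mutex */
	struct sketch_vnode *head;	/* vnodes attached to this mount */
};

static void
sketch_ilock(struct sketch_mount *mp)
{
	assert(!mp->locked);
	mp->locked = 1;
}

static void
sketch_iunlock(struct sketch_mount *mp)
{
	assert(mp->locked);
	mp->locked = 0;
}

/* Attach vp at the front of mp's vnode list. */
static void
sketch_insmntque(struct sketch_vnode *vp, struct sketch_mount *mp)
{
	sketch_ilock(mp);
	vp->v_mount = mp;
	vp->v_prev = NULL;
	vp->v_next = mp->head;
	if (mp->head != NULL)
		mp->head->v_prev = vp;
	mp->head = vp;
	sketch_iunlock(mp);
}

/*
 * Detach vp from its mount. v_mount is cleared before the lock is
 * dropped, so nobody holding the lock can see a vnode that is off the
 * list but still claims to belong to the mount.
 */
static void
sketch_delmntque(struct sketch_vnode *vp)
{
	struct sketch_mount *mp = vp->v_mount;

	if (mp == NULL)
		return;
	sketch_ilock(mp);
	if (vp->v_prev != NULL)
		vp->v_prev->v_next = vp->v_next;
	else
		mp->head = vp->v_next;
	if (vp->v_next != NULL)
		vp->v_next->v_prev = vp->v_prev;
	vp->v_next = vp->v_prev = NULL;
	vp->v_mount = NULL;
	sketch_iunlock(mp);
}

/* Count the vnodes on mp while holding the lock. */
static int
sketch_count(struct sketch_mount *mp)
{
	struct sketch_vnode *vp;
	int n = 0;

	sketch_ilock(mp);
	for (vp = mp->head; vp != NULL; vp = vp->v_next) {
		assert(vp->v_mount == mp);	/* invariant the fix restores */
		n++;
	}
	sketch_iunlock(mp);
	return (n);
}
```

A single-threaded sketch cannot demonstrate the race, only the invariant: every vnode reachable from the list under the lock has v_mount pointing at that mount.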
# 89c9c53d	16-Jun-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
# 4af6b509	11-Apr-2004	Maxime Henrion <mux@FreeBSD.org>	Belatedly remove the getvfsent(3) API. All the consumers have been updated to use getvfsbyname(3) or the vfs.conflist sysctl since a long time, except mount_smbfs(8) which has just been fixed.
# 0bf57301	11-Apr-2004	Maxime Henrion <mux@FreeBSD.org>	Put struct ovfsconf inside BURN_BRIDGES as well.
# 82c6e879	06-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# 71529a89	06-Apr-2004	Bruce Evans <bde@FreeBSD.org>	Oops, fixed insertion sort error in the fix for an insertion sort error. While here, begin fixing dependencies of <sys/mount.h> on normal namespace pollution (__BSD_VISIBLE) by not using u_int in the prototype for nmount(2), although it is used in the man page. While there, begin cleaning up another set of prototypes: - use u_int in the prototype for the kernel part of nmount(). - consistently don't use parameter names in prototypes in the "exported vnode operations" set of prototypes, although style(9) says to use names in the kernel.
# f468f075	06-Apr-2004	Bruce Evans <bde@FreeBSD.org>	Fixed unsorting of prototypes in previous commit and 1.134.
# e2c8a799	05-Apr-2004	Doug Rabson <dfr@FreeBSD.org>	Regen.
# 537370d0	16-Mar-2004	Tim J. Robbins <tjr@FreeBSD.org>	Make vfs_nmount() public. The Linux emulator needs this in order to mount linprocfs filesystems.
# 2b348f74	11-Mar-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unused mnt_reservedvnlist field.
# 43a55a72	02-Feb-2004	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Added flag MNT_USER to MNT_UPDATEMASK, it will be used for detecting file systems mounted by unprivileged users. Reviewed by: rwatson Approved by: scottl (mentor) MFC after: 3 days
# fde81c7d	12-Nov-2003	Kirk McKusick <mckusick@FreeBSD.org>	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.
# eca8a663	11-Nov-2003	Robert Watson <rwatson@FreeBSD.org>	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 5c957adb	11-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	1. Consolidate mount struct allocation/destruction into a common code in vfs_mount_alloc/vfs_mount_destroy functions and take care to completely destroy the mount point along with its locks. Mount struct has grown in complexity recently and depending on each failure path to destroy it completely isn't working anymore. 2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function. 3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago. 4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland. Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson
# ca430f2e	04-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
# 318f2fb4	01-Jul-2003	Ian Dowse <iedowse@FreeBSD.org>	Add a new mount flag MNT_BYFSID that can be used to unmount a file system by specifying the file system ID instead of a path. Use this by default in umount(8). This avoids the need to perform any vnode operations to look up the mount point, so it makes it possible to unmount a file system whose root vnode cannot be looked up (e.g. due to a dead NFS server, or a file system that has become detached from the hierarchy because an underlying file system was unmounted). It also provides an unambiguous way to specify which file system is to be unmounted. Since the ability to unmount using a path name is retained only for compatibility, that case now just uses a simple string comparison of the supplied path against f_mntonname of each mounted file system. Discussed on: freebsd-arch mdoc help from: ru
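The difference between the two lookup styles in the entry above can be sketched as follows. This is an assumption-laden illustration: `sketch_fsid`, `sketch_mount`, `sketch_find_by_fsid()` and `sketch_find_by_path()` are hypothetical, the mount table is a plain array, and the path lookup returns the first match, which is a choice made for this sketch rather than a statement about the kernel's search order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a file system ID. */
struct sketch_fsid {
	int val[2];
};

/* Hypothetical stand-in for a mount table entry. */
struct sketch_mount {
	struct sketch_fsid fsid;
	char mntonname[64];
};

/* Look a mount up by ID; no path or vnode lookup is involved. */
static struct sketch_mount *
sketch_find_by_fsid(struct sketch_mount *tab, size_t n,
    const struct sketch_fsid *id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tab[i].fsid.val[0] == id->val[0] &&
		    tab[i].fsid.val[1] == id->val[1])
			return (&tab[i]);
	}
	return (NULL);
}

/* Compatibility-style lookup: plain string comparison on the mount point. */
static struct sketch_mount *
sketch_find_by_path(struct sketch_mount *tab, size_t n, const char *path)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(tab[i].mntonname, path) == 0)
			return (&tab[i]);
	}
	return (NULL);
}
```

When two file systems are mounted on the same path, only the ID lookup can name the second one directly, which is the ambiguity the commit message refers to.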
# 6b080461	26-Mar-2003	Tor Egge <tegge@FreeBSD.org>	Adjust the number of vnodes scanned by vlrureclaim() according to the size of the vnode list.
# c162e9c2	11-Mar-2003	Alexander Kabaev <kan@FreeBSD.org>	Rename vfs_stdsync function to vfs_stdnosync which matches more closely what function is really doing. Update all existing consumers to use the new name. Introduce a new vfs_stdsync function, which iterates over mount point's vnodes and call FSYNC on each one of them in turn. Make nwfs and smbfs use this new function instead of rolling their own identical sync implementations. Reviewed by: jeff
# 72f0679c	10-Mar-2003	Alexander Kabaev <kan@FreeBSD.org>	Remove trailing whitespace.
# 5606ac9e	27-Dec-2002	Robert Watson <rwatson@FreeBSD.org>	Re-add MNT_ACLS to the list of "updateable" mount flags, per our documentation. Generally, you really shouldn't twiddle the flag, but there are sensible scenarios where one might. Obtained from: TrustedBSD Project
# b78beb60	07-Nov-2002	Maxime Henrion <mux@FreeBSD.org>	A bunch of style(9) fixes. Obtained from: bde
# b65d1ba9	07-Nov-2002	Maxime Henrion <mux@FreeBSD.org>	- Use a better definition for MNAMELEN which doesn't require to have one #ifdef per architecture. - Change a space to a tab after a nearby #define. Obtained from: bde
# a16a92af	14-Oct-2002	Robert Watson <rwatson@FreeBSD.org>	Define MNT_ACLS, which can report on the status of the FS_ACLS flag used by UFS to administratively enable support for extended ACLs. While I'm here, remove MNT_MULTILABEL from the list of file system flags we permit to be updated after the initial mount. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# fee7d450	19-Aug-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Keep a copy of the credential used to mount filesystems around so we can check and use it later on. Change the pieces of code which relied on mount->mnt_stat.f_owner to check which user mounted the filesystem. This became needed as the EA code needs to be able to allocate blocks for "system" EA users like ACLs. There seems to be some half-baked (probably only quarter- actually) notion that the superuser for a given filesystem is the user who mounted it, but this has far from been carried through. It is unclear if it should be. Sponsored by: DARPA & NAI Labs.
# 01abbb42	13-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Move to a nested include of _label.h instead of mac.h in sys/sys/*.h (Most of the places where mac.h was recursively included from another kernel header file. net/netinet to follow.) Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs Suggested by: bde
# bf20c7a3	13-Aug-2002	Maxime Henrion <mux@FreeBSD.org>	Forward define struct iovec instead of including sys/uio.h and polluting the namespace even more.
# 9bf1a756	13-Aug-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.
# 4033e07e	10-Aug-2002	Maxime Henrion <mux@FreeBSD.org>	Don't #ifdef _KERNEL struct vfsconf, mount_smbfs(8) still uses it. Submitted by: jake
# 136be715	10-Aug-2002	Maxime Henrion <mux@FreeBSD.org>	One declaration for struct xvfsconf is enough. I have no idea how this happened. :-) Reported by: Norman C. Rice <nrice@emu.sourcee.com>
# 5965373e	10-Aug-2002	Maxime Henrion <mux@FreeBSD.org>	- Introduce a new struct xvfsconf, the userland version of struct vfsconf. - Make getvfsbyname() take a struct xvfsconf *. - Convert several consumers of getvfsbyname() to use struct xvfsconf. - Correct the getvfsbyname.3 manpage. - Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the kernel, and rewrite getvfsbyname() to use this instead of the weird existing API. - Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist sysctl. - Convert a vfsload() call in nfsiod.c to kldload() and remove the useless vfsisloadable() and endvfsent() calls. - Add a warning printf() in vfs_sysctl() to tell people they are using an old userland. After these changes, it's possible to modify struct vfsconf without breaking the binary compatibility. Please note that these changes don't break this compatibility either. When bp will have updated mount_smbfs(8) with the patch I sent him, there will be no more consumers of the {set,get,end}vfsent(), vfsisloadable() and vfsload() API, and I will promptly delete it.
# 3b2e6009	30-Jul-2002	Robert Watson <rwatson@FreeBSD.org>	Begin committing support for Mandatory Access Control and extensible kernel access control. The MAC framework permits loadable kernel modules to link to the kernel at compile-time, boot-time, or run-time, and augment the system security policy. This commit includes the initial kernel implementation, although the interface with the userland components of the operating system is still under work, and not all kernel subsystems are supported. Later in this commit sequence, documentation of which kernel subsystems will not work correctly with a kernel compiled with MAC support will be added. Label file system mount points, permitting security information to be maintained at the granularity of the file system. Two labels are currently maintained: a security label for the mount itself, and a default label for objects in the file system (in particular, for file systems not supporting per-vnode labeling directly). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# fbedc80b	09-Jul-2002	Maxime Henrion <mux@FreeBSD.org>	Remove vfs_stdmount() and vfs_stdunmount(). They are not really useful and are incompatible with nmount.
# 563af2ec	03-Jul-2002	Maxime Henrion <mux@FreeBSD.org>	Remove an unused argument in vfs_mountroot().
# 2b4edb69	02-Jul-2002	Maxime Henrion <mux@FreeBSD.org>	Move every code related to mount(2) in a new file, vfs_mount.c. The file vfs_conf.c which was dealing with root mounting has been repo-copied into vfs_mount.c to preserve history. This makes nmount related development easier, and help reducing the size of vfs_syscalls.c, which is still an enormous file. Reviewed by: rwatson Repo-copy by: peter
# cacd1c9b	22-Jun-2002	Maxime Henrion <mux@FreeBSD.org>	o Remove the initialization of unused fields in the struct uio now that we don't use uiomove() anymore. o Enforce stricter checks on the length of the iov's in nmount(2) since we now malloc() them individually and corrupted iov's could make the kernel crash in malloc() with "kmem_map too small". Reviewed by: phk
# 7d2d4409	20-Jun-2002	Maxime Henrion <mux@FreeBSD.org>	Change the way we internally store the mount options to a linked list. This is to allow the merging of the mount options in the MNT_UPDATE case, as the current data structure is unsuitable for this. There are no functional differences in this commit. Reviewed by: phk
# fe937506	14-Jun-2002	Maxime Henrion <mux@FreeBSD.org>	Change vfs_copyopt() so that the length argument passed to it must be the exact same size as the mount option. This makes vfs_copyopt() much more useful.
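The exact-size rule described in the entry above can be sketched like this. The names `sketch_opt` and `sketch_copyopt()` are hypothetical, the option list is a plain array rather than the kernel's option structure, and the specific error codes returned are an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one named mount option. */
struct sketch_opt {
	const char *name;
	const void *value;
	size_t len;
};

/*
 * Copy the option called 'name' into 'dest', but only if the caller's
 * buffer length matches the stored option length exactly.
 * Returns 0 on success, EINVAL on a size mismatch, ENOENT if absent.
 */
static int
sketch_copyopt(const struct sketch_opt *opts, size_t nopts,
    const char *name, void *dest, size_t len)
{
	size_t i;

	assert(dest != NULL);
	for (i = 0; i < nopts; i++) {
		if (strcmp(opts[i].name, name) != 0)
			continue;
		if (opts[i].len != len)
			return (EINVAL);
		memcpy(dest, opts[i].value, len);
		return (0);
	}
	return (ENOENT);
}
```

Requiring an exact match means a caller that asks for a fixed-size value never receives a silently truncated or partially filled result.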
# cdb5638a	23-May-2002	Maxime Henrion <mux@FreeBSD.org>	Update comments to better match reality.
# d394511d	16-May-2002	Tom Rhodes <trhodes@FreeBSD.org>	More s/file system/filesystem/g
# df99ca52	16-Apr-2002	Ian Dowse <iedowse@FreeBSD.org>	The recent NFS forced unmount improvements introduced a side-effect where some client operations might be unexpectedly cancelled during an unsuccessful non-forced unmount attempt. This causes problems for amd(8), because it periodically attempts a non-forced unmount to check if the filesystem is still in use. Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set only during the operation of a forced unmount. Use this instead of MNTK_UNMOUNT to trigger the cancellation of hung NFS operations. Also correct a problem where dounmount() might inadvertently clear the MNTK_UNMOUNT flag. Reported by: simokawa MFC after: 1 week
# 5616879a	26-Mar-2002	Maxime Henrion <mux@FreeBSD.org>	Commit the good prototype for nmount(2). Reviewed by: phk
# 17594b93	26-Mar-2002	Maxime Henrion <mux@FreeBSD.org>	As discussed in -arch, add the new nmount(2) system call and the new vfs_getopt()/vfs_copyopt() API. This is intended to be used later, when there will be filesystems implementing the VFS_NMOUNT operation. The mount(2) system call will disappear when all filesystems will be converted to the new API. Documentation will be committed in a while. Reviewed by: phk
# c58eb46e	23-Mar-2002	Bruce Evans <bde@FreeBSD.org>	Fixed some style bugs in the removal of __P(()). The main ones were not removing tabs before "__P((", and not outdenting continuation lines to preserve non-KNF lining up of code with parentheses. Switch to KNF formatting and/or rewrap the whole prototype in some cases.
# 789f12fe	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P
# a0595d02	16-Mar-2002	Kirk McKusick <mckusick@FreeBSD.org>	Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems. only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.
# fb92273b	08-Mar-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Move the mount of the root filesystem to happen in the init process before the exec of /sbin/init. This allows the scheduler to get started and kthreads a chance to run before we start filesystem operations.
# fdc6e087	05-Mar-2002	Robert Watson <rwatson@FreeBSD.org>	Reserve a mount flag, MNT_MULTILABEL, used by the MAC subsystem and individual filesystems to determine whether they should operate in "file system as a single object" mode, or "file system as a set of objects with individual labels" mode. Note: in the trustedbsd_mac branch, this is referred to as "MNT_MULTILEVEL", but the two mean the same thing. MNT_MULTILABEL is more suggestive of a flexible policy system than one providing purely hierarchical policies. The need for a reserved flag will go away once nmount() is done. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 751a2cd0	05-Nov-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Define a new mount flag "MNT_JAILDEVFS" Collect the magic combination of flags which can be updated into a macro in sys/mount.h rather than inlining them (twice!) in vfs_syscalls.c
# 6b8bd2ef	04-Nov-2001	Matthew Dillon <dillon@FreeBSD.org>	Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount structure changes now rather than piecemeal later on. mnt_nvnodelist currently holds all the vnodes under the mount point. This will eventually be split into a 'dirty' and 'clean' list. This way we only break kld's once rather than twice. nvnodelist will eventually turn into the dirty list and should remain compatible with the klds.
# c72ccd01	22-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 9ceb1844	30-Jul-2001	Jake Burkholder <jake@FreeBSD.org>	Machine dependent ifdefs for sparc64.
# f0cc1c6f	09-Jul-2001	Dag-Erling Smørgrav <des@FreeBSD.org>	Constify the fstype argument to vfs_mount(). This eliminates at least one "call discards qualifier" warning (in sys/compat/linux/linux_file.c).
# fb49f549	09-Jun-2001	Benno Rice <benno@FreeBSD.org>	Changes to sys/ includes to support PowerPC. Reviewed by: obrien, dfr
# fb919e4d	01-May-2001	Mark Murray <markm@FreeBSD.org>	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# a13234bb	25-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Move the netexport structure from the fs-specific mountstructure to struct mount. This makes the "struct netexport *" parameter to the vfs_export and vfs_checkexport interface unneeded. Consequently, all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>
# b186f62c	23-Apr-2001	Greg Lehey <grog@FreeBSD.org>	Back out previous commit. Requested by: bde
# e84a5d83	23-Apr-2001	Greg Lehey <grog@FreeBSD.org>	Remove bogus #include and duplicate definition of AF_MAX. These were made necessary by breakage in usr.sbin/pstat and usr.bin/fstat, since fixed. Suggested by: phk Unearthed by: John Hood <jhood@sitaranetworks.com>
# 4c68f41d	22-Apr-2001	Greg Lehey <grog@FreeBSD.org>	Add address families AF_SLOW and AF_SCLUSTER. These are used by the Sitara QoSworks box. Obtained from: Sitara Networks Inc.
# 30632071	18-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word. Submitted by: jkh Obtained from: TrustedBSD Project
# 70f36851	14-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests. o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to avoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly. o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace. o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front. o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files. o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates. Obtained from: TrustedBSD Project
# f3a90da9	01-Mar-2001	Adrian Chadd <adrian@FreeBSD.org>	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since it's a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is initialised with "/" in the case of a root mount.
# c0511d3b	18-Feb-2001	Brian Feldman <green@FreeBSD.org>	Switch to using a struct xucred instead of a struct ucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL). This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout. Reviewed by: bde
# c3d7bcdf	16-Feb-2001	Jonathan Lemon <jlemon@FreeBSD.org>	Introduce copyinfrom and copyinstrfrom, which can copy data from either user or kernel space. This will allow layering of os-compat (e.g.: linux) system calls. Apply the changes to mount.
# 7a8671e9	04-Dec-2000	Alfred Perlstein <alfred@FreeBSD.org>	remove struct mount from userland visibility
# 6092d187	13-Oct-2000	Bruce Evans <bde@FreeBSD.org>	Fixed namespace pollution in rev.1.78. Don't export <sys/stat.h> to userland from here; just forward declare struct stat. fhstat.2 (== fhopen.2 == fhstatfs.2) has always specified including <sys/stat.h> before using any of the fh functions although this is only necessary for dereferencing the "struct stat *" arg of fhstat(), so applications should not notice this change. Fixed unsorting of user prototypes in rev.1.78.
# a18b1f1d	03-Oct-2000	Jason Evans <jasone@FreeBSD.org>	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 918c9eec	29-Sep-2000	Doug Rabson <dfr@FreeBSD.org>	Add ia64 support.
# d4c18169	11-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Clean up warning about undeclared function by declaring softdep_fsync in mount.h instead of ffs_extern.h. The correct solution is to use an indirect function pointer so that the kernel does not have to be built with options FFS, but that will be left for another day.
# 22e5a623	03-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Get userland visible flags added for snapshots to give a few days advance preparation for them to get migrated into place so that subsequent changes in utilities will not fail to compile for lack of up-to-date header files in /usr/include.
# 75236818	16-Jun-2000	Poul-Henning Kamp <phk@FreeBSD.org>	ARGH! I have too many source trees :-( Fix prototype errors in last commit.
# a2e7a027	16-Jun-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Virtualizes & untangles the bioops operations vector. Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
# e3975643	25-May-2000	Jake Burkholder <jake@FreeBSD.org>	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
# 740a1973	23-May-2000	Jake Burkholder <jake@FreeBSD.org>	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
# 8f073875	18-Jan-2000	Robert Watson <rwatson@FreeBSD.org>	Fix bde'isms in acl/extattr syscall interface, renaming syscalls to prettier (?) names, adding some const's around here, et al. Reviewed by: bde
# 664a31e4	28-Dec-1999	Peter Wemm <peter@FreeBSD.org>	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSD's who made this change quite some time ago. More commits to come.
# 91f37dcb	18-Dec-1999	Robert Watson <rwatson@FreeBSD.org>	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind
# 21a9e9a1	02-Dec-1999	Jordan K. Hubbard <jkh@FreeBSD.org>	Define name length differently for alpha in order to preserve backwards compatibility. Submitted by: Andrew Gallatin <gallatin@cs.duke.edu> Reviewed by: mckusick
# e9cc4758	30-Nov-1999	Kirk McKusick <mckusick@FreeBSD.org>	Collect read and write counts for filesystems. This new code drops the counting in bwrite and puts it all in spec_strategy. I did some tests and verified that the counts collected for writes in spec_strategy are identical to the counts that we previously collected in bwrite. We now also get read counts (async reads come from requests for read-ahead blocks). Note that you need to compile a new version of mount to get the read counts printed out. The old mount binary is completely compatible, the only reason to install a new mount is to get the read counts printed. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
# 0429e37a	20-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967
# 5b42dac8	31-Oct-1999	Julian Elischer <julian@FreeBSD.org>	Most modern OSs have the ability to flag certain mounts as ones to be ignored by default by the df(1) program. This is used mostly to avoid stat()-ing entries that do not represent "real" disk mount points (such as those made by an automounter such as amd.) It is also useful not to have to stat() these entries because it takes longer to report them than for other file systems, being that these mount points are served by a user-level file server and resulting in several context switches. Worse, if the automounter is down unexpectedly, a casual df(1) will hang in an interruptible way. PR: kern/9764 Submitted by: Erez Zadok <ezk@cs.columbia.edu>
# 4cf49a43	21-Oct-1999	Julian Elischer <julian@FreeBSD.org>	Whistle's Netgraph link-layer (sometimes more) networking infrastructure. Been in production for 3 years now. Gives Instant Frame relay to if_sr and if_ar drivers, and PPPOE support soon. See: ftp://ftp.whistle.com/pub/archie/netgraph/index.html for on-line manual pages. Reviewed by: Doug Rabson (dfr@freebsd.org) Obtained from: Whistle CVS tree
# 114ae644	14-Oct-1999	Mike Smith <msmith@FreeBSD.org>	Implement pseudo_AF_HDRCMPLT, which controls the state of the 'header completion' flag. If set, the interface output routine will assume that the packet already has a valid link-level source address. This defaults to off (the address is overwritten) PR: kern/10680 Submitted by: "Christopher N . Harrell" <cnh@mindspring.net> Obtained from: NetBSD
# 1b5464ef	29-Sep-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde
# e6f71111	19-Sep-1999	Matthew Dillon <dillon@FreeBSD.org>	Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse addaliasu() into addalias() (no operational change) and clarify comments relating to a trick that vclean() uses. The fix to BOOTP is yet another hack. Actually, rootfsid handling is already a major hack. The whole thing needs to be cleaned up. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
# c24fda81	10-Sep-1999	Alfred Perlstein <alfred@FreeBSD.org>	Separate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open\|stat\|statfs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD
# 5a5fccc8	07-Sep-1999	Alfred Perlstein <alfred@FreeBSD.org>	All unimplemented VFS ops now have entries in kern/vfs_default.c that return reasonable defaults. This avoids confusing and ugly casting to eopnotsupp or making dummy functions. Bogus casting of filesystem sysctls to eopnotsupp() have been removed. This should make *_vfsops.c more readable and reduce bloat. Reviewed by: msmith, eivind Approved by: phk Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# e9189611	17-Apr-1999	Peter Wemm <peter@FreeBSD.org>	Well folks, this is it - The second stage of the removal for build support for LKM's..
# ce02431f	16-Feb-1999	Doug Rabson <dfr@FreeBSD.org>	* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime. * Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded. Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
# afbbfd3b	15-Nov-1998	Bruce Evans <bde@FreeBSD.org>	Fixed the type and order of vfs_modevent. This fixes part of a spew of warnings for the recent change of the type of a module event handler. Fixed a rotted comment (numeric types of filesystems are not listed here). Made the function prototype in VFS_SET() more like the corresponding function definition (don't use extern for prototypes). Enforce a semicolon after the LKM case of VFS_SET().
# 4e61198e	10-Nov-1998	Peter Wemm <peter@FreeBSD.org>	Make the vnode opv vector construction fully dynamic. Previously we leaked memory on each unload and were limited to items referenced in the kernel copy of vnode_if.c. Now a kernel module is free to create its own VOP_FOO() routines and the rest of the system will happily deal with it, including passthrough layers like union/umap/etc. Have VFS_SET() call a common vfs_modevent() handler rather than inline duplicating the common code all over the place. Have VNODEOP_SET() have the vnodeops removed at unload time (assuming a module) so that the vop_t vector is reclaimed. Slightly adjust the vop_t vectors so that calling slot 0 is a panic rather than a page fault. This could happen if VOP_something() was called without any handlers being present anywhere (including in vfs_default.c). slot 1 becomes the default vector for the vnodeop table. TODO: reclaim zones on unload (eg: nfs code)
# 7c8faeb3	06-Nov-1998	Peter Wemm <peter@FreeBSD.org>	oops! s/vfs_register/vfs_unregister/ in the unload case.. Mentioned by: dfr
# a429d69f	06-Nov-1998	Peter Wemm <peter@FreeBSD.org>	Remove trailing ';' - use the one supplied by the caller: "VFS_SET(foo);"
# aa855a59	15-Oct-1998	Peter Wemm <peter@FreeBSD.org>	gulp. Jordan specifically OK'ed this.. This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a separate commit as samples. This all still works as a static a.out kernel using LKM's.
# 3f8c4506	15-Sep-1998	Poul-Henning Kamp <phk@FreeBSD.org>	(this is an extract from src/share/examples/atm/README) =================================== HARP \| Host ATM Research Platform =================================== HARP 3 What is this stuff? ------------------- The Advanced Networking Group (ANG) at the Minnesota Supercomputer Center, Inc. (MSCI), as part of its work on the MAGIC Gigabit Testbed, developed the Host ATM Research Platform (HARP) software, which allows IP hosts to communicate over ATM networks using standard protocols. It is intended to be a high-quality platform for IP/ATM research. HARP provides a way for IP hosts to connect to ATM networks. It supports standard methods of communication using IP over ATM. A host's standard IP software sends and receives datagrams via a HARP ATM interface. HARP provides functionality similar to (and typically replaces) vendor-provided ATM device driver software. HARP includes full source code, making it possible for researchers to experiment with different approaches to running IP over ATM. HARP is self-contained; it requires no other licenses or commercial software packages. HARP implements support for the IETF Classical IP model for using IP over ATM networks, including: o IETF ATMARP address resolution client o IETF ATMARP address resolution server o IETF SCSP/ATMARP server o UNI 3.1 and 3.0 signalling protocols o Fore Systems's SPANS signalling protocol What's supported ---------------- The following are supported by HARP 3: o ATM Host Interfaces - FORE Systems, Inc. SBA-200 and SBA-200E ATM SBus Adapters - FORE Systems, Inc. PCA-200E ATM PCI Adapters - Efficient Networks, Inc. ENI-155p ATM PCI Adapters o ATM Signalling Protocols - The ATM Forum UNI 3.1 signalling protocol - The ATM Forum UNI 3.0 signalling protocol - The ATM Forum ILMI address registration - FORE Systems's proprietary SPANS signalling protocol - Permanent Virtual Channels (PVCs) o IETF "Classical IP and ARP over ATM" model - RFC 1483, "Multiprotocol Encapsulation over ATM Adaptation Layer 5" - RFC 1577, "Classical IP and ARP over ATM" - RFC 1626, "Default IP MTU for use over ATM AAL5" - RFC 1755, "ATM Signaling Support for IP over ATM" - RFC 2225, "Classical IP and ARP over ATM" - RFC 2334, "Server Cache Synchronization Protocol (SCSP)" - Internet Draft draft-ietf-ion-scsp-atmarp-00.txt, "A Distributed ATMARP Service Using SCSP" o ATM Sockets interface - The file atm-sockets.txt contains further information What's not supported -------------------- The following major features of the above list are not currently supported: o UNI point-to-multipoint support o Driver support for Traffic Control/Quality of Service o SPANS multicast and MPP support o SPANS signalling using Efficient adapters This software was developed under the sponsorship of the Defense Advanced Research Projects Agency (DARPA). Reviewed (lightly) by: phk Submitted by: Network Computing Services, Inc.
# 8994ca3c	07-Sep-1998	Bruce Evans <bde@FreeBSD.org>	Removed statically configured mount type numbers (MOUNT_) and all references to them. The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_ instead of the number in their vfsconf struct.
# 500b04a2	05-Sep-1998	Bruce Evans <bde@FreeBSD.org>	Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded. NetBSD uses strcmp() to avoid this ugly global.
# 3baf1478	02-Sep-1998	Bruce Evans <bde@FreeBSD.org>	Added a vfs_oid pointer and a vfs_uninit() function to struct vfsops. vfs_oid will be used to attach and detach vfs sysctls dynamically. vfs_uninit() will be used to clean up before modunloading vfs LKMs. The nfs LKM needs these features most.
# 53d2eb24	02-Sep-1998	Bruce Evans <bde@FreeBSD.org>	Backed out previous commit. VFS_LKM_NO_DEFAULT_DISPATCH wasn't used for long, and the ifdef for it broke the forward declaration for the dispatch function.
# 38bfd69b	25-Jul-1998	Alexander Langer <alex@FreeBSD.org>	Allow VFS LKMs to override the default module dispatch functions if VFS_LKM_NO_DEFAULT_DISPATCH is defined.
# 79cc756d	05-May-1998	Mike Smith <msmith@FreeBSD.org>	As described by the submitter: Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him. The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework. Submitted by: Michael Hancock <michaelh@cet.co.jp>
# 5ddc8ded	08-Apr-1998	Wolfram Schneider <wosch@FreeBSD.org>	New mount option nosymfollow. If enabled, the kernel lookup() function will not follow symbolic links on the mounted file system and return EACCES (Permission denied).
# 8c375f58	27-Mar-1998	Bruce Evans <bde@FreeBSD.org>	Don't export anything from <sys/socket.h> except AF_MAX from here. This only affects the KERNEL case. Don't include <sys/radix.h> twice for the KERNEL case. This fixes a mismerge from Lite2. Don't include <sys/radix.h> at all for the !KERNEL case. This fixes a wrong cleanup in Lite2.
# 08637435	28-Mar-1998	Bruce Evans <bde@FreeBSD.org>	Moved some #includes from <sys/param.h> nearer to where they are actually used.
# b1897c19	08-Mar-1998	Julian Elischer <julian@FreeBSD.org>	Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: Whistle development tree
# 34bdbbd0	01-Mar-1998	Mike Smith <msmith@FreeBSD.org>	The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE: vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called. A default vfs_vrele was created for fs implementations that use the standard vnode management routines. VFS_VRELE implementations were made for the following file systems: Standard (vfs_vrele) ffs mfs nfs msdosfs devfs ext2fs Custom union umapfs Just EOPNOTSUPP fdesc procfs kernfs portal cd9660 These implementations may change as VOP changes are implemented. In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code. This will only be done for vnode arguments that are released by the various fs vop implementations. Wider use of VFS_VRELE will likely require restructuring of the code. Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>
# 2a44bbdd	21-Feb-1998	Jordan K. Hubbard <jkh@FreeBSD.org>	MF22: correct comments.
# c60ee1df	21-Feb-1998	Jordan K. Hubbard <jkh@FreeBSD.org>	MF22: CODA entries. They'll have to rework their usage of malloc somewhat in -current before this will work, but these should at least serve as place-holders.
# 7d6c26d6	05-Feb-1998	John Dyson <dyson@FreeBSD.org>	Add MNT_LAZY.
# bf49c427	20-Jan-1998	Bruce Evans <bde@FreeBSD.org>	Moved most of the (source-level) compatibility hacks for the vfsconf interface from sys/mount.h to libc/getvfsent.c The new interface is now the default. Sorted the prototypes for the library functions.
# 95802bf8	25-Nov-1997	Julian Elischer <julian@FreeBSD.org>	Shift a few SYSINIT() calls around. This results in a few functions becoming static, and the SYSINITs being close to the code they are related to. Setting up the dump device is with dumpsys() and kicking off the scheduler is with the scheduler. Mounting root is with the code that does it. Reviewed by: phk
# f2915552	21-Nov-1997	Bruce Evans <bde@FreeBSD.org>	Fixed some style and contents bugs in comments. Copied comments are usually wrong.
# 52bf64c7	12-Nov-1997	Julian Elischer <julian@FreeBSD.org>	Reviewed by: hackers@freebsd.org in general Obtained from: Whistle Communications tree Add an option to the way UFS works dependent on the SUID bit of directories. This change makes things a whole lot simpler on systems running as fileservers for PCs and MACS. To enable the new code you must 1/ enable option SUIDDIR on the kernel. 2/ mount the filesystem with option suiddir. Hopefully this makes it difficult enough for people to do this accidentally. See the new chmod(2) man page for detailed info.
# b1f4a44b	11-Nov-1997	Julian Elischer <julian@FreeBSD.org>	Reviewed by: various. Ever since I first saw the way the mount flags were used I've hated the fact that modes, and events, internal and exported, and short-term and long term flags are all thrown together. Finally it's annoyed me enough.. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. It is not exported to userspace. I have moved some of the non exported flags over to this word. This means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other.. e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.
# a1c995b6	12-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Last major round (Unless Bruce thinks of something :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 81bca6dd	27-Sep-1997	KATO Takenori <kato@FreeBSD.org>	Clustered read and write are switched at mount-option level. 1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread and vfs.foo.doclusterwrite are deleted. Only mount option can control clustered I/O from userland. 2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR and D_CLUSTERW bits of the d_flags member in the block device switch table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR / MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and MNT_NOCLUSTERW cannot be cleared from userland. 3. Vnode driver disables both clustered read and write. 4. Union filesystem disables clustered write. Reviewed by: bde
# f116a277	16-Sep-1997	Bruce Evans <bde@FreeBSD.org>	Drop temporary source-level compatibility for old mount(2) interface.
# 57bf258e	16-Aug-1997	Garrett Wollman <wollman@FreeBSD.org>	Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to now malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
# 8b059767	22-Jul-1997	Bruce Evans <bde@FreeBSD.org>	Quick and dirty (?) fix for noatime option. The WebNFS changes broke it by using the same value for MNT_EXPUBLIC as for MNT_NOATIME. Just use a different value for MNT_EXPUBLIC.
# 2279b5f4	16-Jul-1997	Doug Rabson <dfr@FreeBSD.org>	Merge WebNFS changes from NetBSD. Obtained from: NetBSD
# 0ddf9be1	06-Apr-1997	Peter Dufault <dufault@FreeBSD.org>	Make MOD_* macros almost consistent: Use the name argument almost the same in all LKM types. Maintain the current behavior for the external (e.g., modstat) name for DEV, EXEC, and MISC types being #name ## "_mod" and SYSCALL and VFS only #name. This is a candidate for change and I vote just the name without the "_mod". Change the DISPATCH macro to MOD_DISPATCH for consistency with the other macros. Add an LKM_ANON #define to eliminate the magic -1 and associated signed/unsigned warnings. Add MOD_PRIVATE to support wcd.c's poking around in the lkm structure. Change source in tree to use the new interface. Reviewed by: Bruce Evans
# 379184c8	03-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Fixed the getvfsbyname macro hack.
# dc91a89e	02-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Restored some pre-Lite2-merge source-level compatibility to the mount() and getvfsbyname() interfaces. The new interfaces are now hidden from applications unless _NEW_VFSCONF is defined. The new vfsconf interfaces don't work yet.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 670718e2	12-Feb-1997	Mike Pritchard <mpp@FreeBSD.org>	Remove function prototypes for vfs_mountroot and vgoneall, since they were removed with the Lite2 merge. Submitted by: bde
# 724ab195	11-Feb-1997	Mike Pritchard <mpp@FreeBSD.org>	Add function prototypes for most of the new Lite2 functions. Also made a few of the miscfs routines static to be consistent. Some modules simply required some additional #includes to remove -Wall warnings.
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 17a6a9e3	17-Oct-1996	Jordan K. Hubbard <jkh@FreeBSD.org>	Some very small changes to support Netcon's TFS filesystem. These patches were formerly applied by the Netcon installer before rebuilding your kernel.
# caa05533	11-Sep-1996	Bruce Evans <bde@FreeBSD.org>	Added a struct tag `fsid' for fsid_t so that sysproto.h can declare prototypes for the lfs syscalls without having to include <sys/mount.h> and its nested spam.
# 9e043042	03-Sep-1996	David Greenman <dg@FreeBSD.org>	Implemented kernel side of MNT_NOATIME mount option. This option disables the file access time update on reads and can be useful in reducing filesystem overhead in cases where the access time is not important (like Usenet news spools).
# 02e2c406	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [new sys/syscallargs.h file, to be "cvs rm"ed]
# 6c5e9bbd	30-Jan-1996	Mike Pritchard <mpp@FreeBSD.org>	Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
# 13a6df99	22-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the now obsolete vfs_sysctl vfsops element.
# e7b632b5	13-Nov-1995	Bruce Evans <bde@FreeBSD.org>	Replaced nosys() by lkm_nullcmd().
# bacc8b16	05-Nov-1995	John Dyson <dyson@FreeBSD.org>	Changes to existing files for ext2fs support. The UFS mods need rework in the future as they are a bit crufty -- but at least the stuff is in the tree now.
# 4590fd3a	09-Sep-1995	David Greenman <dg@FreeBSD.org>	Fixed init functions argument type - caddr_t -> void *. Fixed a couple of compiler warnings.
# 8d7459c5	29-Aug-1995	Bruce Evans <bde@FreeBSD.org>	Declare vfs_mountroot() in the right place.
# 2b14f991	28-Aug-1995	Julian Elischer <julian@FreeBSD.org>	Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. They are: New low-level init code that supports loadable modules better some cleanups in the namei code to help terry in 16-bit character support some changes to the mount-root code to make it a little more modular.. NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases.. certainly mounting root off disk still works just fine.. mfs should work but is untested. (tomorrow's task) The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. thus a new module can be added to the kernel without editing any other files other than the 'files' file.
# 825a4d8e	25-Aug-1995	David Greenman <dg@FreeBSD.org>	Killed MNT_NOAUTO.
# d38820bf	23-Aug-1995	Jordan K. Hubbard <jkh@FreeBSD.org>	Damn! As Rod just reminded me, I didn't apply the tweak to make this obey the mask properly. I had it locally, but not in the diffs I brought across.. :-( Thanks, Rod.
# e0adc2d3	22-Aug-1995	Jordan K. Hubbard <jkh@FreeBSD.org>	Support for NOAUTO mounts. Submitted by: "Full Name Not Supplied" <simon@masi.ibp.fr>
# 628641f8	11-Aug-1995	David Greenman <dg@FreeBSD.org>	Converted mountlist to a CIRCLEQ. Partially obtained from: 4.4BSD-Lite2
# a62dc406	27-Jun-1995	Doug Rabson <dfr@FreeBSD.org>	Changes to support version 3 of the NFS protocol. The version 2 support has been tested (client+server) against FreeBSD-2.0, IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support is stable AFAIK. The version 3 support has been tested with a loopback mount and minimally against an IRIX 5.3 server. It needs more testing and may have problems. I have patched amd to support the new variable length filehandles although it will still only use version 2 of the protocol. Before booting a kernel with these changes, nfs clients will need to at least build and install /usr/sbin/mount_nfs. Servers will need to build and install /usr/sbin/mountd. NFS diskless support is untested. Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# 61f5d510	21-May-1995	David Greenman <dg@FreeBSD.org>	Changes to fix the following bugs: 1) Files weren't properly synced on filesystems other than UFS. In some cases, this led to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmapped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping. Reviewed by: David Greenman Submitted by: John Dyson
# 999422d7	19-Apr-1995	Julian Elischer <julian@FreeBSD.org>	Reviewed by: no-one yet, but non-intrusive Submitted by: julian@tfs.com Obtained from: written from scratch slight changes to make space for devfs.. (also conditional test code in i386/isa/fd.c) =================================================================== RCS file: /home/ncvs/src/sys/sys/malloc.h,v retrieving revision 1.7 diff -r1.7 malloc.h 113a114,117 > #define M_DEVFSMNT 62 /* DEVFS mount structure */ > #define M_DEVFSBACK 63 /* DEVFS Back node */ > #define M_DEVFSFRONT 64 /* DEVFS Front node */ > #define M_DEVFSNODE 65 /* DEVFS node */ 184c188,192 < NULL, NULL, NULL, NULL, NULL, \ --- > "DEVFS mount", /* 62 M_DEVFSMNT */ \ > "DEVFS back", /* 63 M_DEVFSBACK */ \ > "DEVFS front", /* 64 M_DEVFSFRONT */ \ > "DEVFS node", /* 65 M_DEVFSNODE */ \ > NULL, \ Index: sys/mount.h =================================================================== RCS file: /home/ncvs/src/sys/sys/mount.h,v retrieving revision 1.16 diff -r1.16 mount.h 100c100,101 < #define MOUNT_MAXTYPE 15 --- > #define MOUNT_DEVFS 16 /* existing device Filesystem */ > #define MOUNT_MAXTYPE 16 118a120 > "devfs", /* 15 MOUNT_DEVFS */ \ Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.19 diff -r1.19 vnode.h 61c61 < VT_UNION, VT_MSDOSFS --- > VT_UNION, VT_MSDOSFS, VT_DEVFS
# 3c6bef7e	10-Apr-1995	Garrett Wollman <wollman@FreeBSD.org>	Correct name `cd9660' for MOUNT_CD9660 (but NB that this whole table is bogus and only exists for the benefit of find(1)). Old name was `iso9660fs'. Submitted by: Andrew Atrens <atreand@statcan.ca>
# bbf3a566	16-Mar-1995	Garrett Wollman <wollman@FreeBSD.org>	Add four more filesystem flags: VFCF_NETWORK (this FS goes over the net), VFCF_READONLY (read-write mounts do not make any sense), VFCF_SYNTHETIC (data in this FS is not real), VFCF_LOOPBACK (this FS aliases something else). cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is network; procfs, kernfs, and fdesc are synthetic.
# cff19ac2	16-Mar-1995	Garrett Wollman <wollman@FreeBSD.org>	Statically-compiled filesystems now use a VFCF_STATIC flag rather than abusing the refcount.
# b5e8ce9f	16-Mar-1995	Bruce Evans <bde@FreeBSD.org>	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
# 03a62940	19-Oct-1994	Garrett Wollman <wollman@FreeBSD.org>	Actually implement the functionality documented in sysctl.h for type CTL_FS. (Namely, call a filesystem-dependent sysctl function analogous to how it works for networking and (now) physical devices.)
# c172c3e6	27-Sep-1994	Poul-Henning Kamp <phk@FreeBSD.org>	ktrace.c: added decl of ktrnamei lkm.h: added decl of lkmdispatch mount.h: added decl of vfs_busy,vfs_unbusy syscall: The "created from" changed.
# dff55bb5	21-Sep-1994	Garrett Wollman <wollman@FreeBSD.org>	mount.h: Declare getvfs* functions from libc. vfs_init.c: Fix fs_sysctl() so that getvfs* functions actually work.
# 67bfdf83	21-Sep-1994	Garrett Wollman <wollman@FreeBSD.org>	Fix a few niggling little bugs: - set args->lkm_offset correctly so that VFS modules can be unloaded - initialize _fs_vfsops.vfc_refcount correctly so that VFS modules can be unloaded - include kernel.h in a few places to get the correct definition of DATA_SET
# c901836c	20-Sep-1994	Garrett Wollman <wollman@FreeBSD.org>	Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
# 27a0bc89	19-Sep-1994	Doug Rabson <dfr@FreeBSD.org>	Added msdosfs. Obtained from: NetBSD
# d8f10c11	15-Sep-1994	Bruce Evans <bde@FreeBSD.org>	Add some prototypes.
# b531a9b1	22-Aug-1994	Bruce Evans <bde@FreeBSD.org>	- Fix warnings in df, etc. caused by misplaced declaration of doumount(). - Fix bogus comments caused by misplaced #endif.
# af9da405	20-Aug-1994	Paul Richards <paul@FreeBSD.org>	Made them all idempotent.
# e0e9c421	20-Aug-1994	David Greenman <dg@FreeBSD.org>	Implemented filesystem clean bit via: machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful. cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level. vfs_conf.c: Removed unused rootfs global. vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted. ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted. ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE(). vfs_syscalls.c: Disallow dismounting root FS via umount syscall.
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources