History log of /freebsd-current/sys/ufs/ffs/ffs_vfsops.c
Revision	Date	Author	Comments
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 886fd36e	10-Aug-2023	Kirk McKusick <mckusick@FreeBSD.org>	Clean up and document UFS/FFS error returns. The ffs_inotovp() function returns a vnode from a mounted filesystem for an inode number with specified generation number. We now consistently return ESTALE if the inode with given generation number no longer exists on that filesystem. The ffs_reload() function reloads all incore data for a filesystem. It is used after running fsck on a mounted filesystem and finding things to fix. It now returns the EINTEGRITY error if it is unable to find a valid superblock. MFC-after: 1 week Sponsored-by: The FreeBSD Foundation
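Illustration for 886fd36e: a minimal userland sketch of the "return ESTALE when the inode with the given generation number is gone" rule. The table layout and the sk_ names are hypothetical stand-ins, not the kernel's ffs_inotovp() implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory inode table entry; not the kernel's struct inode. */
struct sk_inode {
	uint64_t ino;	/* inode number */
	uint64_t gen;	/* generation, changed each time the inode is reused */
};

/*
 * Look up (ino, gen). Returns 0 and sets *out on a match, or ESTALE when
 * no inode with that number and generation exists any more.
 */
static int
sk_inotovp(const struct sk_inode *tab, size_t n, uint64_t ino, uint64_t gen,
    const struct sk_inode **out)
{
	for (size_t i = 0; i < n; i++) {
		if (tab[i].ino == ino && tab[i].gen == gen) {
			*out = &tab[i];
			return (0);
		}
	}
	*out = NULL;
	return (ESTALE);
}
```

A caller holding an old (ino, gen) pair sees ESTALE both when the inode number is unknown and when the number has been reused with a different generation.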
# 60a41168	10-Aug-2023	Chuck Silvers <chs@FreeBSD.org>	UFS: panic rather than forcibly unmount the root fs If the root fs is forcibly unmounted then basically every process will die with a SEGV as soon as it tries to run again because libc.so is gone, which leaves the system basically hung. It seems better to just panic instead, so let's do that. Requested-by: karels Reviewed-by: imp, mckusick, karels Sponsored-by: Netflix Differential Revision: https://reviews.freebsd.org/D41387
# 831b1ff7	27-Jul-2023	Kirk McKusick <mckusick@FreeBSD.org>	UFS/FFS: Migrate to modern uintXX_t from u_intXX_t. As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html move to the modern uintXX_t. While here also migrate u_char to uint8_t. Where other kernel interfaces allow, migrate u_long to uint64_t. No functional changes intended. MFC-after: 1 week Sponsored-by: The FreeBSD Foundation
# 4d9b2ed3	22-Jul-2023	Mateusz Guzik <mjg@FreeBSD.org>	ufs: stop using LK_SLEEPFAIL in ffs_sync It provides nothing as either locking succeeds or fails with ENOENT as is.
# 1d9f3a37	06-Jan-2023	Konstantin Belousov <kib@FreeBSD.org>	Stop cleaning MNT_LOCAL on unmount There is no point in clearing just this flag. Flags are reset on the struct mount re-allocation for reuse anyway. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D37966
# 829f0bcb	19-Dec-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add the concept of vnode state transitions To quote from a comment above vput_final: <quote> * XXX Some filesystems pass in an exclusively locked vnode and strongly depend * on the lock being held all the way until VOP_INACTIVE. This in particular * happens with UFS which adds half-constructed vnodes to the hash, where they * can be found by other code. </quote> As is there is no mechanism which allows filesystems to denote that a vnode is fully initialized, consequently problems like the above are only found the hard way(tm). Add rudimentary support for state transitions, which in particular allow to assert the vnode is not legally unlocked until its fate is decided (either construction finishes or vgone is called to abort it). The new field lands in a 1-byte hole, thus it does not grow the struct. Bump __FreeBSD_version to 1400077 Reviewed by: kib (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D37759
# ed1bb254	19-Dec-2022	Mateusz Guzik <mjg@FreeBSD.org>	mntfs: change mntfs_allocvp API to relock on its own Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D37759
# 08e5f519	05-Dec-2022	Kirk McKusick <mckusick@FreeBSD.org>	Provide more precise mount(8) failure message. Suggested by: Xin LI Reviewed by: kib PR: 19683 MFC after: 1 week
# 27d673fb	27-Sep-2022	Kirk McKusick <mckusick@FreeBSD.org>	When taking a snapshot on a UFS/FFS filesystem, it must be mounted. The "update" mount option must be specified when the "snapshot" mount option is used. Return EINVAL if the "snapshot" option is specified without the "update" option also requested. Reported by: Robert Morris Reviewed by: kib PR: 265362 MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
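Illustration for 27d673fb: a minimal sketch of the option check described above, assuming the mount options have already been parsed into booleans. The struct and function names are hypothetical and do not appear in ffs_vfsops.c.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parsed mount options. */
struct sk_mount_opts {
	bool update;	/* "update" option was given */
	bool snapshot;	/* "snapshot" option was given */
};

/* A snapshot can only be taken of an already mounted filesystem. */
static int
sk_check_snapshot_opts(const struct sk_mount_opts *opts)
{
	if (opts->snapshot && !opts->update)
		return (EINVAL);
	return (0);
}
```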
# 6b9d4fbb	13-Aug-2022	Kirk McKusick <mckusick@FreeBSD.org>	Explicitly initialize rather than reading newly allocated UFS inodes. The function ffs_vgetf() is used to find or load UFS inodes into a vnode. It first looks up the inode and if found in the cache its vnode is returned. If it is not already in the cache, a new vnode is allocated and its associated inode read in from the disk. The read is done even for inodes that are being initially created. The contents for the inode on the disk are assumed to be empty. If the on-disk contents had been corrupted either due to a hardware glitch or an agent deliberately trying to exploit the system, the UFS code could panic from the unexpected partially-allocated inode. Rather than having fsck_ffs(8) verify that all unallocated inodes are properly empty, it is easier and quicker to add a flag to ffs_vgetf() to indicate that the request is for a newly allocated inode. When set, the disk read is skipped and the inode is set to its expected empty (zero'ed out) initial state. As a side benefit, an unneeded disk I/O is avoided. Reported by: Peter Holm Sponsored by: The FreeBSD Foundation
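Illustration for 6b9d4fbb: a minimal sketch of "skip the disk read and zero the inode when the caller says it is newly allocated". The flag name SK_NEWINODE, the inode layout, and the read counter are assumptions made for the example; the commit message does not name the real flag.

```c
#include <string.h>

#define SK_NEWINODE 0x1	/* hypothetical flag: inode is being created */

/* Hypothetical, heavily reduced on-disk inode. */
struct sk_dinode {
	unsigned short mode;
	unsigned int size;
};

static int sk_disk_reads;	/* counts simulated disk reads */

static void
sk_read_from_disk(struct sk_dinode *dip, const struct sk_dinode *ondisk)
{
	sk_disk_reads++;
	*dip = *ondisk;
}

static void
sk_load_inode(struct sk_dinode *dip, const struct sk_dinode *ondisk, int flags)
{
	if (flags & SK_NEWINODE) {
		/* Trust nothing on disk: start from the expected empty state. */
		memset(dip, 0, sizeof(*dip));
		return;
	}
	sk_read_from_disk(dip, ondisk);
}
```

With the flag set, stale or corrupted on-disk contents never reach the in-memory inode, and no read is issued.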
# e6886616	13-Aug-2022	Kirk McKusick <mckusick@FreeBSD.org>	Move the ability to search for alternate UFS superblocks from fsck_ffs(8) into ffs_sbsearch() to allow use by other parts of the system. Historically only fsck_ffs(8), the UFS filesystem checker, had code to track down and use alternate UFS superblocks. Since fsdb(8) used much of the fsck_ffs(8) implementation it had some ability to track down alternate superblocks. This change extracts the code to track down alternate superblocks from fsck_ffs(8) and puts it into a new function ffs_sbsearch() in sys/ufs/ffs/ffs_subr.c. Like ffs_sbget() and ffs_sbput() also found in ffs_subr.c, these functions can be used directly by the kernel subsystems. Additionally they are exported to the UFS library, libufs(8) so that they can be used by user-level programs. The new functions added to libufs(8) are sbfind(3) that is an alternative to sbread(3) and sbsearch(3) that is an alternative to sbget(3). See their manual pages for further details. The utilities that have been changed to search for superblocks are dumpfs(8), fsdb(8), ffsinfo(8), and fsck_ffs(8). Also, the prtblknos(8) tool found in tools/diag/prtblknos searches for superblocks. The UFS specific mount code uses the superblock search interface when mounting the root filesystem and when the administrator doing a mount(8) command specifies the force flag (-f). The standalone UFS boot code (found in stand/libsa/ufs.c) uses the superblock search code in the hope of being able to get the system up and running so that fsck_ffs(8) can be used to get the filesystem cleaned up. The following utilities have not been changed to search for superblocks: clri(8), tunefs(8), snapinfo(8), fstyp(8), quot(8), dump(8), fsirand(8), growfs(8), quotacheck(8), gjournal(8), and glabel(8). When these utilities fail, they do report the cause of the failure. The one exception is the tasting code used to try and figure what a given disk contains. 
The tasting code will remain silent so as not to put out a slew of messages as it tries to taste every new mass storage device that shows up. Reviewed by: kib Reviewed by: Warner Losh Tested by: Peter Holm Differential Revision: https://reviews.freebsd.org/D36053 Sponsored by: The FreeBSD Foundation
# b21582ee	30-Jul-2022	Kirk McKusick <mckusick@FreeBSD.org>	Add a flags parameter to the ffs_sbget() function that reads UFS superblocks. Rather than trying to shoehorn flags into the requested superblock address, create a separate flags parameter to the ffs_sbget() function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is used both in the kernel and in user-level utilities through export to the sbget() function in the libufs(3) library (see sbget(3) for details). The kernel uses ffs_sbget() when mounting UFS filesystems, in the glabel(8) and gjournal(8) GEOM utilities, and in the standalone library used when booting the system from a UFS root filesystem. The ffs_sbget() function reads the superblock located at the byte offset specified by its sblockloc parameter. The value UFS_STDSB may be specified for sblockloc to request that the standard location for the superblock be read. The two existing options are now flags: UFS_NOHASHFAIL will note if the check hash is wrong but will still return the superblock. This is used by the bootstrap code to give the system a chance to come up so that fsck can be run to correct the problem. UFS_NOMSG indicates that superblock inconsistency error messages should not be printed. It is used by programs like fsck that want to print their own error message and programs like glabel(8) that just want to know if a UFS filesystem exists on a partition. One additional flag is added: UFS_NOCSUM causes only the superblock itself to be returned, but does not read in any auxiliary data structures like the cylinder group summary information. It is used by clients like glabel(8) that just want to check for possible filesystem types. Using UFS_NOCSUM skips the superblock checks for csum data which allows superblocks that have corrupted csum data to be read and used. The validate_sblock() function checks that the superblock has not been corrupted in a way that can crash or hang the system. 
Unless the UFS_NOMSG flag is specified, it will print out any errors that it finds. Prior to this commit, validate_sblock() returned as soon as it found an inconsistency so would print at most one message. It now does all its checks so when UFS_NOMSG has not been specified will print out everything that it finds inconsistent. Sponsored by: The FreeBSD Foundation
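Illustration for b21582ee: a minimal sketch of a validator that runs every check and reports each failure unless a "no message" flag is set, instead of returning at the first inconsistency. SK_NOMSG stands in for UFS_NOMSG with an assumed value, and the three checks and the struct fields are invented for the example; they are not the checks validate_sblock() performs.

```c
#include <stdio.h>

#define SK_NOMSG 0x1	/* stand-in for UFS_NOMSG; value is an assumption */

/* Hypothetical, heavily reduced superblock. */
struct sk_sb {
	int bsize;	/* block size in bytes */
	int fsize;	/* fragment size in bytes */
	int ncg;	/* number of cylinder groups */
};

/* Returns the number of failed checks; 0 means the superblock looks sane. */
static int
sk_validate(const struct sk_sb *sb, int flags)
{
	int errors = 0;

#define SK_CHECK(cond) do {						\
	if (!(cond)) {							\
		errors++;						\
		if ((flags & SK_NOMSG) == 0)				\
			fprintf(stderr, "superblock check failed: %s\n",\
			    #cond);					\
	}								\
} while (0)
	SK_CHECK(sb->bsize > 0 && (sb->bsize & (sb->bsize - 1)) == 0);
	SK_CHECK(sb->fsize > 0 && sb->fsize <= sb->bsize);
	SK_CHECK(sb->ncg > 0);
#undef SK_CHECK
	return (errors);
}
```

Because no check returns early, a caller without SK_NOMSG sees every inconsistency in one pass, which matches the behaviour change described in the commit message.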
# 064e6b43	13-Jul-2022	Kirk McKusick <mckusick@FreeBSD.org>	Rewrite function definitions in the UFS/FFS code base with identifier lists. The days of K&R-style definitions in UFS and elsewhere in the tree are numbered, as this syntax is removed by C2x proposal N2432: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2432.pdf Though running to nearly 6000 lines of diffs, this update should cause no functional change to the code. Requested by: Warner Losh MFC after: 2 weeks
# f1b4324b	22-Jun-2022	Chuck Silvers <chs@FreeBSD.org>	ffs: fix vn_read_from_obj() usage for PAGE_SIZE > block size vn_read_from_obj() requires that all pages of a vnode (except the last partial page) be either completely valid or completely invalid, but for file systems with block size smaller than PAGE_SIZE, partially valid pages may exist anywhere in the file. Do not enable the vn_read_from_obj() path in this case. Reviewed by: mckusick, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34836
# ca7c2d2e	09-May-2022	Konstantin Belousov <kib@FreeBSD.org>	UFS: clear fs_fmod once more, in the buffer data copy. This is needed for in-kernel copy of the code, where allocation might happen after fs_fmod is cleared in ffs_sbput() but before the write. Reported by: markj Reviewed by: chs, markj PR: 263765 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35149
# 4ac2df8f	08-May-2022	Konstantin Belousov <kib@FreeBSD.org>	ffs_use_bwrite: make the superblock snapshot more consistent Copy in-memory struct fs to the superblock buffer under the UFS mutex. Reviewed by: chs, markj PR: 263765 Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D35149
# 3dc5f8e1	08-Apr-2022	Chuck Silvers <chs@FreeBSD.org>	ffs: wait for trims earlier during unmount to avoid panic All softdep processing is supposed to be completed by softdep_flushfiles() and no more deps are supposed to be created after that, but if a pending trim completes after softdep_flushfiles() and before softdep_unmount() then the blkfree that is performed by ffs_blkfree_trim_task() will create a dep when none should exist, and if softdep_unmount() is called before that dep is freed then the kernel will panic. Prevent this by waiting for trims to complete earlier in the unmount process, in ffs_flushfiles(), so that any deps will be freed and any modified CG buffers will be flushed by the final fsync of the devvp in ffs_flushfiles() as intended. Reviewed by: mckusick, kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34806
# bb92cd7b	24-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
# ab2dbd9b	16-Mar-2022	Robert Wing <rew@FreeBSD.org>	ffs_mount(): fix snapshotting Commit 0455cc7104ec broke snapshotting for ffs. In that commit, ffs_mount() was changed so the namei() lookup for a disk device happens before ffs_snapshot(). This caused the issue where namei() would lookup the snapshot file and fail because the file doesn't exist. Even if it did exist, taking a snapshot would still fail since it's not a disk device. Fix this by taking a snapshot of the filesystem as-is and return without altering ro/rw or any other attributes that are passed in. Reported by: pho Reviewed by: mckusick Fixes: 0455cc7104ec ("ffs_mount(): return early if namei() fails to lookup disk device") Differential Revision: https://reviews.freebsd.org/D34562
# 0455cc71	07-Mar-2022	Robert Wing <rew@FreeBSD.org>	ffs_mount(): return early if namei() fails to lookup disk device With soft updates enabled, an INVARIANTS panic is hit in ffs_unmount(). The problem occurs in ffs_mount() when upgrading a mount from ro->rw. During a mount update, the soft update code gets set up but doesn't get cleaned up if namei() fails when looking up the disk device. Avoid this scenario by looking up the disk device first and bail early if the namei() lookup fails. PR: 256511 MFC After: 2 weeks Reviewed by: mckusick, kib Differential Revision: https://reviews.freebsd.org/D30870
# 303d3ae7	31-Jan-2022	Konstantin Belousov <kib@FreeBSD.org>	ufs, msdosfs: do not record witness order when creating vnode When allocating new vnode, we need to lock it exclusively before making it externally visible. Since other threads cannot observe the vnode yet, current lock order cannot create LoR conditions. Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D34126
# 1fbcaa13	02-Jan-2022	Kirk McKusick <mckusick@FreeBSD.org>	When doing a read-only mount of a UFS filesystem using gjournal(8), suppress error message about a missing gjournal provider. Submitted by: Andreas Longwitz MFC after: 2 weeks Sponsored by: Netflix
# 7e1d3eef	25-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
# c34a5148	16-Nov-2021	Konstantin Belousov <kib@FreeBSD.org>	ffs: fix newly introduced LOR between mntfs vnode lock and topology lock The mntfs vnode lock should be before topology, as established in ffs_mountfs(). Extend the locked region in ffs_unmount(). Reported and reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D33013
# 9e9dcac9	15-Nov-2021	Kirk McKusick <mckusick@FreeBSD.org>	Allow forced r/w mount of UFS/FFS filesystem with a bad check hash. Normally a UFS/FFS filesystem with a bad check hash can only be mounted read only. With this commit the mount(8) -f (force) option can be used to force a read-write mount of a UFS/FFS filesystem with a bad check hash. Conveniently the filesystem will proceed to update its on-disk superblock with a corrected check hash. Sponsored by: Netflix
# 25809a01	01-Nov-2021	Konstantin Belousov <kib@FreeBSD.org>	mntfs: lock mntfs pseudo devfs vnode properly Require devvp locked for mntfs_freevp(), to have it locked around vgone(). Make that true for ffs, which is the only consumer of the interface. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761
# 76b05e3e	01-Nov-2021	Konstantin Belousov <kib@FreeBSD.org>	ffs: Remove assertions about locked um_devvp in several places Namely, ffs_blkfree_cg(), and ffs_flushfiles(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761
# 2030ee0e	19-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	ufs: remove write-only variables Mark variables as __diagused for invariant-only vars Reviewed by: imp, mjg Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32577
# 9acea164	02-Oct-2021	Robert Wing <rew@FreeBSD.org>	ffs: retire unused fsckpid mount option The fsckpid mount option was introduced in 927a12ae16433b50 along with a couple sysctl's to support SU+J with snapshots. However, those sysctl's were never used and eventually removed in f2620e9ceb3ede02. There are no in-tree consumers of this mount option. Reviewed by: mckusick, kib Differential Revision: https://reviews.freebsd.org/D32015
# 440320b6	04-Sep-2021	Robert Wing <rew@FreeBSD.org>	ffs: remove unused thread argument from ffs_reload() MFC After: 1 week Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D31127
# 8df4bc48	05-Aug-2021	Konstantin Belousov <kib@FreeBSD.org>	ufs rename: ensure that the result of ufs_checkpath() is stable ufs_rename() calls ufs_checkpath() to ensure that the target directory is not a child of the source. If not, rename would create a loop. For instance: source->X1->X2->target and if source moved under target, we get corrupted filesystem. Suppose that we initially have source->X1 .... and X2->target where X1 is not on path from root to X2. Then ufs_checkpath() accepts the inodes, but there is nothing preventing parallel rename of X2 to become under X1, after checkpath finished. Ensure stability of ufs_checkpath() result by taking a per-mount sx in ufs_rename right before ufs_checkpath() and till the end. Reviewed by: chs, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
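Illustration for 8df4bc48: a minimal sketch of the ancestry walk that a check like ufs_checkpath() has to perform, using a hypothetical in-memory directory tree with parent pointers. It models only the check itself; the per-mount sx lock that keeps the result stable against parallel renames is not modelled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directory node; the root's parent is NULL. */
struct sk_dir {
	const struct sk_dir *parent;
};

/*
 * Walk from target up to the root. If source is met on the way, moving
 * source under target would create a loop, so the rename must be refused.
 */
static bool
sk_is_ancestor(const struct sk_dir *source, const struct sk_dir *target)
{
	for (const struct sk_dir *d = target; d != NULL; d = d->parent) {
		if (d == source)
			return (true);
	}
	return (false);
}
```

The answer is only valid while no other rename changes the parent chain, which is why the commit holds a lock from just before the check until the rename finishes.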
# 211ec9b7	17-Jul-2021	Jason A. Harmening <jah@FreeBSD.org>	FFS: remove ffs_fsfail_task Now that dounmount() supports a dedicated taskqueue, we can simply call it with MNT_DEFERRED directly from the failing context. This also avoids blocking taskqueue_thread with a potentially-expensive unmount operation. Reviewed by: kib, mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D31016
# c746ed72	12-Jun-2021	Jason A. Harmening <jah@FreeBSD.org>	Allow stacked filesystems to be recursively unmounted In certain emergency cases such as media failure or removal, UFS will initiate a forced unmount in order to prevent dirty buffers from accumulating against the no-longer-usable filesystem. The presence of a stacked filesystem such as nullfs or unionfs above the UFS mount will prevent this forced unmount from succeeding. This change addresses the situation by allowing stacked filesystems to be recursively unmounted on a taskqueue thread when the MNT_RECURSE flag is specified to dounmount(). This call will block until all upper mounts have been removed unless the caller specifies the MNT_DEFERRED flag to indicate the base filesystem should also be unmounted from the taskqueue. To achieve this, the recently-added vfs_pin_from_vp()/vfs_unpin() KPIs have been combined with the existing 'mnt_uppers' list used by nullfs and renamed to vfs_register_upper_from_vp()/vfs_unregister_upper(). The format of the mnt_uppers list has also been changed to accommodate filesystems such as unionfs in which a given mount may be stacked atop more than one lower mount. Additionally, management of lower FS reclaim/unlink notifications has been split into a separate list managed by a separate set of KPIs, as registration of an upper FS no longer implies interest in these notifications. Reviewed by: kib, mckusick Tested by: pho Differential Revision: https://reviews.freebsd.org/D31016
# f784da88	17-May-2021	Konstantin Belousov <kib@FreeBSD.org>	Move mnt_maxsymlinklen into appropriate fs mount data structures Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-MFC-Note: struct mount layout Differential revision: https://reviews.freebsd.org/D30325
# 2af934cc	03-Mar-2021	Konstantin Belousov <kib@FreeBSD.org>	Assert that um_softdep is NULL on free(ump), i.e. softdep_unmount() was called Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178
# f776c54c	03-Mar-2021	Konstantin Belousov <kib@FreeBSD.org>	ffs_mount: when remounting ro->rw and sbupdate failed, cleanup softdeps Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178
# 7c7a6681	28-Feb-2021	Konstantin Belousov <kib@FreeBSD.org>	ffs: clear MNT_SOFTDEP earlier when remounting rw to ro Suppose that we remount rw->ro and in parallel some reader tries to instantiate a vnode, e.g. during lookup. Suppose that softdep_unmount() already started, but we did not clear the MNT_SOFTDEP flag yet. Then ffs_vgetf() calls into softdep_load_inodeblock() which accesses destroyed hashes and freed memory. Set/clear fs_ronly simultaneously (WRT to files flush) with MNT_SOFTDEP. It might be reasonable to move the change of fs_ronly to under MNT_ILOCK, but no readers take it. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178
# fd97fa64	03-Mar-2021	Konstantin Belousov <kib@FreeBSD.org>	Add FFSV_FORCEINODEDEP flag for ffs_vgetf() It will be used to allow SU flush code to sync the volume while external consumers see that SU is already disabled on the filesystem. Use it where ffs_vgetf() called by SU code to process dependencies. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D29178
# 2bfd8992	14-Feb-2021	Konstantin Belousov <kib@FreeBSD.org>	vnode: move write cluster support data to inodes. The data is only needed by filesystems that 1. use buffer cache 2. utilize clustering write support. Requested by: mjg Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28679
# 89fd61d9	28-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	Merge ufs_fhtovp() into ffs_inotovp(). The function alone was not used for anything but ffs_fstovp() for long time. Suggested by: mckusick Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
# 5952c86c	26-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	ffs_inotovp(): interface to convert (ino, gen) into alive vnode It generalizes the VFS_FHTOVP() interface, making it possible to fetch the inode without faking filehandle. Also it adds the ffs flags argument which allows to control ffs_vgetf() call. Requested by: mckusick Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
# f16c26b1	26-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	ffs: Add FFSV_REPLACE_DOOMED flag to ffs_vgetf() It specifies that caller requests a fresh non-doomed vnode. If doomed vnode is found in the hash, it should behave similarly to FFSV_REPLACE. Or, to put it differently, the flag is same as FFSV_REPLACE, but only when the found hashed vnode is doomed. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
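Illustration for f16c26b1: a minimal sketch of the decision rule stated above, "same as FFSV_REPLACE, but only when the found hashed vnode is doomed". The SK_ flag values and the helper name are assumptions for the example; the real flags are defined in the FFS headers with their own values.

```c
#include <stdbool.h>

#define SK_REPLACE		0x1	/* stand-in for FFSV_REPLACE */
#define SK_REPLACE_DOOMED	0x2	/* stand-in for FFSV_REPLACE_DOOMED */

/*
 * Decide whether a vnode found in the hash should be replaced by a
 * freshly constructed one.
 */
static bool
sk_should_replace(int flags, bool found_is_doomed)
{
	if (flags & SK_REPLACE)
		return (true);
	return ((flags & SK_REPLACE_DOOMED) != 0 && found_is_doomed);
}
```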
# bf0db193	29-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	buf SU hooks: track buf_start() calls with B_IOSTARTED flag and only call buf_complete() if previously started. Some error paths, like CoW failure, might skip buf_start() and do bufdone(), which itself calls buf_complete(). Various SU handle_written_XXX() functions check that io was started and incomplete parts of the buffer data reverted before restoring them. This is a useful invariant that B_IOSTARTED on buffer layer allows to keep instead of changing check and panic into check and return. Reported by: pho Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
# a63eae65	30-Jan-2021	Kirk McKusick <mckusick@FreeBSD.org>	Revert 2d4422e7991a, Eliminate lock order reversal in UFS ffs_unmount(). After discussion with Chuck Silvers (chs@) we have decided that there is a better way to resolve this lock order reversal which will be committed separately. Sponsored by: Netflix
# 79a5c790	15-Jan-2021	Kirk McKusick <mckusick@FreeBSD.org>	Eliminate a locking panic when cleaning up UFS snapshots after a disk failure. Each vnode has an embedded lock that controls access to its contents. However vnodes describing a UFS snapshot all share a single snapshot lock to coordinate their access and update. As part of mounting a UFS filesystem with snapshots, each of the vnodes describing a snapshot has its individual lock replaced with the snapshot lock. When the filesystem is unmounted the vnode's original lock is returned replacing the snapshot lock. When a disk fails while the UFS filesystem it contains is still mounted (for example when a thumb drive is removed) UFS forcibly unmounts the filesystem. The loss of the drive causes the GEOM subsystem to orphan the provider, but the consumer remains until the filesystem has finished with the unmount. Information describing the snapshot locks was being prematurely cleared during the orphaning causing the return of the snapshot vnode's original locks to fail. The fix is to not clear the needed information prematurely. Sponsored by: Netflix
# 2d4422e7	11-Jan-2021	Kirk McKusick <mckusick@FreeBSD.org>	Eliminate lock order reversal in UFS ffs_unmount(). UFS uses a new "mntfs" pseudo file system which provides private device vnodes for a file system to safely access its disk device. The original device vnode is saved in um_odevvp to hold the exclusive lock on the device so that any attempts to open it for writing will fail. But it is otherwise unused and has its BO_NOBUFS flag set to enforce that file systems using mntfs vnodes do not accidentally use the original devfs vnode. When the file system is unmounted, um_odevvp is no longer needed and is released. The lock order reversal happens because device vnodes must be locked before UFS vnodes. During unmount, the root directory vnode lock is held. When calling vrele() on um_odevvp, vrele() attempts to exclusive lock um_odevvp causing the lock order reversal. The problem is eliminated by doing a non-blocking exclusive lock on um_odevvp which will always succeed since there are no users of um_odevvp. With um_odevvp locked, it can be released using vput which does not attempt to do a blocking exclusive lock request and thus avoids the lock order reversal. Sponsored by: Netflix
# cd853791	27-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allows several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace. Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initializes maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embedded structures, get private MAXPHYS-like constant; their conversion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, were either submitted by, or based on changes by mav. Suggested by: mav Reviewed by: imp, mav, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225
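Illustration for cd853791: a worked sketch of the b_pages[] sizing arithmetic described above. It assumes 4 KiB pages (SK_PAGE_SHIFT of 12) and treats atop() as a right shift by the page shift; the sk_ helper names are hypothetical.

```c
#define SK_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Address-to-page conversion: number of whole pages in `bytes`. */
static unsigned long
sk_atop(unsigned long bytes)
{
	return (bytes >> SK_PAGE_SHIFT);
}

/* Buffer cache buffers: exactly atop(maxbcachebuf) page slots. */
static unsigned long
sk_bcache_pages(unsigned long maxbcachebuf)
{
	return (sk_atop(maxbcachebuf));
}

/*
 * pbufs: atop(maxphys) + 1 slots, because an unaligned buffer of
 * maxphys bytes straddles one more page than an aligned one.
 */
static unsigned long
sk_pbuf_pages(unsigned long maxphys)
{
	return (sk_atop(maxphys) + 1);
}
```

Under these assumptions a maxphys of 1M gives 257 page slots per pbuf, while a buffer cache buffer with a 64 KiB maxbcachebuf (an example value) needs only 16.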
# 8a1509e4	13-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Handle LoR in flush_pagedep_deps(). When operating in SU or SU+J mode, ffs_syncvnode() might need to instantiate other vnode by inode number while owning syncing vnode lock. Typically this other vnode is the parent of our vnode, but due to renames occurring right before fsync (or during fsync when we drop the syncing vnode lock, see below) it might be no longer parent. More, the called function flush_pagedep_deps() needs to lock other vnode while owning the lock for vnode which owns the buffer, for which the dependencies are flushed. This creates another instance of the same LoR as was fixed in softdep_sync(). Put the generic code for safe relocking into new SU helper get_parent_vp() and use it in flush_pagedep_deps(). The case for safe relocking of two vnodes with undefined lock order was extracted into vn helper vn_lock_pair(). Due to call sequence ffs_syncvnode()->softdep_sync_buf()->flush_pagedep_deps(), ffs_syncvnode() indicates with ERELOOKUP that passed vnode was unlocked in process, and can return ENOENT if the passed vnode reclaimed. All callers of the function were inspected. Because UFS namei lookups store auxiliary information about directory entry in in-memory directory inode, and this information is then used by UFS code that creates/removes directory entry in the actual mutating VOPs, it is critical that directory vnode lock is not dropped between lookup and VOP. For softdep_prelink(), which ensures that later link/unlink operation can proceed without overflowing the journal, calls were moved to the place where it is safe to drop processing VOP because mutations are not yet applied. Then, ERELOOKUP causes restart of the whole VFS operation (typically VFS syscall) at top level, including the re-lookup of the involved paths. [Note that we already do the same restart for failing calls to vn_start_write(), so formally this patch does not introduce new behavior.] 
Similarly, unsafe calls to fsync in snapshot creation code were plugged. A possible view on these failures is that it does not make sense to continue creating snapshot if the snapshot vnode was reclaimed due to forced unmount. It is possible that relock/ERELOOKUP situation occurs in ffs_truncate() called from ufs_inactive(). In this case, dropping the vnode lock is not safe. Detect the situation with VI_DOINGINACT and reschedule inactivation by setting VI_OWEINACT. ufs_inactive() rechecks VI_OWEINACT and avoids reclaiming vnode if truncation failed this way. In ffs_truncate(), allocation of the EOF block for partial truncation is re-done after vnode is synced, since we cannot leave the buffer locked through ffs_syncvnode(). In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136
# 61846fc4	13-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Add a framework that tracks exclusive vnode lock generation count for UFS. This count is memoized together with the lookup metadata in directory inode, and we assert that accesses to lookup metadata are done under the same lock generation as they were stored. Enabled under DIAGNOSTICS. UFS saves additional data for parent dirent when doing lookup (i_offset, i_count, i_endoff), and this data is used later by VOPs operating on dirents. If parent vnode exclusive lock is dropped and re-acquired between lookup and the VOP call, we corrupt directories. Framework asserts that corruption cannot occur that way, by tracking vnode lock generation counter. Updates to inode dirent members also save the counter, while users compare current and saved counters values. Also, fix a case in ufs_lookup_ino() where i_offset and i_count could be updated under shared lock. It is not a bug on its own since dvp i_offset results from such lookup cannot be used, but it causes false positive in the checker. In collaboration with: pho Reviewed by: mckusick (previous version), markj Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136
# 996d40f9	24-Oct-2020	Kirk McKusick <mckusick@FreeBSD.org>	Various new check-hash checks have been added to the UFS filesystem over various major releases. Superblock check hashes were added for the 12 release and cylinder-group and inode check hashes will appear in the 13 release. When a disk with a UFS filesystem is writably mounted, the kernel clears the feature flags for anything that it does not support. For example, if a UFS disk from a 12-stable kernel is mounted on an 11-stable system, the 11-stable kernel will clear the flag in the filesystem superblock that indicates that superblock check-hashes are being maintained. Thus if the disk is later moved back to a 12-stable system, the 12-stable system will know to ignore its incorrect check-hash. If the only filesystem modification done on the earlier kernel is to run a utility such as growfs(8) that modifies the superblock but neither updates the check-hash nor clears the feature flag indicating that it does not support the check-hash, the disk will fail to mount if it is moved back to its original newer kernel. This patch moves the code that clears the filesystem feature flags from the mount code (ffs_mountfs()) to the code that reads the superblock (ffs_sbget()). As ffs_sbget() is used by the kernel mount code and is imported into libufs(3), all the filesystem utilities will now also clear these flags when they make modifications to the filesystem. As suggested by John Baldwin, fsck_ffs(8) has been changed to accept and repair bad superblock check-hashes rather than refusing to run. This change allows fsck to recover filesystems that have been impacted by utilities older than those created after this change and is a sensible thing to do in any event. Reported by: John Baldwin (jhb@) MFC after: 2 weeks Sponsored by: Netflix
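The flag-clearing rule in 996d40f9 amounts to masking the on-disk feature flags with the set the running code supports. The sketch below is illustrative only: the TOY_CK_* names and bit values are invented for the example and are not the real superblock flag definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits; the real on-disk values are not shown here. */
#define TOY_CK_SUPERBLOCK 0x1u
#define TOY_CK_CYLGRP     0x2u
#define TOY_CK_INODE      0x4u

/*
 * Keep only the features this reader knows how to maintain. A newer
 * reader that later sees a cleared bit knows the matching check-hash
 * may be stale and must not be trusted.
 */
static uint32_t
toy_clear_unsupported(uint32_t ondisk_flags, uint32_t supported)
{
	return (ondisk_flags & supported);
}
```

Because the masking happens where the superblock is read, every consumer of that read path (kernel mount code or a utility) applies the same rule before it modifies the filesystem.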
# e1ef4c29	08-Oct-2020	Konstantin Belousov <kib@FreeBSD.org>	Do not leak B_BARRIER. Normally when a buffer with B_BARRIER is written, the flag is cleared by g_vfs_strategy() when creating the bio. But in some cases an FFS buffer might not reach g_vfs_strategy(), for instance when copy-on-write reports an error like ENOSPC. In this case the buffer is returned to the dirty queue and might be written later by other means. Among them, bdwrite() reasonably asserts that B_BARRIER is not set. In fact, the only current use of B_BARRIER is for lazy inode block initialization, where the write of the new inode block is fenced against the cylinder group write that marks the inode as used. This could be seen as breaking the dependency by updating the cg without having written out the inode. In practice, since CoW was not able to find space for a copy of the inode block, the cg block write should fail for the same reason. Reported by: pho Discussed with: chs, imp, mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D26511
# d90f2c36	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	ufs: clean up empty lines in .c and .h files
# 7ad2a82d	18-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.
# a92a971b	16-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)
# 03337743	10-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: clean MNTK_FPLOOKUP if MNT_UNION is set Elides checking it during lookup.
# 9d5a594f	25-Jul-2020	Mateusz Guzik <mjg@FreeBSD.org>	ufs: add support for lockless lookup ACLs are not supported, meaning their presence will force the use of the old lookup. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25579
# 93440bbe	18-Jun-2020	Kirk McKusick <mckusick@FreeBSD.org>	The binary representation of the superblock (the fs structure) is written out verbatim to the disk: see ffs_sbput() in sys/ufs/ffs/ffs_subr.c. It contains a pointer to the fs_summary_info structure. This pointer value inadvertently causes garbage to be stored. It is garbage because the pointer to the fs_summary_info structure is an address in the then-current stack or heap. Although a mere pointer does not reveal anything useful (like a part of a private key) to an attacker, garbage output deteriorates reproducibility. This commit zeros out the pointer to the fs_summary_info structure before writing out the superblock. Reviewed by: kib Tested by: Peter Holm PR: 246983 Sponsored by: Netflix
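The idea in 93440bbe, keeping an in-memory pointer out of the on-disk image, can be sketched in user space as follows. The toy_sb layout and toy_sb_serialize() are hypothetical and do not reflect the real fs structure or ffs_sbput(). The sketch assumes the image is produced by a verbatim copy and that the pointer bytes are zeroed in the copy rather than in the live structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature superblock; not the real struct fs layout. */
struct toy_summary {
	int placeholder;
};

struct toy_sb {
	unsigned magic;
	struct toy_summary *si;	/* meaningful only in memory */
	unsigned nblocks;
};

/*
 * Produce the byte image that would go to disk: copy the structure
 * verbatim, then zero the bytes of the in-memory pointer so the image
 * does not depend on where the summary happened to be allocated.
 * The live structure is left untouched.
 */
static void
toy_sb_serialize(const struct toy_sb *sb, unsigned char *out)
{
	memcpy(out, sb, sizeof(*sb));
	memset(out + offsetof(struct toy_sb, si), 0, sizeof(sb->si));
}
```

Two serializations of the same logical superblock then produce identical bytes regardless of the pointer value, which is the reproducibility property the commit message is concerned with.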
# 34816cb9	18-Jun-2020	Kirk McKusick <mckusick@FreeBSD.org>	Move the pointers stored in the superblock into a separate fs_summary_info structure. This change was originally done by the CheriBSD project as they need larger pointers that do not fit in the existing superblock. This cleanup of the superblock eases the task of the commit that immediately follows this one. Suggested by: brooks Reviewed by: kib PR: 246983 Sponsored by: Netflix
# d9a8abf6	17-Jun-2020	Chuck Silvers <chs@FreeBSD.org>	Move all of the functions in ffs_subr.c that are only used by the ufs kernel module from that file into ffs_vfsops.c. This fixes the build for kernel configs that don't include FFS. PR: 247256 Submitted by: glebius Reviewed by: mckusick (earlier version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D25285
# 1f7104d7	13-Jun-2020	Rick Macklem <rmacklem@FreeBSD.org>	Fix export_args ex_flags field so that it is 64bits, the same as mnt_flags. Since mnt_flags was upgraded to 64bits there has been a quirk in "struct export_args", since it holds a copy of mnt_flags in ex_flags, which is an "int" (32bits). This happens to currently work, since all the flag bits used in ex_flags are defined in the low order 32bits. However, new export flags cannot be defined. Also, ex_anon is a "struct xucred", which limits it to 16 additional groups. This patch revises "struct export_args" to make ex_flags 64bits and replaces ex_anon with ex_uid, ex_ngroups and ex_groups (which points to a groups list, so it can be malloc'd up to NGROUPS in size). This requires that the VFS_CHECKEXP() arguments change, so I also modified the last "secflavors" argument to be an array pointer, so that the secflavors could be copied in VFS_CHECKEXP() while the export entry is locked. (Without this patch VFS_CHECKEXP() returns a pointer to the secflavors array and then it is used after being unlocked, which is potentially a problem if the exports entry is changed. In practice this does not occur when mountd is run with "-S", but I think it is worth fixing.) This patch also deleted the vfs_oexport_conv() function, since do_mount_update() does the conversion, as required by the old vfs_cmount() calls. Reviewed by: kib, freqlabs Relnotes: yes Differential Revision: https://reviews.freebsd.org/D25088
# d79ff54b	25-May-2020	Chuck Silvers <chs@FreeBSD.org>	This commit enables a UFS filesystem to do a forcible unmount when the underlying media fails or becomes inaccessible. For example when a USB flash memory card hosting a UFS filesystem is unplugged. The strategy for handling disk I/O errors when soft updates are enabled is to stop writing to the disk of the affected file system but continue to accept I/O requests and report that all future writes by the file system to that disk actually succeed. Then initiate an asynchronous forced unmount of the affected file system. There are two cases for disk I/O errors: - ENXIO, which means that this disk is gone and the lower layers of the storage stack already guarantee that no future I/O to this disk will succeed. - EIO (or most other errors), which means that this particular I/O request has failed but subsequent I/O requests to this disk might still succeed. For ENXIO, we can just clear the error and continue, because we know that the file system cannot affect the on-disk state after we see this error. For EIO or other errors, we arrange for the geom_vfs layer to reject all future I/O requests with ENXIO just like is done when the geom_vfs is orphaned. In both cases, the file system code can just clear the error and proceed with the forcible unmount. This new treatment of I/O errors is needed for writes of any buffer that is involved in a dependency. Most dependencies are described by a structure attached to the buffer's b_dep field. But some are created and processed as a result of the completion of the dependencies attached to the buffer. Clearing of some dependencies require a read. For example if there is a dependency that requires an inode to be written, the disk block containing that inode must be read, the updated inode copied into place in that buffer, and the buffer then written back to disk. Often the needed buffer is already in memory and can be used. 
But if it needs to be read from the disk, the read will fail, so we fabricate a buffer full of zeroes and pretend that the read succeeded. This zero'ed buffer can be updated and written back to disk. The only case where a buffer full of zeros causes the code to do the wrong thing is when reading an inode buffer containing an inode that still has an inode dependency in memory that will reinitialize the effective link count (i_effnlink) based on the actual link count (i_nlink) that we read. To handle this case we now store the i_nlink value that we wrote in the inode dependency so that it can be restored into the zero'ed buffer thus keeping the tracking of the inode link count consistent. Because applications depend on knowing when an attempt to write their data to stable storage has failed, the fsync(2) and msync(2) system calls need to return errors if data fails to be written to stable storage. So these operations return ENXIO for every call made on files in a file system where we have otherwise been ignoring I/O errors. Coauthored by: mckusick Reviewed by: kib Tested by: Peter Holm Approved by: mckusick (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24088
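The two error cases described in d79ff54b can be condensed into a small decision sketch. This is a user-space model under stated assumptions, not the kernel code: toy_mount, its two booleans, and the toy_* functions are hypothetical, and the "reject all future I/O" step merely stands in for what the commit message says the geom_vfs layer is asked to do.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-mount state; not the kernel's struct mount or ufsmount. */
struct toy_mount {
	bool reject_all_io;	/* lower layer told to fail future I/O */
	bool unmount_pending;	/* asynchronous forced unmount requested */
};

/*
 * Model of the policy in the commit message: ENXIO means the disk is
 * already gone, so nothing more is needed; any other error additionally
 * asks the lower layer to reject all later I/O. In both cases the error
 * is cleared and a forced unmount is scheduled.
 */
static int
toy_handle_io_error(struct toy_mount *mp, int error)
{
	if (error == 0)
		return (0);
	if (error != ENXIO)
		mp->reject_all_io = true;
	mp->unmount_pending = true;
	return (0);		/* error deliberately cleared */
}

/* fsync-style callers must still learn that their data did not land. */
static int
toy_fsync(const struct toy_mount *mp)
{
	return (mp->unmount_pending ? ENXIO : 0);
}
```

The split between toy_handle_io_error() returning 0 and toy_fsync() returning ENXIO mirrors the distinction in the text: internal writes pretend to succeed so the unmount can proceed, while durability requests from applications report failure.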
# 71f26429	09-Apr-2020	Konstantin Belousov <kib@FreeBSD.org>	ufs: apply suspension for non-forced rw unmounts. Forced rw unmounts and remounts from rw to ro already suspend the filesystem, which closes races with writers instantiating new vnodes while unmount flushes the queue. The original intent of not including non-forced unmounts in this regime was to allow such unmounts to fail if a writer was active, but this did not work well. A similar change, but applying to all unmounts, even those involving only ro filesystems, was proposed in D24088, but I believe that suspending ro is undesirable, and definitely spends CPU time. Reported by: markj Discussed with: chs, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# f15ccf88	06-Mar-2020	Chuck Silvers <chs@FreeBSD.org>	Add a new "mntfs" pseudo file system which provides private device vnodes for file systems to safely access their disk devices, and adapt FFS to use it. Also add a new BO_NOBUFS flag to allow enforcing that file systems using mntfs vnodes do not accidentally use the original devfs vnode to create buffers. Reviewed by: kib, mckusick Approved by: imp (mentor) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D23787
# 13532153	16-Feb-2020	Scott Long <scottl@FreeBSD.org>	Add rudimentary support for UFS to probe whether a block device supports the BIO_SPEEDUP command. Add complementary support to the CAM periphs that support it. This is a redo of r357710.
# 85eb41f7	10-Feb-2020	Scott Long <scottl@FreeBSD.org>	Revert r357710 and 357711 until they can be debugged
# 7d99bda7	09-Feb-2020	Scott Long <scottl@FreeBSD.org>	Add rudimentary support for UFS to probe whether a block device supports the BIO_SPEEDUP command. Add complementary support to the CAM periphs that support it.
# 6c44a3e0	25-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	ufs: add vgone calls for unconstructed vnodes in the error path This mostly eliminates the requirement that vput never unlocks the vnode before calling VOP_INACTIVE. Note it may still be present for other filesystems. See r356126 for an example bug. Note vput stopped doing early unlock in r357070 thus this change does not affect correctness as it is. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23215
# 0297c138	14-Jan-2020	Kirk McKusick <mckusick@FreeBSD.org>	When sync'ing a mount point, the mount point's vnodes were scanned twice. Once to update the changed inodes, and a second time to update changed quota information. This change merges these two scans into a single scan which does both inode and quota updates. MFC after: 7 days
# 80663cad	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	ufs: use lazy list instead of active list for syncer Quota code is temporarily regressed to do a full vnode scan. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22996
# ac4ec141	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	ufs: add a setter for inode i_flag field This will be used later to add vnodes to the lazy list. Reviewed by: kib (previous version), jeff Tested by: pho (in a larger patch) Differential Revision: https://reviews.freebsd.org/D22994
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# e35cd9e3	06-Oct-2019	Mateusz Guzik <mjg@FreeBSD.org>	ufs: add root vnode caching See r353150. Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646
# 44d37182	03-Oct-2019	Kirk McKusick <mckusick@FreeBSD.org>	Update ffs_getcg() function to accept a flags parameter to be passed to breadn_flags() in preparation for later need when doing forcible unmount when disk dies or is removed. No functional change. Sponsored by: Netflix
# f3cf6225	06-Sep-2019	Conrad Meyer <cem@FreeBSD.org>	ufs: Remove redundant brelse() after r294954 Same automation. No functional change.
# 16040222	29-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	UFS: stop reusing the vnode for reallocated inode. In ffs_valloc(), force reclaim existing vnode on inode reuse, instead of trying to re-initialize the same vnode for new purposes. This is done in preparation of changes to the vp->v_object lifecycle handling. A new FFSV_REPLACE flag to ffs_vgetf() directs the function to vgone(9) the vnode if found in vfs hash, instead of returning it. Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D21412
# 9454b4fd	06-Aug-2019	Kirk McKusick <mckusick@FreeBSD.org>	A race condition existed between the time a UFS/FFS superblock check hash was computed and the time that the superblock was copied to a buffer to be written to disk. The result was a failed superblock check hash the next time that the superblock was read. The fix is to compute the check hash after the superblock has been copied to a buffer to be written. PR: 236504 Reported by: Peter Holm Tested by: Peter Holm Sponsored by: Netflix
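The ordering fix in 9454b4fd (hash the copy that will be written, not the live structure) can be shown with a toy model. The structure, the FNV-style toy_hash(), and the function names are all hypothetical; the real code uses crc32c over the real superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical miniature superblock with a trailing check hash. */
struct toy_sb {
	uint32_t data[4];
	uint32_t ckhash;
};

/* Simple FNV-1a style mix over the data words; a stand-in for crc32c. */
static uint32_t
toy_hash(const struct toy_sb *sb)
{
	uint32_t h = 2166136261u;

	for (int i = 0; i < 4; i++)
		h = (h ^ sb->data[i]) * 16777619u;
	return (h);
}

/*
 * Snapshot first, then hash the snapshot. Later changes to the live
 * structure cannot make the written image inconsistent with its hash.
 */
static void
toy_sb_prepare_write(const struct toy_sb *live, struct toy_sb *buf)
{
	*buf = *live;
	buf->ckhash = toy_hash(buf);
}

static bool
toy_sb_verify(const struct toy_sb *buf)
{
	return (buf->ckhash == toy_hash(buf));
}
```

Had the hash been computed on the live structure before the copy, an update landing between the two steps would leave the buffer carrying data that no longer matches its stored hash, which is the failure the commit message describes.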
# fdf34aa3	17-Jul-2019	Kirk McKusick <mckusick@FreeBSD.org>	The error reported in FS-14-UFS-3 can only happen on UFS/FFS filesystems that have block pointers that are out-of-range for their filesystem. These out-of-range block pointers are corrected by fsck(8) so are only encountered when an unchecked filesystem is mounted. A new "untrusted" flag has been added to the generic mount interface that can be set when mounting media of unknown provenance or integrity. For example, a daemon that automounts a filesystem on a flash drive when it is plugged into a system. This commit adds a test to UFS/FFS that validates all block numbers before using them. Because checking for out-of-range blocks adds unnecessary overhead to normal operation, the tests are only done when the filesystem is mounted as an "untrusted" filesystem. Reported by: Christopher Krah, Thomas Barabosch, and Jan-Niclas Hilgert of Fraunhofer FKIE Reported as: FS-14-UFS-3: Out of bounds read in write-2 (ffs_alloccg) Reviewed by: kib Sponsored by: Netflix
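The conditional range check described in fdf34aa3 might look roughly like the sketch below. toy_fs and toy_block_ok() are hypothetical names, and the half-open range [0, nblocks) is an assumption made for the example rather than the exact bounds the kernel uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical filesystem descriptor; not the kernel's struct fs. */
struct toy_fs {
	int64_t nblocks;	/* number of addressable blocks */
	bool    untrusted;	/* set when mounted with the "untrusted" option */
};

/*
 * Validate a block pointer before use. The check is skipped for
 * trusted mounts, where fsck is assumed to have already corrected
 * out-of-range pointers, so normal operation pays no extra cost.
 */
static bool
toy_block_ok(const struct toy_fs *fs, int64_t blkno)
{
	if (!fs->untrusted)
		return (true);
	return (blkno >= 0 && blkno < fs->nblocks);
}
```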
# daba4da8	01-Jul-2019	Kirk McKusick <mckusick@FreeBSD.org>	Add a new "untrusted" option to the mount command. Its purpose is to notify the kernel that the file system is untrusted and it should use more extensive checks on the file-system's metadata before using it. This option is intended to be used when mounting file systems from untrusted media such as USB memory sticks or other externally-provided media. It will initially be used by the UFS/FFS file system, but should likely be expanded to be used by other file systems that may appear on external media like msdosfs, exfat, and ext2fs. Reviewed by: kib Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20786
# f89d2072	17-Jun-2019	Xin LI <delphij@FreeBSD.org>	Separate kernel crc32() implementation to its own header (gsb_crc32.h) and rename the source to gsb_crc32.c. This is a prerequisite of unifying kernel zlib instances. PR: 229763 Submitted by: Yoshihiro Ota <ota at j.email.ne.jp> Differential Revision: https://reviews.freebsd.org/D20193
# daec9284	21-May-2019	Conrad Meyer <cem@FreeBSD.org>	Include ktr.h in more compilation units Similar to r348026, exhaustive search for uses of CTRn() and cross reference ktr.h includes. Where it was obvious that an OS compat header of some kind included ktr.h indirectly, .c files were left alone. Some of these files clearly got ktr.h via header pollution in some scenarios, or tinderbox would not be passing prior to this revision, but go ahead and explicitly include it in files using it anyway. Like r348026, these CUs did not show up in tinderbox as missing the include. Reported by: peterj (arm64/mp_machdep.c) X-MFC-With: r347984 Sponsored by: Dell EMC Isilon
# 5ffc99e2	08-Apr-2019	Konstantin Belousov <kib@FreeBSD.org>	Handle races when remounting UFS volume from ro to rw. In particular, ensure that writers are not unleashed before SU structures are initialized. Also, correctly handle MNT_ASYNC before this. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 72d28f97	15-Dec-2018	Kirk McKusick <mckusick@FreeBSD.org>	Reorder ffs_verify_dinode_ckhash() so that it checks the inode check-hash before copying in the inode so that the mode and link-count are not set if the check-hash fails. This change ensures that the vnode will be properly unwound and recycled rather than being held in the cache. Initialize the file mode to zero so that if the loading of the inode fails (for example because of a check-hash failure), the vnode will be properly unwound and recycled. Reported by: Gary Jennejohn (gj) Sponsored by: Netflix
# 8f829a5c	11-Dec-2018	Kirk McKusick <mckusick@FreeBSD.org>	Continuing efforts to provide hardening of FFS. This change adds a check hash to the filesystem inodes. Access attempts to files associated with an inode with an invalid check hash will fail with EINVAL (Invalid argument). Access is reestablished after an fsck is run to find and validate the inodes with invalid check-hashes. This check avoids a class of filesystem panics related to corrupted inodes. The hash is done using crc32c. Note this check-hash is for the inode itself and not any of its indirect blocks. Check-hash validation may be extended to also cover indirect block pointers, but that will be a separate (and more costly) feature. Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix
# fb14e73c	05-Dec-2018	Kirk McKusick <mckusick@FreeBSD.org>	Normally when an attempt is made to mount a UFS/FFS filesystem whose superblock has a check-hash error, an error message noting the superblock check-hash failure is printed and the mount fails. The administrator then runs fsck to repair the filesystem and when successful, the filesystem can once again be mounted. This approach fails if the filesystem in question is a root filesystem from which you are trying to boot. Here, the loader fails when trying to access the filesystem to get the kernel to boot. So it is necessary to allow the loader to ignore the superblock check-hash error and make a best effort to read the kernel. The filesystem may be sufficiently corrupted that the read attempt fails, but there is no harm in trying since the loader makes no attempt to write to the filesystem. Once the kernel is loaded and starts to run, it attempts to mount its root filesystem. Once again, failure means that it breaks to its prompt to ask where to get its root filesystem. Unless you have an alternate root filesystem, you are stuck. Since the root filesystem is initially mounted read-only, it is safe to make an attempt to mount the root filesystem with the failed superblock check-hash. Thus, when asked to mount a root filesystem with a failed superblock check-hash, the kernel prints a warning message that the root filesystem superblock check-hash needs repair, but notes that it is ignoring the error and proceeding. It does mark the filesystem as needing an fsck which prevents it from being enabled for writing until fsck has been run on it. The net effect is that the reboot fails to single user, but at least at that point the administrator has the tools at hand to fix the problem. Reported by: Rick Macklem (rmacklem@) Discussed with: Warner Losh (imp@) Sponsored by: Netflix
# a02bd3e3	25-Nov-2018	Kirk McKusick <mckusick@FreeBSD.org>	Move the check for the filesystem having been run on a kernel that predates metadata check hashes so that it is done before deciding whether to compute a check-hash of the superblock. Reported by: Rick Macklem <rmacklem@uoguelph.ca> Sponsored by: Netflix
# 9fc5d538	13-Nov-2018	Kirk McKusick <mckusick@FreeBSD.org>	In preparation for adding inode check-hashes, clean up and document the libufs interface for fetching and storing inodes. The undocumented getino / putino interface has been replaced with a new getinode / putinode interface. Convert the utilities that had been using the undocumented interface to use the new documented interface. No functional change (as for now the libufs library does not do inode check-hashes). Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix
# ec888383	23-Oct-2018	Kirk McKusick <mckusick@FreeBSD.org>	Continuing efforts to provide hardening of FFS, this change adds a check hash to the superblock. If a check hash fails when an attempt is made to mount a filesystem, the mount fails with EINVAL (Invalid argument). This avoids a class of filesystem panics related to corrupted superblocks. The hash is done using crc32c. Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix
# 7e038bc2	18-Aug-2018	Kirk McKusick <mckusick@FreeBSD.org>	Replace the TRIM consolidation framework originally added in -r337396 driven by problems found with the algorithms being tested for TRIM consolidation. Reported by: Peter Holm Suggested by: kib Reviewed by: kib Sponsored by: Netflix
# cc91864c	18-Aug-2018	Kirk McKusick <mckusick@FreeBSD.org>	Revert -r337396. It is being replaced with a revised interface that resulted from testing and further reviews.
# 68c49bcc	06-Aug-2018	Kirk McKusick <mckusick@FreeBSD.org>	Put in place the framework for consolidating contiguous blocks into a smaller number of larger TRIM requests. The hope had been to have the full TRIM consolidation in place for 12.0, but the algorithms are still under development and need further testing. With this framework in place it will be possible to easily add TRIM consolidation once the optimal strategy has been found. The only functional change with this patch is the elimination of TRIM requests for blocks that are freed before they have been likely to have been written. Reviewed by: kib Discussed with: Warner Losh and Chuck Silvers Sponsored by: Netflix
# ab0bcb60	29-Jun-2018	Kirk McKusick <mckusick@FreeBSD.org>	Create um_flags in the ufsmount structure to hold flags for a UFS filesystem. Convert integer structure flags to use um_flags: int um_candelete; /* devvp supports TRIM */ int um_writesuspended; /* suspension in progress */ become: #define UM_CANDELETE 0x00000001 /* devvp supports TRIM */ #define UM_WRITESUSPENDED 0x00000002 /* suspension in progress */ This is in preparation for adding other flags to indicate forcible unmount in progress after a disk failure and possibly forcible downgrade to read-only. No functional change intended. Sponsored by: Netflix
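As a rough illustration of what the conversion in ab0bcb60 means for callers, the sketch below packs the two former integer fields into one flags word. The UM_CANDELETE and UM_WRITESUSPENDED values are taken from the commit message; toy_ufsmount and the toy_um_* helpers are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define UM_CANDELETE      0x00000001	/* devvp supports TRIM */
#define UM_WRITESUSPENDED 0x00000002	/* suspension in progress */

/* Hypothetical cut-down mount structure holding only the flags word. */
struct toy_ufsmount {
	int um_flags;
};

/* Formerly "ump->um_candelete = 1" style assignments. */
static void
toy_um_set(struct toy_ufsmount *ump, int flag)
{
	ump->um_flags |= flag;
}

static void
toy_um_clear(struct toy_ufsmount *ump, int flag)
{
	ump->um_flags &= ~flag;
}

/* Formerly "if (ump->um_candelete)" style tests. */
static bool
toy_um_isset(const struct toy_ufsmount *ump, int flag)
{
	return ((ump->um_flags & flag) != 0);
}
```

Adding a further state, such as the forcible-unmount indication the commit message anticipates, then needs only a new bit definition rather than a new structure member.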
# efbf3964	01-Mar-2018	Kirk McKusick <mckusick@FreeBSD.org>	This change is some refactoring of Mark Johnston's changes in r329375 to fix the memory leak that I introduced in r328426. Instead of trying to clear up the possible memory leak in all the clients, I ensure that it gets cleaned up in the source (e.g., ffs_sbget ensures that memory is always freed if it returns an error). The original change in r328426 was a bit sparse in its description. So I am expanding on its description here (thanks cem@ and rgrimes@ for your encouragement for my longer commit messages). In preparation for adding check hashing to superblocks, r328426 is a refactoring of the code to get the reading/writing of the superblock into one place. Unlike the cylinder group reading/writing which ends up in two places (ffs_getcg/ffs_geom_strategy in the kernel and cgget/cgput in libufs), I have the core superblock functions just in the kernel (ffs_sbfetch/ffs_sbput in ffs_subr.c which is already imported into utilities like fsck_ffs as well as libufs to implement sbget/sbput). The ffs_sbfetch and ffs_sbput functions take a function pointer to do the actual I/O for which there are four variants: ffs_use_bread / ffs_use_bwrite for the in-kernel filesystem g_use_g_read_data / g_use_g_write_data for kernel geom clients ufs_use_sa_read for the standalone code (stand/libsa/ufs.c but not stand/libsa/ufsread.c which is size constrained) use_pread / use_pwrite for libufs Uses of these interfaces are in the UFS filesystem, geoms journal & label, libsa changes, and libufs. They also permeate out into the filesystem utilities fsck_ffs, newfs, growfs, clri, dump, quotacheck, fsirand, fstyp, and quot. Some of these utilities should probably be converted to directly use libufs (like dumpfs was for example), but there does not seem to be much win in doing so. Tested by: Peter Holm (pho@)
# 16759360	16-Feb-2018	Mark Johnston <markj@FreeBSD.org>	Fix a memory leak introduced in r328426. ffs_sbget() may return a superblock buffer even if it fails, so the caller must be prepared to free it in this case. Moreover, when tasting alternate superblock locations in a loop, ffs_sbget()'s readfunc callback must free the previously allocated buffer. Reported and tested by: pho Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D14390
# 068beacf	08-Feb-2018	Kirk McKusick <mckusick@FreeBSD.org>	The goal of this change is to prevent accidental foot shooting by folks running filesystems created on check-hash enabled kernels (which I will call "new") on non-check-hash enabled kernels (which I will call "old"). The idea here is to detect when a filesystem is run on an old kernel and flag the filesystem so that when it gets moved back to a new kernel, it will not start getting a slew of check-hash errors. Back when the UFS version 2 filesystem was created, it added a file flag FS_INDEXDIRS that was to be set on any filesystem that kept some sort of on-disk indexing for directories. The idea was precisely to solve the issue we have today. Specifically that a newer kernel that supported indexing would be able to tell that the filesystem had been run on an older non-indexing kernel and that the indexes should not be used until they had been rebuilt. Since we have never implemented on-disk directory indices, the FS_INDEXDIRS flag is cleared every time any UFS version 2 filesystem ever created is mounted for writing. This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH flag. Thus, the FS_METACKHASH is definitively known to have always been cleared. The FS_INDEXDIRS flag has been moved to a new block of flags that will always be cleared starting with this commit (until they get used to implement some future feature which needs to detect that the filesystem was mounted on a kernel that predates the new feature). If a filesystem with check-hashes enabled is mounted on an old kernel the FS_METACKHASH flag is cleared. When that filesystem is mounted on a new kernel it will see that the FS_METACKHASH has been cleared and clears all of the fs_metackhash flags. To get them re-enabled the user must run fsck (in interactive mode without the -y flag) which will ask for each supported check hash whether it should be rebuilt and enabled. 
When fsck is run in its default preen mode, it will just ignore the check hashes so they will remain disabled. The kernel has always disabled any check hash functions that it does not support, so as more types of check hashes are added, we will get a non-surprising result. Specifically if filesystems get moved to kernels supporting fewer of the check hashes, those that are not supported will be disabled. If the filesystem is moved back to a kernel with more of the check-hashes available and fsck is run interactively to rebuild them, then their checking will resume. Otherwise just the smaller subset will be checked. A side effect of this commit is that filesystems running with cylinder-group check hashes will stop having them checked until fsck is run to re-enable them (since none of them currently have the FS_METACKHASH flag set). So, if you want check hashes enabled on your filesystems after booting a kernel with these changes, you need to run fsck to enable them. Any newly created filesystems will have check hashes enabled. If in doubt as to whether you have check hashes enabled, run dumpfs and look at the list of enabled flags at the end of the superblock details.
# 47806d1b	05-Feb-2018	Kirk McKusick <mckusick@FreeBSD.org>	Occasional cylinder-group check-hash errors were being reported on systems running with a heavy filesystem load. Tracking down this bug was elusive because there were actually two problems. Sometimes the in-memory check hash was wrong and sometimes the check hash computed when doing the read was wrong. The occurrence of either error caused a check-hash mismatch to be reported. The first error was that the check hash in the in-memory cylinder group was incorrect. This error was caused by the following sequence of events: - We read a cylinder-group buffer and the check hash is valid. - We update its cg_time and cg_old_time which makes the in-memory check-hash value invalid but we do not mark the cylinder group dirty. - We do not make any other changes to the cylinder group, so we never mark it dirty, thus do not write it out, and hence never update the incorrect check hash for the in-memory buffer. - Later, the buffer gets freed, but the page with the old incorrect check hash is still in the VM cache. - Later, we read the cylinder group again, and the first page with the old check hash is still in the VM cache, but some other pages are not, so we have to do a read. - The read does not actually get the first page from disk, but rather from the VM cache, resulting in the old check hash in the buffer. - The value computed after doing the read does not match causing the error to be printed. The fix for this problem is to only set cg_time and cg_old_time as the cylinder group is being written to disk. This keeps the in-memory check-hash valid unless the cylinder group has had other modifications which will require it to be written with a new check hash calculated. It also requires that the check hash be recalculated in the in-memory cylinder group when it is marked clean after doing a background write. 
The second problem was that the check hash computed at the end of the read was incorrect because the calculation of the check hash on completion of the read was being done too soon. - When a read completes we had the following sequence: - bufdone() -- b_ckhashcalc (calculates check hash) -- bufdone_finish() --- vfs_vmio_iodone() (replaces bogus pages with the cached ones) - When we are reading a buffer where one or more pages are already in memory (but not all pages, or we wouldn't be doing the read), the I/O is done with bogus_page mapped in for the pages that exist in the VM cache. This mapping is done to avoid corrupting the cached pages if there is any I/O overrun. The vfs_vmio_iodone() function is responsible for replacing the bogus_page(s) with the cached ones. But we were calculating the check hash before the bogus_page(s) were replaced. Hence, when we were calculating the check hash, we were partly reading from bogus_page, which means we calculated a bad check hash (e.g., because multiple pages have been mapped to bogus_page, so its contents are indeterminate). The second fix is to move the check-hash calculation from bufdone() to bufdone_finish() after the call to vfs_vmio_iodone() so that it computes the check hash over the correct set of pages. With these two changes, the occasional cylinder-group check-hash errors are gone. Submitted by: David Pfitzner <dpfitzner@netflix.com> Reviewed by: kib Tested by: David Pfitzner
# dffce215	25-Jan-2018	Kirk McKusick <mckusick@FreeBSD.org>	Refactoring of reading and writing of the UFS/FFS superblock. Specifically reading is done in ffs_sbget() and writing is done in ffs_sbput(). These functions are exported to libufs via the sbget() and sbput() functions which are then used in the various filesystem utilities. This work is in preparation for adding superblock check hashes. No functional change intended. Reviewed by: kib
# 377f88fb	09-Jan-2018	Konstantin Belousov <kib@FreeBSD.org>	Postpone the disassociation of the background write buffer with devvp so that buf_complete() sees a fully constructed buffer. This is a NOP right now, but will be needed by the forthcoming SU change. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known open source licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 75e3597a	21-Sep-2017	Kirk McKusick <mckusick@FreeBSD.org>	Continuing efforts to provide hardening of FFS, this change adds a check hash to cylinder groups. If a check hash fails when a cylinder group is read, no further allocations are attempted in that cylinder group until it has been fixed by fsck. This avoids a class of filesystem panics related to corrupted cylinder group maps. The hash is done using crc32c. Check hashes are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Specifics of the changes: sys/sys/buf.h: Add BX_FSPRIV to reserve a set of eight b_xflags that may be used by individual filesystems for their own purpose. Their specific definitions are found in the header files for each filesystem that uses them. Also add fields to struct buf as noted below. sys/kern/vfs_bio.c: It is only necessary to compute a check hash for a cylinder group when it is actually read from disk. When calling bread, you do not know whether the buffer was found in the cache or read. So a new flag (GB_CKHASH) and a pointer to a function to perform the hash has been added to breadn_flags to say that the function should be called to calculate a hash if the data has been read. The check hash is placed in b_ckhash and the B_CKHASH flag is set to indicate that a read was done and a check hash calculated. Though a rather elaborate mechanism, it should also work for check hashing other metadata in the future. A kernel internal API change was to change breada into a static function and add flags and a function pointer to a check-hash function. sys/ufs/ffs/fs.h: Add flags for types of check hashes; stored in a new word in the superblock. Define corresponding BX_ flags for the different types of check hashes. Add a check hash word in the cylinder group. 
sys/ufs/ffs/ffs_alloc.c: In ffs_getcg do the dance with breadn_flags to get a check hash and if one is provided, check it. sys/ufs/ffs/ffs_vfsops.c: Copy across the BX_FFSTYPES flags in background writes. Update the check hash when writing out buffers that need them. sys/ufs/ffs/ffs_snapshot.c: Recompute check hash when updating snapshot cylinder groups. sys/libkern/crc32.c: lib/libufs/Makefile: lib/libufs/libufs.h: lib/libufs/cgroup.c: Include libkern/crc32.c in libufs and use it to compute check hashes when updating cylinder groups. Four utilities are affected: sbin/newfs/mkfs.c: Add the check hashes when building the cylinder groups. sbin/fsck_ffs/fsck.h: sbin/fsck_ffs/fsutil.c: Verify and update check hashes when checking and writing cylinder groups. sbin/fsck_ffs/pass5.c: Offer to add check hashes to existing filesystems. Precompute check hashes when rebuilding cylinder group (although this will be done when it is written in fsutil.c it is necessary to do it early before comparing with the old cylinder group) sbin/dumpfs/dumpfs.c Print out the new check hash flag(s) sbin/fsdb/Makefile: Needs to add libufs now used by pass5.c imported from fsck_ffs. Reviewed by: kib Tested by: Peter Holm (pho)
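As a rough illustration of what a crc32c check hash over a cylinder group involves, the sketch below computes a standard CRC-32C over a block and compares it with a value stored in the block itself. It is not the kernel code: `struct toy_cg`, `toy_cg_hash()`, `toy_cg_seal()` and `toy_cg_valid()` are hypothetical names, the bitwise CRC stands in for whatever routine the kernel uses, and treating the stored hash field as zero while hashing is an assumption made here so that the hash does not cover itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t
crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	assert(buf != NULL || len == 0);
	while (len-- > 0) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return (crc ^ 0xFFFFFFFFu);
}

/* Hypothetical stand-in for a cylinder-group block; not struct cg. */
struct toy_cg {
	uint32_t ckhash;	/* stored check hash */
	uint8_t	 maps[60];	/* stand-in for the allocation maps */
};

/* Hash the block with the hash field itself treated as zero (assumption). */
static uint32_t
toy_cg_hash(const struct toy_cg *cg)
{
	struct toy_cg tmp = *cg;

	tmp.ckhash = 0;
	return (crc32c(&tmp, sizeof(tmp)));
}

/* Record a fresh hash, as would be done just before writing the block. */
static void
toy_cg_seal(struct toy_cg *cg)
{
	cg->ckhash = toy_cg_hash(cg);
}

/* Verify the stored hash, as would be done after reading the block. */
static int
toy_cg_valid(const struct toy_cg *cg)
{
	return (cg->ckhash == toy_cg_hash(cg));
}
```

A caller following the policy described in the commit would refuse further allocations in a group for which `toy_cg_valid()` returns 0 until the group has been repaired.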
# 9c4f551e	28-Jun-2017	Kirk McKusick <mckusick@FreeBSD.org>	Create a new function ffs_getcg() to read in and verify a cylinder group. Change all code points that open-coded this functionality to use the new function. This commit is a refactoring with no change in functionality. In the future this change allows more robust checking of cylinder group reads along the lines discussed in the hardening UFS session at BSDCan (retry I/O, add checksums, etc). For more detail see the session notes at https://wiki.freebsd.org/DevSummit/201706/HardeningUFS Reviewed by: kib
# 4cbc378c	03-Jun-2017	Konstantin Belousov <kib@FreeBSD.org>	Clean possible td_su reference on the struct mount being unmounted as the last step of ffs_unmount(). It is possible that the mount point is recorded for cleanup in AST context while softdep flush is executed during unmount. The workitems are flushed by other means for the unmount, but the stray reference to struct mount blocks destruction of mount. Check for the situation and manually call vfs_rel() before returning from ffs_unmount(). Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 9ed01c32	17-Apr-2017	Gleb Smirnoff <glebius@FreeBSD.org>	All these files need sys/vmmeter.h, but now they got it implicitly included via sys/pcpu.h.
# a96da1c3	04-Apr-2017	Conrad Meyer <cem@FreeBSD.org>	ufs: Export UFS_MAXNAMLEN to pathconf, statfs Rather than the global NAME_MAX constant. This change is required to support systems with a NAME_MAX/MAXNAMLEN that differs from UFS_MAXNAMLEN. This was missed in r313475 due to the alternative spelling ("NAME_MAX") of MAXNAMLEN. This change is also similar in spirit to r313780. Reported by: ngie@ Sponsored by: Dell EMC Isilon
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# 1dc349ab	15-Feb-2017	Ed Maste <emaste@FreeBSD.org>	prefix UFS symbols with UFS_ to reduce namespace pollution Specifically: ROOTINO -> UFS_ROOTINO WINO -> UFS_WINO NXADDR -> UFS_NXADDR NDADDR -> UFS_NDADDR NIADDR -> UFS_NIADDR MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency) Also prefix ext2's and nandfs's NDADDR and NIADDR with EXT2_ and NANDFS_ Reviewed by: kib, mckusick Obtained from: NetBSD MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D9536
# 714b7df5	13-Nov-2016	Konstantin Belousov <kib@FreeBSD.org>	Provide simple mutual exclusion between mount point update and unmount. Currently mount update keeps vfs_busy(9) reference on the mount point during MNT_UPDATE VFS_MOUNT() vfsops call. This already provides the exclusion, but is problematic for filesystems which need to perform namei(9) during VFS_MOUNT(MNT_UPDATE) operations, e.g. to refresh mnt_from path, because namei(9) must not be called while the vfs_busy(9) reference is owned. Check for MNT_UPDATE flag before setting MNTK_UNMOUNT, and for MNTK_UNMOUNT before entering innards of vfs_domount_update(), failing syscalls with EBUSY if conflict is detected. Keep vfs_busy(9) reference around VFS_MOUNT(MNT_UPDATE) calls still to not change VFS KPI. In the update path in ffs_mount(), drop vfs_busy() reference around namei(), which is now safe due to unmount never executing in parallel with VFS_MOUNT(MNT_UPDATE), and which avoids the deadlock. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# ad544726	28-Oct-2016	Kirk McKusick <mckusick@FreeBSD.org>	Avoid possible overflow when calculating malloc size for auxiliary data structure sizes when mounting and reloading UFS/FFS filesystems by using a u_long rather than an int for the size. Reported by: Mariusz Zaborski <oshogbo@> MFC after: 1 week
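The overflow described in this entry is the usual one where a size is the product of two counters and the arithmetic is done in a type that is too narrow. A minimal sketch follows, with hypothetical names (`aux_size_narrow()`, `aux_size_wide()`) and fixed-width types standing in for the kernel's int and u_long; the particular terms in the sum are an assumption, not the expression used in ffs_vfsops.c.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit arithmetic: the product silently wraps modulo 2^32. */
static uint32_t
aux_size_narrow(uint32_t ngroups, uint32_t bytes_per_group, uint32_t fixed_bytes)
{
	return (ngroups * bytes_per_group + fixed_bytes);
}

/* Widen before multiplying so the full result is representable. */
static uint64_t
aux_size_wide(uint32_t ngroups, uint32_t bytes_per_group, uint32_t fixed_bytes)
{
	uint64_t total = (uint64_t)ngroups * bytes_per_group + fixed_bytes;

	/* Three 32-bit inputs cannot wrap a 64-bit total. */
	assert(total >= fixed_bytes);
	return (total);
}
```

For 70000 groups of 70000 bytes each, the wide version yields 4900000000 while the narrow one yields 605032704, which is the kind of undersized allocation request the change guards against.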
# 8660b707	30-Sep-2016	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib
# e1db6897	17-Sep-2016	Konstantin Belousov <kib@FreeBSD.org>	Reduce size of ufs inode. Remove redundant i_dev and i_fs pointers, which are available as ip->i_ump->um_dev and ip->i_ump->um_fs, and reorder members by size to reduce padding. To compensate for the added dereferences, the most frequent i_ump access, used to differentiate between UFS1 and UFS2 dinode layout, is removed by adding the new i_flag IN_UFS2. Overall, this actually reduces the number of memory dereferences. On a 64-bit machine, the original struct inode size is 176 bytes, reduced to 152 bytes with the change. Tested by: pho (previous version) Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# df426577	07-Sep-2016	Konstantin Belousov <kib@FreeBSD.org>	Partially lift suspension when ffs_reload() finished with cgs and going to re-read inodes. Secondary write initiators, e.g. ufs_inactive(), might need to start a write while owning the vnode lock. Since the suspended state established by /dev/ufssuspend prevents them from entering vn_start_secondary_write(), we get deadlock otherwise. Note that it is arguably not very useful to re-read inodes after /dev/ufssuspend suspension, because the suspension does not block readers, and other threads might read existing files in parallel with suspension owner (for now, only growfs(8)) operations. This effectively means that suspension owner cannot safely modify existing inodes, and then there is no sense in re-reading. But keep the code enabled for now. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 57d2ac2f	22-May-2016	Kevin Lo <kevlo@FreeBSD.org>	arc4random() returns 0 to (2**32)−1, use an alternative to initialize i_gen if it's zero rather than a divide by 2. With inputs from delphij, mckusick, rmacklem Reviewed by: mckusick
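One way to obtain a nonzero generation number without halving the random range is to redraw until the value is nonzero. The entry does not spell out the exact code, so the sketch below is an assumption about the shape of such a fix; `nonzero_gen()` and `stub_rng()` are hypothetical, and a function pointer stands in for arc4random() so the example stays portable and deterministic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Any source of uniformly distributed 32-bit values. */
typedef uint32_t (*rng32_fn)(void);

/*
 * Draw values until one is nonzero. Unlike dividing by two, this keeps
 * the full 32-bit range available and excludes only the value 0.
 */
static uint32_t
nonzero_gen(rng32_fn rng)
{
	uint32_t gen;

	assert(rng != NULL);
	do {
		gen = rng();
	} while (gen == 0);
	return (gen);
}

/* Deterministic stand-in generator that cycles through 0, 0, 7, 9. */
static uint32_t
stub_rng(void)
{
	static const uint32_t seq[] = { 0, 0, 7, 9 };
	static unsigned int next;

	return (seq[next++ % 4]);
}
```

With a real generator the loop almost never iterates more than once, since a zero draw has probability 2^-32.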
# 5f36cc5b	21-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Stop dropping and reacquiring Giant around geom calls in UFS. Sponsored by: The FreeBSD Foundation
# c70b3cd2	21-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Improve handling of rdev->si_mountpt on mount and unmount of FFS volumes. Treat the field as a semaphore protecting availability of the device for mounting. Do not access devvp->v_rdev without the vnode lock owned. Protect change of the devvp->v_bufobj bo_ops vector with the vnode lock. Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# bfe89a8e	17-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Do enable io accounting for read-only mounts and mounts which are remounted to writeable after initial read-only. Assign to dev->si_mountpt earlier to account the accesses done at the mount time. Based on submission by: bde MFC after: 1 week
# 98cbffd7	17-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Fix comments. Submitted by: bde MFC after: 1 week
# 08e94183	29-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	UFS: spelling fixes on comments. No functional change.
# c79dff0f	27-Mar-2016	Konstantin Belousov <kib@FreeBSD.org>	Split the global taskqueue used to process all UFS trim completions, into per-mount taskqueue with the private taskqueue processing thread. This allows to drain the taskqueue on unmount, to ensure that all TRIMs are finished before mount structures are freed. But just draining the taskqueue where TRIM biodone geom-up completions are processed is not enough, since ffs_blkfree(), called by the task, might result in more writes. Count inflight delayed blkfree's and pause() unmount until the counter drains as well. Reported by: Nick Evans <nevans@talkpoint.com> Tested by: Nick Evans <nevans@talkpoint.com>, pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# abe53f7e	27-Jan-2016	Kirk McKusick <mckusick@FreeBSD.org>	This fixes a bug in UFS2 exported NFS volumes. An NFS client can crash a server that has exported UFS2 by presenting a filehandle with an inode number that references an uninitialized inode in a cylinder group. The problem is that UFS2 only initializes blocks of inodes as they are first allocated and ffs_fhtovp() does not validate that the inode is in a range of inodes that have been initialized. Attempting to read an uninitialized inode gets random data from the disk. When the kernel tries to interpret it as an inode, panics often arise. Reported by: Christos Zoulas (from NetBSD) Reviewed by: kib
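The missing validation can be pictured as a range check made before an inode named by a file handle is ever read. The sketch below is illustrative only: `struct toy_fs`, its fields and `toy_check_ino()` are hypothetical, the per-group count of initialized inodes is assumed to be available in memory, and returning ESTALE for a rejected handle is an assumption about the error chosen.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical filesystem geometry; field names are illustrative. */
struct toy_fs {
	uint32_t	 ncg;		/* number of cylinder groups */
	uint32_t	 ipg;		/* inodes per cylinder group */
	const uint32_t	*inited;	/* initialized inodes in each group */
};

/*
 * Reject inode numbers that fall outside the filesystem or inside the
 * not-yet-initialized tail of their cylinder group.
 */
static int
toy_check_ino(const struct toy_fs *fs, uint64_t ino)
{
	uint64_t cg;

	assert(fs != NULL && fs->ipg != 0);
	cg = ino / fs->ipg;
	if (cg >= fs->ncg)
		return (ESTALE);	/* beyond the last cylinder group */
	if (ino % fs->ipg >= fs->inited[cg])
		return (ESTALE);	/* inode block never initialized */
	return (0);
}
```

Only when the check returns 0 would the caller go on to read the on-disk inode, so uninitialized disk contents are never interpreted as an inode.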
# 43a993bb	29-Nov-2015	Kirk McKusick <mckusick@FreeBSD.org>	For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm
# fade8dd7	23-Jul-2015	Jeff Roberson <jeff@FreeBSD.org>	Refactor unmapped buffer address handling. - Use pointer assignment rather than a combination of pointers and flags to switch buffers between unmapped and mapped. This eliminates multiple flags and generally simplifies the logic. - Eliminate b_saveaddr since it is only used with pager bufs which have their b_data re-initialized on each allocation. - Gather up some convenience routines in the buffer cache for manipulating buf space and buf malloc space. - Add an inline, buf_mapped(), to standardize checks around unmapped buffers. In collaboration with: mlaier Reviewed by: kib Tested by: pho (many small revisions ago) Sponsored by: EMC / Isilon Storage Division
# 5f34e93c	05-Jul-2015	Mark Johnston <markj@FreeBSD.org>	Check suspendability on the mountpoint returned by VOP_GETWRITEMOUNT. This obviates the need for a MNTK_SUSPENDABLE flag, since passthrough filesystems like nullfs and unionfs no longer need to inherit this information from their lower layer(s). This change also restores the pre-r273336 behaviour of using the presence of a susp_clean VFS method to request suspension support. Reviewed by: kib, mjg Differential Revision: https://reviews.freebsd.org/D2937
# 521987c3	29-Jun-2015	Konstantin Belousov <kib@FreeBSD.org>	Simplify code, no need to test the flag before clearing it. Submitted by: ed MFC after: 12 days
# b2c3df84	27-Jun-2015	Konstantin Belousov <kib@FreeBSD.org>	Handle errors from background write of the cylinder group blocks. First, on the write error, bufdone() call from ffs_backgroundwrite() panics because pbrelvp() cleared bp->b_bufobj, while brelse() would try to re-dirty the copy of the cg buffer. Handle this by setting B_INVAL for the case of BIO_ERROR. Second, we must re-dirty the real buffer containing the cylinder group block data when background write failed. Real cg buffer was already marked clean in ffs_bufwrite(). After the BV_BKGRDINPROG flag is cleared on the real cg buffer in ffs_backgroundwrite(), buffer scan may reuse the buffer at any moment. The result is lost write, and if the write error was only transient, we get corrupted bitmaps. We cannot re-dirty the original cg buffer in the ffs_backgroundwritedone(), since the context is not sleepable, preventing us from sleeping for origbp' lock. Add BV_BKGDERR flag (protected by the buffer object lock), which is converted into delayed write by brelse(), bqrelse() and buffer scan. In collaboration with: Conrad Meyer <cse.cem@gmail.com> Reviewed by: mckusick Sponsored by: The FreeBSD Foundation (kib), EMC/Isilon storage division (Conrad) MFC after: 2 weeks
# 1eabd967	16-Jun-2015	Konstantin Belousov <kib@FreeBSD.org>	vfs_msync(), called from syncer vnode fsync VOP, only iterates over the active vnode list for the given mount point, with the assumption that vnodes with dirty pages are active. This is enforced by vinactive() doing vm_object_page_clean() pass over the vnode pages. The issue is, if vinactive() cannot be called during vput() due to the vnode being only shared-locked, we might end up with the dirty pages for the vnode on the free list. Such vnode is invisible to syncer, and pages are only cleaned on the vnode reactivation. In other words, the race results in the broken guarantee that user data, written through the mmap(2), is written to the disk not later than in 30 seconds after the write. Fix this by keeping the vnode which is freed but still owing inactivation, on the active list. When syncer loops find such vnode, it is deactivated and cleaned by the final vput() call. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 69baeadc	29-May-2015	Konstantin Belousov <kib@FreeBSD.org>	Remove several write-only variables, all reported by the gcc 4.9 buildkernel run. Some of them were write-only under some kernel options, e.g. variables keeping values only used by CTR() macros. It costs nothing to the code readability and correctness to eliminate the warnings in those cases too by removing the local cached values used only for single-access. Review: https://reviews.freebsd.org/D2665 Reviewed by: rodrigc Looked at by: bjk Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 74a87c38	24-Apr-2015	Kirk McKusick <mckusick@FreeBSD.org>	Limit the number of cylinder groups that will be searched when trying to build a cluster. The limit is tunable using the sysctl vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups per block allocation. It was previously limited to the number of cylinder groups in the filesystem per block allocation. When there were no clusters of the needed size left, it repeatedly searched the whole filesystem for a non-existent cluster on every block allocation. The result was very slow filesystem allocation with 100% CPU utilization. The old behavior can be had by setting vfs.ffs.maxclustersearch to a huge number (1,000,000). This change affects only the layout policy routines so is not able to interfere with the integrity of the filesystem. Reported by: Dmitry Sivachenko (demon@) Tested by: Dmitry Sivachenko (demon@) MFC after: 2 weeks
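The effect of the new limit can be sketched as a search loop that gives up after a fixed number of cylinder groups instead of visiting all of them. The names below (`find_cluster_cg()`, `has_cluster`, `max_search`) are hypothetical, and the linear wrap-around scan is an assumption; the entry does not describe the order in which FFS actually visits groups.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Look for a cylinder group that still has a free cluster, starting at
 * `start` and wrapping around. At most `max_search` groups are examined
 * (and never more than `ncg`). Returns the group index, or -1 if the
 * search budget is exhausted.
 */
static int
find_cluster_cg(const unsigned char *has_cluster, size_t ncg, size_t start,
    size_t max_search)
{
	size_t limit, i, cg;

	assert(has_cluster != NULL && ncg > 0 && start < ncg);
	limit = max_search < ncg ? max_search : ncg;
	for (i = 0; i < limit; i++) {
		cg = (start + i) % ncg;
		if (has_cluster[cg])
			return ((int)cg);
	}
	return (-1);
}
```

A small `max_search` bounds the cost of every allocation when no suitable cluster exists, while a very large value reproduces the old behavior of scanning every group.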
# dda11d4a	15-Apr-2015	Rick Macklem <rmacklem@FreeBSD.org>	File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache. Reviewed by: kib MFC after: 2 weeks
# 4af9f77e	27-Mar-2015	Konstantin Belousov <kib@FreeBSD.org>	Fix the hang after the immediate reboot when the following command sequence is performed on UFS SU+J rootfs: cp -Rp /sbin/init /sbin/init.old mv -f /sbin/init.old /sbin/init Hang occurs on the rootfs unmount. There are two issues: 1. Removed init binary, which is still mapped, creates a reference to the removed vnode. The inodeblock for such vnode must have active inodedep, which is (eventually) linked through the unlinked list. This means that ffs_sync(MNT_SUSPEND) cannot succeed, because number of softdep workitems for the mp is always > 0. FFS is suspended during unmount, so unmount just hangs. 2. As noted above, the inodedep is linked eventually. It is not linked until the superblock is written. But at the vfs_unmountall() time, when the rootfs is unmounted, the call is made to ffs_unmount()->ffs_sync() before vflush(), and ffs_sync() only calls ffs_sbupdate() after all workitems are flushed. It is masked for normal system operations, because syncer works in parallel and eventually flushes superblock. Syncer is stopped when rootfs unmounted, so ffs_sync() must do sb update on its own. Correct the issues listed above. For MNT_SUSPEND, count the number of linked unlinked inodedeps (this is not a typo) and subtract the count of such workitems from the total. For the second issue, the ffs_sbupdate() is called right after device sync in ffs_sync() loop. There is a third problem, occurring with both SU and SU+J. The softdep_waitidle() loop, which waits for softdep_flush() thread to clear the worklist, only waits 20ms max. It seems that the 1 tick, specified for msleep(9), was a typo. Add fsync(devvp, MNT_WAIT) call to softdep_waitidle(), which seems to significantly help the softdep thread, and change the MNT_LAZY update at the reboot time to MNT_WAIT for similar reasons. 
Note that userspace cannot create more work while devvp is flushed, since the mount point is always suspended before the call to softdep_waitidle() in unmount or remount path. PR: 195458 In collaboration with: gjb, pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 4fce16e4	20-Oct-2014	Mateusz Guzik <mjg@FreeBSD.org>	Provide vfs suspension support only for filesystems which need it, take two. nullfs and unionfs need to request suspension if underlying filesystem(s) use it. Utilize mnt_kern_flag for this purpose. This is a fixup for 273271. No strong objections from: kib Pointy hat to: mjg MFC after: 2 weeks
# 4ce90426	20-Aug-2014	Konstantin Belousov <kib@FreeBSD.org>	Correct the test for condition to suspend UFS filesystem during unmount. There is no need to suspend a read-only filesystem, while we need suspension on a modifiable mount point. Reported by: rwatson Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 895b3782	14-Jul-2014	Konstantin Belousov <kib@FreeBSD.org>	Extract the code to put a filesystem into the suspended state (at the unmount time) in the helper vfs_write_suspend_umnt(). Use it instead of two inline copies in FFS. Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# e9838c11	03-Jun-2014	John-Mark Gurney <jmg@FreeBSD.org>	don't check fs_flags for _FLAGS_UPDATED as it is stored in fs_old_flags. If you had a UFS2 FS that didn't have its superblock at SBLOCK_UFS2, you'll end up corrupting your FS as the superblock is updated and written to a different location. makefs used to put the superblock at SBLOCK_UFS1 for UFS2 FS's, causing this issue. Reviewed by: silence from mckusick MFC after: 1 week
# 4896af9f	01-Mar-2014	Pedro F. Giffuni <pfg@FreeBSD.org>	ufs: small formatting fixes. Cleanup some extra space. Use of tabs vs. spaces. No functional change. MFC after: 3 days Reviewed by: mckusick
# cf058082	21-Oct-2013	Brooks Davis <brooks@FreeBSD.org>	Allow kernels without options SOFTUPDATES to build. This should fix the embedded tinderboxes. Reviewed by: emaste
# 519e3c3b	20-Oct-2013	Kirk McKusick <mckusick@FreeBSD.org>	Third of several cleanups to soft dependency implementation. Ensure that softdep_unmount() and softdep_setup_sbupdate() only get called for filesystems running with soft dependencies. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix
# cc3d8c35	09-Jul-2013	Konstantin Belousov <kib@FreeBSD.org>	There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
# 5f3a9c40	02-Jul-2013	Pedro F. Giffuni <pfg@FreeBSD.org>	Style fix: spaces. Cleanup the incomplete revert. Reported by: bde MFC after: 4 weeks
# 9a0aea46	01-Jul-2013	Pedro F. Giffuni <pfg@FreeBSD.org>	Change i_gen in UFS to an unsigned type. Revert the simplification of the i_gen calculation. It is still a good idea to avoid zero values and for the case of old filesystems there is probably no advantage in using the complete 32 bits anyways. Discussed with: bde MFC after: 4 weeks
# bcb2f550	01-Jul-2013	Pedro F. Giffuni <pfg@FreeBSD.org>	Change i_gen in UFS to an unsigned type. Further simplify the i_gen calculation for older disks. Having a zero here is not really a problem and this is more similar to what is done in newfs_random(). Reported by: Xin Li MFC after: 4 weeks
# eee4072f	30-Jun-2013	Pedro F. Giffuni <pfg@FreeBSD.org>	Change i_gen in UFS to an unsigned type. In UFS, i_gen is a randomly generated value and there is no way for it to be negative. Actually, the value of i_gen is just used to match bit patterns and it is of no consequence if the values are signed or not. Following other filesystems, set it to unsigned and use it as such. Discussed by: mckusick Reviewed by: mckusick (previous version) MFC after: 4 weeks
# 22a72260	30-May-2013	Jeff Roberson <jeff@FreeBSD.org>	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
# 26089666	06-Apr-2013	Jeff Roberson <jeff@FreeBSD.org>	Prepare to replace the buf splay with a trie: - Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS specific and could be moved out of the buf cache. - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper it makes more sense for these transient bufs. - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf() so this should cover all cases. Discussed with: kib, mckusick, attilio Sponsored by: EMC / Isilon Storage Division
# 59a01b70	19-Mar-2013	Konstantin Belousov <kib@FreeBSD.org>	UFS support of the unmapped i/o for the user data buffers. Sponsored by: The FreeBSD Foundation Tested by: pho, scottl, jhb, bf
# ba05dec5	27-Feb-2013	Konstantin Belousov <kib@FreeBSD.org>	The softdep freeblks workitem might hold a reference on the dquot. Current dqflush() panics when a dquot with non-zero refcount is encountered. The situation is possible, because quotas are turned off before the softdep workitem queue is flushed, since the quota file writes might create softdep workitems. Make encountering an active dquot in dqflush() not fatal, return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of softdep_flushfiles() loop, until the last iteration. At the last loop, the quotas must be closed, and because SU workitems should be already flushed, the references to dquot are gone. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Reviewed by: mckusick MFC after: 2 weeks
# ddd6b3fc	10-Jan-2013	Konstantin Belousov <kib@FreeBSD.org>	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation
# c6e0355c	19-Nov-2012	Attilio Rao <attilio@FreeBSD.org>	r16312 is not any longer real since many years (likely since when VFS received granular locking) but the comment present in UFS has been copied all over other filesystems code incorrectly for several times. Removes comments that makes no sense now. Reviewed by: kib MFC after: 3 days
# 1848286a	18-Nov-2012	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add UFS write suspension mechanism, designed to allow userland processes to modify on-disk metadata for filesystems mounted for write. Reviewed by: kib, mckusick Sponsored by: FreeBSD Foundation
# bc2258da	09-Nov-2012	Attilio Rao <attilio@FreeBSD.org>	Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
# f7a3729c	22-Jul-2012	Kevin Lo <kevlo@FreeBSD.org>	Use NULL instead of 0 for pointers
# b569050a	30-May-2012	Konstantin Belousov <kib@FreeBSD.org>	Enable vn_io_fault() lock avoidance for UFS. Tested by: pho MFC after: 2 months
# 72b8ff1c	21-Apr-2012	Edward Tomasz Napierala <trasz@FreeBSD.org>	Fix use-after-free introduced in r234036. Reviewed by: mckusick Tested by: pho
# dca5e0ec	20-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	This update uses the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point to replace MNT_VNODE_FOREACH_ALL in the vfs_msync, ffs_sync_lazy, and qsync routines. The vfs_msync routine is run every 30 seconds for every writably mounted filesystem. It ensures that any files mmap'ed from the filesystem with modified pages have those pages queued to be written back to the file from which they are mapped. The ffs_lazy_sync and qsync routines are run every 30 seconds for every writably mounted UFS/FFS filesystem. The ffs_lazy_sync routine ensures that any files that have been accessed in the previous 30 seconds have had their access times queued for updating in the filesystem. The qsync routine ensures that any files with modified quotas have those quotas queued to be written back to their associated quota file. In a system configured with 250,000 vnodes, less than 1000 are typically active at any point in time. Prior to this change all 250,000 vnodes would be locked and inspected twice every minute by the syncer. For UFS/FFS filesystems they would be locked and inspected six times every minute (twice by each of these three routines since each of these routines does its own pass over the vnodes associated with a mount point). With this change the syncer now locks and inspects only the tiny set of vnodes that are active. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# 71469bb3	17-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if it is DOOMED or other possible end-of-life scenarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# 2b028c25	08-Apr-2012	Edward Tomasz Napierala <trasz@FreeBSD.org>	Fix panic in ffs_reload(), which may happen when read-only filesystem gets resized and then reloaded. Reviewed by: kib, mckusick (earlier version) Sponsored by: The FreeBSD Foundation
# b73ffa31	08-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	Drop an unnecessary setting of si_mountpt when updating a UFS mount point. Clearly it must have been set when the mount was done. Reviewed by: kib
# 1faacf5d	28-Mar-2012	Kirk McKusick <mckusick@FreeBSD.org>	Keep track of the mount point associated with a special device to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib
# ea573a50	28-Mar-2012	Konstantin Belousov <kib@FreeBSD.org>	Do trivial reformatting of the comment to record the missed commit message for r233609: Restore the writes of atimes, quotas and superblock from syncer vnode. Noted by: rdivacky
# a988a5c6	28-Mar-2012	Konstantin Belousov <kib@FreeBSD.org>	Reviewed by: bde, mckusick Tested by: pho MFC after: 2 weeks
# e0c17408	28-Mar-2012	Konstantin Belousov <kib@FreeBSD.org>	Update comment. MFC after: 3 days
# 75a58389	24-Mar-2012	Kirk McKusick <mckusick@FreeBSD.org>	Add a third flags argument to ffs_syncvnode to avoid a possible conflict with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351. Discussed with: kib
# 19c87af0	07-Feb-2012	Kirk McKusick <mckusick@FreeBSD.org>	In the original days of BSD, a sync was issued on every filesystem every 30 seconds. This spike in I/O caused the system to pause every 30 seconds which was quite annoying. So, the way that sync worked was changed so that when a vnode was first dirtied, it was put on a 30-second cleaning queue (see the syncer_workitem_pending queues in kern/vfs_subr.c). If the file has not been written or deleted after 30 seconds, the syncer pushes it out. As the syncer runs once per second, dirty files are trickled out slowly over the 30-second period instead of all at once by a call to sync(2). The one drawback to this is that it does not cover the filesystem metadata. To handle the metadata, vfs_allocate_syncvnode() is called to create a "filesystem syncer vnode" at mount time which cycles around the cleaning queue being sync'ed every 30 seconds. In the original design, the only things it would sync for UFS were the filesystem metadata: inode blocks, cylinder group bitmaps, and the superblock (e.g., by VOP_FSYNC'ing devvp, the device vnode from which the filesystem is mounted). Somewhere in its path to integration with FreeBSD the flushing of the filesystem syncer vnode got changed to sync every vnode associated with the filesystem. The result of this change is to return to the old filesystem-wide flush every 30-seconds behavior and makes the whole 30-second delay per vnode useless. This change goes back to the originally intended trickle out sync behavior. Key to ensuring that all the intended semantics are preserved (e.g., that all inode updates get flushed within a bounded period of time) is that all inode modifications get pushed to their corresponding inode blocks so that the metadata flush by the filesystem syncer vnode gets them to the disk in a timely way. Thanks to Konstantin Belousov (kib@) for doing the audit and commit -r231122 which ensures that all of these updates are being made. Reviewed by: kib Tested by: scottl MFC after: 2 weeks
# cc672d35	16-Jan-2012	Kirk McKusick <mckusick@FreeBSD.org>	Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost. MFC after: 2 weeks
# b60ee81e	14-Jan-2012	Kirk McKusick <mckusick@FreeBSD.org>	Convert FFS mount error messages from kernel printf's to using the vfs_mount_error error message facility provided by the nmount interface. Clean up formatting of mount warnings which still need to use kernel printf's since they do not return errors. Requested by: Craig Rodrigues <rodrigc@crodrigues.org> MFC after: 2 weeks
# fddf7bae	29-Jul-2011	Kirk McKusick <mckusick@FreeBSD.org>	Update to -r224294 to ensure that only one of MNT_SUJ or MNT_SOFTDEP is set so that mount can revert back to using MNT_NOWAIT when doing getmntinfo. Approved by: re (kib)
# d716efa9	24-Jul-2011	Kirk McKusick <mckusick@FreeBSD.org>	Move the MNTK_SUJ flag in mnt_kern_flag to MNT_SUJ in mnt_flag so that it is visible to userland programs. This change enables the `mount' command with no arguments to be able to show if a filesystem is mounted using journaled soft updates as opposed to just normal soft updates. Approved by: re (bz)
# 927a12ae	15-Jul-2011	Kirk McKusick <mckusick@FreeBSD.org>	Add an FFS specific mount option to allow a filesystem checker (typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
# 8795189c	09-Jul-2011	Kirk McKusick <mckusick@FreeBSD.org>	Allow disk partitions associated with UFS read-only mounted filesystems to be opened for writing. This functionality used to be special-cased for just the root filesystem, but with this change is now available for all UFS filesystems. This change is needed for journaled soft updates recovery. Discussed with: Jeff Roberson
# 9420dc62	12-Jun-2011	Kirk McKusick <mckusick@FreeBSD.org>	Disable the soft updates journaling after a filesystem is successfully downgraded to read-only. It will be restarted if the filesystem is upgraded back to read-write.
# 280e091a	10-Jun-2011	Jeff Roberson <jeff@FreeBSD.org>	Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
# 694a586a	21-May-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
# 16b1f68d	20-Mar-2011	Konstantin Belousov <kib@FreeBSD.org>	Retire opt_ffs_broken_fixme.h. Instead of directly calling ffs_snapgone(), use UFS_SNAPGONE() with usual layering. Requested by: bde MFC after: 1 week
# 8c2a54de	28-Dec-2010	Konstantin Belousov <kib@FreeBSD.org>	Add kernel side support for BIO_DELETE/TRIM on UFS. The FS_TRIM fs flag indicates that administrator requested issuing of TRIM commands for the volume. UFS will only send the command to disk if the disk reports GEOM::candelete attribute. Since disk queue is reordered, data block is marked as free in the bitmap only after TRIM command completed. Due to need to sleep waiting for i/o to finish, TRIM bio_done routine schedules taskqueue to set the bitmap bit. Based on the patch by: mckusick Reviewed by: mckusick, pjd Tested by: pho MFC after: 1 month
# fddd463d	01-Dec-2010	Konstantin Belousov <kib@FreeBSD.org>	Journal start looks up .sujournal file by doing lookup on the root dvp. As result, failed softdep_mount() might leave up to two vnodes on the mp mountlist, preventing mnt_ref from going to zero. Call ffs_flushfiles() after failed softdep_mount() to clean mountlist. Initial report by: Garrett Cooper Reproduced and tested by: pho
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# d0cc54f3	10-Oct-2010	Konstantin Belousov <kib@FreeBSD.org>	The r184588 changed the layout of struct export_args, causing an ABI breakage for old mount(2) syscall, since most struct <filesystem>_args embed export_args. The mount(2) is supposed to provide ABI compatibility for pre-nmount mount(8) binaries, so restore ABI to pre-r184588. Requested and reviewed by: bde MFC after: 2 weeks
# 59b3a4eb	17-Sep-2010	David E. O'Brien <obrien@FreeBSD.org>	Correct some non-code typos.
# 3634d5b2	20-Aug-2010	John Baldwin <jhb@FreeBSD.org>	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
# 61e1c193	16-Jul-2010	John Baldwin <jhb@FreeBSD.org>	Revert the previous commit. The race is not applicable to the lockmgr implementation in 8.0 and later as its flags field does not hold dynamic state such as waiters flags, but is only modified in lockinit() aside from VN_LOCK_*(). Discussed with: attilio
# dbfcf8cf	16-Jul-2010	John Baldwin <jhb@FreeBSD.org>	When the MNTK_EXTENDED_SHARED mount option was added, some filesystems were changed to defer the setting of VN_LOCK_ASHARE() (which clears LK_NOSHARE in the vnode lock's flags) until after they had determined if the vnode was a FIFO. This occurs after the vnode has been inserted into a VFS hash or some similar table, so it is possible for another thread to find this vnode via vget() on an i-node number and block on the vnode lock. If the lockmgr interlock (vnode interlock for vnode locks) is not held when clearing the LK_NOSHARE flag, then the lk_flags field can be clobbered. As a result the thread blocked on the vnode lock may never get woken up. Fix this by holding the vnode interlock while modifying the lock flags in this case. MFC after: 3 days
# 710b73c8	23-May-2010	Andriy Gapon <avg@FreeBSD.org>	MFC r208293: ffs_mount: accept and drop userland-only options that can be passed from loader(8) PR: kern/141050
# 57da3de3	23-May-2010	Andriy Gapon <avg@FreeBSD.org>	MFC r207366: ffs_vfsops: restore alphabetic order of options in ffs_opts
# 0b962648	19-May-2010	Andriy Gapon <avg@FreeBSD.org>	ffs_mount: accept and drop userland-only options that can be passed from loader(8) In r193192 loader(8) has grown an ability to pass root mount options from fstab via vfs.root.mountfrom.options. Unfortunately, some options that can be present in fstab are for userland only and lead to root mounting failure when seen by kernel. Rather than teaching loader about FFS-specific options that should be filtered out, ffs_mount recognizes those options as valid, but ignores and deletes[1] them. [1] is suggested by jh. PR: kern/141050 Reported by: many Reviewed by: jh, bde MFC after: 4 days
# deb3b115	29-Apr-2010	Andriy Gapon <avg@FreeBSD.org>	ffs_vfsops: restore alphabetic order of options in ffs_opts The order was not correct only for nfsv4acls. ("no" prefix is ignored) MFC after: 1 week
# 113db2dd	24-Apr-2010	Jeff Roberson <jeff@FreeBSD.org>	- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
# 0718d64d	18-Apr-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	MFC r200796: Implement NFSv4 ACL support for UFS. Reviewed by: rwatson
# 8eb59aff	12-Apr-2010	Andriy Gapon <avg@FreeBSD.org>	MFC r206128: ffs_mount: remove redundant assignment of geom consumer to devvp.v_bufobj
# ecaf3257	03-Apr-2010	Andriy Gapon <avg@FreeBSD.org>	ffs_mount: remove redundant assignment of geom consumer to devvp.v_bufobj The assignment is already done in g_vfs_open. Redundant assignment is harmless, but can become a problem if g_vfs_open logic is changed. MFC after: 1 week
# 61996181	10-Feb-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove unused variable.
# 9340fc72	21-Dec-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Implement NFSv4 ACL support for UFS. Reviewed by: rwatson
# b67ca899	09-Sep-2009	Konstantin Belousov <kib@FreeBSD.org>	MFC r196920: insmntque_stddtr() clears vp->v_data and resets vp->v_op to dead_vnodeops before calling vgone(). Revert r189706 and corresponding part of the r186560. Approved by: re (kensmith)
# 6cc745d2	07-Sep-2009	Konstantin Belousov <kib@FreeBSD.org>	insmntque_stddtr() clears vp->v_data and resets vp->v_op to dead_vnodeops before calling vgone(). Revert r189706 and corresponding part of the r186560. Noted and reviewed by: tegge Approved by: des (pseudofs part) MFC after: 3 days
# bcf11e8d	05-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
# dfd233ed	11-May-2009	Attilio Rao <attilio@FreeBSD.org>	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and related parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (e.g. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavily changed by this commit so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal such situation.
# c1d8b5e8	16-Mar-2009	Konstantin Belousov <kib@FreeBSD.org>	Fix two issues with bufdaemon, often causing the processes to hang in the "nbufkv" sleep. First, ffs background cg group block write requests a new buffer for the shadow copy. When ffs_bufwrite() is called from the bufdaemon due to buffers shortage, requesting the buffer deadlocks bufdaemon. Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request getblk to not block while allocating the buffer, and return failure instead. Add a flag argument to the geteblk to allow to pass the flags to getblk(). Do not repeat the getnewbuf() call from geteblk if buffer allocation failed and either GB_NOWAIT_BD is specified, or geteblk() is called from bufdaemon (or its helper, see below). In ffs_bufwrite(), fall back to synchronous cg block write if shadow block allocation failed. Since r107847, buffer write assumes that vnode owning the buffer is locked. The second problem is that buffer cache may accumulate many buffers belonging to limited number of vnodes. With such workload, quite often threads that own the mentioned vnode locks are trying to read another block from the vnodes, and, due to buffer cache exhaustion, are asking bufdaemon for help. Bufdaemon is unable to make any substantial progress because the vnodes are locked. Allow the threads owning vnode locks to help the bufdaemon by doing the flush pass over the buffer cache before getnewbuf() is going to uninterruptible sleep. Move the flushing code from buf_daemon() to new helper function buf_do_flush(), that is called from getnewbuf(). The number of buffers flushed by single call to buf_do_flush() from getnewbuf() is limited by new sysctl vfs.flushbufqtarget. Prevent recursive calls to buf_do_flush() by marking the bufdaemon and threads that temporarily help bufdaemon by TDP_BUFNEED flag. In collaboration with: pho Reviewed by: tegge (previous version) Tested by: glebius, yandex ... MFC after: 3 weeks
# e65f5a4e	11-Mar-2009	Konstantin Belousov <kib@FreeBSD.org>	The non-modifying EA VOPs are executed with only shared vnode lock taken. Provide a custom lock around initializing and tearing down EA area, to prevent both memory leaks and double-free of it. Count the number of EA area accessors. Lock protocol requires either holding exclusive vnode lock to modify i_ea_area, or shared vnode lock and owning IN_EA_LOCKED flag in i_flag. Noted by: YAMAMOTO, Taku <taku tackymt homeip net> Tested by: pho (previous version) MFC after: 2 weeks
# a9d95371	11-Mar-2009	Konstantin Belousov <kib@FreeBSD.org>	Do not double-free the struct inode when insmntque failed. Default insmntque destructor reclaims the vnode, and ufs_reclaim frees the memory. Reviewed by: tegge MFC after: 3 days
# 33fc3625	11-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF. Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month
# 5bd65606	09-Mar-2009	John Baldwin <jhb@FreeBSD.org>	Adjust some variables (mostly related to the buffer cache) that hold address space sizes to be longs instead of ints. Specifically, the following values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require some gross shims to allow for that. MFC after: 1 month
# 4f560d75	23-Feb-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Right now, when trying to unmount a device that's already gone, msdosfs_unmount() and ffs_unmount() exit early after getting ENXIO. However, dounmount() treats ENXIO as a success and proceeds with unmounting. In effect, the filesystem gets unmounted without closing GEOM provider etc. Reviewed by: kib Approved by: rwatson (mentor) Tested by: dho Sponsored by: FreeBSD Foundation
# 3c140b2d	23-Feb-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Refactor, moving error checking outside of the 'if (mp->mnt_flag & MNT_SOFTDEP)' conditional. No functional changes. Reviewed by: kib Approved by: rwatson (mentor) Tested by: pho Sponsored by: FreeBSD Foundation
# ee445a69	11-Feb-2009	John Baldwin <jhb@FreeBSD.org>	- If the g_access() call for the initial root mount fails, then fully cleanup. Before the GEOM consumer would not have been closed. - Bump the reference on the character device being mounted while the associated devfs vnode is locked. Reviewed by: kib
# 49c4791c	29-Jan-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Make sure the cdev doesn't go away while the filesystem is still mounted. Otherwise dev2udev() could return garbage. Reviewed by: kib Approved by: rwatson (mentor) Sponsored by: FreeBSD Foundation
# df86ccf6	07-Jan-2009	Konstantin Belousov <kib@FreeBSD.org>	If unmount of the ffs mp failed, reinitialize the extended attributes for the mp, and restart them if autostart is enabled. Reported and tested by: pho Reviewed by: rwatson MFC after: 3 weeks
# 15bc6b2b	28-Oct-2008	Edward Tomasz Napierala <trasz@FreeBSD.org>	Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
# 1ede983c	23-Oct-2008	Dag-Erling Smørgrav <des@FreeBSD.org>	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 0d7935fd	10-Oct-2008	Attilio Rao <attilio@FreeBSD.org>	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# f8886347	24-Sep-2008	John Baldwin <jhb@FreeBSD.org>	Enable shared lookups on UFS. There are some remaining issues with forced unmounts, but those are in the VFS lookup code and are not UFS specific. Tested by: pho, kris
# 6fecb4e4	16-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	Suspend the write operations on the UFS filesystem being unmounted or remounted from rw to ro. Proposed and reviewed by: tegge In collaboration with: pho MFC after: 1 month
# 2814d5ba	16-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	When attempt is made to suspend a filesystem that is already suspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write_resume() when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows to bypass the suspension point in the vn_start_write. It is intended for use by VFS in the situations where the suspender wants to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month
# 52dfc8d7	16-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	Add the ffs structures introspection functions for ddb. Show the b_dep value for the buffer in the show buffer command. Add a command to dump the dirty/clean buffer list for vnode. Reviewed by: tegge Tested and used by: pho MFC after: 1 month
# 90446e36	16-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	When downgrading the read-write mount to read-only, do_unmount() sets MNT_RDONLY flag before the VFS_MOUNT() is called. In ufs_inactive() and ufs_itimes_locked(), UFS verifies whether the fs is read-only by checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag for inode on the fs being remounted rw->ro. Introduce UFS_RDONLY() struct ufsmount method that reports the value of the fs_ronly. The latter is set to 1 only after the remount is finished. Reviewed by: tegge In collaboration with: pho MFC after: 1 month
# 7b7ed832	28-Aug-2008	Konstantin Belousov <kib@FreeBSD.org>	Softdep code may need to instantiate vnode when processing dependencies. In particular, it may need this while syncing filesystem being unmounted. Since during unmount MNTK_NOINSMNTQUE flag is set, that could sometimes disallow insertion of the vnode into the vnode mount list, softdep code needs to overwrite the MNTK_NOINSMNTQUE flag. Create the ffs_vgetf() function that sets the VV_FORCEINSMQ flag for new vnode and use it consistently from the softdep code instead of ffs_vget(). Add the retry logic to the softdep_flushfiles() to flush the vnodes that could be instantiated while flushing softdep dependencies. Tested by: pho, kris Reviewed by: tegge MFC after: 1 month
# e792b09b	09-Aug-2008	Konstantin Belousov <kib@FreeBSD.org>	Revert r181345. Move the NULL pointer check to the vfs_deleteopt() function. Discussed with: rodrigc MFC after: 3 days
# a1a917e0	06-Aug-2008	Konstantin Belousov <kib@FreeBSD.org>	User may do "mount -o snapshot ...", that causes new FFS mount to be performed with snapshot option, while the mp->mnt_opt is NULL. Protect against NULL pointer dereference. Noted by: Mateusz Guzik <mjguzik gmail com> MFC after: 3 days
# a80d8caa	19-Jul-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Say hi to svn, by simplifying ffs_vget() function a bit - there is no need for a variable that is used only once.
# fb77e0af	23-May-2008	Craig Rodrigues <rodrigc@FreeBSD.org>	After converting the "snapshot" mount option to the MNT_SNAPSHOT flag, delete "snapshot" from the persistent mount options list. This should fix problems with doing a mount -o snapshot of a file system, followed by an NFS export of the same file system. PR: 122833 Reported by: Leon Kos <leon.kos lecad fs uni-lj si>, Jaakko Heinonen <jh saunalahti fi> MFC after: 1 month
# 02a871f1	23-May-2008	Craig Rodrigues <rodrigc@FreeBSD.org>	For the following mount options, do not perform the string to flag conversions here, because we already do them further up in vfs_donmount() in vfs_mount.c async -> MNT_ASYNC force -> MNT_FORCE multilabel -> MNT_MULTILABEL noatime -> MNT_NOATIME noclusterr -> MNT_NOCLUSTERR noclusterw -> MNT_NOCLUSTERW MFC after: 1 month
# d952ba1b	26-Mar-2008	John Baldwin <jhb@FreeBSD.org>	Fix a nit with the 'nofoo' options where 'foo' is mapped to 'nonofoo' (such as 'atime' vs 'noatime'). The filesystems will always see either 'nofoo' or 'nonofoo', never plain 'foo'. As such, their list of valid mount options should include 'nofoo' instead of 'foo'. With this fix, you can do 'mount -u -o atime' on a FFS filesystem that isn't marked as noatime without getting an error. You can also update a noatime FFS filesystem mounted via mount(2) (e.g. 6.x /sbin/mount binary) to 'atime' using nmount(2) (e.g. 7.x /sbin/mount binary). MFC after: 1 week Reviewed by: crodig
# 698b1a66	22-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
# e7fd8877	05-Mar-2008	Konstantin Belousov <kib@FreeBSD.org>	Initialize mnt_stat.f_iosize before autostarting UFS1 extattrs. It is normally initialized by ffs_statfs() after ffs_mount finished. The extattr autostart code calls the ufs_lookup(), that uses value above to iterate over the directory blocks, see bmask initialization in the ufs_lookup() and ufsdirhash. Having the filesystem with root directory spanning more than one block would result in reading a random kernel memory. PR: kern/120781 Test case provided by: rwatson MFC after: 1 week
# 6cf7bc60	03-Mar-2008	Robert Watson <rwatson@FreeBSD.org>	Move setting of MNTK_MPSAFE flag before UFS1 extended attribute auto-start so that the flag is set before we start performing I/O in the auto-start routine. MFC after: 2 weeks Suggested by: kib
# 628f51d2	24-Feb-2008	Attilio Rao <attilio@FreeBSD.org>	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead of spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
# 0e9eb108	23-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and not currently used. Hopefully this will soon be replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Additionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavily broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo
# d638e093	19-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	- Introduce the function lockmgr_recursed() which returns true if the lockmgr lkp, when held in exclusive mode, is recursed - Introduce the function BUF_RECURSED() which does the same for bufobj locks based on the top of lockmgr_recursed() - Introduce the function BUF_ISLOCKED() which works like the counterpart VOP_ISLOCKED(9), showing the state of lockmgr linked with the bufobj BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of bogus BUF_REFCNT() in a more explicit and SMP-compliant way. This allows us to axe BUF_REFCNT() and leave the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well as part of lockmgr() cleanup. KPI results, obviously, broken so further commits will update manpages and freebsd version. Tested by: kris (on UFS and NFS)
# 22db15c0	13-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the useless extra argument and pass curthread explicitly to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# cb05b60a	09-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This change makes the code cleaner and in particular removes an annoying dependency, helping the next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, it is worth saying that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 30d239bc	24-Oct-2007	Robert Watson <rwatson@FreeBSD.org>	Merge first in a series of TrustedBSD MAC Framework KPI changes from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updated as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
# 77465d93	16-Oct-2007	Alfred Perlstein <alfred@FreeBSD.org>	Get rid of qaddr_t. Requested by: bde
# 04533fc6	04-Apr-2007	Xin LI <delphij@FreeBSD.org>	Use *_EMPTY macros when appropriate.
# 36d46679	20-Mar-2007	Konstantin Belousov <kib@FreeBSD.org>	Mark UFS as being MP-Safe in "options QUOTA" case too. Remove no longer necessary Giant acquisitions in softdep processing code. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith)
# 088ffd20	14-Mar-2007	Konstantin Belousov <kib@FreeBSD.org>	Implement fine-grained locking for UFS quotas. Each struct dquot gets dq_lock mutex to protect dq_flags and to interlock with DQ_LOCK. qhash, dqfreelist and dq.dq_cnt are protected by global dqhlock mutex. i_dquot array for inode is protected by lockmgr's vnode lock, corresponding assert added to the dqget(). Access to struct ufsmount quota-related fields (um_quotas and um_qflags) is protected by um_lock. Tested by: Peter Holm Reviewed by: tegge Approved by: re (kensmith) This work would not have been possible without the enormous amount of help given by Tor Egge and Peter Holm. Tor reviewed each version of the patch, pointed out numerous errors and provided invaluable suggestions. Peter did tireless testing of the patch as it was developed.
# 61b9d89f	12-Mar-2007	Tor Egge <tegge@FreeBSD.org>	Make insmntque() externally visible and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnodes belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
# 10bcafe9	15-Feb-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
# 2cc7d26f	23-Jan-2007	Konstantin Belousov <kib@FreeBSD.org>	Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquisition. Avoid dealing with buffers in bdwrite() that are from the other side of the snaplock divisor in the lock order than the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)
# acd3428b	06-Nov-2006	Robert Watson <rwatson@FreeBSD.org>	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
# 1a60c7fc	31-Oct-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups to look for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl
# aed55708	22-Oct-2006	Robert Watson <rwatson@FreeBSD.org>	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
# 8d0547c6	25-Sep-2006	Tor Egge <tegge@FreeBSD.org>	Protect change to bo_flag by holding the bufobj mutex.
# 5da56ddb	25-Sep-2006	Tor Egge <tegge@FreeBSD.org>	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
# 5fe6d2be	09-Jul-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Declare UFS module version.
# 946478fc	09-Jul-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Change fs->fs_fsmnt to mp->mnt_stat.f_mntonname in warnings about missing MAC and ACLs support in the kernel. If it is a first mount, fs->fs_fsmnt is empty. MFC after: 1 week
# 71ac2d7c	03-Jun-2006	Craig Rodrigues <rodrigc@FreeBSD.org>	Check the sectorsize of the underlying disk before trying to bread() the UFS superblock. Should eliminate crashes when trying to do: mount -t ufs on an audio CD. PR: kern/85893 Reported by: Russell Francis <rfrancis at ev dot net> MFC after: 1 week
# ee98eb82	25-May-2006	Craig Rodrigues <rodrigc@FreeBSD.org>	Remove "update" from ffs_opts. It has been moved to global_opts in vfs_mount.c.
# 5eb304a9	25-May-2006	Craig Rodrigues <rodrigc@FreeBSD.org>	Remove calls to vfs_export() for exporting a filesystem for NFS mounting from individual filesystems. Call it instead in vfs_mount.c, after we call VFS_MOUNT() for a specific filesystem.
# 4ba8c2a5	23-May-2006	Craig Rodrigues <rodrigc@FreeBSD.org>	Take errmsg out of ffs_opts. It is already part of global_opts in vfs_mount.c.
# 868bb88f	02-May-2006	Tor Egge <tegge@FreeBSD.org>	Temporarily undo clusters contribution to global runningbufspace while handling copy on write for the buffers taking part in the cluster.
# cbd6fedb	27-Apr-2006	Scott Long <scottl@FreeBSD.org>	Fix a typo.
# 6ca9fcc5	27-Apr-2006	Jeff Roberson <jeff@FreeBSD.org>	- Add a BO_NEEDSGIANT flag to the bufobj. This flag forces all child buffers to go on the buf daemon's DIRTYGIANT queue. - Set BO_NEEDSGIANT on ffs's devvp since the ffs_copyonwrite handler runs in the context of the buf daemon and may require Giant.
# 7b3f1bbd	21-Apr-2006	Tom Rhodes <trhodes@FreeBSD.org>	Revert previous to this file before an actual request is made.
# 8fc22c9d	21-Apr-2006	Tom Rhodes <trhodes@FreeBSD.org>	Remove what I believe are two useless ifdefs. If a user or administrator enables multilabel, or any option for that matter, most likely they have a reason. This will allow users to see that multilabel is enabled via an issued "mount" command and remove an annoying warning - printed only when a MAC kernel is not installed - on boot up. Discussed with: green, brueffer, Samy Al Bahra. Probably ran past: csjp (though I can't remember).
# 3bbd6d8a	30-Mar-2006	Jeff Roberson <jeff@FreeBSD.org>	- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs(). Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
# 700118c7	19-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Allow compilation when not using softupdates.
# 7de3839d	19-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Let snapshots make a copy of old contents for all buffers taking part in a cluster instead of just the first buffer. Delay buf_start() calls until snapshots have a copy of old content. PR: kern/93942
# 95e7a3c3	19-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Reduce probability of unmount failing after having unmounted snapshots.
# ca2fa807	10-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Block secondary writes while expunging active unlinked files. Fix detection of active unlinked files by checking VI_OWEINACT and VI_DOINGINACT in addition to v_usecount. Defer inactive handling for unlinked files if the file system is mostly suspended (secondary writes being blocked). Perform deferred inactive handling after the file system is resumed.
# 1e70cd7f	09-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Remove unneeded (and broken) usage of MNT_REF()/MNT_REL().
# 791dd2fa	08-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
# 82be0a5a	09-Jan-2006	Tor Egge <tegge@FreeBSD.org>	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
# b6bd025c	24-Nov-2005	Craig Rodrigues <rodrigc@FreeBSD.org>	Fix parsing of atime, clusterr, clusterw, exec, suid, symfollow mount options. Noticed by: Amir Shalem < amir at boom dot org dot il>
# cea90362	20-Nov-2005	Craig Rodrigues <rodrigc@FreeBSD.org>	If export mount flag is not passed in, set default parameters for export structure and pass that to vfs_export(). Currently in userland mount(8), an export structure is unconditionally passed in, only for UFS. This is an attempt to move that UFS-specific behavior out of mount(8) and into the UFS filesystem code.
# 359d4388	19-Nov-2005	Craig Rodrigues <rodrigc@FreeBSD.org>	Add more options to ffs_opts, so that vfs_filteropts() will not complain when we pass these options to a UFS filesystem as strings via nmount(): noexec, nosuid, nosymfollow, sync, suiddir
# 26f59b64	17-Nov-2005	Craig Rodrigues <rodrigc@FreeBSD.org>	- Add parsing for the following existing UFS/FFS mount options in the nmount() callpath via vfs_getopt(), and set the appropriate MNT_* flag: -> acls, async, force, multilabel, noasync, noatime, -> noclusterr, noclusterw, snapshot, update - Allow errmsg as a valid mount option via vfs_getopt(), so we can later add a hook to propagate mount errors back to userspace via vfs_mount_error().
# 8680d698	20-Oct-2005	Nate Lawson <njl@FreeBSD.org>	Adjust maxfilesize for UFS1 and old 4.4 FFS. For UFS1, increase the limit to (max block - 1) * bsize. For DEV_BSIZE, this doubles the limit from 0.5 TB to 1 TB. For the old 4.4 FFS case, decrease the limit from 0.5 TB to 2 GB - 1. Older systems had a 32 bit off_t so they couldn't access the larger files anyway. Collaboration with: bde
# 9248a827	09-Oct-2005	Tor Egge <tegge@FreeBSD.org>	Don't pretend that a failed sync write was successful.
# fdedad76	02-Sep-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	ffs_mountfs() needs devvp to be locked, so lock it. Glanced at by: phk Tested by: pjd MFC after: 3 days
# 93373c42	21-Aug-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Set the mountpoint path in the superblock (fs_fsmnt) at mount-time so that it appears in the various messages (not cleanly unmounted, filesystem full, etc). This has been broken since rev 1.261.
# ec9c9e73	20-Jul-2005	Alan Cox <alc@FreeBSD.org>	Eliminate inconsistency in the setting of the B_DONE flag. Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio(). Note: I don't believe that the other, more widely-used b_iodone callbacks are affected. Discussed with: jeff Reviewed by: phk MFC after: 2 weeks
# 204ec66d	30-May-2005	Jeff Roberson <jeff@FreeBSD.org>	- Don't set our bio op to be a READ when we've just completed a write. There are subtle differences in the read and write completion path. Instead, grab an extra write ref so the write path can drop it when we recursively call bufdone(). I believe this may be the source of the wrong bufobj panics. Reported by: pho, kkenn
# 41d4783d	03-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- In ffs_sync we need to pass LK_SLEEPFAIL in when we lock the vnode because it may change identities while we're sleeping on the lock. Otherwise we may bail out of ffs_sync() early due to an error from deadfs. - Collapse a VOP_UNLOCK, vrele into a single vput().
# 153910e0	03-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix two bugs. - ffs_disk_prewrite was pulling the vp from the buf and checking for COPYONWRITE, when really it wanted the vp from the bufobj that we're writing to, which is the devvp. This led to us skipping the copy on write to all file data, which significantly broke snapshots for the last few months. - When the SOFTUPDATES option was not included in the kernel config we would also skip the copy on write check, which would effectively disable snapshots. - Remove an invalid mp_fixme(). Debugging tips from: mckusick Reported by: iedowse, others Discussed with: phk
# aa7ba427	30-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- FFS supports shared locks, clear LK_NOSHARE from our vnode locks. Sponsored by: Isilon Systems, Inc.
# d6919865	29-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Upgrade a shared lock request to exclusive in ffs_vget() if we have to create the vnode. Sponsored by: Isilon Systems, Inc.
# 51f5ce0c	16-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Add two arguments to the vfs_hash() KPI so that filesystems which do not have unique hashes (NFS) can also use it.
# de68347b	15-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Don't hold a reference on the disk vnode for each inode.
# 45c26fa2	15-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Improve the vfs_hash() API: vput() the unneeded vnode centrally to avoid replicating the vput in all the filesystems.
# e82ef95c	15-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Simplify the vfs_hash calling convention.
# 14bc0685	14-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Use vfs_hash instead of home-rolled.
# fe68abe2	12-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK. - Shorten ffs_reload by one step. The old check for an inactive vnode was slightly racey, and the code which deals with still active vnodes is not much more expensive. Sponsored by: Isilon Systems, Inc.
# dfd4be14	19-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@). Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.
# adf41577	09-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Make some SYSCTL_NODEs and some of FFS's VFS_ methods static.
# 02f2c6a9	08-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Split the vop_vector for ffs1 and ffs2, this is mostly for the different EXTATTR support.
# dd19a799	08-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Background writes are entirely an FFS/Softupdates thing. Give FFS vnodes a specific bufwrite method which contains all the background write stuff and then calls into the default bufwrite() for the rest of the job. Remove all the background write related stuff from the normal bufwrite. This drags the softdep_move_dependencies() back into FFS. Long term, it is worth looking at simply copying the data into allocated memory and issuing the bio directly and not create the "shadow buf" in the first place (just like copy-on-write is done in snapshots for instance). I don't think we really gain anything but complexity from doing this with a buf.
# efd6d980	08-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Don't use the UFS_* and VFS_* functions where a direct call is possible. The UFS_ functions are for UFS to call back into VFS. The VFS functions are external entry points into the filesystem.
# 40854ff5	08-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	For snapshots we need all VOP_LOCKs to be exclusive. The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock(). Also, as long as we have not abandoned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.
# 84a69752	25-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce and use g_vfs_close().
# ce12d37e	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Don't create vnode_pager objects for the disk device. geom_vfs will do that.
# 3ba649d7	24-Jan-2005	Jeff Roberson <jeff@FreeBSD.org>	- Initialize and destroy the per-filesystem ufs lock where appropriate. - Use the buffer lock on the superblock buf to serialize calls to sbupdate. - Set the MNTK_MPSAFE flag when QUOTA is not defined in the kernel. Sponsored By: Isilon Systems, Inc.
# 39cfb239	15-Jan-2005	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Fix ACLs handling for the root file system. Without this fix, when ACLs are set via tunefs(8) on the root file system, they are removed on boot when 'mount -a' is called, because mount(8) called for the root file system always add MNT_UPDATE flag and MNT_UPDATE flag isn't perfect. Now, one cannot remove ACLs stored in superblock (configured with tunefs(8)) via 'mount -a' nor 'mount -u -o noacls <file system>', but it is still possible to mount file system which doesn't have ACLs in superblock via 'mount -o acls <file system>' or /etc/fstab's 'acls' option. Reported by: Lech Lorens/pl.comp.os.bsd Discussed with: phk, rwatson Reviewed by: rwatson MFC after: 2 weeks
# 7c0745ee	14-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Eliminate unused and unnecessary "cred" argument from vinvalbuf()
# e39db32a	12-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.
# 6ef8480a	11-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.
# 8df6bac4	11-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to be the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson
# 60727d8b	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes
# 4a18054d	12-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	With the introduction of UFS2 we started looking for superblocks in four different locations on a prospective filesystem. If we found none, we forgot to invalidate the four buffers, thus the following sequence would fail: (md0 = blank disk) mount /dev/md0 /mnt (fails, no superblocks) newfs /dev/md0 (writes using physio which does not go through buffercache). mount /dev/md0 /mnt (still fails, the four cached buffers still contain no superblocks) Found by: ru
# f21cc2ca	07-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Fix nfs exports (for now). The real fix is to teach mountd about nmount.
# 20a92a18	07-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworthy anyway. Correct information is provided by statfs(/).
# 74331236	05-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but in a few cases it isn't. Most of the implementations have grown weeds for this so they copy some fields from mnt_stat if the passed argument isn't that. Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.
# 93e0b506	03-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	typo in comment.
# aec0fb7b	01-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to believe this would ever be feasible in the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and prototype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
# 964ebefd	25-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Use system wide no-op vfs_start function.
# 51ac12ab	13-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Be prepared to accept NULL mountargs as part of root-mounting.
# cf5e4149	12-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Put back the vfs_object_create() calls, they do make a difference when my test-setup does what I want it to instead of what I ask it to. Pointed out by: tegge
# 40ce27cb	09-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	fix some comments
# 2e664919	09-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Use mount flags instead of NULL path to detect root filesystem mount.
# 5e2ccaff	09-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Stop pretending to have a vm_object backing the underlying disk vnode: it isn't used for anything anywhere and the vnode_pager would explode if we attempted to.
# 40c340aa	04-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Don't grab the exclusive bit on a root filesystem until we are willing to mount it. Doing so prevented fsck from being run after a refused mount.
# 43920011	29-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Move UFS from DEVFS backing to GEOM backing. This eliminates a bunch of vnode overhead (approx 1-2 % speed improvement) and gives us more control over the access to the storage device. Access counts on the underlying device are not correctly tracked and therefore it is possible to read-only mount the same disk device multiple times: syv# mount -p /dev/md0 /var ufs rw 2 2 /dev/ad0 /mnt ufs ro 1 1 /dev/ad0 /mnt2 ufs ro 1 1 /dev/ad0 /mnt3 ufs ro 1 1 Since UFS/FFS is not a synchronously consistent filesystem (ie: it caches things in RAM) this is not possible with read-write mounts, and the system will correctly reject this. Details: Add a geom consumer and a bufobj pointer to ufsmount. Eliminate the vnode argument from softdep_disk_prewrite(). Pick the vnode out of bp->b_vp for now. Eventually we should find it through bp->b_bufobj->b_private. In the mountcode, use g_vfs_open() once we have used VOP_ACCESS() to check permissions. When upgrading and downgrading between r/o and r/w do the right thing with GEOM access counts. Remove all the workarounds for not being able to do this with VOP_OPEN(). If we are the root mount, drop the exclusive access count until we upgrade to r/w. This allows fsck of the root filesystem and the MNT_RELOAD to work correctly. Set bo_private to the GEOM consumer on the device bufobj. Change the ffs_ops->strategy function to call g_vfs_strategy() In ufs_strategy() directly call the strategy on the disk bufobj. Same in rawread. In ffs_fsync() we will no longer see VCHR device nodes, so remove code which synced the filesystem mounted on it, in case we came there. I'm not sure this code made sense in the first place since we would have taken the specfs route on such a vnode. Redo the highly bogus readblock() function in the snapshot code to something slightly less bogus: Constructing an uio and using physio was really quite a detour. Instead just fill in a bio and ship it down.
# 570a7dda	28-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	We only support backing UFS/FFS with disks.
# 8dd56505	26-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	White space changes. Add missing static.
# 6e77a041	26-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	The island council met and voted buf_prewrite() home. Give ffs its own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection. Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS. Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.
# 5d9d81e7	26-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Put the I/O block size in bufobj->bo_bsize. We keep si_bsize_phys around for now as that is the simplest way to pull the number out of disk device drivers in devfs_open(). The correct solution would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably moot when filesystems sit on GEOM, so don't bother for now.
# 156cb265	25-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Lose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().
# ee1d0eb3	25-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove vnode->v_bsize. This was a dead-end.
# 8d02a378	05-Oct-2004	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Back out changes which were introduced to delay mounting root file system. Those changes were made on gmirror needs, but now gmirror handles this by itself.
# 4f116178	28-Sep-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove support for accessing device nodes in UFS/FFS. Device nodes can still be created and exported with NFS.
# 5a19f8b0	23-Sep-2004	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Introduce new /boot/loader.conf variable: root_mount_delay. It can be used to delay mounting root partition to give a chance to GEOM providers to show up. Now, when there is no needed provider, vfs_rootmount() function will look for it every second and if it can't be found within the defined time, it'll ask for root device name (before this change it was done immediately). This will allow to boot from gmirror device in degraded mode.
# 5e8c582a	30-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activities on the 5-stable branch. s/vfs_mount/vfs_omount/ s/vfs_nmount/vfs_mount/ Name our filesystems' mount function consistently. Eliminate the nameidata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused. Reorganize the root filesystem selection code.
# d634f693	28-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove global variable rootdevs and rootvp, they are unused as such. Add local rootvp variables as needed. Remove checks for miniroots in the swap partition. We never did that and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.
# b403319b	28-Jul-2004	Alexander Kabaev <kan@FreeBSD.org>	Avoid using casts as lvalues. Introduce DIP_SET macro which sets proper inode field based on UFS version. Use DIP to read values and DIP_SET to modify them throughout FFS code base.
# d8d3d415	14-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Make sure to update the mnt_stats before UFS1 extattr tries to do I/O on the device. Otherwise the blocksize is undefined in the buffer cache.
# f257b7a5	12-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
# c94cd5fc	07-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Explicitly initialize vp->v_bsize.
# e3c5a7a4	04-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On an SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously gets to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.
# 89c9c53d	16-Jun-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
# 451079d4	29-Apr-2004	Bosko Milekic <bmilekic@FreeBSD.org>	Revert previous change to this file because it breaks some things which compare /etc/fstab entries to results from getfsstat(). The real way to fix this is to make 'ufs2' a recognized filesystem (for real, no beating around the bush). This should fix things like 'umount -a -t ufs' now. Apologies for the previous breakage.
# 2aebb586	26-Apr-2004	Bosko Milekic <bmilekic@FreeBSD.org>	The previous change to mount(8) to report ufs or ufs2 used libufs, which only works for Charlie root. This change reverts the introduction of libufs and moves the check into the kernel. Since the f_fstypename is the same for both ufs and ufs2, we check fs_magic for presence of ufs2 and copy "ufs2" explicitly instead. Submitted by: Christian S.J. Peron <maneo@bsdpro.com>
# 012d4134	06-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights. Approved by: core, rwatson
# e9827c6d	13-Feb-2004	Bruce Evans <bde@FreeBSD.org>	Fixed some style bugs: - don't unlock the vnode after vinvalbuf() only to have to relock it almost immediately. - don't refer to devices classified by vn_isdisk() as block devices.
# 0efb1394	12-Feb-2004	Bruce Evans <bde@FreeBSD.org>	MFextfs: backed out secondary changes in rev.1.40 that had become just style bugs (a variable that is used only once, and misformattings).
# 8adff5fc	12-Feb-2004	Bruce Evans <bde@FreeBSD.org>	Fixed some minor style bugs (English usage and formatting of binary operators) in and near revs.1.169-1.170 (open mode bandaid). This (or better a proper fix) should have been done before cloning the bandaid to many other file systems.
# 31c81e4b	06-Dec-2003	Don Lewis <truckman@FreeBSD.org>	Set fs_ronly to the correct value in ffs_reload() when reloading the file system super block after fsck has repaired the file system. The value of fs_ronly was getting overwritten, which caused ffs_update() to attempt to update inode timestamps even though the file system was still mounted read-only. This fixes the "giving up on N buffers" error that is triggered by running fsck on the root file system and then rebooting without mounting the file system read-write.
# fde81c7d	12-Nov-2003	Kirk McKusick <mckusick@FreeBSD.org>	Update the statfs structure with 64-bit fields to allow accurate reporting of multi-terabyte filesystem sizes. You should build and boot a new kernel BEFORE doing a `make world' as the new kernel will know about binaries using the old statfs structure, but an old kernel will not know about the new system calls that support the new statfs structure. Running an old kernel after a `make world' will cause programs such as `df' that do a statfs system call to fail with a bad system call. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Tim Robbins <tjr@freebsd.org> Reviewed by: Julian Elischer <julian@elischer.org> Reviewed by: the hoards of <arch@freebsd.org> Sponsored by: DARPA & NAI Labs.
# ca430f2e	04-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
# 45d45c6c	02-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	Use VOP_UNLOCK/vrele instead of vput. td was received as a parameter and one cannot be sure it is equal to curthread.
# cb9ddc80	01-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case where td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.
# 492c1e68	31-Oct-2003	Alexander Kabaev <kan@FreeBSD.org>	Temporarily undo parts of the struct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff
# 69b609a8	05-Oct-2003	Jeff Roberson <jeff@FreeBSD.org>	- The VCHR case in ffs_sync() is an unnecessary optimization, especially considering how infrequently we access devices via ffs now that we have devfs. Collapse this case with the other case. Obtained from: bde
# ab1f917b	05-Oct-2003	Jeff Roberson <jeff@FreeBSD.org>	- Further simplify ffs_sync(). The vnode lock is required for UFS_UPDATE() so make the code slightly more uniform. The vnode lock is acquired in all cases and now the only difference between VCHR and other is we call UFS_UPDATE instead of VOP_FSYNC().
# 2f05568a	05-Oct-2003	Jeff Roberson <jeff@FreeBSD.org>	- Check the XLOCK before inspecting v_data. - Slightly rewrite the fsync loop to be more lock friendly. We must acquire the vnode interlock before dropping the mnt lock. We must also check XLOCK to prevent vclean() races. - Use LK_INTERLOCK in the vget() in ffs_sync to further prevent vclean() races. - Use a local variable to store the results of the nvp == TAILQ_NEXT test so that we do not access the vp after we've vrele()d it. - Add an XXX comment about UFS_UPDATE() not being protected by any lock here. I suspect that it should need the VOP lock.
# 04a17687	04-Oct-2003	Jeff Roberson <jeff@FreeBSD.org>	- Increase the scope of the interlock in ffs_reload(). Acquire it before we release the mntvnode_mtx. - Call vgonel() directly instead of going through vrecycle() since we own the interlock now. - Remove a few cases where we locked the interlock just so that we could call VOP_UNLOCK with interlock held.
# 5c24d6ee	15-Aug-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Eliminate the i_devvp field from the incore UFS inodes, we can get the same value from ip->i_ump->um_devvp. This saves a pointer in the memory copies of inodes, which can easily run into several hundred kilobytes. The extra indirection is unmeasurable in benchmarks. Approved by: mckusick
# a8d43c90	26-Jul-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Add a "int fd" argument to VOP_OPEN() which in the future will contain the filedescriptor number on opens from userland. The index is used rather than a "struct file " since it conveys a bit more information, which may be useful to in particular fdescfs and /dev/fd/ For now pass -1 all over the place.
# 7652131b	12-Jun-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Initialize struct vfsops C99-sparsely. Submitted by: hmp Reviewed by: phk
# f4636c59	11-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 6280ed26	31-May-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unused local variables. Found by: FlexeLint
# 36329289	01-May-2003	Tim J. Robbins <tjr@FreeBSD.org>	Do not attempt to free NULL dinodes (i_din1 or i_din2) in ffs_ifree(). These fields can be left as NULL if ffs_vget() allocates an inode but fails before the dinode memory has been allocated. There are two cases when this can occur: when we lose a race and another process has added the inode to the hash, and when reading the inode off disk fails. The bug was observed by Kris on one of the package-building machines. See http://marc.theaimsgroup.com/?l=freebsd-current&m=105172731013411&w=2 In Kris's case, it was the bread() that failed because of a disk error. The alternative to this patch is to ensure that ffs_vget() does not call vput() when the inode hasn't been properly initialised.
# 8d721e87	01-May-2003	Tim J. Robbins <tjr@FreeBSD.org>	Free i_din2 instead of i_din1 in ffs_ifree() on UFS2 filesystems. This is purely a cosmetic change because these members are in a union together.
# 31566c96	20-Mar-2003	John Baldwin <jhb@FreeBSD.org>	Use td->td_ucred instead of td->td_proc->p_ucred.
# b4b138c2	18-Mar-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Including <sys/stdint.h> is (almost?) universally only to be able to use %j in printfs, so put a nested include in <sys/systm.h> where the printf prototype lives and save everybody else the trouble.
# 7261f5f6	03-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() api to use this extra parameter. Reviewed by: arch Not objected to by: mckusick
# 74f3809a	25-Feb-2003	Kirk McKusick <mckusick@FreeBSD.org>	Change the field used to test whether the superblock has been updated from the filesystem size field to the filesystem maximum blocksize field. The problem is that older versions of growfs updated only the new size field and not the old size field. This resulted in the old (smaller) size field being copied up to the new size field which caused the filesystem to appear to fsck to be badly trashed. This also adds a sanity check to ensure that the superblock is not being updated when the filesystem is mounted read-only. Obviously such an update should never happen. Reported by: Nate Lawson <nate@root.org> Sponsored by: DARPA & NAI Labs.
# a163d034	18-Feb-2003	Warner Losh <imp@FreeBSD.org>	Back out M_* changes, per decision of the TRB. Approved by: trb
# aca3e497	14-Feb-2003	Kirk McKusick <mckusick@FreeBSD.org>	Replace use of random() with arc4random() to provide less guessable values for the initial inode generation numbers in newfs and for newly allocated inode generation numbers in the kernel. Submitted by: Theo de Raadt <deraadt@cvs.openbsd.org> Sponsored by: DARPA & NAI Labs.
# 44956c98	21-Jan-2003	Alfred Perlstein <alfred@FreeBSD.org>	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
# aa4d7a8a	27-Dec-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Use three UMA zones for FFS/UFS inodes instead of malloc space. Since inodes are currently 144 bytes, this will save 112 bytes per inode. This can amount to up to 10MByte on large systems.
# de6ba7c0	27-Dec-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Move the allocation of the inode contents into ffs_vfsops.c rather than passing malloc types around.
# 975512a9	27-Dec-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Make ffs_mountfs() static. Remove the malloctype from the ufs mount structure, instead add a callback to the storage method for freeing inodes: UFS_IFREE(). Add vfs_ifree() method function which frees an inode. Unvariablelize the malloc type used for allocating inodes.
# 31574422	30-Nov-2002	Kirk McKusick <mckusick@FreeBSD.org>	Add a check to disable the previous patch so that future filesystems that choose to place their superblocks in non-standard locations will not get them smashed. Sponsored by: DARPA & NAI Labs.
# fa5d33e2	29-Nov-2002	Kirk McKusick <mckusick@FreeBSD.org>	Check to make sure that the fs_sblockloc field was properly updated before using it to write the superblock. This is to guard against accidentally trashing the disklabel if the superblock format missed being upgraded by the new kernel. Reported by: Sam Leffler <sam@errno.com> Sponsored by: DARPA & NAI Labs. Approved by: Murray Stokely <murray@FreeBSD.org>
# ada981b2	26-Nov-2002	Kirk McKusick <mckusick@FreeBSD.org>	Create a new 32-bit fs_flags word in the superblock. Add code to move the old 8-bit fs_old_flags to the new location the first time that the filesystem is mounted by a new kernel. One of the unused flags in fs_old_flags is used to indicate that the flags have been moved. Leave the fs_old_flags word intact so that it will work properly if used on an old kernel. Change the fs_sblockloc superblock location field to be in units of bytes instead of in units of filesystem fragments. The old units did not work properly when the fragment size exceeded the superblock size (8192). Update old fs_sblockloc values at the same time that the flags are moved. Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk> Sponsored by: DARPA & NAI Labs.
# 763bbd2f	26-Oct-2002	Robert Watson <rwatson@FreeBSD.org>	Slightly change the semantics of vnode labels for MAC: rather than "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects semantics for shared vnode locks, which were not previously present in the system. This changes the cache coherency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a work around for this shortly. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# 9ab73fd1	24-Oct-2002	Kirk McKusick <mckusick@FreeBSD.org>	Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.
# 80830407	15-Oct-2002	Robert Watson <rwatson@FreeBSD.org>	If the FS_MULTILABEL flag is set in a UFS or UFS2 superblock, automatically set MNT_MULTILABEL in the mount flags. If FS_ACLS is set in a UFS or UFS2 superblock, automatically set MNT_ACLS in the mount flags. If either of these flags is set, but the appropriate kernel option to support the features associated with the flag isn't available, then print a warning at mount-time. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# a5b65058	13-Oct-2002	Kirk McKusick <mckusick@FreeBSD.org>	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.
# b6cef564	08-Oct-2002	Kirk McKusick <mckusick@FreeBSD.org>	The appropriate units for disk block addresses are always DEV_BSIZE, even when the underlying device has a larger sector size. Therefore, the filesystem code should not (and with this patch does not) try to use the underlying sector size when doing disk block address calculations. This patch fixes problems in -current when using the swap-based memory-disk device (mdconfig -a -t swap ...). This bugfix is not relevant to -stable as -stable does not have the memory-disk device. Sponsored by: DARPA & NAI Labs.
# 2ee5711e	24-Sep-2002	Jeff Roberson <jeff@FreeBSD.org>	- Convert locks to use standard macros. - Lock access to the buflists. - Document broken locking. - Use vrefcnt().
# 06be2aaa	14-Sep-2002	Nate Lawson <njl@FreeBSD.org>	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)
# d6fe88e4	13-Aug-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Unravel the UFS_EXTATTR incest between FFS and UFS: UFS_EXTATTR is an UFS only thing, and FFS should in principle not know if it is enabled or not. This commit cleans ffs_vnops.c for such knowledge, but not ffs_vfsops.c Sponsored by: DARPA and NAI Labs.
# 9bf1a756	13-Aug-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable. Sponsored by: DARPA and NAI Labs.
# e6e370a7	04-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
# 5346934f	01-Jul-2002	Ian Dowse <iedowse@FreeBSD.org>	Add the ffs bits necessary to support unloading of the ufs kernel module. This adds an ffs_uninit() function that calls ufs_uninit() and also calls a new softdep_uninitialize() function. Add a stub for softdep_uninitialize() to cover the non-SOFTUPDATES case. Reviewed by: mckusick
# 8f42fb8f	26-Jun-2002	Ian Dowse <iedowse@FreeBSD.org>	Remove the kernel file-size limit for UFS2, so that only the limit imposed by the filesystem structure itself remains. With 16k blocks, the maximum file size is now just over 128TB. For now, the UFS1 file size limit is left unchanged so as to remain consistent with RELENG_4, but it too could be removed in the future. Reviewed by: mckusick
# cfbf0a46	23-Jun-2002	Maxime Henrion <mux@FreeBSD.org>	Warning fixes for 64 bits platforms. This eliminates all the warnings I have had in the FFS code on sparc64. Reviewed by: mckusick
# 1c85e6a3	21-Jun-2002	Kirk McKusick <mckusick@FreeBSD.org>	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
# 13866b3f	06-Jun-2002	Semen Ustimenko <semenu@FreeBSD.org>	Fix a typo in my recently added comment: s/beleived/believed/ Submitted by: keramida
# f576a00d	30-May-2002	Semen Ustimenko <semenu@FreeBSD.org>	Remove lock from ffs_vget introduced by v1.24. Instead of locking the vnode creation globally, we allow processes to create vnodes concurrently. In case of concurrent creation of vnode for the one ino, we allow processes to race and then check who wins. Assuming that concurrent creation of vnode for same ino is really rare case, this is believed to be an improvement, as it just allows concurrent creation of vnodes. Idea by: bp Reviewed by: dillon MFC after: 1 month
# 00b162d0	18-May-2002	Ian Dowse <iedowse@FreeBSD.org>	Remove um_i_effnlink_valid, i_spare[] and the ufsmount_u and inode_u unions, since these were only necessary when ext2fs used ufs code. Reviewed by: mckusick
# d394511d	16-May-2002	Tom Rhodes <trhodes@FreeBSD.org>	More s/file system/filesystem/g
# 05f4ff5d	13-May-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Remove register keyword. Sponsored by: DARPA & NAI Labs. Submitted by: mckusick
# 2dd527b3	08-Apr-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Move generic disk ioctls from <sys/disklabel.h> to <sys/disk.h>. Sponsored by: DARPA & NAI Labs
# 6008862b	04-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
# 46a67eac	02-Apr-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Use DIOCGSECTORSIZE instead of the bogus DIOCGPART ioctl.
# 44731cab	01-Apr-2002	John Baldwin <jhb@FreeBSD.org>	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
# 0508986c	30-Mar-2002	Bruce Evans <bde@FreeBSD.org>	In ffs_mountfs(), set mnt_iosize_max to si_iosize_max unconditionally provided the latter is nonzero. At this point, the former is a fairly arbitrary default value (DFLTPHYS), so changing it to any reasonable value specified by the device driver is safe. Using the maximum of these limits broke ffs clustered i/o for devices whose si_iosize_max is < DFLTPHYS. Using the minimum would break device drivers' ability to increase the active limit from DFLTPHYS up to MAXPHYS. Copied the code for this and the associated (unnecessary?) fixup of mnt_iosize_max to all other filesystems that use clustering (ext2fs and msdosfs). It was completely missing. PR: 36309 MFC-after: 1 week
# 6f1e8551	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# a0595d02	16-Mar-2002	Kirk McKusick <mckusick@FreeBSD.org>	Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems, but only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.
# 063f7763	11-Mar-2002	Poul-Henning Kamp <phk@FreeBSD.org>	I missed one VOP_CLOSE in the previous commit. Pointed out by: bde
# 3dbceccb	11-Mar-2002	Poul-Henning Kamp <phk@FreeBSD.org>	As a XXX bandaid open the mounted device READ/WRITE even if we only mount read-only. The trouble here is that we don't reopen the device in read/write mode when we remount in read/write mode resulting in a filesystem sending write requests to a device which was only opened read/only. I'm not quite sure how such a reopen would best be done and defer the problem to more agile hackers.
# a854ed98	27-Feb-2002	John Baldwin <jhb@FreeBSD.org>	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
# cd600596	15-Jan-2002	Kirk McKusick <mckusick@FreeBSD.org>	When downgrading a filesystem from read-write to read-only, operations involving file removal or file update were not always being fully committed to disk. The result was lost files or corrupted file data. This change ensures that the filesystem is properly synced to disk before the filesystem is down-graded. This delta also fixes a long standing bug in which a file open for reading has been unlinked. When the last open reference to the file is closed, the inode is reclaimed by the filesystem. Previously, if the filesystem had been down-graded to read-only, the inode could not be reclaimed, and thus was lost and had to be later recovered by fsck. With this change, such files are found at the time of the down-grade. Normally they will result in the filesystem down-grade failing with `device busy'. If a forcible down-grade is done, then the affected files will be revoked causing the inode to be released and the open file descriptors to begin failing on attempts to read. Submitted by: "Sam Leffler" <sam@errno.com>
# 23b59018	20-Dec-2001	Matthew Dillon <dillon@FreeBSD.org>	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release
# 143a5346	16-Dec-2001	Ian Dowse <iedowse@FreeBSD.org>	Make sure we ignore the value of `fs_active' when reloading the superblock, and move the initialisation of it to beside where other pointer fields are initialised.
# cc5a9233	13-Dec-2001	Kirk McKusick <mckusick@FreeBSD.org>	Minimize the time necessary to suspend operations on a filesystem when taking a snapshot. The two time consuming operations are scanning all the filesystem bitmaps to determine which blocks are in use and scanning all the other snapshots so as to be able to expunge their blocks from the view of the current snapshot. The bitmap scanning is broken into two passes. Before suspending the filesystem all bitmaps are scanned. After the suspension, those bitmaps that changed after being scanned the first time are rescanned. Typically there are few bitmaps that need to be rescanned. The expunging of other snapshots is now done after the suspension is released by observing that we can easily identify any blocks that were allocated to them after the suspension (they will be marked as `not needing to be copied' in the just created snapshot). For all the gory details, see the ``Running fsck in the Background'' paper in the Usenix BSDCon 2002 Conference Proceedings, pages 55-64.
# 245df27c	25-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week
# c72ccd01	22-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days
# b73d2870	02-Oct-2001	Robert Watson <rwatson@FreeBSD.org>	o Replace two direct uid!=0 comparisons with suser_td() calls. Obtained from: TrustedBSD Project
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2. Note: ALL MODULES MUST BE RECOMPILED. Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doozy!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 4691e9ea	09-Sep-2001	Ian Dowse <iedowse@FreeBSD.org>	The "dirpref" directory layout preference improvements make use of an array "fs_contigdirs[]" to avoid too many directories getting created in each cylinder group. The memory required for this and two other arrays (fs_csp[] and fs_maxcluster[]) is allocated with a single malloc() call, and divided up afterwards. However, the 'space' pointer is not advanced correctly, so fs_contigdirs and fs_maxcluster end up pointing to the same address. Add the missing code to advance the 'space' pointer, and remove an unnecessary update of the pointer that follows. This is likely to fix the "ffs_clusteralloc: map mismatch" panics that have been reported recently. Submitted by: Luke Mewburn <lukem@wasabisystems.com>
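The pointer-advance bug fixed in 4691e9ea is easier to see in a reduced form. The sketch below is a userland model, not the code from ffs_mountfs(): `struct summary_arrays` and `carve_arrays()` are hypothetical, and the element types (int32_t summary words, int32_t per-group cluster sizes, one-byte per-group directory counters) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical container; the kernel keeps these pointers in struct fs. */
struct summary_arrays {
	void    *base;        /* start of the single allocation, for free() */
	int32_t *csp;         /* summary area, modeled as cs_words int32_t's */
	int32_t *maxcluster;  /* one int32_t per cylinder group */
	uint8_t *contigdirs;  /* one byte counter per cylinder group */
};

/*
 * Carve three arrays out of one allocation. Returns 0 on success and
 * -1 if the allocation fails.
 */
static int
carve_arrays(struct summary_arrays *sa, size_t ncg, size_t cs_words)
{
	size_t size;
	uint8_t *space;

	assert(sa != NULL && ncg > 0);
	size = cs_words * sizeof(int32_t) +	/* csp */
	    ncg * sizeof(int32_t) +		/* maxcluster */
	    ncg * sizeof(uint8_t);		/* contigdirs */
	space = calloc(1, size);
	if (space == NULL)
		return (-1);

	sa->base = space;
	sa->csp = (int32_t *)space;
	space += cs_words * sizeof(int32_t);

	sa->maxcluster = (int32_t *)space;
	/*
	 * This is the kind of step the commit message says was missing:
	 * without it, the next pointer aliases maxcluster.
	 */
	space += ncg * sizeof(int32_t);

	sa->contigdirs = space;
	return (0);
}
```

With the advance omitted, writes to the directory counters would land in the cluster-size array, which is consistent with the map-mismatch panics the message mentions.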
# 7df97b61	01-Sep-2001	Robert Watson <rwatson@FreeBSD.org>	o At some point, unmounting a non-EA file system with EA's compiled in got a bit broken, when ufs_extattr_stop() was called and failed, ufs_extattr_destroy() would panic. This makes the call to destroy() conditional on the success of stop(). Submitted by: Christian Carstensen <cc@devcon.net> Obtained from: TrustedBSD Project
# ed87274d	28-Jun-2001	John Baldwin <jhb@FreeBSD.org>	Fix more mntvnode and vnode interlock order reversals.
# 49d2d9f4	27-Jun-2001	John Baldwin <jhb@FreeBSD.org>	- Fix a mntvnode and vnode interlock reversal. - Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.
# c7a3e237	29-May-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Remove last vestiges of MFS.
# 0864ef1e	16-May-2001	Ian Dowse <iedowse@FreeBSD.org>	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp
# 9ccb939e	08-May-2001	Kirk McKusick <mckusick@FreeBSD.org>	When running with soft updates, track the number of blocks and files that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.
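The accounting idea in 9ccb939e can be illustrated with a reduced model. The structures and field names below are hypothetical and the units are simplified to plain block and inode counts; the real statfs code works from the superblock summary totals and converts between fragment and block units.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical in-core counters; not the layout of struct fs. */
struct fs_counts {
	int64_t free_blocks;     /* blocks free on disk right now */
	int64_t free_inodes;     /* inodes free on disk right now */
	int64_t pending_blocks;  /* blocks committed to being freed */
	int64_t pending_inodes;  /* inodes committed to being freed */
};

/* Hypothetical subset of what statfs reports. */
struct statfs_view {
	int64_t f_bfree;
	int64_t f_ffree;
};

/*
 * Report free space including space that soft updates has committed to
 * freeing but has not yet released on disk.
 */
static struct statfs_view
report_free(const struct fs_counts *c)
{
	struct statfs_view v;

	assert(c->pending_blocks >= 0 && c->pending_inodes >= 0);
	v.f_bfree = c->free_blocks + c->pending_blocks;
	v.f_ffree = c->free_inodes + c->pending_inodes;
	return (v);
}
```

As the message notes, this only changes what is reported: an allocation can still fail on a nearly full filesystem until the pending space is actually released.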
# 855aa097	28-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	VOP_BALLOC was never really a VOP in the first place, so convert it to UFS_BALLOC like the other "between UFS and FFS function interfaces".
# 60fb0ce3	28-Apr-2001	Greg Lehey <grog@FreeBSD.org>	Revert consequences of changes to mount.h, part 2. Requested by: bde
# 112f7372	25-Apr-2001	Kirk McKusick <mckusick@FreeBSD.org>	When closing the last reference to an unlinked file, it is freed by the inactive routine. Because the freeing causes the filesystem to be modified, the close must be held up during periods when the filesystem is suspended. For snapshots to be consistent across crashes, they must write blocks that they copy and claim those written blocks in their on-disk block pointers before the old blocks that they referenced can be allowed to be written. Close a loophole that allowed unwritten blocks to be skipped when doing ffs_sync with a request to wait for all I/O activity to be completed.
# a13234bb	25-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Move the netexport structure from the fs-specific mount structure to struct mount. This makes the "struct netexport *" parameter to the vfs_export and vfs_checkexport interface unneeded. Consequently, all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>
# 5d69bac4	23-Apr-2001	Ian Dowse <iedowse@FreeBSD.org>	Pre-dirpref versions of fsck may zero out the new superblock fields fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause panics if these fields were zeroed while a filesystem was mounted read-only, and then remounted read-write. Add code to ffs_reload() which copies the fs_contigdirs pointer from the previous superblock, and reinitialises fs_avgf* if necessary. Reviewed by: mckusick
# d98dc34f	23-Apr-2001	Greg Lehey <grog@FreeBSD.org>	Correct #includes to work with fixed sys/mount.h.
# 5819ab3f	16-Apr-2001	Kirk McKusick <mckusick@FreeBSD.org>	Add debugging option to always read/write cylinder groups as full sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'. Add debugging option to disable background writes of cylinder groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'. These debugging options should be tried on systems that are panicking with corrupted cylinder group maps to see if it makes the problem go away. The set of panics in question are:
    ffs_clusteralloc: map mismatch
    ffs_nodealloccg: map corrupted
    ffs_nodealloccg: block not in map
    ffs_alloccg: map corrupted
    ffs_alloccg: block not in map
    ffs_alloccgblk: cyl groups corrupted
    ffs_alloccgblk: can't find blk in cyl
    ffs_checkblk: partially free fragment
The following panics are less likely to be related to this problem, but might be helped by these debugging options:
    ffs_valloc: dup alloc
    ffs_blkfree: freeing free block
    ffs_blkfree: freeing free frag
    ffs_vfree: freeing free inode
If you try these options, please report whether they helped reduce your bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com> and to Matt Dillon <dillon@earth.backplane.com>.
# 1a6a6610	13-Apr-2001	Kirk McKusick <mckusick@FreeBSD.org>	This checkin adds support in ufs/ffs for the FS_NEEDSFSCK flag. It is described in ufs/ffs/fs.h as follows:
    /*
     * Filesystem flags.
     *
     * Note that the FS_NEEDSFSCK flag is set and cleared only by the
     * fsck utility. It is set when background fsck finds an unexpected
     * inconsistency which requires a traditional foreground fsck to be
     * run. Such inconsistencies should only be found after an uncorrectable
     * disk error. A foreground fsck will clear the FS_NEEDSFSCK flag when
     * it has successfully cleaned up the filesystem. The kernel uses this
     * flag to enforce that inconsistent filesystems be mounted read-only.
     */
    #define FS_UNCLEAN    0x01 /* filesystem not clean at mount */
    #define FS_DOSOFTDEP  0x02 /* filesystem using soft dependencies */
    #define FS_NEEDSFSCK  0x04 /* filesystem needs sync fsck before mount */
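The read-only enforcement mentioned in the fs.h comment quoted in 1a6a6610 might look roughly like the check below. The flag values are the ones given in the message; `enum mount_decision` and `check_mount()` are hypothetical, and the real mount path takes additional conditions into account.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as quoted from ufs/ffs/fs.h in the commit message. */
#define FS_UNCLEAN    0x01 /* filesystem not clean at mount */
#define FS_DOSOFTDEP  0x02 /* filesystem using soft dependencies */
#define FS_NEEDSFSCK  0x04 /* filesystem needs sync fsck before mount */

/* Hypothetical result type for this sketch. */
enum mount_decision {
	MOUNT_OK,
	MOUNT_REFUSE_RW   /* only a read-only mount is acceptable */
};

/*
 * Decide whether a mount request may proceed. A filesystem flagged
 * FS_NEEDSFSCK may be mounted read-only but not read-write.
 */
static enum mount_decision
check_mount(uint32_t fs_flags, int want_readonly)
{
	assert(want_readonly == 0 || want_readonly == 1);
	if (!want_readonly && (fs_flags & FS_NEEDSFSCK) != 0)
		return (MOUNT_REFUSE_RW);
	return (MOUNT_OK);
}
```

Because only fsck sets and clears the flag, the kernel side reduces to a test at mount time, as sketched here.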
# a61ab64a	10-Apr-2001	Kirk McKusick <mckusick@FreeBSD.org>	Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>. His description of the problem and solution follows. My own tests show speedups on typical filesystem intensive workloads of 5% to 12% which is very impressive considering the small amount of code change involved. ------ One day I noticed that some file operations run much faster on small file systems than on big ones. I've looked at the ffs algorithms, thought about them, and redesigned the dirpref algorithm. First I want to describe the results of my tests. These results are old and I have improved the algorithm after these tests were done. Nevertheless they show how big the performance speedup may be. I have done two file/directory intensive tests on two OpenBSD systems with the old and new dirpref algorithms. The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports". The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release. It contains 6596 directories and 13868 files. The test systems are: 1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for test is at wd1. Size of test file system is 8 Gb, number of cg=991, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=35 2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system at wd0, file system for test is at wd1. 
Size of test file system is 40 Gb, number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50 You can get more info about the test systems and methods at: http://www.ptci.ru/gluk/dirpref/old/dirpref.html
Test Results
                   tar -xzf ports.tar.gz                  rm -rf ports
    mode     old dirpref  new dirpref  speedup    old dirpref  new dirpref  speedup
    First system
    normal       667          472        1.41         477          331        1.44
    async        285          144        1.98         130           14        9.29
    sync         768          616        1.25         477          334        1.43
    softdep      413          252        1.64         241           38        6.34
    Second system
    normal       329           81        4.06         263.5         93.5      2.81
    async        302           25.7     11.75         112            2.26    49.56
    sync         281           57.0      4.93         263           90.5      2.9
    softdep      341           40.6      8.4          284            4.76    59.66
The "old dirpref" and "new dirpref" columns give the test time in seconds. speedup is the speed increase factor, i.e. old dirpref / new dirpref. ------ Algorithm description The old dirpref algorithm is described in comments:
    /*
     * Find a cylinder to place a directory.
     *
     * The policy implemented by this algorithm is to select from
     * among those cylinder groups with above the average number of
     * free inodes, the one with the smallest number of directories.
     */
A new directory is allocated in a different cylinder group than its parent directory, resulting in a directory tree that is spread across all the cylinder groups. This spreading out results in non-optimal access to the directories and files. When we have a small filesystem it is not a problem but when the filesystem is big then performance degradation becomes very apparent. What do I mean by a big file system? 1. A big filesystem is a filesystem which occupies 20-30 or more percent of total drive space, i.e. first and last cylinder are physically located relatively far from each other. 2. It has a relatively large number of cylinder groups, for example more cylinder groups than 50% of the buffers in the buffer cache. The first results in long access times, while the second results in many buffers being used by metadata operations. 
Such operations use cylinder group blocks and on-disk inode blocks. The cylinder group block (fs->fs_cblkno) contains struct cg, inode and block bit maps. It is 2k in size for the default filesystem parameters. If new and parent directories are located in different cylinder groups then the system performs more input/output operations and uses more buffers. On filesystems with many cylinder groups, lots of cache buffers are used for metadata operations. My solution for this problem is very simple. I allocate many directories in one cylinder group. I also do some things, so that the new allocation method does not cause excessive fragmentation and all directory inodes will not be located at a location far from its file's inodes and data. The algorithm is:
    /*
     * Find a cylinder group to place a directory.
     *
     * The policy implemented by this algorithm is to allocate a
     * directory inode in the same cylinder group as its parent
     * directory, but also to reserve space for its files inodes
     * and data. Restrict the number of directories which may be
     * allocated one after another in the same cylinder group
     * without intervening allocation of files.
     *
     * If we allocate a first level directory then force allocation
     * in another cylinder group.
     */
My early versions of dirpref gave me good results for a wide range of file operations and different filesystem capacities except one case: those applications that create their entire directory structure first and only later fill this structure with files. My solution for such and similar cases is to limit the number of directories which may be created one after another in the same cylinder group without intervening file creations. For this purpose, I allocate an array of counters at mount time. This array is linked to the superblock fs->fs_contigdirs[cg]. Each time a directory is created the counter increases and each time a file is created the counter decreases. A 60Gb filesystem with 8mb/cg requires 10kb of memory for the counters array. 
The maxcontigdirs is the maximum number of directories which may be created without an intervening file creation. I found in my tests that the best performance occurs when I restrict the number of directories in one cylinder group such that all its files may be located in the same cylinder group. There may be some deterioration in performance if all the file inodes are in the same cylinder group as its containing directory, but their data partially resides in a different cylinder group. The maxcontigdirs value is calculated to try to prevent this condition. Since there is no way to know how many files and directories will be allocated later I added two optimization parameters in superblock/tunefs. They are: int32_t fs_avgfilesize; /* expected average file size */ int32_t fs_avgfpdir; /* expected # of files per directory */ These parameters have reasonable defaults but may be tweaked for special uses of a filesystem. They are only necessary in rare cases like better tuning a filesystem being used to store a squid cache. I have been using this algorithm for about 3 months. I have done a lot of testing on filesystems with different capacities, average filesize, average number of files per directory, and so on. I think this algorithm has no negative impact on filesystem performance. It works better than the default one in all cases. The new dirpref will greatly improve untarring/removing/copying of big directories, decrease load on cvs servers and much more. The new dirpref doesn't speed up a compilation process, but also doesn't slow it down. Obtained from: Grigoriy Orlov <gluk@ptci.ru>
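The per-group counter scheme described in a61ab64a can be sketched as follows. This is a reduced userland model under stated assumptions: `struct dirpref_model` and the function names are hypothetical, the fallback search is a simple linear probe, and the free-inode and free-block checks as well as the special handling of first-level directories in the real algorithm are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the mount-time counter array. */
struct dirpref_model {
	uint8_t *contigdirs;    /* one counter per cylinder group */
	size_t   ncg;           /* number of cylinder groups */
	uint8_t  maxcontigdirs; /* limit on back-to-back directory creations */
};

/* Each directory creation raises the group's counter (saturating). */
static void
note_dir_created(struct dirpref_model *m, size_t cg)
{
	assert(cg < m->ncg);
	if (m->contigdirs[cg] < UINT8_MAX)
		m->contigdirs[cg]++;
}

/* Each file creation lowers the group's counter (never below zero). */
static void
note_file_created(struct dirpref_model *m, size_t cg)
{
	assert(cg < m->ncg);
	if (m->contigdirs[cg] > 0)
		m->contigdirs[cg]--;
}

/*
 * Prefer the parent's group; if it has reached the limit, probe the
 * following groups in order. Falls back to the parent's group when
 * every group is at the limit.
 */
static size_t
pick_cg_for_dir(const struct dirpref_model *m, size_t parent_cg)
{
	assert(parent_cg < m->ncg);
	for (size_t i = 0; i < m->ncg; i++) {
		size_t cg = (parent_cg + i) % m->ncg;

		if (m->contigdirs[cg] < m->maxcontigdirs)
			return (cg);
	}
	return (parent_cg);
}
```

The decrement on file creation is what lets a group keep accepting directories when directories and files are created interleaved, while a run of directory-only creations eventually spills into another group.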
# 812b1d41	20-Mar-2001	Kirk McKusick <mckusick@FreeBSD.org>	Add kernel support for running fsck on active filesystems.
# 516081f2	18-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Change options FFS_EXTATTR and options FFS_EXTATTR_AUTOSTART to options UFS_EXTATTR and UFS_EXTATTR_AUTOSTART respectively. This change reflects the fact that our EA support is implemented entirely at the UFS layer (modulo FFS start/stop/autostart hooks for mount and unmount events). This also better reflects the fact that [shortly] MFS will also support EAs, as well as possibly IFS. o Consumers of the EA support in FFS are reminded that as a result, they must change kernel config files to reflect the new option names. Obtained from: TrustedBSD Project
# f5161237	13-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Implement "options FFS_EXTATTR_AUTOSTART", which depends on "options FFS_EXTATTR". When extended attribute auto-starting is enabled, FFS will scan the .attribute directory off of the root of each file system, as it is mounted. If .attribute exists, EA support will be started for the file system. If there are files in the directory, FFS will attempt to start them as attribute backing files for attributes bearing the same name. All attributes are started before access to the file system is permitted, so this permits race-free enabling of attributes. For attributes backing support for security features, such as ACLs, MAC, Capabilities, this is vital, as it prevents the file system attributes from getting out of sync as a result of file system operations between mount-time and the enabling of the extended attribute. The userland extattrctl tool will still function exactly as previously. Files must be placed directly in .attribute, which must be directly off of the file system root: symbolic links are not permitted. FFS_EXTATTR will continue to be able to function without FFS_EXTATTR_AUTOSTART for sites that do not want/require auto-starting. If you're using the UFS_ACL code available from www.TrustedBSD.org, using FFS_EXTATTR_AUTOSTART is recommended. o This support is implemented by adding an invocation of ufs_extattr_autostart() to ffs_mountfs(). In addition, several new supporting calls are introduced in ufs_extattr.c: ufs_extattr_autostart(): start EAs on the specified mount ufs_extattr_lookup(): given a directory and filename, return the vnode for the file. ufs_extattr_enable_with_open(): invoke ufs_extattr_enable() after doing the equivalent of vn_open() on the passed file. ufs_extattr_iterate_directory(): iterate over a directory, invoking ufs_extattr_lookup() and ufs_extattr_enable_with_open() on each entry. o This feature is not widely tested, and therefore may contain bugs; caution is advised. 
Several changes are in the pipeline for this feature, including breaking out of EA namespaces into subdirectories of .attribute (this is waiting on the updated EA API), as well as a per-filesystem flag indicating whether or not EAs should be auto-started. This is required because administrators may not want .attribute auto-started on all file systems, especially if non-administrators have write access to the root of a file system. Obtained from: TrustedBSD Project
# 589c7af9	07-Mar-2001	Kirk McKusick <mckusick@FreeBSD.org>	Fixes to track snapshot copy-on-write checking in the specinfo structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).
# f3a90da9	01-Mar-2001	Adrian Chadd <adrian@FreeBSD.org>	Reviewed by: jlemon An initial tidyup of the mount() syscall and VFS mount code. This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work. * the guts of the mount work has been moved into vfs_mount(). * move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace. * Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long. * rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.) * remove the copyin() stuff for `path'. `data' still requires copyin() since it's a pointer into userland. * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to. * NOTE: f_mntonname is initialised with "/" in the case of a root mount.
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. 
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# fc2ffbe6	04-Feb-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)
# 1b367556	23-Jan-2001	Jason Evans <jasone@FreeBSD.org>	Convert all simplelocks to mutexes and remove the simplelock implementations.
# f55ff3f3	15-Jan-2001	Ian Dowse <iedowse@FreeBSD.org>	The ffs superblock includes a 128-byte region for use by temporary in-core pointers to summary information. An array in this region (fs_csp) could overflow on filesystems with a very large number of cylinder groups (~16000 on i386 with 8k blocks). When this happens, other fields in the superblock get corrupted, and fsck refuses to check the filesystem. Solve this problem by replacing the fs_csp array in 'struct fs' with a single pointer, and add padding to keep the length of the 128-byte region fixed. Update the kernel and userland utilities to use just this single pointer. With this change, the kernel no longer makes use of the superblock fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c to indicate that these fields must be calculated for compatibility with older kernels. Reviewed by: mckusick
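The "~16000 on i386 with 8k blocks" figure in f55ff3f3 can be reproduced with a short calculation. The inputs used in the usage note below are assumptions chosen to match the message rather than values taken from the header: 4-byte pointers, one slot of the 128-byte region reserved for another in-core pointer, and a 16-byte per-group summary record, so that each pointer in the old array covers one filesystem block of records. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Estimate how many cylinder groups the old pointer-array layout could
 * describe before overflowing its fixed-size region.
 *
 * region_bytes:  size of the reserved in-core pointer region
 * ptr_size:      size of one pointer on the target architecture
 * reserved_ptrs: slots in the region used for other pointers (assumed)
 * block_size:    filesystem block size in bytes
 * csum_size:     size of one per-group summary record (assumed)
 */
static size_t
max_cgs_old_layout(size_t region_bytes, size_t ptr_size,
    size_t reserved_ptrs, size_t block_size, size_t csum_size)
{
	size_t slots, per_block;

	assert(ptr_size > 0 && csum_size > 0);
	assert(region_bytes / ptr_size >= reserved_ptrs);
	slots = region_bytes / ptr_size - reserved_ptrs;
	per_block = block_size / csum_size;
	return (slots * per_block);
}
```

Under those assumptions, `max_cgs_old_layout(128, 4, 1, 8192, 16)` gives 31 slots times 512 records per block, or 15872 groups, which is in line with the approximate limit quoted in the message. Replacing the array with a single pointer removes the dependence on the region size altogether.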
# 937c4dfa	13-Dec-2000	Seigo Tanimura <tanimura@FreeBSD.org>	Do not race for the lock of an inode hash. Reviewed by: jhb
# 7cc0979f	08-Dec-2000	David Malone <dwmalone@FreeBSD.org>	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
# 0b0c10b4	13-Oct-2000	Adrian Chadd <adrian@FreeBSD.org>	Initial commit of IFS - an inode-namespaced FFS. Here is a short description: How it works: -- Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.) I didn't see the need in duplicating all of sys/ufs/ffs to get this off the ground. File creation is done through a special file - 'newfile'. When newfile is called, the system allocates and returns an inode. Note that newfile is done in a cloning fashion: fd = open("newfile", O_CREAT|O_RDWR, 0644); fstat(fd, &st); printf("new file is %d\n", (int)st.st_ino); Once you have created a file, you can open() and unlink() it by its returned inode number retrieved from the stat call, ie: fd = open("5", O_RDWR); The creation permissions depend entirely if you have write access to the root directory of the filesystem. To get the list of currently allocated inodes, VOP_READDIR has been added which returns a directory listing of those currently allocated. -- What this entails: * patching conf/files and conf/options to include IFS as a new compile option (and since ifs depends upon FFS, include the FFS routines) * An entry in i386/conf/NOTES indicating IFS exists and where to go for an explanation * Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS routines require (ffs_mount() and ffs_reload()) * a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS routines. IFS replaces some of the vfsops, and a handful of vnops - most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR(). Any other directory operation is marked as invalid. What this results in: * an IFS partition's create permissions are controlled by the perm/ownership of the root mount point, just like a normal directory * Each inode has perm and ownership too * IFS does NOT mean an FFS partition can be opened per inode. This is a completely separate filesystem here * Softupdates doesn't work with IFS, and really I don't think it needs it. 
Besides, fsck's are FAST. (Try it :-) * Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC). Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against this particular inode, and unravelling THAT code isn't trivial. Therefore, useful inodes start at 3. Enjoy, and feedback is definitely appreciated!
# 7eb9fca5	09-Oct-2000	Eivind Eklund <eivind@FreeBSD.org>	Blow away the v_specmountpoint define, replacing it with what it was defined as (rdev->si_mountpoint)
# ff435dcb	06-Oct-2000	Robert Watson <rwatson@FreeBSD.org>	o Move initialization of ump from mp to the top of the function so that it is defined when used in ufs_extattr_uepm_destroy(), fixing a panic due to a NULL pointer dereference. Submitted by: Wesley Morgan <morganw@chemicals.tacorp.com>
# 9de54ba5	03-Oct-2000	Robert Watson <rwatson@FreeBSD.org>	o Add call to ufs_extattr_uepm_destroy() in ffs_unmount() so as to clean up lock on extattrs. o Get for free a comment indicating where auto-starting of extended attributes will eventually occur, as it was in my commit tree also. No implementation change here, only a comment.
# a18b1f1d	03-Oct-2000	Jason Evans <jasone@FreeBSD.org>	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 67e87166	25-Sep-2000	Boris Popov <bp@FreeBSD.org>	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or kept as a first element of structure referenced by v_data pointer (ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only the interlock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick
# 8694d8e9	01-Aug-2000	Ollivier Robert <roberto@FreeBSD.org>	Fix the lockmgr panic everyone is seeing at shutdown time. vput assumes curproc is the lock holder, but it's not true in this case. Thanks a lot Luoqi ! Submitted by: luoqi Tested by: phk
# 9b971133	23-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occasionally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. 
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
# 3fbd9742	12-Jul-2000	Boris Popov <bp@FreeBSD.org>	Prevent possible dereference of NULL pointer. Submitted by: Marius Bendiksen <mbendiks@eunet.no>
# f2a2857b	11-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness are not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
# b2b0497a	03-Jun-2000	Robert Watson <rwatson@FreeBSD.org>	o If FFS_EXTATTR is defined, don't print out an error message on unmount if an FFS partition returns EOPNOTSUPP, as it just means extended attributes weren't enabled on that partition. Prevents spurious warning per-partition at shutdown.
# f3706a03	07-May-2000	Robert Watson <rwatson@FreeBSD.org>	s/ffs_unmonut/ffs_unmount/ in a gratuitous ufs_extattr printf. Reported by: knu
# 9626b608	05-May-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bde's teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
# 2c9b67a8	30-Apr-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
# a64ed089	14-Apr-2000	Robert Watson <rwatson@FreeBSD.org>	Introduce extended attribute support for FFS, allowing arbitrary (name, value) pairs to be associated with inodes. This support is used for ACLs, MAC labels, and Capabilities in the TrustedBSD security extensions, which are currently under development. In this implementation, attributes are backed to data vnodes in the style of the quota support in FFS. Support for FFS extended attributes may be enabled using the FFS_EXTATTR kernel option (disabled by default). Userland utilities and man pages will be committed in the next batch. VFS interfaces and man pages have been in the repo since 4.0-RELEASE and are unchanged. o ufs/ufs/extattr.h: UFS-specific extattr defines o ufs/ufs/ufs_extattr.c: bulk of support routines o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes o contrib/softupdates/ffs_softdep.c: extattr.h includes o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h (This should not be the case, and will be fixed in a future commit) Currently attributes are not supported in MFS. This will be fixed. Reviewed by: adrian, bp, freebsd-fs, other unthanked souls Obtained from: TrustedBSD Project
# ba4ad1fc	09-Jan-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Give vn_isdisk() a second argument where it can return a suitable errno. Suggested by: bde
# cf60e8e4	09-Jan-2000	Kirk McKusick <mckusick@FreeBSD.org>	Several performance improvements for soft updates have been added: 1) Fastpath deletions. When a file is being deleted, check to see if it was so recently created that its inode has not yet been written to disk. If so, the delete can proceed to immediately free the inode. 2) Background writes: No file or block allocations can be done while the bitmap is being written to disk. To avoid these stalls, the bitmap is copied to another buffer which is written thus leaving the original available for further allocations. 3) Link count tracking. Constantly track the difference in i_effnlink and i_nlink so that inodes that have had no change other than i_effnlink need not be written. 4) Identify buffers with rollback dependencies so that the buffer flushing daemon can choose to skip over them.
# 7e58bfac	23-Dec-1999	Bruce Evans <bde@FreeBSD.org>	Update the unclean flag for mount -u. I forgot to handle this case when I made the absence of the clean flag sticky in rev.1.88. This was a problem mainly for "mount /". There is no way to mount "/" for writing without using mount -u (normally implicitly), so after "mount -f /" of an unclean filesystem, the absence of the clean flag was sticky forever.
# 91f37dcb	18-Dec-1999	Robert Watson <rwatson@FreeBSD.org>	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind
# 762e6b85	15-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Introduce NDFREE (and remove VOP_ABORTOP)
# 38224dcd	22-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Convert various pieces of code to use vn_isdisk() rather than checking for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find the type of a vnode. Reviewed by: sos
# 698f9cf8	09-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Next step in the device cleanup process. Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code. Unify spec_open() for bdev and cdev cases. Remove the disabled bdev specific read/write code.
# 5bd5c8b9	02-Nov-1999	Bruce Evans <bde@FreeBSD.org>	Quick fix for breakage of ext2fs link counts as reported by stat(2) by the soft updates changes: only report the link count to be i_effnlink in ufs_getattr() for file systems that maintain i_effnlink. Tested by: Mike Dracopoulos <mdraco@math.uoa.gr>
# 6d147828	01-Nov-1999	Mike Smith <msmith@FreeBSD.org>	Newline-terminate the complaint message about not being able to find the root vnode pointer.
# 923502ff	29-Oct-1999	Poul-Henning Kamp <phk@FreeBSD.org>	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
# b89392e7	30-Sep-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the D_NOCLUSTER[RW] options which were added because vn had problems. Now that Matt has fixed vn, this can go. The vn driver should have used d_maxio (now si_iosize_max) anyway.
# 1b5464ef	29-Sep-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde
# c24fda81	10-Sep-1999	Alfred Perlstein <alfred@FreeBSD.org>	Separate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open|stat|statfs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 41d2e3e0	24-Aug-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.
# 7dc5cd04	13-Aug-1999	Poul-Henning Kamp <phk@FreeBSD.org>	The bdevsw() and cdevsw() are now identical, so kill the former.
# 0ef1c826	08-Aug-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
# 68de329e	11-Jul-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Use the fsid from the superblock, unless it looks bogus or has already been taken by some other filesystem.
# 2447bec8	31-May-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Simplify cdevsw registration. The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print a message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.
# 4be2eb8c	08-May-1999	Poul-Henning Kamp <phk@FreeBSD.org>	I got tired of seeing all the cdevsw[major(foo)] all over the place. Made a new (inline) function devsw(dev_t dev) and substituted it. Changed to the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.
# 46eede00	07-May-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Continue where Julian left off in July 1998: Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
# 8aef1712	27-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# de5d1ba5	07-Jan-1999	Bruce Evans <bde@FreeBSD.org>	Don't pass unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().
# fb116777	05-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	Remove the 'waslocked' parameter to vfs_object_create().
# a777e820	01-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	Remove the last clients of vfs_object_create(..., waslocked=1); waslocked will go away shortly. Reviewed by: dg
# 40c8cfe5	31-Oct-1998	Peter Wemm <peter@FreeBSD.org>	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
# b5ee1640	27-Oct-1998	Bruce Evans <bde@FreeBSD.org>	Oops, the redundant tests for major numbers weren't redundant here. They checked for the magic major number for the "device" behind mfs mount points. Use a more obvious check for this device. Debugged by: Andrew Gallatin <gallatin@cs.duke.edu>
# 9c0619da	25-Oct-1998	Bruce Evans <bde@FreeBSD.org>	Don't follow null bdevsw pointers. The `major(dev) < nblkdev' test rotted when bdevsw[] became sparse. We still depend on magic to avoid having to check that (v_rdev) device numbers in vnodes are not NODEV. Removed redundant `major(dev) < nblkdev' tests instead of updating them.
# f5ef029e	25-Oct-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Nitpicking and dusting performed on a train. Removes trivial warnings about unused variables, labels and other lint.
# 0922cce6	25-Sep-1998	Bruce Evans <bde@FreeBSD.org>	Fixed clean flag handling: - don't set the clean flag on unmount of an unclean filesystem that was (forcibly) mounted rw. - set the clean flag on rw -> ro update of a mounted initially-clean filesystem. - fixed some style bugs (mostly long lines). This uses the fs_flags field and FS_UNCLEAN state bit which were introduced in the softdep changes. NetBSD uses extra state bits in fs_clean. Reviewed by: luoqui
# d024c955	14-Sep-1998	Søren Schmidt <sos@FreeBSD.org>	Remove the SLICE code. This clearly needs a lot more thought, and we don't need this to hunt us down in 3.0-RELEASE.
# 8994ca3c	07-Sep-1998	Bruce Evans <bde@FreeBSD.org>	Removed statically configured mount type numbers (MOUNT_) and all references to them. The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_ instead of the number in their vfsconf struct.
# 0492d857	17-Aug-1998	Bruce Evans <bde@FreeBSD.org>	Removed unused includes.
# bcbd6c6f	08-Jul-1998	Julian Elischer <julian@FreeBSD.org>	Don't update superblock if mounted readonly, also fixes some problems with softupdates on root. More cleanups are needed here.. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
# 8435e0ae	04-Jun-1998	Doug Rabson <dfr@FreeBSD.org>	Use size_t instead of u_int for sizes.
# c11d2981	18-May-1998	Julian Elischer <julian@FreeBSD.org>	try stop the user from using mount -u to set the async flag on a filesystem currently using soft updates. Also needs a new copy of ffs_softdep.c to complete the fix.
# 79cc756d	05-May-1998	Mike Smith <msmith@FreeBSD.org>	As described by the submitter: Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him. The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework. Submitted by: Michael Hancock <michaelh@cet.co.jp>
# c0bab11d	19-Apr-1998	Julian Elischer <julian@FreeBSD.org>	Make the devfs SLICE option a standard type option. (hopefully it will go away eventually anyhow)
# 3e425b96	19-Apr-1998	Julian Elischer <julian@FreeBSD.org>	Add changes and code to implement a functional DEVFS. This code will be turned on with the TWO options DEVFS and SLICE. (see LINT) Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will delineate these changes. /dev will be automatically mounted by init (thanks phk) on bootup. See /sys/dev/slice/slice.4 for more info. All code should act the same without these options enabled. Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others This code does not support the following: bad144 handling. Persistence. (My head is still hurting from the last time we discussed this) ATAPI floppies are not handled by the SLICE code yet. When this code is running, all major numbers are arbitrary and COULD be dynamically assigned. (this is not done, for POLA only) Minor numbers for disk slices ARE arbitrary and dynamically assigned.
# 227ee8a1	30-Mar-1998	Poul-Henning Kamp <phk@FreeBSD.org>	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now use the new variable time_second instead. gettime() changed to getmicrotime(). Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeared time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde
# 26cf9c3b	27-Mar-1998	Peter Wemm <peter@FreeBSD.org>	Enable the use of soft updates on the root filesystem. Previously, the softdep mode could only be activated on the initial mount of a filesystem and then only if it was a read-write mount. A 'mount -r' (as done in the rootfs mount) followed by a 'mount -u' to convert to read-write didn't start softdep mode.
# b1897c19	08-Mar-1998	Julian Elischer <julian@FreeBSD.org>	Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
# 8f9110f6	07-Mar-1998	John Dyson <dyson@FreeBSD.org>	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifested themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistency, and as a side-effect broke the vfs.ioopt code. This code might have been committed separately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitous buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups associated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistency problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify its status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and its descendants to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
# 16337c2e	07-Mar-1998	Bruce Evans <bde@FreeBSD.org>	Fixed missing simple_lock() in ffs_mountfs().
# 34bdbbd0	01-Mar-1998	Mike Smith <msmith@FreeBSD.org>	The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE: vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called. A default vfs_vrele was created for fs implementations that use the standard vnode management routines. VFS_VRELE implementations were made for the following file systems: Standard (vfs_vrele) ffs mfs nfs msdosfs devfs ext2fs Custom union umapfs Just EOPNOTSUPP fdesc procfs kernfs portal cd9660 These implementations may change as VOP changes are implemented. In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code. This will only be done for vnode arguments that are released by the various fs vop implementations. Wider use of VFS_VRELE will likely require restructuring of the code. Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>
# c9b99213	24-Feb-1998	Bruce Evans <bde@FreeBSD.org>	Fixed missing permissions checking for mounting by non-root. There is now less need for the vfs.usermount sysctl. msdosfs already has this change, modulo a missing LK_RETRY, via NetBSD. At least ext2fs is missing this and many other changes from Lite2. Obtained from: Lite2
# 303b270b	08-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Staticize.
# 0b08f5f7	05-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Back out DIAGNOSTIC changes.
# 47cfdb16	04-Feb-1998	Eivind Eklund <eivind@FreeBSD.org>	Turn DIAGNOSTIC into a new-style option.
# 9cfcd011	01-Feb-1998	John Dyson <dyson@FreeBSD.org>	Back out recent laptop sync changes. They had significant errors.
# de1050d8	31-Jan-1998	John Dyson <dyson@FreeBSD.org>	Support more intelligent sync operations for MNT_NOATIME. PR: kern/5577 Submitted by: Craig Leres <leres@ee.lbl.gov>
# 2d8acc0f	22-Jan-1998	John Dyson <dyson@FreeBSD.org>	VM level code cleanups. 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneeded collapse operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step.
Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
# 47221757	17-Jan-1998	John Dyson <dyson@FreeBSD.org>	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtle bugs aren't fixed yet.
# 95e5e988	05-Jan-1998	John Dyson <dyson@FreeBSD.org>	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. Vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon as reasonably possible.
# 2be70f79	28-Dec-1997	John Dyson <dyson@FreeBSD.org>	Lots of improvements, including restructuring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.
# b1f4a44b	11-Nov-1997	Julian Elischer <julian@FreeBSD.org>	Reviewed by: various. Ever since I first saw the way the mount flags were used I've hated the fact that modes, and events, internal and exported, and short-term and long term flags are all thrown together. Finally it's annoyed me enough.. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. It is not exported to userspace. I have moved some of the non exported flags over to this word. This means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other.. e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.
# 987f5696	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Another VFS cleanup "kilo commit" 1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} interface function, and now lives in the ufsmount structure. 2. Remove VOP_SEEK, it was unused. 3. Add more default vops: VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp And remove identical functionality from filesystems 4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?) 5. Try to make system wide VOP functions have vop_* names. 6. Initialize the um_* vectors in LFS. (Recompile your LKMS!!!)
# cec0f20c	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
# a1c995b6	12-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Last major round (Unless Bruce thinks of something :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 55166637	11-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Distribute and staticize a lot of the malloc M_* types. Substantial input from: bde
# 0be6b890	10-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Add type arg to ffs_mountfs and avoid examining v_tag to find out if MFS is getting a free ride. Use generic ufs_reclaim().
# 81bca6dd	27-Sep-1997	KATO Takenori <kato@FreeBSD.org>	Clustered read and write are switched at mount-option level. 1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread and vfs.foo.doclusterwrite are deleted. Only mount option can control clustered I/O from userland. 2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR and D_CLUSTERW bits of the d_flags member in the block device switch table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR / MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and MNT_NOCLUSTERW cannot be cleared from userland. 3. Vnode driver disables both clustered read and write. 4. Union filesystem disables clustered write. Reviewed by: bde
# 41fadeeb	07-Sep-1997	Bruce Evans <bde@FreeBSD.org>	Removed yet more vestiges of config-time swap configuration and/or cleaned up nearby cruft.
# e4ba6a82	02-Sep-1997	Bruce Evans <bde@FreeBSD.org>	Removed unused #includes.
# 57bf258e	16-Aug-1997	Garrett Wollman <wollman@FreeBSD.org>	Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
# fce002fd	24-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Don't include <sys/ioctl.h> in the kernel. Stage 1: don't include it when it is not used. In most cases, the reasons for including it went away when the special ioctl headers became self-sufficient.
# 8f89943e	23-Mar-1997	Guido van Rooij <guido@FreeBSD.org>	Add generation number randomization. Newly created filesystems will now automatically have random generation numbers. The kernel way of handling those also changed. Further it is advised to run fsirand on all your nfs exported filesystems. The code is mostly copied from OpenBSD, with the randomization changed to use /dev/urandom Reviewed by: Garrett Obtained from: OpenBSD
# 3ac4d1ef	22-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined. Fixed everything that depended on getting fcntl.h stuff from the wrong place. Most things don't depend on file.h stuff at all.
# 3c816944	21-Mar-1997	Bruce Evans <bde@FreeBSD.org>	Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
# cc9d8990	18-Mar-1997	Peter Wemm <peter@FreeBSD.org>	Restore the lost MNT_LOCAL flag twiddle. Lite2 has a different mechanism of setting it (compiled into vfs_conf.c), but we have a dynamic system in place. This could probably be better done via a runtime configure flag in the VFS_SET() VFS declaration, perhaps VFCF_LOCAL, and have the VFS code propagate this down into MNT_LOCAL at mount time. The other FS's would need to be updated, having UFS and MSDOSFS filesystems without MNT_LOCAL breaks a few things.. the man page rebuild scans for local filesystems and currently fails, I suspect that other tools like find and tar with their "local filesystem only" modes might be affected.
# 6ae83587	15-Mar-1997	Søren Schmidt <sos@FreeBSD.org>	Fix support for != 512 byte sector devices. Restores the use of SBLOCK instead of the BSOFF/sectorsize calculation. Using SBLOCK is bogus however in that it uses DEV_BSIZE instead of the actual sector size, but that is taken care of in other places. Changing the SBLOCK would be better, but it affects the system in other places, and doing it this way makes it possible to use filesystems that were made before the lite2 merge.
# 5ace3b26	08-Mar-1997	Mike Pritchard <mpp@FreeBSD.org>	Update a number of panic messages to reflect the actual name of the routine that caused the panic.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 3f6f17ee	12-Nov-1996	Julian Elischer <julian@FreeBSD.org>	Submitted by: Archie and me. We encountered an interesting situation where the superblock for a file system got written to disk with the "fs_fmod" flag set to one. It appears that this flag is normally supposed to be cleared during ffs_sync(), but we experienced a crash, or some other weird occurrence that left it on the disk set to 1. Later this partition was mounted read-only... and the fs_fmod field was never cleared, causing ffs_sync() to panic "rofs mod" when trying to unmount that filesystem (ffs_vfsops.c: line 790). fix: set this bit to 0 when you load the superblock from disk. (see more complete mail on this to hackers)
# c645dc12	07-Sep-1996	John Dyson <dyson@FreeBSD.org>	Fix a VOP_UNLOCK panic when using options DIAGNOSTIC during dismount.
# 6476c0d2	21-Aug-1996	John Dyson <dyson@FreeBSD.org>	Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistent and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct. This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc. Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility. Reviewed by: davidg
# 2f9bae59	11-Jun-1996	David Greenman <dg@FreeBSD.org>	Moved the fsnode MALLOC to before the call to getnewvnode() so that the process won't possibly block before filling in the fsnode pointer (v_data) which might be dereferenced during a sync since the vnode is put on the mnt_vnodelist by getnewvnode. Pointed out by Matt Day <mday@artisoft.com>
# e1eec28a	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change.
# 847a3ba7	02-Mar-1996	John Dyson <dyson@FreeBSD.org>	Handle the bogus device that MFS uses as its VBLK device. We now don't try to VMIO open it on MFS mounts. This will fix the mfs_badops panic.
# 91477adc	01-Mar-1996	John Dyson <dyson@FreeBSD.org>	Enable VMIO for non-VDIR metadata and block device.
# e6302eab	25-Feb-1996	Bruce Evans <bde@FreeBSD.org>	Removed vestigial support for the obsolete FIFO option. In ext2fs it caused null pointer panics for all fifo operations unless FIFO was defined.
# bd7e5f99	18-Jan-1996	John Dyson <dyson@FreeBSD.org>	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do a lot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30 seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
# 51ea8b57	14-Jan-1996	Bruce Evans <bde@FreeBSD.org>	Partially fixed negative and truncated "Avail" counts in df output. This fixes PR943. ffs/ffs_vfsops.c: ffs_statfs() multiplied by (100 - minfree) as part of calculating the minfree percentage (complemented in 100%), so with the standard minfree of 8, it was broken for file systems of size >= 1TB/92 = 11GB. Use the standard freespace() macro instead. This also fixes a rounding bug (the "Avail" count was sometimes 1 too small). ffs/* (not fixed): The freespace() macro multiplies by minfree, so with the standard minfree of 8, it is broken for file systems of size >= 1TB/8 = 128GB. This bug is more serious since it affects block allocation. ffs/ffs_alloc.c (not fixed): Ordinary users are sometimes allowed to allocate 1 (partial) block too many so that the "Avail" count goes negative. E.g., if there is 1 fragment available and the file is fairly large, one more full block is allocated. df/df.c: ufs_df() used/uses essentially the same code as ffs_statfs(), so it had/has the same bugs. ufs_df() gratuitously replaced "Avail" counts of < 0 by 0, so it gave different results for non-mounted file systems in this case.
# 01733a9b	05-Jan-1996	Garrett Wollman <wollman@FreeBSD.org>	Convert QUOTA to new-style option.
# b8dce649	17-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Staticize.
# a316d390	10-Dec-1995	John Dyson <dyson@FreeBSD.org>	Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
# efeaf95a	06-Dec-1995	David Greenman <dg@FreeBSD.org>	Untangled the vm.h include file spaghetti.
# c03020b2	19-Nov-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Fix compiler warnings.
# 2b14f991	28-Aug-1995	Julian Elischer <julian@FreeBSD.org>	Reviewed by: julian with quick glances by bruce and others Submitted by: terry (terry lambert) This is a composite of 3 patch sets submitted by terry. They are: New low-level init code that supports loadable modules better; some cleanups in the namei code to help terry in 16-bit character support; some changes to the mount-root code to make it a little more modular. NOTE: mounting root off cdrom or NFS MIGHT be broken as I haven't been able to test those cases. Certainly mounting root off disk still works just fine. mfs should work but is untested (tomorrow's task). The low level init stuff includes a total rewrite of init_main.c to make it possible for new modules to have an init phase by simply adding an entry to a TEXT_SET (or is it DATA_SET) list. Thus a new module can be added to the kernel without editing any other files other than the 'files' file.
# 628641f8	11-Aug-1995	David Greenman <dg@FreeBSD.org>	Converted mountlist to a CIRCLEQ. Partially obtained from: 4.4BSD-Lite2
# b6dedae6	06-Aug-1995	David Greenman <dg@FreeBSD.org>	Removed redundant call to vm_object_page_clean - this is already done in vfs_msync().
# 8997d94f	21-Jul-1995	David Greenman <dg@FreeBSD.org>	Since ufs_ihashget can block, the lock must be checked for each time the function returns. Also, moved lock into .bss and made minor cosmetic changes. Submitted by: Bruce Evans
# 2094ddb6	20-Jul-1995	David Greenman <dg@FreeBSD.org>	Implement a lock in ffs_vget to prevent a race condition where two processes try to allocate the same inode/vnode, causing a duplicate. Submitted by: Matt Dillon, slightly reworked by me.
# 24a1cce3	13-Jul-1995	David Greenman <dg@FreeBSD.org>	NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!! Much needed overhaul of the VM system. Included in this first round of changes: 1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers". 2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has essentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items. 3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed. 4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug. 5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads.
The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance. 6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain. 7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance. 8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed. 9) Some almost useless debugging code removed. 10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure essentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology. 11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended. 12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course). 13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non-standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE.
Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE. 14) (dyson) Sharing maps have been removed. Their marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintainability. (As were most all of these changes) TODO: 1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size. 2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness. 3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind. 4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems. 5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
# aa2cabb9	27-Jun-1995	David Greenman <dg@FreeBSD.org>	1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
# 9b2e5354	30-May-1995	Rodney W. Grimes <rgrimes@FreeBSD.org>	Remove trailing whitespace.
# 2976b7f1	18-May-1995	David Greenman <dg@FreeBSD.org>	NFS diskless operation was broken because swapdev_vp wasn't initialized. These changes solve the problem in a general way by moving the initialization out of the individual fs_mountroot's and into swaponvp(). Submitted by: Poul-Henning Kamp
# 1469eec8	15-May-1995	David Greenman <dg@FreeBSD.org>	Fixed incompleteness that would allow dirty filesystems to get mounted when the single user shell was terminated. These changes disallow mounting or R/W upgrading filesystems that are dirty unless "-f" (force) option is used with mount. /etc/rc has been modified to abort the startup if one or more non-nfs partitions fail to mount. Reviewed by: Poul-Henning Kamp, Rod Grimes
# f33775af	01-May-1995	John Dyson <dyson@FreeBSD.org>	Limit filesize to the amount that the VM system can currently handle (2GB). If this limit is not imposed, then filesystem corruption will ensue when files larger than 2GB are created. This is temporary, and the underlying limitation will be removed later.
# 81c6e3e5	10-Apr-1995	David Greenman <dg@FreeBSD.org>	Handle the "syncing VCHR vnode hang" problem a little differently; just don't lock the vnode - it doesn't appear to ever be necessary for VCHR vnode/inodes. This fixes a bug introduced in the previous commit that caused tty timestamps to act strange (causing 'w' and 'finger' to show the tty wasn't idle when it may have been for hours).
# f6b04d2b	09-Apr-1995	David Greenman <dg@FreeBSD.org>	Changes from John Dyson and myself: Fixed remaining known bugs in the buffer IO and VM system. vfs_bio.c: Fixed some race conditions and locking bugs. Improved performance by removing some (now) unnecessary code and fixing some broken logic. Fixed process accounting of # of FS outputs. Properly handle NFS interrupts (B_EINTR). (various) Replaced calls to clrbuf() with calls to an optimized routine called vfs_bio_clrbuf(). (various FS sync) Sync out modified vnode_pager backed pages. ffs_vnops.c: Do two passes: Sync out file data first, then indirect blocks. vm_fault.c: Fixed deadly embrace caused by acquiring locks in the wrong order. vnode_pager.c: Changed to use buffer I/O system for writing out modified pages. This should fix the problem with the modification date previously not getting updated. Also dramatically simplifies the code. Note that this is going to change in the future and be implemented via VOP_PUTPAGES(). vm_object.c: Fixed a pile of bugs related to cleaning (vnode) objects. The performance of vm_object_page_clean() is terrible when dealing with huge objects, but this will change when we implement a binary tree to keep the object pages sorted. vm_pageout.c: Fixed broken clustering of pageouts. Fixed race conditions and other lockup style bugs in the scanning of pages. Improved performance.
# 3aa12267	28-Mar-1995	Bruce Evans <bde@FreeBSD.org>	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) that I didn't notice when I fixed "all" such warnings before.
# 4e83f749	18-Mar-1995	David Greenman <dg@FreeBSD.org>	Don't sync the inode date changes of character special devices during the FS sync. The system would appear to hang momentarily if there was a large backlog of I/O. This is because the vnode remains locked during the output - preventing normal character I/O. The problem was exacerbated by the FFS contiguous block allocation fixes and a semi-broken disksort(). The inode/date will still be synced during a normal FS dismount and whenever the inode is changed for other reasons.
# b5e8ce9f	16-Mar-1995	Bruce Evans <bde@FreeBSD.org>	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
# 36633bf4	14-Nov-1994	Bruce Evans <bde@FreeBSD.org>	Undo a previous change. <sys/disklabel.h> was broken, not these files.
# 94a92413	27-Oct-1994	Jordan K. Hubbard <jkh@FreeBSD.org>	From: fredriks@mcs.com (Lars Fredriksen) ... It turns out that these files do not include <sys/dkbad.h> before <sys/disklabel.h>. Submitted by: fredriks
# 901ba606	21-Oct-1994	David Greenman <dg@FreeBSD.org>	Restrict fs_maxfilesize to 2^40, and check against this in ffs_truncate(). This is part of a bug fix from Kirk McKusick to work around problems in FFS related to the blkno of a 64bit offset not fitting into an int. Note the proper solution would be to deal with 64bit block numbers, but doing this would require sweeping changes; some other day perhaps. Submitted by: Marshall Kirk McKusick
# c1d9efcb	09-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	Cosmetics. make gcc less noisy. Still some way to go here.
# c9671602	08-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	Cosmetics for gcc -Wall. A couple of unused "int i"'s removed and a couple of prototypes added. And the usual () work.
# 862cdb8e	21-Sep-1994	Garrett Wollman <wollman@FreeBSD.org>	Call ffs ``ufs'' for the benefit of poor, confused user-land programs.
# c901836c	20-Sep-1994	Garrett Wollman <wollman@FreeBSD.org>	Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
# e0e9c421	20-Aug-1994	David Greenman <dg@FreeBSD.org>	Implemented filesystem clean bit via: machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful. cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level. vfs_conf.c: Removed unused rootfs global. vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted. ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted. ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE(). vfs_syscalls.c: Disallow dismounting root FS via umount syscall.
# f23b4c91	18-Aug-1994	Garrett Wollman <wollman@FreeBSD.org>	Fix up some sloppy coding practices: - Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above. NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# 26f9a767	25-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources