Cross Reference: /freebsd-current/sys/kern/vfs

History log of /freebsd-current/sys/kern/vfs_default.c
Revision	Date	Author	Comments
# 4cbe4c48	18-Nov-2023	Konstantin Belousov <kib@FreeBSD.org>	VFS: add VOP_GETLOWVNODE() It is similar to VOP_GETWRITEMOUNT(), and for given vnode vp should return the lower vnode which would actually handle write to vp. Flags allow to specify FREAD or FWRITE for benefit of possible unionfs implementation. Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr> Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D42603
# fdafd315	24-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Automated cleanup of cdefs and other formatting Apply the following automated changes to try to eliminate no-longer-needed sys/cdefs.h includes as well as now-empty blank lines in a row. Remove /^#if.\n#endif.\n#include\s+<sys/cdefs.h>.\n/ Remove /\n+#include\s+<sys/cdefs.h>.\n+#if.\n#endif.\n+/ Remove /\n+#if.\n#endif.\n+/ Remove /^#if.\n#endif.\n/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/ Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/ Sponsored by: Netflix
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 3d8450db	24-Apr-2023	Olivier Certner <olce.freebsd@certner.fr>	vfs: vn_dir_next_dirent(): Simplify interface and harden Simplify the old interface (one less argument, simpler termination test) and add documentation about it. Add more sanity checks (mostly under INVARIANTS, but also in the general case to prevent infinite loops). Drop the explicit test on minimum directory entry size (without INVARIANTS). Deal with the impacts in callers (dirent_exists() and vop_stdvptocnp()). dirent_exists() has been simplified a bit, preserving the exact same semantics but for the return code whose meaning has been reversed (0 now means the entry exists, ENOENT that it doesn't and other values are genuine errors). While here, suppress gratuitous casts of malloc return values. vn_dir_next_dirent() has been tested by a 'make -j4 buildkernel' with a temporary modification to the VFS cache causing vn_vptocnp() to always call VOP_VPTOCNP() and finally vop_stdvptocnp() (observed with temporary debug counters). Export new _GENERIC_MINDIRSIZ and _GENERIC_MAXDIRSIZ on __BSD_VISIBLE, and GENERIC_MINDIRSIZ and GENERIC_MAXDIRSIZ on _KERNEL. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39764
# 6bce3f23	23-Apr-2023	Olivier Certner <olce.freebsd@certner.fr>	vfs: Export get_next_dirent() as vn_dir_next_dirent() Move internal-to-'vfs_default.c' get_next_dirent() to 'vfs_vnops.c' and export it for use by other parts of the VFS. This is a preparatory change for using it in vfs_emptydir(). No functional change. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39755
# 83775757	07-Feb-2023	Mateusz Guzik <mjg@FreeBSD.org>	vfs: ansify Reported by: clang 15 Sponsored by: Rubicon Communications, LLC ("Netgate")
# bb92cd7b	24-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: NDFREE(&nd, NDF_ONLY_PNBUF) -> NDFREE_PNBUF(&nd)
# e511bd14	26-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: fully lockless v_writecount adjustment Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D33128
# 4dcdf398	17-May-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF This allows to stop maintaing the VI_TEXT_REF flag and consequently opens up fully lockless v_writecount adjustment. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D33127
# 3ffcfa59	26-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vop_stdadd_writecount_nomsync This avoids needing to inspect the mount point every time. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D33125
# 7e1d3eef	25-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
# f0c9847a	06-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE When the NFSv4.2 server does a VOP_ALLOCATE(), it needs the operation to be done for the RPC's credential and not td_ucred. It also needs the writing to be done synchronously. This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE() and modifies vop_stdallocate() to use these arguments. The VOP_ALLOCATE.9 man page will be patched separately. Reviewed by: khng, kib Differential Revision: https://reviews.freebsd.org/D32865
# da779f26	27-Aug-2021	Rick Macklem <rmacklem@FreeBSD.org>	vfs_default: Change vop_stddeallocate() from static to global A future commit to the NFS client uses vop_stddeallocate() for cases where the NFS server does not support a Deallocate operation. Change vop_stddeallocate() from static to global so that it can be called by the NFS client. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31640
# 9e202d03	25-Aug-2021	Ka Ho Ng <khng@FreeBSD.org>	fspacectl(2): Changes on rmsr.r_offset's minimum value returned rmsr.r_offset now is set to rqsr.r_offset plus the number of bytes zeroed before hitting the end-of-file. After this change rmsr.r_offset no longer contains the EOF when the requested operation range is completely beyond the end-of-file. Instead in such case rmsr.r_offset is equal to rqsr.r_offset. Callers can obtain the number of bytes zeroed by subtracting rqsr.r_offset from rmsr.r_offset. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31677
# 1eaa3652	24-Aug-2021	Ka Ho Ng <khng@FreeBSD.org>	fspacectl(2): Clarifies the return values rmacklem@ spotted two things in the system call: - Upon returning from a successful operation, vop_stddeallocate can update rmsr.r_offset to a value greater than file size. This behavior, although being harmless, can be confusing. - The EINVAL return value for rqsr.r_offset + rqsr.r_len > OFF_MAX is undocumented. This commit has the following changes: - vop_stddeallocate and shm_deallocate to bound the the affected area further by the file size. - The EINVAL case for rqsr.r_offset + rqsr.r_len > OFF_MAX is documented. - The fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9)'s return len is explicitly documented the be the value 0, and the return offset is restricted to be the smallest of off + len and current file size suggested by kib@. This semantic allows callers to interact better with potential file size growth after the call. Sponsored by: The FreeBSD Foundation Reviewed by: imp, kib Differential Revision: https://reviews.freebsd.org/D31604
# a638dc4e	12-Aug-2021	Ka Ho Ng <khng@FreeBSD.org>	vfs: Add ioflag to VOP_DEALLOCATE(9) The addition of ioflag allows callers passing IO_SYNC/IO_DATASYNC/IO_DIRECT down to the file system implementation. The vop_stddeallocate fallback implementation is updated to pass the ioflag to the file system implementation. vn_deallocate(9) internally is also changed to pass ioflag to the VOP_DEALLOCATE call. Sponsored by: The FreeBSD Foundation Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D31500
# 0dc332bf	05-Aug-2021	Ka Ho Ng <khng@FreeBSD.org>	Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9). fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use. The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file. fspacectl(2) comprises of cmd and flags parameters to specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0. fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
# a4b07a27	11-May-2021	Jason A. Harmening <jah@FreeBSD.org>	VFS_QUOTACTL(9): allow implementation to indicate busy state changes Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount. Also, add stbool.h to libprocstat modules which #define _KERNEL before including sys/mount.h. Otherwise they'll pull in sys/types.h before defining _KERNEL and therefore won't have the bool definition they need for mp_busy. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30556
# 271fcf1c	29-May-2021	Jason A. Harmening <jah@FreeBSD.org>	Revert commits 6d3e78ad6c11 and 54256e7954d7 Parts of libprocstat like to pretend they're kernel components for the sake of including mount.h, and including sys/types.h in the _KERNEL case doesn't fix the build for some reason. Revert both the VFS_QUOTACTL() change and the follow-up "fix" for now.
# 6d3e78ad	11-May-2021	Jason A. Harmening <jah@FreeBSD.org>	VFS_QUOTACTL(9): allow implementation to indicate busy state changes Instead of requiring all implementations of vfs_quotactl to unbusy the mount for Q_QUOTAON and Q_QUOTAOFF, add an "mp_busy" in/out param to VFS_QUOTACTL(9). The implementation may then indicate to the caller whether it needed to unbusy the mount. Reviewed By: kib, markj Differential Revision: https://reviews.freebsd.org/D30218
# 852088f6	14-May-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add missing atomic conversion to writecount adjustment Fixes: ("vfs: lockless writecount adjustment in set/unset text")
# b5fb9ae6	07-May-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: lockless writecount adjustment in set/unset text ... for cases where this is not the first/last exec.
# 2b2d77e7	02-May-2021	Mark Johnston <markj@FreeBSD.org>	VOP_STAT: Provide a default value for va_gen Some filesystems, e.g., pseudofs and the NFSv3 client, do not provide one. Reviewed by: kib Reported by: KMSAN Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D30091
# a15f787a	15-Feb-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vfs_ref_from_vp This generalizes what vop_stdgetwritemount used to be doing. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28695
# fa3bd463	29-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	lockf: ensure atomicity of lockf for open(O_CREAT\|O_EXCL\|O_EXLOCK) or EX_SHLOCK. Do it by setting a vnode iflag indicating that the locking exclusive open is in progress, and not allowing F_LOCK request to make a progress until the first open finishes. Requested by: mckusick Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28697
# 49c117c1	28-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	Add VOP_VPUT_PAIR() with trivial default implementation. The VOP is intended to be used in situations where VFS has two referenced locked vnodes, typically a directory vnode dvp and a vnode vp that is linked from the directory, and at least dvp is vput(9)ed. The child vnode can be also vput-ed, but optionally left referenced and locked. There, at least UFS may need to do some actions with dvp which cannot be done while vp is also locked, so its lock might be dropped temporary. For instance, in some cases UFS needs to sync dvp to avoid filesystem state that is currently not handled by either kernel nor fsck. Having such VOP provides the neccessary context for filesystem which can do correct locking and handle potential reclamation of vp after relock. Trivial implementation does vput(dvp) and optionally vput(vp). Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
# cd853791	27-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Make MAXPHYS tunable. Bump MAXPHYS to 1M. Replace MAXPHYS by runtime variable maxphys. It is initialized from MAXPHYS by default, but can be also adjusted with the tunable kern.maxphys. Make b_pages[] array in struct buf flexible. Size b_pages[] for buffer cache buffers exactly to atop(maxbcachebuf) (currently it is sized to atop(MAXPHYS)), and b_pages[] for pbufs is sized to atop(maxphys) + 1. The +1 for pbufs allow several pbuf consumers, among them vmapbuf(), to use unaligned buffers still sized to maxphys, esp. when such buffers come from userspace (). Overall, we save significant amount of otherwise wasted memory in b_pages[] for buffer cache buffers, while bumping MAXPHYS to desired high value. Eliminate all direct uses of the MAXPHYS constant in kernel and driver sources, except a place which initialize maxphys. Some random (and arguably weird) uses of MAXPHYS, e.g. in linuxolator, are converted straight. Some drivers, which use MAXPHYS to size embeded structures, get private MAXPHYS-like constant; their convertion is out of scope for this work. Changes to cam/, dev/ahci, dev/ata, dev/mpr, dev/mpt, dev/mvs, dev/siis, where either submitted by, or based on changes by mav. Suggested by: mav () Reviewed by: imp, mav, imp, mckusick, scottl (intermediate versions) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D27225
# f6dd1aef	09-Nov-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: group mount per-cpu vars into one struct While here move frequently read stuff into the same cacheline. This shrinks struct mount by 64 bytes. Tested by: pho
# 8ecd87a3	20-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop spurious cred argument from VOP_VPTOCNP
# 214eccf4	14-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add VOP_EAGAIN Can be used to stub fplookup for example.
# 3c484f32	15-Sep-2020	Konstantin Belousov <kib@FreeBSD.org>	Convert page cache read to VOP. There are several negative side-effects of not calling into VOP layer at all for page cache reads. The biggest is the missed activation of EVFILT_READ knotes. Also, it allows filesystem to make more fine grained decision to refuse read from page cache. Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for asserts. Reviewed by: markj Tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346
# 5faf134c	16-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: assert that VI_TEXT_REF is not already set
# a92a971b	16-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)
# 51ea7bea	07-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add VOP_STAT The current scheme of calling VOP_GETATTR adds avoidable overhead. An example with tmpfs doing fstat (ops/s): before: 7488958 after: 7913833 Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D25910
# 2e4f8220	30-Jul-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: fold poll_no_poll into vop_nopoll The logic was almost completely present in vop_stdpoll anyway.
# abfdf767	30-Mar-2020	Konstantin Belousov <kib@FreeBSD.org>	VOP_GETPAGES_ASYNC(): consistently call iodone() callback in case of error. Reviewed by: glebius, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D24038
# ba8dd40b	14-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	lockmgr: add a change missed in r357907
# c1b57fa7	14-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	lockmgr: rename lock_fast_path to lock_flags The routine is not much of a fast path and the flags name better describes its purpose.
# 45757984	01-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: consistently use size_t for buflen around VOP_VPTOCNP
# cd0e46c6	26-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove vop loop from vop_sigdefer All ops are guaranteed to be present since r357131.
# a9099e5b	19-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: switch vop_stdunlock to call lockmgr_unlock Since the flags argument is now alawys 0 the new call provides the same behavior.
# cda31768	14-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: in vop_stdadd_writecount only vlazy vnodes on mounts using msync The only reason to vlazy there is to (overzealously) ensure all vnodes which need to be visited by msync scan can be found there. In particluar this is of no use zfs and tmpfs. While here depessimize the check.
# 57083d25	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# 6fa079fc	15-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tesetd by: pho Differential Revision: https://reviews.freebsd.org/D22738
# ea9a16b2	12-Dec-2019	Rick Macklem <rmacklem@FreeBSD.org>	r355677 requires that vop_stdioctl() be global so it can be called from NFS. r355677 modified the NFS client so that it does lseek(SEEK_DATA/SEEK_HOLE) for NFSv4.2, but calls vop_stdioctl() otherwise. As such, vop_stdioctl() needs to be a global function. Missed during the code merge for r355677.
# c8b29d12	11-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockgmr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665
# abd80ddb	08-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715
# d245aa1e	17-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: apply r352437 to the fast path as well This one is very hard to run into. If the filesystem is being unmounted or the mount point is freed the vfs_op_thread_enter will fail. For it to succeed the mount point itself would have to be reallocated in the time window between the initial read and the attempt to enter. Sponsored by: The FreeBSD Foundation
# 7f651859	17-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: fix braino resulting in NULL pointer deref in r352424 The breakage was added after all the testing and the testing which followed was not sufficient to find it. Reported by: pho Sponsored by: The FreeBSD Foundation
# 4cace859	16-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: convert struct mount counters to per-cpu There are 3 counters modified all the time in this structure - one for keeping the structure alive, one for preventing unmount and one for tracking active writers. Exact values of these counters are very rarely needed, which makes them a prime candidate for conversion to a per-cpu scheme, resulting in much better performance. Sample benchmark performing fstatfs (modifying 2 out of 3 counters) on a 104-way 2 socket Skylake system: before: 852393 ops/s after: 76682077 ops/s Reviewed by: kib, jeff Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21637
# a8c8e44b	16-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: manage mnt_ref with atomics New primitive is introduced to denote sections can operate locklessly on aspects of struct mount, but which can also be disabled if necessary. This provides an opportunity to start scaling common case modifications while providing stable state of the struct when facing unmount, write suspendion or other events. mnt_ref is the first counter to start being managed in this manner with the intent to make it per-cpu. Reviewed by: kib, jeff Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21425
# 1874c61b	02-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: restore mp null check in vop_stdgetwritemount The initially read mount point can already be NULL. Reported by: markj Fixes: r351656 ("vfs: stop refing freed mount points in vop_stdgetwritemount") Sponsored by: The FreeBSD Foundation
# 2796c209	01-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: stop refing freed mount points in vop_stdgetwritemount The code used blindly ref based on an unsafely red address and then would backpedal if necessary. This was safe in terms of memory access since mounts are type-stable, but made for a potential a bug where the mount was reused and had the count reset to 0 before this code decreased it. Reviewed by: kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21411
# 1e2f0ceb	28-Aug-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for looukp. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesytems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371
# 2e1b32c0	18-Aug-2019	Rick Macklem <rmacklem@FreeBSD.org>	Add a vop_stdioctl() that performs a trivial FIOSEEKDATA/FIOSEEKHOLE. Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE) on a file in a file system that does not have its own VOP_IOCTL(), the lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since ENOTTY is not listed as an error return by either the lseek(2) man page nor the POSIX draft for lseek(2). A discussion on freebsd-current@ seemed to indicate that implementing a trivial algorithm that returns the offset argument for FIOSEEKDATA and returns the file's size for FIOSEEKHOLE was the preferred fix. http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA The Linux kernel appears to implement this trivial algorithm as well. This patch adds a vop_stdioctl() that implements this trivial algorithm. It returns errors consistent with vn_bmap_seekhole() and, as such, will still return ENOTTY for non-regular files. I have proposed a separate patch that maps errors not described by the lseek(2) man page nor POSIX draft to EINVAL. This patch is under separate review. Reviewed by: kib Relnotes: yes Differential Revision: https://reviews.freebsd.org/D21299
# de4e1aeb	18-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode' vm_object has swap type, and no references on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week
# bbbbeca3	24-Jul-2019	Rick Macklem <rmacklem@FreeBSD.org>	Add kernel support for a Linux compatible copy_file_range(2) syscall. This patch adds support to the kernel for a Linux compatible copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9). This syscall/VOP can be used by the NFSv4.2 client to implement the Copy operation against an NFSv4.2 server to do file copies locally on the server. The vn_generic_copy_file_range() function in this patch can be used by the NFSv4.2 server to implement the Copy operation. Fuse may also me able to use the VOP_COPY_FILE_RANGE() method. vn_generic_copy_file_range() attempts to maintain holes in the output file in the range to be copied, but may fail to do so if the input and output files are on different file systems with different _PC_MIN_HOLE_SIZE values. Separate commits will be done for the generated syscall files and userland changes. A commit for a compat32 syscall will be done later. Reviewed by: kib, asomers (plus comments by brooks, jilles) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20584
# d01752c7	20-Jun-2019	Alan Somers <asomers@FreeBSD.org>	Add a VOP_BMAP(9) man page Reviewed by: mckusick MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20704
# 5d993207	30-May-2019	Konstantin Belousov <kib@FreeBSD.org>	Silence witness warning about duplicated mutex type. The order is correct, it is nullfs vnode interlock -> lower vnode interlock. vop_stdadd_writecount() is called from nullfs VOP_ADD_WRITECOUNT() and both take interlocks. Requested by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 78022527	05-May-2019	Konstantin Belousov <kib@FreeBSD.org>	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923
# ae909414	09-Apr-2019	Konstantin Belousov <kib@FreeBSD.org>	Add vn_fsync_buf(). Provide a convenience function to avoid the hack with filling fake struct vop_fsync_args and then calling vop_stdfsync(). Sponsored by: The FreeBSD Foundation MFC after: 1 week
# f5fdf82d	11-Mar-2019	Simon J. Gerraty <sjg@FreeBSD.org>	Add _PC_ACL_* to vop_stdpathconf This avoid EINVAL from tmpfs etc. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D19512
# 5ef4f86d	22-Dec-2018	Bruce Evans <bde@FreeBSD.org>	Oops, rounddown() for the start was misspelled roundup() in r342295, so only aligned starts worked. This broke releasing caches in most cases where the i/o size is smaller than the fs block size.
# 2c0434ac	20-Dec-2018	Bruce Evans <bde@FreeBSD.org>	Fix rounding in vop_stdadvise() for POSIX_FADV_NOREUSE (really POSIX_FADV_DONTNEED). The most broken case was for applications that advise for the whole file and then do block-aligned i/o's 1 block at a time. Then advice is sent to VOP_ADVISE() 1 block at a time, but in vop_stdadvise() the 1-block advice was turned into 0-block advice for the buffer cache part. The bugs were caused partly by callers representing the region as (a_start, a_end), where a_end is actually the maximum, and everything else representing the region as (start, end) where 'end' is actually the end (1 after the maximum). The maximum a_end must be rounded up, but was rounded down. Also, rounding to page boundaries was inconsistent. The bugs and fixes have no effect for zfs and other file systems that don't use the buffer cache or the page cache. Most or all file systems currently use the default VOP_FADVISE(), but it finds a null buffer cache and a null page cache for file systems that don't use normal methods. Reviewed by: kib
# 8ff7fad1	23-Oct-2018	Konstantin Belousov <kib@FreeBSD.org>	Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not wanted to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658
# 2186ee6e	19-May-2018	Mateusz Guzik <mjg@FreeBSD.org>	vfs: simplify vop_stdlock/unlock The interlock pointer is non-NULL by definition and the compiler see through that and eliminates the NULL checks. Just remove them from the code as they play no role. No difference in generated assembly.
# 16680b6a	23-Feb-2018	Kirk McKusick <mckusick@FreeBSD.org>	Include error number in the "fsync: giving up on dirty" message (in case it ever starts happening again in spite of 328444). Submitted by: Andreas Longwitz <longwitz at incore.de>
# 4d93711d	26-Jan-2018	Kirk McKusick <mckusick@FreeBSD.org>	For many years the message "fsync: giving up on dirty" has occationally appeared on UFS/FFS filesystems. In some cases it was promptly followed by a panic of "softdep_deallocate_dependencies: dangling deps". This fix should eliminate both of these occurences. Submitted by: Andreas Longwitz <longwitz at incore.de> Reviewed by: kib Tested by: Peter Holm (pho) PR: 225423 MFC after: 1 week
# b501cc5d	19-Dec-2017	John Baldwin <jhb@FreeBSD.org>	Rework pathconf handling for FIFOs. On the one hand, FIFOs should respect other variables not supported by the fifofs vnode operation (such as _PC_NAME_MAX, _PC_LINK_MAX, etc.). These values are fs-specific and must come from a fs-specific method. On the other hand, filesystems that support FIFOs are required to support _PC_PIPE_BUF on directory vnodes that can contain FIFOs. Given this latter requirement, once the fs-specific VOP_PATHCONF method supports _PC_PIPE_BUF for directories, it is also suitable for FIFOs permitting a single VOP_PATHCONF method to be used for both FIFOs and non-FIFOs. To that end, retire all of the FIFO-specific pathconf methods from filesystems and change FIFO-specific vnode operation switches to use the existing fs-specific VOP_PATHCONF method. For fifofs, set it's VOP_PATHCONF to VOP_PANIC since it should no longer be used. While here, move _PC_PIPE_BUF handling out of vop_stdpathconf() so that only filesystems supporting FIFOs will report a value. In addition, only report a valid _PC_PIPE_BUF for directories and FIFOs. Discussed with: bde Reviewed by: kib (part of a larger patch) MFC after: 1 month Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D12572
# 599afe53	19-Dec-2017	John Baldwin <jhb@FreeBSD.org>	Move NAME_MAX, LINK_MAX, and CHOWN_RESTRICTED out of vop_stdpathconf(). Having all filesystems fall through to default values isn't always correct and these values can vary for different filesystem implementations. Most of these changes just use the existing default values with a few exceptions: - Don't report CHOWN_RESTRICTED for ZFS since it doesn't do the exact permissions check this claims for chown(). - Use NANDFS_NAME_LEN for NAME_MAX for nandfs. - Don't report a LINK_MAX of 0 on smbfs. Now fail with EINVAL to indicate hard links aren't supported. Requested by: bde (though perhaps not this exact implementation) Reviewed by: kib (earlier version) MFC after: 1 month Sponsored by: Chelsio Communications
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# e1d15b89	21-Sep-2017	John Baldwin <jhb@FreeBSD.org>	Only handle _PC_MAX_CANON, _PC_MAX_INPUT, and _PC_VDISABLE for TTY devices. Move handling of these three pathconf() variables out of vop_stdpathconf() and into devfs_pathconf() as TTY devices can only be devfs files. In addition, only return settings for these three variables for devfs devices whose device switch has the D_TTY flag set. Discussed with: bde, kib Sponsored by: Chelsio Communications
# 0c3c207f	02-Jun-2017	Gleb Smirnoff <glebius@FreeBSD.org>	For UNIX sockets make vnode point not to the socket, but to the UNIX PCB, since the latter is the thing that links together VFS and sockets. While here, make the union in the struct vnode anonymous.
# 52d1adda	15-Mar-2017	Alan Cox <alc@FreeBSD.org>	Relax the locking requirements for vm_object_page_noreuse(). While reviewing all uses of OFF_TO_IDX(), I observed that vm_object_page_noreuse() is requiring an exclusive lock on the object when, in fact, a shared lock suffices. Reviewed by: kib, markj MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D10011
# 6fec662c	21-Feb-2017	Warner Losh <imp@FreeBSD.org>	Make the code match the comments: If we have ANY buf's that failed then return EAGAIN. The current code just returns that if the LAST buf failed. Reviewed by: kib@, trasz@ Differential Revision: https://reviews.freebsd.org/D9677
# c4a48867	12-Feb-2017	Mateusz Guzik <mjg@FreeBSD.org>	lockmgr: implement fast path The main lockmgr routine takes 8 arguments which makes it impossible to tail-call it by the intermediate vop_stdlock/unlock routines. The routine itself starts with an if-forest and reads from the lock itself several times. This slows things down both single- and multi-threaded. With the patch single-threaded fstats go 4% up and multithreaded up to ~27%. Note that there is still a lot of room for improvement. Reviewed by: kib Tested by: pho
# 2f304845	05-Jan-2017	Konstantin Belousov <kib@FreeBSD.org>	Do not allocate struct statfs on kernel stack. Right now size of the structure is 472 bytes on amd64, which is already large and stack allocations are indesirable. With the ino64 work, MNAMELEN is increased to 1024, which will make it impossible to have struct statfs on the stack. Extracted from: ino64 work by gleb Discussed with: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 69a28758	15-Sep-2016	Ed Maste <emaste@FreeBSD.org>	Renumber license clauses in sys/kern to avoid skipping #3
# 47e61f6c	15-Aug-2016	Konstantin Belousov <kib@FreeBSD.org>	Implement VOP_FDATASYNC() for msdosfs. Standard VOP_FSYNC() implementation just syncs data buffers, and due to this, is the correct and efficient implementation for msdosfs or any other filesystem which uses bufer cache trivially. Provide globally visible wrapper vop_stdfdatasync_buf() for future consumption by other filesystems. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471
# 295af703	15-Aug-2016	Konstantin Belousov <kib@FreeBSD.org>	Add an implementation of fdatasync(2). The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing code with fsync(2). For all filesystems, this commit provides the implementation which delegates the work of VOP_FDATASYNC() to VOP_FSYNC(). This is functionally correct but not efficient. This is not yet POSIX-compliant implementation, because it does not ensure that queued AIO requests are completed before returning. Reviewed by: mckusick Discussed with: avg (ZFS), jhb (AIO part) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471
# c73fb331	15-Aug-2016	Konstantin Belousov <kib@FreeBSD.org>	VOP_FSYNC() does not take cred as an argument. Correct comment. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 411455a8	10-Aug-2016	Edward Tomasz Napierala <trasz@FreeBSD.org>	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month
# 399e8c17	09-Mar-2016	John Baldwin <jhb@FreeBSD.org>	Simplify AIO initialization now that it is standard. - Mark AIO system calls as STD and remove the helpers to dynamically register them. - Use COMPAT6 for the old system calls with the older sigevent instead of an 'o' prefix. - Simplify the POSIX configuration to note that AIO is always available. - Handle AIO in the default VOP_PATHCONF instead of special casing it in the pathconf() system call. fpathconf() is still hackish. - Remove freebsd32_aio_cancel() as it just called the native one directly. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5589
# 1041e090	05-Jan-2016	Konstantin Belousov <kib@FreeBSD.org>	Two fixes for excessive iterations after r292326. Advance the logical block number to the lblkno of the found block plus one, instead of incrementing the block number which was used for lookup. This change skips sparcely populated buffer ranges, similar to r292325, instead of doing useless lookups. Do not restart the bnoreuselist() from the start of the range if buffer lock cannot be obtained without sleep. Only retry lookup and lock for the same queue and same logical block number. Reported by: benno Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
# b0cd2017	16-Dec-2015	Gleb Smirnoff <glebius@FreeBSD.org>	A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES(). o With new KPI consumers can request contiguous ranges of pages, and unlike before, all pages will be kept busied on return, like it was done before with the 'reqpage' only. Now the reqpage goes away. With new interface it is easier to implement code protected from race conditions. Such arrayed requests for now should be preceeded by a call to vm_pager_haspage() to make sure that request is possible. This could be improved later, making vm_pager_haspage() obsolete. Strenghtening the promises on the business of the array of pages allows us to remove such hacks as swp_pager_free_nrpage() and vm_pager_free_nonreq(). o New KPI accepts two integer pointers that may optionally point at values for read ahead and read behind, that a pager may do, if it can. These pages are completely owned by pager, and not controlled by the caller. This shifts the UFS-specific readahead logic from vm_fault.c, which should be file system agnostic, into vnode_pager.c. It also removes one VOP_BMAP() request per hard fault. Discussed with: kib, alc, jeff, scottl Sponsored by: Nginx, Inc. Sponsored by: Netflix
# 106ebb76	16-Dec-2015	Konstantin Belousov <kib@FreeBSD.org>	Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a buffer for each block number in the range with gbincore(), look up the next instantiated buffer with the logical block number which is greater or equal to the next lblkno. This significantly speeds up the iteration for sparce-populated range. Move the iteration into new helper bnoreuselist(), which is structured similarly to flushbuflist(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation
# 0a19cfd4	01-Oct-2015	Mark Johnston <markj@FreeBSD.org>	Ensure that vop_stdadvise() does not call getblk() on vnodes that have an empty bufobj. Otherwise, vnodes belonging to filesystems that do not use the buffer cache may trigger assertion failures. Reported by: Fabien Keil
# 3138cd36	30-Sep-2015	Mark Johnston <markj@FreeBSD.org>	As a step towards the elimination of PG_CACHED pages, rework the handling of POSIX_FADV_DONTNEED so that it causes the backing pages to be moved to the head of the inactive queue instead of being cached. This affects the implementation of POSIX_FADV_NOREUSE as well, since it works by applying POSIX_FADV_DONTNEED to file ranges after they have been read or written. At that point the corresponding buffers may still be dirty, so the previous implementation would coalesce successive ranges and apply POSIX_FADV_DONTNEED to the result, ensuring that pages backing the dirty buffers would eventually be cached. To preserve this behaviour in an efficient manner, this change adds a new buf flag, B_NOREUSE, which causes the pages backing a VMIO buf to be placed at the head of the inactive queue when the buf is released. POSIX_FADV_NOREUSE then works by setting this flag in bufs that underlie the specified range. Reviewed by: alc, kib Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3726
# 9ca30b0e	04-Jul-2015	Mateusz Guzik <mjg@FreeBSD.org>	vfs: use shared vnode locking when looking up ".." in vop_stdvptocnp Briefly discussed with: kib
# 46b34e5a	25-Dec-2014	Rick Macklem <rmacklem@FreeBSD.org>	Fix the comment introduced in r276192 so that it clearly states that the change is needed to avoid a deadlock. Suggested by: kib MFC after: 1 week
# 65cb225c	24-Dec-2014	Rick Macklem <rmacklem@FreeBSD.org>	Modify vop_stdadvlock{async}() so that it only locks/unlocks the vnode and does a VOP_GETATTR() for the SEEK_END case. This is safe to do, since lf_advlock{async}() only uses the size argument for the SEEK_END case. The NFSv4 server needs this when vfs.nfsd.enable_locallocks!=0 since locking the vnode results in a LOR that can cause a deadlock for the nfsd threads. Reviewed by: kib MFC after: 1 week
# 90effb23	22-Nov-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Merge from projects/sendfile: o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but doesn't sleep. It returns immediately, and will execute the I/O done handler function that must be supplied as argument. o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager. o Extend pagertab to support pgo_getpages_async method, and implement this method for vnode_pager. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc.
# 27ad26d8	09-Sep-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES().
# 22a72260	30-May-2013	Jeff Roberson <jeff@FreeBSD.org>	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
# 89f6b863	08-Mar-2013	Attilio Rao <attilio@FreeBSD.org>	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
# 58248e57	01-Mar-2013	Konstantin Belousov <kib@FreeBSD.org>	Make the default implementation of the VOP_VPTOCNP() fail if the directory entry, matched by the inode number, is ".". NFSv4 client might instantiate the distinct vnodes which have the same inode number, since single v4 export can be combined from several filesystems on the server. For instance, a case when the nested server mount point is exactly one directory below the top of the export, causes directory and its parent to have the same inode number 2. The vop_stdvptocnp() algorithm then returns "." as the name of the lower directory. Filtering out the "." entry with ENOENT works around this behaviour, the error forces getcwd(3) to fall back to usermode implementation, which compares both st_dev and st_ino. Based on the submission by: rmacklem Tested by: rmacklem MFC after: 1 week
# 140dedb8	02-Nov-2012	Konstantin Belousov <kib@FreeBSD.org>	The r241025 fixed the case when a binary, executed from nullfs mount, was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks
# 5050aa86	22-Oct-2012	Konstantin Belousov <kib@FreeBSD.org>	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
# 877d24ac	28-Sep-2012	Konstantin Belousov <kib@FreeBSD.org>	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
# 75c898f2	09-Jun-2012	Kirk McKusick <mckusick@FreeBSD.org>	When synchronously syncing a device (MNT_WAIT), wait for buffers to become available. Otherwise we may excessively spin and fail with ``fsync: giving up on dirty''. Reviewed by: kib Tested by: Peter Holm MFC after: 1 week
# ac13a90c	16-May-2012	Gleb Kurtsou <gleb@FreeBSD.org>	Skip directory entries with zero inode number during traversal. Entries with zero inode number are considered placeholders by libc and UFS. Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp, unionfs. Sponsored by: Google Summer of Code 2011
# 71469bb3	17-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# c7e41c8b	29-Feb-2012	Mikolaj Golub <trociny@FreeBSD.org>	Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH() operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch MFC after: 2 weeks
# f82360ac	19-Nov-2011	Konstantin Belousov <kib@FreeBSD.org>	Existing VOP_VPTOCNP() interface has a fatal flow that is critical for nullfs. The problem is that resulting vnode is only required to be held on return from the successfull call to vop, instead of being referenced. Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like. Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix. Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)
# 936c09ac	03-Nov-2011	John Baldwin <jhb@FreeBSD.org>	Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
# 694a586a	21-May-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
# 1ce4508f	19-Apr-2011	Matthew D Fleming <mdf@FreeBSD.org>	Allow VOP_ALLOCATE to be iterative, and have kern_posix_fallocate(9) drive looping and potentially yielding. Requested by: kib
# d91f88f7	18-Apr-2011	Matthew D Fleming <mdf@FreeBSD.org>	Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# c2d844d8	25-Aug-2010	Brian Somers <brian@FreeBSD.org>	If we read zero bytes from the directory, early out with ENOENT rather than forging ahead and interpreting garbage buffer content and dirent structures. This change backs out r211684 which was essentially a no-op. MFC after: 1 week
# 90db41b6	22-Aug-2010	Brian Somers <brian@FreeBSD.org>	uio_resid isn't updated by VOP_READDIR for nfs filesystems. Use the uio_offset adjustment instead to calculate a correct *len. Without this change, we run off the end of the directory data we're reading and panic horribly for nfs filesystems. MFC after: 1 week
# 7fd32ea9	12-May-2010	Zachary Loafman <zml@FreeBSD.org>	Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr
# f2c7b986	09-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r206094: Supply default implementation of VOP_RENAME() that does neccessary unlocks and unreferences for argument vnodes, as expected by kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and reference leaks when rename is attempted on fs that does not implement rename.
# 0bc7bd67	02-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	Supply default implementation of VOP_RENAME() that does neccessary unlocks and unreferences for argument vnodes, as expected by kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and reference leaks when rename is attempted on fs that does not implement rename. PR: kern/107439 Based on submission by: Mikolaj Golub <to.my.trociny gmail com> Tested by: Mikolaj Golub MFC after: 1 week
# bd7ae209	27-Mar-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	MFC r197680: Provide default implementation for VOP_ACCESS(9), so that filesystems which want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
# e15d7a0b	18-Feb-2010	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Use vput() instead of VOP_UNLOCK()+vrele(). The comment here is out-dated, we no longer pass thread pointer to VOP_UNLOCK().
# 23e62e76	11-Nov-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Revert r198873. Having different VAPPEND semantics for VOP_ACCESS(9) and VOP_ACCESSX(9) is not a good idea.
# 597954c8	03-Nov-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	While VAPPEND without VWRITE makes sense for VOP_ACCESSX(9) (e.g. to check for the permission to create subdirectory (ACE4_ADD_SUBDIRECTORY)), it doesn't really make sense for VOP_ACCESS(9). Also, many VOP_ACCESS(9) implementations don't expect that. Make sure we don't confuse them.
# 2c29cfa0	01-Oct-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Provide default implementation for VOP_ACCESS(9), so that filesystems which want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
# c808c963	21-Jun-2009	Konstantin Belousov <kib@FreeBSD.org>	Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson
# e0c161b8	21-Jun-2009	Konstantin Belousov <kib@FreeBSD.org>	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson
# af465604	05-Jun-2009	Robert Watson <rwatson@FreeBSD.org>	Add mac_framework.h include missed when MAC code was (presumably) copied from another file.
# c97fcdba	30-May-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add VOP_ACCESSX, which can be used to query for newly added V* permissions, such as VWRITE_ACL. For a filsystems that don't implement it, there is a default implementation, which works as a wrapper around VOP_ACCESS. Reviewed by: rwatson@
# dfd233ed	11-May-2009	Attilio Rao <attilio@FreeBSD.org>	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
# 885868cd	10-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
# f8ecc407	08-Mar-2009	Joe Marcus Clarke <marcus@FreeBSD.org>	Add a default implementation for VOP_VPTOCNP(9) which scans the parent directory of a vnode to find a dirent with a matching file number. The name from that dirent is then used to provide the component name. Note: if the initial vnode argument is not a directory itself, then the default VOP_VPTOCNP(9) implementation still returns ENOENT. Reviewed by: kib Approved by: kib Tested by: pho
# 125dcf8c	06-Mar-2009	Konstantin Belousov <kib@FreeBSD.org>	Extract the no_poll() and vop_nopoll() code into the common routine poll_no_poll(). Return a poll_no_poll() result from devfs_poll_f() when filedescriptor does not reference the live cdev, instead of ENXIO. Noted and tested by: hps MFC after: 1 week
# b9022449	11-Dec-2008	Joe Marcus Clarke <marcus@FreeBSD.org>	Add a new VOP, VOP_VPTOCNP, which translates a vnode to its component name on a best-effort basis. Teach vn_fullpath to use this new VOP if a regular VFS cache lookup fails. This VOP is designed to supplement the VFS cache to provide a better chance that a vnode-to-name lookup will succeed. Currently, an implementation for devfs is being committed. The default implementation is to return ENOENT. A big thanks to kib for the mentorship on this, and to pho for running it through his stress test suite. Reviewed by: arch Approved by: kib
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 0359a12e	28-Aug-2008	Attilio Rao <attilio@FreeBSD.org>	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# eab626f1	16-Apr-2008	Konstantin Belousov <kib@FreeBSD.org>	Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
# 698b1a66	22-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
# 81c794f9	25-Feb-2008	Attilio Rao <attilio@FreeBSD.org>	Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>
# 24463dbb	15-Feb-2008	Attilio Rao <attilio@FreeBSD.org>	- Introduce lockmgr_args() in the lockmgr space. This function performs the same operation of lockmgr() but accepting a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is nomore used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>
# 0e9eb108	23-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	Cleanup lockmgr interface and exported KPI: - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo
# 22db15c0	13-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# d413d210	18-May-2007	Konstantin Belousov <kib@FreeBSD.org>	Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.
# 2c7b0f41	16-Feb-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Remove VFS_VPTOFH entirely. API is already broken and it is good time to do it. Suggested by: rwatson
# 10bcafe9	15-Feb-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
# 2f6a774b	12-Nov-2006	Kip Macy <kmacy@FreeBSD.org>	change vop_lock handling to allowing tracking of callers' file and line for acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)
# 60b0b1aa	19-Sep-2006	Tor Egge <tegge@FreeBSD.org>	Don't try to obtain a reference to a nonexisting (NULL) mount structure in default VOP_GETWRITEMOUNT().
# c5fcce21	30-Mar-2006	Jeff Roberson <jeff@FreeBSD.org>	- GETWRITEMOUNT now returns a referenced mountpoint to prevent its identity from changing. This is possible now that mounts are not freed. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
# 608c95d3	30-Jan-2006	Jeff Roberson <jeff@FreeBSD.org>	- Add a comment warning about an anomalous condition where we VOP_UNLOCK and then vrele rather than vput because we would like to VOP_UNLOCK with a specific thread.
# 82be0a5a	09-Jan-2006	Tor Egge <tegge@FreeBSD.org>	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
# 0430a5e2	13-Dec-2005	Dag-Erling Smørgrav <des@FreeBSD.org>	Eradicate caddr_t from the VFS API.
# a07b0feb	17-Aug-2005	Poul-Henning Kamp <phk@FreeBSD.org>	In vop_stdpathconf(ap) also default for _PC_NAME_MAX and _PC_PATH_MAX.
# e8ddb61d	02-Aug-2005	Jeff Roberson <jeff@FreeBSD.org>	- Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment. Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
# 7a06fe49	14-Jun-2005	Jeff Roberson <jeff@FreeBSD.org>	- Add and enhance asserts related to the wrong bufobj panic. Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)
# 679985d0	09-Jun-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
# 5270a2db	30-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- Remove unnecessary spls.
# f0ddc75e	03-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- Now that writes to character devices supporting softupdates can generate dirty bufs even with a locked vnode, 100 retries is not that many. This should probably change from a retry count to an abort when we are no longer cleaning any buffers. - Don't call vprint() while we still hold the vnode locked. Move the call to later in the function. - Clean up a comment.
# aabb1753	24-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Fixup the default vfs_root function to match the new prototype. Sponsored by: Isilon Systems, Inc.
# 23f2513a	13-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Don't drop the lock in the default inactive handler anymore, VOP_NULL will do for vop_stdinactive now. Sponsored by: Isilon Systems, Inc.
# e8ed9330	20-Feb-2005	David Schultz <das@FreeBSD.org>	Remove VFS_START(). Its original purpose involved the mfs filesystem, which is long gone. Discussed with: mckusick Reviewed by: phk
# 7c5d36fb	07-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Remove vop_stddestroyvobject()
# 7146d6cb	28-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Move the contents of vop_stddestroyvobject() to the new vnode_pager function vnode_destroy_vobject(). Make the new function zero the vp->v_object pointer so we can tell if a call is missing.
# 729fcf7e	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now.
# 69816ea3	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem for a given vnode to create a vnode_pager object if one is needed.
# d07a6d3f	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Move the body of vop_stdcreatevobject() over to the vnode_pager under the name Sande^H^H^H^H^Hvnode_create_vobject(). Make the new function take a size argument which removes the need for a VOP_STAT() or a very pessimistic guess for disks. Call that new function from vop_stdcreatevobject(). Make vnode_pager_alloc() private now that its only user came home.
# 35764be3	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Kill the VV_OBJBUF and test the v_object for NULL instead.
# 82d1b24c	24-Jan-2005	Jeff Roberson <jeff@FreeBSD.org>	- Remove GIANT_REQUIRED where it is no longer required. Sponsored By: Isilon Systems, Inc.
# e39db32a	12-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.
# 8df6bac4	11-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson
# 9454b2d8	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for copyright notices, minor format tweaks as necessary
# 6a0737ae	03-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Add missing vop_bypass (returning EOPNOTSUPP). Tripped up: marks
# aec0fb7b	01-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
# c31e6a8d	18-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Make more sense out of vop_stdcreatevobject()
# 9c83534d	15-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Make VOP_BMAP return a struct bufobj for the underlying storage device instead of a vnode for it. The vnode_pager does not and should not have any interest in what the filesystem uses for backend. (vfs_cluster doesn't use the backing store argument.)
# b0cccc25	13-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	The default VOP_REVOKE() should be vop_panic() as we should never get here in the first place.
# 5349c79d	06-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Properly implement a default version of VOP_GETWRITEMOUNT. Remove improper access to vop_stdgetwritemount() which should and will instead rely on the VOP default path.
# 19187819	05-Nov-2004	Alan Cox <alc@FreeBSD.org>	Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc() because this call is only needed to wake threads that slept when they discovered a dead object connected to a vnode. To eliminate unnecessary calls to wakeup() by vnode_pager_dealloc(), introduce a new flag, OBJ_DISCONNECTWNT. Reviewed by: tegge@
# c108bb74	29-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove VOP_SPECSTRATEGY() from the system.
# 5b285eff	27-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Eliminate unnecessary KASSERT. Eliminate a printf which would never tell us anything anyway because the KASSERT would have triggered.
# 156cb265	25-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Loose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().
# a76d8f4e	21-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
# 38f878d7	24-Sep-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Use vn_isdisk() to check if vnode is a disk. (repeat, CVS core dumped on me)
# 233b81be	24-Sep-2004	Poul-Henning Kamp <phk@FreeBSD.org>	use vn_isdisk() to see if vnode is a disk.
# f257b7a5	12-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
# 14d543dd	07-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	style(9)
# 81d16e2d	07-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	do the vfsstd thing instead of messing up our VFS_SYSCTL macro.
# e3c5a7a4	04-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list. Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this: MNT_ILOCK(mp); loop: for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) { if (vp->v_mount != mp) goto loop; MNT_IUNLOCK(mp); ... MNT_ILOCK(mp); } MNT_IUNLOCK(mp); The code which takes vnodes off a mountpoint looks like this: MNT_ILOCK(vp->v_mount); ... TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes); ... MNT_IUNLOCK(vp->v_mount); ... vp->v_mount = something; (Take a moment and try to spot the locking error before you read on.) On a SMP system, one CPU could have removed nvp from our mountlist but not yet gotten to assign a new value to vp->v_mount while another CPU simultaneously get to the top of the traversal loop where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint. Fix: Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code. Split the insmntque() which potentially moves a vnode from one mount point to another into delmntque() and insmntque() which does just what the names say. Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.
# 7f8a436f	05-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# b21126c6	29-Mar-2004	Peter Wemm <peter@FreeBSD.org>	Clean up the stub fake vnode locking implemenations. The main reason this stuff was here (NFS) was fixed by Alfred in November. The only remaining consumer of the stub functions was umapfs, which is horribly horribly broken. It has missed out on about the last 5 years worth of maintenence that was done on nullfs (from which umapfs is derived). It needs major work to bring it up to date with the vnode locking protocol. umapfs really needs to find a caretaker to bring it into the 21st century. Functions GC'ed: vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.
# ca430f2e	04-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
# cb9ddc80	01-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases. The only case there td != curthread known at the moment is boot() calling sync with thread0 pointer. This fixes the panic on shutdown people have reported.
# 492c1e68	31-Oct-2003	Alexander Kabaev <kan@FreeBSD.org>	Temporarily undo parts of the stuct mount locking commit by jeff. It is unsafe to hold a mutex across vput/vrele calls. This will be redone when a better locking strategy is agreed upon. Discussed with: jeff
# 0823d299	30-Oct-2003	Alexander Kabaev <kan@FreeBSD.org>	Relock mntvnode_mtx if vget fails in vfs_stdsync. The loop is always shoould entered with mutex locked.
# b2941431	26-Sep-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce no_poll() default method for device drivers. Have it do exactly the same as vop_nopoll() for consistency and put a comment in the two pointing at each other. Retire seltrue() in favour of no_poll(). Create private default functions in kern_conf.c instead of public ones. Change default strategy to return the bio with ENODEV instead of doing nothing which would lead the bio stranded. Retire public nullopen() and nullclose() as well as the entire band of public no{read,write,ioctl,mmap,kqfilter,strategy,poll,dump} funtions, they are the default actions now. Move the final two trivial functions from subr_xxx.c to kern_conf.c and retire the now empty subr_xxx.c
# 2a0f8aeb	15-Jun-2003	Poul-Henning Kamp <phk@FreeBSD.org>	I have not had any reports of trouble for a long time, so remove the gentle versions of the vop_strategy()/vop_specstrategy() mismatch methods and use vop_panic() instead.
# 677b542e	10-Jun-2003	David E. O'Brien <obrien@FreeBSD.org>	Use __FBSDID().
# 658ad5ff	05-May-2003	Alan Cox <alc@FreeBSD.org>	Lock the vm_object when performing vm_pager_deallocate().
# 12352fdc	02-May-2003	Alan Cox <alc@FreeBSD.org>	Lock access to the vm_object's flags in vop_stdcreatevobject().
# ab7b0ae5	30-Apr-2003	Alan Cox <alc@FreeBSD.org>	Lock an update to a vm_object's ref_count.
# 104a9b7e	29-Apr-2003	Alexander Kabaev <kan@FreeBSD.org>	Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h> Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
# c829b9d0	26-Apr-2003	Alan Cox <alc@FreeBSD.org>	- Lock the vm_object on entry to vm_object_terminate().
# 09f11da5	13-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Remove a race between fsync like functions and flushbufqueues() by requiring locked bufs in vfs_bio_awrite(). Previously the buf could have been written out by fsync before we acquired the buf lock if it weren't for giant. The cluster_wbuild() handles this race properly but the single write at the end of vfs_bio_awrite() would not. - Modify flushbufqueues() so there is only one copy of the loop. Pass a parameter in that says whether or not we should sync bufs with deps. - Call flushbufqueues() a second time and then break if we couldn't find any bufs without deps.
# c162e9c2	11-Mar-2003	Alexander Kabaev <kan@FreeBSD.org>	Rename vfs_stdsync function to vfs_stdnosync which matches more closely what function is really doing. Update all existing consumers to use the new name. Introduce a new vfs_stdsync function, which iterates over mount point's vnodes and call FSYNC on each one of them in turn. Make nwfs and smbfs use this new function instead of rolling their own identical sync implementations. Reviewed by: jeff
# 72f0679c	10-Mar-2003	Alexander Kabaev <kan@FreeBSD.org>	Remove trainling whitespace.
# 9722121a	07-Mar-2003	John Baldwin <jhb@FreeBSD.org>	Respect any passed in external lockmgr flags such as LK_NOWAIT in the default implementations of VOP_LOCK() and VOP_UNLOCK(). Tested by: jlemon, phk Glanced at by: jeffr
# f7271711	03-Mar-2003	Jeff Roberson <jeff@FreeBSD.org>	- Correct the wchan in vop_stdfsync() This is almost what bde asked for. There is some desire to have per fs wchans still but that is difficult giving the current arrangement of the code.
# 7dc91116	03-Mar-2003	Nate Lawson <njl@FreeBSD.org>	Pick up one file missed in the previous vprint() cleanup
# 17661e5a	24-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick
# 08883c8a	08-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Claim we're 'fsync' and not 'spec_fsync' in vop_stdfsync.
# 767b9a52	09-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h
# f5b11b6e	04-Jan-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Temporarily introduce a new VOP_SPECSTRATEGY operation while I try to sort out disk-io from file-io in the vm/buffer/filesystem space. The intent is to sort VOP_STRATEGY calls into those which operate on "real" vnodes and those which operate on VCHR vnodes. For the latter kind, the call will be changed to VOP_SPECSTRATEGY, possibly conditionally for those places where dual-use happens. Add a default VOP_SPECSTRATEGY method which will call the normal VOP_STRATEGY. First time it is called it will print debugging information. This will only happen if a normal vnode is passed to VOP_SPECSTRATEGY by mistake. Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY does on a VCHR vnode today. Add a new VOP_STRATEGY method in specfs to catch instances where the conversion to VOP_SPECSTRATEGY has not yet happened. Handle the request just like we always did, but first time called print debugging information. Apart up to two instances of console messages per boot, this amounts to a glorified no-op commit. If you get any of the messages on your console I would very much like a copy of them mailed to phk@freebsd.org
# c7fb6fd1	04-Jan-2003	Poul-Henning Kamp <phk@FreeBSD.org>	resort the vnode ops list.
# 1f59664b	24-Oct-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Don't try to be cute and save a call/return by implementing a degenerate vrele() inline.
# a5b65058	13-Oct-2002	Kirk McKusick <mckusick@FreeBSD.org>	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.
# 3cc511c5	24-Sep-2002	Jeff Roberson <jeff@FreeBSD.org>	- Use the standard vp interlock macros.
# 6f211602	13-Aug-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Remember to unlock the (optional) vnode in vfs_stdextattrctl(). Failing to do this made the following script hang: #!/bin/sh set -ex extattrctl start /tmp extattrctl initattr 64 /tmp/EA00 extattrctl enable /tmp user ea00 /tmp/EA00 extattrctl showattr /tmp/EA00 if the filesystem backing /tmp did not support EAs. The real solution is probably to have the extattrctl syscall do the unlocking rather than depend on the filesystem to do it. Considering that extattrctl is going to be made obsolete anyway, this has dogwash priority. Sponsored by: DARPA & NAI Labs.
# e6e370a7	04-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
# 0ea5e552	26-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	- The default for lock, unlock, and islocked is now std* instead of no*.
# fbedc80b	09-Jul-2002	Maxime Henrion <mux@FreeBSD.org>	Remove vfs_stdmount() and vfs_stdunmount(). They are not really useful and are incompatible with nmount.
# 3eee035c	27-Apr-2002	Ian Dowse <iedowse@FreeBSD.org>	Remove a stale comment saying that the vnode lock must be the first element in the structure pointed to by vp->v_data; the vnode lock is now within the vnode structure itself.
# c897b813	19-Mar-2002	Jeff Roberson <jeff@FreeBSD.org>	Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
# 4d77a549	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P.
# a0595d02	16-Mar-2002	Kirk McKusick <mckusick@FreeBSD.org>	Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems. only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.
# eb8e6d52	05-Mar-2002	Eivind Eklund <eivind@FreeBSD.org>	Document all functions, global and static variables, and sysctls. Includes some minor whitespace changes, and re-ordering to be able to document properly (e.g, grouping of variables and the SYSCTL macro calls for them, where the documentation has been added.) Reviewed by: phk (but all errors are mine)
# 4f467cb8	22-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Fix incorrect double-termination of vm_object. When a vm_object is terminated and flushes pending dirty pages it is possible for the object to be ref'd (0->1) and then deref'd (1->0) during termination. We do not terminate the object a second time. Document vop_stdgetvobject() to explicitly allow it to be called without the vnode interlock held (for upcoming sync_msync() and ffs_sync() performance optimizations) MFC after: 3 days
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 0cddd8f0	04-Jul-2001	Matthew Dillon <dillon@FreeBSD.org>	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
# 5bd57bc8	23-May-2001	John Baldwin <jhb@FreeBSD.org>	Don't release the vm lock just to turn around and grab it again.
# 23955314	18-May-2001	Alfred Perlstein <alfred@FreeBSD.org>	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
# 97f6754f	14-May-2001	Jonathan Lemon <jlemon@FreeBSD.org>	When calling poll() on a fd associated with a filesystem, let POLLIN/POLLOUT behave identically to POLLRDNORM/POLLWRNORM. Submitted by: bde PR: 27287 merge after: 1 week
# b966319d	06-May-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Fix return type of vop_stdputpages() Noticed by: rwatson
# a62615e5	01-May-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Implement vop_std{get\|put}pages() and add them to the default vop[]. Un-copy&paste all the VOP_{GET\|PUT}PAGES() functions which do nothing but the default.
# b7ebffbc	29-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Add a vop_stdbmap(), and make it part of the default vop vector. Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
# 60fb0ce3	28-Apr-2001	Greg Lehey <grog@FreeBSD.org>	Revert consequences of changes to mount.h, part 2. Requested by: bde
# a13234bb	25-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Move the netexport structure from the fs-specific mountstructure to struct mount. This makes the "struct netexport *" paramter to the vfs_export and vfs_checkexport interface unneeded. Consequently that all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>
# d98dc34f	23-Apr-2001	Greg Lehey <grog@FreeBSD.org>	Correct #includes to work with fixed sys/mount.h.
# f84e29a0	17-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
# 30632071	18-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word. Submitted by: jkh Obtained from: TrustedBSD Project
# 70f36851	14-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests. o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to advoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly. o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace. o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front. o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files. o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates. Obtained from: TrustedBSD Project
# a25f0571	17-Feb-2001	Bruce Evans <bde@FreeBSD.org>	Added a dummy lookup vop. Specfs was broken by removing its dummy lookup vop so that it defaulted to using vop_eopnotsupp for strange lookups like the ones for open("/dev/null/", ...) and stat("/dev/null/", ...). This mainly caused the wrong errno to be returned by vfs syscalls (EOPNOTSUPP is not in POSIX, and is not documented in connection with specfs in open.2 and is not documented in stat.2 at all). Also, lookup vops are apparently required to set *ap->a_vpp to NULL on error, but vop_eopnotsupp is too broken to do this.
# 9ed346ba	08-Feb-2001	Bosko Milekic <bmilekic@FreeBSD.org>	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
# e3c4036b	01-Nov-2000	Eivind Eklund <eivind@FreeBSD.org>	Give vop_mmap an untimely death. The opportunity to give it a timely death timed out in 1996.
# 35e0e5b3	20-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
# a18b1f1d	03-Oct-2000	Jason Evans <jasone@FreeBSD.org>	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 67e87166	25-Sep-2000	Boris Popov <bp@FreeBSD.org>	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or keept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only interlock lock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick
# 9ff5ce6b	12-Sep-2000	Boris Popov <bp@FreeBSD.org>	Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
# 39f70682	18-Aug-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce vop_stdinactive() and make it the default if no vop_inactive is declared. Sort and prune a few vop_op[].
# f2a2857b	11-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
# 9626b608	05-May-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
# 8177437d	14-Apr-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Complete the bio/buf divorce for all code below devfs::strategy Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS
# c244d2de	02-Apr-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Move B_ERROR flag to b_ioflags and call it BIO_ERROR. (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.
# 21144e3b	20-Mar-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.
# 8f073875	18-Jan-2000	Robert Watson <rwatson@FreeBSD.org>	Fix bde'isms in acl/extattr syscall interface, renaming syscalls to prettier (?) names, adding some const's around here, et al. Reviewed by: bde
# 91f37dcb	18-Dec-1999	Robert Watson <rwatson@FreeBSD.org>	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind
# 762e6b85	15-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Introduce NDFREE (and remove VOP_ABORTOP)
# 6bdfe06a	11-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter
# 0c974a1f	07-Nov-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Make vop_panic() a little more informative.
# c24fda81	10-Sep-1999	Alfred Perlstein <alfred@FreeBSD.org>	Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open\|stat\|stafs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD
# 5a5fccc8	07-Sep-1999	Alfred Perlstein <alfred@FreeBSD.org>	All unimplemented VFS ops now have entries in kern/vfs_default.c that return reasonable defaults. This avoids confusing and ugly casting to eopnotsupp or making dummy functions. Bogus casting of filesystem sysctls to eopnotsupp() have been removed. This should make *_vfsops.c more readable and reduce bloat. Reviewed by: msmith, eivind Approved by: phk Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# 0625ba2f	17-Jun-1999	Gary Palmer <gpalmer@FreeBSD.org>	Add Id strings
# 4221e284	02-May-1999	Alan Cox <alc@FreeBSD.org>	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
# a5c9bce7	25-Feb-1999	Bruce Evans <bde@FreeBSD.org>	Added a used #include (don't depend on "vnode_if.h" including <sys/buf.h>).
# 15a1057c	20-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	Add 'options DEBUG_LOCKS', which stores extra information in struct lock, and add some macros and function parameters to make sure that the information get to the point where it can be put in the lock structure. While I'm here, add DEBUG_VFS_LOCKS to LINT.
# 4e61198e	10-Nov-1998	Peter Wemm <peter@FreeBSD.org>	Make the vnode opv vector construction fully dynamic. Previously we leaked memory on each unload and were limited to items referenced in the kernel copy of vnode_if.c. Now a kernel module is free to create it's own VOP_FOO() routines and the rest of the system will happily deal with it, including passthrough layers like union/umap/etc. Have VFS_SET() call a common vfs_modevent() handler rather than inline duplicating the common code all over the place. Have VNODEOP_SET() have the vnodeops removed at unload time (assuming a module) so that the vop_t vector is reclaimed. Slightly adjust the vop_t vectors so that calling slot 0 is a panic rather than a page fault. This could happen if VOP_something() was called without any handlers being present anywhere (including in vfs_default.c). slot 1 becomes the default vector for the vnodeop table. TODO: reclaim zones on unload (eg: nfs code)
# fd5d1124	04-Jul-1998	Julian Elischer <julian@FreeBSD.org>	VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>
# 79cc756d	05-May-1998	Mike Smith <msmith@FreeBSD.org>	As described by the submitter: Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him. The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework. Submitted by: Michael Hancock <michaelh@cet.co.jp>
# 8f9110f6	07-Mar-1998	John Dyson <dyson@FreeBSD.org>	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
# 34bdbbd0	01-Mar-1998	Mike Smith <msmith@FreeBSD.org>	The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE: vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called. A default vfs_vrele was created for fs implementations that use the standard vnode management routines. VFS_VRELE implementations were made for the following file systems: Standard (vfs_vrele) ffs mfs nfs msdosfs devfs ext2fs Custom union umapfs Just EOPNOTSUPP fdesc procfs kernfs portal cd9660 These implementations may change as VOP changes are implemented. In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code. This will only be done for vnode arguments that are released by the various fs vop implementations. Wider use of VFS_VRELE will likely require restructuring of the code. Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>
# 95e5e988	05-Jan-1998	John Dyson <dyson@FreeBSD.org>	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
# e51d1e87	17-Dec-1997	Garrett Wollman <wollman@FreeBSD.org>	Revert poll() for UFS files to traditional behavior where polling for read- or writability always returns true. This works around bugs in netscape and squid, at a minimum.
# 1cbbd625	14-Dec-1997	Garrett Wollman <wollman@FreeBSD.org>	Add support for poll(2) on files. vop_nopoll() now returns POLLNVAL if one of the new poll types is requested; hopefully this will not break any existing code. (This is done so that programs have a dependable way of determining whether a filesystem supports the extended poll types or not.) The new poll types added are: POLLWRITE - file contents may have been modified POLLNLINK - file was linked, unlinked, or renamed POLLATTRIB - file's attributes may have been changed POLLEXTEND - file was extended Note that the internal operation of poll() means that it is impossible for two processes to reliably poll for the same event (this could be fixed but may not be worth it), so it is not possible to rewrite `tail -f' to use poll at this time.
# 1cd52ec3	05-Dec-1997	Bruce Evans <bde@FreeBSD.org>	Don't include <sys/lock.h> in headers when only `struct simplelock' is required. Fixed everything that depended on the pollution.
# d662024a	18-Nov-1997	Bruce Evans <bde@FreeBSD.org>	Removed an unused #include. Fixed a style bug (one of many KNF breakages in vfs_subr.c moved here).
# dba3870c	26-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	VFS interior redecoration. Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.
# 1b09ae77	26-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Simplify the lease_check stuff.
# d54d34b5	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Make a set of VOP standard lock, unlock & islocked VOP operators, which depend on the lock being located at vp->v_data. Saves 3x3 identical vop procs, more as the other filesystems becomes lock aware.
# e9565321	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	VFS clean up "hekto commit" 1. Add defaults for more VOPs VOP_LOCK vop_nolock VOP_ISLOCKED vop_noislocked VOP_UNLOCK vop_nounlock and remove direct reference in filesystems. 2. Rename the nfsv2 vnop tables to improve sorting order.
# 987f5696	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Another VFS cleanup "kilo commit" 1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} intereface function, and now lives in the ufsmount structure. 2. Remove VOP_SEEK, it was unused. 3. Add mode default vops: VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp And remove identical functionality from filesystems 4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?) 5. Try to make system wide VOP functions have vop_* names. 6. Initialize the um_* vectors in LFS. (Recompile your LKMS!!!)
# 2df89645	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Oops. forgot the blasted cvs add. Pointed hat sent from: Karl Denninger <karl@Mcs.Net>