History log of /freebsd-current/sys/sys/vnode.h
Revision	Date	Author	Comments
# 56a8aca8	18-May-2024	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Stop treating size 0 as unknown size in vnode_create_vobject(). Whenever a file is created, the vnode_create_vobject() function will try to determine its size by calling vn_getsize_locked() as size 0 is ambiguous: it means either the file size is 0 or the file size is unknown. Introduce a special value for the size argument: VNODE_NO_SIZE. Only when it is given, the vnode_create_vobject() will try to obtain file's size on its own. Introduce a dedicated vnode_disk_create_vobject() for use by g_vfs_open(), so we don't have to call vn_isdisk() in the common case (for regular files). Handle the case of mediasize==0 in g_vfs_open(). Reviewed by: alc, kib, markj, olce Approved by: oshogbo (mentor), allanjude (mentor) Differential Revision: https://reviews.freebsd.org/D45244
# f04220c1	19-Jan-2024	Konstantin Belousov <kib@FreeBSD.org>	kcmp(2): implement for vnode files Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
# 29363fb4	23-Nov-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
# 2ff63af9	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s\+\s\$FreeBSD\$.$\n/
# 9c3bfe2a	10-Jul-2023	Konstantin Belousov <kib@FreeBSD.org>	Revert "VFS: Remove VV_READLINK flag" and "fdescfs: improve linrdlnk mount option" This reverts commits 4a402dfe0bc44770c9eac6e58a501e4805e29413 and 3bffa2262328e4ff1737516f176107f607e7bc76. The fix will be implemented in somewhat different manner. The semantic adjustment is incompatible with linuxolator expectations. Reported and reviewed by: dchagin Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D40969
# ba8cc6d7	12-Mar-2023	Mateusz Guzik <mjg@FreeBSD.org>	vfs: use __enum_uint8 for vtype and vstate This whacks hackery around only reading v_type once. Bump __FreeBSD_version to 1400093
# 9def8ea6	21-Apr-2023	Mateusz Guzik <mjg@FreeBSD.org>	vfs: list enums on separate lines Requested by: kib
# 4a402dfe	21-Jun-2023	Konstantin Belousov <kib@FreeBSD.org>	VFS: Remove VV_READLINK flag since its only reason to exist is removed. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D40700
# 2544b8e0	28-Apr-2023	Olivier Certner <olce.freebsd@certner.fr>	vfs: Rename vfs_emptydir() to vn_dir_check_empty() No functional change. While here, adapt comments to style(9). Reviewed by: kib MFC after: 1 week
# 3d8450db	24-Apr-2023	Olivier Certner <olce.freebsd@certner.fr>	vfs: vn_dir_next_dirent(): Simplify interface and harden Simplify the old interface (one less argument, simpler termination test) and add documentation about it. Add more sanity checks (mostly under INVARIANTS, but also in the general case to prevent infinite loops). Drop the explicit test on minimum directory entry size (without INVARIANTS). Deal with the impacts in callers (dirent_exists() and vop_stdvptocnp()). dirent_exists() has been simplified a bit, preserving the exact same semantics but for the return code whose meaning has been reversed (0 now means the entry exists, ENOENT that it doesn't and other values are genuine errors). While here, suppress gratuitous casts of malloc return values. vn_dir_next_dirent() has been tested by a 'make -j4 buildkernel' with a temporary modification to the VFS cache causing vn_vptocnp() to always call VOP_VPTOCNP() and finally vop_stdvptocnp() (observed with temporary debug counters). Export new _GENERIC_MINDIRSIZ and _GENERIC_MAXDIRSIZ on __BSD_VISIBLE, and GENERIC_MINDIRSIZ and GENERIC_MAXDIRSIZ on _KERNEL. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39764
# 6bce3f23	23-Apr-2023	Olivier Certner <olce.freebsd@certner.fr>	vfs: Export get_next_dirent() as vn_dir_next_dirent() Move internal-to-'vfs_default.c' get_next_dirent() to 'vfs_vnops.c' and export it for use by other parts of the VFS. This is a preparatory change for using it in vfs_emptydir(). No functional change. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39755
# 7b6fe242	08-Apr-2023	Konstantin Belousov <kib@FreeBSD.org>	DEBUG_VFS_LOCKS: use witness if available The assert_vop_locked messages are ignored, and file/line information is not too useful. Fixing this without changing both witness and VFS asserts KPIs is not possible. Reviewed by: markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D39464
# bb24eaea	05-Apr-2023	Konstantin Belousov <kib@FreeBSD.org>	vn_lock_pair(): allow to request shared locking If either of vnodes is shared locked, lock must not be recursed. Requested by: rmacklem Reviewed by: markj, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D39444
# 26b96487	07-Apr-2023	Mateusz Guzik <mjg@FreeBSD.org>	vfs: more informative panic for missing fplookup ops
# 5f6df177	03-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: validate that vop vectors provide all or none fplookup vops In order to prevent later surprises.
# 62a573d9	16-Mar-2023	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire KERN_VNODE It got disabled in 2003: commit acb18acfec97aa7fe26ff48f80a5c3f89c9b542d Author: Poul-Henning Kamp <phk@FreeBSD.org> Date: Sun Feb 23 18:09:05 2003 +0000 Bracket the kern.vnode sysctl in #ifdef notyet because it results in massive locking issues on diskless systems. It is also not clear that this sysctl is non-dangerous in its requirements for locked down memory on large RAM systems. There does not seem to be practical use for it and the disabled routine does not work anyway. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D39127
# f45feecf	22-Sep-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vn_getsize getattr is very expensive and in important cases only gets called to get the size. This can be optimized with a dedicated routine which obtains that statistic. As a step towards that goal make size-only consumers use a dedicated routine. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D37885
# 829f0bcb	19-Dec-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add the concept of vnode state transitions To quote from a comment above vput_final: <quote> * XXX Some filesystems pass in an exclusively locked vnode and strongly depend * on the lock being held all the way until VOP_INACTIVE. This in particular * happens with UFS which adds half-constructed vnodes to the hash, where they * can be found by other code. </quote> As is there is no mechanism which allows filesystems to denote that a vnode is fully initialized, consequently problems like the above are only found the hard way(tm). Add rudimentary support for state transitions, which in particular allow to assert the vnode is not legally unlocked until its fate is decided (either construction finishes or vgone is called to abort it). The new field lands in a 1-byte hole, thus it does not grow the struct. Bump __FreeBSD_version to 1400077 Reviewed by: kib (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D37759
# 94267fc9	22-Dec-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: use designated initializers for the typename array While here prefix with v for better consistency with the vnode stuff. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D37759
# 78d35459	02-Dec-2022	Doug Rabson <dfr@FreeBSD.org>	Add vn_path_to_global_path_hardlink This is similar to vn_path_to_global_path but allows for regular files which may not be present in the cache. Reviewed by: mjg, kib Tested by: pho
# 080ef8a4	04-Aug-2022	Jason A. Harmening <jah@FreeBSD.org>	Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR When a lookup operation crosses into a new mountpoint, the mountpoint must first be busied before the root vnode can be locked. When a filesystem is unmounted, the vnode covered by the mountpoint must first be locked, and then the busy count for the mountpoint drained. Ordinarily, these two operations work fine if executed concurrently, but with a stacked filesystem the root vnode may in fact use the same lock as the covered vnode. By design, this will always be the case for unionfs (with either the upper or lower root vnode depending on mount options), and can also be the case for nullfs if the target and mount point are the same (which admittedly is very unlikely in practice). In this case, we have LOR. The lookup path holds the mountpoint busy while waiting on what is effectively the covered vnode lock, while a concurrent unmount holds the covered vnode lock and waits for the mountpoint's busy count to drain. Attempt to resolve this LOR by allowing the stacked filesystem to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary. Upon observing this flag, the vfs_lookup() will leave the covered vnode lock held while crossing into the mountpoint. Employ this flag for unionfs with the caveat that it can't be used for '-o below' mounts until other unionfs locking issues are resolved. Reported by: pho Tested by: pho Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35054
# d653aaec	24-Oct-2022	Mateusz Guzik <mjg@FreeBSD.org>	cache: add cache_assert_no_entries
# 1b4b7517	18-Sep-2022	Konstantin Belousov <kib@FreeBSD.org>	Add vn_rlimit_fsizex() and vn_rlimit_fsizex_res() The vn_rlimit_fsizex() function: - checks that the write does not exceed RLIMIT_FSIZE limit and fs maximum supported file size - truncates write length if it exceeds the RLIMIT_FSIZE or max file size, but there are some bytes to write - sends SIGXFSZ if RLIMIT_FSIZE would be exceeded otherwise POSIX mandates the truncated write in case when some bytes can be written but whole write request fails the RLIMIT_FSIZE check. The function is supposed to be used from VOP_WRITE()s. Due to peculiarity in the VFS generic write syscall layer, uio_resid must correctly reflect the written amount (noted by markj). Provide the dual vn_rlimit_fsizex_res() function to correct uio_resid after the clamp done in vn_rlimit_fsizex() on VOP_WRITE() return. PR: 164793 Reviewed by: asomers, jah, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36625
# 2ac083f6	18-Sep-2022	Konstantin Belousov <kib@FreeBSD.org>	Add vn_rlimit_trunc() Reviewed by: asomers, jah, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36625
# fa3eb3c9	18-Sep-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: indent V_VALID_FLAGS with a tab Requested by: kib
# a75d1ddd	17-Sep-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: introduce V_PCATCH to stop abusing PCATCH
# a755fb92	10-Sep-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire the V_MNTREF flag Reviewed by: kib, mckusick Differential Revision: https://reviews.freebsd.org/D36521
# c6487446	11-Apr-2022	Dmitry Chagin <dchagin@FreeBSD.org>	getdirentries: return ENOENT for unlinked but still open directory. To be more compatible to IEEE Std 1003.1-2008 (“POSIX.1”). Reviewed by: mjg, Pau Amma (doc) Differential revision: https://reviews.freebsd.org/D34680 MFC after: 2 weeks
# b7262756	02-Apr-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: fixup WANTIOCTLCAPS on open In some cases vn_open_cred overwrites cn_flags, effectively nullifying initialisation done in NDINIT. This will have to be fixed. In the meantime make sure the flag is passed. Reported by: jenkins Noted by: Mathieu <sigsys@gmail.com>
# 66b177e1	12-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: reduce spurious zeroing in VOP_STAT clang fails to take advantage of the fact that majority of the struct gets written to in the routine and decides to bzero the entire thing. Explicitly zero padding and spare fields, relying on KMSAN to catch problems should anything pop up later which also needs explicit zeroing. fstat on tmpfs (ops/s): before: 8216636 after: 8508033
# 381dd12c	12-Mar-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: stop evaluating the argument multiple times in stat macros
# 66c5fbca	27-Jan-2022	Konstantin Belousov <kib@FreeBSD.org>	insmntque1(): remove useless arguments Also remove once-used functions to clean up after failed insmntque1(), which were destructor callbacks in previous life. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D34071
# 2a7e4cf8	27-Jan-2022	Mateusz Guzik <mjg@FreeBSD.org>	Revert b58ca5df0bb7 ("vfs: remove the now unused insmntque1") I was somehow convinced that insmntque calls insmntque1 with a NULL destructor. Unfortunately this worked well enough to not immediately blow up in simple testing. Keep not using the destructor in previously patched filesystems though as it avoids unnecessary casts. Noted by: kib Reported by: pho
# b58ca5df	26-Jan-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the now unused insmntque1 Bump __FreeBSD_version to 1400052.
# 4dd23ae1	10-Dec-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire MNTK_NOKNOTE and VV_NOKNOTE MNTK_NOKNOTE was introduced in 679985d03a64f5dfb4355538ae6e3b70f8347f38 (dated 2005), VV_NOKNOTE in 34cc826ae8999f454dd6cb9c77d17ce83b169f92 few months later. Neither was ever used by anything in the tree.
# 4dcdf398	17-May-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF This allows to stop maintaining the VI_TEXT_REF flag and consequently opens up fully lockless v_writecount adjustment. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D33127
# 3ffcfa59	26-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vop_stdadd_writecount_nomsync This avoids needing to inspect the mount point every time. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D33125
# 6eefabd4	22-Nov-2021	Brooks Davis <brooks@FreeBSD.org>	syscalls: improve nstat, nfstat, nlstat Optionally return errors when truncating dev_t, ino_t, and nlink_t. In the interest of code reuse, use freebsd11_cvtstat() to perform the truncation and error handling and then convert the resulting struct freebsd11_stat to struct nstat. Add missing freebsd32 compat syscalls. These syscalls require translation because struct nstat contains four instances of struct timespec which in turn contains a time_t and a long. Reviewed by: kib
# d28af1ab	15-Nov-2021	Mark Johnston <markj@FreeBSD.org>	vm: Add a mode to vm_object_page_remove() which skips invalid pages This will be used to break a deadlock in ZFS between the per-mountpoint teardown lock and page busy locks. In particular, when purging data from the page cache during dataset rollback, we want to avoid blocking on the busy state of invalid pages since the busying thread may be blocked on the teardown lock in zfs_getpages(). Add a helper, vn_pages_remove_valid(), for use by filesystems. Bump __FreeBSD_version so that the OpenZFS port can make use of the new helper. PR: 258208 Reviewed by: avg, kib, sef Tested by: pho (part of a larger patch) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32931
# 47b248ac	03-Nov-2021	Konstantin Belousov <kib@FreeBSD.org>	Make locking assertions for VOP_FSYNC() and VOP_FDATASYNC() more correct For devfs vnodes, it is fine to not lock vnodes for VOP_FSYNC(). Otherwise vnode must be locked exclusively, except for MNT_SHARED_WRITES() where the shared lock is enough. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761
# 9a0bee9f	22-Oct-2021	Konstantin Belousov <kib@FreeBSD.org>	Make vn_fullpath_hardlink() externally callable Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
# 2b68eb8e	01-Oct-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove thread argument from VOP_STAT and fo_stat.
# c5128c48	07-Sep-2021	Rick Macklem <rmacklem@FreeBSD.org>	VOP_COPY_FILE_RANGE: Add a COPY_FILE_RANGE_TIMEO1SEC flag Although it is not specified in the RFCs, the concept that the NFSv4 server should reply to an RPC request within a reasonable time is accepted practice within the NFSv4 community. Without this patch, the NFSv4.2 server attempts to reply to a Copy operation within 1second by limiting the copy to vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at best, given the large variation in I/O subsystem performance. This patch adds a kernel only flag COPY_FILE_RANGE_TIMEO1SEC that the NFSv4.2 can specify, which tells VOP_COPY_FILE_RANGE() to return after approximately 1 second with a partial result and implements this in vn_generic_copy_file_range(), used by vop_stdcopyfilerange(). Modifying the NFSv4.2 server to set this flag will be done in a separate patch. Also under consideration is exposing the COPY_FILE_RANGE_TIMEO1SEC to userland for use on the FreeBSD copy_file_range(2) syscall. MFC after: 2 weeks Reviewed by: khng Differential Revision: https://reviews.freebsd.org/D31829
# da779f26	27-Aug-2021	Rick Macklem <rmacklem@FreeBSD.org>	vfs_default: Change vop_stddeallocate() from static to global A future commit to the NFS client uses vop_stddeallocate() for cases where the NFS server does not support a Deallocate operation. Change vop_stddeallocate() from static to global so that it can be called by the NFS client. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31640
# 0dc332bf	05-Aug-2021	Ka Ho Ng <khng@FreeBSD.org>	Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9). fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use. The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file. fspacectl(2) comprises of cmd and flags parameters to specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0. fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
# abbb57d5	04-Aug-2021	Ka Ho Ng <khng@FreeBSD.org>	vfs: Introduce vn_bmap_seekhole_locked() vn_bmap_seekhole_locked() is factored out version of vn_bmap_seekhole(). This variant requires shared vnode lock being held around the call. Sponsored by: The FreeBSD Foundation Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D31404
# 0ef5eee9	03-Aug-2021	Konstantin Belousov <kib@FreeBSD.org>	Add vn_lktype_write() and remove repetitive code that calculates vnode locking type for write. Reviewed by: khng, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31405
# 844aa31c	08-Jul-2021	Mateusz Guzik <mjg@FreeBSD.org>	cache: add cache_enter_time_flags
# 3cf75ca2	28-May-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire unused vn_seqc_write_begin_unheld*
# cf74b2be	22-May-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire the now unused vnlru_free routine
# 72b3b5a9	08-Apr-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: replace vfs_smr_quiesce with vfs_smr_synchronize This ends up using a smr specific method. Suggested by: markj Tested by: pho
# 3f56bc79	30-Mar-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vfs_smr_quiesce This can be used to observe all CPUs not executing while within vfs_smr_enter.
# e9272225	17-Mar-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: fix vnlru marker handling for filtered/unfiltered cases The global list has a marker with an invariant that free vnodes are placed somewhere past that. A caller which performs filtering (like ZFS) can move said marker all the way to the end, across free vnodes which don't match. Then a caller which does not perform filtering will fail to find them. This makes vn_alloc_hard sleep for 1 second instead of reclaiming, resulting in significant stalls. Fix the problem by requiring an explicit marker by callers which do filtering. As a temporary measure extend vnlru_free to restart if it fails to reclaim anything. Big thanks go to the reporter for testing several iterations of the patch. Reported by: Yamagi <lists yamagi.org> Tested by: Yamagi <lists yamagi.org> Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29324
# 2443068d	21-Feb-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: shrink struct vnode to 448 bytes on LP64 ... by moving v_hash into a 4 byte hole. Combined with several previous size reductions this makes the size small enough to fit 9 vnodes per page as opposed to 8. Add a compilation time assert so that this is not unknowingly worsened. Note the structure still remains bigger than it should be.
# 2bfd8992	14-Feb-2021	Konstantin Belousov <kib@FreeBSD.org>	vnode: move write cluster support data to inodes. The data is only needed by filesystems that 1. use buffer cache 2. utilize clustering write support. Requested by: mjg Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28679
# fa3bd463	29-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK) or EX_SHLOCK. Do it by setting a vnode iflag indicating that the locking exclusive open is in progress, and not allowing F_LOCK request to make a progress until the first open finishes. Requested by: mckusick Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28697
# b59a8e63	30-Jan-2021	Konstantin Belousov <kib@FreeBSD.org>	Stop ignoring ERELOOKUP from VOP_INACTIVE() When possible, relock the vnode and retry inactivation. Only vunref() is required not to drop the vnode lock, so handle it specially by not retrying. This is a part of the efforts to ensure that unlinked not referenced vnode does not prevent inode from reusing. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
# 8d2a230e	25-Jan-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: use atomic_load_consume_ptr in vn_load_v_data_smr
# 739ecbcf	23-Jan-2021	Mateusz Guzik <mjg@FreeBSD.org>	cache: add symlink support to lockless lookup Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D27488
# a5e28403	07-Jan-2021	Thomas Munro <tmunro@FreeBSD.org>	open(2): Add O_DSYNC flag. POSIX O_DSYNC means that writes include an implicit fdatasync(2), just as O_SYNC implies fsync(2). VOP_WRITE() functions that understand the new IO_DATASYNC flag can act accordingly, but we'll still pass down IO_SYNC so that file systems that don't understand it will continue to provide the stronger O_SYNC behaviour. Flag also applies to fcntl(2). Reviewed by: kib, delphij Differential Revision: https://reviews.freebsd.org/D25090
# c6d3272b	05-Jan-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vn_seqc_read_notmodify
# 33f3e81d	01-Jan-2021	Mateusz Guzik <mjg@FreeBSD.org>	cache: combine fast path enabled status into one flag Tested by: pho
# 82397d79	31-Dec-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: denote vnode being a mount point with VIRF_MOUNTPOINT Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27794
# 3e506a67	27-Dec-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add v_irflag accessors Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27793
# 3b1f974b	26-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Make max ticks for pause in vn_lock_pair() adjustable at runtime. Reduce default value from hz / 10 to hz / 100. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation
# 7cde2ec4	13-Nov-2020	Konstantin Belousov <kib@FreeBSD.org>	Implement vn_lock_pair(). In collaboration with: pho Reviewed by: mckusick (previous version), markj (previous version) Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136
# 4bfebc8d	30-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename
# c7520caa	22-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: prevent avoidable evictions on mkdir of existing directories mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST. The NOCACHE lookup will make the namecache unnecessarily evict the existing entry, and then fallback to the fs lookup routine eventually leading namei to return an error as the directory is already there. For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers fallbacks to the slowpath for concurrently executing lookups. Tested by: pho Discussed with: kib
# 8ecd87a3	20-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop spurious cred argument from VOP_VPTOCNP
# 214eccf4	14-Oct-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add VOP_EAGAIN Can be used to stub fplookup for example.
# a3d9bf49	23-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: drop the force flag from purgevfs The optional scan is wasteful, thus it is removed altogether from unmount. Callers which always want it anyway remain unaffected.
# 3c484f32	15-Sep-2020	Konstantin Belousov <kib@FreeBSD.org>	Convert page cache read to VOP. There are several negative side-effects of not calling into VOP layer at all for page cache reads. The biggest is the missed activation of EVFILT_READ knotes. Also, it allows filesystem to make more fine grained decision to refuse read from page cache. Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for asserts. Reviewed by: markj Tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346
# 88863665	15-Sep-2020	Konstantin Belousov <kib@FreeBSD.org>	vfs_subr.c: export io_hold_cnt and vn_read_from_obj(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346
# b1a824b6	02-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire vholdl as a symbol Similarly to vrefl in r364283.
# f6e54eb3	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	sys: clean up empty lines in .c and .h files
# feabaaf9	24-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho
# 39f88150	20-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: add cache_rename, a dedicated helper to use for renames While here make both tmpfs and ufs use it. No functional changes.
# 7ad2a82d	18-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.
# fbca789f	16-Aug-2020	Konstantin Belousov <kib@FreeBSD.org>	VMIO read If possible, i.e. if the requested range is resident valid in the vm object queue, and some secondary conditions hold, copy data for read(2) directly from the valid cached pages, avoiding vnode lock and instantiating buffers. I intentionally do not start read-ahead, nor handle the advises on the cached range. Filesystems indicate support for VMIO reads by setting VIRF_PGREAD flag, which must not be cleared until vnode reclamation. Currently only filesystems that use vnode pager for v_objects can enable it, due to reliance on vnp_size. There is a WIP to handle it for tmpfs. Reviewed by: markj Discussed with: jeff Tested by: pho Benchmarked by: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968
# 60414088	16-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: retire vrefl as a symbol vrefl calls vref and there is only one in-tree consumer. Keep it as a macro for assertion purposes.
# a92a971b	16-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)
# 36f47512	11-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: inline vrefcnt
# 4c2d103a	11-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: garbage collect vrefactn
# 3b444436	11-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	devfs: rework si_usecount to track opens This removes a lot of special casing from the VFS layer. Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D25612
# 51ea7bea	07-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add VOP_STAT The current scheme of calling VOP_GETATTR adds avoidable overhead. An example with tmpfs doing fstat (ops/s): before: 7488958 after: 7913833 Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D25910
# d292b194	05-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.
# db99ec56	04-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: support lockless dotdot lookup Tested by: pho
# 6e10434c	04-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: add cache_purge_vgone cache_purge locklessly checks whether the vnode at hand has any namecache entries. This can race with a concurrent purge which managed to remove the last entry, but may not be done touching the vnode. Make sure we observe the relevant vnode lock as not taken before proceeding with vgone. Paired with the fact that doomed vnodes cannot receive entries this restores the invariant that there are no namecache-related writing users past cache_purge in vgone. Reported by: pho
# b145e389	02-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: shorten v_iflag and v_vflag While here renumber VI_* flags to remove the gaps. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D25921
# 838984de	02-Aug-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: move namecache initialisation into cache_vnode_init
# 848f8eff	30-Jul-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: inline vops if there are no pre/post associated calls This removes a level of indirection from frequently used methods, most notably VOP_LOCK1 and VOP_UNLOCK1. Tested by: pho
# 07d2145a	25-Jul-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add the infrastructure for lockless lookup Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25577
# 0379ff6a	25-Jul-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: introduce vnode sequence counters Modified on each permission change and link/unlink. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25573
# f8022be3	30-Jun-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: protect vnodes with smr vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr section, allowing consumers to get away without locking. See vhold_smr and vdropl for comments explaining caveats. Reviewed by: kib Tested by: pho Differential Revision: https://reviews.freebsd.org/D23913
# 2782c00c	22-Feb-2020	Ryan Libby <rlibby@FreeBSD.org>	vfs: quiet -Wwrite-strings Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23797
# 6a5abb1e	02-Feb-2020	Kyle Evans <kevans@FreeBSD.org>	Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247
# 6698e11f	02-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the now empty vop_unlock_post
# 10a15df6	02-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the never set VDESC_VPP_WILLRELE flag
# 7739d927	01-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	cache: replace kern___getcwd with vn_getcwd The previous routine was resulting in extra data copies most notably in linux_getcwd.
# 45757984	01-Feb-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: consistently use size_t for buflen around VOP_VPTOCNP
# 643656cf	31-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422
# 21c4f104	31-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vrefactn Differential Revision: https://reviews.freebsd.org/D23427
# 3cfabd81	30-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the never set VDESC_NOMAP_VPP flag
# 0c236d3d	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: per-cpu batched requeuing of free vnodes Constant requeuing adds significant lock contention in certain workloads. Lessen the problem by batching it. Per-cpu areas are locked in order to synchronize against UMA freeing memory. vnode's v_mflag is converted to short to prevent the struct from growing. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 122.38s user 1780.45s system 6242% cpu 30.480 total patched: 144.84s user 985.90s system 4856% cpu 23.282 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22998
# cc3593fb	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997
# 57083d25	12-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995
# b52d50cf	11-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes won't block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instead. The only consumer was always passing "1" as count and never nesting reservations.
# 69283067	11-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: incomplete pass at converting more ints to u_long Most notably numvnodes and freevnodes were u_long, but parameters used to govern them remained as ints.
# c8b3463d	07-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT) The previous behavior of leaving VI_OWEINACT vnodes on the active list without a hold count is eliminated. Hold count is kept and inactive processing gets explicitly deferred by setting the VI_DEFINACT flag. The syncer is then responsible for vdrop. Reviewed by: kib (previous version) Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D23036
# 478368ca	06-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: eliminate v_tag from struct vnode There was only one consumer and it was using it incorrectly. It is given an equivalent hack. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23037
# 8dbc6352	04-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop thread argument from vinactive
# 1cde9e38	04-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: predict VN_IS_DOOMED as false The macro is used everywhere.
# 952f5953	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove CTASSERT from VOP_UNLOCK_FLAGS gcc does not like it and it's not worth working around just for that compiler.
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# 3d59b89c	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add VOP_UNLOCK_FLAGS The flags argument from VOP_UNLOCK is about to be removed and some filesystems unlock the interlock as a convenience with it. Add a helper to retain the behavior for the few cases it is needed.
# c400abe5	16-Dec-2019	Li-Wen Hsu <lwhsu@FreeBSD.org>	Fix gcc build after r355790 Sponsored by: The FreeBSD Foundation
# 6fa079fc	15-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tested by: pho Differential Revision: https://reviews.freebsd.org/D22738
# ea9a16b2	12-Dec-2019	Rick Macklem <rmacklem@FreeBSD.org>	r355677 requires that vop_stdioctl() be global so it can be called from NFS. r355677 modified the NFS client so that it does lseek(SEEK_DATA/SEEK_HOLE) for NFSv4.2, but calls vop_stdioctl() otherwise. As such, vop_stdioctl() needs to be a global function. Missed during the code merge for r355677.
# c8b29d12	11-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockmgr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665
# ff4486e8	09-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: refactor vhold and vdrop No functional changes.
# abd80ddb	08-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715
# 89c4c2e5	30-Nov-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: swap placement between v_type and v_tag The former is frequently accessed (e.g., in vfs_cache_lookup) and shares the cacheline with v_usecount, avoidably adding to cache misses during concurrent lookup. The latter is almost unused and probably can get garbage-collected. The struct does not change in size despite enum vs char * discrepancy. On 64-bit archs there used to be 4 bytes padding after v_type giving 480 bytes in total.
# fdc6b10d	29-Nov-2019	Konstantin Belousov <kib@FreeBSD.org>	Add a VN_OPEN_INVFS flag. vn_open_cred() assumes that it is called from the top-level of a VFS syscall. Writers must call bwillwrite() before locking any VFS resource to wait for cleanup of dirty buffers. ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which results in wait for unrelated buffers while owning ZFS vnode lock (and ZFS does not use buffer cache). VN_OPEN_INVFS allows caller to skip bwillwrite. Note that ZFS is still incorrect there, because it starts write on an mp and locks a vnode while holding another vnode lock. Reported by: Willem Jan Withagen <wjw@digiware.nl> Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 208b81bb	22-Oct-2019	Konstantin Belousov <kib@FreeBSD.org>	Add VV_VMSIZEVNLOCK flag. The flag specifies that vm_fault() handler should check the vnode' vm_object size under the vnode lock. It is converted into the object' OBJ_SIZEVNLOCK flag in vnode_pager_alloc(). Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883
# dc20b834	06-Oct-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add optional root vnode caching Root vnodes are looked up all the time, e.g. when crossing a mount point. Currently used routines always perform a costly lookup which can be trivially avoided. Reviewed by: jeff (previous version), kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646
# ba7a55d9	22-Sep-2019	Sean Eric Fagan <sef@FreeBSD.org>	Add two options to allow mount to avoid covering up existing mount points. The two options are * nocover/cover: Prevent/allow mounting over an existing root mountpoint. E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local is already a mountpoint. * emptydir/noemptydir: Prevent/allow mounting on a non-empty directory. E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail. Neither of these options is intended to be a default, for historical and compatibility reasons. Reviewed by: allanjude, kib Differential Revision: https://reviews.freebsd.org/D21458
# e3c3248c	03-Sep-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: implement usecount implying holdcnt vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting freed and usecount to denote it is actively used. Previously all operations bumping usecount would also bump holdcnt, which is not necessary. We can detect if usecount is already > 1 (in which case holdcnt is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves on atomic ops. Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21471
# 1e2f0ceb	28-Aug-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for lookup. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesystems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371
# 8795830c	25-Aug-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: swap vop_unlock_post and vop_unlock_pre definitions to the logical order The change is no-op. Sponsored by: The FreeBSD Foundation
# 0256405e	24-Aug-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vholdnz (for already held vnodes) Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358
# de4e1aeb	18-Aug-2019	Konstantin Belousov <kib@FreeBSD.org>	Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode' vm_object has swap type, and no references on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 520482f4	30-Jul-2019	Mark Johnston <markj@FreeBSD.org>	Use VNASSERT() in checked VOP wrappers. Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21120
# 2240d8c4	27-Jul-2019	Alan Somers <asomers@FreeBSD.org>	Add v_inval_buf_range, like vtruncbuf but for a range of a file v_inval_buf_range invalidates all buffers within a certain LBA range of a file. It will be used by fusefs(5). This commit is a partial merge of r346162, r346606, and r346756 from projects/fuse2. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21032
# bbbbeca3	24-Jul-2019	Rick Macklem <rmacklem@FreeBSD.org>	Add kernel support for a Linux compatible copy_file_range(2) syscall. This patch adds support to the kernel for a Linux compatible copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9). This syscall/VOP can be used by the NFSv4.2 client to implement the Copy operation against an NFSv4.2 server to do file copies locally on the server. The vn_generic_copy_file_range() function in this patch can be used by the NFSv4.2 server to implement the Copy operation. Fuse may also be able to use the VOP_COPY_FILE_RANGE() method. vn_generic_copy_file_range() attempts to maintain holes in the output file in the range to be copied, but may fail to do so if the input and output files are on different file systems with different _PC_MIN_HOLE_SIZE values. Separate commits will be done for the generated syscall files and userland changes. A commit for a compat32 syscall will be done later. Reviewed by: kib, asomers (plus comments by brooks, jilles) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20584
# 555d8f28	01-Jul-2019	Rick Macklem <rmacklem@FreeBSD.org>	Factor out the code that does a VOP_SETATTR(size) from vn_truncate(). This patch factors the code in vn_truncate() that does the actual VOP_SETATTR() of size into a separate function called vn_truncate_locked(). This will allow the NFS server and the patch that adds a copy_file_range(2) syscall to call this function instead of duplicating the code and carrying over changes, such as the recent r347151. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D20808
# e3680954	27-Jun-2019	Rick Macklem <rmacklem@FreeBSD.org>	Add non-blocking trylock variants for the rangelock functions. A future patch that will add a Linux compatible copy_file_range(2) syscall needs to be able to lock the byte ranges of two files concurrently. To do this without a risk of deadlock, a non-blocking variant of vn_rangelock_rlock() called vn_rangelock_tryrlock() was needed. This patch adds this, along with vn_rangelock_trywlock(), in order to do this. The patch also adds a couple of comments, that I hope clarify how the algorithm used in kern_rangelock.c works. Reviewed by: kib, asomers (previous version) Differential Revision: https://reviews.freebsd.org/D20645
# 65417f5e	24-May-2019	Alan Somers <asomers@FreeBSD.org>	Remove "struct ucred" argument from vtruncbuf vtruncbuf takes a "struct ucred" argument. AFAICT, it's been unused ever since that function was first added in r34611. Remove it. Also, remove some "struct ucred" arguments from fuse and nfs functions that were only used by vtruncbuf. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20377
# 78022527	05-May-2019	Konstantin Belousov <kib@FreeBSD.org>	Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal. The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923
# 6af6fdce	12-Apr-2019	Alan Somers <asomers@FreeBSD.org>	fusefs: evict invalidated cache contents during write-through fusefs's default cache mode is "writethrough", although it currently works more like "write-around"; writes bypass the cache completely. Since writes bypass the cache, they were leaving stale previously-read data in the cache. This commit invalidates that stale data. It also adds a new global v_inval_buf_range method, like vtruncbuf but for a range of a file. PR: 235774 Reported by: cem Sponsored by: The FreeBSD Foundation
# ae909414	09-Apr-2019	Konstantin Belousov <kib@FreeBSD.org>	Add vn_fsync_buf(). Provide a convenience function to avoid the hack with filling fake struct vop_fsync_args and then calling vop_stdfsync(). Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 8ff7fad1	23-Oct-2018	Konstantin Belousov <kib@FreeBSD.org>	Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not want to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# b59ea730	20-Aug-2017	Konstantin Belousov <kib@FreeBSD.org>	Allow vinvalbuf() to operate with the shared vnode lock. This mode allows other clean buffers to arrive while we flush the buf lists for the vnode, which is fine for the targeted use. We only need that all buffers existed at the time of the function start were flushed. In fact, only one assert has to be relaxed. In collaboration with: pho Reviewed by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks X-Differential revision: https://reviews.freebsd.org/D12083
# 77d3337c	31-Jul-2017	Dmitry Chagin <dchagin@FreeBSD.org>	Implement proper Linux /dev/fd and /proc/self/fd behavior by adding Linux specific things to the native fdescfs file system. Unlike FreeBSD, the Linux fdescfs is a directory containing symbolic links to the actual files, which the process has open. A readlink(2) call on this file returns a full path in case of regular file or a string in a special format (type:[inode], anon_inode:<file-type>, etc..). As in FreeBSD, opening the file in the Linux fdescfs directory is equivalent to duplicating the corresponding file descriptor. Here we have mutually exclusive requirements: - in case of readlink(2) call fdescfs lookup() method should return VLNK vnode otherwise our kern_readlink() fail with EINVAL error; - in the other calls fdescfs lookup() method should return non VLNK vnode. For this, a new vnode v_flag VV_READLINK was added, which is set if fdescfs has been mounted with linrdlnk option, and kern_readlinkat() was modified to properly handle it. For now, for Linux ABI compatibility, mount fdescfs volume with linrdlnk option: mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd Reviewed by: kib@ MFC after: 1 week Relnotes: yes
# 3df7ebc4	05-Jun-2017	Konstantin Belousov <kib@FreeBSD.org>	Add sysctl vfs.ino64_trunc_error controlling action on truncating inode number or link count for the ABI compat binaries. Right now, and by default after the change, too large 64bit values are silently truncated to 32 bits. Enabling the knob causes the system to return EOVERFLOW for stat(2) family of compat syscalls when some values cannot be completely represented by the old structures. For getdirentries(2), knob skips the dirents which would cause non-trivial truncation of d_ino. EOVERFLOW error is specified by the X/Open 1996 LFS document ('Adding Support for Arbitrary File Sizes to the Single UNIX Specification'). Based on the discussion with: bde Sponsored by: The FreeBSD Foundation
# 0c3c207f	02-Jun-2017	Gleb Smirnoff <glebius@FreeBSD.org>	For UNIX sockets make vnode point not to the socket, but to the UNIX PCB, since the latter is the thing that links together VFS and sockets. While here, make the union in the struct vnode anonymous.
# 03311f11	27-May-2017	Konstantin Belousov <kib@FreeBSD.org>	Use whole mnt_stat.f_fsid bits for st_dev. Since ino64 expanded dev_t to 64bit, make VOP_GETATTR(9) provide all bits of mnt_stat.f_fsid as va_fsid for vnodes on filesystems which use f_fsid. In particular, NFSv3 and sometimes NFSv4, and ZFS use this method of reporting st_dev by stat(2). Provide a new helper vn_fsid() to avoid duplicating code to copy f_fsid to va_fsid. Note that the change is mostly cosmetic. Its motivation is to avoid sign-extension of f_fsid[0] into 64bit dev_t value which happens after dev_t becomes 64bit. Reviewed by: avg(zfs), rmacklem (nfs) (both for previous version) Sponsored by: The FreeBSD Foundation
# 69921123	23-May-2017	Konstantin Belousov <kib@FreeBSD.org>	Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). 
Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439
# 0226f659	05-Apr-2017	Konstantin Belousov <kib@FreeBSD.org>	Add V_VMIO flag for vinvalbuf(9) to indicate that the flush request was issued during VM-initiated i/o (pageout), so that the function does not try to flush or remove pages or wait for the vm object paging-in-progress counter. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D10241
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# 5afb134c	12-Dec-2016	Mateusz Guzik <mjg@FreeBSD.org>	vfs: add vrefact, to be used when the vnode has to be already active This allows blind increment of relevant counters which under contention is cheaper than inc-not-zero loops at least on amd64. Use it in some of the places which are guaranteed to see already active vnodes. Reviewed by: kib (previous version)
# 99e6e193	23-Nov-2016	Mark Johnston <markj@FreeBSD.org>	Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589
# f71d0856	07-Oct-2016	Konstantin Belousov <kib@FreeBSD.org>	Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation
# 5a22c958	06-Oct-2016	Bryan Drewery <bdrewery@FreeBSD.org>	Add vrecyclel() to vrecycle() a vnode with the interlock already held. Obtained from: OneFS Sponsored by: Dell EMC Isilon MFC after: 2 weeks
# 5bb81f9b	30-Sep-2016	Mateusz Guzik <mjg@FreeBSD.org>	vfs: batch free vnodes in per-mnt lists Previously free vnodes would always be directly returned to the global LRU list. With this change up to mnt_free_list_batch vnodes are collected first. syncer runs always return the batch regardless of its size. While vnodes on per-mnt lists are not counted as free, they can be returned in case of vnode shortage. Reviewed by: kib Tested by: pho
# 8660b707	30-Sep-2016	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib
# 47e61f6c	15-Aug-2016	Konstantin Belousov <kib@FreeBSD.org>	Implement VOP_FDATASYNC() for msdosfs. Standard VOP_FSYNC() implementation just syncs data buffers, and due to this, is the correct and efficient implementation for msdosfs or any other filesystem which uses buffer cache trivially. Provide globally visible wrapper vop_stdfdatasync_buf() for future consumption by other filesystems. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471
# d293d0c9	11-Aug-2016	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove unused textvp_fullpath() macro. MFC after: 1 month
# 411455a8	10-Aug-2016	Edward Tomasz Napierala <trasz@FreeBSD.org>	Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month
# 7b255097	04-Aug-2016	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove unused - never actually implemented - vnode lock types from vnode_if.src. MFC after: 1 month
# e896fb3b	17-Jun-2016	Mateusz Guzik <mjg@FreeBSD.org>	vfs: ifdef out noop vop_* primitives on !DEBUG_VFS_LOCKS kernels This removes calls to empty functions like vop_lock_{pre/post} from common vfs routines. Approved by: re (gjb)
# f8a75278	17-Jun-2016	Konstantin Belousov <kib@FreeBSD.org>	Add VFS interface to flush specified amount of free vnodes belonging to mount points with the given filesystem type, specified by mount vfs_ops pointer. Based on patch by: mckusick Reviewed by: avg, mckusick Tested by: allanjude, madpilot Sponsored by: The FreeBSD Foundation Approved by: re (gjb)
# 3f7ca894	17-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Ensure that ftruncate(2) is performed synchronously when file is opened in O_SYNC mode, at least for UFS. This also handles truncation, done due to the O_SYNC \| O_TRUNC flags combination to open(2), in synchronous way. Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 54a33d2f	11-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Add vfs_hash_ref(9) function, which finds a vnode by the hash value and returns it referenced. The function is similar to vfs_hash_get(9), but unlike the latter, returned vnode is not locked. This operation cannot be requested with the vget(9) flags. Reviewed and tested by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week
# cd85d599	11-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Style: wrap long lines. Sponsored by: The FreeBSD Foundation MFC after: 1 week
# c89e1b87	03-May-2016	Konstantin Belousov <kib@FreeBSD.org>	Add EVFILT_VNODE open, read and close notifications. While there, order EVFILT_VNODE notes descriptions alphabetically. Based on submission, and tested by: Vladimir Kondratyev <wulf@cicgroup.ru> MFC after: 2 weeks
# 399e8c17	09-Mar-2016	John Baldwin <jhb@FreeBSD.org>	Simplify AIO initialization now that it is standard. - Mark AIO system calls as STD and remove the helpers to dynamically register them. - Use COMPAT6 for the old system calls with the older sigevent instead of an 'o' prefix. - Simplify the POSIX configuration to note that AIO is always available. - Handle AIO in the default VOP_PATHCONF instead of special casing it in the pathconf() system call. fpathconf() is still hackish. - Remove freebsd32_aio_cancel() as it just called the native one directly. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5589
# 0791e0c0	24-Feb-2016	Konstantin Belousov <kib@FreeBSD.org>	Provide more correct sizing of the KVA consumed by a vnode, used by the virtvnodes calculation. Include the size of fs-specific v_data as the nfs nclnode inline, the NFS nclnode is bigger than either ZFS znode or UFS inode. Include the size of namecache_ts and short cache path element, multiplied by the name cache population factor, again inline. Inline defines are used to avoid pollution of the vnode.h with the subsystem-private objects. Non-significant unsynchronized changes of the definitions are fine, we do not care about that precision, and e.g. ZFS consumes much malloced memory per vnode for reasons unaccounted in the formula. Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10. The measures reduce vnode cache pressure on kmem and bring the vnode cache memory use below some apparent thresholds that were exceeded by r291244 due to more robust vnode reuse. Reported and tested by: marius (i386, previous version) Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 793c3817	18-Jan-2016	Mark Johnston <markj@FreeBSD.org>	Add vrefl(), a locked variant of vref(9). This API has no in-tree consumers at the moment but is useful to at least one out-of-tree consumer, and naturally complements existing vnode refcount functions (vholdl(9), vdropl(9)). Obtained from: kib (sys/ portion) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4947 Differential Revision: https://reviews.freebsd.org/D4953
# 106ebb76	16-Dec-2015	Konstantin Belousov <kib@FreeBSD.org>	Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a buffer for each block number in the range with gbincore(), look up the next instantiated buffer with the logical block number which is greater or equal to the next lblkno. This significantly speeds up the iteration for sparsely populated range. Move the iteration into new helper bnoreuselist(), which is structured similarly to flushbuflist(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation
# f186a80d	26-Nov-2015	Konstantin Belousov <kib@FreeBSD.org>	Remove VI_AGE vnode iflag, it is unused. Noted by: bde Sponsored by: The FreeBSD Foundation
# 412ce274	16-Sep-2015	Steven Hartland <smh@FreeBSD.org>	Fix kqueue write events for files > 2GB Due to the use of int's for file offsets in the VOP_WRITE_(PRE\|POST) macros, kqueue write events for files greater than 2GB were never fired. This caused tail -f on a file greater than 2GB to never see updates. MFC after: 1 week Relnotes: YES Sponsored by: Multiplay
# 55d33667	15-Sep-2015	Conrad Meyer <cem@FreeBSD.org>	kevent(2): Note DOOMED vnodes with NOTE_REVOKE In poll mode, check for and wake VBAD vnodes. (Vnodes that are VBAD at registration will never be woken by the RECLAIM trigger.) Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation. (Vnodes that were fine at registration but are vgoned while being monitored should signal waiters.) Reviewed by: kib Approved by: markj (mentor) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3675
# 17518b1a	05-Sep-2015	Kirk McKusick <mckusick@FreeBSD.org>	Track changes to kern.maxvnodes and appropriately increase or decrease the size of the name cache hash table (mapping file names to vnodes) and the vnode hash table (mapping mount point and inode number to vnode). An appropriate locking strategy is the key to changing hash table sizes while they are in active use. Reviewed by: kib Tested by: Peter Holm Differential Revision: https://reviews.freebsd.org/D2265 MFC after: 2 weeks
# c9ba6504	24-Aug-2015	Edward Tomasz Napierala <trasz@FreeBSD.org>	Make vfs_unmountall() unmount /dev after /, not before. The only reason this didn't result in an unclean shutdown is that devfs ignores MNT_FORCE flag. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3467
# 6e572e08	23-Aug-2015	Edward Tomasz Napierala <trasz@FreeBSD.org>	After r286237 it should be fine to call vgone(9) on a busy GEOM vnode; remove KASSERT that would prevent forced devfs unmount from working. MFC after: 1 month Sponsored by: The FreeBSD Foundation
# 752fc07d	16-Jul-2015	Mateusz Guzik <mjg@FreeBSD.org>	vfs: implement v_holdcnt/v_usecount manipulation using atomic ops Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free list) of either counter are still guarded with vnode interlock. Reviewed by: kib (earlier version) Tested by: pho
# f0725a8e	11-Jul-2015	Mateusz Guzik <mjg@FreeBSD.org>	Move chdir/chroot-related fdp manipulation to kern_descrip.c Prefix exported functions with pwd_. Deduplicate some code by adding a helper for setting fd_cdir. Reviewed by: kib
# f6f6d240	10-Jun-2015	Mateusz Guzik <mjg@FreeBSD.org>	Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thread's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
# 2db0e1f5	27-May-2015	Konstantin Belousov <kib@FreeBSD.org>	Add V_MNTREF flag to the vn_start_write(9) and vn_start_secondary_write(9) functions. The flag indicates that the caller already owns a reference on the mount point, and the functions can consume it. The reference is released by vn_finished_write(9) and vn_finished_secondary_write(9) in due course. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# d5fec489	21-Apr-2015	Craig Rodrigues <rodrigc@FreeBSD.org>	Support file verification in MAC. * Add VCREAT flag to indicate when a new file is being created * Add VVERIFY to indicate verification is required * Both VCREAT and VVERIFY are only passed on the MAC method vnode_check_open and are removed from the accmode after * Add O_VERIFY flag to rtld open of objects * Add 'v' flag to __sflags to set O_VERIFY flag. Submitted by: Steve Kiernan <stevek@juniper.net> Obtained from: Juniper Networks, Inc. GitHub Pull Request: https://github.com/freebsd/freebsd/pull/27 Relnotes: yes
# 08189ed6	27-Feb-2015	Konstantin Belousov <kib@FreeBSD.org>	The VNASSERT in vflush() FORCECLOSE case is trying to panic early to prevent errors from yanking devices out from under filesystems. Only care about special vnodes on devfs, special nodes on other kinds of filesystems do not have special properties. Sponsored by: EMC / Isilon Storage Division Submitted by: Conrad Meyer MFC after: 1 week
# 8ee9765a	21-Dec-2014	Konstantin Belousov <kib@FreeBSD.org>	Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name was cached. Use the flag for core dumps. Requested by: rpaulo Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 90effb23	22-Nov-2014	Gleb Smirnoff <glebius@FreeBSD.org>	Merge from projects/sendfile: o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but doesn't sleep. It returns immediately, and will execute the I/O done handler function that must be supplied as argument. o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager. o Extend pagertab to support pgo_getpages_async method, and implement this method for vnode_pager. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc.
# f12aa60c	15-Oct-2014	Konstantin Belousov <kib@FreeBSD.org>	When vnode bypass cannot be performed on the cdev file descriptor for read/write/poll/ioctl, call standard vnode filedescriptor fop. This restores the special handling for terminals by calling the deadfs VOP, instead of always returning ENXIO for destroyed devices or revoked terminals. Since destroyed (and not revoked) device would use devfs_specops VOP vector, make dead_read/write/poll non-static and fill VOP table with pointers to the functions, instead of VOP_PANIC. Noted and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
# e3d6fece	04-Oct-2014	Konstantin Belousov <kib@FreeBSD.org>	Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is not locked, but range is. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# bf66496e	01-Oct-2014	Will Andrews <will@FreeBSD.org>	Embellish a comment regarding the reliability of DEBUG_VFS_LOCKS. Submitted by: kib
# 575e02d9	29-Aug-2014	Konstantin Belousov <kib@FreeBSD.org>	Add function and wrapper to switch lockmgr and vnode lock back to auto-promotion of shared to exclusive. Tested by: hrs, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
# 895b3782	14-Jul-2014	Konstantin Belousov <kib@FreeBSD.org>	Extract the code to put a filesystem into the suspended state (at the unmount time) in the helper vfs_write_suspend_umnt(). Use it instead of two inline copies in FFS. Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# a6945216	14-Jul-2014	Konstantin Belousov <kib@FreeBSD.org>	Generalize vn_get_ino() to allow filesystems to use custom vnode producer, instead of hard-coding VFS_VGET(). New function, which takes callback, is called vn_get_ino_gen(), standard callback for vn_get_ino() is provided. Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the uses of vn_get_ino_gen(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 7b81a399	17-Jun-2014	Konstantin Belousov <kib@FreeBSD.org>	In msdosfs_setattr(), add a check for result of the utimes(2) permissions test, forgotten in r164033. Refactor the permission checks for utimes(2) into vnode helper function vn_utimes_perm(9), and simplify its code comparing with the UFS origin, by writing the call to VOP_ACCESSX only once. Use the helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5). Reported by: bde Reviewed by: bde, trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week
# cc3d8c35	09-Jul-2013	Konstantin Belousov <kib@FreeBSD.org>	There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
# 3289d587	20-Mar-2013	Kirk McKusick <mckusick@FreeBSD.org>	When renaming a directory from one parent directory to another, we need to call ufs_checkpath() to walk from our new location to the root of the filesystem to ensure that we do not encounter ourselves along the way. Until now, we accomplished this by reading the ".." entries of each directory in our path until we reached the root (or encountered an error). This change tries to avoid the I/O of reading the ".." entries by first looking them up in the name cache and only doing the I/O when the name cache lookup fails. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
# 5f5f0554	15-Mar-2013	Konstantin Belousov <kib@FreeBSD.org>	Implement the helper function vn_io_fault_pgmove(), intended to use by the filesystem VOP_READ() and VOP_WRITE() implementations in the same way as vn_io_fault_uiomove() over the unmapped buffers. Helper provides the convenient wrapper over the pmap_copy_pages() for struct uio consumers, taking care of the TDP_UIOHELD situations. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks
# ba05dec5	27-Feb-2013	Konstantin Belousov <kib@FreeBSD.org>	The softdep freeblks workitem might hold a reference on the dquot. Current dqflush() panics when a dquot with non-zero refcount is encountered. The situation is possible, because quotas are turned off before softdep workitem queue is flushed, due to the quota file writes might create softdep workitems. Make the encountering an active dquot in dqflush() not fatal, return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of softdep_flushfiles() loop, until the last iteration. At the last loop, the quotas must be closed, and because SU workitems should be already flushed, the references to dquot are gone. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Reviewed by: mckusick MFC after: 2 weeks
# ab52a230	13-Jan-2013	Konstantin Belousov <kib@FreeBSD.org>	Rearrange the struct bufobj and struct vnode layouts to reduce padding. On the amd64 kernel with INVARIANTS turned off, size of the struct vnode is reduced from 496 to 472 bytes, saving 24 bytes of memory and KVA per vnode. Noted and reviewed by: peter Tested by: pho Sponsored by: The FreeBSD Foundation
# f6af8e37	13-Jan-2013	Konstantin Belousov <kib@FreeBSD.org>	Add exported vfs_hash_index() function, which calculates the canonical pre-masked hash for the given vnode. The function assumes that vp->v_hash is initialized by the filesystem vnode instantiation function. At the moment, it is only done if filesystem uses vfs_hash_insert(). Reviewed by: peter Tested by: peter, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 5 days
# ddd6b3fc	10-Jan-2013	Konstantin Belousov <kib@FreeBSD.org>	Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation
# f99cb34c	01-Jan-2013	Konstantin Belousov <kib@FreeBSD.org>	The process_deferred_inactive() function locks the vnodes of the ufs mount, which means that it must not be called while the snaplock is owned. The vfs_write_resume(9) does call the function as the VFS_SUSP_CLEAN() method, which is too early and falls into the region still protected by snaplock. Add yet another flag for the vfs_write_resume_flags() to avoid calling suspension cleanup handler after the suspend is lifted, and use it in the ffs_snapshot() call to vfs_write_resume. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
# 91e94745	28-Dec-2012	Konstantin Belousov <kib@FreeBSD.org>	Make it possible to atomically resume writes on the mount and account the write start, by adding a variation of the vfs_write_resume(9) which accepts flags. Use the new function to prevent a deadlock between parallel suspension and snapshotting a UFS mount. The ffs_snapshot() code performed vfs_write_resume() followed by vn_start_write() while owning the snaplock. If the suspension intervenes between resume and vn_start_write(), the deadlock occurred after the suspending thread tried to lock the snaplock, most typically during the write in the ffs_copyonwrite(). Reported and tested by: Andreas Longwitz <longwitz@incore.de> Reviewed by: mckusick MFC after: 2 weeks X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC, in HEAD
# f121e3e8	27-Nov-2012	Pawel Jakub Dawidek <pjd@FreeBSD.org>	- Add NOCAPCHECK flag to namei that allows lookup to work even if the process is in capability mode. - Add VN_OPEN_NOCAPCHECK flag for vn_open_cred() that will be converted into NOCAPCHECK namei flag. This functionality will be used to enable core dumps for sandboxed processes. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks
# 9b233e23	14-Oct-2012	Konstantin Belousov <kib@FreeBSD.org>	Add a KPI to allow to reserve some amount of space in the numvnodes counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week
# 7ac1b61a	08-Jun-2012	John Baldwin <jhb@FreeBSD.org>	Split the second half of vn_open_cred() (after a vnode has been found via a lookup or created via VOP_CREATE()) into a new vn_open_vnode() function and use this function in fhopen() instead of duplicating code from vn_open_cred() directly. Tested by: pho Reviewed by: kib MFC after: 2 weeks
# 41014d99	30-May-2012	Konstantin Belousov <kib@FreeBSD.org>	vn_io_fault() is a facility to prevent page faults while filesystems perform copyin/copyout of the file data into the usermode buffer. Typical filesystem hold vnode lock and some buffer locks over the VOP_READ() and VOP_WRITE() operations, and since page fault handler may need to recurse into VFS to get the page content, a deadlock is possible. The facility works by disabling page faults handling for the current thread and attempting to execute i/o while allowing uiomove() to access the usermode mapping of the i/o buffer. If all buffer pages are resident, uiomove() is successful and request is finished. If EFAULT is returned from uiomove(), the pages backing i/o buffer are faulted in and held, and the copyin/out is performed using uiomove_fromphys() over the held pages for the second attempt of VOP call. Since pages are held in chunks to prevent large i/o requests from starving free pages pool, and since vnode lock is only taken for i/o over the current chunk, the vnode lock no longer protects atomicity of the whole i/o request. Use newly added rangelocks to provide the required atomicity of i/o regarding other i/o and truncations. Filesystems need to explicitly opt-in into the scheme, by setting the MNTK_NO_IOPF struct mount flag, and optionally by using vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or converting uio into request for uiomove_fromphys(). Reviewed by: bf (comments), mdf, pjd (previous version) Tested by: pho Tested by: flo, Gustau Pérez <gperez entel upc edu> (previous version) MFC after: 2 months
# 8f0e9130	30-May-2012	Konstantin Belousov <kib@FreeBSD.org>	Add a rangelock implementation, intended to be used to range-locking the i/o regions of the vnode data space. The implementation is quite simple-minded, it uses the list of the lock requests, ordered by arrival time. Each request may be for read or for write. The implementation is fair FIFO. MFC after: 2 month
# c83f909c	30-May-2012	Konstantin Belousov <kib@FreeBSD.org>	Clarify that the v_lockf is advisory lock list. MFC after: 3 days
# 292520f7	25-May-2012	Konstantin Belousov <kib@FreeBSD.org>	Add a vn_bmap_seekhole(9) vnode helper which can be used by any filesystem which supports VOP_BMAP(9) to implement SEEK_HOLE/SEEK_DATA commands for lseek(2). MFC after: 2 weeks
# af6e6b87	23-Apr-2012	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove unused thread argument to vrecycle(). Reviewed by: kib
# c52fd858	23-Apr-2012	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove unused thread argument from vtruncbuf(). Reviewed by: kib
# f257ebbb	20-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
# 73305eb8	17-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	Drop export of vdestroy() function from kern/vfs_subr.c as it is used only as a helper function in that file. Replace sole call to vbusy() with inline code in vholdl(). Replace sole calls to vfree() and vdestroy() with inline code in vdropl(). The Clang compiler already inlines these functions, so they do not show up in a kernel backtrace which is confusing. Also you cannot set their frame in kgdb which means that it is impossible to view their local variables. So, while the produced code is unchanged, the debugging should be easier. Discussed with: kib MFC after: 2 weeks
# ecb6e528	11-Apr-2012	Kirk McKusick <mckusick@FreeBSD.org>	Export vinactive() from kern/vfs_subr.c (e.g., make it no longer static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of the body of vinactive() being cut and pasted into process_deferred_inactive(). Reviewed by: kib MFC after: 2 weeks
# bd944e01	24-Mar-2012	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove unused define. Discussed with: kib
# b80dcb55	10-Mar-2012	Konstantin Belousov <kib@FreeBSD.org>	Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni
# 5e99212d	02-Mar-2012	Rick Macklem <rmacklem@FreeBSD.org>	Post r230394, the Lookup RPC counts for both NFS clients increased significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels. Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks
# c7e41c8b	29-Feb-2012	Mikolaj Golub <trociny@FreeBSD.org>	Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH() operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch MFC after: 2 weeks
# 662c901c	25-Feb-2012	Mikolaj Golub <trociny@FreeBSD.org>	When detaching a unix domain socket, uipc_detach() checks unp->unp_vnode pointer to detect if there is a vnode associated with (bound to) this socket and does necessary cleanup if there is. The issue is that after forced unmount this check may be too late as the unp_vnode is reclaimed and the reference is stale. To fix this provide a helper function that is called on a socket vnode reclamation to do necessary cleanup. Pointed by: kib Reviewed by: kib MFC after: 2 weeks
# 526d0bd5	20-Feb-2012	Konstantin Belousov <kib@FreeBSD.org>	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
# c0ae88db	08-Feb-2012	Konstantin Belousov <kib@FreeBSD.org>	Trim 8 unused bytes from struct vnode on 64-bit architectures. Reviewed by: alc
# bf40d24a	06-Feb-2012	John Baldwin <jhb@FreeBSD.org>	Rename cache_lookup_times() to cache_lookup() and retire the old API and ABI stub for cache_lookup().
# c480f781	06-Feb-2012	Konstantin Belousov <kib@FreeBSD.org>	Current implementations of sync(2) and syncer vnode fsync() VOP use mnt_noasync counter to temporarily remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks
# 5aefb4cb	20-Jan-2012	John Baldwin <jhb@FreeBSD.org>	Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks
# f6e633a9	14-Jan-2012	Martin Matuska <mm@FreeBSD.org>	Introduce vn_path_to_global_path() This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month
# f0d6c5ca	23-Dec-2011	John Baldwin <jhb@FreeBSD.org>	Add post-VOP hooks for VOP_DELETEEXTATTR() and VOP_SETEXTATTR() and use these to trigger a NOTE_ATTRIB EVFILT_VNODE kevent when the extended attributes of a vnode are changed. Note that OS X already implements this behavior. Reviewed by: rwatson MFC after: 2 weeks
# 936c09ac	03-Nov-2011	John Baldwin <jhb@FreeBSD.org>	Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
# 82378711	25-Aug-2011	Martin Matuska <mm@FreeBSD.org>	Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
# 9c00bb91	16-Aug-2011	Konstantin Belousov <kib@FreeBSD.org>	Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores. Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
# 91538d78	10-Jul-2011	Konstantin Belousov <kib@FreeBSD.org>	Update locking annotations for the struct vnode. MFC after: 3 days
# 280e091a	10-Jun-2011	Jeff Roberson <jeff@FreeBSD.org>	Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
# d91f88f7	18-Apr-2011	Matthew D Fleming <mdf@FreeBSD.org>	Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)
# 08b163fa	02-Feb-2011	Matthew D Fleming <mdf@FreeBSD.org>	Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week
# 730b63b0	19-Nov-2010	Konstantin Belousov <kib@FreeBSD.org>	Remove prtactive variable and related printf()s in the vop_inactive and vop_reclaim() methods. They seems to be unused, and the reported situation is normal for the forced unmount. MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 3634d5b2	20-Aug-2010	John Baldwin <jhb@FreeBSD.org>	Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
# 3979450b	06-Aug-2010	Konstantin Belousov <kib@FreeBSD.org>	Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month
# df17ff61	11-Jun-2010	Andriy Gapon <avg@FreeBSD.org>	vnode.h: expand debug macros to non-empty void statements when DEBUG_VFS_LOCKS is disabled MFC after: 2 weeks
# 7fd32ea9	12-May-2010	Zachary Loafman <zml@FreeBSD.org>	Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr
# 307d88b7	06-May-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	Style fixes and removal of unneeded variable. Submitted by: bde@
# b5f770bd	05-May-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib
# bb9a8424	09-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r206093: Add function vop_rename_fail(9) that performs needed cleanup for locks and references of the VOP_RENAME(9) arguments. Use vop_rename_fail() in deadfs_rename().
# ea015880	02-Apr-2010	Konstantin Belousov <kib@FreeBSD.org>	Add function vop_rename_fail(9) that performs needed cleanup for locks and references of the VOP_RENAME(9) arguments. Use vop_rename_fail() in deadfs_rename(). Tested by: Mikolaj Golub MFC after: 1 week
# bd7ae209	27-Mar-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	MFC r197680: Provide default implementation for VOP_ACCESS(9), so that filesystems which want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
# 78e32164	27-Mar-2010	Edward Tomasz Napierala <trasz@FreeBSD.org>	MFC r197405: Add pieces of infrastructure required for NFSv4 ACL support in UFS. Reviewed by: rwatson
# e9560b40	07-Feb-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r202528: Add vunref(9).
# d2f334bf	17-Jan-2010	Konstantin Belousov <kib@FreeBSD.org>	Add new function vunref(9) that decrements vnode use count (and hold count) while vnode is exclusively locked. The code for vput(9), vrele(9) and vunref(9) is merged. In collaboration with: pho Reviewed by: alc MFC after: 3 weeks
# 2d63cbda	10-Jan-2010	Konstantin Belousov <kib@FreeBSD.org>	MFC r200770: Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects.
# 49e3050e	20-Dec-2009	Konstantin Belousov <kib@FreeBSD.org>	VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object flag. Besides providing the redundant information, need to update both vnode and object flags causes more acquisition of vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects. Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects. Suggested and reviewed by: alc Tested by: pho MFC after: 3 weeks
# 2c29cfa0	01-Oct-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Provide default implementation for VOP_ACCESS(9), so that filesystems which want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
# a9315dde	22-Sep-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add pieces of infrastructure required for NFSv4 ACL support in UFS. Reviewed by: rwatson
# fe1d3f15	28-Jun-2009	Stanislav Sedov <stas@FreeBSD.org>	- Turn the third (islocked) argument of the knote call into flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without KQ_LOCK mtx held. - Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required for ZFS as its getattr implementation may sleep. Approved by: re (rwatson) Reviewed by: kib MFC after: 2 weeks
# aec53722	25-Jun-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Tweak comment.
# c808c963	21-Jun-2009	Konstantin Belousov <kib@FreeBSD.org>	Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson
# e0c161b8	21-Jun-2009	Konstantin Belousov <kib@FreeBSD.org>	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson
# f0830182	02-Jun-2009	Attilio Rao <attilio@FreeBSD.org>	Handle lock recursion differently by always checking against LO_RECURSABLE instead of the lock's own flag itself. Tested by: pho
# 0449e6e1	31-May-2009	Konstantin Belousov <kib@FreeBSD.org>	Eliminate code duplication in vn_fullpath1() around the cache lookups and calls to vn_vptocnp() by moving more of the common code to vn_vptocnp(). Rename vn_vptocnp() to vn_vptocnp_locked() to signify that cache is locked around the call. Do not track buffer position by both the pointer and offset, use only buflen to record the start of the free space. Export vn_vptocnp() for external consumers as a wrapper around vn_vptocnp_locked() that locks the cache and handles hold counts. Tested by: pho
# c97fcdba	30-May-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add VOP_ACCESSX, which can be used to query for newly added V* permissions, such as VWRITE_ACL. For filesystems that don't implement it, there is a default implementation, which works as a wrapper around VOP_ACCESS. Reviewed by: rwatson@
# 885868cd	10-Apr-2009	Robert Watson <rwatson@FreeBSD.org>	Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
# 607fc40b	29-Mar-2009	Alexander Kabaev <kan@FreeBSD.org>	Replace v_dd vnode pointer with v_cache_dd pointer to struct namecache in directory vnodes. Allow namecache dotdot entry to be created pointing from child vnode to parent vnode if no existing links in opposite direction exist. Use direct link from parent to child for dotdot lookups otherwise. This restores more efficient dotdot caching in NFS filesystems which was lost when vnodes stopped being type stable. Reviewed by: kib
# 6180d318	29-Mar-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Get rid of VSTAT and replace it with VSTAT_PERMS, which is somewhat better defined. Approved by: rwatson (mentor)
# 54377204	27-Mar-2009	Edward Tomasz Napierala <trasz@FreeBSD.org>	Add new V* constants, necessary for granular permission checks in NFSv4 ACLs. While here, get rid of VALLPERM; it wasn't used anyway. Approved by: rwatson (mentor)
# f0ffa083	08-Mar-2009	Joe Marcus Clarke <marcus@FreeBSD.org>	Add a prototype for the new vop_stdvptocnp function. Reviewed by: kib Approved by: kib Tested by: pho
# 03964c8e	19-Feb-2009	John Baldwin <jhb@FreeBSD.org>	Enable caching of negative pathname lookups in the NFS client. To avoid stale entries, we save a copy of the directory's modification time when the first negative cache entry was added in the directory's NFS node. When a negative cache entry is hit during a pathname lookup, the parent directory's modification time is checked. If it has changed, all of the negative cache entries for that parent are purged and the lookup falls back to using the RPC. This required adding a new cache_purge_negative() method to the name cache to purge only negative cache entries for a given directory. Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp Reviewed by: mohans
# 9a734a28	28-Jan-2009	John Baldwin <jhb@FreeBSD.org>	Actually remove the VA_MARK_ATIME flag. This should have been in the earlier commit to add VOP_MARKATIME().
# e9aff357	21-Jan-2009	Konstantin Belousov <kib@FreeBSD.org>	Move the code from ufs_lookup.c used to do dotdot lookup, into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in lookup routine. Requested by: jhb Tested by: pho MFC after: 1 month
# 24b087e4	11-Dec-2008	Joe Marcus Clarke <marcus@FreeBSD.org>	Add a new error VOP, VOP_ENOENT. This function will simply return ENOENT. Reviewed by: arch Approved by: kib
# 1ba4a712	17-Nov-2008	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system panicked. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris
# 15bc6b2b	28-Oct-2008	Edward Tomasz Napierala <trasz@FreeBSD.org>	Introduce accmode_t. This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
# d7f03759	19-Oct-2008	Ulf Lilleengen <lulf@FreeBSD.org>	- Import the HEAD csup code which is the basis for the cvsmode work.
# 0d7935fd	10-Oct-2008	Attilio Rao <attilio@FreeBSD.org>	Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# bdb80947	16-Sep-2008	Konstantin Belousov <kib@FreeBSD.org>	Garbage-collect vn_write_suspend_wait(). Suggested and reviewed by: tegge Tested by: pho MFC after: 1 month
# dfa7fd1d	10-Sep-2008	Edward Tomasz Napierala <trasz@FreeBSD.org>	Remove VSVTX, VSGID and VSUID. This should be a no-op, as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID. Approved by: rwatson (mentor)
# 0359a12e	28-Aug-2008	Attilio Rao <attilio@FreeBSD.org>	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
# a888d54d	28-Aug-2008	Konstantin Belousov <kib@FreeBSD.org>	Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmntque() function to ignore the unmounting and forces insertion of the vnode into the mount vnode list. Change insmntque() to fail when forced unmount is in progress and VV_FORCEINSMQ is not specified. Add an assertion to the insmntque(), requiring the vnode to be exclusively locked for mp-safe filesystems. Use the VV_FORCEINSMQ for the creation of the syncvnode. Tested by: pho Reviewed by: tegge MFC after: 1 month
# dfc714fb	31-Jul-2008	Christian S.J. Peron <csjp@FreeBSD.org>	Currently, BSM audit pathname token generation for chrooted or jailed processes are not producing absolute pathname tokens. It is required that audited pathnames are generated relative to the global root mount point. This modification changes our implementation of audit_canon_path(9) and introduces a new function: vn_fullpath_global(9) which performs a vnode -> pathname translation relative to the global mount point based on the contents of the name cache. Much like vn_fullpath, vn_fullpath_global is a wrapper function which calls vn_fullpath1. Further, the string parsing routines have been converted to use the sbuf(9) framework. This change also removes the conditional acquisition of Giant, since the vn_fullpath1 method will not dip into file system dependent code. The vnode locking was modified to use vhold()/vdrop() instead of vref() and vrele(). This will modify the hold count instead of modifying the user count. This makes more sense since it's the kernel that requires the reference to the vnode. This also makes sure that the vnode does not get recycled while we hold the reference to it. [1] Discussed with: rwatson Reviewed by: kib [1] MFC after: 2 weeks
# eab626f1	16-Apr-2008	Konstantin Belousov <kib@FreeBSD.org>	Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
# 047dd67e	06-Apr-2008	Attilio Rao <attilio@FreeBSD.org>	Optimize lockmgr in order to get rid of the pool mutex interlock, of the state transitioning flags and of msleep(9) callings. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) already do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the corresponding per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will probably be added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wakeup, are in the runqueue they can contend the lock with the exclusive drainer. This is hard to be fixed but the now committed code mitigate this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will no longer be supported soon. In order to avoid namespace pollution, stack.h is split into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. Kernel ABI results heavily changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly. Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007
# 0a3af16a	31-Mar-2008	Konstantin Belousov <kib@FreeBSD.org>	Add the utility function vn_commname() to retrieve the command name from the vfs namecache, when available. Reviewed by: rwatson, rdivacky Tested by: pho
# 97735db7	23-Mar-2008	Jeff Roberson <jeff@FreeBSD.org>	- Remove an old comment; vnodes have been working without Giant for years now. - Clarify the locking required for VI_DOOMED in preparation for simplifications to vget() and vn_lock().
# 1be222e9	23-Mar-2008	Konstantin Belousov <kib@FreeBSD.org>	Yield the cpu in the kernel while iterating the list of the vnodes belonging to the mountpoint. Also, yield when in the softdep_process_worklist() even when we are not going to sleep due to buffer drain. It is believed that the ULE fixed the problem [1], but the yielding seems to be needed at least for the 4BSD case. Discussed: on stable@, with bde Reviewed by: tegge, jeff [1] MFC after: 2 weeks
# 7fbfba7b	01-Mar-2008	Attilio Rao <attilio@FreeBSD.org>	- Handle buffer lock waiters count directly in the buffer cache instead than rely on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreeBSD_version will be bumped and manpages updated by further commits. Additively, 'struct buf' changes results in a disturbed ABI also. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
# 628f51d2	24-Feb-2008	Attilio Rao <attilio@FreeBSD.org>	Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
# 22db15c0	13-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
# cb05b60a	09-Jan-2008	Attilio Rao <attilio@FreeBSD.org>	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
# 51662691	20-Oct-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Remove redundant prototypes.
# 9e223287	31-May-2007	Konstantin Belousov <kib@FreeBSD.org>	Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
# 950afe99	25-May-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	The cache_leaf_test() function seems to be unused, so remove it.
# d413d210	18-May-2007	Konstantin Belousov <kib@FreeBSD.org>	Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.
# e6534b36	31-Mar-2007	Dag-Erling Smørgrav <des@FreeBSD.org>	Make vdropl() public; zfs needs it. There is also plenty of existing file system code (mostly *_reclaim()) which look like this: VOP_LOCK(vp); /* examine vp */ VOP_UNLOCK(vp); vdrop(vp); This can now be rewritten to: VOP_LOCK(vp); /* examine vp */ vdropl(vp); /* will unlock vp */ MFC after: 1 week
# 61b9d89f	12-Mar-2007	Tor Egge <tegge@FreeBSD.org>	Make insmntque() externally visible and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
# 10bcafe9	15-Feb-2007	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
# 8fd14511	14-Dec-2006	Konstantin Belousov <kib@FreeBSD.org>	Use tab after #define. Pointed out by: pjd
# 3b7b5496	14-Dec-2006	Konstantin Belousov <kib@FreeBSD.org>	Resolve two deadlocks that could be caused by busy md device backed by vnode. Allow for md thread and the thread that owns lock on vnode backing the md device to do the write even when runningbufspace is exhausted. Tested by: Peter Holm Reviewed by: tegge MFC after: 2 weeks
# 2f6a774b	12-Nov-2006	Kip Macy <kmacy@FreeBSD.org>	change vop_lock handling to allowing tracking of callers' file and line for acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)
# 1a60c7fc	31-Oct-2006	Pawel Jakub Dawidek <pjd@FreeBSD.org>	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl
# 4207c279	18-Apr-2006	Xin LI <delphij@FreeBSD.org>	In vfs_hash_get(): mount point should never be changed so explicitly constify the mp parameter. Reviewed by: phk
# 791dd2fa	08-Mar-2006	Tor Egge <tegge@FreeBSD.org>	Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
# eb2ea105	01-Mar-2006	Jeff Roberson <jeff@FreeBSD.org>	- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris
# 731959b1	31-Jan-2006	Yaroslav Tykhiy <ytykhiy@gmail.com>	Use off_t for file size passed to vnode_create_vobject(). The former type, size_t, was causing truncation to 32 bits on i386, which immediately led to undersizing of VM objects backed by files >4GB. In particular, sendfile(2) was broken for such files. PR: kern/92243 MFC after: 5 days
# ea6d62f1	14-Jan-2006	Robert Watson <rwatson@FreeBSD.org>	Rename uid and gid arguments to vaccess() prototype to match vaccess() implementation in vfs_subr.c. No functional change. MFC after: 3 days
# 82be0a5a	09-Jan-2006	Tor Egge <tegge@FreeBSD.org>	Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
# 0430a5e2	13-Dec-2005	Dag-Erling Smørgrav <des@FreeBSD.org>	Eradicate caddr_t from the VFS API.
# e26b05cf	13-Dec-2005	Dag-Erling Smørgrav <des@FreeBSD.org>	Nuke vnodeop_desc.vdesc_transports, which has been unused since the dawn of time (or the inception of ncvs, whichever came last)
# 9f5c1d19	12-Oct-2005	Diomidis Spinellis <dds@FreeBSD.org>	Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks
# 2883ba66	12-Sep-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce vfs_read_dirent() which can help VOP_READDIR() implementations by handling all the cookie stuff.
# 34cc826a	05-Aug-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Holding a vnode doesn't prevent v_mount from disappearing (when the vnode is inactivated), possibly leading to a NULL dereference when checking if the mount wants knotes to be activated in the VOP hooks. So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(), if necessary, and check it when activating knotes. Since the flags are not erased when a vnode is being held, we can safely read them. Reviewed by: kris@ MFC after: 3 days
# e8ddb61d	02-Aug-2005	Jeff Roberson <jeff@FreeBSD.org>	- Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment. Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
# 2b0f687b	01-Jul-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Mistakenly undefined VN_KNOTE_LOCKED in my previous commit. Noticed by: Antoine Brodin <antoine.brodin@laposte.net> Approved by: re (scottl)
# 571dcd15	01-Jul-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
# b930d853	13-Jun-2005	Jeff Roberson <jeff@FreeBSD.org>	- Don't make vgonel() globally visible, we want to change its prototype anyway and it's not used outside of vfs_subr.c. - Change vgonel() to accept a parameter which determines whether or not we'll put the vnode on the free list when we're done. - Use the new vgonel() parameter rather than VI_DOOMED to signal our intentions in vtryrecycle(). - In vgonel() return if VI_DOOMED is already set, this vnode has already been reclaimed. Sponsored by: Isilon Systems, Inc.
# 679985d0	09-Jun-2005	Suleiman Souhlal <ssouhlal@FreeBSD.org>	Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
# 6341095e	31-May-2005	Ken Smith <kensmith@FreeBSD.org>	This patch addresses a standards violation issue. The standards say a file's access time should be updated when it gets executed. A while ago the mechanism used to exec was changed to use a more mmap based mechanism and this behavior was broken as a side-effect of that. A new vnode flag is added that gets set when the file gets executed, and the VOP_SETATTR() vnode operation gets called. The underlying filesystem is expected to handle it based on its own semantics, some filesystems don't support access time at all. Those that do should handle it in a way that does not block, does not generate I/O if possible, etc. In particular vn_start_write() has not been called. The UFS code handles it the same way as it would normally handle the access time if a file was read - the IN_ACCESS flag gets set in the inode but no other action happens at this point. The actual time update will happen later during a sync (which handles all the necessary locking). Got me into this: cperciva Discussed with: a lot with bde, a little with kan Showed patches to: phk, jeffr, standards@, arch@ Minor discussion on: arch@
# fc8dfa75	27-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- Changes to vgone() and related teardown code have meant that the vxthread pointer is no longer needed.
# ab9707d7	22-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- Add a VI_LOCK_FLAGS so we can pass MTX_DUPOK in. This somewhat defeats the purpose of having macros to hide the lock type as we may now be dependent on MTX_ flags. Sponsored by: Isilon Systems, Inc.
# cbc5da3a	11-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- Add the missing ASSERT_VOP_ELOCKED code in the !DEBUG_VFS_LOCKS case. Pointy hat to: me
# 539de9ed	11-Apr-2005	Jeff Roberson <jeff@FreeBSD.org>	- Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk uses it. Sponsored by: Isilon Systems, Inc.
# 878cdac0	29-Mar-2005	David Schultz <das@FreeBSD.org>	Eliminate v_id and v_ddid. This changes struct vnode, so all filesystem modules must be recompiled. (Since struct vnode has already changed in 6-CURRENT, there's little advantage to leaving the unused fields around.)
# c167961e	23-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- If vput() is called with a shared lock it must upgrade to an exclusive before it can call VOP_INACTIVE(). This must use the EXCLUPGRADE path because we may violate some lock order with another locked vnode if we drop and reacquire the lock. If EXCLUPGRADE fails, we mark the vnode with VI_OWEINACT. This case should be very rare. - Clear VI_OWEINACT in vinactive() and vbusy(). - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well. Sponsored by: Isilon Systems, Inc.
# 51f5ce0c	16-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Add two arguments to the vfs_hash() KPI so that filesystems which do not have unique hashes (NFS) can also use it.
# 78bb3c21	16-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Add mnt_hashseed to struct mount and initialize it with PRNG bits, use it to get better hashing in vfs_hash. In case of an insert collision in vfs_hash_insert(), put the losing vnode on a special list so that vfs_hash_remove() can just assume that it is on a list. Drop the VI_HASHED flag.
# b172f6c5	15-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Now that there are no external users of vfree() make it static. - Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so no one else attempts to grow a dependency on them. - Now that objects with pages hold the vnode we don't have to do unlocked checks for the page count in the vm object in VSHOULDFREE. These three macros could simply check for holdcnt state transitions to determine whether the vnode is on the free list already, but the extra safety the flag affords us is probably worth the minimal cost. - The leafonly sysctl and code have been dead for several years now, remove the sysctl and the code that employed it from vtryrecycle(). - vtryrecycle() also no longer has to check the object's page count as the object holds the vnode until it reaches 0. Sponsored by: Isilon Systems, Inc.
# c178628d	15-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Expose vholdl() so it may be used outside of vfs_subr.c
# 6c325a2a	14-Mar-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Currently (almost) all filesystems maintain a local inode hash table to get from (mount + inode) to vnode. These tables are mostly copy&pasted from UFS, sized based on desiredvnodes and therefore quite large (128K-512K). Several filesystems are buggy enough that they allocate the hash table even before they know if they will ever be used or not. Add "vfs_hash", a system wide hash table, which will replace all the per-filesystem hash-tables. The fields we add to struct vnode will more or less be saved in the respective filesystems inodes. Having one central implementation will save code and will allow us to justify the complexity of code to dynamically (re)size the hash at a later point.
# 8045557f	14-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Increment the holdcnt once for each usecount reference. This allows us to use only the holdcnt to determine whether a vnode may be recycled, simplifying the V* macros as well as vtryrecycle(), etc. Sponsored by: Isilon Systems, Inc.
# 159b4548	14-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- We do not have to check the object's ref_count in VSHOULDFREE or vtryrecycle(). All obj refs also ref the vnode. - Consistently use v_incr_usecount() to increment the usecount. This will be more important later. Sponsored by: Isilon Systems, Inc.
# 1d39df3f	14-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Retire OLOCK and OWANT. All callers hold the vnode lock when creating a vnode object. There has been an assert to prove this for some time. Sponsored by: Isilon Systems, Inc.
# 5e7475a3	13-Mar-2005	Jeff Roberson <jeff@FreeBSD.org>	- Get rid of VXLOCK, VXWANT, and vx_*. The vnode lock now protects us against recycling. - Modify VSHOULDFREE, VCANRECYCLE, etc. now that certain flags are no longer important. Remove VMIGHTFREE as it is only used in one place. Sponsored by: Isilon Systems, Inc.
# bd8bb8e4	22-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Group the fields in struct vnode by their function and stick comments there to tell what the function is.
# aa2f6ddc	22-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Reap more benefits from DEVFS: List devfs_dirents rather than vnodes off their shared struct cdev, this saves a pointer field in the vnode at the expense of a field in the devfs_dirent. There are often 100 times more vnodes so this is a bargain. In addition it makes it harder for people to try to do stupid things like "finding the vnode from cdev". Since DEVFS handles all VCHR nodes now, we can do the vnode related cleanup in devfs_reclaim() instead of in dev_rel() and vgonel(). Similarly, we can do the struct cdev related cleanup in dev_rel() instead of devfs_reclaim(). Rename idestroy_dev() to destroy_devl() for consistency. Add LIST_ENTRY de_alias to struct devfs_dirent. Remove v_specnext from struct vnode. Change si_hlist to si_alist in struct cdev. String new devfs vnodes' devfs_dirent on si_alist when we create them and take them off in devfs_reclaim(). Fix devfs_revoke() accordingly. Also don't clear fields devfs_reclaim() will clear when called from vgone(); Let devfs_reclaim() call dev_rel() instead of vgonel(). Move the usecount tracking from dev_rel() to devfs_reclaim(), and let dev_rel() take a struct cdev argument instead of vnode. Destroy SI_CHEAPCLONE devices in dev_rel() (instead of devfs_reclaim()) when they are no longer used. (This should maybe happen in devfs_close() instead.)
# 7fc940b2	22-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Remove vfinddev(), it is generally bogus when faced with jails and chroot and has no legitimate use(r)s in the tree.
# 4d8ac58b	17-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce vx_wait{l}() and use it instead of home-rolled versions.
# 1ba21282	09-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Make various vnode related functions static
# 8d63297e	10-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Add __printflike() to vn_printf()
# 88e5b12a	08-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Drag another softupdates tentacle back into FFS: Now that FFS's vop_fsync is separate from the internal use we can do the full job there.
# 7c5d36fb	07-Feb-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Remove vop_stddestroyvobject()
# d4eb29ba	28-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unused argument to vrecycle()
# 7146d6cb	28-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Move the contents of vop_stddestroyvobject() to the new vnode_pager function vnode_destroy_vobject(). Make the new function zero the vp->v_object pointer so we can tell if a call is missing.
# 729fcf7e	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now.
# 69816ea3	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem for a given vnode to create a vnode_pager object if one is needed.
# d07a6d3f	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Move the body of vop_stdcreatevobject() over to the vnode_pager under the name Sande^H^H^H^H^Hvnode_create_vobject(). Make the new function take a size argument which removes the need for a VOP_STAT() or a very pessimistic guess for disks. Call that new function from vop_stdcreatevobject(). Make vnode_pager_alloc() private now that its only user came home.
# 7c93282e	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Change vprint() to vn_printf() which takes varargs. Add #define for vprint() to call vn_printf().
# 35764be3	24-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Kill the VV_OBJBUF and test the v_object for NULL instead.
# 49bc5dce	24-Jan-2005	Jeff Roberson <jeff@FreeBSD.org>	- Add a VCANRECYCLE() which performs all the checks required to ensure that we are free to release a vnode.
# 7c0745ee	14-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Eliminate unused and unnecessary "cred" argument from vinvalbuf()
# e39db32a	12-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.
# de8a6c06	13-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Get rid of the VDESC() macro while the pot is boiling anyway, it is only used from generate files now, so we might as well generate the right stuff from the start.
# 63f89abf	13-Jan-2005	Poul-Henning Kamp <phk@FreeBSD.org>	Change the generated VOP_ macro implementations to improve type checking and KASSERT coverage. After this check there is only one "nasty" cast in this code but there is a KASSERT to protect against the wrong argument structure behind that cast. Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical kernel with no change in performance. We also now run the checking and tracing on VOP's which have been layered by nullfs, umapfs, deadfs or unionfs. Add new (non-inline) VOP_FOO_AP() functions which take a "struct foo_args" argument and do everything the VOP_FOO() macros used to do with checks and debugging code. Add KASSERT to VOP_FOO_AP() check for argument type being correct. Slim down VOP_FOO() inline functions to just stuff arguments into the struct foo_args and call VOP_FOO_AP(). Put function pointer to VOP_FOO_AP() into vop_foo_desc structure and make VCALL() use it instead of the current offsetof() hack. Retire vcall() which implemented the offsetof() hack. Make deadfs and unionfs use VOP_FOO_AP() calls instead of VCALL(), we know which specific call we want already. Remove unneeded arguments to VCALL() in nullfs and umapfs bypass functions. Remove unused vdesc_offset and VOFFSET(). Generally improve style/readability of the generated code.
# 60727d8b	06-Jan-2005	Warner Losh <imp@FreeBSD.org>	/* -> /*- for license, minor formatting changes
# 10eee285	22-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Shuffle numeric values of the IO_* flags to match the O_* flags from fcntl.h. This is in preparation for making the flags passed to device drivers be consistently from fcntl.h for all entrypoints. Today open, close and ioctl uses fcntl.h flags, while read and write uses vnode.h flags.
# e87047b4	20-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	We can only ever get to vgonechrl() from a devfs vnode, so we do not need to reassign the vp->v_op to devfs_specops, we know that is the value already. Make devfs_specops private to devfs.
# d3ecc7aa	11-Dec-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Revert rev 1.259. The null-pointer function call (a dereference on ia64) was not the result of a change in the vector operations. It was caused by the NFS locking code using a FIFO and those bypassing the vnode. This indirectly caused the panic. The NFS locking code has been changed. Requested by: phk
# 20a92a18	07-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworthy anyway. Correct information is provided by statfs(/).
# 061f5ec8	05-Dec-2004	Marcel Moolenaar <marcel@FreeBSD.org>	Fix null-pointer indirect function calls introduced in the previous commit. In the new world order, the transitive closure on the vector operations is not precomputed. As such, it's unsafe to actually use any of the function pointers in an indirect function call. They can be null, and we need to use the default vector in that case. This is mostly a quick fix for the four function pointers that are ed explicitly. A more generic or scalable solution is likely to see the light of day. No pathos on: current@
# aec0fb7b	01-Dec-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to believe this would ever be feasible in the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and prototype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
# db442506	13-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Eliminate vop_revoke() function now that devfs_revoke() does the entire job.
# c5b846fe	10-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Slim vnodes by another four bytes by eliminating the (now) unused field v_cachedid.
# c13a4e88	10-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove vn_todev()
# b797084e	09-Nov-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove vnode->v_cachedfs. It was only used for the highly dangerous "export all vnodes with a sysctl" function.
# 20eba72f	27-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Move the syncer linkage from vnode to bufobj. This is not quite a perfect separation: the syncer still think it knows that everything is a vnode.
# 156cb265	25-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Lose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().
# ee1d0eb3	25-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove vnode->v_bsize. This was a dead-end.
# 4dcd0ac4	25-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Collapse vnode->v_object and buf->b_object into bufobj->bo_object.
# ff7c5a48	22-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite jest, of most excellent fancy: he hath taught me lessons a thousand times; and now, how abhorred in my imagination it is! my gorge rises at it. Here were those hacks that I have curs'd I know not how oft. Where be your kludges now? your workarounds? your layering violations, that were wont to set the table on a roar? Move the skeleton of specfs into devfs where it now belongs and bury the rest.
# a76d8f4e	21-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Move the VI_BWAIT flag into a new bo_flag element of bufobj and call it BO_WWAIT. Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions in all relevant places except in ffs_softdep.c where the use of interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
# 1bca607b	21-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx. Initialize the bo_mtx when we allocate a vnode in getnewvnode(). For now we point to the vnode's interlock mutex, that retains the exact same locking semantics. Move v_numoutput from vnode to bufobj. Add renaming macro to postpone code sweep.
# 0bf87424	20-Oct-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Add new function ttyinitmode() which sets our systemwide default modes on a tty structure. Both the ".init" and the current settings are initialized allowing the function to be used both at attach and open time. The function takes an argument to decide if echoing should be enabled by default. Echoing should not be enabled for regular physical serial ports unless they are consoles, in which case they should be configured by ttyconsolemode() instead. Use the new function throughout.
# 1affa3ad	07-Sep-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization of va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.
# ad3b9257	15-Aug-2004	John-Mark Gurney <jmg@FreeBSD.org>	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowledge of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using its filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to acquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
# 39dfb406	10-Aug-2004	Robert Watson <rwatson@FreeBSD.org>	Modify vnode locking key: the v_pollinfo pointer itself is protected by Giant; the contents are protected by the pollinfo mutex. We rely on Giant to prevent races in assigning the value of v_pollinfo.
# f257b7a5	12-Jul-2004	Alfred Perlstein <alfred@FreeBSD.org>	Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
# 1cbb1e02	03-Jul-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Blocksize for I/O should be a property of the vnode and not found by groping around in the vnode's surroundings when we allocate a block. Assign a blocksize when we create a vnode, and yell a warning (and ignore it) if we got the wrong size. Please email all such warnings to me.
# f3732fd1	17-Jun-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.
# 89c9c53d	16-Jun-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
# f99619a0	04-Jun-2004	Tim J. Robbins <tjr@FreeBSD.org>	Change the types of vn_rdwr_inchunks()'s len and aresid arguments to size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.
# 82c6e879	06-Apr-2004	Warner Losh <imp@FreeBSD.org>	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
# b21126c6	29-Mar-2004	Peter Wemm <peter@FreeBSD.org>	Clean up the stub fake vnode locking implementations. The main reason this stuff was here (NFS) was fixed by Alfred in November. The only remaining consumer of the stub functions was umapfs, which is horribly horribly broken. It has missed out on about the last 5 years worth of maintenance that was done on nullfs (from which umapfs is derived). It needs major work to bring it up to date with the vnode locking protocol. umapfs really needs to find a caretaker to bring it into the 21st century. Functions GC'ed: vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.
# 651b11ea	11-Mar-2004	Poul-Henning Kamp <phk@FreeBSD.org>	Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.
# 64f4c8e8	05-Jan-2004	Alexander Kabaev <kan@FreeBSD.org>	Properly ifdef support for vfs locking assertions based on DEBUG_VFS_LOCKS. Obtained from: bde
# 89144d1a	05-Jan-2004	Alexander Kabaev <kan@FreeBSD.org>	Style fixes: Remove double empty lines. Add tab after #define's. Properly terminate sentences in comments. Obtained from: bde (mostly).
# 9efe7d9d	28-Dec-2003	Bruce Evans <bde@FreeBSD.org>	v_vxproc was a bogus name for a thread (pointer).
# eca8a663	11-Nov-2003	Robert Watson <rwatson@FreeBSD.org>	Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) due to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
# ca430f2e	04-Nov-2003	Alexander Kabaev <kan@FreeBSD.org>	Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
# 06cb76bd	23-Oct-2003	Garrett Wollman <wollman@FreeBSD.org>	Add appropriate const poisoning to the assert_*locked() family so that I can call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic. Inspired by: the evil and rude OpenAFS cache manager code
# c31e9345	04-Oct-2003	Jeff Roberson <jeff@FreeBSD.org>	- Document more of the vnode locking strategy.
# 7c89f162	27-Jul-2003	Poul-Henning Kamp <phk@FreeBSD.org>	Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
# b6f5d1d6	09-Jul-2003	Jeffrey Hsu <hsu@FreeBSD.org>	Replace custom field offset macro with the system __offsetof() macro. Reviewed by: bde
# 17a13919	31-May-2003	Poul-Henning Kamp <phk@FreeBSD.org>	The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.
# adf4e1d5	26-Apr-2003	Alan Cox <alc@FreeBSD.org>	Remove an unused declaration.
# 9c926592	09-Apr-2003	Mike Barcroft <mike@FreeBSD.org>	Add prototypes for change_root() and change_dir().
# 3a7053cb	24-Feb-2003	Kirk McKusick <mckusick@FreeBSD.org>	Prevent large files from monopolizing the system buffers. Keep track of the number of dirty buffers held by a vnode. When a bdwrite is done on a buffer, check the existing number of dirty buffers associated with its vnode. If the number rises above vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one of the other (hopefully older) dirty buffers associated with the vnode is written (using bawrite). In the event that this approach fails to curb the growth in the vnode's number of dirty buffers (due to soft updates rollback dependencies), the more drastic approach of doing a VOP_FSYNC on the vnode is used. This code primarily affects very large and actively written files such as snapshots. This change should eliminate hanging when taking snapshots or doing background fsck on very large filesystems. Hopefully, one day it will be possible to cache filesystem metadata in the VM cache as is done with file data. As it stands, only the buffer cache can be used which limits total metadata storage to about 20Mb no matter how much memory is available on the system. This rather small memory gets badly thrashed causing a lot of extra I/O. For example, taking a snapshot of a 1Tb filesystem minimally requires about 35,000 write operations, but because of the cache thrashing (we only have about 350 buffers at our disposal) ends up doing about 237,540 I/O's thus taking twenty-five minutes instead of four if it could run entirely in the cache. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.
# 767b9a52	09-Feb-2003	Jeff Roberson <jeff@FreeBSD.org>	- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h
# 6a1b2a22	29-Dec-2002	Ian Dowse <iedowse@FreeBSD.org>	Add a new vnode flag VI_DOINGINACT to indicate that a VOP_INACTIVE call is in progress on the vnode. When vput() or vrele() sees a 1->0 reference count transition, it now returns without any further action if this flag is set. This flag is necessary to avoid recursion into VOP_INACTIVE if the filesystem inactive routine causes the reference count to increase and then drop back to zero. It is also used to guarantee that an unlocked vnode will not be recycled while blocked in VOP_INACTIVE(). There are at least two cases where the recursion can occur: one is that the softupdates code called by ufs_inactive() via ffs_truncate() can call vput() on the vnode. This has been reported by many people as "lockmgr: draining against myself" panics. The other case is that nfs_inactive() can call vget() and then vrele() on the vnode to clean up a sillyrename file. Reviewed by: mckusick (an older version of the patch)
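The recursion guard from 6a1b2a22 can be illustrated with a small user-space model. Everything here is an assumption made for the sketch: `struct toy_vnode`, `TOY_DOINGINACT`, `toy_vrele()`, and `toy_inactive()` are hypothetical names, and locking is omitted entirely.

```c
#include <assert.h>

#define TOY_DOINGINACT	0x1	/* hypothetical flag value */

struct toy_vnode {
	int usecount;		/* reference count */
	int iflag;		/* flag word holding TOY_DOINGINACT */
	int inactive_calls;	/* how often the inactive routine ran */
};

static void toy_vrele(struct toy_vnode *vp);

/*
 * Inactive routine that takes and drops a reference on the same vnode,
 * similar in shape to the nfs_inactive() case in the commit message.
 */
static void
toy_inactive(struct toy_vnode *vp)
{
	vp->inactive_calls++;
	vp->usecount++;		/* like vget() */
	toy_vrele(vp);		/* second 1->0 transition */
}

static void
toy_vrele(struct toy_vnode *vp)
{
	assert(vp->usecount > 0);
	if (--vp->usecount > 0)
		return;
	/* 1->0 transition: do nothing if inactive is already running. */
	if (vp->iflag & TOY_DOINGINACT)
		return;
	vp->iflag |= TOY_DOINGINACT;
	toy_inactive(vp);
	vp->iflag &= ~TOY_DOINGINACT;
}
```

Without the flag check, the nested `toy_vrele()` would call `toy_inactive()` again and recurse without bound; with it, the inactive routine runs exactly once per final release.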
# 45587e25	28-Dec-2002	Matthew Dillon <dillon@FreeBSD.org>	Abstract-out the constants for the sequential heuristic. No operational changes. MFC after: 1 day
# c7047e52	27-Oct-2002	Garrett Wollman <wollman@FreeBSD.org>	Change the way support for asynchronous I/O is indicated to applications to conform to 1003.1-2001. Make it possible for applications to actually tell whether or not asynchronous I/O is supported. Since FreeBSD's aio implementation works on all descriptor types, don't call down into file or vnode ops when [f]pathconf() is asked about _PC_ASYNC_IO; this avoids the need for every file and vnode op to know about it.
# 9ab73fd1	24-Oct-2002	Kirk McKusick <mckusick@FreeBSD.org>	Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.
# 7d2eb23f	12-Oct-2002	Jeff Roberson <jeff@FreeBSD.org>	- Remove the do { } while(0) from the VOP lock assert macros. This was not optimized away by the compiler in time for it to still leave the VOP functions as inlines. Submitted by: bde
# 5c5f6cfa	28-Sep-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Don't use unnamed anonymous structs: give it a name.
# ca916247	27-Sep-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Rename struct specinfo to the more appropriate struct cdev. Agreed on: jake, rwatson, jhb
# 6423c943	25-Sep-2002	Jeff Roberson <jeff@FreeBSD.org>	- Move ASSERT_VOP_LOCK functionality into functions in vfs_subr.c - Make the VI asserts more orthogonal to the rest of the asserts by using a new, common vfs_badlock() function and adding a 'str' arg. - Adjust generated ASSERTS to match the new prototype. - Adjust explicit ASSERTS to match the new prototype.
# 6cb8bf20	24-Sep-2002	Jeff Roberson <jeff@FreeBSD.org>	- Lock down the syncer with sync_mtx. - Enable vfs_badlock_mutex by default. - Assert that the vp is locked in VOP_UNLOCK. - Use standard interlock macros in remaining code. - Correct a race in getnewvnode(). - Lock access to v_numoutput with interlock. - Lock access to buf lists and splay tree with interlock. - Add VOP and VI asserts. - Lock b_vnbufs with the vnode interlock. - Add vrefcnt() for callers who want to retrieve the vnode ref without holding a lock. Add a comment that describes when this is safe. - Add vholdl() and vdropl() so that callers who already own the interlock can avoid race conditions and unnecessary unlocking. - Move the VOP_GETATTR() in vflush() into the WRITECLOSE conditional case. - Hold the interlock before dropping the mntlist_mtx in vflush() to avoid a race. - Fix locking in vfs_msync().
# 9a0a3322	24-Sep-2002	Jeff Roberson <jeff@FreeBSD.org>	- Finish the struct vnode lock annotation. - Order fields by what lock is required to access them.
# cc8e7533	23-Sep-2002	Jeff Roberson <jeff@FreeBSD.org>	- Include sys/ktr.h so that vnode_if.h can define trace points.
# 06be2aaa	14-Sep-2002	Nate Lawson <njl@FreeBSD.org>	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)
# bb77b793	10-Sep-2002	Bruce Evans <bde@FreeBSD.org>	Fixed namespace pollution in uma changes: - use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite. - don't include <sys/uma.h>. Namespace pollution makes "opaque" types like uma_zone_t perfectly non-opaque. Such types should never be used (see style(9)). "Fixed" subsequently grown dependencies of this header on its own pollution by polluting explicitly: - include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution 2 layers deep in <sys/uma.h>.
# 8f19eb88	01-Sep-2002	Ian Dowse <iedowse@FreeBSD.org>	Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropriate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
# 71ea4ba5	21-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED - Use the new VI asserts in place of the old mtx_assert checks. - Add the VI asserts to the automated lock checking in the VOP calls. The interlock should not be held across vops with a few exceptions. - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held when LK_INTERLOCK is set.
# ea6027a8	15-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential. Trickle this change down into fo_stat/poll() implementations: - badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): modify arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintain current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics. - fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here. 
Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 9ca43589	15-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes are largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged - pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. 
Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
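The credential selection that 9ca43589 describes for vn_rdwr() amounts to a small rule. The sketch below is a user-space model only: `struct toy_cred`, `TOY_NOCRED`, and `toy_vop_cred()` are hypothetical names, and `TOY_NOCRED` is assumed to be a null pointer for the purposes of the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical credential type; the kernel's struct ucred is far richer. */
struct toy_cred {
	int uid;
};

/* Assumed "no credential" marker, modelled as a null pointer. */
#define TOY_NOCRED	((const struct toy_cred *)NULL)

/*
 * Pick the credential used to authorize the read/write VOP: the file
 * credential when the caller supplied one, otherwise the active one.
 * MAC checks would always use active_cred, which must be present.
 */
static const struct toy_cred *
toy_vop_cred(const struct toy_cred *active_cred,
    const struct toy_cred *file_cred)
{
	assert(active_cred != NULL);
	return (file_cred != TOY_NOCRED ? file_cred : active_cred);
}
```

A caller working on behalf of a `struct file` would pass both credentials; a caller with no file context would pass `TOY_NOCRED` and get the active credential back.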
# 01abbb42	13-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Move to a nested include of _label.h instead of mac.h in sys/sys/*.h (Most of the places where mac.h was recursively included from another kernel header file. net/netinet to follow.) Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs Suggested by: bde
# 62c0c263	11-Aug-2002	Robert Watson <rwatson@FreeBSD.org>	Introduce IO_NOMACCHECK, a flag that will be passed to vn_rdwr() to indicate that the calling code has already performed necessary MAC checks (if any) for this operation. This flag will help resolve layering problems that exist because vn_rdwr() is called both on behalf of user processes directly (such as in system calls of various sorts, during core dumps, etc), as well as deep in the file system code on behalf of the file system (such as in UFS, ext2fs, etc). Code that is acting on behalf of a kernel service rather than explicitly on behalf of a user process will specify this flag. By default, MAC checks will be performed (and generally should be performed). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 780bb93d	07-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Adjust locking markup to match the proc markup. - Add a comment about the current, unfinished, state of vnode locking. Suggested by: bde
# 973a3f4b	05-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Document more of the struct vnode locking protocol. - Slightly reformat a comment block.
# e6e370a7	04-Aug-2002	Jeff Roberson <jeff@FreeBSD.org>	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
# 4eee8de7	30-Jul-2002	Dag-Erling Smørgrav <des@FreeBSD.org>	Introduce struct xvnode, which will be used instead of struct vnode for sysctl purposes. Also add two fields to struct vnode, v_cachedfs and v_cachedid, which hold the vnode's device and file id and are filled in by vn_open_cred() and vn_stat(). Sponsored by: DARPA, NAI Labs
# f3cfa607	30-Jul-2002	Robert Watson <rwatson@FreeBSD.org>	Begin committing support for Mandatory Access Control and extensible kernel access control. The MAC framework permits loadable kernel modules to link to the kernel at compile-time, boot-time, or run-time, and augment the system security policy. This commit includes the initial kernel implementation, although the interface with the userland components of the operating system is still under work, and not all kernel subsystems are supported. Later in this commit sequence, documentation of which kernel subsystems will not work correctly with a kernel compiled with MAC support will be added. Label vnodes, permitting security information to be maintained at the granularity of the individual file, directory (et al). This data is protected by the vnode lock and may be read only when holding a shared lock, or modified only when holding an exclusive lock. Label information may be considered either the primary copy, or a cached copy. Individual file systems or kernel services may use the VCACHEDLABEL flag for accounting purposes to determine which it is. New VOPs will be introduced to refresh this label on demand, or to set the label value. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# c72f085d	30-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	- Add vfs_badlock_{print,panic} support to the remaining VOP_ASSERT_* macros.
# 3b82505b	29-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	- Add VBAD to the list of vnodes that are ignored on locking operations.
# 586261a3	27-Jul-2002	Robert Watson <rwatson@FreeBSD.org>	Reserve VCACHEDLABEL vnode flag for use by the TrustedBSD MAC implementation. This flag will indicate that the security label in the vnode is currently valid, and therefore doesn't need to be refreshed before an access control decision can be made. Most file systems (or stdvops) will set this flag after they load the MAC label from disk the first time to prevent redundant disk I/O; some synthetic file systems (procfs, for example) may not. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# ccac0940	21-Jul-2002	Robert Watson <rwatson@FreeBSD.org>	Add VALLPERM, which is a mask of all the access control request permission bits for vnodes passed to vaccess() and friends. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 230d10f6	21-Jul-2002	Robert Watson <rwatson@FreeBSD.org>	Sort vnode access mode flags. Add flags VSTAT, VAPPEND required for TrustedBSD. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# 7aca6291	19-Jul-2002	Kirk McKusick <mckusick@FreeBSD.org>	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, let's call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended attribute storage. Sponsored by: DARPA & NAI Labs.
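The flag layout described in 7aca6291 can be sketched as follows. The bit values and the `TOY_` names are invented for illustration and are not taken from the kernel headers; only the split into a low and a high sixteen-bit half, and the IO_NORMAL default, come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments: IO_ flags low, BA_ flags high. */
#define TOY_IO_SYNC	0x00000001u
#define TOY_IO_NORMAL	0x00000002u
#define TOY_IO_EXT	0x00000004u
#define TOY_BA_CLRBUF	0x00010000u

#define TOY_IO_MASK	0x0000ffffu
#define TOY_BA_MASK	0xffff0000u

/*
 * Because the two flag families occupy disjoint halves of the word,
 * they can share one flags argument without any translation step.
 */
static uint32_t
toy_merge_flags(uint32_t ioflags, uint32_t baflags)
{
	assert((ioflags & ~TOY_IO_MASK) == 0);
	assert((baflags & ~TOY_BA_MASK) == 0);
	return (ioflags | baflags);
}

/* Backward compatibility rule: neither area named means normal data. */
static uint32_t
toy_default_area(uint32_t ioflags)
{
	if ((ioflags & (TOY_IO_NORMAL | TOY_IO_EXT)) == 0)
		ioflags |= TOY_IO_NORMAL;
	return (ioflags);
}
```

A consumer that receives the merged word can test `TOY_IO_SYNC` directly instead of first copying it into a separate BA_ flag.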
# faab4e27	16-Jul-2002	Kirk McKusick <mckusick@FreeBSD.org>	Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.
# d331c5d4	10-Jul-2002	Matthew Dillon <dillon@FreeBSD.org>	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex than other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc
# f00501bd	09-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	- Remove IS_LOCKING_VFS() all of our filesystems support locking now - Add IGNORE_LOCK() that only ignores VCHR files for now since no one locks their underlying device in the leaf filesystems. (devvp) - Add prototypes for vop_lookup_{pre,post} that I forgot before.
# 9725e58e	07-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	- VT_PSEUDOFS and VT_PROCFS support locking now - Remove VBLK from the list of vtypes that are ignored for locking ops.
# 302c7aaa	05-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	- Add vop_strategy_pre to validate VOP_STRATEGY locking. - Disable original vop_strategy lock specification. - Switch to the new vop_strategy_pre for lock validation. VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need to be translated. There may be other reasons, but as long as the underlying layer uses a VOP to perform the operations they will be caught later.
# cc8662b0	05-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	Add "vop_rename_pre" to do pre rename lock verification. This is enabled only with DEBUG_VFS_LOCKS.
# d0641e4e	04-Jul-2002	Jeff Roberson <jeff@FreeBSD.org>	Cleanups for vnode lock debugging. - Tell IS_LOCKING_VFS to ignore block and character devices. specfs vnodes aren't locked for io and they just generate lots of false positives. - Add newlines to the badlock prints.
# 6bd521df	01-Jul-2002	Ian Dowse <iedowse@FreeBSD.org>	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick
# 90769c9e	28-Jun-2002	Jeff Roberson <jeff@FreeBSD.org>	Improve the VOP locking asserts - Add vfs_badlock_print to control whether or not we print lock violations - Add vfs_badlock_panic to control whether we panic on lock violations Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
# 1c85e6a3	21-Jun-2002	Kirk McKusick <mckusick@FreeBSD.org>	This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
# d394511d	16-May-2002	Tom Rhodes <trhodes@FreeBSD.org>	More s/file system/filesystem/g
# 84f9ed84	29-Apr-2002	Robert Watson <rwatson@FreeBSD.org>	Since devfs now uses vnode locks, add devfs back to IS_LOCKING_VFS.
# ae6b9ab0	25-Apr-2002	Robert Watson <rwatson@FreeBSD.org>	Add UDF to the list of filesystems where locking assertions should be evaluated. Approved by: scottl
# dbb17987	25-Apr-2002	Robert Watson <rwatson@FreeBSD.org>	1.43 (dfr 04-Apr-97): /* [dfr] Kludge until I get around to fixing all the vfs locking. */ The new devfs doesn't support VFS locking. So don't do locking assertions for devfs vnodes. With this change, a kernel with options DEBUG_VFS_LOCKS actually gets to single-user mode. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
# df263cbd	14-Apr-2002	Scott Long <scottl@FreeBSD.org>	Add a filesystem driver for the Universal Disk Format. For more info, see http://people.freebsd.org/~scottl/udf MFC after: when asmodai gets the backport done Prodded by: phk asmodai des
# c897b813	19-Mar-2002	Jeff Roberson <jeff@FreeBSD.org>	Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
# 789f12fe	19-Mar-2002	Alfred Perlstein <alfred@FreeBSD.org>	Remove __P
# 8355f576	19-Mar-2002	Jeff Roberson <jeff@FreeBSD.org>	This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
# 68edc1b9	18-Feb-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Make v_addpollinfo() visible and non-inline. Have callers only call it as needed. Add necessary call in ufs_kqfilter(). Test-case found by: Andrew Gallatin <gallatin@cs.duke.edu>
# 4b55dbe3	17-Feb-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Move the stuff related to select and poll out of struct vnode. The use of the zone allocator may or may not be overkill. There is an XXX: over in ufs/ufs/ufs_vnops.c that jlemon may need to revisit. This shaves about 60 bytes of struct vnode which on my laptop means 600k less RAM used for vnodes.
# e8b26e99	17-Feb-2002	Poul-Henning Kamp <phk@FreeBSD.org>	Collect the VN_KNOTE() macro definitions on vnode.h
# 038b6417	17-Feb-2002	Poul-Henning Kamp <phk@FreeBSD.org>	v_lease is unused, zap it.
# 079b7bad	07-Feb-2002	Julian Elischer <julian@FreeBSD.org>	Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there but remove all the code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice
# 23b59018	20-Dec-2001	Matthew Dillon <dillon@FreeBSD.org>	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release
# fdb33f08	18-Dec-2001	Matthew Dillon <dillon@FreeBSD.org>	This is a forward port of Peter's vlrureclaim() fix, with some minor mods by me to make it more efficient. The original code had serious balancing problems and could also deadlock easily. This code relegates the vnode reclamation to its own kproc and relaxes the vnode reclamation requirements to better maintain kern.maxvnodes. This code still doesn't balance as well as it could, but it does a much better job than the original code. Approved by: re@freebsd.org Obtained from: ps, peter, dillon MFS Assuming: Assuming no problems crop up in Yahoo testing MFC after: 7 days
# f03e89de	11-Nov-2001	Alfred Perlstein <alfred@FreeBSD.org>	Turn vn_open() into a wrapper around vn_open_cred() which allows one to perform a vn_open using temporary/other/fake credentials. Modify the nfs client side locking code to use vn_open_cred() passing proc0's ucred instead of the old way which was to temporarily raise privs while running vn_open(). This should close the race hopefully.
# 7e76bb56	05-Nov-2001	Matthew Dillon <dillon@FreeBSD.org>	Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whose strategy routines turn-around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer of buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week
# 4ffa210b	27-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	syncdelay, filedelay, dirdelay, metadelay are ints, not time_t's, and can also be made static.
# 245df27c	25-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week
# c72ccd01	22-Oct-2001	Matthew Dillon <dillon@FreeBSD.org>	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days
# 45fb069a	21-Oct-2001	Dag-Erling Smørgrav <des@FreeBSD.org>	Convert textvp_fullpath() into the more generic vn_fullpath() which takes a struct thread * and a struct vnode * instead of a struct proc *. Temporarily add a textvp_fullpath macro for compatibility.
# b5810bab	30-Sep-2001	Matthew Dillon <dillon@FreeBSD.org>	After extensive testing it has been determined that adding complexity to avoid removing higher level directory vnodes from the namecache has no perceivable effect and will be removed. This is especially true when vmiodirenable is turned on, which it is by default now. ( vmiodirenable makes a huge difference in directory caching ). The vfs.vmiodirenable and vfs.nameileafonly sysctls have been left in to allow further testing, but I expect to rip out vfs.nameileafonly soon too. I have also determined through testing that the real problem with numvnodes getting too large is due to the VM Page cache preventing the vnode from being reclaimed. The directory stuff made only a tiny dent relative to Poul's original code, enough so that some tests succeeded. But tests with several million small files show that the bigger problem is the VM Page cache. This will have to be addressed by a future commit. MFC after: 3 days
# 2af8d76d	13-Sep-2001	David E. O'Brien <obrien@FreeBSD.org>	Re-apply rev 1.178 -- style(9) the structure definitions. I have to wonder how many other changes were lost in the KSE milestone 2 merge.
# b40ce416	12-Sep-2001	Julian Elischer <julian@FreeBSD.org>	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
# 06ae1e91	08-Sep-2001	Matthew Dillon <dillon@FreeBSD.org>	This brings in a Yahoo coredump patch from Paul, with additional mods by me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that if you have large process images core dumping, or you have a large number of forked processes all core dumping at the same time, the original coredump code would leave the vnode locked throughout. This can cause the directory vnode to get locked up, which can cause the parent directory vnode to get locked up, and so on all the way to the root node, locking the entire machine up for extremely long periods of time. This patch solves the problem in two ways. First it uses an advisory non-blocking lock to abort multiple processes trying to core to the same file. Second (my contribution) it chunks up the writes and uses bwillwrite() to avoid holding the vnode locked while blocking in the buffer cache. Submitted by: ps Reviewed by: dillon MFC after: 2 weeks
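The chunking idea behind vn_rdwr_inchunks in 06ae1e91 can be modelled in user space. The sketch assumes a caller-supplied write callback that takes and releases whatever lock it needs for each chunk; `toy_rdwr_inchunks()`, `toy_write_fn`, and `toy_count_write()` are hypothetical names, and the kernel's bwillwrite() step between chunks is only indicated by a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-chunk writer; returns 0 on success or an error code. */
typedef int (*toy_write_fn)(const char *buf, size_t len, void *arg);

/*
 * Split one large transfer into bounded chunks so that any lock held by
 * the writer is dropped between chunks instead of for the whole dump.
 */
static int
toy_rdwr_inchunks(const char *buf, size_t len, size_t chunk,
    toy_write_fn fn, void *arg)
{
	int error = 0;

	assert(chunk > 0);
	while (len > 0 && error == 0) {
		size_t n = len < chunk ? len : chunk;

		/* The kernel version would wait for buffer space here. */
		error = fn(buf, n, arg);
		buf += n;
		len -= n;
	}
	return (error);
}

/* Sample callback: counts how many chunks were issued. */
static int
toy_count_write(const char *buf, size_t len, void *arg)
{
	(void)buf;
	(void)len;
	(*(int *)arg)++;
	return (0);
}
```

Writing 10 bytes with a chunk size of 4 issues three calls (4, 4, and 2 bytes), and the loop stops at the first chunk that reports an error.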
# 0f728902	27-Aug-2001	Peter Wemm <peter@FreeBSD.org>	If a file has been completely unlinked, stop automatically syncing the file. ffs will discard any pending dirty pages when it is closed, so we may as well not waste time trying to clean them. This doesn't stop other things from writing it out, eg: pageout, fsync(2) etc.
# 753d4978	29-May-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Remove MFS
# ac8f990b	24-May-2001	Matthew Dillon <dillon@FreeBSD.org>	This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon
# 0864ef1e	16-May-2001	Ian Dowse <iedowse@FreeBSD.org>	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp
# a62615e5	01-May-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Implement vop_std{get|put}pages() and add them to the default vop[]. Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but the default.
# fb919e4d	01-May-2001	Mark Murray <markm@FreeBSD.org>	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
# b7ebffbc	29-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	Add a vop_stdbmap(), and make it part of the default vop vector. Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
# 1690d305	22-Apr-2001	David E. O'Brien <obrien@FreeBSD.org>	Removed old version of vaccess_acl_posix1e() that snuck back in rev 1.146. Submitted by (with good eye): Niels Chr. Bank-Pedersen <ncbp@bank-pedersen.dk>
# ea88c01d	21-Apr-2001	David E. O'Brien <obrien@FreeBSD.org>	Style(9) fixes: * get rid of space (0x20) before tab (^I) * indent with ^I, not 0x20 * continuation line for prototypes is for 0x20's past function's name col. * etc.
# 759cb263	18-Apr-2001	Seigo Tanimura <tanimura@FreeBSD.org>	Reclaim directory vnodes held in namecache if few free vnodes are available. Only directory vnodes holding no child directory vnodes held in v_cache_src are recycled, so that directory vnodes near the root of the filesystem hierarchy remain in namecache and directory vnodes are not reclaimed in cascade. The period of vnode reclaiming attempt and the number of vnodes attempted to reclaim can be tuned via sysctl(2). Suggested by: tegge Approved by: phk
# f84e29a0	17-Apr-2001	Poul-Henning Kamp <phk@FreeBSD.org>	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized in all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
# b114e127	16-Apr-2001	Robert Watson <rwatson@FreeBSD.org>	In my first reading of POSIX.1e, I misinterpreted handling of the ACL_USER_OBJ and ACL_GROUP_OBJ fields, believing that modification of the access ACL could be used by privileged processes to change file/directory ownership. In fact, this is incorrect; ACL_*_OBJ (+ ACL_MASK and ACL_OTHER) should have undefined ae_id fields; this commit attempts to correct that misunderstanding. o Modify arguments to vaccess_acl_posix1e() to accept the uid and gid associated with the vnode, as those can no longer be extracted from the ACL passed as an argument. Perform all comparisons against the passed arguments. This actually has the effect of simplifying a number of components of this call, as well as reducing the indent level, but now separates handling of ACL_GROUP_OBJ from ACL_GROUP. o Modify acl_posix1e_check() to return EINVAL if the ae_id field of any of the ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} entries is a value other than ACL_UNDEFINED_ID. As a temporary work-around to allow clean upgrades, set the ae_id field to ACL_UNDEFINED_ID before each check so that this cannot cause a failure in the short term (this work-around will be removed when the userland libraries and utilities are updated to take this change into account). o Modify ufs_sync_acl_from_inode() so that it forces ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} ae_id fields to ACL_UNDEFINED_ID when synchronizing the ACL from the inode. o Modify ufs_sync_inode_from_acl to not propagate uid and gid information to the inode from the ACL during ACL update. Also modify the masking of permission bits that may be set from ALLPERMS to (S_IRWXU|S_IRWXG|S_IRWXO), as ACLs currently do not carry non-ACCESSPERMS (S_ISUID, S_ISGID, S_ISTXT). o Modify ufs_getacl() so that when it emulates an access ACL from the inode, it initializes the ae_id fields to ACL_UNDEFINED_ID.
o Clean up ufs_setacl() substantially since it is no longer possible to perform chown/chgrp operations using vop_setacl(), so all the access control for that can be eliminated. o Modify ufs_access() so that it passes owner uid and gid information into vaccess_acl_posix1e(). Pointed out by: jedger Obtained from: TrustedBSD Project
# 0fdabd3a	13-Apr-2001	Boris Popov <bp@FreeBSD.org>	Move VT_SMBFS definition to the proper place. Undefine VI_LOCK/VI_UNLOCK.
# d67fe1bd	28-Mar-2001	Dag-Erling Smørgrav <des@FreeBSD.org>	Prepare for pseudofs.
# 30632071	18-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word. Submitted by: jkh Obtained from: TrustedBSD Project
# 70f36851	14-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests. o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to advoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly. o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. 
Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace. o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front. o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files. o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates. Obtained from: TrustedBSD Project
# 5293465f	06-Mar-2001	Robert Watson <rwatson@FreeBSD.org>	o Introduce filesystem-independent POSIX.1e ACL utility routines to support implementations of ACLs in file systems. Introduce the following new functions: vaccess_acl_posix1e() vaccess() that accepts an ACL acl_posix1e_mode_to_perm() Convert mode bits to ACL rights acl_posix1e_mode_to_entry() Build ACL entry from mode/uid/gid acl_posix1e_perms_to_mode() Generate file mode from ACL acl_posix1e_check() Syntax verification for ACL These functions allow a file system to rely on central ACL evaluation and syntax checking, as well as providing useful utilities to allow ACL-based file systems to generate mode/owner/etc information to return via VOP_GETATTR(), and to support file systems that split their ACL information over their existing inode storage (mode, uid, gid) and extended ACL into extended attributes (additional users, groups, ACL mask). o Add prototypes for exported functions to sys/acl.h, sys/vnode.h Reviewed by: trustedbsd-discuss, freebsd-arch Obtained from: TrustedBSD Project
# dfe53f15	21-Feb-2001	Boris Popov <bp@FreeBSD.org>	Add VI_LOCK(), VI_TRYLOCK() and VI_UNLOCK() macros to isolate implementation details of v_interlock. Reviewed by: jhb, phk, arch@
# 1b367556	23-Jan-2001	Jason Evans <jasone@FreeBSD.org>	Convert all simplelocks to mutexes and remove the simplelock implementations.
# 0a2c3d48	08-Jan-2001	Garrett Wollman <wollman@FreeBSD.org>	select() DKI is now in <sys/selinfo.h>.
# 2b6b0df7	26-Dec-2000	Matthew Dillon <dillon@FreeBSD.org>	This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then successive passes will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, among other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
# 936524aa	18-Nov-2000	Matthew Dillon <dillon@FreeBSD.org>	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. 
The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather than continue to corrupt the filesystem. The problem does not occur very often; it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
# 35e0e5b3	20-Oct-2000	John Baldwin <jhb@FreeBSD.org>	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
# 47460a23	19-Oct-2000	Robert Watson <rwatson@FreeBSD.org>	o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform "administrative" authorization checks. In most cases, the VADMIN test checks to make sure the credential effective uid is the same as the file owner. o Modify vaccess() to set VADMIN as an available right if the uid is appropriate. o Modify references to uid-based access control operations such that they now always invoke VOP_ACCESS() instead of using hard-coded policy checks. o This allows alternative UFS policies to be implemented by replacing only ufs_access() (such as mandatory system policies). o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the vnode: I believe that new invocations of VOP_ACCESS() are always called with the lock held. o Some direct checks of the uid remain, largely associated with the QUOTA and SUIDDIR code. Reviewed by: eivind Obtained from: TrustedBSD Project
# a18b1f1d	03-Oct-2000	Jason Evans <jasone@FreeBSD.org>	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
# 67e87166	25-Sep-2000	Boris Popov <bp@FreeBSD.org>	Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or kept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only interlock lock. vop_stdlock() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick
# 100d2c18	22-Sep-2000	Robert Watson <rwatson@FreeBSD.org>	o Introduce vn_extattr_rm(), a helper function in the style of vn_extattr_get() and vn_extattr_set(). vn_extattr_rm() removes the specified extended attribute from a vnode, authorizing the change as the kernel (NULL cred). Obtained from: TrustedBSD Project
# 210a5403	22-Sep-2000	Eivind Eklund <eivind@FreeBSD.org>	Remove addalias() prototype (staticized in kern/vfs_subr.c)
# 9ff5ce6b	12-Sep-2000	Boris Popov <bp@FreeBSD.org>	Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
# 64dc16df	05-Sep-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Move extern declaration of dead_vnodeop_p to a .h file. Remove race condition in vn_isdisk().
# 012c643d	29-Aug-2000	Robert Watson <rwatson@FreeBSD.org>	o Restructure vaccess() so as to check for DAC permission to modify the object before falling back on privilege. Make vaccess() accept an additional optional argument, privused, to determine whether privilege was required for vaccess() to return 0. Add commented out capability checks for reference. Rename some variables to make it more clear which modes/uids/etc are associated with the object, and which with the access mode. o Update file system use of vaccess() to pass NULL as the optional privused argument. Once additional patches are applied, suser() will no longer set ASU, so privused will permit passing of privilege information up the stack to the caller. Reviewed by: bde, green, phk, -security, others Obtained from: TrustedBSD Project
# e39c53ed	20-Aug-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Centralize the canonical vop_access user/group/other check in vaccess(). Discussed with: bde
# 39f70682	18-Aug-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce vop_stdinactive() and make it the default if no vop_inactive is declared. Sort and prune a few vop_op[].
# e6a9ab52	08-Aug-2000	Robert Watson <rwatson@FreeBSD.org>	o Introduce vn_extattr_{get,set}, wrapper routines for VOP_GETEXTATTR and VOP_SETEXTATTR to simplify calling from in-kernel consumers, such as capability code. Both accept a vnode (optionally locked, with ioflg to indicate that), attribute name, and a buffer + buffer length in UIO_SYSSPACE. Both authorize the call as a kernel request, with cred set to NULL for the actual VOP_ calls. Obtained from: TrustedBSD Project
# 9b971133	23-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. 
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
# f2a2857b	11-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness are not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
# c904bbbd	03-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Simplify and rationalise the management of the vnode free list (preparing the code to add snapshots).
# e6796b67	03-Jul-2000	Kirk McKusick <mckusick@FreeBSD.org>	Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired. Obtained from: BSD/OS
# e3975643	25-May-2000	Jake Burkholder <jake@FreeBSD.org>	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
# 740a1973	23-May-2000	Jake Burkholder <jake@FreeBSD.org>	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
# b7db1901	26-Apr-2000	Brian Feldman <green@FreeBSD.org>	Move procfs_fullpath() to vfs_cache.c, with a rename to textvp_fullpath(). There's no excuse to have code in synthetic filestores that allows direct references to the textvp anymore. Feature requested by: msmith Feature agreed to by: warner Move requested by: phk Move agreed to by: bde
# 8a2852b1	21-Apr-2000	Brian Feldman <green@FreeBSD.org>	Move the declaration of "struct namecache" to vnode.h, as it can be useful elsewhere. Note, of course, that in an ideal world nothing should need to see our VFS implementation :-/
# e4649cfa	01-Apr-2000	Matthew Dillon <dillon@FreeBSD.org>	Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truly sequential writes. This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks. Reviewed-by: mckusick
# b7a5f3ca	02-Feb-2000	Robert Watson <rwatson@FreeBSD.org>	Remove static qualifier from vgonel, as it is needed by the Arla folk outside of vfs_subr.c. Submitted by: Assar Westerlund <assar@sics.se> Reviewed by: rwatson Approved by: jkh
# 6cca21b1	15-Jan-2000	Boris Popov <bp@FreeBSD.org>	Add VT_NWFS tag.
# ba4ad1fc	09-Jan-2000	Poul-Henning Kamp <phk@FreeBSD.org>	Give vn_isdisk() a second argument where it can return a suitable errno. Suggested by: bde
# 664a31e4	28-Dec-1999	Peter Wemm <peter@FreeBSD.org>	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSDs, which made this change quite some time ago. More commits to come.
# 91f37dcb	18-Dec-1999	Robert Watson <rwatson@FreeBSD.org>	Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind
# 1e12157c	12-Dec-1999	Alfred Perlstein <alfred@FreeBSD.org>	explain that ioflags can be used to give read-ahead hints to the underlying filesystem.
# 6bdfe06a	11-Dec-1999	Eivind Eklund <eivind@FreeBSD.org>	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter
# 0026ddba	09-Dec-1999	Semen Ustimenko <semenu@FreeBSD.org>	Added VT_HPFS vnode type.
# aa4f4b69	04-Oct-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Move the buffered read/write code out of spec_{read|write} and into two new functions spec_buf{read|write}. Add sysctl vfs.bdev_buffered which defaults to 1 == true. This sysctl can be used to experimentally turn buffered behaviour for bdevs off. It should not be changed while any block devices are open. Remove the misplaced sysctl vfs.enable_userblk_io. No other changes in behaviour.
# 1b5464ef	29-Sep-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde
# 40360b1b	20-Sep-1999	Matthew Dillon <dillon@FreeBSD.org>	Final commit to remove vnode->v_lastr. vm_fault now handles read clustering issues (replacing code that used to be in ufs/ufs/ufs_readwrite.c). vm_fault also now uses the new VM page counter inlines. This completes the changeover from vnode->v_lastr to vm_entry_t->v_lastr for VM, and fp->f_nextread and fp->f_seqcount (which have been in the tree for a while). Determination of the I/O strategy (sequential, random, and so forth) is now handled on a descriptor-by-descriptor basis for base I/O calls, and on a memory-region-by-memory-region and process-by-process basis for VM faults. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
# bb01f28e	17-Sep-1999	Matthew Dillon <dillon@FreeBSD.org>	Add vfs.enable_userblk_io sysctl to control whether user reads and writes to buffered block devices are allowed. The default is to be backwards compatible, i.e. reads and writes are allowed. The idea is for a larger crowd to start running with this disabled and see what problems, if any, crop up, and then to change the default to off and see if any problems crop up in the next 6 months prior to potentially removing support entirely. There are still a few people, Julian and myself included, who believe the buffered block device access from usermode to be useful. Remove use of vnode->v_lastr from buffered block device I/O in preparation for removal of vnode->v_lastr field, replacing it with the already existing seqcount metric to detect sequential operation. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
# c3aac50f	27-Aug-1999	Peter Wemm <peter@FreeBSD.org>	$Id$ -> $FreeBSD$
# dbafb366	26-Aug-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a preexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;
# 41d2e3e0	24-Aug-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.
# 0ff7b13a	24-Aug-1999	Julian Elischer <julian@FreeBSD.org>	Make DEVFS use PHK's specinfo struct as the source of dev_t and devsw. In lookup() however it's the other way around as we need to supply the dev_t for the vnode, so devfs still has a copy of it stashed away. Sourcing it from the vnode in the vnops however is useful as it makes a lot of the code almost the same as that in specfs.
# a2801b77	21-Aug-1999	John Polstra <jdp@FreeBSD.org>	Support full-precision file timestamps. Until now, only the seconds have been maintained, and that is still the default. A new sysctl variable "vfs.timestamp_precision" can be used to enable higher levels of precision: 0 = seconds only; nanoseconds zeroed (default). 1 = seconds and nanoseconds, accurate within 1/HZ. 2 = seconds and nanoseconds, truncated to microseconds. >=3 = seconds and nanoseconds, maximum precision. Level 1 uses getnanotime(), which is fast but can be wrong by up to 1/HZ. Level 2 uses microtime(). It might be desirable for consistency with utimes() and friends, which take timeval structures rather than timespecs. Level 3 uses nanotime() for the highest precision. I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with "cpio -pdu". There was almost negligible difference in the system times -- much less than 1%, and less than the variation among multiple runs at the same level. Bruce Evans dreamed up a torture test involving 1-byte reads with intervening fstat() calls, but the cpio test seems more realistic to me. This feature is currently implemented only for the UFS (FFS and MFS) filesystems. But I think it should be easy to support it in the others as well. An earlier version of this was reviewed by Bruce. He's not to blame for any breakage I've introduced since then. Reviewed by: bde (an earlier version of the code)
# 4d4f9323	13-Aug-1999	Poul-Henning Kamp <phk@FreeBSD.org>	s/v_specinfo/v_rdev/
# 0ef1c826	08-Aug-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
# 67452993	26-Jul-1999	Alan Cox <alc@FreeBSD.org>	Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon
# 698bfad7	20-Jul-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Now a dev_t is a pointer to struct specinfo which is shared by all specdev vnodes referencing this device. Details: cdevsw->d_parms has been removed, the specinfo is available now (== dev_t) and the driver should modify it directly when applicable, and the only driver doing so, does so: vn.c. I am not sure the logic in checking for "<" was right before, and it looks even less so now. An initial pool of 50 struct specinfo are depleted during early boot, after that malloc had better work. It is likely that fewer than 50 would do. Hashing is done from udev_t to dev_t with a prime number remainder hash, experiments show no better hash available for decent cost (MD5 is only marginally better) The prime number used should not be close to a power of two, we use 83 for now. Add new checkalias2() to get around the loss of info from dev2udev() in bdevvp(). The aliased vnodes are hung on a list straight off the dev_t, and speclisth[SPECSZ] is unused. The sharing of struct specinfo means that the v_specnext moves into the vnode which grows by 4 bytes. Don't use a VBLK dev_t which doesn't make sense in MFS, now we hang a dummy cdevsw on B/Cmaj 253 so that things look sane. Storage overhead from all of this is O(50k). Bump __FreeBSD_version to 400009 The next step will add the stuff needed so device-drivers can start to hang things from struct specinfo
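The "prime number remainder hash" mentioned in 698bfad7 can be sketched as follows. The names `SPEC_HASH_PRIME`, `udev_num_t` and `spec_hash()` are hypothetical and stand in for whatever the kernel actually used; only the prime 83 comes from the log. The sketch assumes the device number is a plain unsigned integer.

```c
#include <stdint.h>

/* Prime quoted in the log; the macro name is an assumption. */
#define SPEC_HASH_PRIME	83

/* Hypothetical stand-in for the kernel's udev_t (a packed major/minor number). */
typedef uint32_t udev_num_t;

/* Map a device number to a bucket index in [0, SPEC_HASH_PRIME). */
static unsigned int
spec_hash(udev_num_t udev)
{
	return (udev % SPEC_HASH_PRIME);
}
```

A plausible reading of the "not close to a power of two" remark is that a remainder by a value near a power of two behaves much like masking off low bits, so device numbers that differ mainly in their high bits would tend to share buckets.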
# 6ca54864	18-Jul-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Introduce the vn_todev(struct vnode*) function, which returns the dev_t corresponding to a VBLK or VCHR node, or NODEV.
# b008f92e	28-Jun-1999	Poul-Henning Kamp <phk@FreeBSD.org>	make va_fsid be of type udev_t
# e4ab40bc	15-Jun-1999	Kirk McKusick <mckusick@FreeBSD.org>	Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.
# bfbb9ce6	11-May-1999	Poul-Henning Kamp <phk@FreeBSD.org>	Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev(). For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to an actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few housekeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I believe this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
# e9189611	17-Apr-1999	Peter Wemm <peter@FreeBSD.org>	Well folks, this is it - The second stage of the removal for build support for LKM's..
# cd0fb97e	19-Feb-1999	Matthew Dillon <dillon@FreeBSD.org>	Make worklist add function a static, remove from sys/vnode.h
# 52e93a35	02-Feb-1999	Semen Ustimenko <semenu@FreeBSD.org>	Added vnode tag for NTFS. Reviewed by: David O'Brien <obrien@NUXI.com>
# 5a24726b	28-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	Clarify the SYSINIT problem by breaking SYSINIT's up into a void * version and a const void * version. Currently the const void * version simply calls the void * version ( i.e. no 'fix' is in place ). A solution needs to be found for the C_SYSINIT ( etc...) family of macros that allows const void * without generating a warning, but does not allow non-const void *.
# 8aef1712	27-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# d254af07	27-Jan-1999	Matthew Dillon <dillon@FreeBSD.org>	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
# 15a1057c	20-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	Add 'options DEBUG_LOCKS', which stores extra information in struct lock, and add some macros and function parameters to make sure that the information gets to the point where it can be put in the lock structure. While I'm here, add DEBUG_VFS_LOCKS to LINT.
# fb116777	05-Jan-1999	Eivind Eklund <eivind@FreeBSD.org>	Remove the 'waslocked' parameter to vfs_object_create().
# 4e61198e	10-Nov-1998	Peter Wemm <peter@FreeBSD.org>	Make the vnode opv vector construction fully dynamic. Previously we leaked memory on each unload and were limited to items referenced in the kernel copy of vnode_if.c. Now a kernel module is free to create its own VOP_FOO() routines and the rest of the system will happily deal with it, including passthrough layers like union/umap/etc. Have VFS_SET() call a common vfs_modevent() handler rather than inline duplicating the common code all over the place. Have VNODEOP_SET() have the vnodeops removed at unload time (assuming a module) so that the vop_t vector is reclaimed. Slightly adjust the vop_t vectors so that calling slot 0 is a panic rather than a page fault. This could happen if VOP_something() was called without any handlers being present anywhere (including in vfs_default.c). slot 1 becomes the default vector for the vnodeop table. TODO: reclaim zones on unload (eg: nfs code)
# 630ff663	31-Oct-1998	Peter Wemm <peter@FreeBSD.org>	Convert the vnode clean/dirty attached buffer lists from LISTs to TAILQs. Add a new flags field (we get this for free because of struct packing) for cleaner management of tailq membership. We had two spare b_flags slots, but they are a precious resource and may be needed for other things that are related to other b_flags bits. The two new flags are convenient to use in a separate location. Reviewed (in principle) by: dg Obtained from: John Dyson's old work-in-progress
# 20f02ef5	29-Oct-1998	Peter Wemm <peter@FreeBSD.org>	Remove the V_SAVEMETA flag, nothing uses it any more now that msdosfs and ext2fs call vtruncbuf() directly. This simplifies and cleans up vinvalbuf() a little.
# aa855a59	15-Oct-1998	Peter Wemm <peter@FreeBSD.org>	gulp. Jordan specifically OK'ed this.. This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a separate commit as samples. This all still works as a static a.out kernel using LKM's.
# 9afcea2f	11-Sep-1998	Robert V. Baron <rvb@FreeBSD.org>	All the references to cfs, in symbols, structs, and strings have been changed to coda. (Same for CFS.)
# a58915fc	09-Sep-1998	Tor Egge <tegge@FreeBSD.org>	Don't keep the underlying directory locked while performing the file system specific VFS_MOUNT operation. PR: 1067
# 4c162420	26-Aug-1998	Jordan K. Hubbard <jkh@FreeBSD.org>	Add VT_CFS type. Submitted by: Robert Baron <rvb@sicily.odyssey.cs.cmu.edu>
# 1f562172	10-May-1998	John Dyson <dyson@FreeBSD.org>	Fix the futimes/undelete/utrace conflict with other BSD's. Note that the only common usage of utrace (the possible problem with this commit) is with malloc, so this should not be a real problem. Add the various NetBSD syscalls that allow full emulation of their development environment.
# 08637435	28-Mar-1998	Bruce Evans <bde@FreeBSD.org>	Moved some #includes from <sys/param.h> nearer to where they are actually used.
# bef608bd	15-Mar-1998	John Dyson <dyson@FreeBSD.org>	Some VM improvements, including elimination of a lot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. 
The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliability and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.
# b1897c19	08-Mar-1998	Julian Elischer <julian@FreeBSD.org>	Reviewed by: dyson@freebsd.org (John Dyson), dg@root.com (David Greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: Whistle development tree
# 8f9110f6	07-Mar-1998	John Dyson <dyson@FreeBSD.org>	This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifested themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistency, and as a side-effect broke the vfs.ioopt code. This code might have been committed separately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitous buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups associated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistency problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify its status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and its descendants to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
# 50ce7ff4	23-Jan-1998	John Dyson <dyson@FreeBSD.org>	Add better support for larger I/O clusters, including larger physical I/O. The support is not mature yet, and some of the underlying implementation needs help. However, support does exist for IDE devices now.
# 47221757	17-Jan-1998	John Dyson <dyson@FreeBSD.org>	Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtle bugs aren't fixed yet.
# 925a3a41	11-Jan-1998	John Dyson <dyson@FreeBSD.org>	Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock. PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.
# 95e5e988	05-Jan-1998	John Dyson <dyson@FreeBSD.org>	Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. Vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon as reasonably possible.
# 82dc3896	29-Dec-1997	John Dyson <dyson@FreeBSD.org>	Add the vnode interlock back around vget.
# 60f8d464	28-Dec-1997	John Dyson <dyson@FreeBSD.org>	Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem with the object cache removal.
# 2be70f79	28-Dec-1997	John Dyson <dyson@FreeBSD.org>	Lots of improvements, including restructuring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.
# 1cbbd625	14-Dec-1997	Garrett Wollman <wollman@FreeBSD.org>	Add support for poll(2) on files. vop_nopoll() now returns POLLNVAL if one of the new poll types is requested; hopefully this will not break any existing code. (This is done so that programs have a dependable way of determining whether a filesystem supports the extended poll types or not.) The new poll types added are: POLLWRITE - file contents may have been modified POLLNLINK - file was linked, unlinked, or renamed POLLATTRIB - file's attributes may have been changed POLLEXTEND - file was extended Note that the internal operation of poll() means that it is impossible for two processes to reliably poll for the same event (this could be fixed but may not be worth it), so it is not possible to rewrite `tail -f' to use poll at this time.
# 1cd52ec3	05-Dec-1997	Bruce Evans <bde@FreeBSD.org>	Don't include <sys/lock.h> in headers when only `struct simplelock' is required. Fixed everything that depended on the pollution.
# cb451ebd	22-Nov-1997	Bruce Evans <bde@FreeBSD.org>	Staticized.
# f6fdfec4	18-Nov-1997	Bruce Evans <bde@FreeBSD.org>	Don't #include <machine/smp.h> even in the SMP case. Fixed the one place that depended on it. The "bazillion warnings" mentioned in the log for rev.1.45 apparently aren't a problem any more. It is hard to be sure because the SIMPLELOCK_DEBUG option turns off (and breaks) things in the SMP case. Don't forward declare structs that are already implicitly forward declared. Fixed a disordered declaration.
# dba3870c	26-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	VFS interior redecoration. Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.
# 1b09ae77	26-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Simplify the lease_check stuff.
# d54d34b5	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Make a set of VOP standard lock, unlock & islocked VOP operators, which depend on the lock being located at vp->v_data. Saves 3x3 identical vop procs, more as the other filesystems become lock aware.
# 987f5696	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Another VFS cleanup "kilo commit" 1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} interface function, and now lives in the ufsmount structure. 2. Remove VOP_SEEK, it was unused. 3. Add more default vops: VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp And remove identical functionality from filesystems 4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?) 5. Try to make system wide VOP functions have vop_* names. 6. Initialize the um_* vectors in LFS. (Recompile your LKMS!!!)
# cec0f20c	16-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There is still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
# a1c995b6	12-Oct-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Last major round (Unless Bruce thinks of something :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
# 99448ed1	20-Sep-1997	John Dyson <dyson@FreeBSD.org>	Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the usage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
# 3a74593f	13-Sep-1997	Peter Wemm <peter@FreeBSD.org>	Update interfaces for poll()
# a051452a	31-Aug-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Change the 0xdeadb hack to a flag called VDOOMED. Introduce VFREE which indicates that vnode is on freelist. Rename vholdrele() to vdrop(). Create vfree() and vbusy() to add/delete vnode from freelist. Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0) vnodes off the freelist. Generalize vhold()/v_holdcnt to mean "do not recycle". Fix reassignbuf()s lack of use of vhold(). Use vhold() instead of checking v_cache_src list. Remove vtouch(), the vnodes are always vget'ed soon enough after for it to have any measurable effect. Add sysctl debug.freevnodes to keep track of things. Move cache_purge() up in getnewvnodes to avoid race. Decrement v_usecount after VOP_INACTIVE(), put a vhold() on it during VOP_INACTIVE() Unmacroize vhold()/vdrop() Print out VDOOMED and VFREE flags (XXX: should use %b) Reviewed by: dyson
# 0fa2443f	26-Aug-1997	Poul-Henning Kamp <phk@FreeBSD.org>	Uncut&paste cache_lookup(). This unifies several copies of what were, in theory, 50 identical lines of code. The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss. It's still the task of the individual filesystems to populate the namecache with cache_enter(). Filesystems that do not use the namecache will just provide the vop_lookup method as usual.
# 7cbfd031	17-Aug-1997	Steve Passe <fsmp@FreeBSD.org>	Added includes of smp.h for SMP. This eliminates a bazillion warnings about implicit s_lock & friends.
# b15a966e	04-May-1997	Poul-Henning Kamp <phk@FreeBSD.org>	1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping. 2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries. 3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates. 4. Never reuse namecache entries, malloc new ones when we need it, free old ones when they die. No longer a hard limit on how many we can have. 5. Remove the upper limit on namelength of namecache entries. 6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!) 7. Assign v_id correctly in the face of 32bit rollover. 8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either. 9. Use the vnode freelist as a true LRU list, also for namecache accesses. 10. Reuse vnodes more aggressively but also more selectively, if we can't reuse, malloc a new one. There is no longer a hard limit on their number, they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if it still has pages in core or if it is the source of namecache entries (Yes, this does indeed work :-) "." and ".." are not namecache entries any longer...) 11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more. 12. 
NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now. Bugs: The namecache statistics no longer include the hits for ".." and ".". Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be selftuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone. Future work: Straighten out the namecache statistics. "desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems. I have still to find a way to safely free unused vnodes back so their number can shrink when not needed. There are a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time. Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.
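Item 6 of b15a966e caps negative namecache entries at a fraction of the whole cache. A minimal sketch of such a check follows, assuming the factor is the denominator of that fraction (16 for the quoted default of 1/16th). The function `neg_cache_over_limit()` and the counter parameters are hypothetical names, and the variable `ncnegfactor` is only named after the sysctl; none of this is taken from the kernel source.

```c
#include <stdbool.h>

/* Default of 16 corresponds to the 1/16th fraction quoted in the log. */
static unsigned long ncnegfactor = 16;

/*
 * Hypothetical check: returns true when the number of negative
 * entries exceeds 1/ncnegfactor of all namecache entries, i.e. when
 * a negative entry would have to be evicted to stay under the cap.
 */
static bool
neg_cache_over_limit(unsigned long numneg, unsigned long numcache)
{
	/* Multiply instead of dividing so small caches are not rounded to zero. */
	return (numneg * ncnegfactor > numcache);
}
```

With the default factor, a cache of 16 entries would tolerate one negative entry and report the limit as exceeded at two.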
# 754c6d37	04-Apr-1997	Doug Rabson <dfr@FreeBSD.org>	Add some debugging macros for tracing VFS locking bugs. Declare (hopefully short-lived) vop_sharedlock.
# 6875d254	22-Feb-1997	Peter Wemm <peter@FreeBSD.org>	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
# 670718e2	12-Feb-1997	Mike Pritchard <mpp@FreeBSD.org>	Remove function prototypes for vfs_mountroot and vgoneall, since they were removed with the Lite2 merge. Submitted by: bde
# 996c772f	09-Feb-1997	John Dyson <dyson@FreeBSD.org>	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
# 5131d64e	16-Jan-1997	Bruce Evans <bde@FreeBSD.org>	Removed option EXTRAVNODES. All versions of FreeBSD-2.x have a sysctl variable `kern.maxvnodes' which gives much better control over vnode allocation than EXTRAVNODES (except in -current between 1995/10/28 and 1996/11/12, kern.maxvnodes was read-only and thus useless).
# 1130b656	14-Jan-1997	Jordan K. Hubbard <jkh@FreeBSD.org>	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
# 8b612c4b	28-Dec-1996	John Dyson <dyson@FreeBSD.org>	This commit is the embodiment of some VFS read clustering improvements. Firstly, now our read-ahead clustering is on a file descriptor basis and not on a per-vnode basis. This will allow multiple processes reading the same file to take advantage of read-ahead clustering. Secondly, there previously was a problem with large reads still using the ramp-up algorithm. Of course, that was bogus, and now we read the entire "chunk" off of the disk in one operation. The read-ahead clustering algorithm should use less CPU than the previous also (I hope :-)). NOTE: THAT LKMS MUST BE REBUILT!!!
# 17a6a9e3	17-Oct-1996	Jordan K. Hubbard <jkh@FreeBSD.org>	Some very small changes to support Netcon's TFS filesystem. These patches were formerly applied by the Netcon installer before rebuilding your kernel.
# 62c3734c	15-Oct-1996	Bruce Evans <bde@FreeBSD.org>	Updated #includes to 4.4lite style.
# 6476c0d2	21-Aug-1996	John Dyson <dyson@FreeBSD.org>	Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistent and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct. This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc. Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility. Reviewed by: davidg
# 114a8cff	30-May-1996	Peter Wemm <peter@FreeBSD.org>	Add an option "EXTRA_VNODES" to cause an extra number of vnode structures to be allocated at boot time. This is an expensive option, as they consume physical ram and are not pageable etc. In certain situations, this kind of option is quite useful, especially for news servers that access a large number of directories at random and torture the name cache. Defining 5000 or 10000 extra vnodes should cut down the amount of vnode recycling somewhat, which should allow better name and directory caching etc. This is a "your mileage may vary" option, with no real indication of what works best for your machine except trial and error. Too many will cost you ram that you could otherwise use for disk buffers etc. This is based on something John Dyson mentioned to me a while ago.
# e206cebf	28-Mar-1996	David Greenman <dg@FreeBSD.org>	Change v_usecount & v_writecount from a short to an int. As shorts they can and will overflow on large machines - especially on machines with filesystems with lots of files (like netnews servers), and the result is a "free vnode isn't" panic or worse. This fixes one of the causes of these panics that I've been experiencing on wcarchive.
# 02e2c406	11-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [new sys/syscallargs.h file, to be "cvs rm"ed]
# d9c68230	03-Mar-1996	Peter Wemm <peter@FreeBSD.org>	Add missing prototype for newly public vn_vmio_open function, next to vn_vmio_close.
# 6c5e9bbd	30-Jan-1996	Mike Pritchard <mpp@FreeBSD.org>	Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
# bd7e5f99	18-Jan-1996	John Dyson <dyson@FreeBSD.org>	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do a lot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30 seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
# 45f486a5	25-Dec-1995	Bruce Evans <bde@FreeBSD.org>	Removed redundant (incompletely staticized) declarations.
# 27a0b398	17-Dec-1995	Poul-Henning Kamp <phk@FreeBSD.org>	Staticize. Unstaticize a function in scsi/scsi_base that was used, with an undocumented option. My last count on the LINT kernel shows: Total symbols: 3647 unref symbols: 463 undef symbols: 4 1 ref symbols: 1751 2 ref symbols: 485 Approaching the pain threshold now.
# 68a3b3cd	15-Dec-1995	Bruce Evans <bde@FreeBSD.org>	Completed a function declaration. Restored order to prototype list. Restored tabs to #defines.
# a316d390	10-Dec-1995	John Dyson <dyson@FreeBSD.org>	Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
# f57e6547	09-Nov-1995	Bruce Evans <bde@FreeBSD.org>	Introduced a type `vop_t' for vnode operation functions and used it 1138 times (:-() in casts and a few more times in declarations. This change is null for the i386. The type has to be `typedef int vop_t(void *)' and not `typedef int vop_t()' because `gcc -Wstrict-prototypes' warns about the latter. Since vnode op functions are called with args of different (struct pointer) types, neither of these function types is any use for type checking of the arg, so it would be preferable not to use the complete function type, especially since using the complete type requires adding 1138 casts to avoid compiler warnings and another 40+ casts to reverse the function pointer conversions before calling the functions.
# ef27d1fe	07-Nov-1995	John Dyson <dyson@FreeBSD.org>	Export a symbol that ext2fs wants (insmntque.)
# 39d38f93	06-Jul-1995	David Greenman <dg@FreeBSD.org>	Fixed an object allocation race condition that was causing a "object deallocated too many times" panic when using NFS. Reviewed by: John Dyson
# aa2cabb9	27-Jun-1995	David Greenman <dg@FreeBSD.org>	1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
# 999422d7	19-Apr-1995	Julian Elischer <julian@FreeBSD.org>	Reviewed by: no-one yet, but non-intrusive Submitted by: julian@tfs.com Obtained from: written from scratch slight changes to make space for devfs.. (also conditional test code in i386/isa/fd.c) =================================================================== RCS file: /home/ncvs/src/sys/sys/malloc.h,v retrieving revision 1.7 diff -r1.7 malloc.h 113a114,117 > #define M_DEVFSMNT 62 /* DEVFS mount structure */ > #define M_DEVFSBACK 63 /* DEVFS Back node */ > #define M_DEVFSFRONT 64 /* DEVFS Front node */ > #define M_DEVFSNODE 65 /* DEVFS node */ 184c188,192 < NULL, NULL, NULL, NULL, NULL, \ --- > "DEVFS mount", /* 62 M_DEVFSMNT */ \ > "DEVFS back", /* 63 M_DEVFSBACK */ \ > "DEVFS front", /* 64 M_DEVFSFRONT */ \ > "DEVFS node", /* 65 M_DEVFSNODE */ \ > NULL, \ Index: sys/mount.h =================================================================== RCS file: /home/ncvs/src/sys/sys/mount.h,v retrieving revision 1.16 diff -r1.16 mount.h 100c100,101 < #define MOUNT_MAXTYPE 15 --- > #define MOUNT_DEVFS 16 /* existing device Filesystem */ > #define MOUNT_MAXTYPE 16 118a120 > "devfs", /* 15 MOUNT_DEVFS */ \ Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.19 diff -r1.19 vnode.h 61c61 < VT_UNION, VT_MSDOSFS --- > VT_UNION, VT_MSDOSFS, VT_DEVFS
# f6b04d2b	09-Apr-1995	David Greenman <dg@FreeBSD.org>	Changes from John Dyson and myself: Fixed remaining known bugs in the buffer IO and VM system. vfs_bio.c: Fixed some race conditions and locking bugs. Improved performance by removing some (now) unnecessary code and fixing some broken logic. Fixed process accounting of # of FS outputs. Properly handle NFS interrupts (B_EINTR). (various) Replaced calls to clrbuf() with calls to an optimized routine called vfs_bio_clrbuf(). (various FS sync) Sync out modified vnode_pager backed pages. ffs_vnops.c: Do two passes: Sync out file data first, then indirect blocks. vm_fault.c: Fixed deadly embrace caused by acquiring locks in the wrong order. vnode_pager.c: Changed to use buffer I/O system for writing out modified pages. This should fix the problem with the modification date previous not getting updated. Also dramatically simplifies the code. Note that this is going to change in the future and be implemented via VOP_PUTPAGES(). vm_object.c: Fixed a pile of bugs related to cleaning (vnode) objects. The performance of vm_object_page_clean() is terrible when dealing with huge objects, but this will change when we implement a binary tree to keep the object pages sorted. vm_pageout.c: Fixed broken clustering of pageouts. Fixed race conditions and other lockup style bugs in the scanning of pages. Improved performance.
# cb9db2b6	28-Mar-1995	David Greenman <dg@FreeBSD.org>	When NFS is compiled into the kernel, make NQNFS lease checking conditional on a "NQNFS" kernel config option. NQNFS is a 4.4 wart and the performance penalty of the lease checks on the client/server for _local_ I/O is too high to have this occur all the time - especially when most people will never use it.
# b5e8ce9f	16-Mar-1995	Bruce Evans <bde@FreeBSD.org>	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
# 79f7a9e1	07-Mar-1995	David Greenman <dg@FreeBSD.org>	Added a new flag "VAGE" to indicate that the vnode should go on the head of the free list.
# ff2e6a8b	05-Jan-1995	Justin T. Gibbs <gibbs@FreeBSD.org>	Add VNINACT flag. LFS has a habit of skipping the ufs_inactive procedure. It used to do this by setting a global <Yuck>. Now we set the VNINACT flag in the vnode to force a skip of ufs_inactive. Sorry for missing this file in my last commit folks. Index: vnode.h =================================================================== RCS file: /usr/cvs/src/sys/sys/vnode.h,v retrieving revision 1.14 diff -c -r1.14 vnode.h * 1.14 1994/11/14 13:51:53 --- vnode.h 1994/12/03 01:06:27 *********** * 116,121 **** --- 116,122 ---- #define VALIASED 0x0800 /* vnode has an alias */ #define VDIROP 0x1000 /* LFS: vnode is involved in a directory op */ #define VVMIO 0x2000 /* VMIO flag */ + #define VNINACT 0x4000 /* LFS: skip ufs_inactive() in lfs_vunref */ /* * Vnode attributes. A field value of VNOVAL represents a field whose value
# 80d08a82	14-Nov-1994	Bruce Evans <bde@FreeBSD.org>	Add prototype for vfinddev().
# 091b0456	20-Oct-1994	Garrett Wollman <wollman@FreeBSD.org>	Make my ALLDEVS kernel compile (basically, LINT minus a lot of options). This involves fixing a few things I broke last time.
# b4a8d575	08-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	Added prototypes here and there. Moved pfctlinput into socket.h.
# 8e58bf68	05-Oct-1994	David Greenman <dg@FreeBSD.org>	Stuff object into v_vmdata rather than pager. Not important which at the moment, but will be in the future. Other changes mostly cosmetic, but are made for future VMIO considerations. Submitted by: John Dyson
# f86eaaca	02-Oct-1994	Poul-Henning Kamp <phk@FreeBSD.org>	Prototypes, prototypes and even more prototypes. Not quite done yet, but getting closer all the time.
# bb56ec4a	25-Sep-1994	Poul-Henning Kamp <phk@FreeBSD.org>	While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if your kernel breaks.
# e21fa31a	22-Sep-1994	Garrett Wollman <wollman@FreeBSD.org>	Make NFS loadable.
# c901836c	20-Sep-1994	Garrett Wollman <wollman@FreeBSD.org>	Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
# 27a0bc89	19-Sep-1994	Doug Rabson <dfr@FreeBSD.org>	Added msdosfs. Obtained from: NetBSD
# d8f10c11	15-Sep-1994	Bruce Evans <bde@FreeBSD.org>	Add some prototypes.
# 1cdeb653	29-Aug-1994	David Greenman <dg@FreeBSD.org>	"bogus" fixes from 1.1.5 to work around some cache coherency problems.
# af9da405	20-Aug-1994	Paul Richards <paul@FreeBSD.org>	Made them all idempotent.
# 3c4dd356	02-Aug-1994	David Greenman <dg@FreeBSD.org>	Added $Id$
# df8bae1d	24-May-1994	Rodney W. Grimes <rgrimes@FreeBSD.org>	BSD 4.4 Lite Kernel Sources