#
56a8aca8 |
|
18-May-2024 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Stop treating size 0 as unknown size in vnode_create_vobject(). Whenever file is created, the vnode_create_vobject() function will try to determine its size by calling vn_getsize_locked() as size 0 is ambigious: it means either the file size is 0 or the file size is unknown. Introduce special value for the size argument: VNODE_NO_SIZE. Only when it is given, the vnode_create_vobject() will try to obtain file's size on its own. Introduce dedicated vnode_disk_create_vobject() for use by g_vfs_open(), so we don't have to call vn_isdisk() in the common case (for regular files). Handle the case of mediasize==0 in g_vfs_open(). Reviewed by: alc, kib, markj, olce Approved by: oshogbo (mentor), allanjude (mentor) Differential Revision: https://reviews.freebsd.org/D45244
|
#
f04220c1 |
|
19-Jan-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
kcmp(2): implement for vnode files Reviewed by: brooks, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D43518
|
#
29363fb4 |
|
23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
|
#
2ff63af9 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
#
9c3bfe2a |
|
10-Jul-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
Revert "VFS: Remove VV_READLINK flag" and "fdescfs: improve linrdlnk mount option" This reverts commits 4a402dfe0bc44770c9eac6e58a501e4805e29413 and 3bffa2262328e4ff1737516f176107f607e7bc76. The fix will be implemented in somewhat different manner. The semantic adjustment is incompatible with linuxolator expectations. Reported and reviewed by: dchagin Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D40969
|
#
ba8cc6d7 |
|
12-Mar-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: use __enum_uint8 for vtype and vstate This whacks hackery around only reading v_type once. Bump __FreeBSD_version to 1400093
|
#
9def8ea6 |
|
21-Apr-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: list enums on separate lines Requested by: kib
|
#
4a402dfe |
|
21-Jun-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
VFS: Remove VV_READLINK flag since its only reason to exist is removed. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D40700
|
#
2544b8e0 |
|
28-Apr-2023 |
Olivier Certner <olce.freebsd@certner.fr> |
vfs: Rename vfs_emptydir() to vn_dir_check_empty() No functional change. While here, adapt comments to style(9). Reviewed by: kib MFC after: 1 week
|
#
3d8450db |
|
24-Apr-2023 |
Olivier Certner <olce.freebsd@certner.fr> |
vfs: vn_dir_next_dirent(): Simplify interface and harden Simplify the old interface (one less argument, simpler termination test) and add documentation about it. Add more sanity checks (mostly under INVARIANTS, but also in the general case to prevent infinite loops). Drop the explicit test on minimum directory entry size (without INVARIANTS). Deal with the impacts in callers (dirent_exists() and vop_stdvptocnp()). dirent_exists() has been simplified a bit, preserving the exact same semantics but for the return code whose meaning has been reversed (0 now means the entry exists, ENOENT that it doesn't and other values are genuine errors). While here, suppress gratuitous casts of malloc return values. vn_dir_next_dirent() has been tested by a 'make -j4 buildkernel' with a temporary modification to the VFS cache causing vn_vptocnp() to always call VOP_VPTOCNP() and finally vop_stdvptocnp() (observed with temporary debug counters). Export new _GENERIC_MINDIRSIZ and _GENERIC_MAXDIRSIZ on __BSD_VISIBLE, and GENERIC_MINDIRSIZ and GENERIC_MAXDIRSIZ on _KERNEL. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39764
|
#
6bce3f23 |
|
23-Apr-2023 |
Olivier Certner <olce.freebsd@certner.fr> |
vfs: Export get_next_dirent() as vn_dir_next_dirent() Move internal-to-'vfs_default.c' get_next_dirent() to 'vfs_vnops.c' and export it for use by other parts of the VFS. This is a preparatory change for using it in vfs_emptydir(). No functional change. Reviewed by: kib MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D39755
|
#
7b6fe242 |
|
08-Apr-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
DEBUG_VFS_LOCKS: use witness if available The assert_vop_locked messages are ignored, and file/line information is not too useful. Fixing this without changing both witness and VFS asserts KPIs is not possible. Reviewed by: markj (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D39464
|
#
bb24eaea |
|
05-Apr-2023 |
Konstantin Belousov <kib@FreeBSD.org> |
vn_lock_pair(): allow to request shared locking If either of vnodes is shared locked, lock must not be recursed. Requested by: rmacklem Reviewed by: markj, rmacklem Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D39444
|
#
26b96487 |
|
07-Apr-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: more informative panic for missing fplookup ops
|
#
5f6df177 |
|
03-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: validate that vop vectors provide all or none fplookup vops In order to prevent later susprises.
|
#
62a573d9 |
|
16-Mar-2023 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire KERN_VNODE It got disabled in 2003: commit acb18acfec97aa7fe26ff48f80a5c3f89c9b542d Author: Poul-Henning Kamp <phk@FreeBSD.org> Date: Sun Feb 23 18:09:05 2003 +0000 Bracket the kern.vnode sysctl in #ifdef notyet because it results in massive locking issues on diskless systems. It is also not clear that this sysctl is non-dangerous in its requirements for locked down memory on large RAM systems. There does not seem to be practical use for it and the disabled routine does not work anyway. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D39127
|
#
f45feecf |
|
22-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vn_getsize getattr is very expensive and in important cases only gets called to get the size. This can be optimized with a dedicated routine which obtains that statistic. As a step towards that goal make size-only consumers use a dedicated routine. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D37885
|
#
829f0bcb |
|
19-Dec-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add the concept of vnode state transitions To quote from a comment above vput_final: <quote> * XXX Some filesystems pass in an exclusively locked vnode and strongly depend * on the lock being held all the way until VOP_INACTIVE. This in particular * happens with UFS which adds half-constructed vnodes to the hash, where they * can be found by other code. </quote> As is there is no mechanism which allows filesystems to denote that a vnode is fully initialized, consequently problems like the above are only found the hard way(tm). Add rudimentary support for state transitions, which in particular allow to assert the vnode is not legally unlocked until its fate is decided (either construction finishes or vgone is called to abort it). The new field lands in a 1-byte hole, thus it does not grow the struct. Bump __FreeBSD_version to 1400077 Reviewed by: kib (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D37759
|
#
94267fc9 |
|
22-Dec-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: use designated initializers for the typename array While here prefix with v for better consistency with the vnode stuff. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D37759
|
#
78d35459 |
|
02-Dec-2022 |
Doug Rabson <dfr@FreeBSD.org> |
Add vn_path_to_global_path_hardlink This is similar to vn_path_to_global_path but allows for regular files which may not be present in the cache. Reviewed by: mjg, kib Tested by: pho
|
#
080ef8a4 |
|
04-Aug-2022 |
Jason A. Harmening <jah@FreeBSD.org> |
Add VV_CROSSLOCK vnode flag to avoid cross-mount lookup LOR When a lookup operation crosses into a new mountpoint, the mountpoint must first be busied before the root vnode can be locked. When a filesystem is unmounted, the vnode covered by the mountpoint must first be locked, and then the busy count for the mountpoint drained. Ordinarily, these two operations work fine if executed concurrently, but with a stacked filesystem the root vnode may in fact use the same lock as the covered vnode. By design, this will always be the case for unionfs (with either the upper or lower root vnode depending on mount options), and can also be the case for nullfs if the target and mount point are the same (which admittedly is very unlikely in practice). In this case, we have LOR. The lookup path holds the mountpoint busy while waiting on what is effectively the covered vnode lock, while a concurrent unmount holds the covered vnode lock and waits for the mountpoint's busy count to drain. Attempt to resolve this LOR by allowing the stacked filesystem to specify a new flag, VV_CROSSLOCK, on a covered vnode as necessary. Upon observing this flag, the vfs_lookup() will leave the covered vnode lock held while crossing into the mountpoint. Employ this flag for unionfs with the caveat that it can't be used for '-o below' mounts until other unionfs locking issues are resolved. Reported by: pho Tested by: pho Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D35054
|
#
d653aaec |
|
24-Oct-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_assert_no_entries
|
#
1b4b7517 |
|
18-Sep-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
Add vn_rlimit_fsizex() and vn_rlimit_fsizex_res() The vn_rlimit_fsizex() function: - checks that the write does not exceed RLIMIT_FSIZE limit and fs maximum supported file size - truncates write length if it exceeds the RLIMIT_FSIZE or max file size, but there are some bytes to write - sends SIGXFSZ if RLIMIT_FSIZE would be exceed otherwise POSIX mandates the truncated write in case when some bytes can be written but whole write request fails the RLIMIT_FSIZE check. The function is supposed to be used from VOP_WRITE()s. Due to pecularity in the VFS generic write syscall layer, uio_resid must correctly reflect the written amount (noted by markj). Provide the dual vn_rlimit_fsizex_res() function to correct uio_resid after the clamp done in vn_rlimit_fsizex() on VOP_WRITE() return. PR: 164793 Reviewed by: asomers, jah, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36625
|
#
2ac083f6 |
|
18-Sep-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
Add vn_rlimit_trunc() Reviewed by: asomers, jah, markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D36625
|
#
fa3eb3c9 |
|
18-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: indent V_VALID_FLAGS with a tab Requested by: kib
|
#
a75d1ddd |
|
17-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: introduce V_PCATCH to stop abusing PCATCH
|
#
a755fb92 |
|
10-Sep-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire the V_MNTREF flag Reviewed by: kib, mckusick Differential Revision: https://reviews.freebsd.org/D36521
|
#
c6487446 |
|
11-Apr-2022 |
Dmitry Chagin <dchagin@FreeBSD.org> |
getdirentries: return ENOENT for unlinked but still open directory. To be more compatible to IEEE Std 1003.1-2008 (“POSIX.1”). Reviewed by: mjg, Pau Amma (doc) Differential revision: https://reviews.freebsd.org/D34680 MFC after: 2 weeks
|
#
b7262756 |
|
02-Apr-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: fixup WANTIOCTLCAPS on open In some cases vn_open_cred overwrites cn_flags, effectively nullifying initialisation done in NDINIT. This will have to be fixed. In the meantime make sure the flag is passed. Reported by: jenkins Noted by: Mathieu <sigsys@gmail.com>
|
#
66b177e1 |
|
12-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: reduce spurious zeroing in VOP_STAT clang fails to take advantage of the fact that majority of the struct gets written to in the routine and decides to bzero the entire thing. Explicitly zero padding and spare fields, relying on KMSAN to catch problems should anything pop up later which also needs explicit zeroing. fstat on tmpfs (ops/s): before: 8216636 after: 8508033
|
#
381dd12c |
|
12-Mar-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: stop evaluating the argument multpile times in stat macros
|
#
66c5fbca |
|
27-Jan-2022 |
Konstantin Belousov <kib@FreeBSD.org> |
insmntque1(): remove useless arguments Also remove once-used functions to clean up after failed insmntque1(), which were destructor callbacks in previous life. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D34071
|
#
2a7e4cf8 |
|
27-Jan-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
Revert b58ca5df0bb7 ("vfs: remove the now unused insmntque1") I was somehow convinced that insmntque calls insmntque1 with a NULL destructor. Unfortunately this worked well enough to not immediately blow up in simple testing. Keep not using the destructor in previously patched filesystems though as it avoids unnecessary casts. Noted by: kib Reported by: pho
|
#
b58ca5df |
|
26-Jan-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the now unused insmntque1 Bump __FreeBSD_version to 1400052.
|
#
4dd23ae1 |
|
10-Dec-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire MNTK_NOKNOTE and VV_NOKNOTE MNTK_NOKNOTE was introduced in 679985d03a64f5dfb4355538ae6e3b70f8347f38 (dated 2005), VV_NOKNOTE in 34cc826ae8999f454dd6cb9c77d17ce83b169f92 few months later. Neither was ever used by anything in the tree.
|
#
4dcdf398 |
|
17-May-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: replace the MNTK_TEXT_REFS flag with VIRF_TEXT_REF This allows to stop maintaing the VI_TEXT_REF flag and consequently opens up fully lockless v_writecount adjustment. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D33127
|
#
3ffcfa59 |
|
26-Nov-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vop_stdadd_writecount_nomsync This avoids needing to inspect the mount point every time. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D33125
|
#
6eefabd4 |
|
22-Nov-2021 |
Brooks Davis <brooks@FreeBSD.org> |
syscalls: improve nstat, nfstat, nlstat Optionally return errors when truncating dev_t, ino_t, and nlink_t. In the interest of code reuse, use freebsd11_cvtstat() to perform the truncation and error handling and then convert the resulting struct freebsd11_stat to struct nstat. Add missing freebsd32 compat syscalls. These syscalls require translation because struct nstat contains four instances of struct timespec which in turn contains a time_t and a long. Reviewed by: kib
|
#
d28af1ab |
|
15-Nov-2021 |
Mark Johnston <markj@FreeBSD.org> |
vm: Add a mode to vm_object_page_remove() which skips invalid pages This will be used to break a deadlock in ZFS between the per-mountpoint teardown lock and page busy locks. In particular, when purging data from the page cache during dataset rollback, we want to avoid blocking on the busy state of invalid pages since the busying thread may be blocked on the teardown lock in zfs_getpages(). Add a helper, vn_pages_remove_valid(), for use by filesystems. Bump __FreeBSD_version so that the OpenZFS port can make use of the new helper. PR: 258208 Reviewed by: avg, kib, sef Tested by: pho (part of a larger patch) MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32931
|
#
47b248ac |
|
03-Nov-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Make locking assertions for VOP_FSYNC() and VOP_FDATASYNC() more correct For devfs vnodes, it is fine to not lock vnodes for VOP_FSYNC(). Otherwise vnode must be locked exclusively, except for MNT_SHARED_WRITES() where the shared lock is enough. Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32761
|
#
9a0bee9f |
|
22-Oct-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Make vn_fullpath_hardlink() externally callable Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D32611
|
#
2b68eb8e |
|
01-Oct-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove thread argument from VOP_STAT and fo_stat.
|
#
c5128c48 |
|
07-Sep-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
VOP_COPY_FILE_RANGE: Add a COPY_FILE_RANGE_TIMEO1SEC flag Although it is not specified in the RFCs, the concept that the NFSv4 server should reply to an RPC request within a reasonable time is accepted practice within the NFSv4 community. Without this patch, the NFSv4.2 server attempts to reply to a Copy operation within 1second by limiting the copy to vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at best, given the large variation in I/O subsystem performance. This patch adds a kernel only flag COPY_FILE_RANGE_TIMEO1SEC that the NFSv4.2 can specify, which tells VOP_COPY_FILE_RANGE() to return after approximately 1 second with a partial result and implements this in vn_generic_copy_file_range(), used by vop_stdcopyfilerange(). Modifying the NFSv4.2 server to set this flag will be done in a separate patch. Also under consideration is exposing the COPY_FILE_RANGE_TIMEO1SEC to userland for use on the FreeBSD copy_file_range(2) syscall. MFC after: 2 weeks Reviewed by: khng Differential Revision: https://reviews.freebsd.org/D31829
|
#
da779f26 |
|
27-Aug-2021 |
Rick Macklem <rmacklem@FreeBSD.org> |
vfs_default: Change vop_stddeallocate() from static to global A future commit to the NFS client uses vop_stddeallocate() for cases where the NFS server does not support a Deallocate operation. Change vop_stddeallocate() from static to global so that it can be called by the NFS client. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D31640
|
#
0dc332bf |
|
05-Aug-2021 |
Ka Ho Ng <khng@FreeBSD.org> |
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9). fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use. The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file. fspacectl(2) comprises of cmd and flags parameters to specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0. fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided. Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
|
#
abbb57d5 |
|
04-Aug-2021 |
Ka Ho Ng <khng@FreeBSD.org> |
vfs: Introduce vn_bmap_seekhole_locked() vn_bmap_seekhole_locked() is factored out version of vn_bmap_seekhole(). This variant requires shared vnode lock being held around the call. Sponsored by: The FreeBSD Foundation Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D31404
|
#
0ef5eee9 |
|
03-Aug-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Add vn_lktype_write() and remove repetetive code that calculates vnode locking type for write. Reviewed by: khng, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D31405
|
#
844aa31c |
|
08-Jul-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_enter_time_flags
|
#
3cf75ca2 |
|
28-May-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire unused vn_seqc_write_begin_unheld*
|
#
cf74b2be |
|
22-May-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire the now unused vnlru_free routine
|
#
72b3b5a9 |
|
08-Apr-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: replace vfs_smr_quiesce with vfs_smr_synchronize This ends up using a smr specific method. Suggested by: markj Tested by: pho
|
#
3f56bc79 |
|
30-Mar-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vfs_smr_quiesce This can be used to observe all CPUs not executing while within vfs_smr_enter.
|
#
e9272225 |
|
17-Mar-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: fix vnlru marker handling for filtered/unfiltered cases The global list has a marker with an invariant that free vnodes are placed somewhere past that. A caller which performs filtering (like ZFS) can move said marker all the way to the end, across free vnodes which don't match. Then a caller which does not perform filtering will fail to find them. This makes vn_alloc_hard sleep for 1 second instead of reclaiming, resulting in significant stalls. Fix the problem by requiring an explicit marker by callers which do filtering. As a temporary measure extend vnlru_free to restart if it fails to reclaim anything. Big thanks go to the reporter for testing several iterations of the patch. Reported by: Yamagi <lists yamagi.org> Tested by: Yamagi <lists yamagi.org> Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D29324
|
#
2443068d |
|
21-Feb-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: shrink struct vnode to 448 bytes on LP64 ... by moving v_hash into a 4 byte hole. Combined with several previous size reductions this makes the size small enough to fit 9 vnodes per page as opposed to 8. Add a compilation time assert so that this is not unknowingly worsened. Note the structure still remains bigger than it should be.
|
#
2bfd8992 |
|
14-Feb-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
vnode: move write cluster support data to inodes. The data is only needed by filesystems that 1. use buffer cache 2. utilize clustering write support. Requested by: mjg Reviewed by: asomers (previous version), fsu (ext2 parts), mckusick Tested by: pho Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D28679
|
#
fa3bd463 |
|
29-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
lockf: ensure atomicity of lockf for open(O_CREAT|O_EXCL|O_EXLOCK) or EX_SHLOCK. Do it by setting a vnode iflag indicating that the locking exclusive open is in progress, and not allowing F_LOCK request to make a progress until the first open finishes. Requested by: mckusick Reviewed by: markj, mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28697
|
#
b59a8e63 |
|
30-Jan-2021 |
Konstantin Belousov <kib@FreeBSD.org> |
Stop ignoring ERELOOKUP from VOP_INACTIVE() When possible, relock the vnode and retry inactivation. Only vunref() is required not to drop the vnode lock, so handle it specially by not retrying. This is a part of the efforts to ensure that unlinked not referenced vnode does not prevent inode from reusing. Reviewed by: chs, mckusick Tested by: pho MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
|
#
8d2a230e |
|
25-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: use atomic_load_consume_ptr in vn_load_v_data_smr
|
#
739ecbcf |
|
23-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add symlink support to lockless lookup Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D27488
|
#
a5e28403 |
|
07-Jan-2021 |
Thomas Munro <tmunro@FreeBSD.org> |
open(2): Add O_DSYNC flag. POSIX O_DSYNC means that writes include an implicit fdatasync(2), just as O_SYNC implies fsync(2). VOP_WRITE() functions that understand the new IO_DATASYNC flag can act accordingly, but we'll still pass down IO_SYNC so that file systems that don't understand it will continue to provide the stronger O_SYNC behaviour. Flag also applies to fcntl(2). Reviewed by: kib, delphij Differential Revision: https://reviews.freebsd.org/D25090
|
#
c6d3272b |
|
05-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vn_seqc_read_notmodify
|
#
33f3e81d |
|
01-Jan-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: combine fast path enabled status into one flag Tested by: pho
|
#
82397d79 |
|
31-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: denote vnode being a mount point with VIRF_MOUNTPOINT Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27794
|
#
3e506a67 |
|
27-Dec-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add v_irflag accessors Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D27793
|
#
3b1f974b |
|
26-Nov-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Make max ticks for pause in vn_lock_pair() adjustable at runtime. Reduce default value from hz / 10 to hz / 100. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
7cde2ec4 |
|
13-Nov-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement vn_lock_pair(). In collaboration with: pho Reviewed by: mckusick (previous version), markj (previous version) Tested by: markj (syzkaller), pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26136
|
#
4bfebc8d |
|
30-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_vop_mkdir and rename cache_rename to cache_vop_rename
|
#
c7520caa |
|
22-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: prevent avoidable evictions on mkdir of existing directories mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST. The NOCACHE lookup will make the namecache unnecessarily evict the existing entry, and then fallback to the fs lookup routine eventually leading namei to return an error as the directory is already there. For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers fallbacks to the slowpath for concurrently executing lookups. Tested by: pho Discussed with: kib
|
#
8ecd87a3 |
|
20-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop spurious cred argument from VOP_VPTOCNP
|
#
214eccf4 |
|
14-Oct-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add VOP_EAGAIN Can be used to stub fplookup for example.
|
#
a3d9bf49 |
|
23-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the force flag from purgevfs The optional scan is wasteful, thus it is removed altogether from unmount. Callers which always want it anyway remain unaffected.
|
#
3c484f32 |
|
15-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Convert page cache read to VOP. There are several negative side-effects of not calling into VOP layer at all for page cache reads. The biggest is the missed activation of EVFILT_READ knotes. Also, it allows filesystem to make more fine grained decision to refuse read from page cache. Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for asserts. Reviewed by: markj Tested by: pho Discussed with: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346
|
#
88863665 |
|
15-Sep-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
vfs_subr.c: export io_hold_cnt and vn_read_from_obj(). Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D26346
|
#
b1a824b6 |
|
02-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire vholdl as a symbol Similarly to vrefl in r364283.
|
#
f6e54eb3 |
|
01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
sys: clean up empty lines in .c and .h files
|
#
feabaaf9 |
|
24-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: drop the always curthread argument from reverse lookup routines Note VOP_VPTOCNP keeps getting it as temporary compatibility for zfs. Tested by: pho
|
#
39f88150 |
|
20-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_rename, a dedicated helper to use for renames While here make both tmpfs and ufs use it. No fuctional changes.
|
#
7ad2a82d |
|
18-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop the error parameter from vn_isdisk, introduce vn_isdisk_error Most consumers pass NULL.
|
#
fbca789f |
|
16-Aug-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
VMIO read If possible, i.e. if the requested range is resident valid in the vm object queue, and some secondary conditions hold, copy data for read(2) directly from the valid cached pages, avoiding vnode lock and instantiating buffers. I intentionally do not start read-ahead, nor handle the advises on the cached range. Filesystems indicate support for VMIO reads by setting VIRF_PGREAD flag, which must not be cleared until vnode reclamation. Currently only filesystems that use vnode pager for v_objects can enable it, due to reliance on vnp_size. There is a WIP to handle it for tmpfs. Reviewed by: markj Discussed with: jeff Tested by: pho Benchmarked by: mjg Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D25968
|
#
60414088 |
|
16-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: retire vrefl as a symbol vrefl calls vref and there is only one in-tree consumer. Keep it as a macro for assertion purposes.
|
#
a92a971b |
|
16-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the thread argument from vget It was already asserted to be curthread. Semantic patch: @@ expression arg1, arg2, arg3; @@ - vget(arg1, arg2, arg3) + vget(arg1, arg2)
|
#
36f47512 |
|
11-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: inline vrefcnt
|
#
4c2d103a |
|
11-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: garbage collect vrefactn
|
#
3b444436 |
|
11-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
devfs: rework si_usecount to track opens This removes a lot of special casing from the VFS layer. Reviewed by: kib (previous version) Tested by: pho (previous version) Differential Revision: https://reviews.freebsd.org/D25612
|
#
51ea7bea |
|
07-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add VOP_STAT The current scheme of calling VOP_GETATTR adds avoidable overhead. An example with tmpfs doing fstat (ops/s): before: 7488958 after: 7913833 Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D25910
|
#
d292b194 |
|
05-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the obsolete privused argument from vaccess This brings argument count down to 6, which is passable without the stack on amd64.
|
#
db99ec56 |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: support lockless dotdot lookup Tested by: pho
|
#
6e10434c |
|
04-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: add cache_purge_vgone cache_purge locklessly checks whether the vnode at hand has any namecache entries. This can race with a concurrent purge which managed to remove the last entry, but may not be done touching the vnode. Make sure we observe the relevant vnode lock as not taken before proceeding with vgone. Paired with the fact that doomed vnodes cannnot receive entries this restores the invariant that there are no namecache-related writing users past cache_purge in vgone. Reported by: pho
|
#
b145e389 |
|
02-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: shorten v_iflag and v_vflag While here renumber VI_* flags to remove the gaps. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D25921
|
#
838984de |
|
02-Aug-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: move namecache initialisation into cache_vnode_init
|
#
848f8eff |
|
30-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: inline vops if there are no pre/post associated calls This removes a level of indirection from frequently used methods, most notably VOP_LOCK1 and VOP_UNLOCK1. Tested by: pho
|
#
07d2145a |
|
25-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add the infrastructure for lockless lookup Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25577
|
#
0379ff6a |
|
25-Jul-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: introduce vnode sequence counters Modified on each permission change and link/unlink. Reviewed by: kib Tested by: pho (in a patchset) Differential Revision: https://reviews.freebsd.org/D25573
|
#
f8022be3 |
|
30-Jun-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: protect vnodes with smr vget_prep_smr and vhold_smr can be used to ref a vnode while within vfs_smr section, allowing consumers to get away without locking. See vhold_smr and vdropl for comments explaining caveats. Reviewed by: kib Testec by: pho Differential Revision: https://reviews.freebsd.org/D23913
|
#
2782c00c |
|
22-Feb-2020 |
Ryan Libby <rlibby@FreeBSD.org> |
vfs: quiet -Wwrite-strings Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D23797
|
#
6a5abb1e |
|
02-Feb-2020 |
Kyle Evans <kevans@FreeBSD.org> |
Provide O_SEARCH O_SEARCH is defined by POSIX [0] to open a directory for searching, skipping permissions checks on the directory itself after the initial open(). This is close to the semantics we've historically applied for O_EXEC on a directory, which is UB according to POSIX. Conveniently, O_SEARCH on a file is also explicitly undefined behavior according to POSIX, so O_EXEC would be a fine choice. The spec goes on to state that O_SEARCH and O_EXEC need not be distinct values, but they're not defined to be the same value. This was pointed out as an incompatibility with other systems that had made its way into libarchive, which had assumed that O_EXEC was an alias for O_SEARCH. This defines compatibility O_SEARCH/FSEARCH (equivalent to O_EXEC and FEXEC respectively) and expands our UB for O_EXEC on a directory. O_EXEC on a directory is checked in vn_open_vnode already, so for completeness we add a NOEXECCHECK when O_SEARCH has been specified on the top-level fd and do not re-check that when descending in namei. [0] https://pubs.opengroup.org/onlinepubs/9699919799/ Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23247
|
#
6698e11f |
|
02-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the now empty vop_unlock_post
|
#
10a15df6 |
|
02-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the never set VDESC_VPP_WILLRELE flag
|
#
7739d927 |
|
01-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
cache: replace kern___getcwd with vn_getcwd The previous routine was resulting in extra data copies most notably in linux_getcwd.
|
#
45757984 |
|
01-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: consistently use size_t for buflen around VOP_VPTOCNP
|
#
643656cf |
|
31-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: replace VOP_MARKATIME with VOP_MMAPPED The routine is only provided by ufs and is only used on mmap and exec. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23422
|
#
21c4f104 |
|
31-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vrefactn Differential Revision: https://reviews.freebsd.org/D23427
|
#
3cfabd81 |
|
30-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the never set VDESC_NOMAP_VPP flag
|
#
0c236d3d |
|
12-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: per-cpu batched requeuing of free vnodes Constant requeuing adds significant lock contention in certain workloads. Lessen the problem by batching it. Per-cpu areas are locked in order to synchronize against UMA freeing memory. vnode's v_mflag is converted to short to prevent the struct from growing. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 122.38s user 1780.45s system 6242% cpu 30.480 total patched: 144.84s user 985.90s system 4856% cpu 23.282 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22998
|
#
cc3593fb |
|
12-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: rework vnode list management The current notion of an active vnode is eliminated. Vnodes transition between 0<->1 hold counts all the time and the associated traversal between different lists induces significant scalability problems in certain workloads. Introduce a global list containing all allocated vnodes. They get unlinked only when UMA reclaims memory and are only requeued when hold count reaches 0. Sample result from an incremental make -s -j 104 bzImage on tmpfs: stock: 118.55s user 3649.73s system 7479% cpu 50.382 total patched: 122.38s user 1780.45s system 6242% cpu 30.480 total Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22997
|
#
57083d25 |
|
12-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add per-mount vnode lazy list and use it for deferred inactive + msync This obviates the need to scan the entire active list looking for vnodes of interest. msync is handled by adding all vnodes with write count to the lazy list. deferred inactive directly adds vnodes as it sets the VI_DEFINACT flag. Vnodes get dequeued from the list when their hold count reaches 0. Newly added MNT_VNODE_FOREACH_LAZY* macros support filtering so that spurious locking is avoided in the common case. Reviewed by: jeff Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D22995
|
#
b52d50cf |
|
11-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: prealloc vnodes in getnewvnode_reserve Having a reserved vnode count does not guarantee that getnewvnodes wont block later. Said blocking partially defeats the purpose of reserving in the first place. Preallocate instaed. The only consumer was always passing "1" as count and never nesting reservations.
|
#
69283067 |
|
11-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: incomplete pass at converting more ints to u_long Most notably numvnodes and freevnodes were u_long, but parameters used to govern them remained as ints.
|
#
c8b3463d |
|
07-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: reimplement deferred inactive to use a dedicated flag (VI_DEFINACT) The previous behavior of leaving VI_OWEINACT vnodes on the active list without a hold count is eliminated. Hold count is kept and inactive processing gets explicitly deferred by setting the VI_DEFINACT flag. The syncer is then responsible for vdrop. Reviewed by: kib (previous version) Tested by: pho (in a larger patch, previous version) Differential Revision: https://reviews.freebsd.org/D23036
|
#
478368ca |
|
06-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: eliminate v_tag from struct vnode There was only one consumer and it was using it incorrectly. It is given an equivalent hack. Reviewed by: jeff Differential Revision: https://reviews.freebsd.org/D23037
|
#
8dbc6352 |
|
04-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop thread argument from vinactive
|
#
1cde9e38 |
|
04-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: predict VN_IS_DOOMED as false The macro is used everywhere.
|
#
952f5953 |
|
03-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove CTASSERT from VOP_UNLOCK_FLAGS gcc does not like it and it's not worth working around just for that compiler.
|
#
b249ce48 |
|
03-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
|
#
3d59b89c |
|
03-Jan-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add VOP_UNLOCK_FLAGS The flags argument from VOP_UNLOCK is about to be removed and some filesystems unlock the interlock as a convienience with it. Add a helper to retain the behavior for the few cases it is needed.
|
#
c400abe5 |
|
16-Dec-2019 |
Li-Wen Hsu <lwhsu@FreeBSD.org> |
Fix gcc build after r355790 Sponsored by: The FreeBSD Foundation
|
#
6fa079fc |
|
15-Dec-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: flatten vop vectors This eliminates the following loop from all VOP calls: while(vop != NULL && \ vop->vop_spare2 == NULL && vop->vop_bypass == NULL) vop = vop->vop_default; Reviewed by: jeff Tesetd by: pho Differential Revision: https://reviews.freebsd.org/D22738
|
#
ea9a16b2 |
|
12-Dec-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
r355677 requires that vop_stdioctl() be global so it can be called from NFS. r355677 modified the NFS client so that it does lseek(SEEK_DATA/SEEK_HOLE) for NFSv4.2, but calls vop_stdioctl() otherwise. As such, vop_stdioctl() needs to be a global function. Missed during the code merge for r355677.
|
#
c8b29d12 |
|
11-Dec-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: locking primitives which elide ->v_vnlock and shared locking disablement Both of these features are not needed by many consumers and result in avoidable reads which in turn puts them on profiles due to cache-line ping ponging. On top of that the current lockgmr entry point is slower than necessary single-threaded. As an attempted clean up preparing for other changes, provide new routines which don't support any of the aforementioned features. With these patches in place vop_stdlock and vop_stdunlock disappear from flamegraphs during -j 104 buildkernel. Reviewed by: jeff (previous version) Tested by: pho Differential Revision: https://reviews.freebsd.org/D22665
|
#
ff4486e8 |
|
09-Dec-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: refactor vhold and vdrop No fuctional changes.
|
#
abd80ddb |
|
08-Dec-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715
|
#
89c4c2e5 |
|
30-Nov-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: swap placement between v_type and v_tag The former is frequently accessed (e.g., in vfs_cache_lookup) and shares the cacheline with v_usecount, avoidably adding to cache misses during concurrent lookup. The latter is almost unused and probably can get garbage-collected. The struct does not change in size despite enum vs char * discrepancy. On 64-bit archs there used to be 4 bytes padding after v_type giving 480 bytes in total.
|
#
fdc6b10d |
|
29-Nov-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a VN_OPEN_INVFS flag. vn_open_cred() assumes that it is called from the top-level of a VFS syscall. Writers must call bwillwrite() before locking any VFS resource to wait for cleanup of dirty buffers. ZFS getextattr() and setextattr() VOPs do call vn_open_cred(), which results in wait for unrelated buffers while owning ZFS vnode lock (and ZFS does not use buffer cache). VN_OPEN_INVFS allows caller to skip bwillwrite. Note that ZFS is still incorrect there, because it starts write on an mp and locks a vnode while holding another vnode lock. Reported by: Willem Jan Withagen <wjw@digiware.nl> Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
208b81bb |
|
22-Oct-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Add VV_VMSIZEVNLOCK flag. The flag specifies that vm_fault() handler should check the vnode' vm_object size under the vnode lock. It is converted into the object' OBJ_SIZEVNLOCK flag in vnode_pager_alloc(). Tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D21883
|
#
dc20b834 |
|
06-Oct-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add optional root vnode caching Root vnodes looekd up all the time, e.g. when crossing a mount point. Currently used routines always perform a costly lookup which can be trivially avoided. Reviewed by: jeff (previous version), kib Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21646
|
#
ba7a55d9 |
|
22-Sep-2019 |
Sean Eric Fagan <sef@FreeBSD.org> |
Add two options to allow mount to avoid covering up existing mount points. The two options are * nocover/cover: Prevent/allow mounting over an existing root mountpoint. E.g., "mount -t ufs -o nocover /dev/sd1a /usr/local" will fail if /usr/local is already a mountpoint. * emptydir/noemptydir: Prevent/allow mounting on a non-empty directory. E.g., "mount -t ufs -o emptydir /dev/sd1a /usr" will fail. Neither of these options is intended to be a default, for historical and compatibility reasons. Reviewed by: allanjude, kib Differential Revision: https://reviews.freebsd.org/D21458
|
#
e3c3248c |
|
03-Sep-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: implement usecount implying holdcnt vnodes have 2 reference counts - holdcnt to keep the vnode itself from getting freed and usecount to denote it is actively used. Previously all operations bumping usecount would also bump holdcnt, which is not necessary. We can detect if usecount is already > 1 (in which case holdcnt is also > 1) and utilize it to avoid bumping holdcnt on our own. This saves on atomic ops. Reviewed by: kib Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21471
|
#
1e2f0ceb |
|
28-Aug-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add VOP_NEED_INACTIVE vnode usecount drops to 0 all the time (e.g. for directories during path lookup). When that happens the kernel would always lock the exclusive lock for the vnode in order to call vinactive(). This blocks other threads who want to use the vnode for looukp. vinactive is very rarely needed and can be tested for without the vnode lock held. This patch gives filesytems an opportunity to do it, sample total wait time for tmpfs over 500 minutes of poudriere -j 104: before: 557563641706 (lockmgr:tmpfs) after: 46309603301 (lockmgr:tmpfs) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21371
|
#
8795830c |
|
25-Aug-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: swap vop_unlock_post and vop_unlock_pre definitions to the logical order The change is no-op. Sponsored by: The FreeBSD Foundation
|
#
0256405e |
|
24-Aug-2019 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vholdnz (for already held vnodes) Reviewed by: kib (previous version) Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21358
|
#
de4e1aeb |
|
18-Aug-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix an issue with executing tmpfs binary. Suppose that a binary was executed from tmpfs mount, and the text vnode was reclaimed while the binary was still running. It is possible during even the normal operations since tmpfs vnode' vm_object has swap type, and no references on the vnode is held. Also assume that the text vnode was revived for some reason. Then, on the process exit or exec, unmapping of the text mapping tries to remove the text reference from the vnode, but since it went from recycle/instantiation cycle, there is no reference kept, and assertion in VOP_UNSET_TEXT_CHECKED() triggers. Fix this by keeping a use reference on the tmpfs vnode for each exec reference. This prevents the vnode reclamation while executable map entry is active. Do it by adding per-mount flag MNTK_TEXT_REFS that directs vop_stdset_text() to add use ref on first vnode text use, and per-vnode VI_TEXT_REF flag, to record the need on unref in vop_stdunset_text() on last vnode text use going away. Set MNTK_TEXT_REFS for tmpfs mounts. Reported by: bdrewery Tested by: sbruno, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
520482f4 |
|
30-Jul-2019 |
Mark Johnston <markj@FreeBSD.org> |
Use VNASSERT() in checked VOP wrappers. Reviewed by: kib MFC after: 3 days Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21120
|
#
2240d8c4 |
|
27-Jul-2019 |
Alan Somers <asomers@FreeBSD.org> |
Add v_inval_buf_range, like vtruncbuf but for a range of a file v_inval_buf_range invalidates all buffers within a certain LBA range of a file. It will be used by fusefs(5). This commit is a partial merge of r346162, r346606, and r346756 from projects/fuse2. Reviewed by: kib MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D21032
|
#
bbbbeca3 |
|
24-Jul-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add kernel support for a Linux compatible copy_file_range(2) syscall. This patch adds support to the kernel for a Linux compatible copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9). This syscall/VOP can be used by the NFSv4.2 client to implement the Copy operation against an NFSv4.2 server to do file copies locally on the server. The vn_generic_copy_file_range() function in this patch can be used by the NFSv4.2 server to implement the Copy operation. Fuse may also me able to use the VOP_COPY_FILE_RANGE() method. vn_generic_copy_file_range() attempts to maintain holes in the output file in the range to be copied, but may fail to do so if the input and output files are on different file systems with different _PC_MIN_HOLE_SIZE values. Separate commits will be done for the generated syscall files and userland changes. A commit for a compat32 syscall will be done later. Reviewed by: kib, asomers (plus comments by brooks, jilles) Relnotes: yes Differential Revision: https://reviews.freebsd.org/D20584
|
#
555d8f28 |
|
01-Jul-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Factor out the code that does a VOP_SETATTR(size) from vn_truncate(). This patch factors the code in vn_truncate() that does the actual VOP_SETATTR() of size into a separate function called vn_truncate_locked(). This will allow the NFS server and the patch that adds a copy_file_range(2) syscall to call this function instead of duplicating the code and carrying over changes, such as the recent r347151. Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D20808
|
#
e3680954 |
|
27-Jun-2019 |
Rick Macklem <rmacklem@FreeBSD.org> |
Add non-blocking trylock variants for the rangelock functions. A future patch that will add a Linux compatible copy_file_range(2) syscall needs to be able to lock the byte ranges of two files concurrently. To do this without a risk of deadlock, a non-blocking variant of vn_rangelock_rlock() called vn_rangelock_tryrlock() was needed. This patch adds this, along with vn_rangelock_trywlock(), in order to do this. The patch also adds a couple of comments, that I hope clarify how the algorithm used in kern_rangelock.c works. Reviewed by: kib, asomers (previous version) Differential Revision: https://reviews.freebsd.org/D20645
|
#
65417f5e |
|
24-May-2019 |
Alan Somers <asomers@FreeBSD.org> |
Remove "struct ucred*" argument from vtruncbuf vtruncbuf takes a "struct ucred*" argument. AFAICT, it's been unused ever since that function was first added in r34611. Remove it. Also, remove some "struct ucred" arguments from fuse and nfs functions that were only used by vtruncbuf. Reviewed by: cem MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D20377
|
#
78022527 |
|
05-May-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Switch to use shared vnode locks for text files during image activation. kern_execve() locks text vnode exclusive to be able to set and clear VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0 condition. The change removes VV_TEXT, replacing it with the condition v_writecount <= -1, and puts v_writecount under the vnode interlock. Each text reference decrements v_writecount. To clear the text reference when the segment is unmapped, it is recorded in the vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and v_writecount is incremented on the map entry removal The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that v_writecount does not contradict the desired change. vn_writecheck() is now racy and its use was eliminated everywhere except access. Atomic check for writeability and increment of v_writecount is performed by the VOP. vn_truncate() now increments v_writecount around VOP_SETATTR() call, lack of which is arguably a bug on its own. nullfs bypasses v_writecount to the lower vnode always, so nullfs vnode has its own v_writecount correct, and lower vnode gets all references, since object->handle is always lower vnode. On the text vnode' vm object dealloc, the v_writecount value is reset to zero, and deadfs vop_unset_text short-circuit the operation. Reclamation of lowervp always reclaims all nullfs vnodes referencing lowervp first, so no stray references are left. Reviewed by: markj, trasz Tested by: mjg, pho Sponsored by: The FreeBSD Foundation MFC after: 1 month Differential revision: https://reviews.freebsd.org/D19923
|
#
6af6fdce |
|
12-Apr-2019 |
Alan Somers <asomers@FreeBSD.org> |
fusefs: evict invalidated cache contents during write-through fusefs's default cache mode is "writethrough", although it currently works more like "write-around"; writes bypass the cache completely. Since writes bypass the cache, they were leaving stale previously-read data in the cache. This commit invalidates that stale data. It also adds a new global v_inval_buf_range method, like vtruncbuf but for a range of a file. PR: 235774 Reported by: cem Sponsored by: The FreeBSD Foundation
|
#
ae909414 |
|
09-Apr-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Add vn_fsync_buf(). Provide a convenience function to avoid the hack with filling fake struct vop_fsync_args and then calling vop_stdfsync(). Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
8ff7fad1 |
|
23-Oct-2018 |
Konstantin Belousov <kib@FreeBSD.org> |
Only call sigdeferstop() for NFS. Use bypass to catch any NFS VOP dispatch and route it through the wrapper which does sigdeferstop() and then dispatches original VOP. NFS does not need a bypass below it, which is not supported. The vop offset in the vop_vector is added since otherwise it is impossible to get vop_op_t from the internal table, and I did not wanted to create the layered fs only to wrap NFS VOPs. VFS_OP()s wrap is straightforward. Requested and reviewed by: mjg (previous version) Tested by: pho Sponsored by: The FreeBSD Foundation Differential revision: https://reviews.freebsd.org/D17658
|
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
b59ea730 |
|
20-Aug-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Allow vinvalbuf() to operate with the shared vnode lock. This mode allows other clean buffers to arrive while we flush the buf lists for the vnode, which is fine for the targeted use. We only need that all buffers existed at the time of the function start were flushed. In fact, only one assert has to be relaxed. In collaboration with: pho Reviewed by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 2 weeks X-Differential revision: https://reviews.freebsd.org/D12083
|
#
77d3337c |
|
31-Jul-2017 |
Dmitry Chagin <dchagin@FreeBSD.org> |
Implement proper Linux /dev/fd and /proc/self/fd behavior by adding Linux specific things to the native fdescfs file system. Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic links to the actual files, which the process has open. A readlink(2) call on this file returns a full path in case of regular file or a string in a special format (type:[inode], anon_inode:<file-type>, etc..). As well as in a FreeBSD, opening the file in the Linux fdescfs directory is equivalent to duplicating the corresponding file descriptor. Here we have mutually exclusive requirements: - in case of readlink(2) call fdescfs lookup() method should return VLNK vnode otherwise our kern_readlink() fail with EINVAL error; - in the other calls fdescfs lookup() method should return non VLNK vnode. For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed mounted with linrdlnk option an modified kern_readlinkat() to properly handle it. For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option: mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd Reviewed by: kib@ MFC after: 1 week Relnotes: yes
|
#
3df7ebc4 |
|
05-Jun-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Add sysctl vfs.ino64_trunc_error controlling action on truncating inode number or link count for the ABI compat binaries. Right now, and by default after the change, too large 64bit values are silently truncated to 32 bits. Enabling the knob causes the system to return EOVERFLOW for stat(2) family of compat syscalls when some values cannot be completely represented by the old structures. For getdirentries(2), knob skips the dirents which would cause non-trivial truncation of d_ino. EOVERFLOW error is specified by the X/Open 1996 LFS document ('Adding Support for Arbitrary File Sizes to the Single UNIX Specification'). Based on the discussion with: bde Sponsored by: The FreeBSD Foundation
|
#
0c3c207f |
|
02-Jun-2017 |
Gleb Smirnoff <glebius@FreeBSD.org> |
For UNIX sockets make vnode point not to the socket, but to the UNIX PCB, since the latter is the thing that links together VFS and sockets. While here, make the union in the struct vnode anonymous.
|
#
03311f11 |
|
27-May-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Use whole mnt_stat.f_fsid bits for st_dev. Since ino64 expanded dev_t to 64bit, make VOP_GETATTR(9) provide all bits of mnt_stat.f_fsid as va_fsid for vnodes on filesystems which use f_fsid. In particular, NFSv3 and sometimes NFSv4, and ZFS use this method or reporting st_dev by stat(2). Provide a new helper vn_fsid() to avoid duplicating code to copy f_fsid to va_fsid. Note that the change is mostly cosmetic. Its motivation is to avoid sign-extension of f_fsid[0] into 64bit dev_t value which happens after dev_t becomes 64bit.. Reviewed by: avg(zfs), rmacklem (nfs) (both for previous version) Sponsored by: The FreeBSD Foundation
|
#
69921123 |
|
23-May-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Commit the 64-bit inode project. Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify struct dirent layout to add d_off, increase the size of d_fileno to 64-bits, increase the size of d_namlen to 16-bits, and change the required alignment. Increase struct statfs f_mntfromname[] and f_mntonname[] array length MNAMELEN to 1024. ABI breakage is mitigated by providing compatibility using versioned symbols, ingenious use of the existing padding in structures, and by employing other tricks. Unfortunately, not everything can be fixed, especially outside the base system. For instance, third-party APIs which pass struct stat around are broken in backward and forward incompatible ways. Kinfo sysctl MIBs ABI is changed in backward-compatible way, but there is no general mechanism to handle other sysctl MIBS which return structures where the layout has changed. It was considered that the breakage is either in the management interfaces, where we usually allow ABI slip, or is not important. Struct xvnode changed layout, no compat shims are provided. For struct xtty, dev_t tty device member was reduced to uint32_t. It was decided that keeping ABI compat in this case is more useful than reporting 64-bit dev_t, for the sake of pstat. Update note: strictly follow the instructions in UPDATING. Build and install the new kernel with COMPAT_FREEBSD11 option enabled, then reboot, and only then install new world. Credits: The 64-bit inode project, also known as ino64, started life many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick (mckusick) then picked up and updated the patch, and acted as a flag-waver. Feedback, suggestions, and discussions were carried by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles), and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial ports investigation followed by an exp-run by Antoine Brodin (antoine). Essential and all-embracing testing was done by Peter Holm (pho). The heavy lifting of coordinating all these efforts and bringing the project to completion were done by Konstantin Belousov (kib). Sponsored by: The FreeBSD Foundation (emaste, kib) Differential revision: https://reviews.freebsd.org/D10439
|
#
0226f659 |
|
05-Apr-2017 |
Konstantin Belousov <kib@FreeBSD.org> |
Add V_VMIO flag for vinvalbuf(9) to indicate that the flush request was issued during VM-initiated i/o (pageout), so that the function does not try to flush or remove pages or wait for the vm object paging-in-progress counter. Reviewed by: markj Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week X-Differential revision: https://reviews.freebsd.org/D10241
|
#
fbbd9655 |
|
28-Feb-2017 |
Warner Losh <imp@FreeBSD.org> |
Renumber copyright clause 4 Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
|
#
5afb134c |
|
12-Dec-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add vrefact, to be used when the vnode has to be already active This allows blind increment of relevant counters which under contention is cheaper than inc-not-zero loops at least on amd64. Use it in some of the places which are guaranteed to see already active vnodes. Reviewed by: kib (previous version)
|
#
99e6e193 |
|
23-Nov-2016 |
Mark Johnston <markj@FreeBSD.org> |
Release laundered vnode pages to the head of the inactive queue. The swap pager enqueues laundered pages near the head of the inactive queue to avoid another trip through LRU before reclamation. This change adds support for this behaviour to the vnode pager and makes use of it in UFS and ext2fs. Some ioflag handling is consolidated into a common subroutine so that this support can be easily extended to other filesystems which make use of the buffer cache. No changes are needed for ZFS since its putpages routine always undirties the pages before returning, and the laundry thread requeues the pages appropriately in this case. Reviewed by: alc, kib Differential Revision: https://reviews.freebsd.org/D8589
|
#
f71d0856 |
|
07-Oct-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Limit scope of the optimization in r306608 to dounmount() caller only. Other uses of cache_purgevfs() do rely on the cache purge for correct operations, when paths are invalidated without unmount. Reported and tested by: jkim Discussed with: mjg Sponsored by: The FreeBSD Foundation
|
#
5a22c958 |
|
06-Oct-2016 |
Bryan Drewery <bdrewery@FreeBSD.org> |
Add vrecyclel() to vrecycle() a vnode with the interlock already held. Obtained from: OneFS Sponsored by: Dell EMC Isilon MFC after: 2 weeks
|
#
5bb81f9b |
|
30-Sep-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: batch free vnodes in per-mnt lists Previously free vnodes would always by directly returned to the global LRU list. With this change up to mnt_free_list_batch vnodes are collected first. syncer runs always return the batch regardless of its size. While vnodes on per-mnt lists are not counted as free, they can be returned in case of vnode shortage. Reviewed by: kib Tested by: pho
|
#
8660b707 |
|
30-Sep-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: remove the __bo_vnode field from struct vnode The pointer can be obtained using __containerof instead. Reviewed by: kib
|
#
47e61f6c |
|
15-Aug-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement VOP_FDATASYNC() for msdosfs. Standard VOP_FSYNC() implementation just syncs data buffers, and due to this, is the correct and efficient implementation for msdosfs or any other filesystem which uses bufer cache trivially. Provide globally visible wrapper vop_stdfdatasync_buf() for future consumption by other filesystems. Reviewed by: mckusick Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential revision: https://reviews.freebsd.org/D7471
|
#
d293d0c9 |
|
11-Aug-2016 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove unused textvp_fullpath() macro. MFC after: 1 month
|
#
411455a8 |
|
10-Aug-2016 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Replace all remaining calls to vprint(9) with vn_printf(9), and remove the old macro. MFC after: 1 month
|
#
7b255097 |
|
04-Aug-2016 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove unused - never actually implemented - vnode lock types from vnode_if.src. MFC after: 1 month
|
#
e896fb3b |
|
17-Jun-2016 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: ifdef out noop vop_* primitives on !DEBUG_VFS_LOCKS kernels This removes calls to empty functions like vop_lock_{pre/post} from common vfs routines. Approved by: re (gjb)
|
#
f8a75278 |
|
17-Jun-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Add VFS interface to flush specified amount of free vnodes belonging to mount points with the given filesystem type, specified by mount vfs_ops pointer. Based on patch by: mckusick Reviewed by: avg, mckusick Tested by: allanjude, madpilot Sponsored by: The FreeBSD Foundation Approved by: re (gjb)
|
#
3f7ca894 |
|
17-May-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Ensure that ftruncate(2) is performed synchronously when file is opened in O_SYNC mode, at least for UFS. This also handles truncation, done due to the O_SYNC | O_TRUNC flags combination to open(2), in synchronous way. Noted by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
54a33d2f |
|
11-May-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Add vfs_hash_ref(9) function, which finds a vnode by the hash value and returns it referenced. The function is similar to vfs_hash_get(9), but unlike the later, returned vnode is not locked. This operation cannot be requested with the vget(9) flags. Reviewed and tested by: rmacklem Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
cd85d599 |
|
11-May-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Style: wrap long lines. Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
c89e1b87 |
|
03-May-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Add EVFILT_VNODE open, read and close notifications. While there, order EVFILT_VNODE notes descriptions alphabetically. Based on submission, and tested by: Vladimir Kondratyev <wulf@cicgroup.ru> MFC after: 2 weeks
|
#
399e8c17 |
|
09-Mar-2016 |
John Baldwin <jhb@FreeBSD.org> |
Simplify AIO initialization now that it is standard. - Mark AIO system calls as STD and remove the helpers to dynamically register them. - Use COMPAT6 for the old system calls with the older sigevent instead of an 'o' prefix. - Simplify the POSIX configuration to note that AIO is always available. - Handle AIO in the default VOP_PATHCONF instead of special casing it in the pathconf() system call. fpathconf() is still hackish. - Remove freebsd32_aio_cancel() as it just called the native one directly. Reviewed by: kib Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D5589
|
#
0791e0c0 |
|
24-Feb-2016 |
Konstantin Belousov <kib@FreeBSD.org> |
Provide more correct sizing of the KVA consumed by a vnode, used by the virtvnodes calculation. Include the size of fs-specific v_data as the nfs nclnode inline, the NFS nclnode is bigger than either ZFS znode or UFS inode. Include the size of namecache_ts and short cache path element, multiplied by the name cache population factor, again inline. Inline defines are used to avoid pollution of the vnode.h with the subsystem-private objects. Non-significant unsynchronized changes of the definitions are fine, we do not care about that precision, and e.g. ZFS consumes much malloced memory per vnode for reasons unaccounted in the formula. Lower the partition of kmem dedicated to vnodes, from 1/7 to 1/10. The measures reduce vnode cache pressure on kmem and bring the vnode cache memory use below some apparent thresholds that were exceeded by r291244 due to more robust vnode reuse. Reported and tested by: marius (i386, previous version) Reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
793c3817 |
|
18-Jan-2016 |
Mark Johnston <markj@FreeBSD.org> |
Add vrefl(), a locked variant of vref(9). This API has no in-tree consumers at the moment but is useful to at least one out-of-tree consumer, and naturally complements existing vnode refcount functions (vholdl(9), vdropl(9)). Obtained from: kib (sys/ portion) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D4947 Differential Revision: https://reviews.freebsd.org/D4953
|
#
106ebb76 |
|
16-Dec-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Optimize vop_stdadvise(POSIX_FADV_DONTNEED). Instead of looking up a buffer for each block number in the range with gbincore(), look up the next instantiated buffer with the logical block number which is greater or equal to the next lblkno. This significantly speeds up the iteration for sparce-populated range. Move the iteration into new helper bnoreuselist(), which is structured similarly to flushbuflist(). Reported and tested by: pho Reviewed by: markj Sponsored by: The FreeBSD Foundation
|
#
f186a80d |
|
26-Nov-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove VI_AGE vnode iflag, it is unused. Noted by: bde Sponsored by: The FreeBSD Foundation
|
#
412ce274 |
|
16-Sep-2015 |
Steven Hartland <smh@FreeBSD.org> |
Fix kqueue write events for files > 2GB Due to the use of int's for file offsets in the VOP_WRITE_(PRE|POST) macros, kqueue write events for files greater 2GB where never fired. This caused tail -f on a file greater 2GB to never see updates. MFC after: 1 week Relnotes: YES Sponsored by: Multiplay
|
#
55d33667 |
|
15-Sep-2015 |
Conrad Meyer <cem@FreeBSD.org> |
kevent(2): Note DOOMED vnodes with NOTE_REVOKE In poll mode, check for and wake VBAD vnodes. (Vnodes that are VBAD at registration will never be woken by the RECLAIM trigger.) Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation. (Vnodes that were fine at registration but are vgoned while being monitored should signal waiters.) Reviewed by: kib Approved by: markj (mentor) Sponsored by: EMC / Isilon Storage Division Differential Revision: https://reviews.freebsd.org/D3675
|
#
17518b1a |
|
05-Sep-2015 |
Kirk McKusick <mckusick@FreeBSD.org> |
Track changes to kern.maxvnodes and appropriately increase or decrease the size of the name cache hash table (mapping file names to vnodes) and the vnode hash table (mapping mount point and inode number to vnode). An appropriate locking strategy is the key to changing hash table sizes while they are in active use. Reviewed by: kib Tested by: Peter Holm Differential Revision: https://reviews.freebsd.org/D2265 MFC after: 2 weeks
|
#
c9ba6504 |
|
24-Aug-2015 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Make vfs_unmountall() unmount /dev after /, not before. The only reason this didn't result in an unclean shutdown is that devfs ignores MNT_FORCE flag. Reviewed by: kib@ MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D3467
|
#
6e572e08 |
|
23-Aug-2015 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
After r286237 it should be fine to call vgone(9) on a busy GEOM vnode; remove KASSERT that would prevent forced devfs unmount from working. MFC after: 1 month Sponsored by: The FreeBSD Foundation
|
#
752fc07d |
|
16-Jul-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: implement v_holdcnt/v_usecount manipulation using atomic ops Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free list) of either counter are still guarded with vnode interlock. Reviewed by: kib (earlier version) Tested by: pho
|
#
f0725a8e |
|
11-Jul-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
Move chdir/chroot-related fdp manipulation to kern_descrip.c Prefix exported functions with pwd_. Deduplicate some code by adding a helper for setting fd_cdir. Reviewed by: kib
|
#
f6f6d240 |
|
10-Jun-2015 |
Mateusz Guzik <mjg@FreeBSD.org> |
Implement lockless resource limits. Use the same scheme implemented to manage credentials. Code needing to look at process's credentials (as opposed to thred's) is provided with *_proc variants of relevant functions. Places which possibly had to take the proc lock anyway still use the proc pointer to access limits.
|
#
2db0e1f5 |
|
27-May-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
Add V_MNTREF flag to the vn_start_write(9) and vn_start_secondary_write(9) functions. The flag indicates that the caller already owns a reference on the mount point, and the functions can consume it. The reference is released by vn_finished_write(9) and vn_finished_secondary_write(9) in due course. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
d5fec489 |
|
21-Apr-2015 |
Craig Rodrigues <rodrigc@FreeBSD.org> |
Support file verification in MAC. * Add VCREAT flag to indicate when a new file is being created * Add VVERIFY to indicate verification is required * Both VCREAT and VVERIFY are only passed on the MAC method vnode_check_open and are removed from the accmode after * Add O_VERIFY flag to rtld open of objects * Add 'v' flag to __sflags to set O_VERIFY flag. Submitted by: Steve Kiernan <stevek@juniper.net> Obtained from: Juniper Networks, Inc. GitHub Pull Request: https://github.com/freebsd/freebsd/pull/27 Relnotes: yes
|
#
08189ed6 |
|
27-Feb-2015 |
Konstantin Belousov <kib@FreeBSD.org> |
The VNASSERT in vflush() FORCECLOSE case is trying to panic early to prevent errors from yanking devices out from under filesystems. Only care about special vnodes on devfs, special nodes on other kinds of filesystems do not have special properties. Sponsored by: EMC / Isilon Storage Division Submitted by: Conrad Meyer MFC after: 1 week
|
#
8ee9765a |
|
21-Dec-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name was cached. Use the flag for core dumps. Requested by: rpaulo Tested by: pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
90effb23 |
|
22-Nov-2014 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge from projects/sendfile: o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but doesn't sleep. It returns immediately, and will execute the I/O done handler function that must be supplied as argument. o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager. o Extend pagertab to support pgo_getpages_async method, and implement this method for vnode_pager. Reviewed by: kib Tested by: pho Sponsored by: Netflix Sponsored by: Nginx, Inc.
|
#
f12aa60c |
|
15-Oct-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
When vnode bypass cannot be performed on the cdev file descriptor for read/write/poll/ioctl, call standard vnode filedescriptor fop. This restores the special handling for terminals by calling the deadfs VOP, instead of always returning ENXIO for destroyed devices or revoked terminals. Since destroyed (and not revoked) device would use devfs_specops VOP vector, make dead_read/write/poll non-static and fill VOP table with pointers to the functions, to instead of VOP_PANIC. Noted and reviewed by: bde Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
e3d6fece |
|
04-Oct-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is not locked, but range is. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
bf66496e |
|
01-Oct-2014 |
Will Andrews <will@FreeBSD.org> |
Embellish a comment regarding the reliability of DEBUG_VFS_LOCKS. Submitted by: kib
|
#
575e02d9 |
|
29-Aug-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Add function and wrapper to switch lockmgr and vnode lock back to auto-promotion of shared to exclusive. Tested by: hrs, pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
895b3782 |
|
14-Jul-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Extract the code to put a filesystem into the suspended state (at the unmount time) in the helper vfs_write_suspend_umnt(). Use it instead of two inline copies in FFS. Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized. Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
a6945216 |
|
14-Jul-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
Generalize vn_get_ino() to allow filesystems to use custom vnode producer, instead of hard-coding VFS_VGET(). New function, which takes callback, is called vn_get_ino_gen(), standard callback for vn_get_ino() is provided. Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the uses of vn_get_ino_gen(). Tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
7b81a399 |
|
17-Jun-2014 |
Konstantin Belousov <kib@FreeBSD.org> |
In msdosfs_setattr(), add a check for result of the utimes(2) permissions test, forgotten in r164033. Refactor the permission checks for utimes(2) into vnode helper function vn_utimes_perm(9), and simplify its code comparing with the UFS origin, by writing the call to VOP_ACCESSX only once. Use the helper for UFS(5), tmpfs(5), devfs(5) and msdosfs(5). Reported by: bde Reviewed by: bde, trasz Sponsored by: The FreeBSD Foundation MFC after: 1 week
|
#
cc3d8c35 |
|
09-Jul-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
There are several code sequences like vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
|
#
3289d587 |
|
20-Mar-2013 |
Kirk McKusick <mckusick@FreeBSD.org> |
When renaming a directory from one parent directory to another, we need to call ufs_checkpath() to walk from our new location to the root of the filesystem to ensure that we do not encounter ourselves along the way. Until now, we accomplished this by reading the ".." entries of each directory in our path until we reached the root (or encountered an error). This change tries to avoid the I/O of reading the ".." entries by first looking them up in the name cache and only doing the I/O when the name cache lookup fails. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
|
#
5f5f0554 |
|
15-Mar-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Implement the helper function vn_io_fault_pgmove(), intended to use by the filesystem VOP_READ() and VOP_WRITE() implementations in the same way as vn_io_fault_uiomove() over the unmapped buffers. Helper provides the convenient wrapper over the pmap_copy_pages() for struct uio consumers, taking care of the TDP_UIOHELD situations. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks
|
#
ba05dec5 |
|
27-Feb-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
The softdep freeblks workitem might hold a reference on the dquot. Current dqflush() panics when a dquot with with non-zero refcount is encountered. The situation is possible, because quotas are turned off before softdep workitem queue if flushed, due to the quota file writes might create softdep workitems. Make the encountering an active dquot in dqflush() not fatal, return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of softdep_flushfiles() loop, until the last iteration. At the last loop, the quotas must be closed, and because SU workitems should be already flushed, the references to dquot are gone. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Reviewed by: mckusick MFC after: 2 weeks
|
#
ab52a230 |
|
13-Jan-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Rearrange the struct bufobj and struct vnode layouts to reduce padding. On the amd64 kernel with INVARIANTS turned off, size of the struct vnode is reduced from 496 to 472 bytes, saving 24 bytes of memory and KVA per vnode. Noted and reviewed by: peter Tested by: pho Sponsored by: The FreeBSD Foundation
|
#
f6af8e37 |
|
13-Jan-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Add exported vfs_hash_index() function, which calculates the canonical pre-masked hash for the given vnode. The function assumes that vp->v_hash is initialized by the filesystem vnode instantiation function. At the moment, it is only done if filesystem uses vfs_hash_insert(). Reviewed by: peter Tested by: peter, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 5 days
|
#
ddd6b3fc |
|
10-Jan-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Add flags argument to vfs_write_resume() and remove vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation
|
#
f99cb34c |
|
01-Jan-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
The process_deferred_inactive() function locks the vnodes of the ufs mount, which means that is must not be called while the snaplock is owned. The vfs_write_resume(9) does call the function as the VFS_SUSP_CLEAN() method, which is too early and falls into the region still protected by snaplock. Add yet another flag for the vfs_write_resume_flags() to avoid calling suspension cleanup handler after the suspend is lifted, and use it in the ffs_snapshot() call to vfs_write_resume. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
|
#
91e94745 |
|
28-Dec-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Make it possible to atomically resume writes on the mount and account the write start, by adding a variation of the vfs_write_resume(9) which accepts flags. Use the new function to prevent a deadlock between parallel suspension and snapshotting a UFS mount. The ffs_snapshot() code performed vfs_write_resume() followed by vn_start_write() while owning the snaplock. If the suspension intervene between resume and vn_start_write(), the deadlock occured after the suspending thread tried to lock the snaplock, most typically during the write in the ffs_copyonwrite(). Reported and tested by: Andreas Longwitz <longwitz@incore.de> Reviewed by: mckusick MFC after: 2 weeks X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC, in HEAD
|
#
f121e3e8 |
|
27-Nov-2012 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
- Add NOCAPCHECK flag to namei that allows lookup to work even if the process is in capability mode. - Add VN_OPEN_NOCAPCHECK flag for vn_open_cred() to will ne converted into NOCAPCHECK namei flag. This functionality will be used to enable core dumps for sandboxed processes. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks
|
#
9b233e23 |
|
14-Oct-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a KPI to allow to reserve some amount of space in the numvnodes counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week
|
#
7ac1b61a |
|
08-Jun-2012 |
John Baldwin <jhb@FreeBSD.org> |
Split the second half of vn_open_cred() (after a vnode has been found via a lookup or created via VOP_CREATE()) into a new vn_open_vnode() function and use this function in fhopen() instead of duplicating code from vn_open_cred() directly. Tested by: pho Reviewed by: kib MFC after: 2 weeks
|
#
41014d99 |
|
30-May-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
vn_io_fault() is a facility to prevent page faults while filesystems perform copyin/copyout of the file data into the usermode buffer. Typical filesystem hold vnode lock and some buffer locks over the VOP_READ() and VOP_WRITE() operations, and since page fault handler may need to recurse into VFS to get the page content, a deadlock is possible. The facility works by disabling page faults handling for the current thread and attempting to execute i/o while allowing uiomove() to access the usermode mapping of the i/o buffer. If all buffer pages are resident, uiomove() is successfull and request is finished. If EFAULT is returned from uiomove(), the pages backing i/o buffer are faulted in and held, and the copyin/out is performed using uiomove_fromphys() over the held pages for the second attempt of VOP call. Since pages are hold in chunks to prevent large i/o requests from starving free pages pool, and since vnode lock is only taken for i/o over the current chunk, the vnode lock no longer protect atomicity of the whole i/o request. Use newly added rangelocks to provide the required atomicity of i/o regardind other i/o and truncations. Filesystems need to explicitely opt-in into the scheme, by setting the MNTK_NO_IOPF struct mount flag, and optionally by using vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or converting uio into request for uiomove_fromphys(). Reviewed by: bf (comments), mdf, pjd (previous version) Tested by: pho Tested by: flo, Gustau P?rez <gperez entel upc edu> (previous version) MFC after: 2 months
|
#
8f0e9130 |
|
30-May-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a rangelock implementation, intended to be used to range-locking the i/o regions of the vnode data space. The implementation is quite simple-minded, it uses the list of the lock requests, ordered by arrival time. Each request may be for read or for write. The implementation is fair FIFO. MFC after: 2 month
|
#
c83f909c |
|
30-May-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Clarify that the v_lockf is advisory lock list. MFC after: 3 days
|
#
292520f7 |
|
25-May-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Add a vn_bmap_seekhole(9) vnode helper which can be used by any filesystem which supports VOP_BMAP(9) to implement SEEK_HOLE/SEEK_DATA commands for lseek(2). MFC after: 2 weeks
|
#
af6e6b87 |
|
23-Apr-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove unused thread argument to vrecycle(). Reviewed by: kib
|
#
c52fd858 |
|
23-Apr-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove unused thread argument from vtruncbuf(). Reviewed by: kib
|
#
f257ebbb |
|
20-Apr-2012 |
Kirk McKusick <mckusick@FreeBSD.org> |
This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
|
#
73305eb8 |
|
17-Apr-2012 |
Kirk McKusick <mckusick@FreeBSD.org> |
Drop export of vdestroy() function from kern/vfs_subr.c as it is used only as a helper function in that file. Replace sole call to vbusy() with inline code in vholdl(). Replace sole calls to vfree() and vdestroy() with inline code in vdropl(). The Clang compiler already inlines these functions, so they do not show up in a kernel backtrace which is confusing. Also you cannot set their frame in kgdb which means that it is impossible to view their local variables. So, while the produced code is unchanged, the debugging should be easier. Discussed with: kib MFC after: 2 weeks
|
#
ecb6e528 |
|
11-Apr-2012 |
Kirk McKusick <mckusick@FreeBSD.org> |
Export vinactive() from kern/vfs_subr.c (e.g., make it no longer static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of the body of vinactive() being cut and pasted into process_deferred_inactive(). Reviewed by: kib MFC after: 2 weeks
|
#
bd944e01 |
|
24-Mar-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove unused define. Discussed with: kib
|
#
b80dcb55 |
|
10-Mar-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h. Submitted by: gianni
|
#
5e99212d |
|
02-Mar-2012 |
Rick Macklem <rmacklem@FreeBSD.org> |
Post r230394, the Lookup RPC counts for both NFS clients increased significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels. Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks
|
#
c7e41c8b |
|
29-Feb-2012 |
Mikolaj Golub <trociny@FreeBSD.org> |
Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH() operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch MFC after: 2 weeks
|
#
662c901c |
|
25-Feb-2012 |
Mikolaj Golub <trociny@FreeBSD.org> |
When detaching an unix domain socket, uipc_detach() checks unp->unp_vnode pointer to detect if there is a vnode associated with (binded to) this socket and does necessary cleanup if there is. The issue is that after forced unmount this check may be too late as the unp_vnode is reclaimed and the reference is stale. To fix this provide a helper function that is called on a socket vnode reclamation to do necessary cleanup. Pointed by: kib Reviewed by: kib MFC after: 2 weeks
|
#
526d0bd5 |
|
20-Feb-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
|
#
c0ae88db |
|
08-Feb-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Trim 8 unused bytes from struct vnode on 64-bit architectures. Reviewed by: alc
|
#
bf40d24a |
|
06-Feb-2012 |
John Baldwin <jhb@FreeBSD.org> |
Rename cache_lookup_times() to cache_lookup() and retire the old API and ABI stub for cache_lookup().
|
#
c480f781 |
|
06-Feb-2012 |
Konstantin Belousov <kib@FreeBSD.org> |
Current implementations of sync(2) and syncer vnode fsync() VOP uses mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks
|
#
5aefb4cb |
|
20-Jan-2012 |
John Baldwin <jhb@FreeBSD.org> |
Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks
|
#
f6e633a9 |
|
14-Jan-2012 |
Martin Matuska <mm@FreeBSD.org> |
Introduce vn_path_to_global_path() This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month
|
#
f0d6c5ca |
|
23-Dec-2011 |
John Baldwin <jhb@FreeBSD.org> |
Add post-VOP hooks for VOP_DELETEEXTATTR() and VOP_SETEXTATTR() and use these to trigger a NOTE_ATTRIB EVFILT_VNODE kevent when the extended attributes of a vnode are changed. Note that OS X already implements this behavior. Reviewed by: rwatson MFC after: 2 weeks
|
#
936c09ac |
|
03-Nov-2011 |
John Baldwin <jhb@FreeBSD.org> |
Add the posix_fadvise(2) system call. It is somewhat similar to madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
|
#
82378711 |
|
25-Aug-2011 |
Martin Matuska <mm@FreeBSD.org> |
Generalize ffs_pages_remove() into vn_pages_remove(). Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
|
#
9c00bb91 |
|
16-Aug-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Add the fo_chown and fo_chmod methods to struct fileops and use them to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores. Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
|
#
91538d78 |
|
10-Jul-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Update locking annotations for the struct vnode. MFC after: 3 days
|
#
280e091a |
|
10-Jun-2011 |
Jeff Roberson <jeff@FreeBSD.org> |
Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
|
#
d91f88f7 |
|
18-Apr-2011 |
Matthew D Fleming <mdf@FreeBSD.org> |
Add the posix_fallocate(2) syscall. The default implementation in vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)
|
#
08b163fa |
|
02-Feb-2011 |
Matthew D Fleming <mdf@FreeBSD.org> |
Put the general logic for being a CPU hog into a new function should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week
|
#
730b63b0 |
|
19-Nov-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Remove prtactive variable and related printf()s in the vop_inactive and vop_reclaim() methods. They seems to be unused, and the reported situation is normal for the forced unmount. MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
3634d5b2 |
|
20-Aug-2010 |
John Baldwin <jhb@FreeBSD.org> |
Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
|
#
3979450b |
|
06-Aug-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month
|
#
df17ff61 |
|
11-Jun-2010 |
Andriy Gapon <avg@FreeBSD.org> |
vnode.h: expand debug macros to non-empty void statements when DEBUG_VFS_LOCKS is disabled MFC after: 2 weeks
|
#
7fd32ea9 |
|
12-May-2010 |
Zachary Loafman <zml@FreeBSD.org> |
Add VOP_ADVLOCKPURGE so that the file system is called when purging locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr
|
#
307d88b7 |
|
06-May-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Style fixes and removal of unneeded variable. Submitted by: bde@
|
#
b5f770bd |
|
05-May-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize(). Reviewed by: kib
|
#
bb9a8424 |
|
09-Apr-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r206093: Add function vop_rename_fail(9) that performs needed cleanup for locks and references of the VOP_RENAME(9) arguments. Use vop_rename_fail() in deadfs_rename().
|
#
ea015880 |
|
02-Apr-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Add function vop_rename_fail(9) that performs needed cleanup for locks and references of the VOP_RENAME(9) arguments. Use vop_rename_fail() in deadfs_rename(). Tested by: Mikolaj Golub MFC after: 1 week
|
#
bd7ae209 |
|
27-Mar-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
MFC r197680: Provide default implementation for VOP_ACCESS(9), so that filesystems which want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
|
#
78e32164 |
|
27-Mar-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
MFC r197405: Add pieces of infrastructure required for NFSv4 ACL support in UFS. Reviewed by: rwatson
|
#
e9560b40 |
|
07-Feb-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r202528: Add vunref(9).
|
#
d2f334bf |
|
17-Jan-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Add new function vunref(9) that decrements vnode use count (and hold count) while vnode is exclusively locked. The code for vput(9), vrele(9) and vunref(9) is merged. In collaboration with: pho Reviewed by: alc MFC after: 3 weeks
|
#
2d63cbda |
|
10-Jan-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
MFC r200770: Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects.
|
#
49e3050e |
|
20-Dec-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object flag. Besides providing the redundand information, need to update both vnode and object flags causes more acquisition of vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects. Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects. Suggested and reviewed by: alc Tested by: pho MFC after: 3 weeks
|
#
2c29cfa0 |
|
01-Oct-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Provide default implementation for VOP_ACCESS(9), so that filesystems which want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
|
#
a9315dde |
|
22-Sep-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add pieces of infrastructure required for NFSv4 ACL support in UFS. Reviewed by: rwatson
|
#
fe1d3f15 |
|
28-Jun-2009 |
Stanislav Sedov <stas@FreeBSD.org> |
- Turn the third (islocked) argument of the knote call into flags parameter. Introduce the new flag KNF_NOKQLOCK to allow event callers to be called without KQ_LOCK mtx held. - Modify VFS knote calls to always use KNF_NOKQLOCK flag. This is required for ZFS as its getattr implementation may sleep. Approved by: re (rwatson) Reviewed by: kib MFC after: 2 weeks
|
#
aec53722 |
|
25-Jun-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Tweak comment.
|
#
c808c963 |
|
21-Jun-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson
|
#
e0c161b8 |
|
21-Jun-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson
|
#
f0830182 |
|
02-Jun-2009 |
Attilio Rao <attilio@FreeBSD.org> |
Handle lock recursion differenty by always checking against LO_RECURSABLE instead the lock own flag itself. Tested by: pho
|
#
0449e6e1 |
|
31-May-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Eliminate code duplication in vn_fullpath1() around the cache lookups and calls to vn_vptocnp() by moving more of the common code to vn_vptocnp(). Rename vn_vptocnp() to vn_vptocnp_locked() to signify that cache is locked around the call. Do not track buffer position by both the pointer and offset, use only buflen to record the start of the free space. Export vn_vptocnp() for external consumers as a wrapper around vn_vptocnp_locked() that locks the cache and handles hold counts. Tested by: pho
|
#
c97fcdba |
|
30-May-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add VOP_ACCESSX, which can be used to query for newly added V* permissions, such as VWRITE_ACL. For a filsystems that don't implement it, there is a default implementation, which works as a wrapper around VOP_ACCESS. Reviewed by: rwatson@
|
#
885868cd |
|
10-Apr-2009 |
Robert Watson <rwatson@FreeBSD.org> |
Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
|
#
607fc40b |
|
29-Mar-2009 |
Alexander Kabaev <kan@FreeBSD.org> |
Replace v_dd vnode pointer with v_cache_dd pointer to struct namecache in directory vnodes. Allow namecache dotdot entry to be created pointing from child vnode to parent vnode if no existing links in opposite direction exist. Use direct link from parent to child for dotdot lookups otherwise. This restores more efficient dotdot caching in NFS filesystems which was lost when vnodes stoppped being type stable. Reviewed by: kib
|
#
6180d318 |
|
29-Mar-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Get rid of VSTAT and replace it with VSTAT_PERMS, which is somewhat better defined. Approved by: rwatson (mentor)
|
#
54377204 |
|
27-Mar-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add new V* constants, neccessary for granular permission checks in NFSv4 ACLs. While here, get rid of VALLPERM; it wasn't used anyway. Approved by: rwatson (mentor)
|
#
f0ffa083 |
|
08-Mar-2009 |
Joe Marcus Clarke <marcus@FreeBSD.org> |
Add a prototype for the new vop_stdvptocnp function. Reviewed by: kib Approved by: kib Tested by: pho
|
#
03964c8e |
|
19-Feb-2009 |
John Baldwin <jhb@FreeBSD.org> |
Enable caching of negative pathname lookups in the NFS client. To avoid stale entries, we save a copy of the directory's modification time when the first negative cache entry was added in the directory's NFS node. When a negative cache entry is hit during a pathname lookup, the parent directory's modification time is checked. If it has changed, all of the negative cache entries for that parent are purged and the lookup falls back to using the RPC. This required adding a new cache_purge_negative() method to the name cache to purge only negative cache entries for a given directory. Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp Reviewed by: mohans
|
#
9a734a28 |
|
28-Jan-2009 |
John Baldwin <jhb@FreeBSD.org> |
Actually remove the VA_MARK_ATIME flag. This should have been in the earlier commit to add VOP_MARKATIME().
|
#
e9aff357 |
|
21-Jan-2009 |
Konstantin Belousov <kib@FreeBSD.org> |
Move the code from ufs_lookup.c used to do dotdot lookup, into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in lookup routine. Requested by: jhb Tested by: pho MFC after: 1 month
|
#
24b087e4 |
|
11-Dec-2008 |
Joe Marcus Clarke <marcus@FreeBSD.org> |
Add a new error VOP, VOP_ENOENT. This function will simply return ENOENT. Reviewed by: arch Approved by: kib
|
#
1ba4a712 |
|
17-Nov-2008 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes. This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris
|
#
15bc6b2b |
|
28-Oct-2008 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
0d7935fd |
|
10-Oct-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Remove the struct thread unuseful argument from bufobj interface. In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
bdb80947 |
|
16-Sep-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Garbage-collect vn_write_suspend_wait(). Suggested and reviewed by: tegge Tested by: pho MFC after: 1 month
|
#
dfa7fd1d |
|
10-Sep-2008 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Remove VSVTX, VSGID and VSUID. This should be a no-op, as VSVTX == S_ISVTX, VSGID == S_ISGID and VSUID == S_ISUID. Approved by: rwatson (mentor)
|
#
0359a12e |
|
28-Aug-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
#
a888d54d |
|
28-Aug-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Introduce the VV_FORCEINSMQ vnode flag. It instructs the insmnque() function to ignore the unmounting and forces insertion of the vnode into the mount vnode list. Change insmntque() to fail when forced unmount is in progress and VV_FORCEINSMQ is not specified. Add an assertion to the insmntque(), requiring the vnode to be exclusively locked for mp-safe filesystems. Use the VV_FORCEINSMQ for the creation of the syncvnode. Tested by: pho Reviewed by: tegge MFC after: 1 month
|
#
dfc714fb |
|
31-Jul-2008 |
Christian S.J. Peron <csjp@FreeBSD.org> |
Currently, BSM audit pathname token generation for chrooted or jailed processes are not producing absolute pathname tokens. It is required that audited pathnames are generated relative to the global root mount point. This modification changes our implementation of audit_canon_path(9) and introduces a new function: vn_fullpath_global(9) which performs a vnode -> pathname translation relative to the global mount point based on the contents of the name cache. Much like vn_fullpath, vn_fullpath_global is a wrapper function which called vn_fullpath1. Further, the string parsing routines have been converted to use the sbuf(9) framework. This change also removes the conditional acquisition of Giant, since the vn_fullpath1 method will not dip into file system dependent code. The vnode locking was modified to use vhold()/vdrop() instead the vref() and vrele(). This will modify the hold count instead of modifying the user count. This makes more sense since it's the kernel that requires the reference to the vnode. This also makes sure that the vnode does not get recycled we hold the reference to it. [1] Discussed with: rwatson Reviewed by: kib [1] MFC after: 2 weeks
|
#
eab626f1 |
|
16-Apr-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
|
#
047dd67e |
|
06-Apr-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Optimize lockmgr in order to get rid of the pool mutex interlock, of the state transitioning flags and of msleep(9) callings. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) alredy do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the correspective per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will be probabilly added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wakeup, are in the runqueue they can contend the lock with the exclusive drainer. This is hard to be fixed but the now committed code mitigate this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will be nomore supported soon. In order to avoid namespace pollution, stack.h is splitted into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. Kernel ABI results heavilly changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly. Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007
|
#
0a3af16a |
|
31-Mar-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Add the utility function vn_commname() to retrieve the command name from the vfs namecache, when available. Reviewed by: rwatson, rdivacky Tested by: pho
|
#
97735db7 |
|
23-Mar-2008 |
Jeff Roberson <jeff@FreeBSD.org> |
- Remove an old comment; vnodes have been working without Giant for years now. - Clarify the locking required for VI_DOOMED in preparation for simplifications to vget() and vn_lock().
|
#
1be222e9 |
|
23-Mar-2008 |
Konstantin Belousov <kib@FreeBSD.org> |
Yield the cpu in the kernel while iterating the list of the vnodes belonging to the mountpoint. Also, yield when in the softdep_process_worklist() even when we are not going to sleep due to buffer drain. It is believed that the ULE fixed the problem [1], but the yielding seems to be needed at least for the 4BSD case. Discussed: on stable@, with bde Reviewed by: tegge, jeff [1] MFC after: 2 weeks
|
#
7fbfba7b |
|
01-Mar-2008 |
Attilio Rao <attilio@FreeBSD.org> |
- Handle buffer lock waiters count directly in the buffer cache instead than rely on the lockmgr support [1]: * bump the waiters only if the interlock is held * let brelvp() return the waiters count * rely on brelvp() instead than BUF_LOCKWAITERS() in order to check for the waiters number - Remove a namespace pollution introduced recently with lockmgr.h including lock.h by including lock.h directly in the consumers and making it mandatory for using lockmgr. - Modify flags accepted by lockinit(): * introduce LK_NOPROFILE which disables lock profiling for the specified lockmgr * introduce LK_QUIET which disables ktr tracing for the specified lockmgr [2] * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there so that it can only be used on a per-instance basis - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer used This patch breaks KPI so __FreBSD_version will be bumped and manpages updated by further commits. Additively, 'struct buf' changes results in a disturbed ABI also. [2] Really, currently there is no ktr tracing in the lockmgr, but it will be added soon. [1] Submitted by: kib Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
|
#
628f51d2 |
|
24-Feb-2008 |
Attilio Rao <attilio@FreeBSD.org> |
Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
|
#
22db15c0 |
|
13-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
#
cb05b60a |
|
09-Jan-2008 |
Attilio Rao <attilio@FreeBSD.org> |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
#
51662691 |
|
20-Oct-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Remove redundant prototypes.
|
#
9e223287 |
|
31-May-2007 |
Konstantin Belousov <kib@FreeBSD.org> |
Revert UF_OPENING workaround for CURRENT. Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation argument from being file descriptor index into the pointer to struct file. Proposed and reviewed by: jhb Reviewed by: daichi (unionfs) Approved by: re (kensmith)
|
#
950afe99 |
|
25-May-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
The cache_leaf_test() function seems to be unused, so remove it.
|
#
d413d210 |
|
18-May-2007 |
Konstantin Belousov <kib@FreeBSD.org> |
Since renaming of vop_lock to _vop_lock, pre- and post-condition function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.
|
#
e6534b36 |
|
31-Mar-2007 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Make vdropl() public; zfs needs it. There is also plenty of existing file system code (mostly *_reclaim()) which look like this: VOP_LOCK(vp); /* examine vp */ VOP_UNLOCK(vp); vdrop(vp); This can now be rewritten to: VOP_LOCK(vp); /* examine vp */ vdropl(vp); /* will unlock vp */ MFC after: 1 week
|
#
61b9d89f |
|
12-Mar-2007 |
Tor Egge <tegge@FreeBSD.org> |
Make insmntque() externally visibile and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
|
#
10bcafe9 |
|
15-Feb-2007 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
|
#
8fd14511 |
|
14-Dec-2006 |
Konstantin Belousov <kib@FreeBSD.org> |
Use tab after #define. Pointed out by: pjd
|
#
3b7b5496 |
|
14-Dec-2006 |
Konstantin Belousov <kib@FreeBSD.org> |
Resolve two deadlocks that could be caused by busy md device backed by vnode. Allow for md thread and the thread that owns lock on vnode backing the md device to do the write even when runningbufspace is exhausted. Tested by: Peter Holm Reviewed by: tegge MFC after: 2 weeks
|
#
2f6a774b |
|
12-Nov-2006 |
Kip Macy <kmacy@FreeBSD.org> |
change vop_lock handling to allowing tracking of callers' file and line for acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)
|
#
1a60c7fc |
|
31-Oct-2006 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl
|
#
4207c279 |
|
18-Apr-2006 |
Xin LI <delphij@FreeBSD.org> |
In vfs_hash_get(): mount point should never be changed so explicitly constify the mp parameter. Reviewed by: phk
|
#
791dd2fa |
|
08-Mar-2006 |
Tor Egge <tegge@FreeBSD.org> |
Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing. Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended. Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
|
#
eb2ea105 |
|
01-Mar-2006 |
Jeff Roberson <jeff@FreeBSD.org> |
- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it. Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris
|
#
731959b1 |
|
31-Jan-2006 |
Yaroslav Tykhiy <ytykhiy@gmail.com> |
Use off_t for file size passed to vnode_create_vobject(). The former type, size_t, was causing truncation to 32 bits on i386, which immediately led to undersizing of VM objects backed by files >4GB. In particular, sendfile(2) was broken for such files. PR: kern/92243 MFC after: 5 days
|
#
ea6d62f1 |
|
14-Jan-2006 |
Robert Watson <rwatson@FreeBSD.org> |
Rename uid and gid arguments to vaccess() prototype to match vaccess() implementation in vfs_subr.c. No functional change. MFC after: 3 days
|
#
82be0a5a |
|
09-Jan-2006 |
Tor Egge <tegge@FreeBSD.org> |
Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
|
#
0430a5e2 |
|
13-Dec-2005 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Eradicate caddr_t from the VFS API.
|
#
e26b05cf |
|
13-Dec-2005 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Nuke vnodeop_desc.vdesc_transports, which has been unused since the dawn of time (or the inception of ncvs, whichever came last)
|
#
9f5c1d19 |
|
12-Oct-2005 |
Diomidis Spinellis <dds@FreeBSD.org> |
Move execve's access time update functionality into a new vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap(). Reviewed by: bde MFC after: 2 weeks
|
#
2883ba66 |
|
12-Sep-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce vfs_read_dirent() which can help VOP_READDIR() implementations by handling all the cookie stuff.
|
#
34cc826a |
|
05-Aug-2005 |
Suleiman Souhlal <ssouhlal@FreeBSD.org> |
Holding a vnode doesn't prevent v_mount from disappearing (when the vnode is inactivated), possibly leading to a NULL dereference when checking if the mount wants knotes to be activated in the VOP hooks. So, we add a new vnode flag VV_NOKNOTE that is only set in getnewvnode(), if necessary, and check it when activating knotes. Since the flags are not erased when a vnode is being held, we can safely read them. Reviewed by: kris@ MFC after: 3 days
|
#
e8ddb61d |
|
02-Aug-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lock caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment. Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
|
#
2b0f687b |
|
01-Jul-2005 |
Suleiman Souhlal <ssouhlal@FreeBSD.org> |
Mistakingly undefined VN_KNOTE_LOCKED in my previous commit. Noticed by: Antoine Brodin <antoine.brodin@laposte.net> Approved by: re (scottl)
|
#
571dcd15 |
|
01-Jul-2005 |
Suleiman Souhlal <ssouhlal@FreeBSD.org> |
Fix the recent panics/LORs/hangs created by my kqueue commit by: - Introducing the possibility of using locks different than mutexes for the knlist locking. In order to do this, we add three arguments to knlist_init() to specify the functions to use to lock, unlock and check if the lock is owned. If these arguments are NULL, we assume mtx_lock, mtx_unlock and mtx_owned, respectively. - Using the vnode lock for the knlist locking, when doing kqueue operations on a vnode. This way, we don't have to lock the vnode while holding a mutex, in filt_vfsread. Reviewed by: jmg Approved by: re (scottl), scottl (mentor override) Pointyhat to: ssouhlal Will be happy: everyone
|
#
b930d853 |
|
13-Jun-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Don't make vgonel() globally visible, we want to change its prototype anyway and it's not used outside of vfs_subr.c. - Change vgonel() to accept a parameter which determines whether or not we'll put the vnode on the free list when we're done. - Use the new vgonel() parameter rather than VI_DOOMED to signal our intentions in vtryrecycle(). - In vgonel() return if VI_DOOMED is already set, this vnode has already been reclaimed. Sponsored by: Isilon Systems, Inc.
|
#
679985d0 |
|
09-Jun-2005 |
Suleiman Souhlal <ssouhlal@FreeBSD.org> |
Allow EVFILT_VNODE events to work on every filesystem type, not just UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
|
#
6341095e |
|
31-May-2005 |
Ken Smith <kensmith@FreeBSD.org> |
This patch addresses a standards violation issue. The standards say a file's access time should be updated when it gets executed. A while ago the mechanism used to exec was changed to use a more mmap based mechanism and this behavior was broken as a side-effect of that. A new vnode flag is added that gets set when the file gets executed, and the VOP_SETATTR() vnode operation gets called. The underlying filesystem is expected to handle it based on its own semantics, some filesystems don't support access time at all. Those that do should handle it in a way that does not block, does not generate I/O if possible, etc. In particular vn_start_write() has not been called. The UFS code handles it the same way as it would normally handle the access time if a file was read - the IN_ACCESS flag gets set in the inode but no other action happens at this point. The actual time update will happen later during a sync (which handles all the necessary locking). Got me into this: cperciva Discussed with: a lot with bde, a little with kan Showed patches to: phk, jeffr, standards@, arch@ Minor discussion on: arch@
|
#
fc8dfa75 |
|
27-Apr-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Changes to vgone() and related teardown code have meant that the vxthread pointer is no longer needed.
|
#
ab9707d7 |
|
22-Apr-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add a VI_LOCK_FLAGS so we can pass MTX_DUPOK in. This somewhat defeats the purpose of having macros to hide the lock type as we may now be dependent on MTX_ flags. Sponsored by: Isilon Systems, Inc.
|
#
cbc5da3a |
|
11-Apr-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add the mising ASSERT_VOP_ELOCKED code in the !DEBUG_VFS_LOCKS case. Pointy hat to: me
|
#
539de9ed |
|
11-Apr-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Enable ASSERT_VOP_ELOCKED and assert_vop_elocked() now that vnode_if.awk uses it. Sponsored by: Isilon Systems, Inc.
|
#
878cdac0 |
|
29-Mar-2005 |
David Schultz <das@FreeBSD.org> |
Eliminate v_id and v_ddid. This changes struct vnode, so all filesystem modules must be recompiled. (Since struct vnode has already changed in 6-CURRENT, there's little advantage to leaving the unused fields around.)
|
#
c167961e |
|
23-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- If vput() is called with a shared lock it must upgrade to an exclusive before it can call VOP_INACTIVE(). This must use the EXCLUPGRADE path because we may violate some lock order with another locked vnode if we drop and reacquire the lock. If EXCLUPGRADE fails, we mark the vnode with VI_OWEINACT. This case should be very rare. - Clear VI_OWEINACT in vinactive() and vbusy(). - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well. Sponsored by: Isilon Systems, Inc.
|
#
51f5ce0c |
|
16-Mar-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add two arguments to the vfs_hash() KPI so that filesystems which do not have unique hashes (NFS) can also use it.
|
#
78bb3c21 |
|
16-Mar-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add mnt_hashseed to struct mount and initialize it witn PRNG bits, use it to get better hashing in vfs_hash. In case of an insert collision in vfs_hash_insert(), put the loosing vnode on a special list so that vfs_hash_remove() can just assume that it is on a list. Drop the VI_HASHED flag.
|
#
b172f6c5 |
|
15-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Now that there are no external users of vfree() make it static. - Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so no one else attempts to grow a dependency on them. - Now that objects with pages hold the vnode we don't have to do unlocked checks for the page count in the vm object in VSHOULDFREE. These three macros could simply check for holdcnt state transitions to determine whether the vnode is on the free list already, but the extra safety the flag affords us is probably worth the minimal cost. - The leafonly sysctl and code have been dead for several years now, remove the sysctl and the code that employed it from vtryrecycle(). - vtryrecycle() also no longer has to check the object's page count as the object holds the vnode until it reaches 0. Sponsored by: Isilon Systems, Inc.
|
#
c178628d |
|
15-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Expose vholdl() so it may be used outside of vfs_subr.c
|
#
6c325a2a |
|
14-Mar-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Currently (almost) all filesystems maintain a local inode hash table to get from (mount + inode) to vnode. These tables are mostly copy&pasted from UFS, sized based on desiredvnodes and therefore quite large (128K-512K). Several filesystems are buggy enough that they allocate the hash table even before they know if they will ever be used or not. Add "vfs_hash", a system wide hash table, which will replace all the per-filesystem hash-tables. The fields we add to struct vnode will more or less be saved in the respective filesystems inodes. Having one central implementation will save code and will allow us to justify the complexity of code to dynamically (re)size the hash at a later point.
|
#
8045557f |
|
14-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Increment the holdcnt once for each usecount reference. This allows us to use only the holdcnt to determine whether a vnode may be recycled, simplifying the V* macros as well as vtryrecycle(), etc. Sponsored by: Isilon Systems, Inc.
|
#
159b4548 |
|
14-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- We do not have to check the object's ref_count in VSHOULDFREE or vtryrecycle(). All obj refs also ref the vnode. - Consistently use v_incr_usecount() to increment the usecount. This will be more important later. Sponsored by: Isilon Systems, Inc.
|
#
1d39df3f |
|
14-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Retire OLOCK and OWANT. All callers hold the vnode lock when creating a vnode object. There has been an assert to prove this for some time. Sponsored by: Isilon Systems, Inc.
|
#
5e7475a3 |
|
13-Mar-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Get rid of VXLOCK, VXWANT, and vx_*. The vnode lock now protects us against recycling. - Modify VSHOULDFREE, VCANRECYCLE, etc. now that certain flags are no longer important. Remove VMIGHTFREE as it is only used in one place. Sponsored by: Isilon Systems, Inc.
|
#
bd8bb8e4 |
|
22-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Group the fields in struct vnode by their function and stick comments there to tell what the function is.
|
#
aa2f6ddc |
|
22-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Reap more benefits from DEVFS: List devfs_dirents rather than vnodes off their shared struct cdev, this saves a pointer field in the vnode at the expense of a field in the devfs_dirent. There are often 100 times more vnodes so this is bargain. In addition it makes it harder for people to try to do stypid things like "finding the vnode from cdev". Since DEVFS handles all VCHR nodes now, we can do the vnode related cleanup in devfs_reclaim() instead of in dev_rel() and vgonel(). Similarly, we can do the struct cdev related cleanup in dev_rel() instead of devfs_reclaim(). rename idestroy_dev() to destroy_devl() for consistency. Add LIST_ENTRY de_alias to struct devfs_dirent. Remove v_specnext from struct vnode. Change si_hlist to si_alist in struct cdev. String new devfs vnodes' devfs_dirent on si_alist when we create them and take them off in devfs_reclaim(). Fix devfs_revoke() accordingly. Also don't clear fields devfs_reclaim() will clear when called from vgone(); Let devfs_reclaim() call dev_rel() instead of vgonel(). Move the usecount tracking from dev_rel() to devfs_reclaim(), and let dev_rel() take a struct cdev argument instead of vnode. Destroy SI_CHEAPCLONE devices in dev_rel() (instead of devfs_reclaim()) when they are no longer used. (This should maybe happen in devfs_close() instead.)
|
#
7fc940b2 |
|
22-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove vfinddev(), it is generally bogus when faced with jails and chroot and has no legitimate use(r)s in the tree.
|
#
4d8ac58b |
|
17-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce vx_wait{l}() and use it instead of home-rolled versions.
|
#
1ba21282 |
|
09-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Make various vnode related functions static
|
#
8d63297e |
|
10-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add __printflike() to vn_printf()
|
#
88e5b12a |
|
08-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Drag another softupdates tentacle back into FFS: Now that FFS's vop_fsync is separate from the internal use we can do the full job there.
|
#
7c5d36fb |
|
07-Feb-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove vop_stddestroyvobject()
|
#
d4eb29ba |
|
28-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove unused argument to vrecycle()
|
#
7146d6cb |
|
28-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the contents of vop_stddestroyvobject() to the new vnode_pager function vnode_destroy_vobject(). Make the new function zero the vp->v_object pointer so we can tell if a call is missing.
|
#
729fcf7e |
|
24-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now.
|
#
69816ea3 |
|
24-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem for a given vnode to create a vnode_pager object if one is needed.
|
#
d07a6d3f |
|
24-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the body of vop_stdcreatevobject() over to the vnode_pager under the name Sande^H^H^H^H^Hvnode_create_vobject(). Make the new function take a size argument which removes the need for a VOP_STAT() or a very pessimistic guess for disks. Call that new function from vop_stdcreatevobject(). Make vnode_pager_alloc() private now that its only user came home.
|
#
7c93282e |
|
24-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Change vprint() to vn_printf() which takes varargs. Add #define for vprint() to call vn_printf().
|
#
35764be3 |
|
24-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Kill the VV_OBJBUF and test the v_object for NULL instead.
|
#
49bc5dce |
|
24-Jan-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add a VCANRECYCLE() which performs all the checks required to ensure that we are free to release a vnode.
|
#
7c0745ee |
|
14-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Eliminate unused and unnecessary "cred" argument from vinvalbuf()
|
#
e39db32a |
|
12-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.
|
#
de8a6c06 |
|
13-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Get rid of the VDESC() macro while the pot is boiling anyway, it is only used from generate files now, so we might as well generate the right stuff from the start.
|
#
63f89abf |
|
13-Jan-2005 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Change the generated VOP_ macro implementations to improve type checking and KASSERT coverage. After this check there is only one "nasty" cast in this code but there is a KASSERT to protect against the wrong argument structure behind that cast. Un-inlining the meat of VOP_FOO() saves 35kB of text segment on a typical kernel with no change in performance. We also now run the checking and tracing on VOP's which have been layered by nullfs, umapfs, deadfs or unionfs. Add new (non-inline) VOP_FOO_AP() functions which take a "struct foo_args" argument and does everything the VOP_FOO() macros used to do with checks and debugging code. Add KASSERT to VOP_FOO_AP() check for argument type being correct. Slim down VOP_FOO() inline functions to just stuff arguments into the struct foo_args and call VOP_FOO_AP(). Put function pointer to VOP_FOO_AP() into vop_foo_desc structure and make VCALL() use it instead of the current offsetoff() hack. Retire vcall() which implemented the offsetoff() Make deadfs and unionfs use VOP_FOO_AP() calls instead of VCALL(), we know which specific call we want already. Remove unneeded arguments to VCALL() in nullfs and umapfs bypass functions. Remove unused vdesc_offset and VOFFSET(). Generally improve style/readability of the generated code.
|
#
60727d8b |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for license, minor formatting changes
|
#
10eee285 |
|
22-Dec-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Shuffle numeric values of the IO_* flags to match the O_* flags from fcntl.h. This is in preparation for making the flags passed to device drivers be consistently from fcntl.h for all entrypoints. Today open, close and ioctl uses fcntl.h flags, while read and write uses vnode.h flags.
|
#
e87047b4 |
|
20-Dec-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
We can only ever get to vgonechrl() from a devfs vnode, so we do not need to reassign the vp->v_op to devfs_specops, we know that is the value already. Make devfs_specops private to devfs.
|
#
d3ecc7aa |
|
11-Dec-2004 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Revert rev 1.259. The null-pointer function call (a dereference on ia64) was not the result of a change in the vector operations. It was caused by the NFS locking code using a FIFO and those bypassing the vnode. This indirectly caused the panic. The NFS locking code has been changed. Requested by: phk
|
#
20a92a18 |
|
07-Dec-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go: cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility. ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom() Rely on vfs_mount.c calling VFS_STATFS() Remove vfs_omount() method, all filesystems are now converted. Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now. Change rootmounting to use DEVFS trampoline: vfs_mount.c: Mount devfs on /. Devfs needs no 'from' so this is clean. symlink /dev to /. This makes it possible to lookup /dev/foo. Mount "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem. Remove now unnecessary getdiskbyname(). kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c. Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer. Remove rootdev variable, it doesn't give meaning in a global context and was not trustworth anyway. Correct information is provided by statfs(/).
|
#
061f5ec8 |
|
05-Dec-2004 |
Marcel Moolenaar <marcel@FreeBSD.org> |
Fix null-pointer indirect function calls introduced in the previous commit. In the new world order, the transitive closure on the vector operations is not precomputed. As such, it's unsafe to actually use any of the function pointers in an indirect function call. They can be null, and we need to use the default vector in that case. This is mostly a quick fix for the four function pointers that are ed explicitly. A more generic or scalable solution is likely to see the light of day. No pathos on: current@
|
#
aec0fb7b |
|
01-Dec-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals. Adjust to more contemporary circumstances and gain type checking. Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place. Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc. Give coda correct prototypes and function definitions for all vop_()s. Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods. Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector. Remove a lot of vfs_init since vop_vector is ready to use from the compiler. Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc. Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse. Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
|
#
db442506 |
|
13-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Eliminate vop_revoke() function now that devfs_revoke() does the entire job.
|
#
c5b846fe |
|
10-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Slim vnodes by another four bytes by eliminating the (now) unused field v_cachedid.
|
#
c13a4e88 |
|
10-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove vn_todev()
|
#
b797084e |
|
09-Nov-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove vnode->v_cachedfs. It was only used for the highly dangerous "export all vnodes with a sysctl" function.
|
#
20eba72f |
|
27-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the syncer linkage from vnode to bufobj. This is not quite a perfect separation: the syncer still think it knows that everything is a vnode.
|
#
156cb265 |
|
25-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Loose the v_dirty* and v_clean* alias macros. Check the count field where we just want to know the full/empty state, rather than using TAILQ_EMPTY() or TAILQ_FIRST().
|
#
ee1d0eb3 |
|
25-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove vnode->v_bsize. This was a dead-end.
|
#
4dcd0ac4 |
|
25-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Collapse vnode->v_object and buf->b_object into bufobj->bo_object.
|
#
ff7c5a48 |
|
22-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite jest, of most excellent fancy: he hath taught me lessons a thousand times; and now, how abhorred in my imagination it is! my gorge rises at it. Here were those hacks that I have curs'd I know not how oft. Where be your kludges now? your workarounds? your layering violations, that were wont to set the table on a roar? Move the skeleton of specfs into devfs where it now belongs and bury the rest.
|
#
a76d8f4e |
|
21-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup(). Use these functions all relevant places except in ffs_softdep.c where the use if interlocked_sleep() makes this impossible. Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
|
#
1bca607b |
|
21-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx. Initialize the bo_mtx when we allocate a vnode i getnewvnode() For now we point to the vnodes interlock mutex, that retains the exact same locking sematics. Move v_numoutput from vnode to bufobj. Add renaming macro to postpone code sweep.
|
#
0bf87424 |
|
20-Oct-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add new function ttyinitmode() which sets our systemwide default modes on a tty structure. Both the ".init" and the current settings are initialized allowing the function to be used both at attach and open time. The function takes an argument to decide if echoing should be enabled by default. Echoing should not be enabled for regular physical serial ports unless they are consoles, in which case they should be configured by ttyconsolemode() instead. Use the new function throughout.
|
#
1affa3ad |
|
07-Sep-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Create simple function init_va_filerev() for initializing a va_filerev field. Replace three instances of longhaired initialization va_filerev fields. Added XXX comment wondering why we don't use random bits instead of uptime of the system for this purpose.
|
#
ad3b9257 |
|
15-Aug-2004 |
John-Mark Gurney <jmg@FreeBSD.org> |
Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)
|
#
39dfb406 |
|
10-Aug-2004 |
Robert Watson <rwatson@FreeBSD.org> |
Modify vnode locking key: the v_pollinfo pointer itself is protected by Giant; the contents are protected by the pollinfo mutex. We rely on Giant to prevent races in assigning the value of v_pollinfo.
|
#
f257b7a5 |
|
12-Jul-2004 |
Alfred Perlstein <alfred@FreeBSD.org> |
Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
|
#
1cbb1e02 |
|
03-Jul-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Blocksize for I/O should be a property of the vnode and not found by groping around in the vnodes surroundings when we allocate a block. Assign a blocksize when we create a vnode, and yell a warning (and ignore it) if we got the wrong size. Please email all such warnings to me.
|
#
f3732fd1 |
|
17-Jun-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.
|
#
89c9c53d |
|
16-Jun-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Do the dreaded s/dev_t/struct cdev */ Bump __FreeBSD_version accordingly.
|
#
f99619a0 |
|
04-Jun-2004 |
Tim J. Robbins <tjr@FreeBSD.org> |
Change the types of vn_rdwr_inchunks()'s len and aresid arguments to size_t and size_t *, respectively. Update callers for the new interface. This is a better fix for overflows that occurred when dumping segments larger than 2GB to core files.
|
#
82c6e879 |
|
06-Apr-2004 |
Warner Losh <imp@FreeBSD.org> |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999. Approved by: core
|
#
b21126c6 |
|
29-Mar-2004 |
Peter Wemm <peter@FreeBSD.org> |
Clean up the stub fake vnode locking implemenations. The main reason this stuff was here (NFS) was fixed by Alfred in November. The only remaining consumer of the stub functions was umapfs, which is horribly horribly broken. It has missed out on about the last 5 years worth of maintenence that was done on nullfs (from which umapfs is derived). It needs major work to bring it up to date with the vnode locking protocol. umapfs really needs to find a caretaker to bring it into the 21st century. Functions GC'ed: vop_noislocked, vop_nolock, vop_nounlock, vop_sharedlock.
|
#
651b11ea |
|
11-Mar-2004 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.
|
#
64f4c8e8 |
|
05-Jan-2004 |
Alexander Kabaev <kan@FreeBSD.org> |
Properly ifdef support for vfs locking assertions based on DEBUG_VFS_LOCKS. Obtained from: bde
|
#
89144d1a |
|
05-Jan-2004 |
Alexander Kabaev <kan@FreeBSD.org> |
Style fixes: Remove double empty lines. Add tab after #define's. Properly terminate sentences in comments. Obtained from: bde (mostly).
|
#
9efe7d9d |
|
28-Dec-2003 |
Bruce Evans <bde@FreeBSD.org> |
v_vxproc was a bogus name for a thread (pointer).
|
#
eca8a663 |
|
11-Nov-2003 |
Robert Watson <rwatson@FreeBSD.org> |
Modify the MAC Framework so that instead of embedding a (struct label) in various kernel objects to represent security data, we embed a (struct label *) pointer, which now references labels allocated using a UMA zone (mac_label.c). This allows the size and shape of struct label to be varied without changing the size and shape of these kernel objects, which become part of the frozen ABI with 5-STABLE. This opens the door for boot-time selection of the number of label slots, and hence changes to the bound on the number of simultaneous labeled policies at boot-time instead of compile-time. This also makes it easier to embed label references in new objects as required for locking/caching with fine-grained network stack locking, such as inpcb structures. This change also moves us further in the direction of hiding the structure of kernel objects from MAC policy modules, not to mention dramatically reducing the number of '&' symbols appearing in both the MAC Framework and MAC policy modules, and improving readability. While this results in minimal performance change with MAC enabled, it will observably shrink the size of a number of critical kernel data structures for the !MAC case, and should have a small (but measurable) performance benefit (i.e., struct vnode, struct socket) do to memory conservation and reduced cost of zeroing memory. NOTE: Users of MAC must recompile their kernel and all MAC modules as a result of this change. Because this is an API change, third party MAC modules will also need to be updated to make less use of the '&' symbol. Suggestions from: bmilekic Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
#
ca430f2e |
|
04-Nov-2003 |
Alexander Kabaev <kan@FreeBSD.org> |
Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently. Eventually new mutex will be protecting more fields in struct mount, not only vnode list. Discussed with: jeff
|
#
06cb76bd |
|
23-Oct-2003 |
Garrett Wollman <wollman@FreeBSD.org> |
Add appropriate const poisoning to the assert_*locked() family so that I can call ASSERT_VOP_LOCKED(vp, __func__) without a diagnostic. Inspired by: the evil and rude OpenAFS cache manager code
|
#
c31e9345 |
|
04-Oct-2003 |
Jeff Roberson <jeff@FreeBSD.org> |
- Document more of the vnode locking strategy.
|
#
7c89f162 |
|
27-Jul-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
|
#
b6f5d1d6 |
|
09-Jul-2003 |
Jeffrey Hsu <hsu@FreeBSD.org> |
Replace custom field offset macro with the system __offsetof() macro. Reviewed by: bde
|
#
17a13919 |
|
31-May-2003 |
Poul-Henning Kamp <phk@FreeBSD.org> |
The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.
|
#
adf4e1d5 |
|
26-Apr-2003 |
Alan Cox <alc@FreeBSD.org> |
Remove an unused declaration.
|
#
9c926592 |
|
09-Apr-2003 |
Mike Barcroft <mike@FreeBSD.org> |
Add prototypes for change_root() and change_dir().
|
#
3a7053cb |
|
24-Feb-2003 |
Kirk McKusick <mckusick@FreeBSD.org> |
Prevent large files from monopolizing the system buffers. Keep track of the number of dirty buffers held by a vnode. When a bdwrite is done on a buffer, check the existing number of dirty buffers associated with its vnode. If the number rises above vfs.dirtybufthresh (currently 90% of vfs.hidirtybuffers), one of the other (hopefully older) dirty buffers associated with the vnode is written (using bawrite). In the event that this approach fails to curb the growth in it the vnode's number of dirty buffers (due to soft updates rollback dependencies), the more drastic approach of doing a VOP_FSYNC on the vnode is used. This code primarily affects very large and actively written files such as snapshots. This change should eliminate hanging when taking snapshots or doing background fsck on very large filesystems. Hopefully, one day it will be possible to cache filesystem metadata in the VM cache as is done with file data. As it stands, only the buffer cache can be used which limits total metadata storage to about 20Mb no matter how much memory is available on the system. This rather small memory gets badly thrashed causing a lot of extra I/O. For example, taking a snapshot of a 1Tb filesystem minimally requires about 35,000 write operations, but because of the cache thrashing (we only have about 350 buffers at our disposal) ends up doing about 237,540 I/O's thus taking twenty-five minutes instead of four if it could run entirely in the cache. Reported by: Attila Nagy <bra@fsn.hu> Sponsored by: DARPA & NAI Labs.
|
#
767b9a52 |
|
09-Feb-2003 |
Jeff Roberson <jeff@FreeBSD.org> |
- Cleanup unlocked accesses to buf flags by introducing a new b_vflag member that is protected by the vnode lock. - Move B_SCANNED into b_vflags and call it BV_SCANNED. - Create a vop_stdfsync() modeled after spec's sync. - Replace spec_fsync, msdos_fsync, and hpfs_fsync with the stdfsync and some fs specific processing. This gives all of these filesystems proper behavior wrt MNT_WAIT/NOWAIT and the use of the B_SCANNED flag. - Annotate the locking in buf.h
|
#
6a1b2a22 |
|
29-Dec-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Add a new vnode flag VI_DOINGINACT to indicate that a VOP_INACTIVE call is in progress on the vnode. When vput() or vrele() sees a 1->0 reference count transition, it now return without any further action if this flag is set. This flag is necessary to avoid recursion into VOP_INACTIVE if the filesystem inactive routine causes the reference count to increase and then drop back to zero. It is also used to guarantee that an unlocked vnode will not be recycled while blocked in VOP_INACTIVE(). There are at least two cases where the recursion can occur: one is that the softupdates code called by ufs_inactive() via ffs_truncate() can call vput() on the vnode. This has been reported by many people as "lockmgr: draining against myself" panics. The other case is that nfs_inactive() can call vget() and then vrele() on the vnode to clean up a sillyrename file. Reviewed by: mckusick (an older version of the patch)
|
#
45587e25 |
|
28-Dec-2002 |
Matthew Dillon <dillon@FreeBSD.org> |
Abstract-out the constants for the sequential heuristic. No operational changes. MFC after: 1 day
|
#
c7047e52 |
|
27-Oct-2002 |
Garrett Wollman <wollman@FreeBSD.org> |
Change the way support for asynchronous I/O is indicated to applications to conform to 1003.1-2001. Make it possible for applications to actually tell whether or not asynchronous I/O is supported. Since FreeBSD's aio implementation works on all descriptor types, don't call down into file or vnode ops when [f]pathconf() is asked about _PC_ASYNC_IO; this avoids the need for every file and vnode op to know about it.
|
#
9ab73fd1 |
|
24-Oct-2002 |
Kirk McKusick <mckusick@FreeBSD.org> |
Within ufs, the ffs_sync and ffs_fsync functions did not always check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.
|
#
7d2eb23f |
|
12-Oct-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Remove the do { } while(0) from the VOP lock assert macros. This was not optimized away by the compiler in time for it to still leave the VOP functions as inlines. Submitted by: bde
|
#
5c5f6cfa |
|
28-Sep-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Don't use unnamed anonymous structs: give it a name.
|
#
ca916247 |
|
27-Sep-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Rename struct specinfo to the more appropriate struct cdev. Agreed on: jake, rwatson, jhb
|
#
6423c943 |
|
25-Sep-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Move ASSERT_VOP_*LOCK* functionality into functions in vfs_subr.c - Make the VI asserts more orthogonal to the rest of the asserts by using a new, common vfs_badlock() function and adding a 'str' arg. - Adjust generated ASSERTS to match the new prototype. - Adjust explicit ASSERTS to match the new prototype.
|
#
6cb8bf20 |
|
24-Sep-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Lock down the syncer with sync_mtx. - Enable vfs_badlock_mutex by default. - Assert that the vp is locked in VOP_UNLOCK. - Use standard interlock macros in remaining code. - Correct a race in getnewvnode(). - Lock access to v_numoutput with interlock. - Lock access to buf lists and splay tree with interlock. - Add VOP and VI asserts. - Lock b_vnbufs with the vnode interlock. - Add vrefcnt() for callers who want to retreive the vnode ref without holding a lock. Add a comment that describes when this is safe. - Add vholdl() and vdropl() so that callers who already own the interlock can avoid race conditions and unnecessary unlocking. - Move the VOP_GETATTR() in vflush() into the WRITECLOSE conditional case. - Hold the interlock before droping the mntlist_mtx in vflush() to avoid a race. - Fix locking in vfs_msync().
|
#
9a0a3322 |
|
24-Sep-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Finish the struct vnode lock annotation. - Order fields by what lock is required to access them.
|
#
cc8e7533 |
|
23-Sep-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Include sys/ktr.h so that vnode_if.h can define trace points.
|
#
06be2aaa |
|
14-Sep-2002 |
Nate Lawson <njl@FreeBSD.org> |
Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)
|
#
bb77b793 |
|
10-Sep-2002 |
Bruce Evans <bde@FreeBSD.org> |
Fixed namespace pollution in uma changes: - use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite. - don't include <sys/uma.h>. Namespace pollution makes "opaque" types like uma_zone_t perfectly non-opaque. Such types should never be used (see style(9)). "Fixed" subsequently grown dependencies of this header on its own pollution by polluting explicitly: - include <sys/mutex.h> and its prerequisite <sys/lock.h> instead of depending on namespace pollution 2 layers deep in <sys/uma.h>.
|
#
8f19eb88 |
|
01-Sep-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Split out a number of mostly VFS and signal related syscalls into a kernel-internal kern_*() version and a wrapper that is called via the syscall vector table. For paths and structure pointers, the internal version either takes a uio_seg parameter or requires the caller to copyin() the data to kernel memory as appropiate. This will permit emulation layers to use these syscalls without having to copy out translated arguments to the stack gap. Discussed on: -arch Review/suggestions: bde, jhb, peter, marcel
|
#
71ea4ba5 |
|
21-Aug-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED - Use the new VI asserts in place of the old mtx_assert checks. - Add the VI asserts to the automated lock checking in the VOP calls. The interlock should not be held across vops with a few exceptions. - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held when LK_INTERLOCK is set.
|
#
ea6027a8 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Make similar changes to fo_stat() and fo_poll() as made earlier to fo_read() and fo_write(): explicitly use the cred argument to fo_poll() as "active_cred" using the passed file descriptor's f_cred reference to provide access to the file credential. Add an active_cred argument to fo_stat() so that implementers have access to the active credential as well as the file credential. Generally modify callers of fo_stat() to pass in td->td_ucred rather than fp->f_cred, which was redundantly provided via the fp argument. This set of modifications also permits threads to perform these operations on behalf of another thread without modifying their credential. Trickle this change down into fo_stat/poll() implementations: - badfo_poll(), badfo_stat(): modify/add arguments. - kqueue_poll(), kqueue_stat(): modify arguments. - pipe_poll(), pipe_stat(): modify/add arguments, pass active_cred to MAC checks rather than td->td_ucred. - soo_poll(), soo_stat(): modify/add arguments, pass fp->f_cred rather than cred to pru_sopoll() to maintain current semantics. - sopoll(): moidfy arguments. - vn_poll(), vn_statfile(): modify/add arguments, pass new arguments to vn_stat(). Pass active_cred to MAC and fp->f_cred to VOP_POLL() to maintian current semantics. - vn_close(): rename cred to file_cred to reflect reality while I'm here. - vn_stat(): Add active_cred and file_cred arguments to vn_stat() and consumers so that this distinction is maintained at the VFS as well as 'struct file' layer. Pass active_cred instead of td->td_ucred to MAC and to VOP_GETATTR() to maintain current semantics. - fifofs: modify the creation of a "filetemp" so that the file credential is properly initialized and can be used in the socket code if desired. Pass ap->a_td->td_ucred as the active credential to soo_poll(). If we teach the vnop interface about the distinction between file and active credentials, we would use the active credential here. Note that current inconsistent passing of active_cred vs. file_cred to VOP's is maintained. It's not clear why GETATTR would be authorized using active_cred while POLL would be authorized using file_cred at the file system level. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
9ca43589 |
|
15-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to cliarfy which credential is used for what: - Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c. For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics: - badfo_readwrite() unchanged - kqueue_read/write() unchanged pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics. Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED. These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations. Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
01abbb42 |
|
13-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Move to a nested include of _label.h instead of mac.h in sys/sys/*.h (Most of the places where mac.h was recursively included from another kernel header file. net/netinet to follow.) Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs Suggested by: bde
|
#
62c0c263 |
|
11-Aug-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Introduce IO_NOMACCHECK, a flag that will be passed to vn_rdwr() to indicate that the calling code has already performed necessary MAC checks (if any) for this operation. This flag will help resolve layering problems that existing because vn_rdwr() is called both on behalf of user processes directly (such as in system calls of various sorts, during core dumps, etc), as well as deep in the file system code on behalf of the file system (such as in UFS, ext2fs, etc). Code that is acting on behalf of a kernel service rather than explicitly on behalf of a user process will specify this flag. By default, MAC checks will be performed (and generally should be performed). Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
780bb93d |
|
07-Aug-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Adjust locking markup to match the proc markup. - Add a comment about the current, unfinished, state of vnode locking. Suggested by: bde
|
#
973a3f4b |
|
05-Aug-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Document more of the struct vnode locking protocol. - Slightly reformat a comment block.
|
#
e6e370a7 |
|
04-Aug-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
|
#
4eee8de7 |
|
30-Jul-2002 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Introduce struct xvnode, which will be used instead of struct vnode for sysctl purposes. Also add two fields to struct vnode, v_cachedfs and v_cachedid, which hold the vnode's device and file id and are filled in by vn_open_cred() and vn_stat(). Sponsored by: DARPA, NAI Labs
|
#
f3cfa607 |
|
30-Jul-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Begin committing support for Mandatory Access Control and extensible kernel access control. The MAC framework permits loadable kernel modules to link to the kernel at compile-time, boot-time, or run-time, and augment the system security policy. This commit includes the initial kernel implementation, although the interface with the userland components of the oeprating system is still under work, and not all kernel subsystems are supported. Later in this commit sequence, documentation of which kernel subsystems will not work correctly with a kernel compiled with MAC support will be added. Label vnodes, permitting security information to maintained at the granularity of the individual file, directory (et al). This data is protected by the vnode lock and may be read only when holding a shared lock, or modified only when holding an exclusive lock. Label information may be considered either the primary copy, or a cached copy. Individual file systems or kernel services may use the VCACHEDLABEL flag for accounting purposes to determine which it is. New VOPs will be introduced to refresh this label on demand, or to set the label value. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
c72f085d |
|
30-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add vfs_badlock_{print,panic} support to the remaining VOP_ASSERT_* macros.
|
#
3b82505b |
|
29-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add VBAD to the list of vnodes that are ignored on locking operations.
|
#
586261a3 |
|
27-Jul-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Reserve VCACHEDLABEL vnode flag for use by the TrustedBSD MAC implementation. This flag will indicate that the security label in the vnode is currently valid, and therefore doesn't need to be refreshed before an access control decision can be made. Most file systems (or stdvops) will set this flag after they load the MAC label from disk the first time to prevent redundant disk I/O; some file synthetic file systems (procfs, for example) may not. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
ccac0940 |
|
21-Jul-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Add VALLPERM, which is a mask of all the access control request permission bits for vnodes passed to vaccess() and friends. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
230d10f6 |
|
21-Jul-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Sort vnode access mode flags. Add flags VSTAT, VAPPEND required for TrustedBSD. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
7aca6291 |
|
19-Jul-2002 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage. Sponsored by: DARPA & NAI Labs.
|
#
faab4e27 |
|
16-Jul-2002 |
Kirk McKusick <mckusick@FreeBSD.org> |
Change the name of st_createtime to st_birthtime. This change is made to reduce confusion between st_ctime and st_createtime. Submitted by: Eric Allman <eric@sendmail.org> Sponsored by: DARPA & NAI Labs.
|
#
d331c5d4 |
|
10-Jul-2002 |
Matthew Dillon <dillon@FreeBSD.org> |
Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc
|
#
f00501bd |
|
09-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Remove IS_LOCKING_VFS() all of our filesystems support locking now - Add IGNORE_LOCK() that only ignores VCHR files for now since no one locks their underlying device in the leaf filesystems. (devvp) - Add prototypes for vop_lookup_{pre,post} that I forgot before.
|
#
9725e58e |
|
07-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- VT_PSEUDOFS and VT_PROCFS support locking now - Remove VBLK from the list of vtypes that are ignored for locking ops.
|
#
302c7aaa |
|
05-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
- Add vop_strategy_pre to validate VOP_STRATEGY locking. - Disable original vop_strategy lock specification. - Switch to the new vop_strategy_pre for lock validation. VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need to be translated. There may be other reasons, but as long as the underlying layer uses a VOP to perform the operations they will be caught later.
|
#
cc8662b0 |
|
05-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Add "vop_rename_pre" to do pre rename lock verification. This is enabled only with DEBUG_VFS_LOCKS.
|
#
d0641e4e |
|
04-Jul-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Cleanups for vnode lock debugging. - Tell IS_LOCKING_VFS to ignore block and character devices. specfs vnodes aren't locked for io and they just generate lots of false positives. - Add newlines to the badlock prints.
|
#
6bd521df |
|
01-Jul-2002 |
Ian Dowse <iedowse@FreeBSD.org> |
Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick
|
#
90769c9e |
|
28-Jun-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Improve the VOP locking asserts - Add vfs_badlock_print to control whether or not we print lock violations - Add vfs_badlock_panic to control whether we panic on lock violations Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.
|
#
1c85e6a3 |
|
21-Jun-2002 |
Kirk McKusick <mckusick@FreeBSD.org> |
This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
|
#
d394511d |
|
16-May-2002 |
Tom Rhodes <trhodes@FreeBSD.org> |
More s/file system/filesystem/g
|
#
84f9ed84 |
|
29-Apr-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Since devfs now uses vnode locks, add devfs back to IS_LOCKING_VFS.
|
#
ae6b9ab0 |
|
25-Apr-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Add UDF to the list of filesystems where locking assertions should be evaluated. Approved by: scottl
|
#
dbb17987 |
|
25-Apr-2002 |
Robert Watson <rwatson@FreeBSD.org> |
1.43 (dfr 04-Apr-97): /* 1.43 (dfr 04-Apr-97): * [dfr] Kludge until I get around to fixing all the vfs locking. 1.43 (dfr 04-Apr-97): */ The new devfs doesn't support VFS locking. So don't do locking assertions for devfs vnodes. With this change, a kernel with options DEBUG_VFS_LOCKS actually gets to single-user mode. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
#
df263cbd |
|
14-Apr-2002 |
Scott Long <scottl@FreeBSD.org> |
Add a filesystem driver for the Universal Disk Format. For more info, see http://people.freebsd.org/~scottl/udf MFC after: when asmodai gets the backport done Prodded by: phk asmodai des
|
#
c897b813 |
|
19-Mar-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.
|
#
789f12fe |
|
19-Mar-2002 |
Alfred Perlstein <alfred@FreeBSD.org> |
Remove __P
|
#
8355f576 |
|
19-Mar-2002 |
Jeff Roberson <jeff@FreeBSD.org> |
This is the first part of the new kernel memory allocator. This replaces malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
|
#
68edc1b9 |
|
18-Feb-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Make v_addpollinfo() visible and non-inline. Have callers only call it as needed. Add necessary call in ufs_kqfilter(). Test-case found by: Andrew Gallatin <gallatin@cs.duke.edu>
|
#
4b55dbe3 |
|
17-Feb-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the stuff related to select and poll out of struct vnode. The use of the zone allocator may or may not be overkill. There is an XXX: over in ufs/ufs/ufs_vnops.c that jlemon may need to revisit. This shaves about 60 bytes of struct vnode which on my laptop means 600k less RAM used for vnodes.
|
#
e8b26e99 |
|
17-Feb-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Collect the VN_KNOTE() macro definitions on vnode.h
|
#
038b6417 |
|
17-Feb-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
v_lease is unused, zap it.
|
#
079b7bad |
|
07-Feb-2002 |
Julian Elischer <julian@FreeBSD.org> |
Pre-KSE/M3 commit. this is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embeded there but remove all teh code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
|
#
23b59018 |
|
20-Dec-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release
|
#
fdb33f08 |
|
18-Dec-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
This is a forward port of Peter's vlrureclaim() fix, with some minor mods by me to make it more efficient. The original code had serious balancing problems and could also deadlock easily. This code relegates the vnode reclamation to its own kproc and relaxes the vnode reclamation requirements to better maintain kern.maxvnodes. This code still doesn't balance as well as it could, but it does a much better job then the original code. Approved by: re@freebsd.org Obtained from: ps, peter, dillon MFS Assuming: Assuming no problems crop up in Yahoo testing MFC after: 7 days
|
#
f03e89de |
|
11-Nov-2001 |
Alfred Perlstein <alfred@FreeBSD.org> |
turn vn_open() into a wrapper around vn_open_cred() which allows one to perform a vn_open using temporary/other/fake credentials. Modify the nfs client side locking code to use vn_open_cred() passing proc0's ucred instead of the old way which was to temporary raise privs while running vn_open(). This should close the race hopefully.
|
#
7e76bb56 |
|
05-Nov-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whos strategy routines turn-around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer of buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week
|
#
4ffa210b |
|
27-Oct-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
syncdelay, filedelay, dirdelay, metadelay are ints, not time_t's, and can also be made static.
|
#
245df27c |
|
25-Oct-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week
|
#
c72ccd01 |
|
22-Oct-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days
|
#
45fb069a |
|
21-Oct-2001 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Convert textvp_fullpath() into the more generic vn_fullpath() which takes a struct thread * and a struct vnode * instead of a struct proc *. Temporarily add a textvp_fullpath macro for compatibility.
|
#
b5810bab |
|
30-Sep-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
After extensive testing it has been determined that adding complexity to avoid removing higher level directory vnodes from the namecache has no perceivable effect and will be removed. This is especially true when vmiodirenable is turned on, which it is by default now. ( vmiodirenable makes a huge difference in directory caching ). The vfs.vmiodirenable and vfs.nameileafonly sysctls have been left in to allow further testing, but I expect to rip out vfs.nameileafonly soon too. I have also determined through testing that the real problem with numvnodes getting too large is due to the VM Page cache preventing the vnode from being reclaimed. The directory stuff made only a tiny dent relative to Poul's original code, enough so that some tests succeeded. But tests with several million small files show that the bigger problem is the VM Page cache. This will have to be addressed by a future commit. MFC after: 3 days
|
#
2af8d76d |
|
13-Sep-2001 |
David E. O'Brien <obrien@FreeBSD.org> |
Re-apply rev 1.178 -- style(9) the structure definitions. I have to wonder how many other changes were lost in the KSE mildstone 2 merge.
|
#
b40ce416 |
|
12-Sep-2001 |
Julian Elischer <julian@FreeBSD.org> |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
|
#
06ae1e91 |
|
08-Sep-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
This brings in a Yahoo coredump patch from Paul, with additional mods by me (addition of vn_rdwr_inchunks). The problem Yahoo is solving is that if you have large process images core dumping, or you have a large number of forked processes all core dumping at the same time, the original coredump code would leave the vnode locked throughout. This can cause the directory vnode to get locked up, which can cause the parent directory vnode to get locked up, and so on all the way to the root node, locking the entire machine up for extremely long periods of time. This patch solves the problem in two ways. First it uses an advisory non-blocking lock to abort multiple processes trying to core to the same file. Second (my contribution) it chunks up the writes and uses bwillwrite() to avoid holding the vnode locked while blocking in the buffer cache. Submitted by: ps Reviewed by: dillon MFC after: 2 weeks
|
#
0f728902 |
|
27-Aug-2001 |
Peter Wemm <peter@FreeBSD.org> |
If a file has been completely unlinked, stop automatically syncing the file. ffs will discard any pending dirty pages when it is closed, so we may as well not waste time trying to clean them. This doesn't stop other things from writing it out, eg: pageout, fsync(2) etc.
|
#
753d4978 |
|
29-May-2001 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove MFS
|
#
ac8f990b |
|
24-May-2001 |
Matthew Dillon <dillon@FreeBSD.org> |
This patch implements O_DIRECT about 80% of the way. It takes a patchset Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon
|
#
0864ef1e |
|
16-May-2001 |
Ian Dowse <iedowse@FreeBSD.org> |
Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp
|
#
a62615e5 |
|
01-May-2001 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Implement vop_std{get|put}pages() and add them to the default vop[]. Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but the default.
|
#
fb919e4d |
|
01-May-2001 |
Mark Murray <markm@FreeBSD.org> |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
|
#
b7ebffbc |
|
29-Apr-2001 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Add a vop_stdbmap(), and make it part of the default vop vector. Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
|
#
1690d305 |
|
22-Apr-2001 |
David E. O'Brien <obrien@FreeBSD.org> |
Removed old version of vaccess_acl_posix1e() that snuck back in rev 1.146. Submitted by (with good eye): Niels Chr. Bank-Pedersen <ncbp@bank-pedersen.dk>
|
#
ea88c01d |
|
21-Apr-2001 |
David E. O'Brien <obrien@FreeBSD.org> |
Style(9) fixes: * get rid of space (0x20) before tab (^I) * indent with ^I, not 0x20 * continuation line for prototypes is for 0x20's past function's name col. * etc.
|
#
759cb263 |
|
18-Apr-2001 |
Seigo Tanimura <tanimura@FreeBSD.org> |
Reclaim directory vnodes held in namecache if few free vnodes are available. Only directory vnodes holding no child directory vnodes held in v_cache_src are recycled, so that directory vnodes near the root of the filesystem hierarchy remain in namecache and directory vnodes are not reclaimed in cascade. The period of vnode reclaiming attempt and the number of vnodes attempted to reclaim can be tuned via sysctl(2). Suggested by: tegge Approved by: phk
|
#
f84e29a0 |
|
17-Apr-2001 |
Poul-Henning Kamp <phk@FreeBSD.org> |
This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
|
#
b114e127 |
|
16-Apr-2001 |
Robert Watson <rwatson@FreeBSD.org> |
In my first reading of POSIX.1e, I misinterpreted handling of the ACL_USER_OBJ and ACL_GROUP_OBJ fields, believing that modification of the access ACL could be used by privileged processes to change file/directory ownership. In fact, this is incorrect; ACL_*_OBJ (+ ACL_MASK and ACL_OTHER) should have undefined ae_id fields; this commit attempts to correct that misunderstanding. o Modify arguments to vaccess_acl_posix1e() to accept the uid and gid associated with the vnode, as those can no longer be extracted from the ACL passed as an argument. Perform all comparisons against the passed arguments. This actually has the effect of simplifying a number of components of this call, as well as reducing the indent level, but now seperates handling of ACL_GROUP_OBJ from ACL_GROUP. o Modify acl_posix1e_check() to return EINVAL if the ae_id field of any of the ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} entries is a value other than ACL_UNDEFINED_ID. As a temporary work-around to allow clean upgrades, set the ae_id field to ACL_UNDEFINED_ID before each check so that this cannot cause a failure in the short term (this work-around will be removed when the userland libraries and utilities are updated to take this change into account). o Modify ufs_sync_acl_from_inode() so that it forces ACL_{USER_OBJ,GROUP_OBJ,MASK,OTHER} ae_id fields to ACL_UNDEFINED_ID when synchronizing the ACL from the inode. o Modify ufs_sync_inode_from_acl to not propagate uid and gid information to the inode from the ACL during ACL update. Also modify the masking of permission bits that may be set from ALLPERMS to (S_IRWXU|S_IRWXG|S_IRWXO), as ACLs currently do not carry none-ACCESSPERMS (S_ISUID, S_ISGID, S_ISTXT). o Modify ufs_getacl() so that when it emulates an access ACL from the inode, it initializes the ae_id fields to ACL_UNDEFINED_ID. o Clean up ufs_setacl() substantially since it is no longer possible to perform chown/chgrp operations using vop_setacl(), so all the access control for that can be eliminated. o Modify ufs_access() so that it passes owner uid and gid information into vaccess_acl_posix1e(). Pointed out by: jedger Obtained from: TrustedBSD Project
|
#
0fdabd3a |
|
13-Apr-2001 |
Boris Popov <bp@FreeBSD.org> |
Move VT_SMBFS definition to the proper place. Undefine VI_LOCK/VI_UNLOCK.
|
#
d67fe1bd |
|
28-Mar-2001 |
Dag-Erling Smørgrav <des@FreeBSD.org> |
Prepare for pseudofs.
|
#
30632071 |
|
18-Mar-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Rename "namespace" argument to "attrnamespace" as namespace is a C++ reserved word. Submitted by: jkh Obtained from: TrustedBSD Project
|
#
70f36851 |
|
14-Mar-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Change the API and ABI of the Extended Attribute kernel interfaces to introduce a new argument, "namespace", rather than relying on a first- character namespace indicator. This is in line with more recent thinking on EA interfaces on various mailing lists, including the posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces are defined by default, EXTATTR_NAMESPACE_SYSTEM and EXTATTR_NAMESPACE_USER, where the primary distinction lies in the access control model: user EAs are accessible based on the normal MAC and DAC file/directory protections, and system attributes are limited to kernel-originated or appropriately privileged userland requests. o These API changes occur at several levels: the namespace argument is introduced in the extattr_{get,set}_file() system call interfaces, at the vnode operation level in the vop_{get,set}extattr() interfaces, and in the UFS extended attribute implementation. Changes are also introduced in the VFS extattrctl() interface (system call, VFS, and UFS implementation), where the arguments are modified to include a namespace field, as well as modified to advoid direct access to userspace variables from below the VFS layer (in the style of recent changes to mount by adrian@FreeBSD.org). This required some cleanup and bug fixing regarding VFS locks and the VFS interface, as a vnode pointer may now be optionally submitted to the VFS_EXTATTRCTL() call. Updated documentation for the VFS interface will be committed shortly. o In the near future, the auto-starting feature will be updated to search two sub-directories to the ".attribute" directory in appropriate file systems: "user" and "system" to locate attributes intended for those namespaces, as the single filename is no longer sufficient to indicate what namespace the attribute is intended for. Until this is committed, all attributes auto-started by UFS will be placed in the EXTATTR_NAMESPACE_SYSTEM namespace. o The default POSIX.1e attribute names for ACLs and Capabilities have been updated to no longer include the '$' in their filename. As such, if you're using these features, you'll need to rename the attribute backing files to the same names without '$' symbols in front. o Note that these changes will require changes in userland, which will be committed shortly. These include modifications to the extended attribute utilities, as well as to libutil for new namespace string conversion routines. Once the matching userland changes are committed, a buildworld is recommended to update all the necessary include files and verify that the kernel and userland environments are in sync. Note: If you do not use extended attributes (most people won't), upgrading is not imperative although since the system call API has changed, the new userland extended attribute code will no longer compile with old include files. o Couple of minor cleanups while I'm there: make more code compilation conditional on FFS_EXTATTR, which should recover a bit of space on kernels running without EA's, as well as update copyright dates. Obtained from: TrustedBSD Project
|
#
5293465f |
|
06-Mar-2001 |
Robert Watson <rwatson@FreeBSD.org> |
o Introduce filesystem-independent POSIX.1e ACL utility routines to support implementations of ACLs in file systems. Introduce the following new functions: vaccess_acl_posix1e() vaccess() that accepts an ACL acl_posix1e_mode_to_perm() Convert mode bits to ACL rights acl_posix1e_mode_to_entry() Build ACL entry from mode/uid/gid acl_posix1e_perms_to_mode() Generate file mode from ACL acl_posix1e_check() Syntax verification for ACL These functions allow a file system to rely on central ACL evaluation and syntax checking, as well as providing useful utilities to allow ACL-based file systems to generate mode/owner/etc information to return via VOP_GETATTR(), and to support file systems that split their ACL information over their existing inode storage (mode, uid, gid) and extended ACL into extended attributes (additional users, groups, ACL mask). o Add prototypes for exported functions to sys/acl.h, sys/vnode.h Reviewed by: trustedbsd-discuss, freebsd-arch Obtained from: TrustedBSD Project
|
#
dfe53f15 |
|
21-Feb-2001 |
Boris Popov <bp@FreeBSD.org> |
Add VI_LOCK(), VI_TRYLOCK() and VI_UNLOCK() macros to isolate implementation details of v_interlock. Reviewed by: jhb, phk, arch@
|
#
1b367556 |
|
23-Jan-2001 |
Jason Evans <jasone@FreeBSD.org> |
Convert all simplelocks to mutexes and remove the simplelock implementations.
|
#
0a2c3d48 |
|
08-Jan-2001 |
Garrett Wollman <wollman@FreeBSD.org> |
select() DKI is now in <sys/selinfo.h>.
|
#
2b6b0df7 |
|
26-Dec-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
This implements a better launder limiting solution. There was a solution in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
|
#
936524aa |
|
18-Nov-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
|
#
35e0e5b3 |
|
20-Oct-2000 |
John Baldwin <jhb@FreeBSD.org> |
Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
|
#
47460a23 |
|
19-Oct-2000 |
Robert Watson <rwatson@FreeBSD.org> |
o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform "administrative" authorization checks. In most cases, the VADMIN test checks to make sure the credential effective uid is the same as the file owner. o Modify vaccess() to set VADMIN as an available right if the uid is appropriate. o Modify references to uid-based access control operations such that they now always invoke VOP_ACCESS() instead of using hard-coded policy checks. o This allows alternative UFS policies to be implemented by replacing only ufs_access() (such as mandatory system policies). o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the vnode: I believe that new invocations of VOP_ACCESS() are always called with the lock held. o Some direct checks of the uid remain, largely associated with the QUOTA and SUIDDIR code. Reviewed by: eivind Obtained from: TrustedBSD Project
|
#
a18b1f1d |
|
03-Oct-2000 |
Jason Evans <jasone@FreeBSD.org> |
Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
|
#
67e87166 |
|
25-Sep-2000 |
Boris Popov <bp@FreeBSD.org> |
Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or keept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_no*lock*() functions maintain only interlock lock. vop_std*lock*() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick
|
#
100d2c18 |
|
22-Sep-2000 |
Robert Watson <rwatson@FreeBSD.org> |
o Introduce vn_extattr_rm(), a helper function in the style of vn_extattr_get() and vn_extattr_set(). vn_extattr_rm() removes the specified extended attribute from a vnode, authorizing the change as the kernel (NULL cred). Obtained from: TrustedBSD Project
|
#
210a5403 |
|
22-Sep-2000 |
Eivind Eklund <eivind@FreeBSD.org> |
Remove addalias() prototype (staticized in kern/vfs_subr.c)
|
#
9ff5ce6b |
|
12-Sep-2000 |
Boris Popov <bp@FreeBSD.org> |
Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
|
#
64dc16df |
|
05-Sep-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move extern declaration of dead_vnodeop_p to a .h file. Remove race condition in vn_isdisk().
|
#
012c643d |
|
29-Aug-2000 |
Robert Watson <rwatson@FreeBSD.org> |
o Restructure vaccess() so as to check for DAC permission to modify the object before falling back on privilege. Make vaccess() accept an additional optional argument, privused, to determine whether privilege was required for vaccess() to return 0. Add commented out capability checks for reference. Rename some variables to make it more clear which modes/uids/etc are associated with the object, and which with the access mode. o Update file system use of vaccess() to pass NULL as the optional privused argument. Once additional patches are applied, suser() will no longer set ASU, so privused will permit passing of privilege information up the stack to the caller. Reviewed by: bde, green, phk, -security, others Obtained from: TrustedBSD Project
|
#
e39c53ed |
|
20-Aug-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Centralize the canonical vop_access user/group/other check in vaccess(). Discussed with: bde
|
#
39f70682 |
|
18-Aug-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce vop_stdinactive() and make it the default if no vop_inactive is declared. Sort and prune a few vop_op[].
|
#
e6a9ab52 |
|
08-Aug-2000 |
Robert Watson <rwatson@FreeBSD.org> |
o Introduce vn_extattr_{get,set}, wrapper routines for VOP_GETEXTATTR and VOP_SETEXTATTR to simplify calling from in-kernel consumers, such as capability code. Both accept a vnode (optionally locked, with ioflg to indicate that), attribute name, and a buffer + buffer length in UIO_SYSSPACE. Both authorize the call as a kernel request, with cred set to NULL for the actual VOP_ calls. Obtained from: TrustedBSD Project
|
#
9b971133 |
|
23-Jul-2000 |
Kirk McKusick <mckusick@FreeBSD.org> |
This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
|
#
f2a2857b |
|
11-Jul-2000 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
|
#
c904bbbd |
|
03-Jul-2000 |
Kirk McKusick <mckusick@FreeBSD.org> |
Simplify and rationalise the management of the vnode free list (preparing the code to add snapshots).
|
#
e6796b67 |
|
03-Jul-2000 |
Kirk McKusick <mckusick@FreeBSD.org> |
Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired. Obtained from: BSD/OS
|
#
e3975643 |
|
25-May-2000 |
Jake Burkholder <jake@FreeBSD.org> |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
|
#
740a1973 |
|
23-May-2000 |
Jake Burkholder <jake@FreeBSD.org> |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
|
#
b7db1901 |
|
26-Apr-2000 |
Brian Feldman <green@FreeBSD.org> |
Move procfs_fullpath() to vfs_cache.c, with a rename to textvp_fullpath(). There's no excuse to have code in synthetic filestores that allows direct references to the textvp anymore. Feature requested by: msmith Feature agreed to by: warner Move requested by: phk Move agreed to by: bde
|
#
8a2852b1 |
|
21-Apr-2000 |
Brian Feldman <green@FreeBSD.org> |
Move the declaration of "struct namecache" to vnode.h, as it can be useful elsewhere. Note, of course, that in an ideal world nothing should need to see our VFS implementation :-/
|
#
e4649cfa |
|
01-Apr-2000 |
Matthew Dillon <dillon@FreeBSD.org> |
Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truely sequential writes. This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks. Reviewed-by: mckusick
|
#
b7a5f3ca |
|
02-Feb-2000 |
Robert Watson <rwatson@FreeBSD.org> |
Remove static qualifier from vgonel, as it is needed by the Arla folk outside of vfs_subr.c. Submitted by: Assar Westerlund <assar@sics.se> Reviewed by: rwatson Approved by: jkh
|
#
6cca21b1 |
|
15-Jan-2000 |
Boris Popov <bp@FreeBSD.org> |
Add VT_NWFS tag.
|
#
ba4ad1fc |
|
09-Jan-2000 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Give vn_isdisk() a second argument where it can return a suitable errno. Suggested by: bde
|
#
664a31e4 |
|
28-Dec-1999 |
Peter Wemm <peter@FreeBSD.org> |
Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.
|
#
91f37dcb |
|
18-Dec-1999 |
Robert Watson <rwatson@FreeBSD.org> |
Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry. Reviewed by: eivind
|
#
1e12157c |
|
12-Dec-1999 |
Alfred Perlstein <alfred@FreeBSD.org> |
explain that ioflags can be used to give read-ahead hints to the underlying filesystem.
|
#
6bdfe06a |
|
11-Dec-1999 |
Eivind Eklund <eivind@FreeBSD.org> |
Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter
|
#
0026ddba |
|
09-Dec-1999 |
Semen Ustimenko <semenu@FreeBSD.org> |
Added VT_HPFS vnode type.
|
#
aa4f4b69 |
|
04-Oct-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the buffered read/write code out of spec_{read|write} and into two new functions spec_buf{read|write}. Add sysctl vfs.bdev_buffered which defaults to 1 == true. This sysctl can be used to experimentally turn buffered behaviour for bdevs off. I should not be changed while any blockdevices are open. Remove the misplaced sysctl vfs.enable_userblk_io. No other changes in behaviour.
|
#
1b5464ef |
|
29-Sep-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde
|
#
40360b1b |
|
20-Sep-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Final commit to remove vnode->v_lastr. vm_fault now handles read clustering issues (replacing code that used to be in ufs/ufs/ufs_readwrite.c). vm_fault also now uses the new VM page counter inlines. This completes the changeover from vnode->v_lastr to vm_entry_t->v_lastr for VM, and fp->f_nextread and fp->f_seqcount (which have been in the tree for a while). Determination of the I/O strategy (sequential, random, and so forth) is now handled on a descriptor-by-descriptor basis for base I/O calls, and on a memory-region-by-memory-region and process-by-process basis for VM faults. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
|
#
bb01f28e |
|
17-Sep-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Add vfs.enable_userblk_io sysctl to control whether user reads and writes to buffered block devices are allowed. The default is to be backwards compatible, i.e. reads and writes are allowed. The idea is for a larger crowd to start running with this disabled and see what problems, if any, crop up, and then to change the default to off and see if any problems crop up in the next 6 months prior to potentially removing support entirely. There are still a few people, Julian and myself included, who believe the buffered block device access from usermode to be useful. Remove use of vnode->v_lastr from buffered block device I/O in preparation for removal of vnode->v_lastr field, replacing it with the already existing seqcount metric to detect sequential operation. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
|
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
dbafb366 |
|
26-Aug-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;
|
#
41d2e3e0 |
|
24-Aug-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.
|
#
0ff7b13a |
|
24-Aug-1999 |
Julian Elischer <julian@FreeBSD.org> |
Make DEVFS use PHK's specinfo struct as the source of dev_t and devsw. In lookup() however it's the other way around as we need to supply the dev_t for the vnode, so devfs still has a copy of it stashed away. Sourcing it from the vnode in the vnops however is useful as it makes a lot of the code almost the same as that in specfs.
|
#
a2801b77 |
|
21-Aug-1999 |
John Polstra <jdp@FreeBSD.org> |
Support full-precision file timestamps. Until now, only the seconds have been maintained, and that is still the default. A new sysctl variable "vfs.timestamp_precision" can be used to enable higher levels of precision: 0 = seconds only; nanoseconds zeroed (default). 1 = seconds and nanoseconds, accurate within 1/HZ. 2 = seconds and nanoseconds, truncated to microseconds. >=3 = seconds and nanoseconds, maximum precision. Level 1 uses getnanotime(), which is fast but can be wrong by up to 1/HZ. Level 2 uses microtime(). It might be desirable for consistency with utimes() and friends, which take timeval structures rather than timespecs. Level 3 uses nanotime() for the higest precision. I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with "cpio -pdu". There was almost negligible difference in the system times -- much less than 1%, and less than the variation among multiple runs at the same level. Bruce Evans dreamed up a torture test involving 1-byte reads with intervening fstat() calls, but the cpio test seems more realistic to me. This feature is currently implemented only for the UFS (FFS and MFS) filesystems. But I think it should be easy to support it in the others as well. An earlier version of this was reviewed by Bruce. He's not to blame for any breakage I've introduced since then. Reviewed by: bde (an earlier version of the code)
|
#
4d4f9323 |
|
13-Aug-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
s/v_specinfo/v_rdev/
|
#
0ef1c826 |
|
08-Aug-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
|
#
67452993 |
|
26-Jul-1999 |
Alan Cox <alc@FreeBSD.org> |
Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon
|
#
698bfad7 |
|
20-Jul-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Now a dev_t is a pointer to struct specinfo which is shared by all specdev vnodes referencing this device. Details: cdevsw->d_parms has been removed, the specinfo is available now (== dev_t) and the driver should modify it directly when applicable, and the only driver doing so, does so: vn.c. I am not sure the logic in checking for "<" was right before, and it looks even less so now. An intial pool of 50 struct specinfo are depleted during early boot, after that malloc had better work. It is likely that fewer than 50 would do. Hashing is done from udev_t to dev_t with a prime number remainder hash, experiments show no better hash available for decent cost (MD5 is only marginally better) The prime number used should not be close to a power of two, we use 83 for now. Add new checkalias2() to get around the loss of info from dev2udev() in bdevvp(); The aliased vnodes are hung on a list straight of the dev_t, and speclisth[SPECSZ] is unused. The sharing of struct specinfo means that the v_specnext moves into the vnode which grows by 4 bytes. Don't use a VBLK dev_t which doesn't make sense in MFS, now we hang a dummy cdevsw on B/Cmaj 253 so that things look sane. Storage overhead from all of this is O(50k). Bump __FreeBSD_version to 400009 The next step will add the stuff needed so device-drivers can start to hang things from struct specinfo
|
#
6ca54864 |
|
18-Jul-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Introduce the vn_todev(struct vnode*) function, which returns the dev_t corresponding to a VBLK or VCHR node, or NODEV.
|
#
b008f92e |
|
28-Jun-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
make va_fsid be of type udev_t
|
#
e4ab40bc |
|
15-Jun-1999 |
Kirk McKusick <mckusick@FreeBSD.org> |
Get rid of the global variable rushjob and replace it with a function in kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.
|
#
bfbb9ce6 |
|
11-May-1999 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
|
#
e9189611 |
|
17-Apr-1999 |
Peter Wemm <peter@FreeBSD.org> |
Well folks, this is it - The second stage of the removal for build support for LKM's..
|
#
cd0fb97e |
|
19-Feb-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Make worklist add function a static, remove from sys/vnode.h
|
#
52e93a35 |
|
02-Feb-1999 |
Semen Ustimenko <semenu@FreeBSD.org> |
Added vnode tag for NTFS. Reviewed by: David O'Brien <obrien@NUXI.com>
|
#
5a24726b |
|
28-Jan-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Clarify the SYSINIT problem by breaking SYSINIT's up into a void * version and a const void * version. Currently the const void * version simply calls the void * version ( i.e. no 'fix' is in place ). A solution needs to be found for the C_SYSINIT ( etc...) family of macros that allows const void * without generating a warning, but does not allow non-const void *.
|
#
8aef1712 |
|
27-Jan-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
d254af07 |
|
27-Jan-1999 |
Matthew Dillon <dillon@FreeBSD.org> |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
#
15a1057c |
|
20-Jan-1999 |
Eivind Eklund <eivind@FreeBSD.org> |
Add 'options DEBUG_LOCKS', which stores extra information in struct lock, and add some macros and function parameters to make sure that the information get to the point where it can be put in the lock structure. While I'm here, add DEBUG_VFS_LOCKS to LINT.
|
#
fb116777 |
|
05-Jan-1999 |
Eivind Eklund <eivind@FreeBSD.org> |
Remove the 'waslocked' parameter to vfs_object_create().
|
#
4e61198e |
|
10-Nov-1998 |
Peter Wemm <peter@FreeBSD.org> |
Make the vnode opv vector construction fully dynamic. Previously we leaked memory on each unload and were limited to items referenced in the kernel copy of vnode_if.c. Now a kernel module is free to create it's own VOP_FOO() routines and the rest of the system will happily deal with it, including passthrough layers like union/umap/etc. Have VFS_SET() call a common vfs_modevent() handler rather than inline duplicating the common code all over the place. Have VNODEOP_SET() have the vnodeops removed at unload time (assuming a module) so that the vop_t ** vector is reclaimed. Slightly adjust the vop_t ** vectors so that calling slot 0 is a panic rather than a page fault. This could happen if VOP_something() was called without *any* handlers being present anywhere (including in vfs_default.c). slot 1 becomes the default vector for the vnodeop table. TODO: reclaim zones on unload (eg: nfs code)
|
#
630ff663 |
|
31-Oct-1998 |
Peter Wemm <peter@FreeBSD.org> |
Convert the vnode clean/dirty attached buffer lists from LISTs to TAILQs. Add a new flags field (we get this for free because of struct packing) for cleaner management of tailq membership. We had two spare b_flags slots, but they are a precious resource and may be needed for other things that are related to other b_flags bits. The two new flags are convenient to use in a seperate location. Reviewed (in principle) by: dg Obtained from: John Dyson's old work-in-progress
|
#
20f02ef5 |
|
29-Oct-1998 |
Peter Wemm <peter@FreeBSD.org> |
Remove the V_SAVEMETA flag, nothing uses it any more now that msdosfs and ext2fs call vtruncbuf() directly. This simplifies and cleans up vinvalbuf() a little.
|
#
aa855a59 |
|
15-Oct-1998 |
Peter Wemm <peter@FreeBSD.org> |
*gulp*. Jordan specifically OK'ed this.. This is the bulk of the support for doing kld modules. Two linker_sets were replaced by SYSINIT()'s. VFS's and exec handlers are self registered. kld is now a superset of lkm. I have converted most of them, they will follow as a seperate commit as samples. This all still works as a static a.out kernel using LKM's.
|
#
9afcea2f |
|
11-Sep-1998 |
Robert V. Baron <rvb@FreeBSD.org> |
All the references to cfs, in symbols, structs, and strings have been changed to coda. (Same for CFS.)
|
#
a58915fc |
|
09-Sep-1998 |
Tor Egge <tegge@FreeBSD.org> |
Don't keep the underlying directory locked while performing the file system specific VFS_MOUNT operation. PR: 1067
|
#
4c162420 |
|
26-Aug-1998 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Add VT_CFS type. Submitted by: Robert Baron <rvb@sicily.odyssey.cs.cmu.edu>
|
#
1f562172 |
|
10-May-1998 |
John Dyson <dyson@FreeBSD.org> |
Fix the futimes/undelete/utrace conflict with other BSD's. Note that the only common usage of utrace (the possible problem with this commit) is with malloc, so this should be a real problem. Add the various NetBSD syscalls that allow full emulation of their development environment.
|
#
08637435 |
|
28-Mar-1998 |
Bruce Evans <bde@FreeBSD.org> |
Moved some #includes from <sys/param.h> nearer to where they are actually used.
|
#
bef608bd |
|
15-Mar-1998 |
John Dyson <dyson@FreeBSD.org> |
Some VM improvements, including elimination of alot of Sig-11 problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliabiliy and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.
|
#
b1897c19 |
|
08-Mar-1998 |
Julian Elischer <julian@FreeBSD.org> |
Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
|
#
8f9110f6 |
|
07-Mar-1998 |
John Dyson <dyson@FreeBSD.org> |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
#
50ce7ff4 |
|
23-Jan-1998 |
John Dyson <dyson@FreeBSD.org> |
Add better support for larger I/O clusters, including larger physical I/O. The support is not mature yet, and some of the underlying implementation needs help. However, support does exist for IDE devices now.
|
#
47221757 |
|
17-Jan-1998 |
John Dyson <dyson@FreeBSD.org> |
Tie up some loose ends in vnode/object management. Remove an unneeded config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtile bugs aren't fixed yet.
|
#
925a3a41 |
|
11-Jan-1998 |
John Dyson <dyson@FreeBSD.org> |
Fix some vnode management problems, and better mgmt of vnode free list. Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock. PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.
|
#
95e5e988 |
|
05-Jan-1998 |
John Dyson <dyson@FreeBSD.org> |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
#
82dc3896 |
|
29-Dec-1997 |
John Dyson <dyson@FreeBSD.org> |
Add the vnode interlock back around vget.
|
#
60f8d464 |
|
28-Dec-1997 |
John Dyson <dyson@FreeBSD.org> |
Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem with the object cache removal.
|
#
2be70f79 |
|
28-Dec-1997 |
John Dyson <dyson@FreeBSD.org> |
Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.
|
#
1cbbd625 |
|
14-Dec-1997 |
Garrett Wollman <wollman@FreeBSD.org> |
Add support for poll(2) on files. vop_nopoll() now returns POLLNVAL if one of the new poll types is requested; hopefully this will not break any existing code. (This is done so that programs have a dependable way of determining whether a filesystem supports the extended poll types or not.) The new poll types added are: POLLWRITE - file contents may have been modified POLLNLINK - file was linked, unlinked, or renamed POLLATTRIB - file's attributes may have been changed POLLEXTEND - file was extended Note that the internal operation of poll() means that it is impossible for two processes to reliably poll for the same event (this could be fixed but may not be worth it), so it is not possible to rewrite `tail -f' to use poll at this time.
|
#
1cd52ec3 |
|
05-Dec-1997 |
Bruce Evans <bde@FreeBSD.org> |
Don't include <sys/lock.h> in headers when only `struct simplelock' is required. Fixed everything that depended on the pollution.
|
#
cb451ebd |
|
22-Nov-1997 |
Bruce Evans <bde@FreeBSD.org> |
Staticized.
|
#
f6fdfec4 |
|
18-Nov-1997 |
Bruce Evans <bde@FreeBSD.org> |
Don't #include <machine/smp.h> even in the SMP case. Fixed the one place that depended on it. The "bazillion warnings" mentioned in the log for rev.1.45 apparently aren't a problem any more. It is hard to be sure because the SIMPLELOCK_DEBUG option turns off (and breaks) things in the SMP case. Don't forward declare structs that are already implicitly forward declared. Fixed a disordered declaration.
|
#
dba3870c |
|
26-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
VFS interior redecoration. Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.
|
#
1b09ae77 |
|
26-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Simplify the lease_check stuff.
|
#
d54d34b5 |
|
16-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Make a set of VOP standard lock, unlock & islocked VOP operators, which depend on the lock being located at vp->v_data. Saves 3x3 identical vop procs, more as the other filesystems becomes lock aware.
|
#
987f5696 |
|
16-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Another VFS cleanup "kilo commit" 1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} intereface function, and now lives in the ufsmount structure. 2. Remove VOP_SEEK, it was unused. 3. Add mode default vops: VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp And remove identical functionality from filesystems 4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?) 5. Try to make system wide VOP functions have vop_* names. 6. Initialize the um_* vectors in LFS. (Recompile your LKMS!!!)
|
#
cec0f20c |
|
16-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
VFS mega cleanup commit (x/N) 1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here. 2. Change VOP_BLKATOFF to a normal function in cd9660. 3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead. 4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done. 5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
|
#
a1c995b6 |
|
12-Oct-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
|
#
99448ed1 |
|
20-Sep-1997 |
John Dyson <dyson@FreeBSD.org> |
Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
|
#
3a74593f |
|
13-Sep-1997 |
Peter Wemm <peter@FreeBSD.org> |
Update interfaces for poll()
|
#
a051452a |
|
31-Aug-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Change the 0xdeadb hack to a flag called VDOOMED. Introduce VFREE which indicates that vnode is on freelist. Rename vholdrele() to vdrop(). Create vfree() and vbusy() to add/delete vnode from freelist. Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0) vnodes off the freelist. Generalize vhold()/v_holdcnt to mean "do not recycle". Fix reassignbuf()s lack of use of vhold(). Use vhold() instead of checking v_cache_src list. Remove vtouch(), the vnodes are always vget'ed soon enough after for it to have any measuable effect. Add sysctl debug.freevnodes to keep track of things. Move cache_purge() up in getnewvnodes to avoid race. Decrement v_usecount after VOP_INACTIVE(), put a vhold() on it during VOP_INACTIVE() Unmacroize vhold()/vdrop() Print out VDOOMED and VFREE flags (XXX: should use %b) Reviewed by: dyson
|
#
0fa2443f |
|
26-Aug-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Uncut&paste cache_lookup(). This unifies several times in theory indentical 50 lines of code. The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss. It's still the task of the individual filesystems to populate the namecache with cache_enter(). Filesystems that do not use the namecache will just provide the vop_lookup method as usual.
|
#
7cbfd031 |
|
17-Aug-1997 |
Steve Passe <fsmp@FreeBSD.org> |
Added includes of smp.h for SMP. This eliminates a bazillion warnings about implicit s_lock & friends.
|
#
b15a966e |
|
04-May-1997 |
Poul-Henning Kamp <phk@FreeBSD.org> |
1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping. 2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries. 3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates. 4. Never reuse namecache enties, malloc new ones when we need it, free old ones when they die. No longer a hard limit on how many we can have. 5. Remove the upper limit on namelength of namecache entries. 6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!) 7. Assign v_id correctly in the face of 32bit rollover. 8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either. 9. Use the vnode freelist as a true LRU list, also for namecache accesses. 10. Reuse vnodes more aggresively but also more selectively, if we can't reuse, malloc a new one. There is no longer a hard limit on their number, they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if still has pages in core or if it is the source of namecache entries (Yes, this does indeed work :-) "." and ".." are not namecache entries any longer...) 11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more. 12. NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now. Bugs: The namecache statistics no longer includes the hits for ".." and "." hits. Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be selftuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone. Future work: Straighten out the namecache statistics. "desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems. I have still to find a way to safely free unused vnodes back so their number can shrink when not needed. There is a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time. Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.
|
#
754c6d37 |
|
04-Apr-1997 |
Doug Rabson <dfr@FreeBSD.org> |
Add some debugging macros for tracing VFS locking bugs. Declare (hopefully short-lived) vop_sharedlock.
|
#
6875d254 |
|
22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
670718e2 |
|
12-Feb-1997 |
Mike Pritchard <mpp@FreeBSD.org> |
Remove function prototypes for vfs_mountroot and vgoneall, since they were removed with the Lite2 merge. Submitted by: bde
|
#
996c772f |
|
09-Feb-1997 |
John Dyson <dyson@FreeBSD.org> |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
5131d64e |
|
16-Jan-1997 |
Bruce Evans <bde@FreeBSD.org> |
Removed option EXTRAVNODES. All versions of FreeBSD-2.x have a sysctl variable `kern.maxvnodes' which gives much better control over vnode allocation than EXTRAVNODES (except in -current between 1995/10/28 and 1996/11/12, kern.maxvnodes was read-only and thus useless).
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
8b612c4b |
|
28-Dec-1996 |
John Dyson <dyson@FreeBSD.org> |
This commit is the embodiment of some VFS read clustering improvements. Firstly, now our read-ahead clustering is on a file descriptor basis and not on a per-vnode basis. This will allow multiple processes reading the same file to take advantage of read-ahead clustering. Secondly, there previously was a problem with large reads still using the ramp-up algorithm. Of course, that was bogus, and now we read the entire "chunk" off of the disk in one operation. The read-ahead clustering algorithm should use less CPU than the previous also (I hope :-)). NOTE: THAT LKMS MUST BE REBUILT!!!
|
#
17a6a9e3 |
|
17-Oct-1996 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Some very small changes to support Netcon's TFS filesystem. These patches were formerly applied by the Netcon installer before rebuilding your kernel.
|
#
62c3734c |
|
15-Oct-1996 |
Bruce Evans <bde@FreeBSD.org> |
Updated #includes to 4.4lite style.
|
#
6476c0d2 |
|
21-Aug-1996 |
John Dyson <dyson@FreeBSD.org> |
Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistant and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct. This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc. Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility. Reviewed by: davidg
|
#
114a8cff |
|
30-May-1996 |
Peter Wemm <peter@FreeBSD.org> |
Add an option "EXTRA_VNODES" to cause an extra number of vnode structures to be allocated at boot time. This is an expensive option, as they consume physical ram and are not pageable etc. In certain situations, this kind of option is quite useful, especially for news servers that access a large number of directories at random and torture the name cache. Defining 5000 or 10000 extra vnodes should cut down the amount of vnode recycling somewhat, which should allow better name and directory caching etc. This is a "your mileage may vary" option, with no real indication of what works best for your machine except trial and error. Too many will cost you ram that you could otherwise use for disk buffers etc. This is based on something John Dyson mentioned to me a while ago.
|
#
e206cebf |
|
28-Mar-1996 |
David Greenman <dg@FreeBSD.org> |
Change v_usecount & v_writecount from a short to an int. As shorts they can and will overflow on large machines - especially on machines with filesystems with lots of files (like netnews servers), and the result is a "free vnode isn't" panic or worse. This fixes one of the causes of these panics that I've been experiancing on wcarchive.
|
#
02e2c406 |
|
11-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change. [new sys/syscallargs.h file, to be "cvs rm"ed]
|
#
d9c68230 |
|
03-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Add missing prototype for newly public vn_vmio_open function, next to vn_vmio_close.
|
#
6c5e9bbd |
|
30-Jan-1996 |
Mike Pritchard <mpp@FreeBSD.org> |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
#
bd7e5f99 |
|
18-Jan-1996 |
John Dyson <dyson@FreeBSD.org> |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
#
45f486a5 |
|
25-Dec-1995 |
Bruce Evans <bde@FreeBSD.org> |
Removed redundant (incompletely staticized) declararations.
|
#
27a0b398 |
|
17-Dec-1995 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Staticize. Unstaticize a function in scsi/scsi_base that was used, with an undocumented option. My last count on the LINT kernel shows: Total symbols: 3647 unref symbols: 463 undef symbols: 4 1 ref symbols: 1751 2 ref symbols: 485 Approaching the pain threshold now.
|
#
68a3b3cd |
|
15-Dec-1995 |
Bruce Evans <bde@FreeBSD.org> |
Completed a function declaration. Restored order to prototype list. Restored tabs to #defines.
|
#
a316d390 |
|
10-Dec-1995 |
John Dyson <dyson@FreeBSD.org> |
Changes to support 1Tb filesizes. Pages are now named by an (object,index) pair instead of (object,offset) pair.
|
#
f57e6547 |
|
09-Nov-1995 |
Bruce Evans <bde@FreeBSD.org> |
Introduced a type `vop_t' for vnode operation functions and used it 1138 times (:-() in casts and a few more times in declarations. This change is null for the i386. The type has to be `typedef int vop_t(void *)' and not `typedef int vop_t()' because `gcc -Wstrict-prototypes' warns about the latter. Since vnode op functions are called with args of different (struct pointer) types, neither of these function types is any use for type checking of the arg, so it would be preferable not to use the complete function type, especially since using the complete type requires adding 1138 casts to avoid compiler warnings and another 40+ casts to reverse the function pointer conversions before calling the functions.
|
#
ef27d1fe |
|
07-Nov-1995 |
John Dyson <dyson@FreeBSD.org> |
Export a symbol that ext2fs wants (insmntque.)
|
#
39d38f93 |
|
06-Jul-1995 |
David Greenman <dg@FreeBSD.org> |
Fixed an object allocation race condition that was causing a "object deallocated too many times" panic when using NFS. Reviewed by: John Dyson
|
#
aa2cabb9 |
|
27-Jun-1995 |
David Greenman <dg@FreeBSD.org> |
1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
|
#
999422d7 |
|
19-Apr-1995 |
Julian Elischer <julian@FreeBSD.org> |
Reviewed by: no-one yet, but non-intrusive Submitted by: julian@tfs.com Obtained from: written from scratch slight changes to make space for devfs.. (also conditional test code in i386/isa/fd.c) =================================================================== RCS file: /home/ncvs/src/sys/sys/malloc.h,v retrieving revision 1.7 diff -r1.7 malloc.h 113a114,117 > #define M_DEVFSMNT 62 /* DEVFS mount structure */ > #define M_DEVFSBACK 63 /* DEVFS Back node */ > #define M_DEVFSFRONT 64 /* DEVFS Front node */ > #define M_DEVFSNODE 65 /* DEVFS node */ 184c188,192 < NULL, NULL, NULL, NULL, NULL, \ --- > "DEVFS mount", /* 62 M_DEVFSMNT */ \ > "DEVFS back", /* 63 M_DEVFSBACK */ \ > "DEVFS front", /* 64 M_DEVFSFRONT */ \ > "DEVFS node", /* 65 M_DEVFSNODE */ \ > NULL, \ Index: sys/mount.h =================================================================== RCS file: /home/ncvs/src/sys/sys/mount.h,v retrieving revision 1.16 diff -r1.16 mount.h 100c100,101 < #define MOUNT_MAXTYPE 15 --- > #define MOUNT_DEVFS 16 /* existing device Filesystem */ > #define MOUNT_MAXTYPE 16 118a120 > "devfs", /* 15 MOUNT_DEVFS */ \ Index: sys/vnode.h =================================================================== RCS file: /home/ncvs/src/sys/sys/vnode.h,v retrieving revision 1.19 diff -r1.19 vnode.h 61c61 < VT_UNION, VT_MSDOSFS --- > VT_UNION, VT_MSDOSFS, VT_DEVFS
|
#
f6b04d2b |
|
09-Apr-1995 |
David Greenman <dg@FreeBSD.org> |
Changes from John Dyson and myself: Fixed remaining known bugs in the buffer IO and VM system. vfs_bio.c: Fixed some race conditions and locking bugs. Improved performance by removing some (now) unnecessary code and fixing some broken logic. Fixed process accounting of # of FS outputs. Properly handle NFS interrupts (B_EINTR). (various) Replaced calls to clrbuf() with calls to an optimized routine called vfs_bio_clrbuf(). (various FS sync) Sync out modified vnode_pager backed pages. ffs_vnops.c: Do two passes: Sync out file data first, then indirect blocks. vm_fault.c: Fixed deadly embrace caused by acquiring locks in the wrong order. vnode_pager.c: Changed to use buffer I/O system for writing out modified pages. This should fix the problem with the modification date previous not getting updated. Also dramatically simplifies the code. Note that this is going to change in the future and be implemented via VOP_PUTPAGES(). vm_object.c: Fixed a pile of bugs related to cleaning (vnode) objects. The performance of vm_object_page_clean() is terrible when dealing with huge objects, but this will change when we implement a binary tree to keep the object pages sorted. vm_pageout.c: Fixed broken clustering of pageouts. Fixed race conditions and other lockup style bugs in the scanning of pages. Improved performance.
|
#
cb9db2b6 |
|
28-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
When NFS is compiled into the kernel, make NQNFS lease checking conditional on a "NQNFS" kernel config option. NQNFS is a 4.4 wart and the performance penalty of the lease checks on the client/server for _local_ I/O is too high to have this occur all the time - especially when most people will never use it.
|
#
b5e8ce9f |
|
16-Mar-1995 |
Bruce Evans <bde@FreeBSD.org> |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
#
79f7a9e1 |
|
07-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Added a new flag "VAGE" to indicate that the vnode should go on the head of the free list.
|
#
ff2e6a8b |
|
05-Jan-1995 |
Justin T. Gibbs <gibbs@FreeBSD.org> |
Add VNINACT flag. LFS has a habbit of skipping the ufs_inactive procedure. It used to do this by setting a global <Yuck>. Now we set th VNINACT flag in the vnode to force a skip of ufs_inactive. Sorry for missing this file in my last commit folks. Index: vnode.h =================================================================== RCS file: /usr/cvs/src/sys/sys/vnode.h,v retrieving revision 1.14 diff -c -r1.14 vnode.h *** 1.14 1994/11/14 13:51:53 --- vnode.h 1994/12/03 01:06:27 *************** *** 116,121 **** --- 116,122 ---- #define VALIASED 0x0800 /* vnode has an alias */ #define VDIROP 0x1000 /* LFS: vnode is involved in a directory op */ #define VVMIO 0x2000 /* VMIO flag */ + #define VNINACT 0x4000 /* LFS: skip ufs_inactive() in lfs_vunref */ /* * Vnode attributes. A field value of VNOVAL represents a field whose value
|
#
80d08a82 |
|
14-Nov-1994 |
Bruce Evans <bde@FreeBSD.org> |
Add prototype for vfinddev().
|
#
091b0456 |
|
20-Oct-1994 |
Garrett Wollman <wollman@FreeBSD.org> |
Make my ALLDEVS kernel compile (basically, LINT minus a lot of options). This involves fixing a few things I broke last time.
|
#
b4a8d575 |
|
08-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Added prototypes here and there. Moved pfctlinput into socket.h.
|
#
8e58bf68 |
|
05-Oct-1994 |
David Greenman <dg@FreeBSD.org> |
Stuff object into v_vmdata rather than pager. Not important which at the moment, but will be in the future. Other changes mostly cosmetic, but are made for future VMIO considerations. Submitted by: John Dyson
|
#
f86eaaca |
|
02-Oct-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Prototypes, prototypes and even more prototypes. Not quite done yet, but getting closer all the time.
|
#
bb56ec4a |
|
25-Sep-1994 |
Poul-Henning Kamp <phk@FreeBSD.org> |
While in the real world, I had a bad case of being swapped out for a lot of cycles. While waiting there I added a lot of the extra ()'s I have, (I have never used LISP to any extent). So I compiled the kernel with -Wall and shut up a lot of "suggest you add ()'s", removed a bunch of unused var's and added a couple of declarations here and there. Having a lap-top is highly recommended. My kernel still runs, yell at me if you kernel breaks.
|
#
e21fa31a |
|
22-Sep-1994 |
Garrett Wollman <wollman@FreeBSD.org> |
Make NFS loadable.
|
#
c901836c |
|
20-Sep-1994 |
Garrett Wollman <wollman@FreeBSD.org> |
Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
|
#
27a0bc89 |
|
19-Sep-1994 |
Doug Rabson <dfr@FreeBSD.org> |
Added msdosfs. Obtained from: NetBSD
|
#
d8f10c11 |
|
15-Sep-1994 |
Bruce Evans <bde@FreeBSD.org> |
Add some prototypes.
|
#
1cdeb653 |
|
29-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
"bogus" fixes from 1.1.5 to work around some cache coherency problems.
|
#
af9da405 |
|
20-Aug-1994 |
Paul Richards <paul@FreeBSD.org> |
Made them all idempotent. Reviewed by: Submitted by:
|
#
3c4dd356 |
|
02-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Added $Id$
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|