History log of /freebsd-current/sys/kern/vnode_if.src
Revision Date Author Comments
# 4cbe4c48 18-Nov-2023 Konstantin Belousov <kib@FreeBSD.org>

VFS: add VOP_GETLOWVNODE()

It is similar to VOP_GETWRITEMOUNT(), and for given vnode vp should
return the lower vnode which would actually handle write to vp.
Flags allow to specify FREAD or FWRITE for benefit of possible unionfs
implementation.

Reviewed by: markj, Olivier Certner <olce.freebsd@certner.fr>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D42603


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 031beb4e 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line sh pattern

Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/


# fa1ac969 07-Aug-2022 Gordon Bergling <gbe@FreeBSD.org>

vnode(9): Fix a typo in a source code comment

- s/paramater/parameter/

MFC after: 3 days


# b214fcce 13-Dec-2021 Alan Somers <asomers@FreeBSD.org>

Change VOP_READDIR's cookies argument to a **uint64_t

The cookies argument is only used by the NFS server. NFSv2 defines the
cookie as 32 bits on the wire, but NFSv3 increased it to 64 bits. Our
VOP_READDIR, however, has always defined it as u_long, which is 32 bits
on some architectures. Change it to 64 bits on all architectures. This
doesn't matter for any in-tree file systems, but it matters for some
FUSE file systems that use 64-bit directory cookies.

PR: 260375
Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D33404


# 47b248ac 03-Nov-2021 Konstantin Belousov <kib@FreeBSD.org>

Make locking assertions for VOP_FSYNC() and VOP_FDATASYNC() more correct

For devfs vnodes, it is fine to not lock vnodes for VOP_FSYNC().
Otherwise vnode must be locked exclusively, except for MNT_SHARED_WRITES()
where the shared lock is enough.

Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32761


# f0c9847a 06-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

vfs: Add "ioflag" and "cred" arguments to VOP_ALLOCATE

When the NFSv4.2 server does a VOP_ALLOCATE(), it needs
the operation to be done for the RPC's credential and not
td_ucred. It also needs the writing to be done synchronously.

This patch adds "ioflag" and "cred" arguments to VOP_ALLOCATE()
and modifies vop_stdallocate() to use these arguments.

The VOP_ALLOCATE.9 man page will be patched separately.

Reviewed by: khng, kib
Differential Revision: https://reviews.freebsd.org/D32865


# 2b68eb8e 01-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove thread argument from VOP_STAT

and fo_stat.


# a638dc4e 12-Aug-2021 Ka Ho Ng <khng@FreeBSD.org>

vfs: Add ioflag to VOP_DEALLOCATE(9)

The addition of ioflag allows callers passing
IO_SYNC/IO_DATASYNC/IO_DIRECT down to the file system implementation.
The vop_stddeallocate fallback implementation is updated to pass the
ioflag to the file system implementation. vn_deallocate(9) internally is
also changed to pass ioflag to the VOP_DEALLOCATE call.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D31500


# 0dc332bf 05-Aug-2021 Ka Ho Ng <khng@FreeBSD.org>

Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).

fspacectl(2) is a system call to provide space management support to
userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
deallocation. vn_deallocate(9) is a public KPI for kmods' use.

The purpose of proposing a new system call, a KPI and a VOP call is to
allow bhyve or other hypervisor monitors to emulate the behavior of SCSI
UNMAP/NVMe DEALLOCATE on a plain file.

fspacectl(2) comprises of cmd and flags parameters to specify the
space management operation to be performed. Currently cmd has to be
SPACECTL_DEALLOC, and flags has to be 0.

fo_fspacectl is added to fileops.
VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation
of VOP_DEALLOCATE(9) is provided.

Sponsored by: The FreeBSD Foundation
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D28347


# 49c117c1 28-Jan-2021 Konstantin Belousov <kib@FreeBSD.org>

Add VOP_VPUT_PAIR() with trivial default implementation.

The VOP is intended to be used in situations where VFS has two
referenced locked vnodes, typically a directory vnode dvp and a vnode
vp that is linked from the directory, and at least dvp is vput(9)ed.
The child vnode can be also vput-ed, but optionally left referenced and
locked.

There, at least UFS may need to do some actions with dvp which cannot be
done while vp is also locked, so its lock might be dropped temporary.
For instance, in some cases UFS needs to sync dvp to avoid filesystem
state that is currently not handled by either kernel nor fsck. Having
such VOP provides the neccessary context for filesystem which can do
correct locking and handle potential reclamation of vp after relock.

Trivial implementation does vput(dvp) and optionally vput(vp).

Reviewed by: chs, mckusick
Tested by: pho
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation


# 739ecbcf 23-Jan-2021 Mateusz Guzik <mjg@FreeBSD.org>

cache: add symlink support to lockless lookup

Reviewed by: kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D27488


# c7520caa 22-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: prevent avoidable evictions on mkdir of existing directories

mkdir -p /foo/bar/baz will mkdir each path component and ignore EEXIST.

The NOCACHE lookup will make the namecache unnecessarily evict the existing entry,
and then fallback to the fs lookup routine eventually leading namei to return an
error as the directory is already there.

For invocations like mkdir -p /usr/obj/usr/src/sys/GENERIC/modules this triggers
fallbacks to the slowpath for concurrently executing lookups.

Tested by: pho
Discussed with: kib


# ab21ed17 20-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the de facto curthread argument from VOP_INACTIVE


# 8ecd87a3 20-Oct-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop spurious cred argument from VOP_VPTOCNP


# 3c484f32 15-Sep-2020 Konstantin Belousov <kib@FreeBSD.org>

Convert page cache read to VOP.

There are several negative side-effects of not calling into VOP layer
at all for page cache reads. The biggest is the missed activation of
EVFILT_READ knotes.

Also, it allows filesystem to make more fine grained decision to
refuse read from page cache.

Keep VIRF_PGREAD flag around, it is still useful for nullfs, and for
asserts.

Reviewed by: markj
Tested by: pho
Discussed with: mjg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D26346


# 8f226f4c 19-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the always-curthread td argument from VOP_RECLAIM


# 21d5af2b 10-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the thread argumemnt from vfs_fplookup_vexec

It is guaranteed curthread.

Tested by: pho
Sponsored by: The FreeBSD Foundation


# 51ea7bea 07-Aug-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add VOP_STAT

The current scheme of calling VOP_GETATTR adds avoidable overhead.

An example with tmpfs doing fstat (ops/s):
before: 7488958
after: 7913833

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D25910


# 848f8eff 30-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: inline vops if there are no pre/post associated calls

This removes a level of indirection from frequently used methods, most notably
VOP_LOCK1 and VOP_UNLOCK1.

Tested by: pho


# 07d2145a 25-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add the infrastructure for lockless lookup

Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25577


# 0379ff6a 25-Jul-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: introduce vnode sequence counters

Modified on each permission change and link/unlink.

Reviewed by: kib
Tested by: pho (in a patchset)
Differential Revision: https://reviews.freebsd.org/D25573


# 2782c00c 22-Feb-2020 Ryan Libby <rlibby@FreeBSD.org>

vfs: quiet -Wwrite-strings

Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D23797


# 6698e11f 02-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: remove the now empty vop_unlock_post


# 45757984 01-Feb-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: consistently use size_t for buflen around VOP_VPTOCNP


# 643656cf 31-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: replace VOP_MARKATIME with VOP_MMAPPED

The routine is only provided by ufs and is only used on mmap and exec.

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D23422


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# 1e2f0ceb 28-Aug-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: add VOP_NEED_INACTIVE

vnode usecount drops to 0 all the time (e.g. for directories during path lookup).
When that happens the kernel would always lock the exclusive lock for the vnode
in order to call vinactive(). This blocks other threads who want to use the vnode
for looukp.

vinactive is very rarely needed and can be tested for without the vnode lock held.

This patch gives filesytems an opportunity to do it, sample total wait time for
tmpfs over 500 minutes of poudriere -j 104:

before: 557563641706 (lockmgr:tmpfs)
after: 46309603301 (lockmgr:tmpfs)

Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21371


# bb9e2184 18-Aug-2019 Konstantin Belousov <kib@FreeBSD.org>

Change locking requirements for VOP_UNSET_TEXT().

Require the vnode to be locked for the VOP_UNSET_TEXT() call. This
will be used by the following bug fix for a tmpfs issue.

Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# bbbbeca3 24-Jul-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add kernel support for a Linux compatible copy_file_range(2) syscall.

This patch adds support to the kernel for a Linux compatible
copy_file_range(2) syscall and the related VOP_COPY_FILE_RANGE(9).
This syscall/VOP can be used by the NFSv4.2 client to implement the
Copy operation against an NFSv4.2 server to do file copies locally on
the server.
The vn_generic_copy_file_range() function in this patch can be used
by the NFSv4.2 server to implement the Copy operation.
Fuse may also me able to use the VOP_COPY_FILE_RANGE() method.

vn_generic_copy_file_range() attempts to maintain holes in the output
file in the range to be copied, but may fail to do so if the input and
output files are on different file systems with different _PC_MIN_HOLE_SIZE
values.

Separate commits will be done for the generated syscall files and userland
changes. A commit for a compat32 syscall will be done later.

Reviewed by: kib, asomers (plus comments by brooks, jilles)
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D20584


# 78022527 05-May-2019 Konstantin Belousov <kib@FreeBSD.org>

Switch to use shared vnode locks for text files during image activation.

kern_execve() locks text vnode exclusive to be able to set and clear
VV_TEXT flag. VV_TEXT is mutually exclusive with the v_writecount > 0
condition.

The change removes VV_TEXT, replacing it with the condition
v_writecount <= -1, and puts v_writecount under the vnode interlock.
Each text reference decrements v_writecount. To clear the text
reference when the segment is unmapped, it is recorded in the
vm_map_entry backed by the text file as MAP_ENTRY_VN_TEXT flag, and
v_writecount is incremented on the map entry removal

The operations like VOP_ADD_WRITECOUNT() and VOP_SET_TEXT() check that
v_writecount does not contradict the desired change. vn_writecheck()
is now racy and its use was eliminated everywhere except access.
Atomic check for writeability and increment of v_writecount is
performed by the VOP. vn_truncate() now increments v_writecount
around VOP_SETATTR() call, lack of which is arguably a bug on its own.

nullfs bypasses v_writecount to the lower vnode always, so nullfs
vnode has its own v_writecount correct, and lower vnode gets all
references, since object->handle is always lower vnode.

On the text vnode' vm object dealloc, the v_writecount value is reset
to zero, and deadfs vop_unset_text short-circuit the operation.
Reclamation of lowervp always reclaims all nullfs vnodes referencing
lowervp first, so no stray references are left.

Reviewed by: markj, trasz
Tested by: mjg, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
Differential revision: https://reviews.freebsd.org/D19923


# 1493c2ee 02-Nov-2018 Brooks Davis <brooks@FreeBSD.org>

Make vop_symlink take a const target path.

This will enable callers to take const paths as part of syscall
decleration improvements.

Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.

Bump __FreeBSD_version for external API consumers.

Reviewed by: kib (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17805


# b1288166 17-Jan-2018 John Baldwin <jhb@FreeBSD.org>

Use long for the last argument to VOP_PATHCONF rather than a register_t.

pathconf(2) and fpathconf(2) both return a long. The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly. Instead, the system calls explicitly store the
returned long value in td_retval[0].

Requested by: bde
Reviewed by: kib
Sponsored by: Chelsio Communications


# 0c3c207f 02-Jun-2017 Gleb Smirnoff <glebius@FreeBSD.org>

For UNIX sockets make vnode point not to the socket, but to the UNIX PCB,
since the latter is the thing that links together VFS and sockets.

While here, make the union in the struct vnode anonymous.


# 69a28758 15-Sep-2016 Ed Maste <emaste@FreeBSD.org>

Renumber license clauses in sys/kern to avoid skipping #3


# 9ce60e28 27-Aug-2016 Konstantin Belousov <kib@FreeBSD.org>

Consistently delimit each vnode description block with two blank
lines.

Sponsored by: The FreeBSD Foundation
MFC after: 3 days


# 295af703 15-Aug-2016 Konstantin Belousov <kib@FreeBSD.org>

Add an implementation of fdatasync(2).

The syscall is a trivial wrapper around new VOP_FDATASYNC(), sharing
code with fsync(2). For all filesystems, this commit provides the
implementation which delegates the work of VOP_FDATASYNC() to
VOP_FSYNC(). This is functionally correct but not efficient.

This is not yet POSIX-compliant implementation, because it does not
ensure that queued AIO requests are completed before returning.

Reviewed by: mckusick
Discussed with: avg (ZFS), jhb (AIO part)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D7471


# f8acef5a 12-Aug-2016 Edward Tomasz Napierala <trasz@FreeBSD.org>

Remove unused "X" vnode lock assertion, somehow missed in r303743.

MFC after: 1 month


# 7b255097 04-Aug-2016 Edward Tomasz Napierala <trasz@FreeBSD.org>

Remove unused - never actually implemented - vnode lock types
from vnode_if.src.

MFC after: 1 month


# c89e1b87 03-May-2016 Konstantin Belousov <kib@FreeBSD.org>

Add EVFILT_VNODE open, read and close notifications.

While there, order EVFILT_VNODE notes descriptions alphabetically.

Based on submission, and tested by: Vladimir Kondratyev <wulf@cicgroup.ru>
MFC after: 2 weeks


# e3043798 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/kern: spelling fixes in comments.

No functional change.


# b0cd2017 16-Dec-2015 Gleb Smirnoff <glebius@FreeBSD.org>

A change to KPI of vm_pager_get_pages() and underlying VOP_GETPAGES().

o With new KPI consumers can request contiguous ranges of pages, and
unlike before, all pages will be kept busied on return, like it was
done before with the 'reqpage' only. Now the reqpage goes away. With
new interface it is easier to implement code protected from race
conditions.

Such arrayed requests for now should be preceeded by a call to
vm_pager_haspage() to make sure that request is possible. This
could be improved later, making vm_pager_haspage() obsolete.

Strenghtening the promises on the business of the array of pages
allows us to remove such hacks as swp_pager_free_nrpage() and
vm_pager_free_nonreq().

o New KPI accepts two integer pointers that may optionally point at
values for read ahead and read behind, that a pager may do, if it
can. These pages are completely owned by pager, and not controlled
by the caller.

This shifts the UFS-specific readahead logic from vm_fault.c, which
should be file system agnostic, into vnode_pager.c. It also removes
one VOP_BMAP() request per hard fault.

Discussed with: kib, alc, jeff, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix


# 55d33667 15-Sep-2015 Conrad Meyer <cem@FreeBSD.org>

kevent(2): Note DOOMED vnodes with NOTE_REVOKE

In poll mode, check for and wake VBAD vnodes. (Vnodes that are VBAD at
registration will never be woken by the RECLAIM trigger.)

Add post-VOP_RECLAIM hook to trigger notes on vnode reclamation. (Vnodes that
were fine at registration but are vgoned while being monitored should signal
waiters.)

Reviewed by: kib
Approved by: markj (mentor)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3675


# f6d6b5e2 30-Mar-2015 Gleb Smirnoff <glebius@FreeBSD.org>

Catch up on r271387 and remove unused parameter from
VOP_GETPAGES_ASYNC().


# 90effb23 22-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Merge from projects/sendfile:

o Provide a new VOP_GETPAGES_ASYNC(), which works like VOP_GETPAGES(), but
doesn't sleep. It returns immediately, and will execute the I/O done handler
function that must be supplied as argument.
o Provide VOP_GETPAGES_ASYNC() for the FFS, which uses vnode_pager.
o Extend pagertab to support pgo_getpages_async method, and implement this
method for vnode_pager.

Reviewed by: kib
Tested by: pho
Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# 27ad26d8 09-Sep-2014 Gleb Smirnoff <glebius@FreeBSD.org>

Remove unused arguments for VOP_GETPAGES(), VOP_PUTPAGES().


# 1bd7d0b7 09-Nov-2013 Konstantin Belousov <kib@FreeBSD.org>

If filesystem declares that it supports shared locking for writes, use
shared vnode lock for VOP_PUTPAGES() as well. The only such
filesystem in the tree is ZFS, and it uses
vnode_pager_generic_putpages(), which performs the pageout with
VOP_WRITE().

Reviewed by: alc
Discussed with: avg
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# 4f15bb67 22-Nov-2012 Andriy Gapon <avg@FreeBSD.org>

remove vop_lookup_pre and vop_lookup_post

Suggested by: kib
MFC after: 5 days


# c496727c 19-Nov-2012 Andriy Gapon <avg@FreeBSD.org>

vnode_if: fix locking protocol description for lookup and cachedlookup

Also remove the checks from vop_lookup_pre and vop_lookup_post, which
are now completely redundant (before this change they were partially
redundant).

Discussed with: kib
MFC after: 10 days


# 140dedb8 02-Nov-2012 Konstantin Belousov <kib@FreeBSD.org>

The r241025 fixed the case when a binary, executed from nullfs mount,
was still possible to open for write from the lower filesystem. There
is a symmetric situation where the binary could already has file
descriptors opened for write, but it can be executed from the nullfs
overlay.

Handle the issue by passing one v_writecount reference to the lower
vnode if nullfs vnode has non-zero v_writecount. Note that only one
write reference can be donated, since nullfs only keeps one use
reference on the lower vnode. Always use the lower vnode v_writecount
for the checks.

Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is
currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT
to manipulate the v_writecount value, which manages a single bypass
reference to the lower vnode. Caling the VOPs instead of directly
accessing v_writecount provide the fix described in the previous
paragraph.

Tested by: pho
MFC after: 3 weeks


# 877d24ac 28-Sep-2012 Konstantin Belousov <kib@FreeBSD.org>

Fix the mis-handling of the VV_TEXT on the nullfs vnodes.

If you have a binary on a filesystem which is also mounted over by
nullfs, you could execute the binary from the lower filesystem, or
from the nullfs mount. When executed from lower filesystem, the lower
vnode gets VV_TEXT flag set, and the file cannot be modified while the
binary is active. But, if executed as the nullfs alias, only the
nullfs vnode gets VV_TEXT set, and you still can open the lower vnode
for write.

Add a set of VOPs for the VV_TEXT query, set and clear operations,
which are correctly bypassed to lower vnode.

Tested by: pho (previous version)
MFC after: 2 weeks


# c7e41c8b 29-Feb-2012 Mikolaj Golub <trociny@FreeBSD.org>

Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()
operations for setting and accessing vnode's v_socket field.

The operations are necessary to implement proper unix socket handling
on layered file systems like nullfs(5).

This change fixes the long standing issue with nullfs(5) being in that
unix sockets did not work between lower and upper layers: if we bound
to a socket on the lower layer we could connect only to the lower
path; if we bound to the upper layer we could connect only to the
upper path. The new behavior is one can connect to both the lower and
the upper paths regardless what layer path one binds to.

PR: kern/51583, kern/159663
Suggested by: kib
Reviewed by: arch
MFC after: 2 weeks


# 71eeeaf2 06-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Add 5 spare VOPs as placeholders to avoid breaking the KBI in the future
when new VOPs are MFC'd to a branch.

Reviewed by: kib, bz
MFC after: 3 days


# f0d6c5ca 23-Dec-2011 John Baldwin <jhb@FreeBSD.org>

Add post-VOP hooks for VOP_DELETEEXTATTR() and VOP_SETEXTATTR() and use
these to trigger a NOTE_ATTRIB EVFILT_VNODE kevent when the extended
attributes of a vnode are changed.

Note that OS X already implements this behavior.

Reviewed by: rwatson
MFC after: 2 weeks


# 936c09ac 03-Nov-2011 John Baldwin <jhb@FreeBSD.org>

Add the posix_fadvise(2) system call. It is somewhat similar to
madvise(2) except that it operates on a file descriptor instead of a
memory region. It is currently only supported on regular files.

Just as with madvise(2), the advice given to posix_fadvise(2) can be
divided into two types. The first type provide hints about data access
patterns and are used in the file read and write routines to modify the
I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are
thus filesystem independent. Note that to ease implementation (and
since this API is only advisory anyway), only a single non-normal
range is allowed per file descriptor.

The second type of hints are used to hint to the OS that data will or
will not be used. These hints are implemented via a new VOP_ADVISE().
A default implementation is provided which does nothing for the WILLNEED
request and attempts to move any clean pages to the cache page queue for
the DONTNEED request. This latter case required two other changes.
First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests
vinvalbuf() to only flush clean buffers for the vnode from the buffer
cache and to not remove any backing pages from the vnode. This is
used to ensure clean pages are not wired into the buffer cache before
attempting to move them to the cache page queue. The second change adds
a new vm_object_page_cache() method. This method is somewhat similar to
vm_object_page_remove() except that instead of freeing each page in the
specified range, it attempts to move clean pages to the cache queue if
possible.

To preserve the ABI of struct file, the f_cdevpriv pointer is now reused
in a union to point to the currently active advice region if one is
present for regular files.

Reviewed by: jilles, kib, arch@
Approved by: re (kib)
MFC after: 1 month


# fa2c76c9 13-May-2011 Matthew D Fleming <mdf@FreeBSD.org>

Correctly use INOUT for the offset/len parameters to vop_allocate. As
far as I can tell this is for documentation only at the moment.


# 1ce4508f 19-Apr-2011 Matthew D Fleming <mdf@FreeBSD.org>

Allow VOP_ALLOCATE to be iterative, and have kern_posix_fallocate(9)
drive looping and potentially yielding.

Requested by: kib


# d91f88f7 18-Apr-2011 Matthew D Fleming <mdf@FreeBSD.org>

Add the posix_fallocate(2) syscall. The default implementation in
vop_stdallocate() is filesystem agnostic and will run as slow as a
read/write loop in userspace; however, it serves to correctly
implement the functionality for filesystems that do not implement a
VOP_ALLOCATE.

Note that __FreeBSD_version was already bumped today to 900036 for any
ports which would like to use this function.

Also reserve space in the syscall table for posix_fadvise(2).

Reviewed by: -arch (previous version)


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 7fd32ea9 12-May-2010 Zachary Loafman <zml@FreeBSD.org>

Add VOP_ADVLOCKPURGE so that the file system is called when purging
locks (in the case where the VFS impl isn't using lf_*)

Submitted by: Matthew Fleming <matthew.fleming@isilon.com>
Reviewed by: zml, dfr


# c808c963 21-Jun-2009 Konstantin Belousov <kib@FreeBSD.org>

Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by
vn_open_cred in default implementation. Valid struct ucred is needed for
audit and MAC, and curthread credentials may be wrong.

This further requires modifying the interface of vn_fullpath(9), but it
is out of scope of this change.

Reviewed by: rwatson


# a03fb243 11-Jun-2009 Paul Saab <ps@FreeBSD.org>

Stop asserting on exclusive locks in fsync since it can now support
shared vnode locking on ZFS.

Reviewed by: jhb


# 27bfb741 08-Jun-2009 Paul Saab <ps@FreeBSD.org>

Simply shared vnode locking and extend it to also include fsync.
Also, in vop_write, no longer assert for exclusive locks on the
vnode.

Reviewed by: jhb, kmacy, jeffr


# c97fcdba 30-May-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add VOP_ACCESSX, which can be used to query for newly added V*
permissions, such as VWRITE_ACL. For a filsystems that don't
implement it, there is a default implementation, which works
as a wrapper around VOP_ACCESS.

Reviewed by: rwatson@


# 885868cd 10-Apr-2009 Robert Watson <rwatson@FreeBSD.org>

Remove VOP_LEASE and supporting functions. This hasn't been used since
the removal of NQNFS, but was left in in case it was required for NFSv4.
Since our new NFSv4 client and server can't use it for their
requirements, GC the old mechanism, as well as other unused lease-
related code and interfaces.

Due to its impact on kernel programming and binary interfaces, this
change should not be MFC'd.

Proposed by: jeff
Reviewed by: jeff
Discussed with: rmacklem, zach loafman @ isilon


# 33fc3625 11-Mar-2009 John Baldwin <jhb@FreeBSD.org>

Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that a
filesystem supports additional operations using shared vnode locks.
Currently this is used to enable shared locks for open() and close() of
read-only file descriptors.
- When an ISOPEN namei() request is performed with LOCKSHARED, use a
shared vnode lock for the leaf vnode only if the mount point has the
extended shared flag set.
- Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but
not O_CREAT.
- Use a shared vnode lock around VOP_CLOSE() if the file was opened with
O_RDONLY and the mountpoint has the extended shared flag set.
- Adjust md(4) to upgrade the vnode lock on the vnode it gets back from
vn_open() since it now may only have a shared vnode lock.
- Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since
FIFO's require exclusive vnode locks for their open() and close()
routines. (My recent MPSAFE patches for UDF and cd9660 already included
this change.)
- Enable extended shared operations on UFS, cd9660, and UDF.

Submitted by: ups
Reviewed by: pjd (ZFS bits)
MFC after: 1 month


# beace176 21-Jan-2009 John Baldwin <jhb@FreeBSD.org>

Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP:
VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME
can be performed while holding a shared vnode lock (the same functionality
is done internally by VOP_READ which can run with a shared vnode lock).
Add missing locking of the vnode interlock to the ufs implementation and
remove a special note and test from the NFS client about not supporting the
feature.

Inspired by: ups
Tested by: pho


# b9022449 11-Dec-2008 Joe Marcus Clarke <marcus@FreeBSD.org>

Add a new VOP, VOP_VPTOCNP, which translates a vnode to its component name
on a best-effort basis. Teach vn_fullpath to use this new VOP if a
regular VFS cache lookup fails. This VOP is designed to supplement the
VFS cache to provide a better chance that a vnode-to-name lookup will
succeed.

Currently, an implementation for devfs is being committed. The default
implementation is to return ENOENT.

A big thanks to kib for the mentorship on this, and to pho for running it
through his stress test suite.

Reviewed by: arch
Approved by: kib


# 15bc6b2b 28-Oct-2008 Edward Tomasz Napierala <trasz@FreeBSD.org>

Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary
to add more V* constants, and the variables changed by this patch were often
being assigned to mode_t variables, which is 16 bit.

Approved by: rwatson (mentor)


# a48ac381 27-Oct-2008 John Baldwin <jhb@FreeBSD.org>

- Whitespace fix for vop_poll.
- Use the right label for vop_vptofh lock assertions so they are enforced.


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# 0359a12e 28-Aug-2008 Attilio Rao <attilio@FreeBSD.org>

Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread
was always curthread and totally unuseful.

Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>


# dfdcada3 26-Mar-2008 Doug Rabson <dfr@FreeBSD.org>

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks


# e30cf87b 25-Feb-2008 Konstantin Belousov <kib@FreeBSD.org>

Do not assert any locks for VOP_PRINT. In particular, do not assert that
the vnode interlock is not held. vn_printf() already correctly handles
locked and unlocked vnode interlocks, and all the in-tree vop_print
methods are interlock-agnostic.

Some code calls vprintf() with the vnode interlock held, that causes
unjustified panics with INVARIANTS (ffs_syncvnode() as example).

Reported by: Peter Holm


# 81c794f9 25-Feb-2008 Attilio Rao <attilio@FreeBSD.org>

Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is
always curthread.

As KPI gets broken by this patch, manpages and __FreeBSD_version will be
updated by further commits.

Tested by: Andrea Barberio <insomniac at slackware dot it>


# 22db15c0 13-Jan-2008 Attilio Rao <attilio@FreeBSD.org>

VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.

KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.

Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>


# 9e223287 31-May-2007 Konstantin Belousov <kib@FreeBSD.org>

Revert UF_OPENING workaround for CURRENT.
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.

Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)


# d413d210 18-May-2007 Konstantin Belousov <kib@FreeBSD.org>

Since renaming of vop_lock to _vop_lock, pre- and post-condition
function calls are no more generated for vop_lock.
Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption
about vop naming conventions. This restores pre/post-condition calls.


# 10bcafe9 15-Feb-2007 Pawel Jakub Dawidek <pjd@FreeBSD.org>

Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.
This way we may support multiple structures in v_data vnode field within
one file system without using black magic.

Vnode-to-file-handle should be VOP in the first place, but was made VFS
operation to keep interface as compatible as possible with SUN's VFS.
BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.

VFS_VPTOFH() was left for API backward compatibility, but is marked for
removal before 8.0-RELEASE.

Approved by: mckusick
Discussed with: many (on IRC)
Tested with: ufs, msdosfs, cd9660, nullfs and zfs


# 2f6a774b 12-Nov-2006 Kip Macy <kmacy@FreeBSD.org>

change vop_lock handling to allowing tracking of callers' file and line for
acquisition of lockmgr locks

Approved by: scottl (standing in for mentor rwatson)


# 23efd78d 31-May-2006 Diomidis Spinellis <dds@FreeBSD.org>

Remove two locking assertion entries that:
a) were incorrectly written and therefore never compiled into
assertions, and
b) were incorrectly specified and when compiled resulted in a
failed assertion.


# f69ec7af 30-May-2006 Diomidis Spinellis <dds@FreeBSD.org>

Assertion code specifications are introduced using special character
sequences that are distinct from comments. %% is used for argument
locks; %! for pre- and post-conditions.


# b1b42821 30-May-2006 Diomidis Spinellis <dds@FreeBSD.org>

Remove incorrect lock validation specifications that caused
failed assertions with DEBUG_VFS_LOCKS.
We should reinstate them with correct specifications, possibly
after extendng vnode_if.awk

Noted by: truckman@


# 0e1c7fb8 28-May-2006 Diomidis Spinellis <dds@FreeBSD.org>

Add missing % signs in the lock annotations of the functions:
lookup, rename, strategy, islocked
The missing % sign meant that the lines were processed as plain
comments and the corresponding assertions were never generated.


# 0430a5e2 13-Dec-2005 Dag-Erling Smørgrav <des@FreeBSD.org>

Eradicate caddr_t from the VFS API.


# 679985d0 09-Jun-2005 Suleiman Souhlal <ssouhlal@FreeBSD.org>

Allow EVFILT_VNODE events to work on every filesystem type, not just
UFS by:
- Making the pre and post hooks for the VOP functions work even when
DEBUG_VFS_LOCKS is not defined.
- Moving the KNOTE activations into the corresponding VOP hooks.
- Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct
mount that permits filesystems to disable the new behavior.
- Creating a default VOP_KQFILTER function: vfs_kqfilter()

My benchmarks have not revealed any performance degradation.

Reviewed by: jeff, bde
Approved by: rwatson, jmg (kqueue changes), grehan (mentor)


# 17c916e3 11-Apr-2005 Jeff Roberson <jeff@FreeBSD.org>

- Mark the VOPs that require exclusive locks. Those that aren't marked
with E may be called with a shared lock held. This list really could
be made per filesystem if we had any filesystems which differed from
ffs in locking guarantees. VFS itself is not sensitive to this except
where vgone() etc. are concerned.

Sponsored by: Isilon Systems, Inc.


# 4e674696 13-Mar-2005 Jeff Roberson <jeff@FreeBSD.org>

- CLOSE, REVOKE, INACTIVE, and RECLAIM are not L L L, that's a locked vnode
on enter, exit, error. This allows for the removal of the XLOCK.

Sponsored by: Isilon Systems, Inc.


# 7ee4eb61 07-Feb-2005 Poul-Henning Kamp <phk@FreeBSD.org>

VOP_DESTROYVOBJECT() is no more.


# 729fcf7e 24-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Take VOP_GETVOBJECT() out to pasture. We use the direct pointer now.


# 69816ea3 24-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Kill VOP_CREATEVOBJECT(), it is now the responsibility of the filesystem
for a given vnode to create a vnode_pager object if one is needed.


# 8df6bac4 11-Jan-2005 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().

I'm not sure why a credential was added to these in the first place, it is
not used anywhere and it doesn't make much sense:

The credentials for syncing a file (ability to write to the
file) should be checked at the system call level.

Credentials for syncing one or more filesystems ("none")
should be checked at the system call level as well.

If the filesystem implementation needs a particular credential
to carry out the syncing it would logically have to the
cached mount credential, or a credential cached along with
any delayed write data.

Discussed with: rwatson


# 9454b2d8 06-Jan-2005 Warner Losh <imp@FreeBSD.org>

/* -> /*- for copyright notices, minor format tweaks as necessary


# 9c83534d 15-Nov-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Make VOP_BMAP return a struct bufobj for the underlying storage device
instead of a vnode for it.

The vnode_pager does not and should not have any interest in what
the filesystem uses for backend.

(vfs_cluster doesn't use the backing store argument.)


# c108bb74 29-Oct-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove VOP_SPECSTRATEGY() from the system.


# 883d3c0c 13-Sep-2004 Poul-Henning Kamp <phk@FreeBSD.org>

Remove the buffercache/vnode side of BIO_DELETE processing in
preparation for integration of p4::phk_bufwork. In the future,
local filesystems will talk to GEOM directly and they will consequently
be able to issue BIO_DELETE directly. Since the removal of the fla
driver, BIO_DELETE has effectively been a no-op anyway.


# 7f8a436f 05-Apr-2004 Warner Losh <imp@FreeBSD.org>

Remove advertising clause from University of California Regent's license,
per letter dated July 22, 1999.

Approved by: core


# 9080ff25 28-Jul-2003 Robert Watson <rwatson@FreeBSD.org>

Rename VOP_RMEXTATTR() to VOP_DELETEEXTATTR() for consistency with the
kernel ACL interfaces and system call names.

Break out UFS2 and FFS extattr delete and list vnode operations from
setextattr and getextattr to deleteextattr and listextattr, which
cleans up the implementations, and makes the results more readable,
and makes the APIs more clear.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 1b6c6095 27-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Call the new argument "fdidx" that is more precise than "fd".


# a8d43c90 26-Jul-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Add a "int fd" argument to VOP_OPEN() which in the future will
contain the filedescriptor number on opens from userland.

The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*

For now pass -1 all over the place.


# 77533ed2 22-Jun-2003 Robert Watson <rwatson@FreeBSD.org>

Expose vop_rmextattr as an explicit operation at the vnode operation
interface, rather than relying on a NULL uio for the deletion
operation.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# c2ef4dd4 19-Jun-2003 Stefan Eßer <se@FreeBSD.org>

Add comment about **vpp being special-cased in vnode_if.awk (1.38)


# a6f1342f 04-Jun-2003 Robert Watson <rwatson@FreeBSD.org>

Add vop_listextattr(), similar to vop_getextattr() but without a
specific attribute name. It will have the same semantics as the
older vop_getextattr() "retrieve the names" hack, returning
a buffer with ASCII nul-seperated names.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# f5b11b6e 04-Jan-2003 Poul-Henning Kamp <phk@FreeBSD.org>

Temporarily introduce a new VOP_SPECSTRATEGY operation while I try
to sort out disk-io from file-io in the vm/buffer/filesystem space.

The intent is to sort VOP_STRATEGY calls into those which operate
on "real" vnodes and those which operate on VCHR vnodes. For
the latter kind, the call will be changed to VOP_SPECSTRATEGY,
possibly conditionally for those places where dual-use happens.

Add a default VOP_SPECSTRATEGY method which will call the normal
VOP_STRATEGY. First time it is called it will print debugging
information. This will only happen if a normal vnode is passed
to VOP_SPECSTRATEGY by mistake.

Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY
does on a VCHR vnode today.

Add a new VOP_STRATEGY method in specfs to catch instances where
the conversion to VOP_SPECSTRATEGY has not yet happened. Handle
the request just like we always did, but first time called print
debugging information.

Apart up to two instances of console messages per boot, this amounts
to a glorified no-op commit.

If you get any of the messages on your console I would very much
like a copy of them mailed to phk@freebsd.org


# 79191eca 24-Dec-2002 Robert Watson <rwatson@FreeBSD.org>

Flush vop_refreshlabel() definition, since it is no longer used.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, Network Associates Laboratories


# 6fc15f9b 25-Sep-2002 Jeff Roberson <jeff@FreeBSD.org>

- We don't need any automated lock checking for vop_islocked.


# fa288043 19-Sep-2002 Don Lewis <truckman@FreeBSD.org>

VOP_FSYNC() requires that it's vnode argument be locked, which nfs_link()
wasn't doing. Rather than just lock and unlock the vnode around the call
to VOP_FSYNC(), implement rwatson's suggestion to lock the file vnode
in kern_link() before calling VOP_LINK(), since the other filesystems
also locked the file vnode right away in their link methods. Remove the
locking and and unlocking from the leaf filesystem link methods.

Reviewed by: rwatson, bde (except for the unionfs_link() changes)


# e1657bbb 05-Sep-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Introduce the VOP_OPENEXTATTR() and VOP_CLOSEEXTATTR() methods.

Together these two implement a simple transcation style grouping for
modifications of extended attributes on a vnode.

VOP_CLOSEEXTATTR() takes a boolean "commit" argument, which determines
if the aggregate changes are attempted written or not. A commit will
fail if any of the VOP_SETEXTATTR() calls since the VOP_OPENEXTATTR()
have failed to meet their objective or if the flush to disk fails.

The default operations for these two VOP's is to return EOPNOTSUPP.

This API may still be subject to change.

Sponsored by: DARPA & NAI Labs


# 71ea4ba5 21-Aug-2002 Jeff Roberson <jeff@FreeBSD.org>

- Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED
- Use the new VI asserts in place of the old mtx_assert checks.
- Add the VI asserts to the automated lock checking in the VOP calls. The
interlock should not be held across vops with a few exceptions.
- Add the vop_(un)lock_{pre,post} functions to assert that interlock is held
when LK_INTERLOCK is set.


# f8ef020e 30-Jul-2002 Robert Watson <rwatson@FreeBSD.org>

Begin committing support for Mandatory Access Control and extensible
kernel access control. The MAC framework permits loadable kernel
modules to link to the kernel at compile-time, boot-time, or run-time,
and augment the system security policy. This commit includes the
initial kernel implementation, although the interface with the userland
components of the operating system is still under work, and not all
kernel subsystems are supported. Later in this commit sequence,
documentation of which kernel subsystems will not work correctly with
a kernel compiled with MAC support will be added.

Introduce two node vnode operations required to support MAC. First,
VOP_REFRESHLABEL(), which will be invoked by callers requiring that
vp->v_label be sufficiently "fresh" for access control purposes.
Second, VOP_SETLABEL(), which be invoked by callers requiring that
the passed label contents be updated. The file system is responsible
for updating v_label if appropriate in coordination with the MAC
framework, as well as committing to disk. File systems that are
not MAC-aware need not implement these VOPs, as the MAC framework
will default to maintaining a single label for all vnodes based
on the label on the file system mount point.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# 1e4c7a13 30-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

- Acknowledge recursive vnode locks in the vop_unlock specification. The
vnode may not be unlocked even if the operation succeeded.


# 50bfcee1 09-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

- Use the new vop_lookup_{pre,post} instead of simpler locking specification.


# 41a5470d 07-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

- Require locks for getattr. At some point this could only require shared
locks.


# e818064e 05-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

- Disable original vop_strategy lock specification.
- Switch to the new vop_strategy_pre for lock validation.

VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need
to be translated. There may be other reasons, but as long as the underlying
layer uses a VOP to perform the operations they will be caught later.


# 13e407ef 05-Jul-2002 Jeff Roberson <jeff@FreeBSD.org>

Use the new #! directive for vop_rename. Leave the old lock specification
intact but disabled.


# 98b0c789 14-May-2002 Poul-Henning Kamp <phk@FreeBSD.org>

Make daddr_t and u_daddr_t 64bits wide.
Retire daddr64_t and use daddr_t instead.

Sponsored by: DARPA & NAI Labs.


# 0d2af521 15-Mar-2002 Kirk McKusick <mckusick@FreeBSD.org>

Introduce the new 64-bit size disk block, daddr64_t. Change
the bio and buffer structures to have daddr64_t bio_pblkno,
b_blkno, and b_lblkno fields which allows access to disks
larger than a Terabyte in size. This change also requires
that the VOP_BMAP vnode operation accept and return daddr64_t
blocks. This delta should not affect system operation in
any way. It merely sets up the necessary interfaces to allow
the development of disk drivers that work with these larger
disk block addresses. It also allows for the development of
UFS2 which will use 64-bit block addresses.


# eae13067 17-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Per discussion at BSDCon, note that the vop_getattr locking protocol
should require a shared lock, rather than an exclusive lock, which can
improve performance. No actual code change here, since a number of
VFS locking fixes are in the works.


# 17459091 10-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Add a comment indicating that the locking protocol should be updated
to be 'L L L' for vop_getattr(). Don't update it yet, because there
are still many offenders.


# 74237f55 09-Feb-2002 Robert Watson <rwatson@FreeBSD.org>

Part I: Update extended attribute API and ABI:

o Modify the system call syntax for extattr_{get,set}_{fd,file}() so
as not to use the scatter gather API (which appeared not to be used
by any consumers, and be less portable), rather, accepts 'data'
and 'nbytes' in the style of other simple read/write interfaces.
This changes the API and ABI.

o Modify system call semantics so that extattr_get_{fd,file}() return
a size_t. When performing a read, the number of bytes read will
be returned, unless the data pointer is NULL, in which case the
number of bytes of data are returned. This changes the API only.

o Modify the VOP_GETEXTATTR() vnode operation to accept a *size_t
argument so as to return the size, if desirable. If set to NULL,
the size will not be returned.

o Update various filesystems (pseodofs, ufs) to DTRT.

These changes should make extended attributes more useful and more
portable. More commits to rebuild the system call files, as well
as update userland utilities to follow.

Obtained from: TrustedBSD Project
Sponsored by: DARPA, NAI Labs


# b40ce416 12-Sep-2001 Julian Elischer <julian@FreeBSD.org>

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 2b3dc41c 24-Jul-2001 Assar Westerlund <assar@FreeBSD.org>

correct description of `vpp' for mknod/symlink: they are actually
returned locked


# 855aa097 28-Apr-2001 Poul-Henning Kamp <phk@FreeBSD.org>

VOP_BALLOC was never really a VOP in the first place, so convert it
to UFS_BALLOC like the other "between UFS and FFS function interfaces".


# f84e29a0 17-Apr-2001 Poul-Henning Kamp <phk@FreeBSD.org>

This patch removes the VOP_BWRITE() vector.

VOP_BWRITE() was a hack which made it possible for NFS client
side to use struct buf with non-bio backing.

This patch takes a more general approach and adds a bp->b_op
vector where more methods can be added.

The success of this patch depends on bp->b_op being initialized
all relevant places for some value of "relevant" which is not
easy to determine. For now the buffers have grown a b_magic
element which will make such issues a tiny bit easier to debug.


# 30632071 18-Mar-2001 Robert Watson <rwatson@FreeBSD.org>

o Rename "namespace" argument to "attrnamespace" as namespace is a C++
reserved word.

Submitted by: jkh
Obtained from: TrustedBSD Project


# 70f36851 14-Mar-2001 Robert Watson <rwatson@FreeBSD.org>

o Change the API and ABI of the Extended Attribute kernel interfaces to
introduce a new argument, "namespace", rather than relying on a first-
character namespace indicator. This is in line with more recent
thinking on EA interfaces on various mailing lists, including the
posix1e, Linux acl-devel, and trustedbsd-discuss forums. Two namespaces
are defined by default, EXTATTR_NAMESPACE_SYSTEM and
EXTATTR_NAMESPACE_USER, where the primary distinction lies in the
access control model: user EAs are accessible based on the normal
MAC and DAC file/directory protections, and system attributes are
limited to kernel-originated or appropriately privileged userland
requests.

o These API changes occur at several levels: the namespace argument is
introduced in the extattr_{get,set}_file() system call interfaces,
at the vnode operation level in the vop_{get,set}extattr() interfaces,
and in the UFS extended attribute implementation. Changes are also
introduced in the VFS extattrctl() interface (system call, VFS,
and UFS implementation), where the arguments are modified to include
a namespace field, as well as modified to advoid direct access to
userspace variables from below the VFS layer (in the style of recent
changes to mount by adrian@FreeBSD.org). This required some cleanup
and bug fixing regarding VFS locks and the VFS interface, as a vnode
pointer may now be optionally submitted to the VFS_EXTATTRCTL()
call. Updated documentation for the VFS interface will be committed
shortly.

o In the near future, the auto-starting feature will be updated to
search two sub-directories to the ".attribute" directory in appropriate
file systems: "user" and "system" to locate attributes intended for
those namespaces, as the single filename is no longer sufficient
to indicate what namespace the attribute is intended for. Until this
is committed, all attributes auto-started by UFS will be placed in
the EXTATTR_NAMESPACE_SYSTEM namespace.

o The default POSIX.1e attribute names for ACLs and Capabilities have
been updated to no longer include the '$' in their filename. As such,
if you're using these features, you'll need to rename the attribute
backing files to the same names without '$' symbols in front.

o Note that these changes will require changes in userland, which will
be committed shortly. These include modifications to the extended
attribute utilities, as well as to libutil for new namespace
string conversion routines. Once the matching userland changes are
committed, a buildworld is recommended to update all the necessary
include files and verify that the kernel and userland environments
are in sync. Note: If you do not use extended attributes (most people
won't), upgrading is not imperative although since the system call
API has changed, the new userland extended attribute code will no longer
compile with old include files.

o Couple of minor cleanups while I'm there: make more code compilation
conditional on FFS_EXTATTR, which should recover a bit of space on
kernels running without EA's, as well as update copyright dates.

Obtained from: TrustedBSD Project


# 589c7af9 07-Mar-2001 Kirk McKusick <mckusick@FreeBSD.org>

Fixes to track snapshot copy-on-write checking in the specinfo
structure rather than assuming that the device vnode would reside
in the FFS filesystem (which is obviously a broken assumption with
the device filesystem).


# 608a3ce6 15-Feb-2001 Jonathan Lemon <jlemon@FreeBSD.org>

Extend kqueue down to the device layer.

Backwards compatible approach suggested by: peter


# e3c4036b 01-Nov-2000 Eivind Eklund <eivind@FreeBSD.org>

Give vop_mmap an untimely death. The opportunity to give it a timely
death timed out in 1996.


# 988ee790 21-Sep-2000 Robert Watson <rwatson@FreeBSD.org>

o Change locking rules for VOP_GETACL() to indicate that vnode locks
must be held when retrieving ACLs from vnodes. This is required for
EA-based UFS ACL implementations.
o Update vacl_get_acl() so that it does appropriate vnode locking.
o Remove static from M_ACL malloc define so that it is accessible for
consumers of ACLs other than in kern_acl.c

Obtained from: TrustedBSD Project


# 9ff5ce6b 12-Sep-2000 Boris Popov <bp@FreeBSD.org>

Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.
They will be used by nullfs and other stacked filesystems to support full
cache coherency.

Reviewed in general by: mckusick, dillon


# 877dd71f 26-Aug-2000 Robert Watson <rwatson@FreeBSD.org>

o Correct spelling of ufs_exttatr_find_attr -> ufs_extattr_find_attr
o Add "const" qualifier to attrname argument of various calls to remove
warnings

Obtained from: TrustedBSD Project


# 3ce7b7aa 26-Jul-2000 Robert Watson <rwatson@FreeBSD.org>

o Lock vnode before calling extattr_* VOP's, and modify vnode spec to
allow for that.
o Remember to call NDFREE() if exiting as a result of a failed
vn_start_write() when snapshotting.

Reviewed by: mckusick
Obtained from: TrustedBSD Project


# f2a2857b 11-Jul-2000 Kirk McKusick <mckusick@FreeBSD.org>

Add snapshots to the fast filesystem. Most of the changes support
the gating of system calls that cause modifications to the underlying
filesystem. The gating can be enabled by any filesystem that needs
to consistently suspend operations by adding the vop_stdgetwritemount
to their set of vnops. Once gating is enabled, the function
vfs_write_suspend stops all new write operations to a filesystem,
allows any filesystem modifying system calls already in progress
to complete, then sync's the filesystem to disk and returns. The
function vfs_write_resume allows the suspended write operations to
begin again. Gating is not added by default for all filesystems as
for SMP systems it adds two extra locks to such critical kernel
paths as the write system call. Thus, gating should only be added
as needed.

Details on the use and current status of snapshots in FFS can be
found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness
is not included here. Unless and until you create a snapshot file,
these changes should have no effect on your system (famous last words).


# 91f37dcb 18-Dec-1999 Robert Watson <rwatson@FreeBSD.org>

Second pass commit to introduce new ACL and Extended Attribute system
calls, vnops, vfsops, both in /kern, and to individual file systems that
require a vfsop_ array entry.

Reviewed by: eivind


# d776d82c 18-Dec-1999 Eivind Eklund <eivind@FreeBSD.org>

Since VOP_LOCK can be used to up and downgrade locks, it is not possible
to say anything about the lockstate before and after it. Thus, change the
lockspec from U L U to ? ? ?.


# 762e6b85 15-Dec-1999 Eivind Eklund <eivind@FreeBSD.org>

Introduce NDFREE (and remove VOP_ABORTOP)


# 6bdfe06a 11-Dec-1999 Eivind Eklund <eivind@FreeBSD.org>

Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() gets a new process argument and a new
return value: LK_EXCLOTHER, when the lock is held exclusively by another
process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them
* Extend the vnode_if.src format to allow more exact specification than
locked/unlocked.

This commit should not do any semantic changes unless you are using
DEBUG_VFS_LOCKS.

Discussed with: grog, mch, peter, phk
Reviewed by: peter


# dd8c04f4 13-Nov-1999 Eivind Eklund <eivind@FreeBSD.org>

Remove WILLRELE from VOP_SYMLINK

Note: Previous commit to these files (except coda_vnops and devfs_vnops)
that claimed to remove WILLRELE from VOP_RENAME actually removed it from
VOP_MKNOD.


# edfe736d 11-Nov-1999 Eivind Eklund <eivind@FreeBSD.org>

Remove WILLRELE from VOP_RENAME


# 5c69e12c 26-Sep-1999 Eivind Eklund <eivind@FreeBSD.org>

Move the vop_islocked declaration to the top, in preparation for committing
code to auto-generate assertions from the lockspecs


# c3aac50f 27-Aug-1999 Peter Wemm <peter@FreeBSD.org>

$Id$ -> $FreeBSD$


# f9c8cab5 16-Jun-1999 Kirk McKusick <mckusick@FreeBSD.org>

Add a vnode argument to VOP_BWRITE to get rid of the last vnode
operator special case. Delete special case code from vnode_if.sh,
vnode_if.src, umap_vnops.c, and null_vnops.c.


# 361d0ec5 26-Mar-1999 Eivind Eklund <eivind@FreeBSD.org>

Remove incorrect lock specs for vop_whiteout (introduced by Lite/2).
The lock specs are for vnodes only.

Add (hopefully correct) lock specs for vop_strategy, vop_getpages and
vop_putpages.


# 0375c9f2 05-Sep-1998 Poul-Henning Kamp <phk@FreeBSD.org>

Add a new vnode op, VOP_FREEBLKS(), which filesystems can use to inform
device drivers about sectors no longer in use.

Device-drivers receive the call through d_strategy, if they have
D_CANFREE in d_flags.

This allows flash based devices to erase the sectors and avoid
pointlessly carrying them around in compactions.

Reviewed by: Kirk Mckusick, bde
Sponsored by: M-Systems (www.m-sys.com)


# fd5d1124 04-Jul-1998 Julian Elischer <julian@FreeBSD.org>

VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>


# 7be2d300 06-May-1998 Mike Smith <msmith@FreeBSD.org>

In the words of the submitter:

---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------

Submitted by: Michael Hancock <michaelh@cet.co.jp>


# b1897c19 08-Mar-1998 Julian Elischer <julian@FreeBSD.org>

Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)
Submitted by: Kirk McKusick (mcKusick@mckusick.com)
Obtained from: WHistle development tree


# 987f5696 16-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Another VFS cleanup "kilo commit"

1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
intereface function, and now lives in the ufsmount structure.

2. Remove VOP_SEEK, it was unused.

3. Add mode default vops:

VOP_ADVLOCK vop_einval
VOP_CLOSE vop_null
VOP_FSYNC vop_null
VOP_IOCTL vop_enotty
VOP_MMAP vop_einval
VOP_OPEN vop_null
VOP_PATHCONF vop_einval
VOP_READLINK vop_einval
VOP_REALLOCBLKS vop_eopnotsupp

And remove identical functionality from filesystems

4. Add vop_stdpathconf, which returns the canonical stuff. Use
it in the filesystems. (XXX: It's probably wrong that specfs
and fifofs sets this vop, shouldn't it come from the "host"
filesystem, for instance ufs or cd9660 ?)

5. Try to make system wide VOP functions have vop_* names.

6. Initialize the um_* vectors in LFS.

(Recompile your LKMS!!!)


# cec0f20c 16-Oct-1997 Poul-Henning Kamp <phk@FreeBSD.org>

VFS mega cleanup commit (x/N)

1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.

2. Change VOP_BLKATOFF to a normal function in cd9660.

3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.

4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There are still some cruft left, but
the bulk of it is done.

5. Fix another VCALL in vfs_cache.c (thanks Bruce!)


# 6b8e64f5 13-Sep-1997 Peter Wemm <peter@FreeBSD.org>

Change VOP_SELECT to VOP_POLL


# c049f064 25-Aug-1997 Poul-Henning Kamp <phk@FreeBSD.org>

Add a new vnode op (cachedlookup) so that filesystems can plug into
a global vfs_cache check. The rest of this change will come when the
current zero size file problem is resolved.


# 996c772f 09-Feb-1997 John Dyson <dyson@FreeBSD.org>

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 1130b656 14-Jan-1997 Jordan K. Hubbard <jkh@FreeBSD.org>

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# edbfedac 11-Mar-1996 Peter Wemm <peter@FreeBSD.org>

Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all
files are off the vendor branch, so this should not change anything.

A "U" marker generally means that the file was not changed in between
the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally
means that there was a change.
[note new unused (in this form) syscalls.conf, to be 'cvs rm'ed]


# a316d390 10-Dec-1995 John Dyson <dyson@FreeBSD.org>

Changes to support 1Tb filesizes. Pages are now named by an
(object,index) pair instead of (object,offset) pair.


# cd0402ec 22-Oct-1995 John Dyson <dyson@FreeBSD.org>

Interface change for the VOP_GETPAGES -- missed in previous commits.


# c83ebe77 03-Sep-1995 John Dyson <dyson@FreeBSD.org>

Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count
for VOP_BMAP. Updated affected filesystems...


# 47777413 01-Aug-1995 David Greenman <dg@FreeBSD.org>

Removed my special-case hack for VOP_LINK and fixed the problem with the
wrong vp's ops vector being used by changing the VOP_LINK's argument order.
The special-case hack doesn't go far enough and breaks the generic
bypass routine used in some non-leaf filesystems. Pointed out by Kirk
McKusick.


# 083c109d 07-Jul-1995 David Greenman <dg@FreeBSD.org>

The generated VCALL always uses the first vp which in the case of /link/
might not be handled by the same FS as the directory (e.g. special device
files)...so it must be special-cased. This bug is seen when doing
"ln /dev/console /dev/foo" or equivilent and first appeared after I fixed
the argument order of VOP_LINK. YUCK! There really needs to be a way of
specifying what vp to use in the VCALL; doing this could fix the strategy
and bwrite special-cases, too.


# 98796526 28-Jun-1995 David Greenman <dg@FreeBSD.org>

Fixed VOP_LINK argument order botch.


# 9abf4d6e 28-Sep-1994 Doug Rabson <dfr@FreeBSD.org>

Make NFS ask the filesystems for directory cookies instead of making them
itself.


# 3c4dd356 02-Aug-1994 David Greenman <dg@FreeBSD.org>

Added $Id$


# df8bae1d 24-May-1994 Rodney W. Grimes <rgrimes@FreeBSD.org>

BSD 4.4 Lite Kernel Sources