Cross Reference: /freebsd-10.1-release/sys/nfsserver/

History log of /freebsd-10.1-release/sys/nfsserver/
Revision	Date	Author	Comments
272461	03-Oct-2014	gjb	Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
268961	21-Jul-2014	bdrewery	MFC r268114: Change NFS readdir() to only ignore cookies preceding the given offset for UFS rather than for all but ZFS.
265667	08-May-2014	rmacklem	MFC: r264888 The PR reported that the old NFS server did not set uio_td == NULL for the VOP_READ() call. This patch fixes both the old and new server for this case.
261049	22-Jan-2014	mav	MFC r259765: Fix RPC server threads file handle affinity to work better with ZFS. Instead of taking 8 specific bytes of file handle to identify file during RPC thread affitinity handling, use trivial hash of the full file handle. ZFS's struct zfid_short does not have padding field after the length field, as result, originally picked 8 bytes are loosing lower 16 bits of object ID, causing many false matches and unneeded requests affinity to same thread. This fix substantially improves NFS server latency and scalability in SPEC NFS benchmark by more flexible use of multiple NFS threads.
256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
255219	05-Sep-2013	pjd	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
251171	31-May-2013	jeff	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
249596	17-Apr-2013	ken	Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to sys/nfs, since it is now shared by the two NFS servers. Suggested by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
249592	17-Apr-2013	ken	Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
247602	02-Mar-2013	pjd	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
245611	18-Jan-2013	jhb	Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS. Reviewed by: rmacklem, bde MFC after: 1 week
243882	05-Dec-2012	glebius	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
241896	22-Oct-2012	kib	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
241394	10-Oct-2012	kevlo	Revert previous commit... Pointyhat to: kevlo (myself)
241370	09-Oct-2012	kevlo	Prefer NULL over 0 for pointers
241025	28-Sep-2012	kib	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
228520	15-Dec-2011	delphij	Honor NFSv3 commit call (RFC 1813, Section 3.3.21) where when count is 0, the full length from offset is being flushed. Note that for now VOP_FSYNC does not support offset and length parameters so we still do the same full VOP_FSYNC. This issue was reported at FreeNAS support site as FreeNAS ticket #1096. Submitted by: "ceckerle" <ce.freenas eckerle net> Prodded by: gcooper Reviewed by: rmacklem MFC after: 2 weeks
228185	01-Dec-2011	jhb	Enhance the sequential access heuristic used to perform readahead in the NFS server and reuse it for writes as well to allow writes to the backing store to be clustered. - Use a prime number for the size of the heuristic table (1017 is not prime). - Move the logic to locate a heuristic entry from the table and compute the sequential count out of VOP_READ() and into a separate routine. - Use the logic from sequential_heuristic() in vfs_vnops.c to update the seqcount when a sequential access is performed rather than just increasing seqcount by 1. This lets the clustering count ramp up faster. - Allow for some reordering of RPCs and if it is detected leave the current seqcount as-is rather than dropping back to a seqcount of 1. Also, when out of order access is encountered, cut seqcount in half rather than dropping it all the way back to 1 to further aid with reordering. - Fix the new NFS server to properly update the next offset after a successful VOP_READ() so that the readahead actually works. Some of these changes came from an earlier patch by Bjorn Gronwall that was forwarded to me by bde@. Discussed with: bde, rmacklem, fs@ Submitted by: Bjorn Gronwall (1, 4) MFC after: 2 weeks
225356	03-Sep-2011	rmacklem	Fix the NFS servers so that they can do a Lookup of "..", which requires that ni_strictrelative be set to 0, post-r224810. Tested by: swills (earlier version), geo dot liaskos at gmail.com Approved by: re (kib)
224778	11-Aug-2011	rwatson	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
223309	19-Jun-2011	rmacklem	Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson. Reviewed by: jhb MFC after: 2 weeks
222167	22-May-2011	rmacklem	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
219028	25-Feb-2011	netchild	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
218345	05-Feb-2011	alc	Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are incorrectly calling vm_object_page_clean(). They are passing the length of the range rather than the ending offset of the range. Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the callers. Reviewed by: kib MFC after: 3 weeks
216774	28-Dec-2010	pjd	ZFS might not return monotonically increasing directory offset cookies, so turn off UFS-specific hack that assumes so in ZFS case. Before the change we can miss returning some directory entries to a NFS client. I believe that the hack should be moved to ufs_readdir(), but until we find somebody who will do it, turn it off for ZFS in NFS server code. Submitted by: rmacklem Discussed with: rmacklem, mckusick MFC after: 3 days
216633	21-Dec-2010	pjd	Use newly added NFSRV_FLAG_BUSY flag for nfsrv_fhtovp() to keep mount point busy. This fixes a race where we can pass invalid mount point to VFS_VGET() via vp->v_mount when exported file system was forcibly unmounted between nfsrv_fhtovp() and VFS_VGET(). Reviewed by: kib MFC after: 5 days
216632	21-Dec-2010	pjd	- Move pubflag and lockflag handling from nfsrv_fhtovp() to nfs_namei() - this is the only place that is different from all the other nfsrv_fhtovp() consumers. This simplifies nfsrv_fhtovp() a bit and also eliminates one vn_lock/VOP_UNLOCK() cycle in case of NFSv3. - Implement NFSRV_FLAG_BUSY flag for nfsrv_fhtovp() that tells it to leave mount point busy. Reviewed by: kib MFC after: 5 days
216631	21-Dec-2010	pjd	On error, unbusy file system and jump to the end, so we won't try to unlock NULL *vpp. Reviewed by: kib MFC after: 5 days
216627	21-Dec-2010	pjd	After r216626 no extra { } are needed with VFS_UNLOCK_GIANT().
216565	19-Dec-2010	pjd	Reduce lock scope a little.
216454	15-Dec-2010	kib	VOP_ISLOCKED() should not be used to determine if the vnode is locked. Explicitely track the locked status of the vnode. Reviewed by: pjd Tested by: avg MFC after: 1 week
214851	05-Nov-2010	kib	Fix a bug in r214049. The nvp == vp case shall be handled specially only for !usevget case. If VFS_VGET is working, the vnode shared lock is obtained recursively and vput() shall be done, not vunref(). Submitted by: rmacklem Tested by: Josh Carroll <josh.carroll gmail com> MFC after: 3 days
214049	19-Oct-2010	kib	When readdirplus() is handled on the exported filesystem that does not support VFS_VGET, like msdosfs, do not call VOP_LOOKUP() for dotdot on the root directory. Our filesystems expect that VFS handles dotdot lookups on root on its own. Reported and tested by: kevlo MFC after: 2 weeks
211854	26-Aug-2010	pjd	- When VFS_VGET() is not supported, switch to VOP_LOOKUP(). - We are fine by only share-locking the vnode. - Remove assertion that doesn't hold for ZFS where we cross mount points boundaries by going into .zfs/snapshot/<name>/. Reviewed by: rmacklem MFC after: 1 month
205661	26-Mar-2010	rmacklem	Patch the regular NFS server so that it returns ESTALE to the client for all errors returned by VFS_FHTOVP(). This is required to ensure that EIO doesn't get returned to the client when ZFS is used as the server file system. Tested by: korvus AT comcast.net Reviewed by: jhb MFC after: 2 weeks
203968	16-Feb-2010	marius	Factor out the code shared between NFS client and server into its own module. With r203732 it became apparent that creating the sysctl nodes twice causes at least a warning, however the whole code shouldn't be present twice in the first place. Discussed with: rmacklem
203732	09-Feb-2010	marius	- Move nfs_realign() from the NFS client to the shared NFS code and remove the NFS server version in order to reduce code duplication. The shared version now uses a second parameter how, which is passed on to m_get(9) and m_getcl(9) as the server used M_WAIT while the client requires M_DONTWAIT, and replaces the the previously unused parameter hsiz. - Change nfs_realign() to use nfsm_aligned() so as with other NFS code the alignment check isn't actually performed on platforms without strict alignment requirements for performance reasons because as the comment suggests unaligned data only occasionally occurs with TCP. - Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather than M_WAIT because it's called with the RPC sp_lock held. Reviewed by: jhb, rmacklem MFC after: 1 week
201899	09-Jan-2010	marius	Some style(9) fixes in order to fabricate a commit to denote that the commit message for r201896 actually should have read: As nfsm_srvmtofh_xx() assumes the 4-byte alignment required by XDR ensure the mbuf data is aligned accordingly by calling nfs_realign() in fha_extract_info(). This fix is orthogonal to the problem solved by r199274/r199284. PR: 142102 (second part) MFC after: 1 week
201896	09-Jan-2010	marius	Exclude options COMPAT_FREEBSD4 now that the MD freebsd4_sigreturn() is gone since r201396 and which is also in line with the fact that FreeBSD 4 didn't supported sparc64. PR: 142102 (second part) MFC after: 1 week
200084	03-Dec-2009	jhb	Properly return an error reply if an NFS remove or link operation fails. Previously the failing operation would allocate an mbuf and construct an error reply, but because the function did not return 0, the NFS server assumed it had failed to generate a reply and would leak the reply mbuf as well as not sending the reply to the NFS client. PR: kern/140853 Submitted by: Ted Faber faber at isi edu (remove) Reviewed by: rmacklem (remove) MFC after: 1 week
199284	15-Nov-2009	marcel	Revert previous change and fix misalignment by using bcopy() to copy the file handle from fid_data into fh. This eliminates conditional compilation. Pointed out by: imp
199274	14-Nov-2009	marcel	Fix an obvious panic by not casting from a pointer that is 4-bytes alignment to a type that needs 8-byte alignment, and thus causing misaligned memory references. MFC after: 1 week
197525	26-Sep-2009	pjd	Ensure that tv_sec is between INT32_MIN and INT32_MAX, so ZFS won't object. This completes the fix from r185586. PR: kern/139059 Reported by: Daniel Braniss <danny@cs.huji.ac.il> Submitted by: Jaakko Heinonen <jh@saunalahti.fi> Tested by: Daniel Braniss <danny@cs.huji.ac.il> MFC after: 3 days
197040	09-Sep-2009	pjd	Correct typo after manual patching. Noticed by: b. f.
197039	09-Sep-2009	pjd	Fix usecount leak in mknod(2) on file system exported over NFS. While I'm here, correct typo in comment. Reviewed by: kan, kib MFC after: 3 days
195202	30-Jun-2009	dfr	Remove the old kernel RPC implementation and the NFS_LEGACYRPC option. Approved by: re
195181	30-Jun-2009	jhb	Fix build with NFS_LEGACYRPC enabled after the socket upcall locking changes. Approved by: re (kensmith)
194498	19-Jun-2009	brooks	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
194407	17-Jun-2009	rmacklem	Since svc_[dg\|vc\|tli\|tp]_create() did not hold a reference count on the SVCXPTR structure returned by them, it was possible for the structure to be free'd before svc_reg() had been completed using the structure. This patch acquires a reference count on the newly created structure that is returned by svc_[dg\|vc\|tli\|tp]_create(). It also adds the appropriate SVC_RELEASE() calls to the callers, except the experimental nfs subsystem. The latter will be committed separately. Submitted by: dfr Tested by: pho Approved by: kib (mentor)
194073	12-Jun-2009	rmacklem	Add a #include <sys/jail.h> so that it builds when options KGSSAPI is specified in the kernel configuration. Approved by: kib (mentor)
193511	05-Jun-2009	rwatson	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
193272	01-Jun-2009	jhb	Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call. Discussed with: rwatson, rmacklem
193066	29-May-2009	jamie	Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible. The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed. Approved by: bz (mentor)
192895	27-May-2009	jamie	Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
192678	24-May-2009	dfr	Fix build of KGSSAPI bits post-vimage.
191990	11-May-2009	attilio	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
191940	09-May-2009	kan	Do not embed struct ucred into larger netcred parent structures. Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead. While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code. Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib
190971	12-Apr-2009	rmacklem	Change nfsserver so that it uses the nfssvc() system call provided in sys/nfs/nfs_nfssvc.c by registering with it using the nfsd_call_nfsserver function pointer. Also, add the build glue for nfs_nfssvc.c optionally based on "nfsserver" and also as a loadable module. Submitted by: rmacklem Reviewed by: kib Approved by: kib (mentor)
190053	19-Mar-2009	dfr	Fix an mbuf leak in the error path. Submitted by: Rick Macklem <rick at snowhite dot cis dot uoguelph dot ca>
188965	23-Feb-2009	rwatson	Include audit.h so that the system call path protected by NFS_LEGACYRPC can audit its arguments. Submitted by: Jaakko Heinonen <jh at saunalahti.fi> MFC after: 1 week X-MFC-note: MFC with r188311
188588	13-Feb-2009	jhb	Use shared vnode locks when invoking VOP_READDIR(). MFC after: 1 month
188311	08-Feb-2009	rwatson	Audit the flag argument to the nfssvc(2) system call. Obtained from: TrustedBSD Project Sponsored by: Apple, Inc.
187830	28-Jan-2009	ed	Last step of splitting up minor and unit numbers: remove minor(). Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev(). We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check. Ths commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now. I suspect umajor() and uminor() isn't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.
186165	16-Dec-2008	kensmith	Handle VFS_VGET() failing with an error other than EOPNOTSUPP in addition to failing with that error. PR: 125149 Submitted by: Jaakko Heinonen (jh <at> saunalahti <dot> fi) Reviewed by: mohans, kan MFC after: 3 days
185860	10-Dec-2008	dfr	We need to pass a structure with enough space for an NFSv2 filehandle to nfs_srvmtofh_xx otherwise bad things happen when an NFSv2 client tries to make a request.
185586	03-Dec-2008	kan	Change nfsserver slightly so that it does not trip over the timestamp validation code on ZFS. Problem: when opening file with O_CREAT\|O_EXCL NFS has to jump through extra hoops to ensure O_EXCL semantics. Namely, client supplies of 8 bytes (NFSX_V3CREATEVERF) bytes of verification data to uniquely identify this create request. Server then creates a new file with access mode 0, copies received 8 bytes into va_atime member of struct vattr and attempt to set the atime on file using VOP_SETATTR. If that succeeds, it fetches file attributes with VOP_GETATTR and verifies that atime timestamps match. If timestamps do not match, NFS server concludes it has probbaly lost the race to another process creating the file with the same name and bails with EEXIST. This scheme works OK when exported FS is FFS, but if underlying filesystem is ZFS _and_ server is running 64bit kernel, it breaks down due to sanity checking in zfs_setattr function, which refuses to accept any timestamps which have tv_sec that cannot be represented as 32bit int. Since struct timespec fields are 64 bit integers on 64bit platforms and server just copies NFSX_V3CREATEVERF bytes info va_atime, all eight bytes supplied by client end up in va_atime.tv_sec, forcing it out of valid 32bit range. The solution this change implements is simple: it treats NFSX_V3CREATEVERF as two 32bit integers and unpacks them separately into va_atime.tv_sec and va_atime.tv_nsec respectively, thus guaranteeing that tv_sec remains in 32 bit range and ZFS remains happy. Reviewed by: kib
185432	29-Nov-2008	kib	In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls. Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it. Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month
184969	14-Nov-2008	dfr	Switch the default rpc implementation for NFS back to the new code. I believe I have fixed the reported problems - if you still have trouble with it, please contact me with as much detail as possible so that I can track down any other issues as quickly as possible.
184921	13-Nov-2008	dfr	Use the remote address for access control, not the local address. This fixes the nfsd problems that some people have with the new code. Add support for the vfs.nfsrv.nfs_privport sysctl which denies access unless the client is using a port number less than 1024. Not really sure if this is particularly useful since it doesn't add any real security.
184920	13-Nov-2008	dfr	Temporarily switch NFS back to the old RPC code while I try to diagnose and fix the problems a few people have noticed with the new code. People who want to continue testing the new code or who need RPCSEC_GSS support should use the new option NFS_NEWRPC to select it.
184869	12-Nov-2008	dfr	Turn (NFSERR_AUTHERR\|code) status values into svcerr_auth(rqst, code) replies instead of returning a success with a bogus NFS error code.
184868	12-Nov-2008	dfr	Allow v3 GETATTR requests even when weakly authenticated. Change the error return for for weakly authenticated requests from REJECTEDCRED to WEAKAUTH for consistency with Solaris.
184744	07-Nov-2008	dfr	Range-check NFSv2 procedure numbers before converting to NFSv3. Submitted by: csjp
184719	06-Nov-2008	dfr	Don't depend on krpc.ko in the NFS_LEGACYRPC case.
184716	06-Nov-2008	des	Unbreak NFS. Pointy hat to: dfr
184693	05-Nov-2008	dfr	If mountd doesn't specify a secflavor list for the mount, assume that -sec=sys is what was wanted.
184643	04-Nov-2008	dfr	Include <sys/eventhandler.h>.
184588	03-Nov-2008	dfr	Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation. The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code. To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf. As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks. Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd. The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option. Sponsored by: Isilon Systems MFC after: 1 month
184561	02-Nov-2008	trhodes	Document a few sysctls in the NFS client and server code. Minor style(9) where applicable. Approved by: alfred (slightly older version)
184413	28-Oct-2008	trasz	Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
184407	28-Oct-2008	rwatson	Rename three MAC entry points from _proc_ to _cred_ to reflect the fact that they operate directly on credentials: mac_proc_create_swapper(), mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies. Obtained from: TrustedBSD Project
184205	23-Oct-2008	des	Retire the MALLOC and FREE macros. They are an abomination unto style(9). MFC after: 3 months
183809	12-Oct-2008	rwatson	Turn XXX's for unlocked writes of NFS server statistics to simple notes, as we consider it a feature to exchange performance for consistency. MFC after: 3 days
183113	17-Sep-2008	attilio	Remove the suser(9) interface from the kernel. It has been replaced from years by the priv_check(9) interface and just very few places are left. Note that compatibility stub with older FreeBSD version (all above the 8 limit though) are left in order to reduce diffs against old versions. It is responsibility of the maintainers for any module, if they think it is the case, to axe out such cases. This patch breaks KPI so __FreeBSD_version will be bumped into a later commit. This patch needs to be credited 50-50 with rwatson@ as he found time to explain me how the priv_check() works in detail and to review patches. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Reviewed by: rwatson
183103	16-Sep-2008	attilio	Decontext-alize the nfsserver module. Now, only some few places still require thread passing (mostly the ones which access to VOP_* functions) and will be fixed once the primitive also will be. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
182371	28-Aug-2008	attilio	Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
180131	30-Jun-2008	rwatson	Remove spls from NFS server setup call; expand receive socket buffer locking to cover full setup of socket upcalls; remove XXX about locking. MFC after: 3 weeks
179381	28-May-2008	kib	Change the fix in the rev. 1.179 to use nfsrv_lockedpair_nd(). Tested by: pho MFC after: 3 days
179380	28-May-2008	kib	Initialize vfslocked prior to calling nfsm_srvmtofh where it was forgotten. Reported by: Andrew Edwards <aedwards sandvine com> Tested by: pho MFC after: 3 days
177599	25-Mar-2008	ru	Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA. Reviewed by: arch There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
177493	22-Mar-2008	jeff	- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
177386	19-Mar-2008	dfr	Fix a regression from the last revision - don't edit the ns_rec list while not holding the lock.
177360	18-Mar-2008	dfr	Don't call nfs_realign while holding locks. Reviewed by: kib
176790	04-Mar-2008	kib	Fix the Giant leak in the nfsrv_remove(). Reported by: pluknet <pluknet gmail com> MFC after: 1 week
175450	18-Jan-2008	remko	Use nfsrv_destroycache() only once, else it crashes the server. PR: kern/118152 Submitted by: Bjoern Groenvall <bg at sics dot se> Approved by: imp (mentor, a while ago already), jhb MFC After: 3 days
175294	13-Jan-2008	attilio	VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
175202	10-Jan-2008	attilio	vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
173334	04-Nov-2007	rwatson	Garbage collect now-unused nfsrv_setcred() -- it's not only unused, but also a purveyor of unfortunate (and now unsupported) direct frobbing of struct ucred. MFC after: 3 days
172957	25-Oct-2007	rwatson	Rename mac_associate_nfsd_label() to mac_proc_associate_nfsd(), and move from mac_vfs.c to mac_process.c to join other functions that setup up process labels for specific purposes. Unlike the two proc create calls, this call is intended to run after creation when a process registers as the NFS daemon, so remains an _associate_ call.. Obtained from: TrustedBSD Project
172759	18-Oct-2007	jhb	Add a -z flag to nfsstat which zeros the NFS statistics after displaying them. MFC after: 1 week Requested by: ps Submitted by: ps (6 years ago)
172557	12-Oct-2007	mohans	Set the NFS server sockbuf high watermarks to the system defaults (up form 32KB). The low highwatermark setting caused UDP fullsock request drops, throttling thruput greatly. Reported by: Kris Kennaway Approved by: re@ (Ken Smith)
171744	06-Aug-2007	rwatson	Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases. While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency. Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
171613	27-Jul-2007	rwatson	First in a series of changes to remove the now-unused Giant compatibility framework for non-MPSAFE network protocols: - Remove debug_mpsafenet variable, sysctl, and tunable. - Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel. - Remove logic to automatically flag interrupt handlers as non-MPSAFE if debug.mpsafenet is set for an INTR_TYPE_NET handler. - Remove logic to automatically flag netisr handlers as non-MPSAFE if debug.mpsafenet is set. - Remove references in a few subsystems, including NFS and Cronyx drivers, which keyed off debug_mpsafenet to determine various aspects of their own locking behavior. - Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into no-op's, as their entire behavior was determined by the value in debug_mpsafenet. - Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE. Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still present in subsystems, and will be removed in followup commits. Reviewed by: bz, jhb Approved by: re (kensmith)
170689	13-Jun-2007	rwatson	Include priv.h to pick up suser(9) definitions, missed in an earlier commit. Warnings spotted by: kris
170488	10-Jun-2007	mjacob	Init timespec to zero fo quiesce warnings.
168951	22-Apr-2007	rwatson	Remove MAC Framework access control check entry points made redundant with the introduction of priv(9) and MAC Framework entry points for privilege checking/granting. These entry points exactly aligned with privileges and provided no additional security context: - mac_check_sysarch_ioperm() - mac_check_kld_unload() - mac_check_settime() - mac_check_system_nfsd() Add mpo_priv_check() implementations to Biba and LOMAC policies, which, for each privilege, determine if they can be granted to processes considered unprivileged by those two policies. These mostly, but not entirely, align with the set of privileges granted in jails. Obtained from: TrustedBSD Project
168931	21-Apr-2007	rwatson	Attempt to rationalize NFS privileges: - Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD. - Use PRIV_NFS_DAEMON in the NFS server. - In the NFS client, move the privilege check from nfslockdans(), which occurs every time a write is performed on /dev/nfslock, and instead do it in nfslock_open() just once. This allows us to avoid checking the saved uid for root, and just use the effective on open. Use PRIV_NFS_LOCKD.
168761	15-Apr-2007	rwatson	In nfsrv_rcv(), don't reacquire the nfs server lock until after nfs_realign() has been called, as it may sleep waiting on memory allocation. Reported by: simon
168268	02-Apr-2007	jhb	- Split out the part of SYSCALL_MODULE_HELPER() that builds a 'struct sysent' for a new system call into a new MAKE_SYSENT() macro. - Use MAKE_SYSENT() to build a full sysent for the nfssvc system call in the NFS server and use syscall_register() and syscall_deregister() to manage the nfssvc system call entry instead of manually frobbing the sysent[] array.
167899	26-Mar-2007	jhb	Initialize vfslocked to 0 before nfsm_srvmtofh() so that the variable is not used uninitialized in 'nfsmout' if nfsm_srvmtofh() gets an internal error. CID: 1766 Found by: Coverity Prevent (tm)
167665	17-Mar-2007	jeff	- Turn all explicit giant acquires into conditional VFS_LOCK_GIANTs. Only ops which used namei still remained. - Implement a scheme for reducing the overhead of tracking which vops require giant by constantly reducing the number of recursive giant acquires to one, leaving us with only one vfslocked variable. - Remove all NFSD lock acquisition and release from the individual nfs ops. Careful examination has shown that they are not required. This greatly simplifies the code. Sponsored by: Isilon Systems, Inc. Discussed with: rwatson Tested by: kkenn Approved by: re
167214	05-Mar-2007	wkoszek	Change these descriptions of memory types used in malloc(9), as their current, rather long strings make output from vmstat -m look unpleasant. Approved by: cognet (mentor)
167211	04-Mar-2007	rwatson	Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly. Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
166774	15-Feb-2007	pjd	Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
166683	13-Feb-2007	mpp	Get the vfs giant lock before calling nfs_access. Reviewed by: mohan
165739	02-Jan-2007	hrs	The nfsm_srvpathsiz() macro in nfsrv_symlink() in nfs_serv.c should check length of the pathname in the range 0<=n<=NFS_MAXPATHLEN, not 0<n<=NFS_MAXPATHLEN. This fixes a minor interoperability problem that the FreeBSD NFS server did not allow a symlink pointing the empty pathname. MFC after: 1 week
165118	12-Dec-2006	bz	MFp4: 92972, 98913 + one more change In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.
164585	24-Nov-2006	rwatson	Push Giant a bit further off the NFS server in a number of straight forward cases by converting from unconditional acquisition of Giant around vnode operations to conditional acquisition: - Remove nfsrv_access_withgiant(), and cause nfsrv_access() to now assert that Giant will be held if it is required for the vnode. - Add nfsrv_fhtovp_locked(), which will drop the NFS server lock if required, and modify nfsrv_fhtovp() to conditionally acquire Giant if required. - In the VOP's not dealing with more than one vnode at a time (i.e., not involving a lookup), conditionally acquire Giant. This removes Giant use for MPSAFE file systems for a number of quite important RPCs, including getattr, read, write. It leaves unconditional Giant acquisitions in vnode operations that interact with the name space or more than one vnode at a time as these require further work. Tested by: kris Reviewed by: kib
164436	20-Nov-2006	pjd	Protect nfsm_srvpathsiz() call with the nfsd_mtx lock. Reviewed by: mohans
164033	06-Nov-2006	rwatson	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
163701	26-Oct-2006	kib	Fix leak in NAMEI zone caused by nfs server when VOP_RENAME fails. Submitted by: Padma Bhooma <pbhooma at panasas com> Reviewed by: bde Approved by: pjd (mentor) MFC after: 1 week
163606	22-Oct-2006	rwatson	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
160881	01-Aug-2006	jhb	- Add a new function nfsrv_destroycache() to tear down the server request cache when unloading the nfsserver module. This fixes a memory leak and a stale pointer. - Use callout_drain() rather than callout_stop() when unloading the nfsserver module. MFC after: 3 days
160880	01-Aug-2006	jhb	Use TAILQ_FOREACH_SAFE() in a couple of places.
160798	28-Jul-2006	jhb	Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.
160619	24-Jul-2006	rwatson	soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman
159871	23-Jun-2006	mohans	Size the NFS server dupreq cache on the basis of nmbclusters. On servers with low nmbclusters, we tie up too many mbclusters in the NFS duplicate request cache. This change limits the size of the dupreq cache to 1/2 the nmbclusters (and flaots in a range of [64, 2048]). MFC after 2 weeks. Reported by: Steve Kargl, David O'Brien Tested by: Steve Kargl
159268	05-Jun-2006	kib	Temporary workaround to prevent leak of Giant from nfsd when calling lookup(). Reviewed by: tegge Tested by: "Arno J. Klaassen" <arno at heho snv jussieu fr>, "Rong-en Fan" <grafan at gmail com>, Dmitriy Kirhlarov <dimma at higis ru>, Dmitry Pryanishnikov <dmitry at atlantis dp ua> MFC after: 1 week Approved by: kan, pjd (mentors)
158008	25-Apr-2006	mohans	Bump up the NFS server dupreq cache limit to 2K (from 64). With a small duplicate request cache, under heavy load a lot of non-idempotent requests were getting served again, resulting in errors. Found by : Kris Kennaway.
157575	06-Apr-2006	csjp	Introduce a new MAC entry point for label initialization of the NFS daemon's credential: mac_associate_nfsd_label() This entry point can be utilized by various Mandatory Access Control policies so they can properly initialize the label of files which get created as a result of an NFS operation. This work will be useful for fixing kernel panics associated with accessing un-initialized or invalid vnode labels. The implementation of these entry points will come shortly. Obtained from: TrustedBSD Requested by: mdodd MFC after: 3 weeks
157391	02-Apr-2006	cel	rick says: The following bug was just identified in OpenBSD and it looks like the same bug exists in the other BSDen NFS servers. A Linux client (don't know which version, but you can look at http://bugzilla.kernel.org/show_bug.cgi?id=6256) does a Setattr of mtime to the server's time, where the file is mode 0664 and the client user has group access (ie. caller is not the file owner). The BSD servers fail the Setattr with EPERM, since the VA_UTIMES_NULL flag isn't set before doing the VOP_SETATTR. It seems to me that this should be allowed, since it is allowed for a local utimes(2). If so, the fix is to set VA_UTIMES_NULL for the "set-time-to-server-time" cases of setting atime and/or mtime. Submitted by: rick@snowhite.cis.uoguelph.ca Reviewed by: cel Approved by: silby MFC after: 1 week
157325	31-Mar-2006	jeff	- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs(). Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
156586	12-Mar-2006	jeff	- Reorder vrele calls after vput calls to prevent lock order reversals between leaf and directory locks. Found by: kris Sponsored by: Isilon Systems, Inc.
156439	08-Mar-2006	simon	When parsing an RPC request in nfsrv_dorec(), KASSERT that there actually is an mbuf to process. This catches the missing mbuf before it would otherwise causes a NULL pointer dereference, which could be triggered by a 0 length RPC record before the check for such records was added in rev 1.97. Approved by: cperciva (mentor)
156146	01-Mar-2006	simon	Correct a remote kernel panic when processing zero-length RPC records via TCP. [06:10] Security: FreeBSD-SA-06:10.nfs Approved by: cperciva
155160	01-Feb-2006	jeff	- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. These situations are deadlock prone although do not typically deadlock because the vrele is typically not releasing the last reference to the vnode. Users of vrele must consider it as a call to vn_lock() and order it appropriately. MFC After: 1 week Sponsored by: Isilon Systems, Inc. Tested by: kkenn
154960	28-Jan-2006	csjp	Manage the ucred for the NFS server using the crget/crfree API defined in kern_prot.c. This API handles reference counting among many other things. Notably, if MAC is compiled into the kernel, it will properly initialize the MAC labels when the ucred is allocated. This work is in preparation for a new MAC entry point which will be responsible for properly initializing policy specific labels for the NFS server credential. Utilization of the crfree/crget APIs reduce the complexity associated with this label's management. Submitted by: green (with changes) [1] Obtained from: TrustedBSD Project Discussed with: rwatson, alfred [1] I moved the ucred allocation outside the scope of the NFS server lock to prevent M_WAIKOK allocations from occurring with non-sleep-able locks held. Additionally, to reduce complexity, the ucred persist as long as the NFS server descriptor.
154737	23-Jan-2006	trhodes	Revert my previous commit. Proved I'm not that bright at times: jhb
154729	23-Jan-2006	trhodes	Fix indentation. Prodded by: stefanf, ru, njl (in that order)
154629	21-Jan-2006	trhodes	Remove some dead code. Found with: Coverity Prevent(tm)
151897	31-Oct-2005	rwatson	Normalize a significant number of kernel malloc type names: - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat. - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters. - Disambiguate some collisions by adding subsystem prefixes to some memory types. - Generally prefer lower case to upper case. - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases. Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
151761	27-Oct-2005	glebius	Keep locks consistent before goto. Reported by: pho Reviewed by: mohans
150634	27-Sep-2005	jhb	Use the refcount API to manage the reference count for user credentials rather than using pool mutexes. Tested on: i386, alpha, sparc64
145197	17-Apr-2005	rwatson	NFS write gathering defers execution of NFS server write requests to wait to see if additional write requests will arrive that can be coalesced and clustered with earlier ones. When doing so, it must determine whether the two requests are made by credentials with the same access writes, so as not to coalesce improperly. NFSW_SAMECRED() implements a test of two credentials using a binary compare. Replace NFSW_SAMECRED() macro with nfsrv_samecred() function, which is aware of the contents and layout of a struct ucred, rather than a simple binary compare. While the binary compare works when ucred is simply a zero'd and embedded 'struct ucred' in the NFS descriptor, it will work less well when the ucred associated with an NFS descriptor is "real", so has defined and populated reference count, mutex, etc. MFC after: 1 week Obtained from: TrustedBSD Project
144246	28-Mar-2005	sam	avoid potential null ptr deref by free'ing excess mbufs instead of zero'ing their length (copied from m_adj where this code came from after the equivalent change there has had time to soak) Noticed by: Coverity Prevent analysis tool
144141	26-Mar-2005	delphij	Do not do write gathering for NFSv3, since it makes no sense unless the client is broken and does sync writes all the time. Obtained from: NetBSD (sys/nfs/nfs_syscalls.c,v 1.44) Reviewed by: -arch (bde)
140770	24-Jan-2005	phk	Don't try to create vnode_pager objects on other filesystems vnodes, either they did it themselves or it won't happen.
140495	19-Jan-2005	ps	Now that we have a non blocking version of nfsm_dissect(), change all the nfsm_dissect() calls (done under the NFSD lock) to nfsm_dissect_nonblock(). Submitted by: Mohan Srinivasan
140181	13-Jan-2005	phk	Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.
140048	11-Jan-2005	phk	Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense: The credentials for syncing a file (ability to write to the file) should be checked at the system call level. Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well. If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data. Discussed with: rwatson
139823	07-Jan-2005	imp	/* -> /*- for license, minor formatting changes
137589	11-Nov-2004	rwatson	Correct a bug in nfsrv_create() where a call to nfsrv_access() might be made holding the NFS server mutex. To clean this up, introduce a version of the function, nfsrv_access_withgiant(), that expects the NFS server mutex to already have been dropped and Giant acquired. Wrap nfsrv_access() around this. This permits callers to more efficiently check access if they're in a code block performing VFS operations, and can be substitited for the nfsrv_access() call that triggered this bug. PR: 73807, 73208 MFC after: 1 week
136767	22-Oct-2004	phk	Add b_bufobj to struct buf which eventually will eliminate the need for b_vp. Initialize b_bufobj for all buffers. Make incore() and gbincore() take a bufobj instead of a vnode. Make inmem() local to vfs_bio.c Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(), Make buf_vlist_add() take a bufobj instead of a vnode. Eliminate other uses of bp->b_vp where bp->b_bufobj will do. Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.
136662	18-Oct-2004	rwatson	Correct several instances where calls to vfs_getvfs() resulting in failure in the NFS server would result in a leaked instance of the NFS server subsystem lock. Liberally sprinkle assertions in all target labels for error unwinding to assert the desired locking state. RELENG_5_3 candidate. MFC after: 3 days Reported by: Wilkinson, Alex <alex dot wilkinson at dsto dot defence dot gov dot au>
134296	25-Aug-2004	rwatson	Convert a mtx_lock(&Giant) to a mtx_unlock(&Giant) in nfsrv_link() to prevent leakage of Giant. With INVARIANTS, this results in an assertion failure following execution of the RPC. Without INVARIANTS, it could result in problems if the NFS server is killed causing nfsd to return to user space holding Giant. Feet provided by: brueffer
132591	24-Jul-2004	rwatson	If debug.mpsafenet is non-zero, run the NFS server callout without Giant.
132590	24-Jul-2004	rwatson	Remove spl() use from nfsrv_timer.
132199	15-Jul-2004	phk	Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events. A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".
132086	13-Jul-2004	alfred	Do not call sorecieve() in the context of a socket callback as it causes lock order reversals so->inpcb since we're called with the socket lock held.
131532	03-Jul-2004	rwatson	Change M_WAITOK argument to sodupsockaddr() to M_NOWAIT. When the call to dup_sockaddr() was renamed to sodupsockaddr(), the argument was changed from '1' to 'M_WAITOK', which changed the semantics. This resulted in a WITNESS warning about a potential sleep while holding the NFS server mutex. Now this will no longer happen, restoring a possible bug present in the original code (setting RC_NAM even though the malloc to copy the addres may fail). bde observes that the flag names here should probably not be the same as the malloc flags for name space reasons. Bumped into by: kuriyama
130653	17-Jun-2004	rwatson	Merge additional socket buffer locking from rwatson_netperf: - Lock down low hanging fruit use of sb_flags with socket buffer lock. - Lock down low hanging fruit use of so_state with socket lock. - Lock down low hanging fruit use of so_options. - Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock. - Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coallesce these lock uses to reduce overhead. - Convert a if()->panic() into a KASSERT relating to so_state in soaccept(). - Remove a number of splnet()/splx() references. More complex merging of socket and socket buffer locking to follow.
130640	17-Jun-2004	phk	Second half of the dev_t cleanup. The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev() Various minor adjustments including handling of userland access to kernel space struct cdev etc.
129902	31-May-2004	bmilekic	Giant wasn't dropped here if we have to return EBUSY. This is bad.
129899	31-May-2004	rwatson	Release NFS subsystem lock and acquire Giant when calling into vn_start_write().
129894	31-May-2004	rwatson	Add an assertion that nfssvc() isn't called with Giant. Add two additional pairs of assertions, one at the end of the NFS server event loop, and one one exit from the NFS daemon, that assert that if debug.mpsafenet is enabled, Giant is not held, and that if it is not enabled, Giant will be held. This is intended to support debugging scenarios where Giant is "leaked" during NFS processing.
129888	31-May-2004	rwatson	The NFS server modevent code manually patches the system call table to install nfssvc(). It also updates the argument count, but did so without setting SYF_MPSAFE, effectively removing the MPSAFE flag even when syscalls.master indicates it doesn't require Giant. This change forces the modevent to set MPSAFE as a flag to its internal notion of an argument coutn. Note: this duplication of information is a bad thing, but is a more general problem I'm not currently willing to address.
129886	30-May-2004	rwatson	One more case where we want to drop the NFS server lock and acquire Giant when entering VFS. Discovered by code inspection; still not hit without debug.mpsafenet=1. Reported by: bmilekic
129885	30-May-2004	rwatson	Acquire Giant around two more cases when calling into VFS to vput() a vnode. Not bumped into with asserts in the main tree because we run the NFS server with Giant by default. Discovered by inspection. Complete annotations of Giant acquisition/release to note that it's only because of VFS that we acquire Giant in most places in the NFS server.
129841	29-May-2004	rwatson	Don't release Giant until after the call to vput() in nfsrv_setattr(). Unless running with debug.mpsafenet=1, this was not actually a problem.
129839	29-May-2004	rwatson	No need to conditionally acquire Giant in nfssvc_nfsd() because it is acquired by the caller. Should not cause problems, but causes an unnecessary recursion on Giant. Pointed out by: bmilekic
129785	27-May-2004	rwatson	Call nfsm_clget_nolock() instead of nfsm_clget() when holding the NFS subsystem lock to avoid tripping over an assertion regarding whether the lock is held or not. This is likely to be the cause of a panic tripped over by Andrea Campi.
129639	24-May-2004	rwatson	The socket code upcalls into the NFS server using the so_upcall mechanism so that early processing on mbufs can be performed before a context switch to the NFS server threads. Because of this, if the socket code is running without Giant, the NFS server also needs to be able to run the upcall code without relying on the presence on Giant. This change modifies the NFS server to run using a "giant code lock" covering operation of the whole subsystem. Work is in progress to move to data-based locking as part of the NFSv4 server changes. Introduce an NFS server subsystem lock, 'nfsd_mtx', and a set of macros to operate on the lock: NFSD_LOCK_ASSERT() Assert nfsd_mtx owned by current thread NFSD_UNLOCK_ASSERT() Assert nfsd_mtx not owned by current thread NFSD_LOCK_DONTCARE() Advisory: this function doesn't care NFSD_LOCK() Lock nfsd_mtx NFSD_UNLOCK() Unlock nfsd_mtx Constify a number of global variables/structures in the NFS server code, as they are not modified and contain constants only: nfsrvv2_procid nfsrv_nfsv3_procid nonidempotent nfsv2_repstat nfsv2_type nfsrv_nfsv3_procid nfsrvv2_procid nfsrv_v2errmap nfsv3err_null nfsv3err_getattr nfsv3err_setattr nfsv3err_lookup nfsv3err_access nfsv3err_readlink nfsv3err_read nfsv3err_write nfsv3err_create nfsv3err_mkdir nfsv3err_symlink nfsv3err_mknod nfsv3err_remove nfsv3err_rmdir nfsv3err_rename nfsv3err_link nfsv3err_readdir nfsv3err_readdirplus nfsv3err_fsstat nfsv3err_fsinfo nfsv3err_pathconf nfsv3err_commit nfsrv_v3errmap There are additional structures that should be constified but due to their being passed into general purpose functions without const arguments, I have not yet converted. In general, acquire nfsd_mtx when accessing any of the global NFS structures, including struct nfssvc_sock, struct nfsd, struct nfsrv_descript. Release nfsd_mtx whenever calling into VFS, and acquire Giant for calls into VFS. Giant is not required for any part of the operation of the NFS server with the exception of calls into VFS. Giant will never by acquired in the upcall code path. However, it may operate entirely covered by Giant, or not. If debug.mpsafenet is set to 0, the system calls will acquire Giant across all operations, and the upcall will assert Giant. As such, by default, this enables locking and allows us to test assertions, but should not cause any substantial new amount of code to be run without Giant. Bugs should manifest in the form of lock assertion failures for now. This approach is similar (but not identical) to modifications to the BSD/OS NFS server code snapshot provided by BSDi as part of their SMPng snapshot. The strategy is almost the same (single lock over the NFS server), but differs in the following ways: - Our NFS client and server code bases don't overlap, which means both fewer bugs and easier locking (thanks Peter!). Also means NFSD_() as opposed to NFS_(). - We make broad use of assertions, whereas the BSD/OS code does not. - Made slightly different choices about how to handle macros building packets but operating with side effects. - We acquire Giant only when entering VFS from the NFS server daemon threads. - Serious bugs in BSD/OS implementation corrected -- the snapshot we received was clearly a work in progress. Based on ideas from: BSDi SMPng Snapshot Reviewed by: rick@snowhite.cis.uoguelph.ca Extensive testing by: kris
128154	12-Apr-2004	mux	Don't send the available space as is in the FSSTAT call. Under FreeBSD, we can have a negative available space value, but the corresponding fields in the NFS protocol are unsigned. So trnucate the value to 0 if it's negative, so that the client doesn't receive absurdly high values. Tested by: cognet
128112	11-Apr-2004	peadar	Don't let the NFS server module be unloaded as long as there are nfsd processes running Reviewed By: iedowse PR: 16299
127977	07-Apr-2004	imp	Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson. Approved by: core, peter, alc, rwatson
127924	06-Apr-2004	rwatson	Add imperfect comments identifying the function of various nfs socket condition flags. Corrections, if appropriate, welcome.
127857	04-Apr-2004	rwatson	Spell 2 as SHUT_RDWR when used as an argument to soshutdown().
127851	04-Apr-2004	rwatson	Explicitly compare pointers with NULL rather than treating a pointer as a boolean directly, use NULL instead of 0.
126962	14-Mar-2004	peter	Calculate NFS timeouts in units of 10ms, not 5ms. This matches the default clock precision on i386. This is a NOP change on i386. But this stops the mount_nfs units from suddenly changing to units of 1/20 of a second (vs the normal 1/10 of a second) if HZ is increased.
126853	11-Mar-2004	phk	Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
126723	07-Mar-2004	kan	Convert from timeout to callout API. Submitted by: rwatson
126425	01-Mar-2004	rwatson	Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c. Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0". Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO. Submitted by: sam
123608	17-Dec-2003	jhb	Fix some becuase -> because typos. Reported by: Marco Wertejuk <wertejuk@mwcis.com>
122823	17-Nov-2003	rwatson	Update a comment about needing to fix NFS server credential use by 5.0-RELEASE: make it now read 5.3-RELEASE to be realistic. Still needs fixing...
122261	07-Nov-2003	sam	Assert GIANT_REQUIRED where sockets are manipulated. This is preparatory for MPSAFE network commits and ongoing socket locking work. Supported by: FreeBSD Foundation
121473	24-Oct-2003	phk	When grabbing vnodes to service NFS requests, make sure to call vn_start_write() early to avoid snapshot deadlocks. By: mckusick
120753	04-Oct-2003	jeff	- Set the sopt_dir member of the sockopt structure, otherwise, this parameter will not actually be set even though we're calling sosetopt. sosetopt calls down to a single ctloutput function if the name or level is implemented by a specific protocol. Submitted by: pete@isilon.com
117151	02-Jul-2003	phk	Change idle state sleep identifier to "-" for nfsd.
116789	24-Jun-2003	iedowse	Fix a bug in nfsrv_read() that caused the replies to certain NFSv3 short read operations at the end of a file to not have the "eof" flag set as they should. The problem is that the requested read count was compared against the rounded-up reply data length instead of the actual reply data length. This bug appears to have been introduced in revision 1.78 (June 1999). It causes first-time reads of certain file sizes (e.g 4094 bytes) to fail with EIO on a RedHat 9.0 NFSv3 client. MFC after: 1 week
116657	21-Jun-2003	mckusick	Increase the size of the NFS server hash table to improve performance when serving up more than about 32 active files. For details see section 6.3 (pg 111) of Daniel Ellard and Margo Seltzer, ``NFS Tricks and Benchmarking Traps'' in the Proceedings of the Usenix 2003 Freenix Track, June 9-14, 2003 pg 101-114. Obtained from: Daniel Ellard <ellard@eecs.harvard.edu> Sponsored by: DARPA & NAI Labs.
116189	11-Jun-2003	obrien	Use __FBSDID().
115870	05-Jun-2003	hsu	Protect read-modify-write increment of f_count field with file lock.
115476	31-May-2003	phk	Add /* FALLTHROUGH */ Found by: FlexeLint
115301	25-May-2003	truckman	Beat vnode locking in the NFS server code into submission. This change is not pretty, but it fixes the code so that it no longer violates the vnode locking rules in the VFS API and doesn't trip any of the locking assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option. There is one report that this patch fixed a "locking against myself" panic on an NFS server that was tripped by a diskless client. Approved by: re (scottl)
113955	24-Apr-2003	alc	- Acquire the vm_object's lock when performing vm_object_page_clean(). - Add a parameter to vm_pageout_flush() that tells vm_pageout_flush() whether its caller has locked the vm_object. (This is a temporary measure to bootstrap vm_object locking.)
112179	13-Mar-2003	jeff	- Lock bufs before inspecting their flags.
111748	02-Mar-2003	des	More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).
111463	25-Feb-2003	jeff	- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock. Reviewed by: arch, mckusick
111251	22-Feb-2003	phk	Don't use mbuf allocator flags for malloc(9).
111119	19-Feb-2003	imp	Back out M_* changes, per decision of the TRB. Approved by: trb
109623	21-Jan-2003	alfred	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
109153	13-Jan-2003	dillon	Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
109123	12-Jan-2003	dillon	Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
108533	01-Jan-2003	schweikh	Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.
108356	28-Dec-2002	dillon	Abstract-out the constants for the sequential heuristic. No operational changes. MFC after: 1 day
107636	05-Dec-2002	iedowse	In the NFSv3 `fsinfo' procedure reply, don't claim that we support 32k read and write operations on datagram sockets when in fact we reject requests larger than 16k. It must be the case that virtually all clients use data sizes of 16k or less for UDP transport (FreeBSD's client defaults to 8k and never exceeds 16k), as this bug has been present ever since NFSv3 support was added. Reported by: Senthil <lihtnes78@netscape.net> Reviewed by: dillon Approved by: re MFC-after: 1 week
106412	04-Nov-2002	rwatson	Permit MAC policies to instrument the access control decisions for system accounting configuration and for nfsd server thread attach. Policies might use this to protect the integrity or confidentiality of accounting data, limit the ability to turn on or off accounting, as well as to prevent inappropriately labeled threads from becoming nfs server threads. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
106264	31-Oct-2002	jeff	- Introduce a new macro, since that's what nfs loves, called nfsm_srvpathsiz. This macro plucks a length out of an rpc request and verifies that its size does not exceed NFS_MAXPATHLEN. If it does it generates an ENAMETOOLONG response. - Use this macro, and the existing nfsm_srvnamsiz macro in two places where we deal with paths passed in by the client. This fixes a linux interoperability bug. Linux was sending oversized path components which would cause us to ignore the request all together. This causes linux to hang indefinitly while it waits for a response. This could still happen in other cases where we error out with EBADRPC. Sponsored by: Isilon Systems, Inc. Reviewed by: alfred, fabbri@isilon.com, neal@isilon.com
105481	19-Oct-2002	rwatson	Set the NOMACCHECK flag for namei()'s generated by the NFS server code. We currently don't enforce protections on NFS-originated VOP's. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
104424	03-Oct-2002	rwatson	Correct a problem wherein NFS servers running NFSv2 would not return certain classes of failure responses to the client during a failed remove operation. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
103940	25-Sep-2002	jeff	- Use incore() instead of gbincore() so we don't have to acquire the vnode interlock.
103554	18-Sep-2002	phk	Use m_length() instead of home-rolled versions.
102236	21-Aug-2002	phk	Make the V2 errno translation more resistent to new errnos.
101308	04-Aug-2002	jeff	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS
100644	24-Jul-2002	peter	Oops, another unused arg to nfssvc_nfsd(). blush Submitted by: jake
100641	24-Jul-2002	peter	Fully exterminate nfsd_srvargs and nfsd_cargs. They were either unused or giant NOP's. There was a credential in srvargs that was giving rwatson some heartburn. :-)
100606	24-Jul-2002	rwatson	Stick a dark comment in about the fact that the NFS server code allocates a ucred by itself as part of an nfs descriptor, then bzero's the ucred, fails to initialize the mutex, etc. This is very bad, but I don't have time to fix it right now. nfsd should instead hold a cred pointer, and the credential should be properly initialized, probably from a descendent of a kernel process credential.
100509	22-Jul-2002	ume	sync comment with reality. IN6P_BINDV6ONLY -> IN6P_IPV6_V6ONLY.
100205	17-Jul-2002	dillon	'recm' was not being unconditionally cleared for each loop, leading to system lockups (infinite loops) when a zero-length RPC is received. Linux clients will sometimes send zero-length RPC requests. Reorganize the use of recm in the loop. Cc: security@freebsd.org Submitted by: Mike Junk <junk@isilon.com> MFC after: 3 days
100134	15-Jul-2002	alfred	Add IPv6 support. Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
99797	11-Jul-2002	dillon	Convert old style (type foo *)0 casts to NULLs PR: kern/40360 Requested by: Hiten PAndya via direct email
99737	10-Jul-2002	dillon	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc
97658	31-May-2002	tanimura	Back out my lats commit of locking down a socket, it conflicts with hsu's work. Requested by: hsu
96972	20-May-2002	tanimura	Lock down a socket, milestone 1. o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket. o Determine the lock strategy for each members in struct socket. o Lock down the following members: - so_count - so_options - so_linger - so_state o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket: - sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup() Reviewed by: alfred
96755	16-May-2002	trhodes	More s/file system/filesystem/g
95215	21-Apr-2002	iedowse	Limit to the maximum allowed reply size the amount of data that nfsrv_readdir and nfsrv_readdirplus can return. A client request containing an over-large `count' field could trigger the "Bad nfs svc reply" panic in nfs_syscalls.c. Spotted while trying to reproduce kern/37304, which turned out to be fixed in FreeBSD a long time ago. MFC after: 1 week
93593	01-Apr-2002	jhb	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@
92783	20-Mar-2002	jeff	Remove references to vm_zone.h and switch over to the new uma API.
92462	17-Mar-2002	mckusick	Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems. only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.
91406	27-Feb-2002	jhb	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
89367	14-Jan-2002	dillon	The vnode was not being vput()'d in the EEXIST mknod case on the nfs server side. This can lead to a system deadlock. Reviewed by: iedowse Tested by: Alexey G Misurenko <mag@caravan.ru>, iedowse Bug found with help by: Alexey G Misurenko <mag@caravan.ru> MFC at: earliest convenience
89302	13-Jan-2002	iedowse	It is required by VOP_CREATE, VOP_MKNOD, VOP_SYMLINK and VOP_MKDIR that va_mode of the supplied attributes is filled in with a valid file mode (i.e not VNOVAL, and only ALLPERM bits set). However, some NFS server op functions didn't guarantee this for all possible request messages: If a V3 client chose not include to a mode specification, we could end up creating an ffs inode with mode 0177777, requiring a manual fsck on the next reboot. Fix this by setting va_mode to 0 before calling the VOP if a mode hasn't been supplied by the client. In nfsrv_symlink(), S_IFMT bits supplied by a V2 client could end up in the va_mode passed to VOP_SYMLINK with similar effects. We now use the macro nfstov_mode() to correctly mask the bits.
89278	12-Jan-2002	iedowse	Fix a few NFSv2 issues that slipped in during the big cleanup. The semantics of the nfsm_reply() macro were changed so that the caller has to explicitly handle the V2 error case, whereas before, nfsm_reply() did a `goto nfsmout' then. A few server ops (setattr, readlink, create, mkdir) weren't updated to match, so errors in the V2 case could cause protocol hangs and leaked mbufs. Correct some comments that describe the old nfsm_reply behaviour. [older, harmless nit] Remove the unnecessary `nfsmreply0' label in nfsrv_create(), since for its users, the main `ereply' label does the same thing.
89272	11-Jan-2002	iedowse	The macro nfsm_reply() is supposed to allocate a reply in all cases, but since the nfs cleanup, it hasn't done so in the case where `error' is EBADRPC. Callers of this macro expect it to initialise mrq, and the `nfsmout' exit point expects a reply to be allocated if error == 0. When nfsm_reply() was called with error = EBADRPC, whatever junk was in mrq (often a stale pointer to an old reply mbuf) would be assumed to be a valid reply and passed to pru_sosend(), causing a crash sooner or later. Fix this by allocating a reply even in the EBADRPC case like we used to do. This bug was specific to -current.
89094	08-Jan-2002	msmith	Rename some variables that end up shadowing their namesakes in the NFS client code. Reviewed by: peter
88091	18-Dec-2001	iedowse	Avoid passing the variable `tl' to functions that just use it for temporary storage. In the old NFS code it wasn't at all clear if the value of `tl' was used across or after macro calls, but I'm fairly confident that the convention was to keep its use local. Each ex-macro function now uses a local version of this variable, so all of the double-indirection goes away. The only exception to the `local use' rule for `tl' is nfsm_clget(), which is left unchanged by this commit. Reviewed by: peter
87361	04-Dec-2001	iedowse	When VOP_SYMLINK fails, the value of *vpp is junk, so we must NULL out nd.ni_vp to prevent the resource cleanup code at the end of nfsrv_symlink from trying to vrele it. This fixes a "vrele: negative ref cnt" panic that can occur when a symlink is attempted on an NFS filesystem with no free space. Found locally, but the symptoms correspond to those in the PR referenced below. PR: kern/26878 MFC after: 3 days
86487	17-Nov-2001	dillon	Give struct socket structures a ref counting interface similar to vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
86432	15-Nov-2001	peter	Fix a leftover client comment, long line fix.
85498	25-Oct-2001	iedowse	Now that nfsm_reply() does not usually set 'error' to 0, we need to do it explicitly in nfsrv_noop so that the reply gets sent back to the client. This fixes the generation of a selection of RPC error replies (RPC_PROGMISMATCH, RPC_PROGUNAVAIL, RPC_PROCUNAVAIL etc.) that are used by some clients to detect support for optional protocols and features. Reviewed by: peter Reported by: Thomas Quinot <quinot@inf.enst.fr> PR: kern/31479
84079	28-Sep-2001	peter	Unwind some more macros. NFSMADV() was kinda silly since it was right next to equivalent m_len adjustments. Move the nfsm_subs.h macros into groups depending on which phase they are used in, since that affects the error recovery requirements. Collect some of the common error checking into a single macro as preparation for unwinding some more. Have nfs_rephead return a value instead of secretly modifying args. Remove some unused function arguments that were being passed around. Clarify nfsm_reply()'s error handling (I hope).
84078	28-Sep-2001	peter	Oops. I forgot to cvs rm this before. There is a common nfsproto.h. This was a repo copy leftover.
84057	27-Sep-2001	peter	Make nfsm_dissect() have an obvious return value.
84002	27-Sep-2001	peter	Tidy up nfsm_build usage. This is only partially finished.
83700	20-Sep-2001	peter	Wrap a module around the init code so that we have somethign do do a modfind(2) on, and declare a version so that loader/kldload etc can locate it (using kldxref's linker.hints file if needed).
83651	18-Sep-2001	peter	Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
83493	15-Sep-2001	peter	Sync some differences that were different between the copies of the files that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
83366	12-Sep-2001	julian	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
83291	10-Sep-2001	kris	Fix some signed/unsigned integer confusion, and add bounds checking of arguments to some functions. Obtained from: NetBSD Reviewed by: peter MFC after: 2 weeks
82702	31-Aug-2001	dillon	Pushdown Giant for nfs syscalls (nfssvc())
79224	04-Jul-2001	dillon	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
78911	28-Jun-2001	jhb	- Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.
76827	19-May-2001	alfred	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
76166	01-May-2001	markm	Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
76117	29-Apr-2001	grog	Revert consequences of changes to mount.h, part 2. Requested by: bde
75858	23-Apr-2001	grog	Correct #includes to work with fixed sys/mount.h.
75631	17-Apr-2001	alfred	Implement client side NFS locks. Obtained from: BSD/os Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
74384	17-Mar-2001	peter	Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it. As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32). The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha. Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.
72650	18-Feb-2001	green	Switch to using a struct xucred instead of a struct xucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL). This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout. Reviewed by: bde
72645	18-Feb-2001	asmodai	Preceed/preceeding are not english words. Use precede and preceding.
72216	09-Feb-2001	iedowse	Fix some problems that were introduced in revision 1.97. Instead of returning an error code to the caller, NFS server op routines must themselves build an error reply and return 0 to the caller. This is achieved by replacing the erroneous return statements with code that jumps forward to the op function's reply code. We need to be careful to ensure that the 'struct mount' pointer is NULL though, so that the final vn_finished_write() call becomes a no-op. Reviewed by: mckusick, dillon
70254	21-Dec-2000	bmilekic	* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while. * Fix a typo in a comment in mbuf.h * Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
69781	08-Dec-2000	dwmalone	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
69214	26-Nov-2000	phk	Simplify the tprintf() API. Loose the special <sys/tprintf.h> #include file.
68883	18-Nov-2000	dillon	This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
68711	14-Nov-2000	mckusick	In preparation for deprecating CIRCLEQ macros in favor of TAILQ macros which provide the same functionality and are a bit more efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
67882	29-Oct-2000	phk	Remove unneeded #include <sys/proc.h> lines.
67486	24-Oct-2000	dwmalone	Problem to avoid processes getting stuck in "vmopar". From Ian's mail: The problem seems to originate with NFS's postop_attr information that is returned with a read or write RPC. Within a vm_fault context, the code cannot deal with vnode_pager_setsize() shrinking a vnode. The workaround in the patch below stops the nfsm_postop_attr() macro from ever shrinking a vnode. If the new size in the postop_attr information is smaller, then it just sets the nfsnode n_attrstamp to 0 to stop the wrong size getting used in the future. This change only affects postop_attr attributes; the nfsm_loadattr() macro works as normal. The change is implemented by adding a new argument to nfs_loadattrcache() called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never reduce the vnode/nfsnode size; instead it zeros n_attrstamp. There remain other was processes can get stuck in vmopar. Submitted by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: dillon Tested by: Vadim Belman <voland@lflat.org>
65557	07-Sep-2000	jasone	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
63788	24-Jul-2000	mckusick	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
62976	11-Jul-2000	mckusick	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
60938	26-May-2000	jake	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others
60833	23-May-2000	jake	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
60041	05-May-2000	phk	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
59794	30-Apr-2000	phk	Remove unneeded #include <vm/vm_zone.h> Generated by: src/tools/tools/kerninclude
59391	19-Apr-2000	phk	Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>
58710	27-Mar-2000	dillon	Add a sysctl to specify the amount of UDP receive space NFS should reserve, in maximal NFS packets. Originally only 2 packets worth of space was reserved. The default is now 4, which appears to greatly improve performance for slow to mid-speed machines on gigabit networks. Add documentation and correct some prior documentation. Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu> Approved by: jkh
58349	20-Mar-2000	phk	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.
58345	20-Mar-2000	phk	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.
57178	13-Feb-2000	peter	Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world. Approved by: jkh
55934	13-Jan-2000	dillon	The alpha build cuases the 'nfsuid bloated' warning to occur. Well, there is nothing we can do about it. In fact, after further review there simply are not very many instances of the two structures NFS checks for 'bloat' so I've decided to simply rip the checks out entirely. Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
55679	09-Jan-2000	shin	tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change. Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
55431	05-Jan-2000	dillon	Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better then 'reverse'. The original algorithm did things this way to but at a huge time cost. Enhance the append interlock for NFS writes to handle intr/soft mounts better. Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches. Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather then the kernel default for that option. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
55206	29-Dec-1999	peter	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.
54970	21-Dec-1999	alfred	make getfh a standard syscall instead of dependant on having NFSSERVER defined, useful for userland fileservers that want to use a filehandle type interface to the filesystem. Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452
54799	19-Dec-1999	green	M_PREPEND-related cleanups (unregisterifying struct mbuf *s).
54785	18-Dec-1999	dillon	Fix compilation warning on alpha when converting pointer to integer to generate hash index. Reviewed by: Andrew Gallatin <gallatin@cs.duke.edu>
54693	16-Dec-1999	dillon	Have NFS use a snapshot of boottime instead of boottime itself to generate the NFSv3 Version id. boottime itself may change, sometimes once every tick if you are running xntpd, which really throws off clients. Clients will tend to throw away what they believe to be stale data too often, and can get into long loops rewriting the same data over and over again because they believe the server has rebooted over and over again due to the changing version id. Approved by: jkh
54655	15-Dec-1999	eivind	Introduce NDFREE (and remove VOP_ABORTOP)
54567	13-Dec-1999	dillon	Add a readahead heuristic to the NFS server side code. While the server cannot unilaterally pass data to a client it can reduce the physical disk transaction overhead by reading larger blocks. This results in better pipelining of requests/responses over the network and an almost 100% increase in cpu efficiency on the server. On a 100BaseTX network NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed out), and cpu efficiency increases from 72% idle to 80% idle on the server. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
54564	13-Dec-1999	dillon	PR: kern/15222 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
54536	13-Dec-1999	dillon	Fix a timeout deadlock that can occur when the process holding the receive lock hasn't yet managed to send its own request. PR: kern/15055 Submitted by: Ian Dowse iedowse@maths.tcd.ie
54485	12-Dec-1999	dillon	Fix a number of server-side issues related to aborting badly formed NFS packets, mainly initializing structure pointers to NULL which are conditionally freed prior to return. PR: kern/15249 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
54480	12-Dec-1999	dillon	Synopsis of problem being fixed: Dan Nelson originally reported that blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below. bug #1: B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data. Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable. bug #2: B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported. bug #3: B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has NOT yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server). The patch also contains some cleanup, which only applies to the commit into -current. Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>
53552	22-Nov-1999	dillon	nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization of element [4] in both, which goes beyond the end of the array, leaving [0], [1], [2], and [3]. This bug did not cause any problems since the overrun fields are initialized after the bogus array init but needs to be fixed anyway. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
53131	13-Nov-1999	eivind	Remove WILLRELE from VOP_SYMLINK Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.
53101	12-Nov-1999	eivind	Remove WILLRELE from VOP_RENAME
53095	11-Nov-1999	dillon	Remove special case socket sharing code in order to allow nfsd to bind IP addresses to udp/cltp sockets separately. PR: kern/13049 Reviewed by: David Malone <dwmalone@maths.tcd.ie>, freebsd-current
53022	08-Nov-1999	dillon	Fix nfssvc_addsock() to not attempt to free a NULL socket structure when returning an error. Bug fix was extracted from the PR. The PR is not yet entirely resolved by this commit. PR: kern/13049 Reviewed by: Matt Dillon <dillon@freebsd.org> Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
52492	25-Oct-1999	dillon	Move NFS access cache hits/misses into nfsstats structure so /usr/bin/nfsstat can get to it easily.
51906	03-Oct-1999	phk	Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
51799	29-Sep-1999	marcel	Careless use of struct proc *p caused major problems. 'p' is allowed to be NULL in this function (nfs_sigintr). Reorder the statements and guard them all with a single if (p != NULL). reported, reviewed and tested by: jdp
51796	29-Sep-1999	dillon	Make FreeBSD less conservative in determining when to return a cookie error for a directory. I have made this change after a great deal of review although I cannot be absolutely sure that this meets the spec. The issue devolves into whether changes in an underlying (UFS) directory can cause NFS directory blocks to be renumbered. My read of the code indicates that NFS directory blocks will not be renumbered, which means that the cookies should still remain valid after a change is made to the underlying directory. This being the case, a cookie error should not be returned when a change is made to the underlying directory and, instead, the NFS client should rely on mtime detection to invalidate and reload the directory. The use of mtime is problematic in of itself, due to insufficient resolution, which is why I believe the original conservative error handling was done. Still, there have been dozens of bug reports by people needing solaris<->FreeBSD interoperability and these have to be accomodated.
51791	29-Sep-1999	marcel	sigset_t change (part 2 of 5) ----------------------------- The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements. The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t. struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done though function sigprop(). The latter because the table doesn't holds properties for all signals, but only for the first NSIG signals. signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace polution. Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types. NOTE: kdump (and port linux_kdump) must be recompiled. Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
51344	17-Sep-1999	dillon	Asynchronized client-side nfs_commit. NFS commit operations were previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it. Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection then lastr ever did. Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc. Note that we do not commit the meta-data yet. This works still needs to be done. Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
51138	11-Sep-1999	alfred	Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP. Add fh(open\|stat\|stafs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD
50521	28-Aug-1999	phk	remove unused variables.
50477	28-Aug-1999	peter	$Id$ -> $FreeBSD$
50405	26-Aug-1999	phk	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;
50053	19-Aug-1999	peter	Convert all the nfs macros to do { blah } while (0) to ensure it works correctly in if/else etc. egcs had probably picked up most of the problems here before with "ambiguous braces" etc, but this should increase the robustness a bit. Based on an idea from Eivind Eklund.
49535	08-Aug-1999	phk	Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
49405	04-Aug-1999	peter	Don't over-allocate and over-copy shorter NFSv2 filehandles and then correct the pointers afterwards. It's kinda bogus that we generate a 24 (?) byte filehandle (2 x int32 fsid and 16 byte VFS fhandle) and pad it out to 64 bytes for NFSv3 with garbage. The whole point of NFSv3's variable filehandle length was to allow for shorter handles, both in memory and over the wire. I plan on taking a shot at fixing this shortly.
49232	29-Jul-1999	wpaul	Correct the sanity test length calculation in nfsrv_readdirplus(): len is being incremented by 4 bytes too few each time through the loop, which allows more data into the mbuf chain that we really want. In the worst case, when we're using 32K read/write sizes with a TCP client, this causes readdirplus replies to sometimes exceed NFS_MAXPACKET which leads to a panic. This problem cropped up for me using an IRIX 6.5.4 NFSv3 TCP client with 32K read/write sizes, however supposedly it can be triggered by WinNT NFS servers too. In theory, it can probably be triggered by any NFS v3 implementation using TCP as long as it's using the maxiumum block size. Reviewed by: Matthew Dillon <dillon@backplane.com>
49158	28-Jul-1999	alc	Clear error in nfsrv_create when we have a valid reply so that that reply is actually transmitted. Submitted by: dillon
48859	17-Jul-1999	phk	I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g
48362	30-Jun-1999	julian	Submitted by: "David E. Cross" <crossd@cs.rpi.edu> Matt missed a line..
48274	27-Jun-1999	peter	Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly splbio()/splx()) are #included in time.
48225	26-Jun-1999	mckusick	Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
48125	23-Jun-1999	julian	Matt's NFS fixes. Submitted by: Matt Dillon Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin 3.2 version to follow when tested
47751	05-Jun-1999	peter	Various changes lifted from the OpenBSD cvs tree: txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge. nfs_serv.c: don't call nfsm_adj() with negative values, windows clients could crash servers when doing a readdir of a large directory. nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin loop trying to bind(). Don't clobber a mbuf pointer or we get panics on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf. nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB. Obtained from: OpenBSD
47028	11-May-1999	phk	Divorce "dev_t" from the "major\|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
46580	06-May-1999	phk	remove b_proc from struct buf, it's (now) unused. Reviewed by: dillon, bde
46568	06-May-1999	peter	Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
46349	02-May-1999	alc	The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
46155	28-Apr-1999	phk	This Implements the mumbled about "Jail" feature. This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
46112	27-Apr-1999	phk	Suser() simplification: 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc ), prototyped in <sys/proc.h>. 3: s/suser_xxx($[a-zA-Z0-9_]$->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.
45996	24-Apr-1999	dt	Fixed printf format errors on alpha.
44246	25-Feb-1999	peter	Untangle the nfs send and receive queue locking a little. One lock routine was [ab]used for two different things, and you couldn't tell from the wait channel which one had wedged. Catch a few things missing from NFS_NOSERVER.
44112	18-Feb-1999	dfr	Move the declaration of the vfs.nfs sysctl node outside an ifdef so that it builds if NFS_NOSERVER is defined. Spotted by: Bruce Evans <bde@zeta.org.au>
44101	17-Feb-1999	bde	Fixed bitrot in NFS_ACDEBUG option.
44078	16-Feb-1999	dfr	* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime. * Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded. Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
43305	27-Jan-1999	dillon	Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
42957	21-Jan-1999	dillon	This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
42315	05-Jan-1999	eivind	Remove the 'waslocked' parameter to vfs_object_create().
42155	30-Dec-1998	hoek	Silence -Wtrigraph. Submitted by: Bradley Dunn <bradley@dunn.org> (pr: kern/8817)
42060	25-Dec-1998	dfr	Fix for creating files on a Solaris 7 server with NFSv3 (the request was slightly garbled but older servers seemed to understand it). Reviewed by: David O'Brien <obrien@nuxi.ucdavis.edu>
41796	14-Dec-1998	dt	Added 3 new errno values, requred by various standards: EOVERFLOW, ECANCELED, EILSEQ. Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping. Reviewed by: Dan Nelson <dnelson@emsphone.com>
41619	09-Dec-1998	eivind	Remove the if fixed in the last commit; bde quite correctly point out that it can never fail.
41606	08-Dec-1998	eivind	Fix typo (; in "if (vp == NULL);").
41591	07-Dec-1998	archie	The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
41132	13-Nov-1998	dfr	Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to free() sometimes. Reviewed by: Eric Haug <ejh@eas.slu.edu>
41026	09-Nov-1998	peter	Remove [apparently] bogus casts to u_long for the vnode_pager_setsize() second argument. np_size is a 64 bit int, so is the second arg. This might have caused needless 2G/4G file size problems. I believe it was Bruce who queried this.
40792	31-Oct-1998	peter	vm_object_page_clean() last arg changed from TRUE to OBJPC_SYNC. I'm not sure that this is necessary to be a sync write here since a VOP_FSYNC() follows and it will schedule, sort and complete the writes that the vm_object_page_clean() started (as I think I understand things).
40790	31-Oct-1998	peter	Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
39788	29-Sep-1998	mckusick	The code checks each fragment mark to see if it's valid; if the fragment is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it barfs and, I think, drops the connection. However, there's no guarantee that in a multi-fragment RPC, all the fragments will be at least as large as NFS_MINPACKET. In fact, with the version of "tclnfs" we have here, which supports NFS over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's user-mode ONC RPC library), I can repeatably cause "tclnfs" to send a request with more than one fragment, one of which is only 8 bytes long. I just do a 3877-byte write to a file, at an offset of 0. The check that "slp->ns_reclen" is greater than or equal to NFS_MINPACKET serves no useful purpose - if the NFS server code can't handle packets < NFS_MINPACKET bytes, it can't handle them over any protocol, so the check has to be done above the RPC-over-TCP layer - and should be removed. Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
38894	07-Sep-1998	bde	Made unloading of the nfs LKM sort of work. This is mainly to test detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually work for any vfs, since it leaves garbage pointers to memory allocation control structures.
38866	05-Sep-1998	bde	Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded. NetBSD uses strcmp() to avoid this ugly global.
38718	01-Sep-1998	luoqi	Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle NULL, buf free() can't.
38482	23-Aug-1998	wollman	Yow! Completely change the way socket options are handled, eliminating another specialized mbuf type in the process. Also clean up some of the cruft surrounding IPFW, multicast routing, RSVP, and other ill-explored corners.
37997	01-Aug-1998	peter	If we get an ENOBUFS from the network, it's normally transient network interface congestion (eg: nfs over a ppp link, etc). Don't log these for UDP mounts, and don't cause syscalls to fail with EINTR. This stops the 'nfs send error 55' warnings. If the error is because the system is really hosed, this is the least of your problems...
37649	15-Jul-1998	bde	Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
37339	02-Jul-1998	kato	Moved `#ifndef NFS_NOSERVER' after including nfs.h.
37291	30-Jun-1998	jmg	fix buildworld hopefully be3fore anyone complains... NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion), but in some cases it looks like nfs keeps a copy of the value in a struct hash sizes are already ifdef'd KERNEL, so there aren't userland inpact from them...
37272	30-Jun-1998	jmg	convert some nfs tunables to options, these are: NFS_MINATTRTIMO VREG attrib cache timeout in sec NFS_MAXATTRTIMO NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec NFS_MAXDIRATTRTIMO NFS_GATHERDELAY Default write gather delay (msec) NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this NFS_WDELAYHASHSIZ and with this NFS_MUIDHASHSIZ Tune the size of nfsmount with this NFS_NOSERVER (already documented in LINT) NFS_DEBUG turn on NFS debugging also, because NFS_ROOT is used by very different files, it has been renamed to opt_nfsroot.h instead of the old opt_nfs.h....
37089	21-Jun-1998	bde	Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore, code controlled by it did not even compile.)
36979	14-Jun-1998	bde	Avoid an egcs pessimization for 64-bit signed division on i386's. Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit signed divisions, but egcs optimizes them to a shift and fixup when the divisor is a constant power of 2. Unfortunately, it generates a call to __cmpdi2() for the fixup, although all except possibly ancient versions of gcc and egcs do ordinary 64-bit comparisons inline.
36735	07-Jun-1998	dfr	This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change. The prototype FreeBSD/alpha machdep will follow in a couple of days time.
36541	31-May-1998	peter	For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t; int -> int32_t; u_short -> u_int16_t. Also, use mode_t instead of u_short for storing modes (mode_t is a u_int16_t). Obtained from: NetBSD
36540	31-May-1998	peter	Support 'mount -u' remounts. This may require disconnecting and rebinding the socket. Certain mode changes are not allowed. Obtained from: NetBSD
36539	31-May-1998	peter	Cut-n-paste glitch
36534	31-May-1998	peter	Prototype support for selectively allowing non-reserved ports on a per export basis. Needs userland support yet. Obtained from: NetBSD
36533	31-May-1998	peter	Hide whiteouts from NFS, since the protocol doesn't support them. Obtained from: NetBSD
36532	31-May-1998	peter	NetBSD has a comment that Solaris 2.5 doesn't do verifiers correctly, we have weakened this test already for Digital Unix, so it may be enough for Solaris. It needs to be checked again. Obtained from: NetBSD
36531	31-May-1998	peter	Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures, it just makes more work. We pass a copy of the uid/gid with the credentials. (although, this may need to be revisited if a non AUTHUNIX authentication method (such as NFSKERB) ever gets implemented). Obtained from: NetBSD
36530	31-May-1998	peter	Use the new SB_UPCALL flag, Obtained from: NetBSD (but I changed the flag clear order in case).
36520	31-May-1998	peter	Don't try and free mrep twice on some error conditions. Obtained from: NetBSD
36519	31-May-1998	peter	#ifdef a diagnostic panic, plus another missed costmetic change. Obtained from: NetBSD
36518	31-May-1998	peter	We have gained 2 more errno's, add them to the NFSv2 mapping table.
36517	31-May-1998	peter	Missed a cosmetic change that the other BSD's have.
36516	31-May-1998	peter	oops, nfs_msg() is called from client code too.
36515	31-May-1998	peter	When we can't reconnect a socket, don't forget to unlock before retrying or we can deadlock. Obtained from: NetBSD
36514	31-May-1998	peter	Don't log zero length reads, this can happen during normal operation. Obtained from: NetBSD
36513	31-May-1998	peter	Consider for readdir chunk sizes when tuning socket buffer reservations. Obtained from: NetBSD
36512	31-May-1998	peter	Refuse READDIR / READDIRPLUS rpc's for non-directories Obtained from: NetBSD
36511	31-May-1998	peter	Some const's Obtained from: NetBSD
36503	31-May-1998	peter	NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim of this part of commits is to minimize unnecessary differences between the other NFS's of similar origin. Yes, there are gratuitous changes here that the style folks won't like, but it makes the catch-up less difficult.
36473	30-May-1998	peter	When using NFSv3, use the remote server's idea of the maximum file size rather than assuming 2^64. It may not like files that big. :-) On the nfs server, calculate and report the max file size as the point that the block numbers in the cache would turn negative. (ie: 1099511627775 bytes (1TB)). One of the things I'm worried about however, is that directory offsets are really cookies on a NFSv3 server and can be rather large, especially when/if the server generates the opaque directory cookies by using a local filesystem offset in what comes out as the upper 32 bits of the 64 bit cookie. (a server is free to do this, it could save byte swapping depending on the native 64 bit byte order) Obtained from: NetBSD
36329	24-May-1998	peter	Convert a couple of large allocations to use zones rather than malloc for better packing. This means that we can choose better values for the various hash entries without having to try and get it all to fit within an artificial power of two limit for malloc's sake.
36251	20-May-1998	peter	Only ignore "owner" permissions selectively rather than always. In some cases we ignore it (eg: read/write) to maintain chmod-after-open semantics but in other cases we do care, eg: creating files, access() etc. Never ignore errors from VOP_ACCESS() on immutable files. This apparently comes from BSDI (from Keith Bostic) via NetBSD. PR: 5148 Submitted by: Yoshiro MIHIRA <sanpei@yy.cs.keio.ac.jp>
36176	19-May-1998	peter	Allow control of the attribute cache timeouts at mount time. We had run out of bits in the nfs mount flags, I have moved the internal state flags into a seperate variable. These are no longer visible via statfs(), but I don't know of anything that looks at them.
36097	16-May-1998	bde	Get timespecs directly instead of via timevals.
35823	07-May-1998	msmith	In the words of the submitter: --------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly. Quality testing was done with testvn, and lat_fs from the lmbench suite. Some NFS client testing courtesy of Patrik Kudo. vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. --------- Submitted by: Michael Hancock <michaelh@cet.co.jp>
35066	06-Apr-1998	phk	Use random() to find our initial xid.
34961	30-Mar-1998	phk	Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part. Most uses of time.tv_sec now uses the new variable time_second instead. gettime() changed to getmicrotime(0. Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it). A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random. Add a new nfs_curusec() function. Mark a couple of bogosities involving the now disappeard time variable. Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args. Change profiling in ncr.c to use ticks instead of time. Resolution is the same. Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences. Reviewed by: bde
33181	09-Feb-1998	eivind	Staticize.
33134	06-Feb-1998	eivind	Back out DIAGNOSTIC changes.
33108	04-Feb-1998	eivind	Turn DIAGNOSTIC into a new-style option.
33058	03-Feb-1998	bde	Added #include of <sys/queue.h> so that this file is more "self"-sufficent.
33054	03-Feb-1998	bde	Forward declare some structs so that this file is more self-sufficient.
32998	01-Feb-1998	bde	Moved declaration of `union nethostadr' outside of the KERNEL section, to give pollution compatible with <nfs/nqfs.h>. At least mount_nfs.c previously had to #define KERNEL before including <nfs/nfs.h> to get this pollution, but this gave other pollution. Moved comment about NFSINT_SIGMASK to immediately before the code that it applies to.
32937	31-Jan-1998	dyson	Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes." Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.
32071	29-Dec-1997	dyson	Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include: 1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync. Be gentle, and please give me feedback asap.
32011	27-Dec-1997	bde	Unspammed nested include of <vm/vm_zone.h>.
31886	20-Dec-1997	bde	Added a used include. Fixed a gratuitous ANSIism and nearby KNF violations.
31391	24-Nov-1997	bde	Don't call malloc(..., M_WAITOK) at splnet(). Doing so is often a mistake (since softnet interrupts may occur if malloc() waits), and doing it harmlessly but unnecessarily here interfered with detection of the mistaken cases.
31016	07-Nov-1997	phk	Remove a bunch of variables which were unused both in GENERIC and LINT. Found by: -Wunused
30994	06-Nov-1997	phk	Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead. This fixes a boatload of compiler warning, and removes a lot of cruft from the sources. I have not removed the /ARGSUSED/, they will require some looking at. libkvm, ps and other userland struct proc frobbing programs will need recompiled.
30813	28-Oct-1997	bde	Removed unused #includes.
30808	28-Oct-1997	bde	Don't #include <nfs/nfs.h> in <nfs/nfs_node.h> if KERNEL is defined. Fixed everything that depended on the nested include.
30738	26-Oct-1997	phk	Always initialize the syscall vectors for our "private" syscalls (not just in the LKM case). Plug nqnfs_vop_lease_check directly into the default_vnodeop_p table.
30354	12-Oct-1997	phk	Last major round (Unless Bruce thinks of somthing :-) of malloc changes. Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them. A couple of finer points by: bde
30309	11-Oct-1997	phk	Distribute and statizice a lot of the malloc M_* types. Substantial input from: bde
29653	21-Sep-1997	dyson	Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
29291	10-Sep-1997	phk	Remove a couple of stubborn NetBSD #if's.
29288	10-Sep-1997	phk	unifdef -U__NetBSD__ -D__FreeBSD__
29024	02-Sep-1997	bde	Added used #include - don't depend on <sys/mbuf.h> including <sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
28270	16-Aug-1997	wollman	Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
27845	02-Aug-1997	bde	Removed unused #includes.
27609	22-Jul-1997	dfr	Correct some dumb mistakes in the WebNFS stuff. Submitted by: bde
27608	22-Jul-1997	dfr	Allow NULL cookie verifiers for non-NULL offsets. This is needed for Digital Unix boxes since they appear to always send null verifiers.
27446	16-Jul-1997	dfr	Merge WebNFS changes from NetBSD. Obtained from: NetBSD
26952	25-Jun-1997	tegge	Clear nfs_iodwant[myiod] when the nfsiod process exits due to a signal.
26637	14-Jun-1997	bde	Don't require superuser privileges for creating fifos. The v2 case was broken when support for v3 was introduced in rev.1.16. The v3 case has always been broken in FreeBSD. Should be in 2.2. PR: 3838
26420	03-Jun-1997	dfr	Various fixes from NetBSD: Use u_int for rpc procedure numbers. Some fixes to NQNFS. A rare NULL pointer dereference. Ignore NFSMNT_NOCONN for TCP mounts. Obtained from: NetBSD
26418	03-Jun-1997	dfr	Implement the async mount option for NFSv3. This makes NFS pretend that all writes sent to the server were synchronous and therefore no commits are needed. This is the same as the vfs.nfs.async variable on the server but allows each client to choose whether to work this way. Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e. pretend that the write was synchronous.
25930	19-May-1997	dfr	Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully can be tidied up later by a slight redesign. PR: kern/2573, kern/2754, kern/3046 (possibly) Reviewed by: dyson
25781	13-May-1997	dfr	Don't keep addresses in mbuf chains. This should simplify the next round of network changes from Garret. Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
25664	10-May-1997	dfr	Implement a separate control for write gathering on NFSv3. This is turned off for NFSv3 by default since write gathering seems to reduce performance for NFSv3 by up to 60%. Add sysctl knobs to control both variables.
25663	10-May-1997	dfr	Fix a nasty hang connected with write gathering. Also add debug print statements to bits of the server which helped me find the hang.
25307	30-Apr-1997	dfr	Allow NULL rpcs on non-privileged ports at all times to work around broken clients. PR: kern/3298 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
25201	27-Apr-1997	wollman	The long-awaited mega-massive-network-code- cleanup. Part I. This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in. 2) Certain protocol entry points are modified to take a process structure, so they they can easily tell whether or not it is possible to sleep, and also to access credentials. 3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed. 4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family. 5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem. As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
25089	22-Apr-1997	dfr	Fix broken usage of nm_readdirsize and increase the socket buffers for UDP to prevent possible socket overflows. 2.2 candidate. PR: kern/3304 Reviewed by: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
24626	04-Apr-1997	dfr	Fix various bugs in the locking protocol, allowing proper shared locks to be used. This should fix the lock panics that people are seeing.
24381	29-Mar-1997	bde	Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs features, and the one thing that it depended on (DIRBLKSIZ) now has conflicting spelling.
24378	29-Mar-1997	bde	Define our own version of DIRBLKSIZ instead of (ab)using ufs's value. Use the same value of 512 (ufs actually uses DEV_BSIZE). There are too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for nfs, one for ibcs2, one for linux, one for applications, ... I think nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes of all supported file systems. There is also NFS_DIRBLKSIZ, which is different from nfs's DIRBLKSIZ but is sometimes confused with it in comments. Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs. This came in undocumented with the Lite2 merge although it isn't in Lite2. It required more-bogus #define KERNEL's in fstat and pstat to make the constants visible. Restored a spelling fix from rev.1.17. Removed duplicate #defines of all the the NFS mount option flags.
24330	27-Mar-1997	guido	Add code that will reject nfs requests in teh kernel from nonprivileged ports. This option will be automatically set/cleraed when mount is run without/with the -n option. Reviewed by: Doug Rabson
24250	25-Mar-1997	peter	Use the correct (relative to the implementation) ordering of args in the VOP_LINK() calls, Closes PR#3064 Submitted by: bde
24249	25-Mar-1997	peter	The local fs interface does not allow link()/unlink() of directories, do not allow a remote nfs client to cause local fs corruption either.
24101	22-Mar-1997	bde	Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
22975	22-Feb-1997	peter	Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
22521	10-Feb-1997	dyson	This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
21673	14-Jan-1997	jkh	Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
19449	06-Nov-1996	dfr	Improve the queuing algorithms used by NFS' asynchronous i/o. The existing mechanism uses a global queue for some buffers and the vp->b_dirtyblkhd queue for others. This turns sequential writes into randomly ordered writes to the server, affecting both read and write performance. The existing mechanism also copes badly with hung servers, tending to block accesses to other servers when all the iods are waiting for a hung server. The new mechanism uses a queue for each mount point. All asynchronous i/o goes through this queue which preserves the ordering of requests. A simple mechanism ensures that the iods are shared out fairly between active mount points. This removes the sysctl variable vfs.nfs.dwrite since the new queueing mechanism removes the old delayed write code completely. This should go into the 2.2 branch.
18866	11-Oct-1996	dfr	This fixes a problem with the nfs socket handling code which happens if a single process is performing a large number of requests (in this case writing a large file). The writing process could monopolise the recieve lock and prevent any other processes from recieving their replies. It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the behaviour which originally pointed out the problem. When a process writes to a file over NFS, it usually arranges for another process (the 'iod') to perform the request. If no iods are available, then it turns the write into a 'delayed write' which is later picked up by the next iod to do a write request for that file. This can cause that particular iod to do a disproportionate number of requests from a single process which can harm performance on some NFS servers. The alternative is to perform the write synchronously in the context of the original writing process if no iod is avaiable for asynchronous writing. The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and the non-delayed behaviour is selected when vfs.nfs.dwrite=0. The default is vfs.nfs.dwrite=1; if many people tell me that performance is better if vfs.nfs.dwrite=0 then I will change the default. Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
18397	19-Sep-1996	nate	In sys/time.h, struct timespec is defined as: /* * Structure defined by POSIX.4 to be like a timeval. / struct timespec { time_t ts_sec; / seconds / long ts_nsec; / and nanoseconds */ }; The correct names of the fields are tv_sec and tv_nsec. Reminded by: James Drobina <jdrobina@infinet.com>
18041	05-Sep-1996	dg	Release an unneeded reference to a vnode that was gained in a VFS_VGET(). Fixes a readdirplus panic. Submitted by: Doug Rabson <dfr@render.com>
18020	03-Sep-1996	bde	Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel. Include it directly in the few places where it is used. Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or nothing.
17761	21-Aug-1996	dyson	Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistant and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct. This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc. Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility. Reviewed by: davidg
17186	16-Jul-1996	dfr	Various fixes from frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca: 1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is correctly send to the server. If a buffer was dirtied when it was in the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left unchanged and when the buffer was later cleaned, just a commit rpc was made to the server to complete the previous write. Clearing B_NEEDCOMMIT ensures that another write is made to the server. 2. If a server returned a server (for whatever reason) returned an answer to a write RPC that implied that fewer bytes than requested were written, bad things would happen. 3. The setattr operation passed on the atime in stead of the mtime to the server. The fix is trivial. 4. XIDs always started at 0, but this caused some servers (older DEC OSF/1 3.0 so I've been told) who had very long-lasting XID caches to get confused if, after a reboot of a BSD client, RPCs came in with a XID that had in the past been used before from that client. Patch is to use the current time in seconds as a starting point for XIDs. The patch below is not perfect, because it requires the root fs to be mounted first. This is because of the check BSD systems do, comparing FS time to system time. Reviewed by: Bruce Evans, Terry Lambert. Obtained from: frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca
17096	11-Jul-1996	wollman	Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant t -current and doesn't look at all like my current code.
16634	23-Jun-1996	bde	Don't truncate minor or major numbers in the nfsv3 client.
16365	14-Jun-1996	phk	Fix for NFS_NOSERVER Poul mentioned that he thought this was some kind of timing problem, and that started me thinking. After a little poking around, I found that nfs_timer() was completely disabled when NFS_NOSERVER was #defined. But after looking at nfs_timer(), it seemed like it was something required by both the client and server code, and disabling it outright just didn't seem to make any sense. Parts of it relate only to the NFS server side code, so I disabled those, but I re-enabled the rest of the function and made sure that it would be called from nfs_init() (in nfs_subs.c). With nfs_timer() re-enabled, everything seems to work again. The only other changes I made were to #ifdef away some variable declarations in the NFS_NOSERVER case so that gcc would stop complaining about unused variables. Reviewed by: phk Submitted by: Bill Paul <wpaul@skynet.ctr.columbia.edu>
16226	08-Jun-1996	bde	Fixed a vnode reference leak in nfsrv_rename(). The target inode wasn't released until the file system was unmounted. This bug also affected kern/vfs_syscalls.c but was fixed in rev.1.18 and rev.1.20 there. Reviewed by: davidg
15480	30-Apr-1996	bde	#include <sys/filedesc.h> explicitly instead of depending on it being bogusly included by <sys/socketvar.h>.
15479	30-Apr-1996	bde	Fixed nfs sysctls. They missed out on the fs -> vfs name changes from Lite2. This broke nfsstat.
14093	13-Feb-1996	wollman	Kill XNS. While we're at it, fix socreate() to take a process argument. (This was supposed to get committed days ago...)
13765	30-Jan-1996	mpp	Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
13490	19-Jan-1996	dyson	Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
13416	13-Jan-1996	phk	Add an option NFS_NOSERVER which saves 100K in the install kernel (or any other kernel that uses it). Use with option NFS.
12911	17-Dec-1995	phk	Staticize.
12662	07-Dec-1995	dg	Untangled the vm.h include file spaghetti.
12588	03-Dec-1995	bde	Completed function declarations and/or added prototypes and/or moved prototypes to the right place.
12457	21-Nov-1995	bde	Completed function declarations, added prototypes and removed redundant declarations.
12453	21-Nov-1995	bde	Completed function declarations and/or added prototypes.
12274	14-Nov-1995	bde	Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls. Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
11982	31-Oct-1995	joerg	Include a prerequisite header (so this is consistent again with the NFSv2 state).
11921	29-Oct-1995	phk	Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
10224	24-Aug-1995	dg	Added NFS_ASYNC kernel option. It only has an effect for NFSv2.
10223	24-Aug-1995	dg	Killed redundant declarations of nfsm_rpchead().
10222	24-Aug-1995	dfr	Some fixes found using gcc -Wall: nfsm_rpchead() has been called with the wrong number of args and misplaced args since someone added new args in the middle for nfsv3. Here's another one that would be important on 64-bit systems. VOP_READDIR takes a `u_int **cookies' arg. Submitted by: Bruce Evans <bde@zeta.org.au>
10219	24-Aug-1995	dfr	Add support for amd direct maps. Reviewed by: Thomas Graichen <graichen@sirius.physik.fu-berlin.de>
9966	06-Aug-1995	dg	Fixed bug where vnode_pager_uncache() wasn't always called when it should be. The result was that the file's space wouldn't be properly freed when it was deleted. Submitted by: John Dyson
9877	03-Aug-1995	dfr	Slight changes to locking around VOP_READRIR. Detect in nfsrv_readdirplus when a filesystem soes not support VFS_VGET and return NFSERR_NOTSUPP so that the client will use ordinary readdir instead.
9854	02-Aug-1995	dfr	Lock the directory vnode before VOP_READDIR in nfsrv_readdirplus
9842	01-Aug-1995	dg	Removed my special-case hack for VOP_LINK and fixed the problem with the wrong vp's ops vector being used by changing the VOP_LINK's argument order. The special-case hack doesn't go far enough and breaks the generic bypass routine used in some non-leaf filesystems. Pointed out by Kirk McKusick.
9759	29-Jul-1995	bde	Eliminate sloppy common-style declarations. There should be none left for the LINT configuation.
9588	20-Jul-1995	dg	vnode_pager_alloc() never returns NULL, so don't check for it.
9507	13-Jul-1995	dg	NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!! Much needed overhaul of the VM system. Included in this first round of changes: 1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers". 2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items. 3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed. 4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug. 5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance. 6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain. 7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance. 8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed. 9) Some almost useless debugging code removed. 10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology. 11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended. 12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course). 13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE. 14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes) TODO: 1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size. 2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness. 3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind. 4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems. 5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
9456	09-Jul-1995	dg	Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
9356	28-Jun-1995	dg	1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
9354	28-Jun-1995	dg	Fixed VOP_LINK argument order botch.
9336	27-Jun-1995	dfr	Changes to support version 3 of the NFS protocol. The version 2 support has been tested (client+server) against FreeBSD-2.0, IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support is stable AFAIK. The version 3 support has been tested with a loopback mount and minimally against an IRIX 5.3 server. It needs more testing and may have problems. I have patched amd to support the new variable length filehandles although it will still only use version 2 of the protocol. Before booting a kernel with these changes, nfs clients will need to at least build and install /usr/sbin/mount_nfs. Servers will need to build and install /usr/sbin/mountd. NFS diskless support is untested. Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
9221	14-Jun-1995	joerg	The duplicate information returned in fa_type and fa_mode is an ambiguity in the NFS version 2 protocol. VREG should be taken literally as a regular file. If a server intents to return some type information differently in the upper bits of the mode field (e.g. for sockets, or FIFOs), NFSv2 mandates fa_type to be VNON. Anyway, we leave the examination of the mode bits even in the VREG case to avoid breakage for bogus servers, but we make sure that there are actually type bits set in the upper part of fa_mode (and failing that, trust the va_type field). NFSv3 cleared the issue, and requires fa_mode to not contain any type information (while also introduing sockets and FIFOs for fa_type). The fix has been tested against a variety of NFS servers. It fixes problems with the ``Tropic'' NFS server for Windows, while apparently not breaking anything. Pointed-out by: scott@zorch.sf-bay.org (Scott Hazen Mueller)
9202	11-Jun-1995	rgrimes	Merge RELENG_2_0_5 into HEAD
8876	30-May-1995	rgrimes	Remove trailing whitespace.
8832	29-May-1995	dg	Fixed some serious bugs that resulted in object reference counts not being handled correctly. This would manifest itself as "object deallocated too many times" panics and perhaps other strange inconsistencies on NFS servers. Reviewed by: me, of course Submitted by: John Dyson
7969	21-Apr-1995	dyson	Slight re-ordering of the creation of a vmio object to fix a condition that can cause NFS I/O failures.
7160	19-Mar-1995	dg	Removed unnecessary call to vnode_pager_uncache(). We automatically clear the VTEXT flag after all mappers have finished with the object.
7110	17-Mar-1995	dg	Changed some (incorrect) nfsrv_vput()'s back into regular vput()'s. This fixes the last of the known NQNFS problems (until I find more, that is :-)).
7090	16-Mar-1995	bde	Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
6420	15-Feb-1995	phk	YF fix.
6417	15-Feb-1995	dg	Fixed two more bugs related to the merged cache changes. Submitted by: John Dyson
6416	15-Feb-1995	dg	Woops, change a nfsrv_vput back into a nfsrv_vrele. Submitted by: John Dyson
6414	15-Feb-1995	dg	Fixed three bugs related to the merged cache changes. The bugs likely would make NFS servers flakey - probably the cause of freefall's recent hangs. Submitted by: John Dyson
6361	14-Feb-1995	phk	YFfix +int nfsrv_vput __P(( struct vnode * )); +int nfsrv_vrele __P(( struct vnode * )); +int nfsrv_vmio __P(( struct vnode * ));
6210	06-Feb-1995	dg	Changed order of release of vnode/object to fix a problem where the vnode is freed with an old object still attached (subsequently causing a panic). Fixes NFS server panic "object/pager mismatch". Submitted by: John Dyson
5455	09-Jan-1995	dg	These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D. The majority of the merged VM/cache work is by John Dyson. The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme. vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering. vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff. vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption. vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up. vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme. pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs. vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping. proc.h Fixed the problem that the p_lock flag was not being cleared on a fork. swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore. machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme. machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed. ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers. Submitted by: John Dyson and David Greenman
4067	02-Nov-1994	wollman	Forward-declare a few structures to avoid warning messages.
3820	23-Oct-1994	wollman	Implement fs.nfs MIB variables.
3664	17-Oct-1994	phk	This is a bunch of changes from NetBSD. There are a couple of bug-fixes. But mostly it is changes to use the list-maintenance macros instead of doing the pointer-gymnastics by hand. Obtained from: NetBSD
3305	02-Oct-1994	phk	Prototyping and general gcc-shutting up. Gcc has one warning now which looks bad, I will get to it eventually, unless somebody beats me to it.
3167	28-Sep-1994	dfr	Make NFS ask the filesystems for directory cookies instead of making them itself.
2997	22-Sep-1994	wollman	Make NFS loadable.
2979	22-Sep-1994	wollman	More loadable VFS changes: - Make a number of filesystems work again when they are statically compiled (blush) - FIFOs are no longer optional; ``options FIFO'' removed from distributed config files.
2384	29-Aug-1994	dg	"bogus" fixes from 1.1.5 to work around some cache coherency problems.
2175	21-Aug-1994	paul	More idempotency....... this is fun :-)
1828	04-Aug-1994	dg	Made NFS attribute cache timeouts kernel config file tunable via NFS_MINATTRTIMO and NFS_MAXATTRTIMO.
1817	02-Aug-1994	dg	Added $Id$
1549	25-May-1994	rgrimes	The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch. Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
1541	24-May-1994	rgrimes	BSD 4.4 Lite Kernel Sources