272461 |
03-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
268961 |
21-Jul-2014 |
bdrewery |
MFC r268114:
Change NFS readdir() to only ignore cookies preceding the given offset for UFS rather than for all but ZFS.
|
265667 |
08-May-2014 |
rmacklem |
MFC: r264888 The PR reported that the old NFS server did not set uio_td == NULL for the VOP_READ() call. This patch fixes both the old and new server for this case.
|
261049 |
22-Jan-2014 |
mav |
MFC r259765: Fix RPC server threads file handle affinity to work better with ZFS.
Instead of taking 8 specific bytes of file handle to identify file during RPC thread affitinity handling, use trivial hash of the full file handle. ZFS's struct zfid_short does not have padding field after the length field, as result, originally picked 8 bytes are loosing lower 16 bits of object ID, causing many false matches and unneeded requests affinity to same thread. This fix substantially improves NFS server latency and scalability in SPEC NFS benchmark by more flexible use of multiple NFS threads.
|
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
255219 |
05-Sep-2013 |
pjd |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; };
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
|
251171 |
31-May-2013 |
jeff |
- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
|
249596 |
17-Apr-2013 |
ken |
Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to sys/nfs, since it is now shared by the two NFS servers.
Suggested by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
|
249592 |
17-Apr-2013 |
ken |
Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server.
The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support.
This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching.
The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed.
This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order.
In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code.
This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time.
I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS.
The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another.
sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built.
sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode.
sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs.
sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro.
sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call.
sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code.
Add the malloc flag argument to newnfs_realign().
Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code.
sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type.
sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.
sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc.
There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before.
In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module.
In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused.
The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread.
The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread.
Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc.
Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread.
Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage.
Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons.
nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them.
Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread.
sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the
The shims contain all of the code and definitions that are specific to the NFS servers.
They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables.
sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign().
sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c.
sys/modules/nfsserver/Makefile: Add nfs_fha_old.c.
Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
|
248084 |
09-Mar-2013 |
attilio |
Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes.
The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
247602 |
02-Mar-2013 |
pjd |
Merge Capsicum overhaul:
- Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights.
- The cap_new(2) system call is left, but it is no longer documented and should not be used in new code.
- The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one.
- The cap_getrights(2) syscall is renamed to cap_rights_get(2).
- If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall.
- If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2).
- To support ioctl and fcntl white-listing the filedesc structure was heavly modified.
- The audit subsystem, kdump and procstat tools were updated to recognize new syscalls.
- Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below:
CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT.
Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2).
Added CAP_SYMLINKAT: - Allow for symlinkat(2).
Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2).
Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory.
Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall.
Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call.
Removed CAP_MAPEXEC.
CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE.
Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ | PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ | PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE | PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).
Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT.
CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required).
CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required).
Added convinient defines:
#define CAP_PREAD (CAP_SEEK | CAP_READ) #define CAP_PWRITE (CAP_SEEK | CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ) #define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE
#define CAP_SOCK_CLIENT \ (CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \ CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \ CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \ CAP_SETSOCKOPT | CAP_SHUTDOWN)
Added defines for backward API compatibility:
#define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)
Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
|
245611 |
18-Jan-2013 |
jhb |
Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS.
Reviewed by: rmacklem, bde MFC after: 1 week
|
243882 |
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
241896 |
22-Oct-2012 |
kib |
Remove the support for using non-mpsafe filesystem modules.
In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes.
Conducted and reviewed by: attilio Tested by: pho
|
241394 |
10-Oct-2012 |
kevlo |
Revert previous commit...
Pointyhat to: kevlo (myself)
|
241370 |
09-Oct-2012 |
kevlo |
Prefer NULL over 0 for pointers
|
241025 |
28-Sep-2012 |
kib |
Fix the mis-handling of the VV_TEXT on the nullfs vnodes.
If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write.
Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode.
Tested by: pho (previous version) MFC after: 2 weeks
|
228520 |
15-Dec-2011 |
delphij |
Honor NFSv3 commit call (RFC 1813, Section 3.3.21) where when count is 0, the full length from offset is being flushed. Note that for now VOP_FSYNC does not support offset and length parameters so we still do the same full VOP_FSYNC. This issue was reported at FreeNAS support site as FreeNAS ticket #1096.
Submitted by: "ceckerle" <ce.freenas eckerle net> Prodded by: gcooper Reviewed by: rmacklem MFC after: 2 weeks
|
228185 |
01-Dec-2011 |
jhb |
Enhance the sequential access heuristic used to perform readahead in the NFS server and reuse it for writes as well to allow writes to the backing store to be clustered. - Use a prime number for the size of the heuristic table (1017 is not prime). - Move the logic to locate a heuristic entry from the table and compute the sequential count out of VOP_READ() and into a separate routine. - Use the logic from sequential_heuristic() in vfs_vnops.c to update the seqcount when a sequential access is performed rather than just increasing seqcount by 1. This lets the clustering count ramp up faster. - Allow for some reordering of RPCs and if it is detected leave the current seqcount as-is rather than dropping back to a seqcount of 1. Also, when out of order access is encountered, cut seqcount in half rather than dropping it all the way back to 1 to further aid with reordering. - Fix the new NFS server to properly update the next offset after a successful VOP_READ() so that the readahead actually works.
Some of these changes came from an earlier patch by Bjorn Gronwall that was forwarded to me by bde@.
Discussed with: bde, rmacklem, fs@ Submitted by: Bjorn Gronwall (1, 4) MFC after: 2 weeks
|
225356 |
03-Sep-2011 |
rmacklem |
Fix the NFS servers so that they can do a Lookup of "..", which requires that ni_strictrelative be set to 0, post-r224810.
Tested by: swills (earlier version), geo dot liaskos at gmail.com Approved by: re (kib)
|
224778 |
11-Aug-2011 |
rwatson |
Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0:
Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op.
Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions.
In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit.
Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent.
Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
|
223309 |
19-Jun-2011 |
rmacklem |
Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson.
Reviewed by: jhb MFC after: 2 weeks
|
222167 |
22-May-2011 |
rmacklem |
Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed.
Reviewed by: kib
|
219028 |
25-Feb-2011 |
netchild |
Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...).
No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed.
Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
|
218345 |
05-Feb-2011 |
alc |
Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are incorrectly calling vm_object_page_clean(). They are passing the length of the range rather than the ending offset of the range.
Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the callers.
Reviewed by: kib MFC after: 3 weeks
|
216774 |
28-Dec-2010 |
pjd |
ZFS might not return monotonically increasing directory offset cookies, so turn off UFS-specific hack that assumes so in ZFS case. Before the change we can miss returning some directory entries to a NFS client.
I believe that the hack should be moved to ufs_readdir(), but until we find somebody who will do it, turn it off for ZFS in NFS server code.
Submitted by: rmacklem Discussed with: rmacklem, mckusick MFC after: 3 days
|
216633 |
21-Dec-2010 |
pjd |
Use newly added NFSRV_FLAG_BUSY flag for nfsrv_fhtovp() to keep mount point busy. This fixes a race where we can pass invalid mount point to VFS_VGET() via vp->v_mount when exported file system was forcibly unmounted between nfsrv_fhtovp() and VFS_VGET().
Reviewed by: kib MFC after: 5 days
|
216632 |
21-Dec-2010 |
pjd |
- Move pubflag and lockflag handling from nfsrv_fhtovp() to nfs_namei() - this is the only place that is different from all the other nfsrv_fhtovp() consumers. This simplifies nfsrv_fhtovp() a bit and also eliminates one vn_lock/VOP_UNLOCK() cycle in case of NFSv3. - Implement NFSRV_FLAG_BUSY flag for nfsrv_fhtovp() that tells it to leave mount point busy.
Reviewed by: kib MFC after: 5 days
|
216631 |
21-Dec-2010 |
pjd |
On error, unbusy file system and jump to the end, so we won't try to unlock NULL *vpp.
Reviewed by: kib MFC after: 5 days
|
216627 |
21-Dec-2010 |
pjd |
After r216626 no extra { } are needed with VFS_UNLOCK_GIANT().
|
216565 |
19-Dec-2010 |
pjd |
Reduce lock scope a little.
|
216454 |
15-Dec-2010 |
kib |
VOP_ISLOCKED() should not be used to determine if the vnode is locked. Explicitely track the locked status of the vnode.
Reviewed by: pjd Tested by: avg MFC after: 1 week
|
214851 |
05-Nov-2010 |
kib |
Fix a bug in r214049. The nvp == vp case shall be handled specially only for !usevget case. If VFS_VGET is working, the vnode shared lock is obtained recursively and vput() shall be done, not vunref().
Submitted by: rmacklem Tested by: Josh Carroll <josh.carroll gmail com> MFC after: 3 days
|
214049 |
19-Oct-2010 |
kib |
When readdirplus() is handled on the exported filesystem that does not support VFS_VGET, like msdosfs, do not call VOP_LOOKUP() for dotdot on the root directory. Our filesystems expect that VFS handles dotdot lookups on root on its own.
Reported and tested by: kevlo MFC after: 2 weeks
|
211854 |
26-Aug-2010 |
pjd |
- When VFS_VGET() is not supported, switch to VOP_LOOKUP(). - We are fine by only share-locking the vnode. - Remove assertion that doesn't hold for ZFS where we cross mount points boundaries by going into .zfs/snapshot/<name>/.
Reviewed by: rmacklem MFC after: 1 month
|
205661 |
26-Mar-2010 |
rmacklem |
Patch the regular NFS server so that it returns ESTALE to the client for all errors returned by VFS_FHTOVP(). This is required to ensure that EIO doesn't get returned to the client when ZFS is used as the server file system.
Tested by: korvus AT comcast.net Reviewed by: jhb MFC after: 2 weeks
|
203968 |
16-Feb-2010 |
marius |
Factor out the code shared between NFS client and server into its own module. With r203732 it became apparent that creating the sysctl nodes twice causes at least a warning, however the whole code shouldn't be present twice in the first place.
Discussed with: rmacklem
|
203732 |
09-Feb-2010 |
marius |
- Move nfs_realign() from the NFS client to the shared NFS code and remove the NFS server version in order to reduce code duplication. The shared version now uses a second parameter how, which is passed on to m_get(9) and m_getcl(9) as the server used M_WAIT while the client requires M_DONTWAIT, and replaces the the previously unused parameter hsiz. - Change nfs_realign() to use nfsm_aligned() so as with other NFS code the alignment check isn't actually performed on platforms without strict alignment requirements for performance reasons because as the comment suggests unaligned data only occasionally occurs with TCP. - Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather than M_WAIT because it's called with the RPC sp_lock held.
Reviewed by: jhb, rmacklem MFC after: 1 week
|
201899 |
09-Jan-2010 |
marius |
Some style(9) fixes in order to fabricate a commit to denote that the commit message for r201896 actually should have read:
As nfsm_srvmtofh_xx() assumes the 4-byte alignment required by XDR ensure the mbuf data is aligned accordingly by calling nfs_realign() in fha_extract_info(). This fix is orthogonal to the problem solved by r199274/r199284.
PR: 142102 (second part) MFC after: 1 week
|
201896 |
09-Jan-2010 |
marius |
Exclude options COMPAT_FREEBSD4 now that the MD freebsd4_sigreturn() is gone since r201396 and which is also in line with the fact that FreeBSD 4 didn't supported sparc64.
PR: 142102 (second part) MFC after: 1 week
|
200084 |
03-Dec-2009 |
jhb |
Properly return an error reply if an NFS remove or link operation fails. Previously the failing operation would allocate an mbuf and construct an error reply, but because the function did not return 0, the NFS server assumed it had failed to generate a reply and would leak the reply mbuf as well as not sending the reply to the NFS client.
PR: kern/140853 Submitted by: Ted Faber faber at isi edu (remove) Reviewed by: rmacklem (remove) MFC after: 1 week
|
199284 |
15-Nov-2009 |
marcel |
Revert previous change and fix misalignment by using bcopy() to copy the file handle from fid_data into fh. This eliminates conditional compilation.
Pointed out by: imp
|
199274 |
14-Nov-2009 |
marcel |
Fix an obvious panic by not casting from a pointer that is 4-bytes alignment to a type that needs 8-byte alignment, and thus causing misaligned memory references.
MFC after: 1 week
|
197525 |
26-Sep-2009 |
pjd |
Ensure that tv_sec is between INT32_MIN and INT32_MAX, so ZFS won't object. This completes the fix from r185586.
PR: kern/139059 Reported by: Daniel Braniss <danny@cs.huji.ac.il> Submitted by: Jaakko Heinonen <jh@saunalahti.fi> Tested by: Daniel Braniss <danny@cs.huji.ac.il> MFC after: 3 days
|
197040 |
09-Sep-2009 |
pjd |
Correct typo after manual patching.
Noticed by: b. f.
|
197039 |
09-Sep-2009 |
pjd |
Fix usecount leak in mknod(2) on file system exported over NFS.
While I'm here, correct typo in comment.
Reviewed by: kan, kib MFC after: 3 days
|
195202 |
30-Jun-2009 |
dfr |
Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
|
195181 |
30-Jun-2009 |
jhb |
Fix build with NFS_LEGACYRPC enabled after the socket upcall locking changes.
Approved by: re (kensmith)
|
194498 |
19-Jun-2009 |
brooks |
Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.)
The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc.
Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search.
Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error.
Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity.
Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
|
194407 |
17-Jun-2009 |
rmacklem |
Since svc_[dg|vc|tli|tp]_create() did not hold a reference count on the SVCXPTR structure returned by them, it was possible for the structure to be free'd before svc_reg() had been completed using the structure. This patch acquires a reference count on the newly created structure that is returned by svc_[dg|vc|tli|tp]_create(). It also adds the appropriate SVC_RELEASE() calls to the callers, except the experimental nfs subsystem. The latter will be committed separately.
Submitted by: dfr Tested by: pho Approved by: kib (mentor)
|
194073 |
12-Jun-2009 |
rmacklem |
Add a #include <sys/jail.h> so that it builds when options KGSSAPI is specified in the kernel configuration.
Approved by: kib (mentor)
|
193511 |
05-Jun-2009 |
rwatson |
Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include.
Discussed with: pjd
|
193272 |
01-Jun-2009 |
jhb |
Rework socket upcalls to close some races with setup/teardown of upcalls. - Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held; however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then the soisconnected() will be invoked on the socket after the socket buffer lock is dropped. - A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear(). - To simplify locking, each socket buffer now has a separate upcall. - When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that a SO_SND upcall should never return SU_ISCONNECTED. - All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket. - The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowlege of socket upcall internals aside from their return value. - The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked. - The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake() which immediately re-acquired the socket buffer lock for the duration of the function call.
Discussed with: rwatson, rmacklem
|
193066 |
29-May-2009 |
jamie |
Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible.
The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed.
Approved by: bz (mentor)
|
192895 |
27-May-2009 |
jamie |
Add hierarchical jails. A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.
Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().
Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.
Approved by: bz (mentor)
|
192678 |
24-May-2009 |
dfr |
Fix build of KGSSAPI bits post-vimage.
|
191990 |
11-May-2009 |
attilio |
Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread.
In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option.
VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
|
191940 |
09-May-2009 |
kan |
Do not embed struct ucred into larger netcred parent structures.
Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead.
While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code.
Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib
|
190971 |
12-Apr-2009 |
rmacklem |
Change nfsserver so that it uses the nfssvc() system call provided in sys/nfs/nfs_nfssvc.c by registering with it using the nfsd_call_nfsserver function pointer. Also, add the build glue for nfs_nfssvc.c optionally based on "nfsserver" and also as a loadable module.
Submitted by: rmacklem Reviewed by: kib Approved by: kib (mentor)
|
190053 |
19-Mar-2009 |
dfr |
Fix an mbuf leak in the error path.
Submitted by: Rick Macklem <rick at snowhite dot cis dot uoguelph dot ca>
|
188965 |
23-Feb-2009 |
rwatson |
Include audit.h so that the system call path protected by NFS_LEGACYRPC can audit its arguments.
Submitted by: Jaakko Heinonen <jh at saunalahti.fi> MFC after: 1 week X-MFC-note: MFC with r188311
|
188588 |
13-Feb-2009 |
jhb |
Use shared vnode locks when invoking VOP_READDIR().
MFC after: 1 month
|
188311 |
08-Feb-2009 |
rwatson |
Audit the flag argument to the nfssvc(2) system call.
Obtained from: TrustedBSD Project Sponsored by: Apple, Inc.
|
187830 |
28-Jan-2009 |
ed |
Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev().
We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check.
Ths commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now.
I suspect umajor() and uminor() isn't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.
|
186165 |
16-Dec-2008 |
kensmith |
Handle VFS_VGET() failing with an error other than EOPNOTSUPP in addition to failing with that error.
PR: 125149 Submitted by: Jaakko Heinonen (jh <at> saunalahti <dot> fi) Reviewed by: mohans, kan MFC after: 3 days
|
185860 |
10-Dec-2008 |
dfr |
We need to pass a structure with enough space for an NFSv2 filehandle to nfs_srvmtofh_xx otherwise bad things happen when an NFSv2 client tries to make a request.
|
185586 |
03-Dec-2008 |
kan |
Change nfsserver slightly so that it does not trip over the timestamp validation code on ZFS.
Problem: when opening file with O_CREAT|O_EXCL NFS has to jump through extra hoops to ensure O_EXCL semantics. Namely, client supplies of 8 bytes (NFSX_V3CREATEVERF) bytes of verification data to uniquely identify this create request. Server then creates a new file with access mode 0, copies received 8 bytes into va_atime member of struct vattr and attempt to set the atime on file using VOP_SETATTR. If that succeeds, it fetches file attributes with VOP_GETATTR and verifies that atime timestamps match. If timestamps do not match, NFS server concludes it has probbaly lost the race to another process creating the file with the same name and bails with EEXIST.
This scheme works OK when exported FS is FFS, but if underlying filesystem is ZFS _and_ server is running 64bit kernel, it breaks down due to sanity checking in zfs_setattr function, which refuses to accept any timestamps which have tv_sec that cannot be represented as 32bit int. Since struct timespec fields are 64 bit integers on 64bit platforms and server just copies NFSX_V3CREATEVERF bytes info va_atime, all eight bytes supplied by client end up in va_atime.tv_sec, forcing it out of valid 32bit range.
The solution this change implements is simple: it treats NFSX_V3CREATEVERF as two 32bit integers and unpacks them separately into va_atime.tv_sec and va_atime.tv_nsec respectively, thus guaranteeing that tv_sec remains in 32 bit range and ZFS remains happy.
Reviewed by: kib
|
185432 |
29-Nov-2008 |
kib |
In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointer to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory.
Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls.
Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it.
Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month
|
184969 |
14-Nov-2008 |
dfr |
Switch the default rpc implementation for NFS back to the new code. I believe I have fixed the reported problems - if you still have trouble with it, please contact me with as much detail as possible so that I can track down any other issues as quickly as possible.
|
184921 |
13-Nov-2008 |
dfr |
Use the remote address for access control, not the local address. This fixes the nfsd problems that some people have with the new code.
Add support for the vfs.nfsrv.nfs_privport sysctl which denies access unless the client is using a port number less than 1024. Not really sure if this is particularly useful since it doesn't add any real security.
|
184920 |
13-Nov-2008 |
dfr |
Temporarily switch NFS back to the old RPC code while I try to diagnose and fix the problems a few people have noticed with the new code. People who want to continue testing the new code or who need RPCSEC_GSS support should use the new option NFS_NEWRPC to select it.
|
184869 |
12-Nov-2008 |
dfr |
Turn (NFSERR_AUTHERR|code) status values into svcerr_auth(rqst, code) replies instead of returning a success with a bogus NFS error code.
|
184868 |
12-Nov-2008 |
dfr |
Allow v3 GETATTR requests even when weakly authenticated. Change the error return for for weakly authenticated requests from REJECTEDCRED to WEAKAUTH for consistency with Solaris.
|
184744 |
07-Nov-2008 |
dfr |
Range-check NFSv2 procedure numbers before converting to NFSv3.
Submitted by: csjp
|
184719 |
06-Nov-2008 |
dfr |
Don't depend on krpc.ko in the NFS_LEGACYRPC case.
|
184716 |
06-Nov-2008 |
des |
Unbreak NFS.
Pointy hat to: dfr
|
184693 |
05-Nov-2008 |
dfr |
If mountd doesn't specify a secflavor list for the mount, assume that -sec=sys is what was wanted.
|
184643 |
04-Nov-2008 |
dfr |
Include <sys/eventhandler.h>.
|
184588 |
03-Nov-2008 |
dfr |
Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation.
The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code.
To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any kerberos credentials but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in future, possibly using Brooks Davis' implementation of variant symlinks.
Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd.
The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option.
Sponsored by: Isilon Systems MFC after: 1 month
|
184561 |
02-Nov-2008 |
trhodes |
Document a few sysctls in the NFS client and server code. Minor style(9) where applicable.
Approved by: alfred (slightly older version)
|
184413 |
28-Oct-2008 |
trasz |
Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit.
Approved by: rwatson (mentor)
|
184407 |
28-Oct-2008 |
rwatson |
Rename three MAC entry points from _proc_ to _cred_ to reflect the fact that they operate directly on credentials: mac_proc_create_swapper(), mac_proc_create_init(), and mac_proc_associate_nfsd(). Update policies.
Obtained from: TrustedBSD Project
|
184205 |
23-Oct-2008 |
des |
Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after: 3 months
|
183809 |
12-Oct-2008 |
rwatson |
Turn XXX's for unlocked writes of NFS server statistics to simple notes, as we consider it a feature to exchange performance for consistency.
MFC after: 3 days
|
183113 |
17-Sep-2008 |
attilio |
Remove the suser(9) interface from the kernel. It has been replaced from years by the priv_check(9) interface and just very few places are left. Note that compatibility stub with older FreeBSD version (all above the 8 limit though) are left in order to reduce diffs against old versions. It is responsibility of the maintainers for any module, if they think it is the case, to axe out such cases.
This patch breaks KPI so __FreeBSD_version will be bumped into a later commit.
This patch needs to be credited 50-50 with rwatson@ as he found time to explain me how the priv_check() works in detail and to review patches.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com> Reviewed by: rwatson
|
183103 |
16-Sep-2008 |
attilio |
Decontext-alize the nfsserver module. Now, only some few places still require thread passing (mostly the ones which access to VOP_* functions) and will be fixed once the primitive also will be.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
182371 |
28-Aug-2008 |
attilio |
Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and totally unuseful.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
180131 |
30-Jun-2008 |
rwatson |
Remove spls from NFS server setup call; expand receive socket buffer locking to cover full setup of socket upcalls; remove XXX about locking.
MFC after: 3 weeks
|
179381 |
28-May-2008 |
kib |
Change the fix in the rev. 1.179 to use nfsrv_lockedpair_nd().
Tested by: pho MFC after: 3 days
|
179380 |
28-May-2008 |
kib |
Initialize vfslocked prior to calling nfsm_srvmtofh where it was forgotten.
Reported by: Andrew Edwards <aedwards sandvine com> Tested by: pho MFC after: 3 days
|
177599 |
25-Mar-2008 |
ru |
Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
|
177493 |
22-Mar-2008 |
jeff |
- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find.
Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
|
177386 |
19-Mar-2008 |
dfr |
Fix a regression from the last revision - don't edit the ns_rec list while not holding the lock.
|
177360 |
18-Mar-2008 |
dfr |
Don't call nfs_realign while holding locks.
Reviewed by: kib
|
176790 |
04-Mar-2008 |
kib |
Fix the Giant leak in the nfsrv_remove().
Reported by: pluknet <pluknet gmail com> MFC after: 1 week
|
175450 |
18-Jan-2008 |
remko |
Use nfsrv_destroycache() only once, else it crashes the server.
PR: kern/118152 Submitted by: Bjoern Groenvall <bg at sics dot se> Approved by: imp (mentor, a while ago already), jhb MFC After: 3 days
|
175294 |
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
175202 |
10-Jan-2008 |
attilio |
vn_lock() is currently only used with the 'curthread' passed as argument. Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
173334 |
04-Nov-2007 |
rwatson |
Garbage collect now-unused nfsrv_setcred() -- it's not only unused, but also a purveyor of unfortunate (and now unsupported) direct frobbing of struct ucred.
MFC after: 3 days
|
172957 |
25-Oct-2007 |
rwatson |
Rename mac_associate_nfsd_label() to mac_proc_associate_nfsd(), and move from mac_vfs.c to mac_process.c to join other functions that setup up process labels for specific purposes. Unlike the two proc create calls, this call is intended to run after creation when a process registers as the NFS daemon, so remains an _associate_ call..
Obtained from: TrustedBSD Project
|
172759 |
18-Oct-2007 |
jhb |
Add a -z flag to nfsstat which zeros the NFS statistics after displaying them.
MFC after: 1 week Requested by: ps Submitted by: ps (6 years ago)
|
172557 |
12-Oct-2007 |
mohans |
Set the NFS server sockbuf high watermarks to the system defaults (up form 32KB). The low highwatermark setting caused UDP fullsock request drops, throttling thruput greatly. Reported by: Kris Kennaway Approved by: re@ (Ken Smith)
|
171744 |
06-Aug-2007 |
rwatson |
Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error-handling in the socket layer, eliminated quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency.
Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
|
171613 |
27-Jul-2007 |
rwatson |
First in a series of changes to remove the now-unused Giant compatibility framework for non-MPSAFE network protocols:
- Remove debug_mpsafenet variable, sysctl, and tunable. - Remove NET_NEEDS_GIANT() and associate SYSINITSs used by it to force debug.mpsafenet=0 if non-MPSAFE protocols are compiled into the kernel. - Remove logic to automatically flag interrupt handlers as non-MPSAFE if debug.mpsafenet is set for an INTR_TYPE_NET handler. - Remove logic to automatically flag netisr handlers as non-MPSAFE if debug.mpsafenet is set. - Remove references in a few subsystems, including NFS and Cronyx drivers, which keyed off debug_mpsafenet to determine various aspects of their own locking behavior. - Convert NET_LOCK_GIANT(), NET_UNLOCK_GIANT(), and NET_ASSERT_GIANT into no-op's, as their entire behavior was determined by the value in debug_mpsafenet. - Alias NET_CALLOUT_MPSAFE to CALLOUT_MPSAFE.
Many remaining references to NET_.*_GIANT() and NET_CALLOUT_MPSAFE are still present in subsystems, and will be removed in followup commits.
Reviewed by: bz, jhb Approved by: re (kensmith)
|
170689 |
13-Jun-2007 |
rwatson |
Include priv.h to pick up suser(9) definitions, missed in an earlier commit.
Warnings spotted by: kris
|
170488 |
10-Jun-2007 |
mjacob |
Init timespec to zero fo quiesce warnings.
|
168951 |
22-Apr-2007 |
rwatson |
Remove MAC Framework access control check entry points made redundant with the introduction of priv(9) and MAC Framework entry points for privilege checking/granting. These entry points exactly aligned with privileges and provided no additional security context:
- mac_check_sysarch_ioperm() - mac_check_kld_unload() - mac_check_settime() - mac_check_system_nfsd()
Add mpo_priv_check() implementations to Biba and LOMAC policies, which, for each privilege, determine if they can be granted to processes considered unprivileged by those two policies. These mostly, but not entirely, align with the set of privileges granted in jails.
Obtained from: TrustedBSD Project
|
168931 |
21-Apr-2007 |
rwatson |
Attempt to rationalize NFS privileges:
- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.
- Use PRIV_NFS_DAEMON in the NFS server.
- In the NFS client, move the privilege check from nfslockdans(), which occurs every time a write is performed on /dev/nfslock, and instead do it in nfslock_open() just once. This allows us to avoid checking the saved uid for root, and just use the effective on open. Use PRIV_NFS_LOCKD.
|
168761 |
15-Apr-2007 |
rwatson |
In nfsrv_rcv(), don't reacquire the nfs server lock until after nfs_realign() has been called, as it may sleep waiting on memory allocation.
Reported by: simon
|
168268 |
02-Apr-2007 |
jhb |
- Split out the part of SYSCALL_MODULE_HELPER() that builds a 'struct sysent' for a new system call into a new MAKE_SYSENT() macro. - Use MAKE_SYSENT() to build a full sysent for the nfssvc system call in the NFS server and use syscall_register() and syscall_deregister() to manage the nfssvc system call entry instead of manually frobbing the sysent[] array.
|
167899 |
26-Mar-2007 |
jhb |
Initialize vfslocked to 0 before nfsm_srvmtofh() so that the variable is not used uninitialized in 'nfsmout' if nfsm_srvmtofh() gets an internal error.
CID: 1766 Found by: Coverity Prevent (tm)
|
167665 |
17-Mar-2007 |
jeff |
- Turn all explicit giant acquires into conditional VFS_LOCK_GIANTs. Only ops which used namei still remained. - Implement a scheme for reducing the overhead of tracking which vops require giant by constantly reducing the number of recursive giant acquires to one, leaving us with only one vfslocked variable. - Remove all NFSD lock acquisition and release from the individual nfs ops. Careful examination has shown that they are not required. This greatly simplifies the code.
Sponsored by: Isilon Systems, Inc. Discussed with: rwatson Tested by: kkenn Approved by: re
|
167214 |
05-Mar-2007 |
wkoszek |
Change these descriptions of memory types used in malloc(9), as their current, rather long strings make output from vmstat -m look unpleasant.
Approved by: cognet (mentor)
|
167211 |
04-Mar-2007 |
rwatson |
Remove 'MPSAFE' annotations from the comments above most system calls: all system calls now enter without Giant held, and then in some cases, acquire Giant explicitly.
Remove a number of other MPSAFE annotations in the credential code and tweak one or two other adjacent comments.
|
166774 |
15-Feb-2007 |
pjd |
Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method. This way we may support multiple structures in v_data vnode field within one file system without using black magic.
Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation.
VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE.
Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
|
166683 |
13-Feb-2007 |
mpp |
Get the vfs giant lock before calling nfs_access.
Reviewed by: mohan
|
165739 |
02-Jan-2007 |
hrs |
The nfsm_srvpathsiz() macro in nfsrv_symlink() in nfs_serv.c should check length of the pathname in the range 0<=n<=NFS_MAXPATHLEN, not 0<n<=NFS_MAXPATHLEN. This fixes a minor interoperability problem that the FreeBSD NFS server did not allow a symlink pointing the empty pathname.
MFC after: 1 week
|
165118 |
12-Dec-2006 |
bz |
MFp4: 92972, 98913 + one more change
In ip6_sprintf no longer use and return one of eight static buffers for printing/logging ipv6 addresses. The caller now has to hand in a sufficiently large buffer as first argument.
|
164585 |
24-Nov-2006 |
rwatson |
Push Giant a bit further off the NFS server in a number of straight forward cases by converting from unconditional acquisition of Giant around vnode operations to conditional acquisition:
- Remove nfsrv_access_withgiant(), and cause nfsrv_access() to now assert that Giant will be held if it is required for the vnode.
- Add nfsrv_fhtovp_locked(), which will drop the NFS server lock if required, and modify nfsrv_fhtovp() to conditionally acquire Giant if required.
- In the VOP's not dealing with more than one vnode at a time (i.e., not involving a lookup), conditionally acquire Giant.
This removes Giant use for MPSAFE file systems for a number of quite important RPCs, including getattr, read, write. It leaves unconditional Giant acquisitions in vnode operations that interact with the name space or more than one vnode at a time as these require further work.
Tested by: kris Reviewed by: kib
|
164436 |
20-Nov-2006 |
pjd |
Protect nfsm_srvpathsiz() call with the nfsd_mtx lock.
Reviewed by: mohans
|
164033 |
06-Nov-2006 |
rwatson |
Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking.
Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>
|
163701 |
26-Oct-2006 |
kib |
Fix leak in NAMEI zone caused by nfs server when VOP_RENAME fails.
Submitted by: Padma Bhooma <pbhooma at panasas com> Reviewed by: bde Approved by: pjd (mentor) MFC after: 1 week
|
163606 |
22-Oct-2006 |
rwatson |
Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project Sponsored by: SPARTA
|
160881 |
01-Aug-2006 |
jhb |
- Add a new function nfsrv_destroycache() to tear down the server request cache when unloading the nfsserver module. This fixes a memory leak and a stale pointer. - Use callout_drain() rather than callout_stop() when unloading the nfsserver module.
MFC after: 3 days
|
160880 |
01-Aug-2006 |
jhb |
Use TAILQ_FOREACH_SAFE() in a couple of places.
|
160798 |
28-Jul-2006 |
jhb |
Now that all system calls are MPSAFE, retire the SYF_MPSAFE flag used to mark system calls as being MPSAFE: - Stop conditionally acquiring Giant around system call invocations. - Remove all of the 'M' prefixes from the master system call files. - Remove support for the 'M' prefix from the script that generates the syscall-related files from the master system call files. - Don't explicitly set SYF_MPSAFE when registering nfssvc.
|
160619 |
24-Jul-2006 |
rwatson |
soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit).
This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls.
Architectural head nod: sam, gnn, wollman
|
159871 |
23-Jun-2006 |
mohans |
Size the NFS server dupreq cache on the basis of nmbclusters. On servers with low nmbclusters, we tie up too many mbclusters in the NFS duplicate request cache. This change limits the size of the dupreq cache to 1/2 the nmbclusters (and flaots in a range of [64, 2048]).
MFC after 2 weeks.
Reported by: Steve Kargl, David O'Brien Tested by: Steve Kargl
|
159268 |
05-Jun-2006 |
kib |
Temporary workaround to prevent leak of Giant from nfsd when calling lookup().
Reviewed by: tegge Tested by: "Arno J. Klaassen" <arno at heho snv jussieu fr>, "Rong-en Fan" <grafan at gmail com>, Dmitriy Kirhlarov <dimma at higis ru>, Dmitry Pryanishnikov <dmitry at atlantis dp ua> MFC after: 1 week Approved by: kan, pjd (mentors)
|
158008 |
25-Apr-2006 |
mohans |
Bump up the NFS server dupreq cache limit to 2K (from 64). With a small duplicate request cache, under heavy load a lot of non-idempotent requests were getting served again, resulting in errors.
Found by : Kris Kennaway.
|
157575 |
06-Apr-2006 |
csjp |
Introduce a new MAC entry point for label initialization of the NFS daemon's credential: mac_associate_nfsd_label()
This entry point can be utilized by various Mandatory Access Control policies so they can properly initialize the label of files which get created as a result of an NFS operation. This work will be useful for fixing kernel panics associated with accessing un-initialized or invalid vnode labels.
The implementation of these entry points will come shortly.
Obtained from: TrustedBSD Requested by: mdodd MFC after: 3 weeks
|
157391 |
02-Apr-2006 |
cel |
rick says:
The following bug was just identified in OpenBSD and it looks like the same bug exists in the other BSDen NFS servers.
A Linux client (don't know which version, but you can look at http://bugzilla.kernel.org/show_bug.cgi?id=6256) does a Setattr of mtime to the server's time, where the file is mode 0664 and the client user has group access (ie. caller is not the file owner).
The BSD servers fail the Setattr with EPERM, since the VA_UTIMES_NULL flag isn't set before doing the VOP_SETATTR.
It seems to me that this should be allowed, since it is allowed for a local utimes(2). If so, the fix is to set VA_UTIMES_NULL for the "set-time-to-server-time" cases of setting atime and/or mtime.
Submitted by: rick@snowhite.cis.uoguelph.ca Reviewed by: cel Approved by: silby MFC after: 1 week
|
157325 |
31-Mar-2006 |
jeff |
- Release the references acquired by VOP_GETWRITEMOUNT and vfs_getvfs().
Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
|
156586 |
12-Mar-2006 |
jeff |
- Reorder vrele calls after vput calls to prevent lock order reversals between leaf and directory locks.
Found by: kris Sponsored by: Isilon Systems, Inc.
|
156439 |
08-Mar-2006 |
simon |
When parsing an RPC request in nfsrv_dorec(), KASSERT that there actually is an mbuf to process. This catches the missing mbuf before it would otherwise causes a NULL pointer dereference, which could be triggered by a 0 length RPC record before the check for such records was added in rev 1.97.
Approved by: cperciva (mentor)
|
156146 |
01-Mar-2006 |
simon |
Correct a remote kernel panic when processing zero-length RPC records via TCP. [06:10]
Security: FreeBSD-SA-06:10.nfs Approved by: cperciva
|
155160 |
01-Feb-2006 |
jeff |
- Reorder calls to vrele() after calls to vput() when the vrele is a directory. vrele() may lock the passed vnode, which in these cases would give an invalid lock order of child -> parent. These situations are deadlock prone although do not typically deadlock because the vrele is typically not releasing the last reference to the vnode. Users of vrele must consider it as a call to vn_lock() and order it appropriately.
MFC After: 1 week Sponsored by: Isilon Systems, Inc. Tested by: kkenn
|
154960 |
28-Jan-2006 |
csjp |
Manage the ucred for the NFS server using the crget/crfree API defined in kern_prot.c. This API handles reference counting among many other things. Notably, if MAC is compiled into the kernel, it will properly initialize the MAC labels when the ucred is allocated.
This work is in preparation for a new MAC entry point which will be responsible for properly initializing policy specific labels for the NFS server credential. Utilization of the crfree/crget APIs reduce the complexity associated with this label's management.
Submitted by: green (with changes) [1] Obtained from: TrustedBSD Project Discussed with: rwatson, alfred
[1] I moved the ucred allocation outside the scope of the NFS server lock to prevent M_WAIKOK allocations from occurring with non-sleep-able locks held. Additionally, to reduce complexity, the ucred persist as long as the NFS server descriptor.
|
154737 |
23-Jan-2006 |
trhodes |
Revert my previous commit.
Proved I'm not that bright at times: jhb
|
154729 |
23-Jan-2006 |
trhodes |
Fix indentation.
Prodded by: stefanf, ru, njl (in that order)
|
154629 |
21-Jan-2006 |
trhodes |
Remove some dead code.
Found with: Coverity Prevent(tm)
|
151897 |
31-Oct-2005 |
rwatson |
Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
|
151761 |
27-Oct-2005 |
glebius |
Keep locks consistent before goto.
Reported by: pho Reviewed by: mohans
|
150634 |
27-Sep-2005 |
jhb |
Use the refcount API to manage the reference count for user credentials rather than using pool mutexes.
Tested on: i386, alpha, sparc64
|
145197 |
17-Apr-2005 |
rwatson |
NFS write gathering defers execution of NFS server write requests to wait to see if additional write requests will arrive that can be coalesced and clustered with earlier ones. When doing so, it must determine whether the two requests are made by credentials with the same access writes, so as not to coalesce improperly. NFSW_SAMECRED() implements a test of two credentials using a binary compare.
Replace NFSW_SAMECRED() macro with nfsrv_samecred() function, which is aware of the contents and layout of a struct ucred, rather than a simple binary compare. While the binary compare works when ucred is simply a zero'd and embedded 'struct ucred' in the NFS descriptor, it will work less well when the ucred associated with an NFS descriptor is "real", so has defined and populated reference count, mutex, etc.
MFC after: 1 week Obtained from: TrustedBSD Project
|
144246 |
28-Mar-2005 |
sam |
avoid potential null ptr deref by free'ing excess mbufs instead of zero'ing their length (copied from m_adj where this code came from after the equivalent change there has had time to soak)
Noticed by: Coverity Prevent analysis tool
|
144141 |
26-Mar-2005 |
delphij |
Do not do write gathering for NFSv3, since it makes no sense unless the client is broken and does sync writes all the time.
Obtained from: NetBSD (sys/nfs/nfs_syscalls.c,v 1.44) Reviewed by: -arch (bde)
|
140770 |
24-Jan-2005 |
phk |
Don't try to create vnode_pager objects on other filesystems vnodes, either they did it themselves or it won't happen.
|
140495 |
19-Jan-2005 |
ps |
Now that we have a non blocking version of nfsm_dissect(), change all the nfsm_dissect() calls (done under the NFSD lock) to nfsm_dissect_nonblock().
Submitted by: Mohan Srinivasan
|
140181 |
13-Jan-2005 |
phk |
Ditch vfs_object_create() and make the callers call VOP_CREATEVOBJECT() directly.
|
140048 |
11-Jan-2005 |
phk |
Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well.
If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to the cached mount credential, or a credential cached along with any delayed write data.
Discussed with: rwatson
|
139823 |
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
137589 |
11-Nov-2004 |
rwatson |
Correct a bug in nfsrv_create() where a call to nfsrv_access() might be made holding the NFS server mutex. To clean this up, introduce a version of the function, nfsrv_access_withgiant(), that expects the NFS server mutex to already have been dropped and Giant acquired. Wrap nfsrv_access() around this. This permits callers to more efficiently check access if they're in a code block performing VFS operations, and can be substitited for the nfsrv_access() call that triggered this bug.
PR: 73807, 73208 MFC after: 1 week
|
136767 |
22-Oct-2004 |
phk |
Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.
|
136662 |
18-Oct-2004 |
rwatson |
Correct several instances where calls to vfs_getvfs() resulting in failure in the NFS server would result in a leaked instance of the NFS server subsystem lock. Liberally sprinkle assertions in all target labels for error unwinding to assert the desired locking state.
RELENG_5_3 candidate.
MFC after: 3 days Reported by: Wilkinson, Alex <alex dot wilkinson at dsto dot defence dot gov dot au>
|
134296 |
25-Aug-2004 |
rwatson |
Convert a mtx_lock(&Giant) to a mtx_unlock(&Giant) in nfsrv_link() to prevent leakage of Giant. With INVARIANTS, this results in an assertion failure following execution of the RPC. Without INVARIANTS, it could result in problems if the NFS server is killed causing nfsd to return to user space holding Giant.
Feet provided by: brueffer
|
132591 |
24-Jul-2004 |
rwatson |
If debug.mpsafenet is non-zero, run the NFS server callout without Giant.
|
132590 |
24-Jul-2004 |
rwatson |
Remove spl() use from nfsrv_timer.
|
132199 |
15-Jul-2004 |
phk |
Do a pass over all modules in the kernel and make them return EOPNOTSUPP for unknown events.
A number of modules return EINVAL in this instance, and I have left those alone for now and instead taught MOD_QUIESCE to accept this as "didn't do anything".
|
132086 |
13-Jul-2004 |
alfred |
Do not call sorecieve() in the context of a socket callback as it causes lock order reversals so->inpcb since we're called with the socket lock held.
|
131532 |
03-Jul-2004 |
rwatson |
Change M_WAITOK argument to sodupsockaddr() to M_NOWAIT. When the call to dup_sockaddr() was renamed to sodupsockaddr(), the argument was changed from '1' to 'M_WAITOK', which changed the semantics. This resulted in a WITNESS warning about a potential sleep while holding the NFS server mutex. Now this will no longer happen, restoring a possible bug present in the original code (setting RC_NAM even though the malloc to copy the addres may fail). bde observes that the flag names here should probably not be the same as the malloc flags for name space reasons.
Bumped into by: kuriyama
|
130653 |
17-Jun-2004 |
rwatson |
Merge additional socket buffer locking from rwatson_netperf:
- Lock down low hanging fruit use of sb_flags with socket buffer lock.
- Lock down low hanging fruit use of so_state with socket lock.
- Lock down low hanging fruit use of so_options.
- Lock down low-hanging fruit use of sb_lowwat and sb_hiwat with socket buffer lock.
- Annotate situations in which we unlock the socket lock and then grab the receive socket buffer lock, which are currently actually the same lock. Depending on how we want to play our cards, we may want to coallesce these lock uses to reduce overhead.
- Convert a if()->panic() into a KASSERT relating to so_state in soaccept().
- Remove a number of splnet()/splx() references.
More complex merging of socket and socket buffer locking to follow.
|
130640 |
17-Jun-2004 |
phk |
Second half of the dev_t cleanup.
The big lines are: NODEV -> NULL NOUDEV -> NODEV udev_t -> dev_t udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel space struct cdev etc.
|
129902 |
31-May-2004 |
bmilekic |
Giant wasn't dropped here if we have to return EBUSY. This is bad.
|
129899 |
31-May-2004 |
rwatson |
Release NFS subsystem lock and acquire Giant when calling into vn_start_write().
|
129894 |
31-May-2004 |
rwatson |
Add an assertion that nfssvc() isn't called with Giant.
Add two additional pairs of assertions, one at the end of the NFS server event loop, and one one exit from the NFS daemon, that assert that if debug.mpsafenet is enabled, Giant is not held, and that if it is not enabled, Giant will be held. This is intended to support debugging scenarios where Giant is "leaked" during NFS processing.
|
129888 |
31-May-2004 |
rwatson |
The NFS server modevent code manually patches the system call table to install nfssvc(). It also updates the argument count, but did so without setting SYF_MPSAFE, effectively removing the MPSAFE flag even when syscalls.master indicates it doesn't require Giant. This change forces the modevent to set MPSAFE as a flag to its internal notion of an argument coutn.
Note: this duplication of information is a bad thing, but is a more general problem I'm not currently willing to address.
|
129886 |
30-May-2004 |
rwatson |
One more case where we want to drop the NFS server lock and acquire Giant when entering VFS. Discovered by code inspection; still not hit without debug.mpsafenet=1.
Reported by: bmilekic
|
129885 |
30-May-2004 |
rwatson |
Acquire Giant around two more cases when calling into VFS to vput() a vnode. Not bumped into with asserts in the main tree because we run the NFS server with Giant by default. Discovered by inspection.
Complete annotations of Giant acquisition/release to note that it's only because of VFS that we acquire Giant in most places in the NFS server.
|
129841 |
29-May-2004 |
rwatson |
Don't release Giant until after the call to vput() in nfsrv_setattr(). Unless running with debug.mpsafenet=1, this was not actually a problem.
|
129839 |
29-May-2004 |
rwatson |
No need to conditionally acquire Giant in nfssvc_nfsd() because it is acquired by the caller. Should not cause problems, but causes an unnecessary recursion on Giant.
Pointed out by: bmilekic
|
129785 |
27-May-2004 |
rwatson |
Call nfsm_clget_nolock() instead of nfsm_clget() when holding the NFS subsystem lock to avoid tripping over an assertion regarding whether the lock is held or not. This is likely to be the cause of a panic tripped over by Andrea Campi.
|
129639 |
24-May-2004 |
rwatson |
The socket code upcalls into the NFS server using the so_upcall mechanism so that early processing on mbufs can be performed before a context switch to the NFS server threads. Because of this, if the socket code is running without Giant, the NFS server also needs to be able to run the upcall code without relying on the presence on Giant. This change modifies the NFS server to run using a "giant code lock" covering operation of the whole subsystem. Work is in progress to move to data-based locking as part of the NFSv4 server changes.
Introduce an NFS server subsystem lock, 'nfsd_mtx', and a set of macros to operate on the lock:
NFSD_LOCK_ASSERT() Assert nfsd_mtx owned by current thread NFSD_UNLOCK_ASSERT() Assert nfsd_mtx not owned by current thread NFSD_LOCK_DONTCARE() Advisory: this function doesn't care NFSD_LOCK() Lock nfsd_mtx NFSD_UNLOCK() Unlock nfsd_mtx
Constify a number of global variables/structures in the NFS server code, as they are not modified and contain constants only:
nfsrvv2_procid nfsrv_nfsv3_procid nonidempotent nfsv2_repstat nfsv2_type nfsrv_nfsv3_procid nfsrvv2_procid nfsrv_v2errmap nfsv3err_null nfsv3err_getattr nfsv3err_setattr nfsv3err_lookup nfsv3err_access nfsv3err_readlink nfsv3err_read nfsv3err_write nfsv3err_create nfsv3err_mkdir nfsv3err_symlink nfsv3err_mknod nfsv3err_remove nfsv3err_rmdir nfsv3err_rename nfsv3err_link nfsv3err_readdir nfsv3err_readdirplus nfsv3err_fsstat nfsv3err_fsinfo nfsv3err_pathconf nfsv3err_commit nfsrv_v3errmap
There are additional structures that should be constified but due to their being passed into general purpose functions without const arguments, I have not yet converted.
In general, acquire nfsd_mtx when accessing any of the global NFS structures, including struct nfssvc_sock, struct nfsd, struct nfsrv_descript.
Release nfsd_mtx whenever calling into VFS, and acquire Giant for calls into VFS. Giant is not required for any part of the operation of the NFS server with the exception of calls into VFS. Giant will never by acquired in the upcall code path. However, it may operate entirely covered by Giant, or not. If debug.mpsafenet is set to 0, the system calls will acquire Giant across all operations, and the upcall will assert Giant. As such, by default, this enables locking and allows us to test assertions, but should not cause any substantial new amount of code to be run without Giant. Bugs should manifest in the form of lock assertion failures for now.
This approach is similar (but not identical) to modifications to the BSD/OS NFS server code snapshot provided by BSDi as part of their SMPng snapshot. The strategy is almost the same (single lock over the NFS server), but differs in the following ways:
- Our NFS client and server code bases don't overlap, which means both fewer bugs and easier locking (thanks Peter!). Also means NFSD_*() as opposed to NFS_*().
- We make broad use of assertions, whereas the BSD/OS code does not.
- Made slightly different choices about how to handle macros building packets but operating with side effects.
- We acquire Giant only when entering VFS from the NFS server daemon threads.
- Serious bugs in BSD/OS implementation corrected -- the snapshot we received was clearly a work in progress.
Based on ideas from: BSDi SMPng Snapshot Reviewed by: rick@snowhite.cis.uoguelph.ca Extensive testing by: kris
|
128154 |
12-Apr-2004 |
mux |
Don't send the available space as is in the FSSTAT call. Under FreeBSD, we can have a negative available space value, but the corresponding fields in the NFS protocol are unsigned. So trnucate the value to 0 if it's negative, so that the client doesn't receive absurdly high values.
Tested by: cognet
|
128112 |
11-Apr-2004 |
peadar |
Don't let the NFS server module be unloaded as long as there are nfsd processes running
Reviewed By: iedowse PR: 16299
|
127977 |
07-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson.
Approved by: core, peter, alc, rwatson
|
127924 |
06-Apr-2004 |
rwatson |
Add imperfect comments identifying the function of various nfs socket condition flags. Corrections, if appropriate, welcome.
|
127857 |
04-Apr-2004 |
rwatson |
Spell 2 as SHUT_RDWR when used as an argument to soshutdown().
|
127851 |
04-Apr-2004 |
rwatson |
Explicitly compare pointers with NULL rather than treating a pointer as a boolean directly, use NULL instead of 0.
|
126962 |
14-Mar-2004 |
peter |
Calculate NFS timeouts in units of 10ms, not 5ms. This matches the default clock precision on i386. This is a NOP change on i386. But this stops the mount_nfs units from suddenly changing to units of 1/20 of a second (vs the normal 1/10 of a second) if HZ is increased.
|
126853 |
11-Mar-2004 |
phk |
Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
|
126723 |
07-Mar-2004 |
kan |
Convert from timeout to callout API.
Submitted by: rwatson
|
126425 |
01-Mar-2004 |
rwatson |
Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c.
Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0".
Correct mflags pass into mac_init_socket() from previous commit to not include M_ZERO.
Submitted by: sam
|
123608 |
17-Dec-2003 |
jhb |
Fix some becuase -> because typos.
Reported by: Marco Wertejuk <wertejuk@mwcis.com>
|
122823 |
17-Nov-2003 |
rwatson |
Update a comment about needing to fix NFS server credential use by 5.0-RELEASE: make it now read 5.3-RELEASE to be realistic. Still needs fixing...
|
122261 |
07-Nov-2003 |
sam |
Assert GIANT_REQUIRED where sockets are manipulated. This is preparatory for MPSAFE network commits and ongoing socket locking work.
Supported by: FreeBSD Foundation
|
121473 |
24-Oct-2003 |
phk |
When grabbing vnodes to service NFS requests, make sure to call vn_start_write() early to avoid snapshot deadlocks.
By: mckusick
|
120753 |
04-Oct-2003 |
jeff |
- Set the sopt_dir member of the sockopt structure, otherwise, this parameter will not actually be set even though we're calling sosetopt. sosetopt calls down to a single ctloutput function if the name or level is implemented by a specific protocol.
Submitted by: pete@isilon.com
|
117151 |
02-Jul-2003 |
phk |
Change idle state sleep identifier to "-" for nfsd.
|
116789 |
24-Jun-2003 |
iedowse |
Fix a bug in nfsrv_read() that caused the replies to certain NFSv3 short read operations at the end of a file to not have the "eof" flag set as they should. The problem is that the requested read count was compared against the rounded-up reply data length instead of the actual reply data length. This bug appears to have been introduced in revision 1.78 (June 1999). It causes first-time reads of certain file sizes (e.g 4094 bytes) to fail with EIO on a RedHat 9.0 NFSv3 client.
MFC after: 1 week
|
116657 |
21-Jun-2003 |
mckusick |
Increase the size of the NFS server hash table to improve performance when serving up more than about 32 active files. For details see section 6.3 (pg 111) of Daniel Ellard and Margo Seltzer, ``NFS Tricks and Benchmarking Traps'' in the Proceedings of the Usenix 2003 Freenix Track, June 9-14, 2003 pg 101-114.
Obtained from: Daniel Ellard <ellard@eecs.harvard.edu> Sponsored by: DARPA & NAI Labs.
|
116189 |
11-Jun-2003 |
obrien |
Use __FBSDID().
|
115870 |
05-Jun-2003 |
hsu |
Protect read-modify-write increment of f_count field with file lock.
|
115476 |
31-May-2003 |
phk |
Add /* FALLTHROUGH */
Found by: FlexeLint
|
115301 |
25-May-2003 |
truckman |
Beat vnode locking in the NFS server code into submission. This change is not pretty, but it fixes the code so that it no longer violates the vnode locking rules in the VFS API and doesn't trip any of the locking assertions enabled by the DEBUG_VFS_LOCKS kernel configuration option. There is one report that this patch fixed a "locking against myself" panic on an NFS server that was tripped by a diskless client.
Approved by: re (scottl)
|
113955 |
24-Apr-2003 |
alc |
- Acquire the vm_object's lock when performing vm_object_page_clean(). - Add a parameter to vm_pageout_flush() that tells vm_pageout_flush() whether its caller has locked the vm_object. (This is a temporary measure to bootstrap vm_object locking.)
|
112179 |
13-Mar-2003 |
jeff |
- Lock bufs before inspecting their flags.
|
111748 |
02-Mar-2003 |
des |
More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).
|
111463 |
25-Feb-2003 |
jeff |
- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick
|
111251 |
22-Feb-2003 |
phk |
Don't use mbuf allocator flags for malloc(9).
|
111119 |
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
109623 |
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
109153 |
13-Jan-2003 |
dillon |
Bow to the whining masses and change a union back into void *. Retain removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
|
109123 |
12-Jan-2003 |
dillon |
Change struct file f_data to un_data, a union of the correct struct pointer types, and remove a huge number of casts from code using it.
Change struct xfile xf_data to xun_data (ABI is still compatible).
If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
|
108533 |
01-Jan-2003 |
schweikh |
Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup, especially in troff files.
|
108356 |
28-Dec-2002 |
dillon |
Abstract-out the constants for the sequential heuristic.
No operational changes.
MFC after: 1 day
|
107636 |
05-Dec-2002 |
iedowse |
In the NFSv3 `fsinfo' procedure reply, don't claim that we support 32k read and write operations on datagram sockets when in fact we reject requests larger than 16k. It must be the case that virtually all clients use data sizes of 16k or less for UDP transport (FreeBSD's client defaults to 8k and never exceeds 16k), as this bug has been present ever since NFSv3 support was added.
Reported by: Senthil <lihtnes78@netscape.net> Reviewed by: dillon Approved by: re MFC-after: 1 week
|
106412 |
04-Nov-2002 |
rwatson |
Permit MAC policies to instrument the access control decisions for system accounting configuration and for nfsd server thread attach. Policies might use this to protect the integrity or confidentiality of accounting data, limit the ability to turn on or off accounting, as well as to prevent inappropriately labeled threads from becoming nfs server threads.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
106264 |
31-Oct-2002 |
jeff |
- Introduce a new macro, since that's what nfs loves, called nfsm_srvpathsiz. This macro plucks a length out of an rpc request and verifies that its size does not exceed NFS_MAXPATHLEN. If it does it generates an ENAMETOOLONG response. - Use this macro, and the existing nfsm_srvnamsiz macro in two places where we deal with paths passed in by the client.
This fixes a linux interoperability bug. Linux was sending oversized path components which would cause us to ignore the request all together. This causes linux to hang indefinitly while it waits for a response. This could still happen in other cases where we error out with EBADRPC.
Sponsored by: Isilon Systems, Inc. Reviewed by: alfred, fabbri@isilon.com, neal@isilon.com
|
105481 |
19-Oct-2002 |
rwatson |
Set the NOMACCHECK flag for namei()'s generated by the NFS server code. We currently don't enforce protections on NFS-originated VOP's.
Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
|
104424 |
03-Oct-2002 |
rwatson |
Correct a problem wherein NFS servers running NFSv2 would not return certain classes of failure responses to the client during a failed remove operation.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
103940 |
25-Sep-2002 |
jeff |
- Use incore() instead of gbincore() so we don't have to acquire the vnode interlock.
|
103554 |
18-Sep-2002 |
phk |
Use m_length() instead of home-rolled versions.
|
102236 |
21-Aug-2002 |
phk |
Make the V2 errno translation more resistent to new errnos.
|
101308 |
04-Aug-2002 |
jeff |
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking.
Idea stolen from: BSD/OS
|
100644 |
24-Jul-2002 |
peter |
Oops, another unused arg to nfssvc_nfsd(). *blush*
Submitted by: jake
|
100641 |
24-Jul-2002 |
peter |
Fully exterminate nfsd_srvargs and nfsd_cargs. They were either unused or giant NOP's. There was a credential in srvargs that was giving rwatson some heartburn. :-)
|
100606 |
24-Jul-2002 |
rwatson |
Stick a dark comment in about the fact that the NFS server code allocates a ucred by itself as part of an nfs descriptor, then bzero's the ucred, fails to initialize the mutex, etc. This is very bad, but I don't have time to fix it right now. nfsd should instead hold a cred pointer, and the credential should be properly initialized, probably from a descendent of a kernel process credential.
|
100509 |
22-Jul-2002 |
ume |
sync comment with reality. IN6P_BINDV6ONLY -> IN6P_IPV6_V6ONLY.
|
100205 |
17-Jul-2002 |
dillon |
'recm' was not being unconditionally cleared for each loop, leading to system lockups (infinite loops) when a zero-length RPC is received. Linux clients will sometimes send zero-length RPC requests.
Reorganize the use of recm in the loop.
Cc: security@freebsd.org Submitted by: Mike Junk <junk@isilon.com> MFC after: 3 days
|
100134 |
15-Jul-2002 |
alfred |
Add IPv6 support.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
|
99797 |
11-Jul-2002 |
dillon |
Convert old style (type foo *)0 casts to NULLs
PR: kern/40360 Requested by: Hiten PAndya via direct email
|
99737 |
10-Jul-2002 |
dillon |
Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead.
Disadvantages Dirties more cache lines during lookups.
Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference).
Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently.
I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted.
The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier.
This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode).
Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list.
Suggested by: alc
|
97658 |
31-May-2002 |
tanimura |
Back out my lats commit of locking down a socket, it conflicts with hsu's work.
Requested by: hsu
|
96972 |
20-May-2002 |
tanimura |
Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket.
o Determine the lock strategy for each members in struct socket.
o Lock down the following members:
- so_count - so_options - so_linger - so_state
o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket:
- sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup()
Reviewed by: alfred
|
96755 |
16-May-2002 |
trhodes |
More s/file system/filesystem/g
|
95215 |
21-Apr-2002 |
iedowse |
Limit to the maximum allowed reply size the amount of data that nfsrv_readdir and nfsrv_readdirplus can return. A client request containing an over-large `count' field could trigger the "Bad nfs svc reply" panic in nfs_syscalls.c.
Spotted while trying to reproduce kern/37304, which turned out to be fixed in FreeBSD a long time ago.
MFC after: 1 week
|
93593 |
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
92783 |
20-Mar-2002 |
jeff |
Remove references to vm_zone.h and switch over to the new uma API.
|
92462 |
17-Mar-2002 |
mckusick |
Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems. only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.
|
91406 |
27-Feb-2002 |
jhb |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
89367 |
14-Jan-2002 |
dillon |
The vnode was not being vput()'d in the EEXIST mknod case on the nfs server side. This can lead to a system deadlock.
Reviewed by: iedowse Tested by: Alexey G Misurenko <mag@caravan.ru>, iedowse Bug found with help by: Alexey G Misurenko <mag@caravan.ru> MFC at: earliest convenience
|
89302 |
13-Jan-2002 |
iedowse |
It is required by VOP_CREATE, VOP_MKNOD, VOP_SYMLINK and VOP_MKDIR that va_mode of the supplied attributes is filled in with a valid file mode (i.e not VNOVAL, and only ALLPERM bits set). However, some NFS server op functions didn't guarantee this for all possible request messages:
If a V3 client chose not include to a mode specification, we could end up creating an ffs inode with mode 0177777, requiring a manual fsck on the next reboot. Fix this by setting va_mode to 0 before calling the VOP if a mode hasn't been supplied by the client.
In nfsrv_symlink(), S_IFMT bits supplied by a V2 client could end up in the va_mode passed to VOP_SYMLINK with similar effects. We now use the macro nfstov_mode() to correctly mask the bits.
|
89278 |
12-Jan-2002 |
iedowse |
Fix a few NFSv2 issues that slipped in during the big cleanup. The semantics of the nfsm_reply() macro were changed so that the caller has to explicitly handle the V2 error case, whereas before, nfsm_reply() did a `goto nfsmout' then. A few server ops (setattr, readlink, create, mkdir) weren't updated to match, so errors in the V2 case could cause protocol hangs and leaked mbufs.
Correct some comments that describe the old nfsm_reply behaviour.
[older, harmless nit] Remove the unnecessary `nfsmreply0' label in nfsrv_create(), since for its users, the main `ereply' label does the same thing.
|
89272 |
11-Jan-2002 |
iedowse |
The macro nfsm_reply() is supposed to allocate a reply in all cases, but since the nfs cleanup, it hasn't done so in the case where `error' is EBADRPC. Callers of this macro expect it to initialise *mrq, and the `nfsmout' exit point expects a reply to be allocated if error == 0. When nfsm_reply() was called with error = EBADRPC, whatever junk was in *mrq (often a stale pointer to an old reply mbuf) would be assumed to be a valid reply and passed to pru_sosend(), causing a crash sooner or later.
Fix this by allocating a reply even in the EBADRPC case like we used to do. This bug was specific to -current.
|
89094 |
08-Jan-2002 |
msmith |
Rename some variables that end up shadowing their namesakes in the NFS client code.
Reviewed by: peter
|
88091 |
18-Dec-2001 |
iedowse |
Avoid passing the variable `tl' to functions that just use it for temporary storage. In the old NFS code it wasn't at all clear if the value of `tl' was used across or after macro calls, but I'm fairly confident that the convention was to keep its use local. Each ex-macro function now uses a local version of this variable, so all of the double-indirection goes away.
The only exception to the `local use' rule for `tl' is nfsm_clget(), which is left unchanged by this commit.
Reviewed by: peter
|
87361 |
04-Dec-2001 |
iedowse |
When VOP_SYMLINK fails, the value of *vpp is junk, so we must NULL out nd.ni_vp to prevent the resource cleanup code at the end of nfsrv_symlink from trying to vrele it. This fixes a "vrele: negative ref cnt" panic that can occur when a symlink is attempted on an NFS filesystem with no free space. Found locally, but the symptoms correspond to those in the PR referenced below.
PR: kern/26878 MFC after: 3 days
|
86487 |
17-Nov-2001 |
dillon |
Give struct socket structures a ref counting interface similar to vnodes. This will hopefully serve as a base from which we can expand the MP code. We currently do not attempt to obtain any mutex or SX locks, but the door is open to add them when we nail down exactly how that part of it is going to work.
|
86432 |
15-Nov-2001 |
peter |
Fix a leftover client comment, long line fix.
|
85498 |
25-Oct-2001 |
iedowse |
Now that nfsm_reply() does not usually set 'error' to 0, we need to do it explicitly in nfsrv_noop so that the reply gets sent back to the client. This fixes the generation of a selection of RPC error replies (RPC_PROGMISMATCH, RPC_PROGUNAVAIL, RPC_PROCUNAVAIL etc.) that are used by some clients to detect support for optional protocols and features.
Reviewed by: peter Reported by: Thomas Quinot <quinot@inf.enst.fr> PR: kern/31479
|
84079 |
28-Sep-2001 |
peter |
Unwind some more macros. NFSMADV() was kinda silly since it was right next to equivalent m_len adjustments. Move the nfsm_subs.h macros into groups depending on which phase they are used in, since that affects the error recovery requirements. Collect some of the common error checking into a single macro as preparation for unwinding some more. Have nfs_rephead return a value instead of secretly modifying args. Remove some unused function arguments that were being passed around. Clarify nfsm_reply()'s error handling (I hope).
|
84078 |
28-Sep-2001 |
peter |
Oops. I forgot to cvs rm this before. There is a common nfsproto.h. This was a repo copy leftover.
|
84057 |
27-Sep-2001 |
peter |
Make nfsm_dissect() have an obvious return value.
|
84002 |
27-Sep-2001 |
peter |
Tidy up nfsm_build usage. This is only partially finished.
|
83700 |
20-Sep-2001 |
peter |
Wrap a module around the init code so that we have somethign do do a modfind(2) on, and declare a version so that loader/kldload etc can locate it (using kldxref's linker.hints file if needed).
|
83651 |
18-Sep-2001 |
peter |
Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
|
83493 |
15-Sep-2001 |
peter |
Sync some differences that were different between the copies of the files that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
|
83366 |
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
83291 |
10-Sep-2001 |
kris |
Fix some signed/unsigned integer confusion, and add bounds checking of arguments to some functions.
Obtained from: NetBSD Reviewed by: peter MFC after: 2 weeks
|
82702 |
31-Aug-2001 |
dillon |
Pushdown Giant for nfs syscalls (nfssvc())
|
79224 |
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
78911 |
28-Jun-2001 |
jhb |
- Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.
|
76827 |
19-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
76166 |
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
76117 |
29-Apr-2001 |
grog |
Revert consequences of changes to mount.h, part 2.
Requested by: bde
|
75858 |
23-Apr-2001 |
grog |
Correct #includes to work with fixed sys/mount.h.
|
75631 |
17-Apr-2001 |
alfred |
Implement client side NFS locks.
Obtained from: BSD/os Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
|
74384 |
17-Mar-2001 |
peter |
Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it.
As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32).
The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha.
Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.
|
72650 |
18-Feb-2001 |
green |
Switch to using a struct xucred instead of a struct xucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout.
Reviewed by: bde
|
72645 |
18-Feb-2001 |
asmodai |
Preceed/preceeding are not english words. Use precede and preceding.
|
72216 |
09-Feb-2001 |
iedowse |
Fix some problems that were introduced in revision 1.97. Instead of returning an error code to the caller, NFS server op routines must themselves build an error reply and return 0 to the caller.
This is achieved by replacing the erroneous return statements with code that jumps forward to the op function's reply code. We need to be careful to ensure that the 'struct mount' pointer is NULL though, so that the final vn_finished_write() call becomes a no-op.
Reviewed by: mckusick, dillon
|
70254 |
21-Dec-2000 |
bmilekic |
* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
|
69781 |
08-Dec-2000 |
dwmalone |
Convert more malloc+bzero to malloc+M_ZERO.
Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
|
69214 |
26-Nov-2000 |
phk |
Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
|
68883 |
18-Nov-2000 |
dillon |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
|
68711 |
14-Nov-2000 |
mckusick |
In preparation for deprecating CIRCLEQ macros in favor of TAILQ macros which provide the same functionality and are a bit more efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
|
67882 |
29-Oct-2000 |
phk |
Remove unneeded #include <sys/proc.h> lines.
|
67486 |
24-Oct-2000 |
dwmalone |
Problem to avoid processes getting stuck in "vmopar". From Ian's mail:
The problem seems to originate with NFS's postop_attr information that is returned with a read or write RPC. Within a vm_fault context, the code cannot deal with vnode_pager_setsize() shrinking a vnode.
The workaround in the patch below stops the nfsm_postop_attr() macro from ever shrinking a vnode. If the new size in the postop_attr information is smaller, then it just sets the nfsnode n_attrstamp to 0 to stop the wrong size getting used in the future. This change only affects postop_attr attributes; the nfsm_loadattr() macro works as normal.
The change is implemented by adding a new argument to nfs_loadattrcache() called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never reduce the vnode/nfsnode size; instead it zeros n_attrstamp.
There remain other was processes can get stuck in vmopar.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: dillon Tested by: Vadim Belman <voland@lflat.org>
|
65557 |
07-Sep-2000 |
jasone |
Major update to the way synchronization is done in the kernel. Highlights include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be preempted (i386 only).
Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
63788 |
24-Jul-2000 |
mckusick |
This patch corrects the first round of panics and hangs reported with the new snapshot code.
Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem.
Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write().
Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations.
Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot.
Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation.
Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress.
Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior.
Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation.
Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
|
62976 |
11-Jul-2000 |
mckusick |
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed.
Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
|
60938 |
26-May-2000 |
jake |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
Requested by: msmith and others
|
60833 |
23-May-2000 |
jake |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
Suggested by: phk Reviewed by: phk Approved by: mdodd
|
60041 |
05-May-2000 |
phk |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
|
59794 |
30-Apr-2000 |
phk |
Remove unneeded #include <vm/vm_zone.h>
Generated by: src/tools/tools/kerninclude
|
59391 |
19-Apr-2000 |
phk |
Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>
|
58710 |
27-Mar-2000 |
dillon |
Add a sysctl to specify the amount of UDP receive space NFS should reserve, in maximal NFS packets. Originally only 2 packets worth of space was reserved. The default is now 4, which appears to greatly improve performance for slow to mid-speed machines on gigabit networks.
Add documentation and correct some prior documentation.
Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu> Approved by: jkh
|
58349 |
20-Mar-2000 |
phk |
Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
|
58345 |
20-Mar-2000 |
phk |
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
|
57178 |
13-Feb-2000 |
peter |
Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world.
Approved by: jkh
|
55934 |
13-Jan-2000 |
dillon |
The alpha build cuases the 'nfsuid bloated' warning to occur. Well, there is nothing we can do about it. In fact, after further review there simply are not very many instances of the two structures NFS checks for 'bloat' so I've decided to simply rip the checks out entirely.
Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
|
55679 |
09-Jan-2000 |
shin |
tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
55431 |
05-Jan-2000 |
dillon |
Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better then 'reverse'. The original algorithm did things this way to but at a huge time cost.
Enhance the append interlock for NFS writes to handle intr/soft mounts better.
Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches.
Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather then the kernel default for that option.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
55206 |
29-Dec-1999 |
peter |
Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.
|
54970 |
21-Dec-1999 |
alfred |
make getfh a standard syscall instead of dependant on having NFSSERVER defined, useful for userland fileservers that want to use a filehandle type interface to the filesystem.
Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452
|
54799 |
19-Dec-1999 |
green |
M_PREPEND-related cleanups (unregisterifying struct mbuf *s).
|
54785 |
18-Dec-1999 |
dillon |
Fix compilation warning on alpha when converting pointer to integer to generate hash index.
Reviewed by: Andrew Gallatin <gallatin@cs.duke.edu>
|
54693 |
16-Dec-1999 |
dillon |
Have NFS use a snapshot of boottime instead of boottime itself to generate the NFSv3 Version id. boottime itself may change, sometimes once every tick if you are running xntpd, which really throws off clients. Clients will tend to throw away what they believe to be stale data too often, and can get into long loops rewriting the same data over and over again because they believe the server has rebooted over and over again due to the changing version id.
Approved by: jkh
|
54655 |
15-Dec-1999 |
eivind |
Introduce NDFREE (and remove VOP_ABORTOP)
|
54567 |
13-Dec-1999 |
dillon |
Add a readahead heuristic to the NFS server side code. While the server cannot unilaterally pass data to a client it can reduce the physical disk transaction overhead by reading larger blocks. This results in better pipelining of requests/responses over the network and an almost 100% increase in cpu efficiency on the server. On a 100BaseTX network NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed out), and cpu efficiency increases from 72% idle to 80% idle on the server.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
54564 |
13-Dec-1999 |
dillon |
PR: kern/15222 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
54536 |
13-Dec-1999 |
dillon |
Fix a timeout deadlock that can occur when the process holding the receive lock hasn't yet managed to send its own request.
PR: kern/15055 Submitted by: Ian Dowse iedowse@maths.tcd.ie
|
54485 |
12-Dec-1999 |
dillon |
Fix a number of server-side issues related to aborting badly formed NFS packets, mainly initializing structure pointers to NULL which are conditionally freed prior to return.
PR: kern/15249 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
54480 |
12-Dec-1999 |
dillon |
Synopsis of problem being fixed: Dan Nelson originally reported that blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below.
bug #1:
B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data.
Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable.
bug #2:
B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported.
bug #3:
B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server).
The patch also contains some cleanup, which only applies to the commit into -current.
Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>
|
53552 |
22-Nov-1999 |
dillon |
nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization of element [4] in both, which goes beyond the end of the array, leaving [0], [1], [2], and [3]. This bug did not cause any problems since the overrun fields are initialized after the bogus array init but needs to be fixed anyway.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
53131 |
13-Nov-1999 |
eivind |
Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.
|
53101 |
12-Nov-1999 |
eivind |
Remove WILLRELE from VOP_RENAME
|
53095 |
11-Nov-1999 |
dillon |
Remove special case socket sharing code in order to allow nfsd to bind IP addresses to udp/cltp sockets separately.
PR: kern/13049 Reviewed by: David Malone <dwmalone@maths.tcd.ie>, freebsd-current
|
53022 |
08-Nov-1999 |
dillon |
Fix nfssvc_addsock() to not attempt to free a NULL socket structure when returning an error. Bug fix was extracted from the PR. The PR is not yet entirely resolved by this commit.
PR: kern/13049 Reviewed by: Matt Dillon <dillon@freebsd.org> Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
52492 |
25-Oct-1999 |
dillon |
Move NFS access cache hits/misses into nfsstats structure so /usr/bin/nfsstat can get to it easily.
|
51906 |
03-Oct-1999 |
phk |
Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
|
51799 |
29-Sep-1999 |
marcel |
Careless use of struct proc *p caused major problems. 'p' is allowed to be NULL in this function (nfs_sigintr). Reorder the statements and guard them all with a single if (p != NULL).
reported, reviewed and tested by: jdp
|
51796 |
29-Sep-1999 |
dillon |
Make FreeBSD less conservative in determining when to return a cookie error for a directory. I have made this change after a great deal of review although I cannot be absolutely sure that this meets the spec.
The issue devolves into whether changes in an underlying (UFS) directory can cause NFS directory blocks to be renumbered. My read of the code indicates that NFS directory blocks will not be renumbered, which means that the cookies should still remain valid after a change is made to the underlying directory. This being the case, a cookie error should not be returned when a change is made to the underlying directory and, instead, the NFS client should rely on mtime detection to invalidate and reload the directory.
The use of mtime is problematic in of itself, due to insufficient resolution, which is why I believe the original conservative error handling was done. Still, there have been dozens of bug reports by people needing solaris<->FreeBSD interoperability and these have to be accomodated.
|
51791 |
29-Sep-1999 |
marcel |
sigset_t change (part 2 of 5) -----------------------------
The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements.
The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t.
struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done though function sigprop(). The latter because the table doesn't holds properties for all signals, but only for the first NSIG signals.
signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace polution.
Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types.
NOTE: kdump (and port linux_kdump) must be recompiled.
Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
|
51344 |
17-Sep-1999 |
dillon |
Asynchronized client-side nfs_commit. NFS commit operations were previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it.
Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection then lastr ever did.
Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc.
Note that we do not commit the meta-data yet. This works still needs to be done.
Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic.
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
|
51138 |
11-Sep-1999 |
alfred |
Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP.
Add fh(open|stat|stafs) syscalls to allow userland to query filesystems based on (network) filehandle.
Obtained from: NetBSD
|
50521 |
28-Aug-1999 |
phk |
remove unused variables.
|
50477 |
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
50405 |
26-Aug-1999 |
phk |
Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.
Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore.
Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively.
Make the revoke syscalls use vcount() instead of VALIASED.
Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag.
vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one.
Print the devicename in specfs/vprint().
Remove a couple of stale LFS vnode flags.
Remove unimplemented/unused LK_DRAINED;
|
50053 |
19-Aug-1999 |
peter |
Convert all the nfs macros to do { blah } while (0) to ensure it works correctly in if/else etc. egcs had probably picked up most of the problems here before with "ambiguous braces" etc, but this should increase the robustness a bit. Based on an idea from Eivind Eklund.
|
49535 |
08-Aug-1999 |
phk |
Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>.
Add a few fields to struct specinfo, paving the way for the fun part.
|
49405 |
04-Aug-1999 |
peter |
Don't over-allocate and over-copy shorter NFSv2 filehandles and then correct the pointers afterwards.
It's kinda bogus that we generate a 24 (?) byte filehandle (2 x int32 fsid and 16 byte VFS fhandle) and pad it out to 64 bytes for NFSv3 with garbage. The whole point of NFSv3's variable filehandle length was to allow for shorter handles, both in memory and over the wire. I plan on taking a shot at fixing this shortly.
|
49232 |
29-Jul-1999 |
wpaul |
Correct the sanity test length calculation in nfsrv_readdirplus(): len is being incremented by 4 bytes too few each time through the loop, which allows more data into the mbuf chain that we really want. In the worst case, when we're using 32K read/write sizes with a TCP client, this causes readdirplus replies to sometimes exceed NFS_MAXPACKET which leads to a panic. This problem cropped up for me using an IRIX 6.5.4 NFSv3 TCP client with 32K read/write sizes, however supposedly it can be triggered by WinNT NFS servers too. In theory, it can probably be triggered by any NFS v3 implementation using TCP as long as it's using the maxiumum block size.
Reviewed by: Matthew Dillon <dillon@backplane.com>
|
49158 |
28-Jul-1999 |
alc |
Clear error in nfsrv_create when we have a valid reply so that that reply is actually transmitted. Submitted by: dillon
|
48859 |
17-Jul-1999 |
phk |
I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g
|
48362 |
30-Jun-1999 |
julian |
Submitted by: "David E. Cross" <crossd@cs.rpi.edu> Matt missed a line..
|
48274 |
27-Jun-1999 |
peter |
Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly splbio()/splx()) are #included in time.
|
48225 |
26-Jun-1999 |
mckusick |
Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
|
48125 |
23-Jun-1999 |
julian |
Matt's NFS fixes. Submitted by: Matt Dillon Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin 3.2 version to follow when tested
|
47751 |
05-Jun-1999 |
peter |
Various changes lifted from the OpenBSD cvs tree:
txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.
nfs_serv.c: don't call nfsm_adj() with negative values, windows clients could crash servers when doing a readdir of a large directory.
nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin loop trying to bind(). Don't clobber a mbuf pointer or we get panics on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.
nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB.
Obtained from: OpenBSD
|
47028 |
11-May-1999 |
phk |
Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland.
Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev()
For now they're functions, they will become in-line functions after one of the next two steps in this process.
Return major/minor/makedev to macro-hood for userland.
Register a name in cdevsw[] for the "filedescriptor" driver.
In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device.
In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang).
A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that.
Without DEVT_FASCIST I belive this patch is a no-op.
Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result.
Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
|
46580 |
06-May-1999 |
phk |
remove b_proc from struct buf, it's (now) unused.
Reviewed by: dillon, bde
|
46568 |
06-May-1999 |
peter |
Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
|
46349 |
02-May-1999 |
alc |
The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas.
The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly.
There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE.
Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
|
46155 |
28-Apr-1999 |
phk |
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname.
Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
|
46112 |
27-Apr-1999 |
phk |
Suser() simplification:
1: s/suser/suser_xxx/
2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with later.
There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
|
45996 |
24-Apr-1999 |
dt |
Fixed printf format errors on alpha.
|
44246 |
25-Feb-1999 |
peter |
Untangle the nfs send and receive queue locking a little. One lock routine was [ab]used for two different things, and you couldn't tell from the wait channel which one had wedged. Catch a few things missing from NFS_NOSERVER.
|
44112 |
18-Feb-1999 |
dfr |
Move the declaration of the vfs.nfs sysctl node outside an ifdef so that it builds if NFS_NOSERVER is defined.
Spotted by: Bruce Evans <bde@zeta.org.au>
|
44101 |
17-Feb-1999 |
bde |
Fixed bitrot in NFS_ACDEBUG option.
|
44078 |
16-Feb-1999 |
dfr |
* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
|
43305 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
42957 |
21-Jan-1999 |
dillon |
This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
|
42315 |
05-Jan-1999 |
eivind |
Remove the 'waslocked' parameter to vfs_object_create().
|
42155 |
30-Dec-1998 |
hoek |
Silence -Wtrigraph.
Submitted by: Bradley Dunn <bradley@dunn.org> (pr: kern/8817)
|
42060 |
25-Dec-1998 |
dfr |
Fix for creating files on a Solaris 7 server with NFSv3 (the request was slightly garbled but older servers seemed to understand it).
Reviewed by: David O'Brien <obrien@nuxi.ucdavis.edu>
|
41796 |
14-Dec-1998 |
dt |
Added 3 new errno values, requred by various standards: EOVERFLOW, ECANCELED, EILSEQ.
Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping. Reviewed by: Dan Nelson <dnelson@emsphone.com>
|
41619 |
09-Dec-1998 |
eivind |
Remove the if fixed in the last commit; bde quite correctly point out that it can never fail.
|
41606 |
08-Dec-1998 |
eivind |
Fix typo (; in "if (vp == NULL);").
|
41591 |
07-Dec-1998 |
archie |
The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
|
41132 |
13-Nov-1998 |
dfr |
Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to free() sometimes.
Reviewed by: Eric Haug <ejh@eas.slu.edu>
|
41026 |
09-Nov-1998 |
peter |
Remove [apparently] bogus casts to u_long for the vnode_pager_setsize() second argument. np_size is a 64 bit int, so is the second arg. This might have caused needless 2G/4G file size problems.
I believe it was Bruce who queried this.
|
40792 |
31-Oct-1998 |
peter |
vm_object_page_clean() last arg changed from TRUE to OBJPC_SYNC. I'm not sure that this is necessary to be a sync write here since a VOP_FSYNC() follows and it will schedule, sort and complete the writes that the vm_object_page_clean() started (as I think I understand things).
|
40790 |
31-Oct-1998 |
peter |
Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
|
39788 |
29-Sep-1998 |
mckusick |
The code checks each fragment mark to see if it's valid; if the fragment is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it barfs and, I think, drops the connection.
However, there's no guarantee that in a multi-fragment RPC, all the fragments will be at least as large as NFS_MINPACKET.
In fact, with the version of "tclnfs" we have here, which supports NFS over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's user-mode ONC RPC library), I can *repeatably* cause "tclnfs" to send a request with more than one fragment, one of which is only 8 bytes long. I just do a 3877-byte write to a file, at an offset of 0.
The check that "slp->ns_reclen" is greater than or equal to NFS_MINPACKET serves no useful purpose - if the NFS server code can't handle packets < NFS_MINPACKET bytes, it can't handle them over *any* protocol, so the check has to be done above the RPC-over-TCP layer - and should be removed. Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
|
38894 |
07-Sep-1998 |
bde |
Made unloading of the nfs LKM sort of work. This is mainly to test detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually work for any vfs, since it leaves garbage pointers to memory allocation control structures.
|
38866 |
05-Sep-1998 |
bde |
Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded.
NetBSD uses strcmp() to avoid this ugly global.
|
38718 |
01-Sep-1998 |
luoqi |
Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle NULL, buf free() can't.
|
38482 |
23-Aug-1998 |
wollman |
Yow! Completely change the way socket options are handled, eliminating another specialized mbuf type in the process. Also clean up some of the cruft surrounding IPFW, multicast routing, RSVP, and other ill-explored corners.
|
37997 |
01-Aug-1998 |
peter |
If we get an ENOBUFS from the network, it's normally transient network interface congestion (eg: nfs over a ppp link, etc). Don't log these for UDP mounts, and don't cause syscalls to fail with EINTR. This stops the 'nfs send error 55' warnings.
If the error is because the system is really hosed, this is the least of your problems...
|
37649 |
15-Jul-1998 |
bde |
Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
|
37339 |
02-Jul-1998 |
kato |
Moved `#ifndef NFS_NOSERVER' after including nfs.h.
|
37291 |
30-Jun-1998 |
jmg |
fix buildworld hopefully be3fore anyone complains...
NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion), but in some cases it looks like nfs keeps a copy of the value in a struct
hash sizes are already ifdef'd KERNEL, so there aren't userland inpact from them...
|
37272 |
30-Jun-1998 |
jmg |
convert some nfs tunables to options, these are: NFS_MINATTRTIMO VREG attrib cache timeout in sec NFS_MAXATTRTIMO NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec NFS_MAXDIRATTRTIMO NFS_GATHERDELAY Default write gather delay (msec) NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this NFS_WDELAYHASHSIZ and with this NFS_MUIDHASHSIZ Tune the size of nfsmount with this NFS_NOSERVER (already documented in LINT) NFS_DEBUG turn on NFS debugging
also, because NFS_ROOT is used by very different files, it has been renamed to opt_nfsroot.h instead of the old opt_nfs.h....
|
37089 |
21-Jun-1998 |
bde |
Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore, code controlled by it did not even compile.)
|
36979 |
14-Jun-1998 |
bde |
Avoid an egcs pessimization for 64-bit signed division on i386's. Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit signed divisions, but egcs optimizes them to a shift and fixup when the divisor is a constant power of 2. Unfortunately, it generates a call to __cmpdi2() for the fixup, although all except possibly ancient versions of gcc and egcs do ordinary 64-bit comparisons inline.
|
36735 |
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
36541 |
31-May-1998 |
peter |
For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t; int -> int32_t; u_short -> u_int16_t. Also, use mode_t instead of u_short for storing modes (mode_t is a u_int16_t).
Obtained from: NetBSD
|
36540 |
31-May-1998 |
peter |
Support 'mount -u' remounts. This may require disconnecting and rebinding the socket. Certain mode changes are not allowed.
Obtained from: NetBSD
|
36539 |
31-May-1998 |
peter |
Cut-n-paste glitch
|
36534 |
31-May-1998 |
peter |
Prototype support for selectively allowing non-reserved ports on a per export basis. Needs userland support yet.
Obtained from: NetBSD
|
36533 |
31-May-1998 |
peter |
Hide whiteouts from NFS, since the protocol doesn't support them.
Obtained from: NetBSD
|
36532 |
31-May-1998 |
peter |
NetBSD has a comment that Solaris 2.5 doesn't do verifiers correctly, we have weakened this test already for Digital Unix, so it may be enough for Solaris. It needs to be checked again.
Obtained from: NetBSD
|
36531 |
31-May-1998 |
peter |
Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures, it just makes more work. We pass a copy of the uid/gid with the credentials. (although, this may need to be revisited if a non AUTHUNIX authentication method (such as NFSKERB) ever gets implemented).
Obtained from: NetBSD
|
36530 |
31-May-1998 |
peter |
Use the new SB_UPCALL flag,
Obtained from: NetBSD (but I changed the flag clear order in case).
|
36520 |
31-May-1998 |
peter |
Don't try and free mrep twice on some error conditions.
Obtained from: NetBSD
|
36519 |
31-May-1998 |
peter |
#ifdef a diagnostic panic, plus another missed costmetic change.
Obtained from: NetBSD
|
36518 |
31-May-1998 |
peter |
We have gained 2 more errno's, add them to the NFSv2 mapping table.
|
36517 |
31-May-1998 |
peter |
Missed a cosmetic change that the other BSD's have.
|
36516 |
31-May-1998 |
peter |
oops, nfs_msg() is called from client code too.
|
36515 |
31-May-1998 |
peter |
When we can't reconnect a socket, don't forget to unlock before retrying or we can deadlock.
Obtained from: NetBSD
|
36514 |
31-May-1998 |
peter |
Don't log zero length reads, this can happen during normal operation.
Obtained from: NetBSD
|
36513 |
31-May-1998 |
peter |
Consider for readdir chunk sizes when tuning socket buffer reservations.
Obtained from: NetBSD
|
36512 |
31-May-1998 |
peter |
Refuse READDIR / READDIRPLUS rpc's for non-directories
Obtained from: NetBSD
|
36511 |
31-May-1998 |
peter |
Some const's
Obtained from: NetBSD
|
36503 |
31-May-1998 |
peter |
NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim of this part of commits is to minimize unnecessary differences between the other NFS's of similar origin. Yes, there are gratuitous changes here that the style folks won't like, but it makes the catch-up less difficult.
|
36473 |
30-May-1998 |
peter |
When using NFSv3, use the remote server's idea of the maximum file size rather than assuming 2^64. It may not like files that big. :-) On the nfs server, calculate and report the max file size as the point that the block numbers in the cache would turn negative. (ie: 1099511627775 bytes (1TB)).
One of the things I'm worried about however, is that directory offsets are really cookies on a NFSv3 server and can be rather large, especially when/if the server generates the opaque directory cookies by using a local filesystem offset in what comes out as the upper 32 bits of the 64 bit cookie. (a server is free to do this, it could save byte swapping depending on the native 64 bit byte order)
Obtained from: NetBSD
|
36329 |
24-May-1998 |
peter |
Convert a couple of large allocations to use zones rather than malloc for better packing. This means that we can choose better values for the various hash entries without having to try and get it all to fit within an artificial power of two limit for malloc's sake.
|
36251 |
20-May-1998 |
peter |
Only ignore "owner" permissions selectively rather than always. In some cases we ignore it (eg: read/write) to maintain chmod-after-open semantics but in other cases we do care, eg: creating files, access() etc. Never ignore errors from VOP_ACCESS() on immutable files.
This apparently comes from BSDI (from Keith Bostic) via NetBSD.
PR: 5148 Submitted by: Yoshiro MIHIRA <sanpei@yy.cs.keio.ac.jp>
|
36176 |
19-May-1998 |
peter |
Allow control of the attribute cache timeouts at mount time.
We had run out of bits in the nfs mount flags, I have moved the internal state flags into a seperate variable. These are no longer visible via statfs(), but I don't know of anything that looks at them.
|
36097 |
16-May-1998 |
bde |
Get timespecs directly instead of via timevals.
|
35823 |
07-May-1998 |
msmith |
In the words of the submitter:
--------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. ---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35066 |
06-Apr-1998 |
phk |
Use random() to find our initial xid.
|
34961 |
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime(0.
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeard time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
Reviewed by: bde
|
33181 |
09-Feb-1998 |
eivind |
Staticize.
|
33134 |
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
33108 |
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
33058 |
03-Feb-1998 |
bde |
Added #include of <sys/queue.h> so that this file is more "self"-sufficent.
|
33054 |
03-Feb-1998 |
bde |
Forward declare some structs so that this file is more self-sufficient.
|
32998 |
01-Feb-1998 |
bde |
Moved declaration of `union nethostadr' outside of the KERNEL section, to give pollution compatible with <nfs/nqfs.h>. At least mount_nfs.c previously had to #define KERNEL before including <nfs/nfs.h> to get this pollution, but this gave other pollution.
Moved comment about NFSINT_SIGMASK to immediately before the code that it applies to.
|
32937 |
31-Jan-1998 |
dyson |
Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes."
Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.
|
32071 |
29-Dec-1997 |
dyson |
Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include:
1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync.
Be gentle, and please give me feedback asap.
|
32011 |
27-Dec-1997 |
bde |
Unspammed nested include of <vm/vm_zone.h>.
|
31886 |
20-Dec-1997 |
bde |
Added a used include.
Fixed a gratuitous ANSIism and nearby KNF violations.
|
31391 |
24-Nov-1997 |
bde |
Don't call malloc(..., M_WAITOK) at splnet(). Doing so is often a mistake (since softnet interrupts may occur if malloc() waits), and doing it harmlessly but unnecessarily here interfered with detection of the mistaken cases.
|
31016 |
07-Nov-1997 |
phk |
Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by: -Wunused
|
30994 |
06-Nov-1997 |
phk |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead.
This fixes a boatload of compiler warning, and removes a lot of cruft from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need recompiled.
|
30813 |
28-Oct-1997 |
bde |
Removed unused #includes.
|
30808 |
28-Oct-1997 |
bde |
Don't #include <nfs/nfs.h> in <nfs/nfs_node.h> if KERNEL is defined. Fixed everything that depended on the nested include.
|
30738 |
26-Oct-1997 |
phk |
Always initialize the syscall vectors for our "private" syscalls (not just in the LKM case). Plug nqnfs_vop_lease_check directly into the default_vnodeop_p table.
|
30354 |
12-Oct-1997 |
phk |
Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them.
A couple of finer points by: bde
|
30309 |
11-Oct-1997 |
phk |
Distribute and statizice a lot of the malloc M_* types.
Substantial input from: bde
|
29653 |
21-Sep-1997 |
dyson |
Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
|
29291 |
10-Sep-1997 |
phk |
Remove a couple of stubborn NetBSD #if's.
|
29288 |
10-Sep-1997 |
phk |
unifdef -U__NetBSD__ -D__FreeBSD__
|
29024 |
02-Sep-1997 |
bde |
Added used #include - don't depend on <sys/mbuf.h> including <sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
|
28270 |
16-Aug-1997 |
wollman |
Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
|
27845 |
02-Aug-1997 |
bde |
Removed unused #includes.
|
27609 |
22-Jul-1997 |
dfr |
Correct some dumb mistakes in the WebNFS stuff.
Submitted by: bde
|
27608 |
22-Jul-1997 |
dfr |
Allow NULL cookie verifiers for non-NULL offsets. This is needed for Digital Unix boxes since they appear to always send null verifiers.
|
27446 |
16-Jul-1997 |
dfr |
Merge WebNFS changes from NetBSD.
Obtained from: NetBSD
|
26952 |
25-Jun-1997 |
tegge |
Clear nfs_iodwant[myiod] when the nfsiod process exits due to a signal.
|
26637 |
14-Jun-1997 |
bde |
Don't require superuser privileges for creating fifos. The v2 case was broken when support for v3 was introduced in rev.1.16. The v3 case has always been broken in FreeBSD.
Should be in 2.2.
PR: 3838
|
26420 |
03-Jun-1997 |
dfr |
Various fixes from NetBSD:
Use u_int for rpc procedure numbers. Some fixes to NQNFS. A rare NULL pointer dereference. Ignore NFSMNT_NOCONN for TCP mounts.
Obtained from: NetBSD
|
26418 |
03-Jun-1997 |
dfr |
Implement the async mount option for NFSv3. This makes NFS pretend that all writes sent to the server were synchronous and therefore no commits are needed. This is the same as the vfs.nfs.async variable on the server but allows each client to choose whether to work this way.
Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e. pretend that the write was synchronous.
|
25930 |
19-May-1997 |
dfr |
Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly) Reviewed by: dyson
|
25781 |
13-May-1997 |
dfr |
Don't keep addresses in mbuf chains. This should simplify the next round of network changes from Garret.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
25664 |
10-May-1997 |
dfr |
Implement a separate control for write gathering on NFSv3. This is turned off for NFSv3 by default since write gathering seems to reduce performance for NFSv3 by up to 60%.
Add sysctl knobs to control both variables.
|
25663 |
10-May-1997 |
dfr |
Fix a nasty hang connected with write gathering. Also add debug print statements to bits of the server which helped me find the hang.
|
25307 |
30-Apr-1997 |
dfr |
Allow NULL rpcs on non-privileged ports at all times to work around broken clients.
PR: kern/3298 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
25201 |
27-Apr-1997 |
wollman |
The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in.
2) Certain protocol entry points are modified to take a process structure, so they they can easily tell whether or not it is possible to sleep, and also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
|
25089 |
22-Apr-1997 |
dfr |
Fix broken usage of nm_readdirsize and increase the socket buffers for UDP to prevent possible socket overflows.
2.2 candidate.
PR: kern/3304 Reviewed by: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
|
24626 |
04-Apr-1997 |
dfr |
Fix various bugs in the locking protocol, allowing proper shared locks to be used. This should fix the lock panics that people are seeing.
|
24381 |
29-Mar-1997 |
bde |
Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs features, and the one thing that it depended on (DIRBLKSIZ) now has conflicting spelling.
|
24378 |
29-Mar-1997 |
bde |
Define our own version of DIRBLKSIZ instead of (ab)using ufs's value. Use the same value of 512 (ufs actually uses DEV_BSIZE). There are too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for nfs, one for ibcs2, one for linux, one for applications, ... I think nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes of all supported file systems. There is also NFS_DIRBLKSIZ, which is different from nfs's DIRBLKSIZ but is sometimes confused with it in comments.
Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs. This came in undocumented with the Lite2 merge although it isn't in Lite2. It required more-bogus #define KERNEL's in fstat and pstat to make the constants visible.
Restored a spelling fix from rev.1.17.
Removed duplicate #defines of all the the NFS mount option flags.
|
24330 |
27-Mar-1997 |
guido |
Add code that will reject nfs requests in teh kernel from nonprivileged ports. This option will be automatically set/cleraed when mount is run without/with the -n option. Reviewed by: Doug Rabson
|
24250 |
25-Mar-1997 |
peter |
Use the correct (relative to the implementation) ordering of args in the VOP_LINK() calls, Closes PR#3064
Submitted by: bde
|
24249 |
25-Mar-1997 |
peter |
The local fs interface does not allow link()/unlink() of directories, do not allow a remote nfs client to cause local fs corruption either.
|
24101 |
22-Mar-1997 |
bde |
Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
|
22975 |
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
22521 |
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
21673 |
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
19449 |
06-Nov-1996 |
dfr |
Improve the queuing algorithms used by NFS' asynchronous i/o. The existing mechanism uses a global queue for some buffers and the vp->b_dirtyblkhd queue for others. This turns sequential writes into randomly ordered writes to the server, affecting both read and write performance. The existing mechanism also copes badly with hung servers, tending to block accesses to other servers when all the iods are waiting for a hung server.
The new mechanism uses a queue for each mount point. All asynchronous i/o goes through this queue which preserves the ordering of requests. A simple mechanism ensures that the iods are shared out fairly between active mount points. This removes the sysctl variable vfs.nfs.dwrite since the new queueing mechanism removes the old delayed write code completely.
This should go into the 2.2 branch.
|
18866 |
11-Oct-1996 |
dfr |
This fixes a problem with the nfs socket handling code which happens if a single process is performing a large number of requests (in this case writing a large file). The writing process could monopolise the recieve lock and prevent any other processes from recieving their replies.
It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the behaviour which originally pointed out the problem. When a process writes to a file over NFS, it usually arranges for another process (the 'iod') to perform the request. If no iods are available, then it turns the write into a 'delayed write' which is later picked up by the next iod to do a write request for that file. This can cause that particular iod to do a disproportionate number of requests from a single process which can harm performance on some NFS servers. The alternative is to perform the write synchronously in the context of the original writing process if no iod is avaiable for asynchronous writing.
The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and the non-delayed behaviour is selected when vfs.nfs.dwrite=0. The default is vfs.nfs.dwrite=1; if many people tell me that performance is better if vfs.nfs.dwrite=0 then I will change the default.
Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
|
18397 |
19-Sep-1996 |
nate |
In sys/time.h, struct timespec is defined as:
/* * Structure defined by POSIX.4 to be like a timeval. */ struct timespec { time_t ts_sec; /* seconds */ long ts_nsec; /* and nanoseconds */ };
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>
|
18041 |
05-Sep-1996 |
dg |
Release an unneeded reference to a vnode that was gained in a VFS_VGET(). Fixes a readdirplus panic.
Submitted by: Doug Rabson <dfr@render.com>
|
18020 |
03-Sep-1996 |
bde |
Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel. Include it directly in the few places where it is used.
Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or nothing.
|
17761 |
21-Aug-1996 |
dyson |
Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistant and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc.
Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility.
Reviewed by: davidg
|
17186 |
16-Jul-1996 |
dfr |
Various fixes from frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca:
1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is correctly send to the server. If a buffer was dirtied when it was in the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left unchanged and when the buffer was later cleaned, just a commit rpc was made to the server to complete the previous write. Clearing B_NEEDCOMMIT ensures that another write is made to the server.
2. If a server returned a server (for whatever reason) returned an answer to a write RPC that implied that fewer bytes than requested were written, bad things would happen.
3. The setattr operation passed on the atime in stead of the mtime to the server. The fix is trivial.
4. XIDs always started at 0, but this caused some servers (older DEC OSF/1 3.0 so I've been told) who had very long-lasting XID caches to get confused if, after a reboot of a BSD client, RPCs came in with a XID that had in the past been used before from that client. Patch is to use the current time in seconds as a starting point for XIDs. The patch below is not perfect, because it requires the root fs to be mounted first. This is because of the check BSD systems do, comparing FS time to system time.
Reviewed by: Bruce Evans, Terry Lambert. Obtained from: frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca
|
17096 |
11-Jul-1996 |
wollman |
Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant t -current and doesn't look at all like my current code.
|
16634 |
23-Jun-1996 |
bde |
Don't truncate minor or major numbers in the nfsv3 client.
|
16365 |
14-Jun-1996 |
phk |
Fix for NFS_NOSERVER
Poul mentioned that he thought this was some kind of timing problem, and that started me thinking. After a little poking around, I found that nfs_timer() was completely disabled when NFS_NOSERVER was #defined. But after looking at nfs_timer(), it seemed like it was something required by both the client and server code, and disabling it outright just didn't seem to make any sense. Parts of it relate only to the NFS server side code, so I disabled those, but I re-enabled the rest of the function and made sure that it would be called from nfs_init() (in nfs_subs.c).
With nfs_timer() re-enabled, everything seems to work again. The only other changes I made were to #ifdef away some variable declarations in the NFS_NOSERVER case so that gcc would stop complaining about unused variables.
Reviewed by: phk Submitted by: Bill Paul <wpaul@skynet.ctr.columbia.edu>
|
16226 |
08-Jun-1996 |
bde |
Fixed a vnode reference leak in nfsrv_rename(). The target inode wasn't released until the file system was unmounted. This bug also affected kern/vfs_syscalls.c but was fixed in rev.1.18 and rev.1.20 there.
Reviewed by: davidg
|
15480 |
30-Apr-1996 |
bde |
#include <sys/filedesc.h> explicitly instead of depending on it being bogusly included by <sys/socketvar.h>.
|
15479 |
30-Apr-1996 |
bde |
Fixed nfs sysctls. They missed out on the fs -> vfs name changes from Lite2. This broke nfsstat.
|
14093 |
13-Feb-1996 |
wollman |
Kill XNS. While we're at it, fix socreate() to take a process argument. (This was supposed to get committed days ago...)
|
13765 |
30-Jan-1996 |
mpp |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
13490 |
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
13416 |
13-Jan-1996 |
phk |
Add an option NFS_NOSERVER which saves 100K in the install kernel (or any other kernel that uses it). Use with option NFS.
|
12911 |
17-Dec-1995 |
phk |
Staticize.
|
12662 |
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
12588 |
03-Dec-1995 |
bde |
Completed function declarations and/or added prototypes and/or moved prototypes to the right place.
|
12457 |
21-Nov-1995 |
bde |
Completed function declarations, added prototypes and removed redundant declarations.
|
12453 |
21-Nov-1995 |
bde |
Completed function declarations and/or added prototypes.
|
12274 |
14-Nov-1995 |
bde |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls.
Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
11982 |
31-Oct-1995 |
joerg |
Include a prerequisite header (so this is consistent again with the NFSv2 state).
|
11921 |
29-Oct-1995 |
phk |
Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
|
10224 |
24-Aug-1995 |
dg |
Added NFS_ASYNC kernel option. It only has an effect for NFSv2.
|
10223 |
24-Aug-1995 |
dg |
Killed redundant declarations of nfsm_rpchead().
|
10222 |
24-Aug-1995 |
dfr |
Some fixes found using gcc -Wall:
nfsm_rpchead() has been called with the wrong number of args and misplaced args since someone added new args in the middle for nfsv3.
Here's another one that would be important on 64-bit systems. VOP_READDIR takes a `u_int **cookies' arg.
Submitted by: Bruce Evans <bde@zeta.org.au>
|
10219 |
24-Aug-1995 |
dfr |
Add support for amd direct maps.
Reviewed by: Thomas Graichen <graichen@sirius.physik.fu-berlin.de>
|
9966 |
06-Aug-1995 |
dg |
Fixed bug where vnode_pager_uncache() wasn't always called when it should be. The result was that the file's space wouldn't be properly freed when it was deleted.
Submitted by: John Dyson
|
9877 |
03-Aug-1995 |
dfr |
Slight changes to locking around VOP_READRIR. Detect in nfsrv_readdirplus when a filesystem soes not support VFS_VGET and return NFSERR_NOTSUPP so that the client will use ordinary readdir instead.
|
9854 |
02-Aug-1995 |
dfr |
Lock the directory vnode before VOP_READDIR in nfsrv_readdirplus
|
9842 |
01-Aug-1995 |
dg |
Removed my special-case hack for VOP_LINK and fixed the problem with the wrong vp's ops vector being used by changing the VOP_LINK's argument order. The special-case hack doesn't go far enough and breaks the generic bypass routine used in some non-leaf filesystems. Pointed out by Kirk McKusick.
|
9759 |
29-Jul-1995 |
bde |
Eliminate sloppy common-style declarations. There should be none left for the LINT configuation.
|
9588 |
20-Jul-1995 |
dg |
vnode_pager_alloc() never returns NULL, so don't check for it.
|
9507 |
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
9456 |
09-Jul-1995 |
dg |
Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
|
9356 |
28-Jun-1995 |
dg |
1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
|
9354 |
28-Jun-1995 |
dg |
Fixed VOP_LINK argument order botch.
|
9336 |
27-Jun-1995 |
dfr |
Changes to support version 3 of the NFS protocol. The version 2 support has been tested (client+server) against FreeBSD-2.0, IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support is stable AFAIK. The version 3 support has been tested with a loopback mount and minimally against an IRIX 5.3 server. It needs more testing and may have problems. I have patched amd to support the new variable length filehandles although it will still only use version 2 of the protocol.
Before booting a kernel with these changes, nfs clients will need to at least build and install /usr/sbin/mount_nfs. Servers will need to build and install /usr/sbin/mountd.
NFS diskless support is untested.
Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
|
9221 |
14-Jun-1995 |
joerg |
The duplicate information returned in fa_type and fa_mode is an ambiguity in the NFS version 2 protocol.
VREG should be taken literally as a regular file. If a server intents to return some type information differently in the upper bits of the mode field (e.g. for sockets, or FIFOs), NFSv2 mandates fa_type to be VNON. Anyway, we leave the examination of the mode bits even in the VREG case to avoid breakage for bogus servers, but we make sure that there are actually type bits set in the upper part of fa_mode (and failing that, trust the va_type field).
NFSv3 cleared the issue, and requires fa_mode to not contain any type information (while also introduing sockets and FIFOs for fa_type).
The fix has been tested against a variety of NFS servers. It fixes problems with the ``Tropic'' NFS server for Windows, while apparently not breaking anything.
Pointed-out by: scott@zorch.sf-bay.org (Scott Hazen Mueller)
|
9202 |
11-Jun-1995 |
rgrimes |
Merge RELENG_2_0_5 into HEAD
|
8876 |
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
8832 |
29-May-1995 |
dg |
Fixed some serious bugs that resulted in object reference counts not being handled correctly. This would manifest itself as "object deallocated too many times" panics and perhaps other strange inconsistencies on NFS servers.
Reviewed by: me, of course Submitted by: John Dyson
|
7969 |
21-Apr-1995 |
dyson |
Slight re-ordering of the creation of a vmio object to fix a condition that can cause NFS I/O failures.
|
7160 |
19-Mar-1995 |
dg |
Removed unnecessary call to vnode_pager_uncache(). We automatically clear the VTEXT flag after all mappers have finished with the object.
|
7110 |
17-Mar-1995 |
dg |
Changed some (incorrect) nfsrv_vput()'s back into regular vput()'s. This fixes the last of the known NQNFS problems (until I find more, that is :-)).
|
7090 |
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
6420 |
15-Feb-1995 |
phk |
YF fix.
|
6417 |
15-Feb-1995 |
dg |
Fixed two more bugs related to the merged cache changes.
Submitted by: John Dyson
|
6416 |
15-Feb-1995 |
dg |
Woops, change a nfsrv_vput back into a nfsrv_vrele.
Submitted by: John Dyson
|
6414 |
15-Feb-1995 |
dg |
Fixed three bugs related to the merged cache changes. The bugs likely would make NFS servers flakey - probably the cause of freefall's recent hangs.
Submitted by: John Dyson
|
6361 |
14-Feb-1995 |
phk |
YFfix +int nfsrv_vput __P(( struct vnode * )); +int nfsrv_vrele __P(( struct vnode * )); +int nfsrv_vmio __P(( struct vnode * ));
|
6210 |
06-Feb-1995 |
dg |
Changed order of release of vnode/object to fix a problem where the vnode is freed with an old object still attached (subsequently causing a panic). Fixes NFS server panic "object/pager mismatch".
Submitted by: John Dyson
|
5455 |
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
4067 |
02-Nov-1994 |
wollman |
Forward-declare a few structures to avoid warning messages.
|
3820 |
23-Oct-1994 |
wollman |
Implement fs.nfs MIB variables.
|
3664 |
17-Oct-1994 |
phk |
This is a bunch of changes from NetBSD. There are a couple of bug-fixes. But mostly it is changes to use the list-maintenance macros instead of doing the pointer-gymnastics by hand.
Obtained from: NetBSD
|
3305 |
02-Oct-1994 |
phk |
Prototyping and general gcc-shutting up. Gcc has one warning now which looks bad, I will get to it eventually, unless somebody beats me to it.
|
3167 |
28-Sep-1994 |
dfr |
Make NFS ask the filesystems for directory cookies instead of making them itself.
|
2997 |
22-Sep-1994 |
wollman |
Make NFS loadable.
|
2979 |
22-Sep-1994 |
wollman |
More loadable VFS changes:
- Make a number of filesystems work again when they are statically compiled (blush)
- FIFOs are no longer optional; ``options FIFO'' removed from distributed config files.
|
2384 |
29-Aug-1994 |
dg |
"bogus" fixes from 1.1.5 to work around some cache coherency problems.
|
2175 |
21-Aug-1994 |
paul |
More idempotency....... this is fun :-)
|
1828 |
04-Aug-1994 |
dg |
Made NFS attribute cache timeouts kernel config file tunable via NFS_MINATTRTIMO and NFS_MAXATTRTIMO.
|
1817 |
02-Aug-1994 |
dg |
Added $Id$
|
1549 |
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
1541 |
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|