History log of /freebsd-10.1-release/sys/fs/nfsclient/
Revision Date Author Comments
272461 03-Oct-2014 gjb

Copy stable/10@r272459 to releng/10.1 as part of
the 10.1-RELEASE process.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


269398 01-Aug-2014 rmacklem

MFC: r268115
Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.


269283 30-Jul-2014 kib

MFC r268764:
Check for the cross-device cross-link attempt in the VFS, instead of
VOP_LINK() implementations.


268580 13-Jul-2014 rmacklem

MFC: r268008
There might be a potential race condition for the NFSv4 client
when a newly created file has another open done on it that
updates the open mode. This patch moves the code that updates
the open mode up into the block where the mutex is held to
ensure this cannot happen. No bug caused by this potential
race has been observed, but this fix is a safety belt to ensure
it cannot happen.


265620 07-May-2014 rmacklem

MFC: r264842
Modify the NFSv4 client's Pathconf RPC (actually a Getattr Op.)
so that it only does the RPC for names that are answered by the RPC.
Doing the RPC for other names is harmless, but unnecessary.


265470 06-May-2014 rmacklem

MFC: r264738
For an NFSv4 mount with the "nocto" option, don't get the
up to date file attributes upon close. This reduces the
Getattr RPC count by about 65% for software builds.


265469 06-May-2014 rmacklem

MFC: r264705, r264749
Modify the NFSv4 client create/mkdir RPC so that it acquires
post-create/mkdir directory attributes. This allows the RPC to
name cache the newly created directory and reduces the lookup RPC
count for applications creating a lot of directories.


265466 06-May-2014 rmacklem

MFC: r264681
Modify the NFSv4 client open/create RPC so that it acquires
post-open/create directory attributes. This allows the RPC to
name cache the newly created file and reduces the lookup RPC
count by about 10% for software builds.


265434 06-May-2014 rmacklem

MFC: r264672
Modify the Lookup RPC for NFSv4 so that it acquires directory
attributes. This allows the client to cache directory names
when they are looked up, reducing the Lookup RPC count by
about 40% for software builds.


260143 31-Dec-2013 rmacklem

MFC: r259801
The NFSv4 client was passing both the p and cred arguments to
nfsv4_fillattr() as NULLs for the Getattr callback. This caused
nfsv4_fillattr() to not fill in the Change attribute for the reply.
I believe this was a violation of the RFC, but had little effect on
server behaviour. This patch passes a non-NULL p argument to fix this.


260109 30-Dec-2013 rmacklem

MFC: r259771
The NFSv4.1 client didn't return NFSv4.1 specific error codes
for the Getattr and Recall callbacks. This patch fixes it.
Since the NFSv4.1 specific error codes would only happen for
abnormal circumstances, this patch has little effect, in practice.


260107 30-Dec-2013 rmacklem

MFC: r259084
For software builds, the NFS client does many small
synchronous (with FILE_SYNC) writes because non-contiguous
byte ranges in the same buffer cache block are being
written. This patch adds a new mount option "noncontigwr"
which allows the non-contiguous byte ranges to be combined,
with the dirty byte range becoming the superset of the bytes
that are dirty, if the file has not been file locked.
This reduces the number of writes significantly for software
builds. The only case where this change might break existing
applications is where an application is writing
non-overlapping byte ranges within the same buffer cache block
of a file from multiple clients concurrently.
Since such an application would normally do file locking on
the file, avoiding the byte range merge for files that have
been file locked should be sufficient for most (maybe all?) cases.
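
The merge rule described above can be sketched in userland C. This is
only an illustration, not the client's code: struct dirty_range and
dirty_range_add() are hypothetical names, and treating adjacent or
overlapping ranges as "contiguous" is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the dirty region of one buffer cache block. */
struct dirty_range {
	int off;	/* first dirty byte */
	int end;	/* one past the last dirty byte; end == off means clean */
};

/*
 * Record a write of [off, end) into the block's dirty region.
 * Returns true when the caller would first have to write the existing
 * dirty region to the server synchronously, false when the new bytes
 * were merged in place.
 */
static bool
dirty_range_add(struct dirty_range *dr, int off, int end,
    bool noncontigwr, bool file_locked)
{
	bool contiguous;

	assert(off < end);
	if (dr->end == dr->off) {
		/* Block is clean: the new write becomes the dirty region. */
		dr->off = off;
		dr->end = end;
		return (false);
	}
	/* Assumption: touching or overlapping ranges count as contiguous. */
	contiguous = off <= dr->end && end >= dr->off;
	if (!contiguous && (!noncontigwr || file_locked))
		return (true);
	/* Grow the dirty region to the superset of old and new bytes. */
	if (off < dr->off)
		dr->off = off;
	if (end > dr->end)
		dr->end = end;
	return (false);
}
```

With noncontigwr set and no file lock, writes of [0, 100) and [200, 300)
leave a single dirty region [0, 300), so the clean bytes in between are
written too. That is why the merge is skipped for files that have been
file locked.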


259238 11-Dec-2013 rmacklem

MFC: r257901
Fix an NFSv4.1 client specific case where a forced dismount would hang.
The hang occurred in nfsv4_setsequence() when it couldn't find an
available session slot and is fixed by checking for a forced dismount
in progress and just returning for this case.


256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation


255219 05-Sep-2013 pjd

Change the cap_rights_t type from uint64_t to a structure that we can extend
in the future in a backward compatible (API and ABI) way.

The cap_rights_t represents capability rights. We used to use one bit to
represent one right, but we are running out of spare bits. Currently the new
structure provides place for 114 rights (so 50 more than the previous
cap_rights_t), but it is possible to grow the structure to hold at least 285
rights, although we can make it even larger if 285 rights won't be enough.

The structure definition looks like this:

struct cap_rights {
	uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
};

The initial CAP_RIGHTS_VERSION is 0.

The top two bits in the first element of the cr_rights[] array contain total
number of elements in the array - 2. This means if those two bits are equal to
0, we have 2 array elements.

The top two bits in all remaining array elements should be 0.
The next five bits in all array elements contain array index. Only one bit is
used and bit position in this five-bits range defines array index. This means
there can be at most five array elements in the future.

To define a new right, the CAPRIGHT() macro must be used. The macro takes two
arguments - an array index and a bit to set, eg.

#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)

We still support aliases that combine a few rights, but the rights have to belong
to the same array element, eg:

#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)

#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)

There is a new API to manage the new cap_rights_t structure:

cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
void cap_rights_set(cap_rights_t *rights, ...);
void cap_rights_clear(cap_rights_t *rights, ...);
bool cap_rights_is_set(const cap_rights_t *rights, ...);

bool cap_rights_is_valid(const cap_rights_t *rights);
void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);

Capability rights to the cap_rights_init(), cap_rights_set(),
cap_rights_clear() and cap_rights_is_set() functions are provided by
separating them with commas, eg:

cap_rights_t rights;

cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);

There is no need to terminate the list of rights, as those functions are
actually macros that take care of the termination, eg:

#define cap_rights_set(rights, ...) \
	__cap_rights_set((rights), __VA_ARGS__, 0ULL)
void __cap_rights_set(cap_rights_t *rights, ...);

Thanks to using one bit as an array index we can assert in those functions that
there are no two rights belonging to different array elements provided
together. For example this is illegal and will be detected, because CAP_LOOKUP
belongs to element 0 and CAP_PDKILL to element 1:

cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);

Providing several rights that belong to the same array element this way is
correct, but is not advised. It should only be used for aliases definition.
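
The index-bit check can be illustrated with a small standalone sketch.
The SK_ names are hypothetical, and the shift of 57 is an assumption
derived from the layout described above (64 bits minus the two size
bits and the five index bits). It is not quoted from the header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the five-bit, one-hot array index field. */
#define	SK_IDX_SHIFT	57
#define	SK_CAPRIGHT(idx, bit)	((1ULL << (SK_IDX_SHIFT + (idx))) | (bit))
#define	SK_IDX_BITS(right)	(((right) >> SK_IDX_SHIFT) & 0x1fULL)

/* Same index/bit pairs as the examples in the text above. */
#define	SK_CAP_LOOKUP	SK_CAPRIGHT(0, 0x0000000000000400ULL)
#define	SK_CAP_FCHMOD	SK_CAPRIGHT(0, 0x0000000000002000ULL)
#define	SK_CAP_PDKILL	SK_CAPRIGHT(1, 0x0000000000000800ULL)

/*
 * True when exactly one index bit is set, i.e. every right OR-ed into
 * "right" belongs to the same array element.
 */
static bool
sk_right_is_single_element(uint64_t right)
{
	uint64_t idx = SK_IDX_BITS(right);

	assert((right >> 62) == 0);	/* size bits must not be set here */
	return (idx != 0 && (idx & (idx - 1)) == 0);
}
```

In this sketch SK_CAP_LOOKUP | SK_CAP_FCHMOD passes the check, because
both carry the element 0 index bit. SK_CAP_LOOKUP | SK_CAP_PDKILL sets
two index bits and is rejected.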

This commit also breaks compatibility with some existing Capsicum system calls,
but I see no other way to do that. This should be fine as Capsicum is still
experimental and this change is not going to 9.x.

Sponsored by: The FreeBSD Foundation


255216 04-Sep-2013 rmacklem

Crashes have been observed for NFSv4.1 mounts when the system
is being shut down which were caused by the nfscbd_pool being
destroyed before the backchannel is disabled. This patch is
believed to fix the problem, by simply avoiding ever destroying
the nfscbd_pool. Since the NFS client module cannot be unloaded,
this should not cause a memory leak.

MFC after: 2 weeks


255136 01-Sep-2013 rmacklem

Forced dismounts of NFS mounts can fail when thread(s) are stuck
waiting for an RPC reply from the server while holding the mount
point busy (mnt_lockref incremented). This happens because dounmount()
msleep()s waiting for mnt_lockref to become 0, before calling
VFS_UNMOUNT(). This patch adds a new VFS operation called VFS_PURGE(),
which the NFS client implements as purging RPCs in progress. Making
this call before checking mnt_lockref fixes the problem, by ensuring
that the VOP_xxx() calls will fail and unbusy the mount point.

Reported by: sbruno
Reviewed by: kib
MFC after: 2 weeks


253049 09-Jul-2013 rmacklem

Add support for host-based (Kerberos 5 service principal) initiator
credentials to the kernel rpc. Modify the NFSv4 client to add
support for the gssname and allgssname mount options to use this
capability. Requires the gssd daemon to be running with the "-h" option.

Reviewed by: jhb


252528 03-Jul-2013 rmacklem

A problem with the old NFS client where large writes to large files
would sometimes result in a corrupted file was reported via email.
This problem appears to have been caused by r251719 (reverting
r251719 fixed the problem). Although I have not been able to
reproduce this problem, I suspect it is caused by another thread
increasing np->n_size after the mtx_unlock(&np->n_mtx) but before
the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes
updates to np->n_size, doing the vnode_pager_setsize() with the
mutex locked appears to avoid the problem.
Unfortunately, vnode_pager_setsize() where the new size is smaller,
cannot be called with a mutex held.
This patch returns the semantics to be close to pre-r251719 (actually
pre-r248567, r248581, r248567 for the new client) such that the call to
vnode_pager_setsize() is only delayed until after the mutex is
unlocked when np->n_size is shrinking. Since the file is growing
when being written, I believe this will fix the corruption.
A better solution might be to replace the mutex with a sleep lock,
but that is a non-trivial conversion, so this fix is hoped to be
sufficient in the meantime.

Reported by: David G. Lawrence (dg@dglawrence.com)
Tested by: David G. Lawrence (to be done soon)
Reviewed by: kib
MFC after: 1 week


252100 22-Jun-2013 rmacklem

Fix r252074 so that it builds on 64bit arches.


252074 21-Jun-2013 rmacklem

The NFSv4.1 LayoutCommit operation requires a valid offset and length.
(0, 0 is not sufficient.) This patch adds a loop for each file layout, using
the offset, length of each file layout in a separate LayoutCommit.


252072 21-Jun-2013 rmacklem

When the NFSv4.1 client is writing to a pNFS Data Server (DS), the
file's size attribute does not get updated. As such, it is necessary
to invalidate the attribute cache before clearing NMODIFIED for pNFS.

MFC after: 2 weeks


252067 21-Jun-2013 rmacklem

Since some NFSv4 servers enforce the requirement for a reserved port#,
enable use of the (no)resvport mount option for NFSv4. I had thought
that the RFC required that non-reserved port #s be allowed, but I couldn't
find it in the RFC.

MFC after: 2 weeks


251171 31-May-2013 jeff

- Convert the bufobj lock to rwlock.
- Use a shared bufobj lock in getblk() and inmem().
- Convert softdep's lk to rwlock to match the bufobj lock.
- Move INFREECNT to b_flags and protect it with the buf lock.
- Remove unnecessary locking around bremfree() and BKGRDINPROG.

Sponsored by: EMC / Isilon Storage Division
Discussed with: mckusick, kib, mdf


251079 28-May-2013 rmacklem

Post-r248567, there were times when the client would return a
truncated directory for some NFS servers. This turned out to
be because the size of a directory reported by an NFS server
can be smaller than the ufs-like directory created from the
RPC XDR in the client. This patch fixes the problem by changing
r248567 so that vnode_pager_setsize() is only done for regular files.

Reported and tested by: hartmut.brandt@dlr.de
Reviewed by: kib
MFC after: 1 week


250580 12-May-2013 rmacklem

Add support for the eofflag to nfs_readdir() in the new NFS
client so that it works under a unionfs mount.

Submitted by: Jared Yanovich (slovichon@gmail.com)
Reviewed by: kib
MFC after: 2 weeks


249630 18-Apr-2013 rmacklem

When an NFS unmount occurs, once vflush() writes the last dirty
buffer for the last vnode on the mount back to the server, it
returns. At that point, the code continues with the unmount,
including freeing up the nfs specific part of the mount structure.
It is possible that an nfsiod thread will try to check for an
empty I/O queue in the nfs specific part of the mount structure
after it has been free'd by the unmount. This patch avoids this problem by
setting the iodmount entries for the mount back to NULL while holding the
mutex in the unmount and checking the appropriate entry is non-NULL after
acquiring the mutex in the nfsiod thread.

Reported and tested by: pho
Reviewed by: kib
MFC after: 2 weeks


249623 18-Apr-2013 rmacklem

Both NFS clients can deadlock when using the "rdirplus" mount
option. This can occur when an nfsiod thread that already holds
a buffer lock attempts to acquire a vnode lock on an entry in
the directory (a LOR) when another thread holding the vnode lock
is waiting on an nfsiod thread. This patch avoids the deadlock by disabling
readahead for this case, so the nfsiod threads never do readdirplus.
Since readaheads for directories need the directory offset cookie
from the previous read, they cannot normally happen in parallel.
As such, testing by jhb@ and myself didn't find any performance
degradation when this patch is applied. If there is a case where
this results in a significant performance degradation, mounting
without the "rdirplus" option can be done to re-enable readahead
for directories.

Reported and tested by: jhb
Reviewed by: jhb
MFC after: 2 weeks


249592 17-Apr-2013 ken

Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes). It does not currently work for NFSv4 requests. They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately. Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads. This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered. Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS. Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once. ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking. This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables. The read offset window calculation has been slightly
modified as well. Instead of having separate bins, each file
handle has a rolling window of bin_shift size. This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c
when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
Bring in changes from Rick Macklem to newnfs_realign that
allow it to operate in blocking (M_WAITOK) or non-blocking
(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
Bring in a change from Rick Macklem to allow telling
nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
Bring in changes from Rick Macklem to create a new
nfsm_dissect_nonblock() inline function and
NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
Setup the new NFS server's RPC thread pool so that it will
call the FHA code.

Add the malloc flag argument to newnfs_realign().

Unstaticize newnfs_nfsv3_procid[] so that we can use it in
the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
In nfsd_fhtovp(), if we're starting a write, check to see
whether the underlying filesystem supports shared writes.
If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
Remove all code that is specific to the NFS server
implementation. Anything that is server-specific is now
accessed through a callback supplied by that server's FHA
shim in the new softc.

There are now separate sysctls and tunables for the FHA
implementations for the old and new NFS servers. The new
NFS server has its tunables under vfs.nfsd.fha, the old
NFS server's tunables are under vfs.nfsrv.fha as before.

In fha_extract_info(), use callouts for all server-specific
code. Getting file handles and offsets is now done in the
individual server's shim module.

In fha_hash_entry_choose_thread(), change the way we decide
whether two reads are in proximity to each other.
Previously, the calculation was a simple shift operation to
see whether the offsets were in the same power of 2 bucket.
The issue was that there would be a bucket (and therefore
thread) transition, even if the reads were in close
proximity. When there is a thread transition, reads wind
up going somewhat out of order, and ZFS gets confused.

The new calculation simply tries to see whether the offsets
are within 1 << bin_shift of each other. If they are, the
reads will be sent to the same thread.

The effect of this change is that for sequential reads, if
the client doesn't exceed the max_reqs_per_nfsd parameter
and the bin_shift is set to a reasonable value (22, or
4MB works well in my tests), the reads in any sequential
stream will largely be confined to a single thread.
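
The difference between the two proximity tests can be sketched as
follows. The function names are hypothetical, and using a strict
"less than" at the window boundary is an assumption. The text above
only says "within 1 << bin_shift".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test: both offsets fall into the same power of 2 bucket. */
static bool
sk_same_bin(uint64_t a, uint64_t b, unsigned int bin_shift)
{
	assert(bin_shift < 64);
	return ((a >> bin_shift) == (b >> bin_shift));
}

/* New test: the offsets are less than 1 << bin_shift bytes apart. */
static bool
sk_within_window(uint64_t a, uint64_t b, unsigned int bin_shift)
{
	uint64_t dist;

	assert(bin_shift < 64);
	dist = a > b ? a - b : b - a;
	return (dist < (1ULL << bin_shift));
}
```

With bin_shift = 22, a read at offset 4MB - 4K followed by one at 4MB
lands in different buckets under the old test, but is treated as
nearby by the rolling window, so no thread transition is forced.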

Change fha_assign() so that it takes a softc argument. It
is now called from the individual server's shim code, which
will pass in the softc.

Change fhe_stats_sysctl() so that it takes a softc
parameter. It is now called from the individual server's
shim code. Add the current offset to the list of things
printed out about each active thread.

Change the num_reads and num_writes counters in the
fha_hash_entry structure to 32-bit values, and rename them
num_rw and num_exclusive, respectively, to reflect their
changed usage.

Add an enable sysctl and tunable that allows the user to
disable the FHA code (when vfs.XXX.fha.enable = 0). This
is useful for before/after performance comparisons.

nfs_fha.h:
Move most structure definitions out of nfs_fha.c and into
the header file, so that the individual server shims can
see them.

Change the default bin_shift to 22 (4MB) instead of 18
(256K). Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
Add shims for the old and new NFS servers to interface with
the FHA code, and callbacks for the

The shims contain all of the code and definitions that are
specific to the NFS servers.

They setup the server-specific callbacks and set the server
name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
Configure the RPC code to call fhaold_assign() instead of
fha_assign().

sys/modules/nfsd/Makefile:
Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
Add nfs_fha_old.c.

Reviewed by: rmacklem
Sponsored by: Spectra Logic
MFC after: 2 weeks


248967 01-Apr-2013 kib

Strip the unneeded spaces, mostly at the end of lines.

MFC after: 3 days


248581 21-Mar-2013 kib

Initialize the variable to avoid (false) compiler warning about
use of an uninitialized local.

Reported by: Ivan Klymenko <fidaj@ukr.net>
MFC after: 2 weeks


248567 21-Mar-2013 kib

Do not call vnode_pager_setsize() while a NFS node mutex is
locked. vnode_pager_setsize() might sleep waiting for the page after
EOF to be unbusied.

Call vnode_pager_setsize() both for the regular and directory vnodes.

Reported by: mich
Reviewed by: rmacklem
Discussed with: avg, jhb
MFC after: 2 weeks


248500 19-Mar-2013 emaste

Fix remainder calculation when biosize is not a power of 2

In common configurations biosize is a power of two, but is not required to
be so. Thanks to markj@ for spotting an additional case beyond my original
patch.

Reviewed by: rmacklem@
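
As an illustration of why the power of two assumption matters, the
sketch below compares a mask-based remainder with a true modulo. The
commit message does not show the code that was changed, so the mask
form here is an assumed example of the kind of shortcut that breaks.
The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shortcut that is only correct when biosize is a power of 2. */
static uint64_t
sk_rem_mask(uint64_t off, uint64_t biosize)
{
	assert(biosize != 0);
	return (off & (biosize - 1));
}

/* General remainder, correct for any non-zero biosize. */
static uint64_t
sk_rem_mod(uint64_t off, uint64_t biosize)
{
	assert(biosize != 0);
	return (off % biosize);
}
```

For off = 40000 and biosize = 32768 both forms give 7232. For
biosize = 24576 the modulo gives 15424 while the mask still gives 7232.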


248255 13-Mar-2013 jhb

Revert 195703 and 195821 as this special stop handling in NFS is now
implemented via VFCF_SBDRY rather than passing PBDRY to individual
sleep calls.


248084 09-Mar-2013 attilio

Switch the vm_object mutex to be a rwlock. This will enable in the
future further optimizations where the vm_object lock will be held
in read mode most of the time the page cache resident pool of pages
are accessed for reading purposes.

The change is mostly mechanical, but a few notes are reported:
* The KPI changes as follows:
- VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK()
- VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK()
- VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK()
- VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED()
(in order to avoid visibility of implementation details)
- The read-mode operations are added:
VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(),
VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED()
* The vm/vm_pager.h namespace pollution avoidance (forcing requiring
sys/mutex.h in consumers directly to cater its inlining functions
using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h
consumers now must include also sys/rwlock.h.
* zfs requires a quite convoluted fix to include FreeBSD rwlocks into
the compat layer because the name clash between FreeBSD and solaris
versions must be avoided.
For this purpose zfs redefines the vm_object locking functions
directly, isolating the FreeBSD components in specific compat stubs.

The KPI is heavily broken by this commit. Third-party ports must
be updated accordingly (I can think off-hand of VirtualBox, for example).

Sponsored by: EMC / Isilon storage division
Reviewed by: jeff
Reviewed by: pjd (ZFS specific review)
Discussed with: alc
Tested by: pho


247602 02-Mar-2013 pjd

Merge Capsicum overhaul:

- Capability is no longer separate descriptor type. Now every descriptor
has set of its own capability rights.

- The cap_new(2) system call is left, but it is no longer documented and
should not be used in new code.

- The new syscall cap_rights_limit(2) should be used instead of
cap_new(2), which limits capability rights of the given descriptor
without creating a new one.

- The cap_getrights(2) syscall is renamed to cap_rights_get(2).

- If CAP_IOCTL capability right is present we can further reduce allowed
ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed
ioctls can be retrieved with cap_ioctls_get(2) syscall.

- If CAP_FCNTL capability right is present we can further reduce fcntls
that can be used with the new cap_fcntls_limit(2) syscall and retrieve
them with cap_fcntls_get(2).

- To support ioctl and fcntl white-listing the filedesc structure was
heavily modified.

- The audit subsystem, kdump and procstat tools were updated to
recognize new syscalls.

- Capability rights were revised and even though I tried hard to provide
backward API and ABI compatibility there are some incompatible changes
that are described in detail below:

CAP_CREATE old behaviour:
- Allow for openat(2)+O_CREAT.
- Allow for linkat(2).
- Allow for symlinkat(2).
CAP_CREATE new behaviour:
- Allow for openat(2)+O_CREAT.

Added CAP_LINKAT:
- Allow for linkat(2). ABI: Reuses CAP_RMDIR bit.
- Allow to be target for renameat(2).

Added CAP_SYMLINKAT:
- Allow for symlinkat(2).

Removed CAP_DELETE. Old behaviour:
- Allow for unlinkat(2) when removing non-directory object.
- Allow to be source for renameat(2).

Removed CAP_RMDIR. Old behaviour:
- Allow for unlinkat(2) when removing directory.

Added CAP_RENAMEAT:
- Required for source directory for the renameat(2) syscall.

Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR):
- Allow for unlinkat(2) on any object.
- Required if target of renameat(2) exists and will be removed by this
call.

Removed CAP_MAPEXEC.

CAP_MMAP old behaviour:
- Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and
PROT_WRITE.
CAP_MMAP new behaviour:
- Allow for mmap(2)+PROT_NONE.

Added CAP_MMAP_R:
- Allow for mmap(PROT_READ).
Added CAP_MMAP_W:
- Allow for mmap(PROT_WRITE).
Added CAP_MMAP_X:
- Allow for mmap(PROT_EXEC).
Added CAP_MMAP_RW:
- Allow for mmap(PROT_READ | PROT_WRITE).
Added CAP_MMAP_RX:
- Allow for mmap(PROT_READ | PROT_EXEC).
Added CAP_MMAP_WX:
- Allow for mmap(PROT_WRITE | PROT_EXEC).
Added CAP_MMAP_RWX:
- Allow for mmap(PROT_READ | PROT_WRITE | PROT_EXEC).

Renamed CAP_MKDIR to CAP_MKDIRAT.
Renamed CAP_MKFIFO to CAP_MKFIFOAT.
Renamed CAP_MKNOD to CAP_MKNODAT.

CAP_READ old behaviour:
- Allow pread(2).
- Disallow read(2), readv(2) (if there is no CAP_SEEK).
CAP_READ new behaviour:
- Allow read(2), readv(2).
- Disallow pread(2) (CAP_SEEK was also required).

CAP_WRITE old behaviour:
- Allow pwrite(2).
- Disallow write(2), writev(2) (if there is no CAP_SEEK).
CAP_WRITE new behaviour:
- Allow write(2), writev(2).
- Disallow pwrite(2) (CAP_SEEK was also required).

Added convenient defines:

#define CAP_PREAD (CAP_SEEK | CAP_READ)
#define CAP_PWRITE (CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_R (CAP_MMAP | CAP_SEEK | CAP_READ)
#define CAP_MMAP_W (CAP_MMAP | CAP_SEEK | CAP_WRITE)
#define CAP_MMAP_X (CAP_MMAP | CAP_SEEK | 0x0000000000000008ULL)
#define CAP_MMAP_RW (CAP_MMAP_R | CAP_MMAP_W)
#define CAP_MMAP_RX (CAP_MMAP_R | CAP_MMAP_X)
#define CAP_MMAP_WX (CAP_MMAP_W | CAP_MMAP_X)
#define CAP_MMAP_RWX (CAP_MMAP_R | CAP_MMAP_W | CAP_MMAP_X)
#define CAP_RECV CAP_READ
#define CAP_SEND CAP_WRITE

#define CAP_SOCK_CLIENT \
(CAP_CONNECT | CAP_GETPEERNAME | CAP_GETSOCKNAME | CAP_GETSOCKOPT | \
CAP_PEELOFF | CAP_RECV | CAP_SEND | CAP_SETSOCKOPT | CAP_SHUTDOWN)
#define CAP_SOCK_SERVER \
(CAP_ACCEPT | CAP_BIND | CAP_GETPEERNAME | CAP_GETSOCKNAME | \
CAP_GETSOCKOPT | CAP_LISTEN | CAP_PEELOFF | CAP_RECV | CAP_SEND | \
CAP_SETSOCKOPT | CAP_SHUTDOWN)

Added defines for backward API compatibility:

#define CAP_MAPEXEC CAP_MMAP_X
#define CAP_DELETE CAP_UNLINKAT
#define CAP_MKDIR CAP_MKDIRAT
#define CAP_RMDIR CAP_UNLINKAT
#define CAP_MKFIFO CAP_MKFIFOAT
#define CAP_MKNOD CAP_MKNODAT
#define CAP_SOCK_ALL (CAP_SOCK_CLIENT | CAP_SOCK_SERVER)

Sponsored by: The FreeBSD Foundation
Reviewed by: Christoph Mallon <christoph.mallon@gmx.de>
Many aspects discussed with: rwatson, benl, jonathan
ABI compatibility discussed with: kib


247116 21-Feb-2013 jhb

Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing. Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed. For now, only the NFS clients set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.

PR: kern/176179
Reported by: Russell Cattelan (sigdeferstop() recursion)
Reviewed by: kib
MFC after: 1 month


247072 21-Feb-2013 imp

The request queue is already locked, so we don't need the splsoftclock/splx
here to note future work.


245977 27-Jan-2013 kib

Be conservative and do not try to consume more bytes than was
requested from the server for the read operation. Server shall not
reply with too large size, but client should be resilient too.

Reviewed by: rmacklem
MFC after: 1 week
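
The defensive clamp described above can be sketched in userland C.
This is a minimal illustration with a hypothetical function name. The
real client works on mbuf chains and uio structures, which are not
modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy read data from a reply into the caller's buffer, never
 * consuming more bytes than the client originally requested, even if
 * the server claims to have returned more.
 */
static size_t
sk_consume_read_reply(char *dst, size_t requested, const char *reply,
    size_t reply_len)
{
	size_t n;

	assert(dst != NULL && reply != NULL);
	n = reply_len > requested ? requested : reply_len;
	memcpy(dst, reply, n);
	return (n);
}
```

A reply that is too large is truncated to the requested size, and a
short reply is passed through unchanged.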


245909 25-Jan-2013 jhb

Further cleanups to use of timestamps in NFS:
- Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds
portion of wall-time stamps to manage timeouts on events.
- Remove unused nd_starttime from the per-request structure in the new
NFS server.
- Use nanotime() for the modification time on a delegation to get as
precise a time as possible.
- Use time_second instead of extracting the second from a call to
getmicrotime().

Submitted by: bde (3)
Reviewed by: bde, rmacklem
MFC after: 2 weeks


245611 18-Jan-2013 jhb

Use vfs_timestamp() to set file timestamps rather than invoking
getmicrotime() or getnanotime() directly in NFS.

Reviewed by: rmacklem, bde
MFC after: 1 week


245566 17-Jan-2013 jhb

Remove a no-longer-used variable after the previous change to use
VA_UTIMES_NULL.

Submitted by: bde, rmacklem
MFC after: 1 week


245508 16-Jan-2013 jhb

Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes()
instead of comparing the desired time against the current time as a
heuristic.

Reviewed by: rmacklem
MFC after: 1 week


244056 09-Dec-2012 rmacklem

Add "nfsstat -m" support for the two new NFS mount options
added by r244042.


244042 08-Dec-2012 rmacklem

Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detected during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.


243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


243782 02-Dec-2012 rmacklem

Add an nfssvc() option to the kernel for the new NFS client
which dumps out the actual options being used by an NFS mount.
This will be used to implement a "-m" option for nfsstat(1).

Reviewed by: alfred
MFC after: 2 weeks


243311 19-Nov-2012 attilio

r16312 has not been accurate for many years (likely since VFS
received granular locking), but the comment present in UFS has been
copied into the code of other filesystems incorrectly several times.

Remove comments that make no sense now.

Reviewed by: kib
MFC after: 3 days


243142 16-Nov-2012 kib

In pget(9), if PGET_NOTWEXIT flag is not specified, also search the
zombie list for the pid. This allows several kern.proc sysctls to
report useful information for zombies.

Hold the allproc_lock around all searches instead of relocking it.
Remove private pfind_locked() from the new nfs client code.

Requested and reviewed by: pjd
Tested by: pho
MFC after: 3 weeks


242833 09-Nov-2012 attilio

Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.
Porters should refer to __FreeBSD_version 1000021 for this change as
it may have happened at the same timeframe.


240720 20-Sep-2012 rmacklem

Modify the NFSv4 client so that it can handle owner
and owner_group strings that consist entirely of
digits, interpreting them as the uid/gid number.
This change was needed since new (>= 3.3) Linux
servers reply with these strings by default.
This change is mandated by the rfc3530bis draft.
Reported on freebsd-stable@ under the Subject
heading "Problem with Linux >= 3.3 as NFSv4 server"
by Norbert Aschendorff on Aug. 20, 2012.

Tested by: norbert.aschendorff at yahoo.de
Reviewed by: jhb
MFC after: 2 weeks
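
The digits-only owner handling described above can be sketched as follows.
This is a minimal illustration, not the committed code: the helper name
owner_str_to_id() and its signature are assumptions made for this example.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: if the owner string consists entirely of decimal
 * digits, store its numeric value in *idp and return true. Otherwise
 * return false, so that the caller can fall back to a name-based lookup.
 */
static bool
owner_str_to_id(const char *str, size_t len, uint32_t *idp)
{
	uint64_t val = 0;
	size_t i;

	if (len == 0)
		return (false);
	for (i = 0; i < len; i++) {
		if (!isdigit((unsigned char)str[i]))
			return (false);
		val = val * 10 + (uint64_t)(str[i] - '0');
		if (val > UINT32_MAX)
			return (false);	/* does not fit in a 32-bit id */
	}
	*idp = (uint32_t)val;
	return (true);
}
```

A string such as "1001" would be taken as uid/gid 1001, while a string
containing any non-digit character would go through the usual name mapping.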


240289 09-Sep-2012 rmacklem

Add a simple printf() based debug facility to the new nfs client.
Use it for a printf() that can be harmlessly generated for mmap()'d
files. It will be used extensively for the NFSv4.1 client.
Debugging printf()s are enabled by setting vfs.nfs.debuglevel to
a non-zero value. The higher the value, the more debugging printf()s.

Reviewed by: jhb
MFC after: 2 weeks


239246 14-Aug-2012 kib

Do not leave invalid pages in the object after a short read for
network file systems (not only NFS proper). Short reads cause pages
other than the requested one, which were not filled by the read response,
to stay invalid.

Change the vm_page_readahead_finish() interface to not take the error
code, but instead to make a decision to free or to (de)activate the
page only by its validity. As a result, invalid pages that were not
requested are freed even if the read RPC indicated success.

Noted and reviewed by: alc
MFC after: 1 week


239065 05-Aug-2012 kib

After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason
to pull vm_param.h was removed. The other big dependency of vm_page.h on
vm_param.h is the PA_LOCK* definitions, which are only needed for
in-kernel code, because modules use KBI-safe functions to lock the
pages.

Stop including vm_param.h into vm_page.h. Include vm_param.h
explicitly for the kernel code which needs it.

Suggested and reviewed by: alc
MFC after: 2 weeks


239040 04-Aug-2012 kib

Reduce code duplication and exposure of direct access to struct
vm_page oflags by providing helper function
vm_page_readahead_finish(), which handles completed reads for pages
with indexes other than the requested one, for VOP_GETPAGES().

Reviewed by: alc
MFC after: 1 week


237987 02-Jul-2012 kib

Do not override an error from uiomove() with the (non-)error result from
bwrite(). VFS needs to know about EFAULT from uiomove() and does not
care much that the writeback of the partially filled block after EFAULT
was successful. An early return without error causes a short write to be
reported to usermode.

Reported and tested by: andreast
MFC after: 3 weeks


237367 21-Jun-2012 kib

Enable deadlock avoidance code for NFS client.

MFC after: 2 weeks


237244 18-Jun-2012 rmacklem

Fix the NFSv4 client for the case where mmap'd files are
written, but not msync'd by a process. A VOP_PUTPAGES()
called when VOP_RECLAIM() happens will usually fail, since
the NFSv4 Open has already been closed by VOP_INACTIVE().
Add a vm_object_page_clean() call to the NFSv4 client's
VOP_INACTIVE(), so that the write happens before the NFSv4
Open is closed. kib@ suggested using vgone() instead and
I will explore this, but this patch fixes things in the
meantime. For some reason, the VOP_PUTPAGES() is still
attempted in VOP_RECLAIM(), but having this fail doesn't
cause any problems except a "stateid0 in write" being logged.

Reviewed by: kib
MFC after: 1 week


237200 17-Jun-2012 rmacklem

Move the nfsrpc_close() call in ncl_reclaim() for the NFSv4 client
to below the vnode_destroy_vobject() call, since that is where
writes are flushed.

Suggested by: kib
MFC after: 1 week


236687 06-Jun-2012 kib

Improve handling of uiomove(9) errors for the NFS client.

Do not brelse() the buffer unconditionally with BIO_ERROR set if
uiomove() failed. The brelse() treats most buffers with BIO_ERROR as
B_INVAL, dropping their content. Instead, if the write request
covered the whole buffer, remember the cached state and brelse() with
BIO_ERROR set only if the buffer was not cached previously.

Update the buffer dirtyoff/dirtyend based on the progress recorded by
uiomove() in passed struct uio, even in the presence of
error. Otherwise, usermode could see changed data in the backed pages,
but later the buffer is destroyed without write-back.

If uiomove() failed for IO_UNIT request, try to truncate the vnode
back to the pre-write state, and rewind the progress in passed uio
accordingly, following the FFS behaviour.

Reviewed by: rmacklem (some time ago)
Tested by: pho
MFC after: 1 month


236313 30-May-2012 kib

Capitalize start of sentence.

MFC after: 3 days


235332 12-May-2012 rmacklem

PR# 165923 reported intermittent write failures for dirty
memory mapped pages being written back on an NFS mount.
Since any thread can call VOP_PUTPAGES() to write back a
dirty page, the credentials of that thread may not have
write access to the file on an NFS server. (Often the uid
is 0, which may be mapped to "nobody" in the NFS server.)
Although there is no completely correct fix for this
(NFS servers check access on every write RPC instead of at
open/mmap time), this patch avoids the common cases by
holding onto a credential that recently opened the file
for writing and uses that credential for the write RPCs
being done by VOP_PUTPAGES() for both NFS clients.

Tested by: Joel Ray Holveck (joelh at juniper.net)
PR: kern/165923
Reviewed by: kib
MFC after: 2 weeks


234742 27-Apr-2012 rmacklem

It was reported via email that some non-FreeBSD NFS servers
do not include file attributes in the reply to an NFS create RPC
under certain circumstances.
This resulted in a vnode of type VNON that was not usable.
This patch adds an NFS getattr RPC to nfs_create() for this case,
to fix the problem. It was tested by the person that reported
the problem and confirmed to fix this case for their server.

Tested by: Steven Haber (steven.haber at isilon.com)
MFC after: 2 weeks


234605 23-Apr-2012 trasz

Remove unused thread argument from vtruncbuf().

Reviewed by: kib


234386 17-Apr-2012 mckusick

Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.
The primary changes are that the user of the interface no longer
needs to manage the mount-mutex locking and that the vnode that
is returned has its mutex locked (thus avoiding the need to check
to see if it is DOOMED or in another end-of-life scenario).

To minimize compatibility issues for third-party developers, the
old MNT_VNODE_FOREACH interface will remain available so that this
change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH
will be removed in head.

The reason for this update is to prepare for the addition of the
MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the
active vnodes associated with a mount point (typically less than
1% of the vnodes associated with the mount point).

Reviewed by: kib
Tested by: Peter Holm
MFC after: 2 weeks


233101 17-Mar-2012 kib

Add sysctl vfs.nfs.nfs_keep_dirty_on_error to switch the nfs client
behaviour on error from write RPC back to behaviour of old nfs client.
When set to a non-zero value, the pages for which the write failed are kept dirty.

PR: kern/165927
Reviewed by: alc
MFC after: 2 weeks


232420 03-Mar-2012 rmacklem

Post r230394, the Lookup RPC counts for both NFS clients increased
significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.

Reported by: bde
Discussed with: kib
Reviewed by: jhb
MFC after: 2 weeks


232327 01-Mar-2012 rmacklem

Fix the NFS clients so that they use copyin() instead of bcopy(),
when doing direct I/O. This direct I/O code is not enabled by default.

Submitted by: kib (earlier version)
Reviewed by: kib
MFC after: 1 week


231949 21-Feb-2012 kib

Fix found places where uio_resid is truncated to int.

Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the
sysctl to zero allows SSIZE_MAX-sized i/o requests to be performed from
usermode.

Discussed with: bde, das (previous versions)
MFC after: 1 month


231852 17-Feb-2012 bz

Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:

Extend the so far IPv4-only support for multiple routing tables (FIBs)
introduced in r178888 to IPv6 providing feature parity.

This includes an extended rtalloc(9) KPI for IPv6, the necessary
adjustments to the network stack, and user land support as in netstat.

Sponsored by: Cisco Systems, Inc.
Reviewed by: melifaro (basically)
MFC after: 10 days


231133 07-Feb-2012 rmacklem

r228827 fixed a problem where copying of NFSv4 open credentials into
a credential structure would corrupt it. This happened when the
p argument was != NULL. However, I now realize that the copying of
open credentials should only happen for p == NULL, since that indicates
that it is a read-ahead or write-behind. This patch fixes this.
After this commit, r228827 could be reverted, but I think the code is
clearer and safer with the patch, so I am going to leave it in.
Without this patch, it was possible that a NFSv4 VOP_SETATTR() could have
changed the credentials of the caller. This would have happened if
the process doing the VOP_SETATTR() did not have the file open, but
some other process running as a different uid had the file open for writing
at the same time.

MFC after: 5 days


231088 06-Feb-2012 jhb

Rename cache_lookup_times() to cache_lookup() and retire the old API and
ABI stub for cache_lookup().


231075 06-Feb-2012 kib

Current implementations of sync(2) and the syncer vnode fsync() VOP use the
mnt_noasync counter to temporarily remove the MNTK_ASYNC mount option, which
is needed to guarantee a synchronous completion of the initiated i/o
before syscall or VOP return. Global removal of MNTK_ASYNC option is
harmful because not only i/o started from corresponding thread becomes
synchronous, but all i/o is synchronous on the filesystem which is
initiated during sync(2) or syncer activity.

Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local
thread flag to disable async i/o for current thread only. Use the
opportunity to move DOINGASYNC() macro into sys/vnode.h and
consistently use it through places which tested for MNTK_ASYNC.

Some testing demonstrated 60-70% improvements in run time for the
metadata-intensive operations on async-mounted UFS volumes, but still
with great deviation due to other reasons.

Reviewed by: mckusick
Tested by: scottl
MFC after: 2 weeks


230803 31-Jan-2012 rmacklem

When a "mount -u" switches an NFS mount point from TCP to UDP,
any thread doing an I/O RPC with a transfer size greater than
NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC.
After a discussion on freebsd-fs@, I decided to add a warning
message for this case, as suggested by Jeremy Chadwick.

Suggested by: freebsd at jdc.parodius.com (Jeremy Chadwick)
MFC after: 2 weeks


230605 27-Jan-2012 rmacklem

A problem with respect to data read through the buffer cache for both
NFS clients was reported to freebsd-fs@ under the subject "NFS
corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when
a TCP mounted root fs was changed to using UDP. I believe that this
problem was caused by the change in mnt_stat.f_iosize that occurred
because rsize was decreased to the maximum supported by UDP. This
patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize,
since bo_bsize is set to f_iosize when the vnode is allocated, but
does not change for a given vnode when f_iosize changes.

Reported by: pjd
Reviewed by: kib
MFC after: 2 weeks


230559 26-Jan-2012 rmacklem

Revert r230516, since it doesn't really fix the problem.


230552 25-Jan-2012 kib

Fix remaining calls to cache_enter() in both NFS clients to provide
appropriate timestamps. Restore the assertions which verify that
NCF_TS is set when timestamp is asked for.

Reviewed by: jhb (previous version)
MFC after: 2 weeks


230547 25-Jan-2012 jhb

Add a timeout on positive name cache entries in the NFS client. That is,
we will only trust a positive name cache entry for a specified amount of
time before falling back to a LOOKUP RPC, even if the ctime for the file
handle matches the cached copy in the name cache entry. The timeout is
configured via a new 'nametimeo' mount option and defaults to 60 seconds.
It may be set to zero to disable positive name caching entirely.

Reviewed by: rmacklem
MFC after: 1 week
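
The validity rule described above can be sketched as a small predicate.
This is only an illustration under stated assumptions: the structure
nc_entry_sketch, its fields, and the function name are hypothetical and do
not reflect the kernel's actual name cache layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-entry bookkeeping (assumption, not the real layout). */
struct nc_entry_sketch {
	int64_t ctime_sec;	/* ctime recorded when the entry was added */
	int64_t added_sec;	/* time at which the entry was added */
};

/*
 * Trust a positive entry only if positive caching is enabled, the entry
 * is younger than the timeout, and the recorded ctime still matches.
 * A false return means the caller should fall back to a LOOKUP RPC.
 */
static bool
nc_entry_trusted(const struct nc_entry_sketch *e, int64_t cur_ctime,
    int64_t now, int64_t nametimeo)
{
	if (nametimeo == 0)
		return (false);		/* positive caching disabled */
	if (now - e->added_sec >= nametimeo)
		return (false);		/* too old, even if ctime matches */
	return (e->ctime_sec == cur_ctime);
}
```

The point of the sketch is the ordering: the age check rejects an entry
before the ctime comparison gets a chance to accept it.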


230516 25-Jan-2012 rmacklem

If a mount -u is done to either NFS client that switches it
from TCP to UDP and the rsize/wsize/readdirsize is greater
than NFS_MAXDGRAMDATA, it is possible for a thread doing an
I/O RPC to get stuck repeatedly doing retries. This happens
because the RPC will use an rsize/wsize/readdirsize that won't
work for UDP and, as such, it will keep failing indefinitely.
This patch returns an error for this case, to avoid the problem.
A discussion on freebsd-fs@ seemed to indicate that returning
an error was preferable to silently ignoring the "udp"/"mntudp"
option.
This problem was discovered while investigating a problem reported
by pjd@ via email.

MFC after: 2 weeks


230394 20-Jan-2012 jhb

Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client. The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries. However, if there are multiple entries for a single vnode,
they all share a single timestamp. To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry. The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode. Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache. The latter is subject to races with other
lookups updating the attribute cache concurrently. Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
cache does not store name cache entries for ".", so we cannot get a
useful timestamp. It didn't really make much sense to recheck the
attributes on the directory to validate the namecache hit for "."
anyway.
- ABI compat shims for the name cache routines are present in this commit
so that it is safe to MFC.

MFC after: 2 weeks


230249 17-Jan-2012 mckusick

Make sure all intermediate variables holding mount flags (mnt_flag)
and that all internal kernel calls passing mount flags are declared
as uint64_t so that flags in the top 32-bits are not lost.

MFC after: 2 weeks
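
The loss that this change guards against can be reproduced in isolation.
The sketch below uses a made-up flag value and hypothetical function names;
it only shows what happens when a 64-bit flag word passes through a 32-bit
intermediate variable.

```c
#include <stdint.h>

/* Made-up flag living in the top 32 bits (assumption for illustration). */
#define FLAG_HIGH_SKETCH	(UINT64_C(1) << 40)

/* Intermediate variable is too narrow: the upper 32 bits are discarded. */
static uint64_t
pass_through_32(uint64_t flags)
{
	uint32_t tmp = (uint32_t)flags;

	return (tmp);
}

/* Intermediate variable is uint64_t: all bits survive. */
static uint64_t
pass_through_64(uint64_t flags)
{
	uint64_t tmp = flags;

	return (tmp);
}
```

Passing FLAG_HIGH_SKETCH through the first function clears it silently,
which is why every intermediate holder of the flags has to be 64 bits wide.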


229802 08-Jan-2012 rmacklem

opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after: 2 weeks


228827 23-Dec-2011 rmacklem

During investigation of an NFSv4 client crash reported by glebius@,
jhb@ spotted that nfscl_getstateid() might modify credentials when
called from nfsrpc_read() for the case where p != NULL, whereas
nfsrpc_read() only did a crdup() to get new credentials for p == NULL.
This bug was introduced by r195510, since pre-r195510 nfscl_getstateid()
only modified credentials for the p == NULL case. This patch modifies
nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case.
It is conceivable that this bug caused the crash reported by glebius@, but
that will not be determined for some time, since the crash occurred after
about 1 month of operation.

Tested by: glebius
Reviewed by: jhb
MFC after: 2 weeks


228217 03-Dec-2011 rmacklem

Post r223774, the NFSv4 client no longer has multiple instances
of the same lock_owner4 string. As such, the handling of cleanup
of lock_owners could be simplified. This simplification permitted
the client to do a ReleaseLockOwner operation when the process that
the lock_owner4 string represents, has exited. This permits the
server to release any storage related to the lock_owner4 string
before the associated open is closed. Without this change, it
is possible to exhaust a server's storage when a long running
process opens a file and then many child processes do locking
on the file, because the open doesn't get closed. A similar patch
was applied to the Linux NFSv4 client recently so that it wouldn't
exhaust a server's storage.

Reviewed by: zack
MFC after: 2 weeks


228156 30-Nov-2011 kib

Rename vm_page_set_valid() to vm_page_set_valid_range().
The vm_page_set_valid() is the most reasonable name for the m->valid
accessor.

Reviewed by: attilio, alc


227796 21-Nov-2011 rmacklem

Clean up some cruft in the NFSv4 client left over from the
OpenBSD port, so that it is more readable. No logic change
is made by this commit.

MFC after: 2 weeks


227760 20-Nov-2011 rmacklem

Add two arguments to the nfsrpc_rellockown() function in the NFSv4
client. This does not change the client's behaviour, but prepares
the code so that nfsrpc_rellockown() can be called elsewhere in a
future commit.

MFC after: 2 weeks


227744 20-Nov-2011 rmacklem

Since the nfscl_cleanup() function isn't used by the FreeBSD NFSv4 client,
delete the code and fix up the related comments. This should not have
any functional effect on the client.

MFC after: 2 weeks


227743 20-Nov-2011 rmacklem

Post r223774 the NFSv4 client never uses the linked list with the
head nfsc_defunctlockowner. This patch simply removes the code that
loops through this always empty list, since the code no longer does
anything useful. It should not have any effect on the client's
behaviour.

MFC after: 2 weeks


227543 15-Nov-2011 rmacklem

Modify the new NFS client so that nfs_fsync() only calls ncl_flush()
for regular files. Since other file types don't write into the
buffer cache, calling ncl_flush() is almost a no-op. However, it does
clear the NMODIFIED flag and this shouldn't be done by nfs_fsync() for
directories.

MFC after: 2 weeks


227517 15-Nov-2011 rmacklem

Move the setting of the default value for nm_wcommitsize to
before the nfs_decode_args() call in the new NFS client, so
that a specified command line value won't be overwritten.
Also, modify the calculation for small values of desiredvnodes
to avoid an unusually large value or a divide by zero crash.
It seems that the default value for nm_wcommitsize is very
conservative and may need to change at some time.

PR: kern/159351
Submitted by: onwahe at gmail.com (earlier version)
Reviewed by: jhb
MFC after: 2 weeks


227507 14-Nov-2011 jhb

Finish making 'wcommitsize' an NFS client mount option.

Reviewed by: rmacklem
MFC after: 1 week


227504 14-Nov-2011 jhb

Sync with the old NFS client: Remove an obsolete comment.


227494 14-Nov-2011 rmacklem

Since NFSv4 byte range locking only works for regular files,
add a sanity check for the vnode type to the NFSv4 client.

MFC after: 2 weeks


227493 13-Nov-2011 rmacklem

Move the assignment of default values for some mount options
to before the nfs_decode_args() call in the new NFS client,
so they don't overwrite the value specified on the command line.

MFC after: 2 weeks
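
The ordering fix described above can be sketched as follows. All names and
the default value are hypothetical stand-ins; the only point is that the
defaults are assigned first and the option decoding runs afterwards.

```c
#include <stdbool.h>

/* Illustrative default, not the real kernel value (assumption). */
#define WCOMMITSIZE_DEFAULT_SKETCH	1024

struct mount_sketch {
	int wcommitsize;
};

struct opts_sketch {
	bool has_wcommitsize;	/* true if given on the command line */
	int wcommitsize;
};

/* Stand-in for option decoding: apply only what the user specified. */
static void
decode_args_sketch(struct mount_sketch *mp, const struct opts_sketch *op)
{
	if (op->has_wcommitsize)
		mp->wcommitsize = op->wcommitsize;
}

static void
mount_setup_sketch(struct mount_sketch *mp, const struct opts_sketch *op)
{
	/* Defaults go in first ... */
	mp->wcommitsize = WCOMMITSIZE_DEFAULT_SKETCH;
	/* ... so that user-supplied values applied here are not clobbered. */
	decode_args_sketch(mp, op);
}
```

If the two statements in mount_setup_sketch() were swapped, a value given
on the command line would always be replaced by the default.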


224778 11-Aug-2011 rwatson

Second-to-last commit implementing Capsicum capabilities in the FreeBSD
kernel for FreeBSD 9.0:

Add a new capability mask argument to fget(9) and friends, allowing system
call code to declare what capabilities are required when an integer file
descriptor is converted into an in-kernel struct file *. With options
CAPABILITIES compiled into the kernel, this enforces capability
protection; without, this change is effectively a no-op.

Some cases require special handling, such as mmap(2), which must preserve
information about the maximum rights at the time of mapping in the memory
map so that they can later be enforced in mprotect(2) -- this is done by
narrowing the rights in the existing max_protection field used for similar
purposes with file permissions.

In namei(9), we assert that the code is not reached from within capability
mode, as we're not yet ready to enforce namespace capabilities there.
This will follow in a later commit.

Update two capability names: CAP_EVENT and CAP_KEVENT become
CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they
represent.

Approved by: re (bz)
Submitted by: jonathan
Sponsored by: Google Inc


224606 02-Aug-2011 rmacklem

Fix a LOR in the NFS client which could cause a deadlock.
This was reported to the mailing list freebsd-net@freebsd.org
on July 21, 2011 under the subject "LOR with nfsclient sillyrename".
The LOR occurred when nfs_inactive() called vrele(sp->s_dvp)
while holding the vnode lock on the file in s_dvp. This patch
modifies the client so that it performs the vrele(sp->s_dvp)
as a separate task to avoid the LOR. This fix was discussed
with jhb@ and kib@, who both proposed variations of it.

Tested by: pho, jlott at averesystems.com
Submitted by: jhb (earlier version)
Reviewed by: kib
Approved by: re (kib)
MFC after: 2 weeks


224532 30-Jul-2011 rmacklem

The new NFS client failed to vput() the new vnode if a setattr
failed after the file was created in nfs_create(). This would
probably only happen during a forced dismount. The old NFS client
does have a vput() for this case. Detected by pho during recent
testing, where an open syscall returned with a vnode still locked.

Tested by: pho
Approved by: re (kib)
MFC after: 2 weeks


224083 16-Jul-2011 zack

Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224082 16-Jul-2011 zack

Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


224081 16-Jul-2011 zack

Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


223971 13-Jul-2011 rmacklem

r222389 introduced a case where the NFSv4 client could
loop in nfscl_getcl() when a forced dismount is in progress,
because nfsv4_lock() will return 0 without sleeping when
MNTK_UNMOUNTF is set.
This patch fixes it so it won't loop calling nfsv4_lock()
for this case.

MFC after: 2 weeks


223774 04-Jul-2011 rmacklem

The algorithm used by nfscl_getopen() could have resulted in
multiple instances of the same lock_owner when a process both
inherited an open file descriptor and opened the same file itself.
Since some NFSv4 servers cannot handle multiple instances of
the same lock_owner string, this patch changes the algorithm
used by nfscl_getopen() in the new NFSv4 client to keep that
from happening. The new algorithm is simpler, since there is
no longer any need to ascend the process's parentage tree because
all NFSv4 Closes for a file are done at VOP_INACTIVE()/VOP_RECLAIM(),
making the Opens indistinct w.r.t. use with Lock Ops.
This problem was discovered at the recent NFSv4 interoperability
Bakeathon.

MFC after: 2 weeks


223747 03-Jul-2011 rmacklem

Modify the new NFSv4 client so that it appends a file handle
to the lock_owner4 string that goes on the wire. Also, add
code to do a ReleaseLockOwner Op on the lock_owner4 string
before a Close. Apparently not all NFSv4 servers handle multiple
instances of the same lock_owner4 string, at least not in a
compatible way. This patch avoids having multiple instances,
except for one unusual case, which will be fixed by a future commit.
Found at the recent NFSv4 interoperability Bakeathon.

Tested by: tdh at excfb.com
MFC after: 2 weeks


223657 28-Jun-2011 rmacklem

Fix the new NFSv4 client so that it doesn't fill the cached
mode attribute in as 0 when doing writes. The change adds
the Mode attribute plus the others except Owner and Owner_group
to the list requested by the NFSv4 Write Operation. This fixed
a problem where an executable file built by "cc" would get mode
0111 instead of 0755 for some NFSv4 servers.
Found at the recent NFSv4 interoperability Bakeathon.

Tested by: tdh at excfb.com
MFC after: 2 weeks


223309 19-Jun-2011 rmacklem

Fix the kgssapi so that it can be loaded as a module. Currently
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
Requested by rwatson.

Reviewed by: jhb
MFC after: 2 weeks
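
The entry-point table pattern described above can be sketched like this.
The structure, the single entry point, and the default value of -1 are all
assumptions for illustration; the real rpc_gss_entries structure and the
wrappers in sys/rpc/rpcsec_gss.h have their own members and defaults.

```c
#include <stddef.h>

/* Hypothetical table with one entry point, filled in by the module. */
struct entry_table_sketch {
	int (*get_version)(void);
};

/* All pointers start out NULL, meaning "module not loaded". */
static struct entry_table_sketch entries_sketch = { NULL };

/*
 * Inline wrapper: call through the table when the entry point is set,
 * otherwise return a default value.
 */
static inline int
get_version_sketch(void)
{
	if (entries_sketch.get_version != NULL)
		return ((*entries_sketch.get_version)());
	return (-1);
}

/* What a loaded module might register (illustrative only). */
static int
module_get_version_sketch(void)
{
	return (1);
}
```

Callers never test for the module themselves; they call the wrapper and
get either the module's answer or the default.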


223280 18-Jun-2011 rmacklem

Add DTrace support to the new NFS client. This is essentially
cloned from the old NFS client, plus additions for NFSv4. A
review of this code is in progress, however it was felt by the
reviewer that it could go in now, before code slush. Any changes
required by the review can be committed as bug fixes later.


222722 05-Jun-2011 rmacklem

Add support for flock(2) locks to the new NFSv4 client. I think this
should be ok, since the client now delays NFSv4 Close operations
until VOP_INACTIVE()/VOP_RECLAIM(). As such, there should be no
risk that the NFSv4 Open is closed while an associated byte range lock
still exists.

Tested by: avg
MFC after: 2 weeks


222719 05-Jun-2011 rmacklem

The new NFSv4 client was erroneously using "p" instead of
"p_leader" for the "id" for POSIX byte range locking. I think
this would only have affected processes created by rfork(2)
with the RFTHREAD flag specified. This patch fixes that by
passing the "id" down through the various functions from
nfs_advlock().

MFC after: 2 weeks


222718 05-Jun-2011 rmacklem

Fix the new NFSv4 client so that it doesn't crash when
a mount is done for a VIMAGE kernel.

Tested by: glz at hidden-powers dot com
Reviewed by: bz
MFC after: 2 weeks


222586 01-Jun-2011 kib

In the VOP_PUTPAGES() implementations, change the default error from
VM_PAGER_AGAIN to VM_PAGER_ERROR for the unwritten pages. Return
VM_PAGER_AGAIN for the partially written page. Always forward at least
one page in the loop of vm_object_page_clean().

VM_PAGER_ERROR causes the page reactivation and does not clear the
page dirty state, so the write is not lost.

The change fixes an infinite loop in vm_object_page_clean() when the
filesystem returns permanent errors for some page writes.

Reported and tested by: gavin
Reviewed by: alc, rmacklem
MFC after: 1 week


222540 31-May-2011 rmacklem

Fix the new NFS client so that it doesn't do an NFSv3
Pathconf RPC for cases where the reply doesn't include
the answer. This fixes a problem reported by avg@ where
the NFSv3 Pathconf RPC would fail when "ls -l" did an
lpathconf(2) for _PC_ACL_NFS4.

Tested by: avg
MFC after: 2 weeks


222389 27-May-2011 rmacklem

Fix the new NFS client so that it handles NFSv4 state
correctly during a forced dismount. This required that
the exclusive and shared (refcnt) sleep lock functions check
for MNTK_UNMOUNTF before sleeping, so that they won't block
while nfscl_umount() is getting rid of the state. As
such, a "struct mount *" argument was added to the locking
functions. I believe the only remaining case where a forced
dismount can get hung in the kernel is when a thread is
already attempting to do a TCP connect to a dead server
when the krpc client structure called nr_client is NULL.
This will only happen just after a "mount -u" with options
that force a new TCP connection is done, so it shouldn't
be a problem in practice.

MFC after: 2 weeks


222329 26-May-2011 rmacklem

Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync()
in the new NFS client so that a forced dismount doesn't
get stuck in the VFS_SYNC() call that happens before
VFS_UNMOUNT() in dounmount().
Additional changes are needed before forced dismounts will work.

MFC after: 2 weeks


222291 25-May-2011 rmacklem

Add some missing mutex locking to the new NFS client.

MFC after: 2 weeks


222289 25-May-2011 rmacklem

Fix the new NFS client so that it correctly sets the "must_commit"
argument for a write RPC when it succeeds for the first one and
fails for a subsequent RPC within the same call to the function.
This makes it compatible with the old NFS client for this case.

MFC after: 2 weeks


222233 23-May-2011 rmacklem

Set the MNT_NFS4ACLS flag for an NFSv4 client mount
if the NFSv4 server supports it. Requested by trasz.

MFC after: 2 weeks


222187 22-May-2011 alc

Eliminate duplicate #include's.


222075 18-May-2011 rmacklem

Add a sanity check for the existence of an "addr" option
to both NFS clients. This avoids the crash reported by
Sergey Kandaurov (pluknet@gmail.com) to the freebsd-fs@
list with subject "[old nfsclient] different nmount()
args passed from mount vs mount_nfs" dated May 17, 2011.

Tested by: pluknet at gmail.com (old nfs client)
MFC after: 2 weeks


221973 15-May-2011 rmacklem

Change the sysctl naming for the old and new NFS clients
to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes
the default nfs client use vfs.nfs.xxx after r221124.


221537 06-May-2011 rmacklem

Set the initial value of maxfilesize to OFF_MAX in the
new NFS client. It will then be reduced to whatever the
server says it can support. There might be an argument
that this could be one block larger, but since NFS is
a byte granular system, I chose not to do that.

Suggested by: Matt Dillon
Tested by: Daniel Braniss (earlier version)
MFC after: 2 weeks


221467 05-May-2011 rmacklem

Fix the new NFS client so that it handles the 64bit fields
that are now in "struct statfs" for NFSv3 and NFSv4. Since
the ffiles value is uint64_t on the wire, I clip the value
to INT64_MAX to avoid setting f_ffree negative.

Tested by: kib
MFC after: 2 weeks
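
The clipping described above amounts to a saturating conversion from an
unsigned 64-bit wire value to a signed 64-bit field. The function name
below is hypothetical; this is a minimal sketch, not the committed code.

```c
#include <stdint.h>

/*
 * Convert an unsigned 64-bit wire value to int64_t, saturating at
 * INT64_MAX so that large values never show up as negative.
 */
static int64_t
clip_to_int64(uint64_t wire_val)
{
	if (wire_val > (uint64_t)INT64_MAX)
		return (INT64_MAX);
	return ((int64_t)wire_val);
}
```

Without the comparison, a plain cast of a value above INT64_MAX would
yield a negative count of free files.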


221436 04-May-2011 ru

Implemented a mount option "nocto" that disables cache coherency
checking at open time. It may improve performance for read-only
NFS mounts. Use deliberately.

MFC after: 1 week
Reviewed by: rmacklem, jhb (earlier version)


221429 04-May-2011 ru

In ncl_printf(), call vprintf() instead of printf().

MFC after: 3 days


221205 29-Apr-2011 rmacklem

The build was broken by r221190 for 64bit arches like amd64.
This patch fixes it.

MFC after: 2 weeks


221190 28-Apr-2011 rmacklem

Fix the new NFS client so that it handles the "nfs_args" value
in mnt_optnew. This is needed so that the old mount(2) syscall
works and that is needed so that amd(8) works. The code was
basically just cribbed from sys/nfsclient/nfs_vfsops.c with minor
changes. This patch is mainly to fix the new NFS client so that
amd(8) works with it. Thanks go to Craig Rodrigues for helping with
this.

Tested by: Craig Rodrigues (for amd)
MFC after: 2 weeks


221139 27-Apr-2011 rmacklem

Fix module names and dependencies so the NFS clients will
load correctly as modules after r221124.


221124 27-Apr-2011 rmacklem

This patch changes head so that the default NFS client is now the new
NFS client (which I guess is no longer experimental). The fstype "newnfs"
is now "nfs" and the regular/old NFS client is now fstype "oldnfs".
Although mounts via fstype "nfs" will usually work without userland
changes, an updated mount_nfs(8) binary is needed for kernels built with
"options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and
mount(8) binaries are needed to do mounts for fstype "oldnfs".
The GENERIC kernel configs have been changed to use options
NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER.
For kernels being used on diskless NFS root systems, "options NFSCL"
must be in the kernel config.
Discussed on freebsd-fs@.


221066 26-Apr-2011 rmacklem

Fix a kernel linking problem introduced by r221032, r221040
when building kernels that don't have "options NFS_ROOT"
specified. I plan on moving the functions that use these
data structures into the shared code in sys/nfs/nfs_diskless.c
in a future commit. At that time, these definitions will no
longer be needed in nfs_vfsops.c and nfs_clvfsops.c.

MFC after: 2 weeks


221040 25-Apr-2011 rmacklem

Modify the experimental (newnfs) NFS client so that it uses the
same diskless NFS root code as the regular client, which
was moved to sys/nfs by r221032. This fixes the newnfs
client so that it can do an NFSv3 diskless root file system.

MFC after: 2 weeks


221018 25-Apr-2011 rmacklem

Fix the experimental NFS client so that it does not bogusly
set the f_flags field of "struct statfs". This had the interesting
effect of making the NFSv4 mounts "disappear" after r221014,
since NFSMNT_NFSV4 and MNT_IGNORE became the same bit.

MFC after: 2 weeks


221014 25-Apr-2011 rmacklem

Modify the experimental NFS client so that it uses the same
"struct nfs_args" as the regular NFS client. This is needed
so that the old mount(2) syscall will work and it makes
sharing of the diskless NFS root code easier. Early in the
porting exercise I introduced a new revision of nfs_args, but
didn't actually need it, thanks to nmount(2). I re-introduced the
NFSMNT_KERB flag, since it does essentially the same thing and
the old one would not have been used because it never worked.
I also added a few new NFSMNT_xxx flags to sys/nfsclient/nfs_args.h
that are used by the experimental NFS client.

MFC after: 2 weeks


220928 21-Apr-2011 rmacklem

Remove the nm_mtx mutex locking from the test for
nm_maxfilesize. This value rarely, if ever, changes
and the nm_mtx mutex is locked/unlocked earlier in
the function, which should be sufficient to avoid
getting a stale cached value for it. There is a
discussion w.r.t. what these tests should be, but
I've left them basically the same as the regular
NFS client for now.

Suggested by: pjd
MFC after: 2 weeks


220921 21-Apr-2011 rmacklem

Revert r220906, since the vp isn't always locked when
nfscl_request() is called. It will need a more involved
patch.


220906 20-Apr-2011 rmacklem

Add a check for VI_DOOMED at the beginning of nfscl_request()
so that it won't try and use vp->v_mount to do an RPC during
a forced dismount. There needs to be at least one more kernel
commit, plus a change to the umount(8) command before forced
dismounts will work for the experimental NFS client.

MFC after: 2 weeks


220877 20-Apr-2011 rmacklem

Modify the offset + size checks for read and write in the
experimental NFS client to take care of overflows for the calls
above the buffer cache layer in a manner similar to r220876.
Thanks go to dillon at apollo.backplane.com for providing the
snippet of code that does this.

MFC after: 2 weeks


220876 20-Apr-2011 rmacklem

Modify the offset + size checks for read and write in the
experimental NFS client to take care of overflows. Thanks
go to dillon at apollo.backplane.com for providing the
snippet of code that does this.

MFC after: 2 weeks
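
Illustrative sketch only (this is not the snippet credited above): one
common way to make an offset + size check safe against overflow is to
compare by subtraction, so the sum is never computed. The function name
check_io_range() and its error choices are assumptions made for the
example.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical check: returns 0 if the byte range starting at offset
 * with length resid is acceptable, EINVAL for a negative offset or
 * length, and EFBIG if the end of the range would pass maxfilesize
 * (which also covers the case where offset + resid would overflow).
 */
static int
check_io_range(int64_t offset, int64_t resid, int64_t maxfilesize)
{
	assert(maxfilesize >= 0);
	if (offset < 0 || resid < 0)
		return (EINVAL);
	/*
	 * offset <= maxfilesize is tested first, so the subtraction
	 * below cannot go negative or overflow.
	 */
	if (offset > maxfilesize || resid > maxfilesize - offset)
		return (EFBIG);
	return (0);
}
```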


220810 19-Apr-2011 rmacklem

Fix up handling of the nfsmount structure in read and write
within the experimental NFS client. Mostly add mutex locking
and use the same rsize, wsize during the operation by keeping
a local copy of it. This is another change that brings it
closer to the regular NFS client.

MFC after: 2 weeks


220807 18-Apr-2011 rmacklem

Revert r220761 since, as kib@ pointed out, the case of
adding the check to nfsrpc_close() isn't useful. Also,
the check in nfscl_getcl() must be more involved, since
it needs to check before and after the acquisition of
the refcnt on nfsc_lock, while the mutex that protects
the client state data is held.


220764 18-Apr-2011 rmacklem

Add a vput() to nfs_lookitup() in the experimental NFS client
for a case that will probably never happen. It can only
happen if a server were to successfully lookup a file, but not
return attributes for that file. Although technically allowed
by the NFSv3 RFC, I doubt any server would ever do this.
However, if it did, the client would not have vput()'d the
new vnode when it needed to do so.

MFC after: 2 weeks


220763 18-Apr-2011 rmacklem

Add vput() calls in two places in the experimental NFS client
that would be needed if, in the future, nfscl_loadattrcache()
were to return an error. Currently nfscl_loadattrcache()
never returns an error, so these cases never currently happen.

MFC after: 2 weeks


220762 18-Apr-2011 rmacklem

Change the mutex locking for several locations in the
experimental NFS client's vnode op functions to make
them compatible with the regular NFS client. I'll admit
I'm not sure that the mutex locks around the assignments
are needed, but the regular client has them, so I added them.
Also, add handling of the case of partial attributes in
setattr to be compatible with the regular client.

MFC after: 2 weeks


220761 17-Apr-2011 rmacklem

Add checks for MNTK_UNMOUNTF at the beginning of three
functions, so that threads don't get stuck in them during
a forced dismount. nfs_sync/VFS_SYNC() needs this, since it is
called by dounmount() before VFS_UNMOUNT(). The nfscl_nget()
case makes sure that a thread doing a VOP_OPEN() or
VOP_ADVLOCK() call doesn't get blocked before attempting
the RPC. Attempted RPCs don't block, since they all
fail once a forced dismount is in progress.
The third one at the beginning of nfsrpc_close()
is done so threads don't get blocked while doing VOP_INACTIVE()
as the vnodes are cleared out.
These three changes, plus a change to the umount(8)
command so that it doesn't do "sync()" for the forced case,
seem to make forced dismounts work for the experimental NFS
client.

MFC after: 2 weeks


220751 17-Apr-2011 rmacklem

Fix up some of the sysctls for the experimental NFS client so
that they use the same names as the regular client. Also add
string descriptions for them.

MFC after: 2 weeks


220739 17-Apr-2011 rmacklem

Change some defaults in the experimental NFS client to be the
same as the regular NFS client for NFSv3. The main one is making
use of a reserved port# the default. Also, set the retry limit
for TCP the same and fix the code so that it doesn't disable
readdirplus for NFSv4.

MFC after: 2 weeks


220735 17-Apr-2011 rmacklem

Fix readdirplus in the experimental NFS client so that it
skips over ".." to avoid a LOR race with nfs_lookup(). This
fix is analogous to r138256 in the regular NFS client.

MFC after: 2 weeks


220732 16-Apr-2011 rmacklem

Add a lktype flags argument to nfscl_nget() and ncl_nget() in the
experimental NFS client so that its nfs_lookup() function can use
cn_lkflags in a manner analogous to the regular NFS client.

MFC after: 2 weeks


220731 16-Apr-2011 rmacklem

Add mutex locking on the nfs node in ncl_inactive() for the
experimental NFS client.

MFC after: 2 weeks


220683 15-Apr-2011 rmacklem

Change the experimental NFS client so that it creates nfsiod
threads in the same manner as the regular NFS client after
r214026 was committed. This resolves the LORs fixed by r214026
and its predecessors for the regular client.

Reviewed by: jhb
MFC after: 2 weeks


220648 14-Apr-2011 rmacklem

Fix the experimental NFSv4 server so that it uses VOP_PATHCONF()
to determine if a file system supports NFSv4 ACLs. Since
VOP_PATHCONF() must be called with a locked vnode, the function
is called before nfsvno_fillattr() and the result is passed in
as an extra argument.

MFC after: 2 weeks


220645 14-Apr-2011 rmacklem

Modify the experimental NFSv4 server so that it handles
crossing of server mount points properly. The functions
nfsvno_fillattr() and nfsv4_fillattr() were modified to
take the extra arguments that are the mount point, a flag
to indicate that it is a file system root and the mounted
on fileno. The mount point argument needs to be busy when
nfsvno_fillattr() is called, since the vp argument is not
locked.

Reviewed by: kib
MFC after: 2 weeks


220611 13-Apr-2011 rmacklem

Add VOP_PATHCONF() support to the experimental NFS client
so that it can, along with other things, report whether or
not NFS4 ACLs are supported.

MFC after: 2 weeks


220610 13-Apr-2011 rmacklem

Fix the experimental NFSv4 client so that it recognizes server
mount point crossings correctly. It was testing the wrong flag.
Also, try harder to make sure that the fsid is different than
the one assigned to the client mount point, by hashing the
server's fsid (just to create a different value deterministically)
when it is the same.

MFC after: 2 weeks


220152 30-Mar-2011 zack

This patch fixes the Experimental NFS client to properly deal with 32 bit or 64
bit fileid's in NFSv2 and NFSv3. Without this fix, invalid casting (and sign
extension) was creating problems for any fileid greater than 2^31.

We discovered this because we have test clusters with more than 2 billion
allocated files and 64-bit ino_t's (and friend structures).

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks
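
Illustrative sketch only (not the committed patch): the class of bug
described above is avoided by widening file IDs through unsigned types
only. A 32-bit NFSv2 fileid is zero-extended, and a 64-bit NFSv3 fileid
is assembled from its two 32-bit XDR words, most significant first.
Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extend a 32-bit wire fileid to 64 bits. */
static uint64_t
fileid32_to_64(uint32_t wire_fileid)
{
	uint64_t wide = (uint64_t)wire_fileid;

	/* Sign extension would have set the upper 32 bits. */
	assert((wide >> 32) == 0);
	return (wide);
}

/* Build a 64-bit fileid from two 32-bit words, high word first. */
static uint64_t
fileid_from_words(uint32_t hi, uint32_t lo)
{
	return (((uint64_t)hi << 32) | (uint64_t)lo);
}
```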


219968 24-Mar-2011 jhb

Fix some locking nits with the p_state field of struct proc:
- Hold the proc lock while changing the state from PRS_NEW to PRS_NORMAL
in fork to honor the locking requirements. While here, expand the scope
of the PROC_LOCK() on the new process (p2) to avoid some LORs. Previously
the code was locking the new child process (p2) after it had locked the
parent process (p1). However, when locking two processes, the safe order
is to lock the child first, then the parent.
- Fix various places that were checking p_state against PRS_NEW without
having the process locked to use PROC_LOCK(). Every place was already
locking the process, just after the PRS_NEW check.
- Remove or reduce the use of PROC_SLOCK() for places that were checking
p_state against PRS_NEW. The PROC_LOCK() alone is sufficient for reading
the current state.
- Reorder fill_kinfo_proc() slightly so it only acquires PROC_SLOCK() once.

MFC after: 1 week


219028 25-Feb-2011 netchild

Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/
PMC/SYSV/...).

No FreeBSD version bump, the userland application to query the features will
be committed last and can serve as an indication of the availability if
needed.

Sponsored by: Google Summer of Code 2010
Submitted by: kibab
Reviewed by: arch@ (parts by rwatson, trasz, jhb)
X-MFC after: to be determined in last commit with code from this project


218757 16-Feb-2011 bz

Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.

While this reduces the number of vnet recursions in some places like
NFS, POSIX local sockets and some netgraph, some recursions are
impossible to fix.

The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec

Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks


216931 03-Jan-2011 rmacklem

Fix the nlm so that it no longer depends on the regular
nfs client and, as such, can be loaded for the experimental
nfs client without the regular client.

Reviewed by: jhb
MFC after: 2 weeks


215548 19-Nov-2010 kib

Remove prtactive variable and related printf()s in the vop_inactive
and vop_reclaim() methods. They seem to be unused, and the reported
situation is normal for the forced unmount.

MFC after: 1 week
X-MFC-note: keep prtactive symbol in vfs_subr.c


214513 29-Oct-2010 rmacklem

Modify nfs_open() in the experimental NFS client to be compatible
with the regular NFS client. Also, fix a couple of mutex lock issues.

MFC after: 1 week


214511 29-Oct-2010 rmacklem

Add a call for nfsrpc_close() to ncl_reclaim() in the experimental
NFSv4 client, since the call in ncl_inactive() might be missed
because VOP_INACTIVE() is not guaranteed to be called before
VOP_RECLAIM().

MFC after: 1 week


214406 26-Oct-2010 rmacklem

Add a flag to the experimental NFSv4 client to indicate when
delegations are being returned for reasons other than a Recall.
Also, re-organize nfscl_recalldeleg() slightly, so that it leaves
clearing NMODIFIED to the ncl_flush() call and invalidates the
attribute cache after flushing. It is hoped that these changes
might fix the problem others have seen when using the NFSv4
client with delegations enabled, since I can't reliably reproduce
the problem. These changes only affect the client when doing NFSv4
mounts with delegations enabled.

MFC after: 10 days


214053 19-Oct-2010 rmacklem

Fix the type of the 3rd argument for nm_getinfo so that it works
for architectures like sparc64.

Suggested by: kib
MFC after: 2 weeks


214048 19-Oct-2010 rmacklem

Modify the NFS clients and the NLM so that the NLM can be used
by both clients. Since the NLM uses various fields of the
nfsmount structure, those fields were extracted and put in a
separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h.
This structure also has a function pointer for a function that
extracts the required information from the mount point and nfs vnode
for that particular client, for information stored differently by the
clients.

Reviewed by: jhb
MFC after: 2 weeks


212362 09-Sep-2010 rmacklem

Fix the experimental NFS client so that it doesn't panic when
NFSv2,3 byte range locking is attempted. A fix that allows the
nlm_advlock() to work with both clients is in progress, but
may take a while. As such, I am doing this commit so that
the kernel doesn't panic in the meantime.

Submitted by: jh
MFC after: 2 weeks


212293 07-Sep-2010 jhb

Store the full timestamp when caching timestamps of files and
directories for purposes of validating name cache entries. This
closes races where two updates to a file or directory within the same
second could result in stale entries in the name cache. While here,
remove the 'n_expiry' field as it is no longer used.

Reviewed by: rmacklem
MFC after: 1 week
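
Illustrative sketch only (not the committed code): the race described
above comes from comparing only the seconds part of a timestamp. The
two hypothetical helpers below contrast that with a comparison of the
full struct timespec.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Seconds-only comparison: two updates in the same second look equal. */
static bool
mtime_equal_sec(const struct timespec *cached, const struct timespec *current)
{
	assert(cached != NULL && current != NULL);
	return (cached->tv_sec == current->tv_sec);
}

/* Full comparison: the nanoseconds field is compared as well. */
static bool
mtime_equal_full(const struct timespec *cached, const struct timespec *current)
{
	assert(cached != NULL && current != NULL);
	return (cached->tv_sec == current->tv_sec &&
	    cached->tv_nsec == current->tv_nsec);
}
```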


212217 05-Sep-2010 rmacklem

Change the code in ncl_bioread() in the experimental NFS
client to return an error when rabp is not set, so it
behaves the same way as the regular NFS client for this
case. It does not affect NFSv4, since nfs_getcacheblk()
only fails for "intr" mounts and NFSv4 can't use the
"intr" mount option.

MFC after: 2 weeks


212216 05-Sep-2010 rmacklem

Disable use of the NLM in the experimental NFS client, since
it will crash the kernel because it uses the nfsmount and
nfsnode structures of the regular NFS client.

MFC after: 2 weeks


211531 20-Aug-2010 jhb

Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and
LK_CANRECURSE after a lock is created. Use them to implement macros that
otherwise manipulated the flags directly. Assert that the associated
lockmgr lock is exclusively locked by the current thread when manipulating
these flags to ensure the flag updates are safe. This last change required
some minor shuffling in a few filesystems to exclusively lock a brand new
vnode slightly earlier.

Reviewed by: kib
MFC after: 3 days


210786 03-Aug-2010 rmacklem

Modify the return value for nfscl_mustflush() from boolean_t,
which I mistakenly thought was correct w.r.t. style(9), back
to int and add the checks for != 0. This is just a stylistic
modification.

MFC after: 1 week


210455 24-Jul-2010 rmacklem

Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate
module that can be used by both the regular and experimental nfs
clients. This fixes the problem reported by jh@ where /dev/nfslock
would be registered twice when both nfs clients were used.
I also defined the size of the lm_fh field to be the correct value,
as it should be the maximum size of an NFSv3 file handle.

Reviewed by: jh
MFC after: 2 weeks


210227 18-Jul-2010 rmacklem

Add a call to nfscl_mustflush() in nfs_close() of the experimental
NFSv4 client, so that attributes are not acquired from the server
when a delegation for the file is held. This can reduce the number
of Getattr Ops significantly.

MFC after: 2 weeks


210201 18-Jul-2010 rmacklem

Change the nfscl_mustflush() function in the experimental NFSv4
client to return a boolean_t in order to make it more compatible
with style(9).

MFC after: 2 weeks


210136 15-Jul-2010 jhb

Retire the NFS access cache timestamp structure. It was used in VOP_OPEN()
to avoid sending multiple ACCESS/GETATTR RPCs during a single open()
between VOP_LOOKUP() and VOP_OPEN(). Now we always send the RPC in
VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be
sent.

MFC after: 2 weeks


210135 15-Jul-2010 jhb

Merge 208603, 209946, and 209948 to the new NFS client:
Move attribute cache flushes from VOP_OPEN() to VOP_LOOKUP() to provide
more graceful recovery for stale filehandles and eliminate the need for
conditionally clearing the attribute cache in the !NMODIFIED case in
VOP_OPEN().

Reviewed by: rmacklem
MFC after: 2 weeks


210034 13-Jul-2010 rmacklem

For the experimental NFSv4 client, make sure that attributes that
predate the issue of a delegation are not cached once the delegation
is held. This is necessary, since cached attributes remain valid
while the delegation is held.

MFC after: 2 weeks


210032 13-Jul-2010 rmacklem

For the experimental NFSv4 client, do not use cached attributes
that were invalidated, even when a delegation for the file is held.

MFC after: 2 weeks


209191 15-Jun-2010 rmacklem

Add MODULE_DEPEND() macros to the experimental NFS client and
server so that the modules will load when kernels are built with
none of the NFS* configuration options specified. I believe this
resolves the problems reported by PR kern/144458 and the email on
freebsd-stable@ posted by Dmitry Pryanishnikov on June 13.

Tested by: kib
PR: kern/144458
Reviewed by: kib
MFC after: 1 week


209120 13-Jun-2010 kib

In NFS clients, instead of inconsistently using #ifdef
DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer
KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.

Submitted by: Mikolaj Golub <to.my.trociny gmail com>
MFC after: 2 weeks


208254 18-May-2010 rmacklem

Allow the experimental NFSv4 client to use cached attributes
when a write delegation is held. Also, add a missing
mtx_unlock() call for the ACL debugging code.

MFC after: 5 days


208234 18-May-2010 rmacklem

Add a sanity check for a negative args.fhsize to the experimental
NFS client.

MFC after: 5 days


207746 07-May-2010 alc

Push down the page queues lock into vm_page_activate().


207728 06-May-2010 alc

Eliminate page queues locking around most calls to vm_page_free().


207669 05-May-2010 alc

Acquire the page lock around all remaining calls to vm_page_free() on
managed pages that didn't already have that lock held. (Freeing an
unmanaged page, such as the various pmaps use, doesn't require the page
lock.)

This allows a change in vm_page_remove()'s locking requirements. It now
expects the page lock to be held instead of the page queues lock.
Consequently, the page queues lock is no longer required at all by callers
to vm_page_rename().

Discussed with: kib


207662 05-May-2010 trasz

Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().

Reviewed by: kib


207584 03-May-2010 kib

Lock the page around vm_page_activate() and vm_page_deactivate() calls
where it was missed. The wrapped fragments now protect wire_count with
page lock.

Reviewed by: alc


207350 28-Apr-2010 rmacklem

For the experimental NFS client, it should always flush dirty
buffers before closing the NFSv4 opens, as the comment states.
This patch deletes the call to nfscl_mustflush() which would
return 0 for the case where a delegation still exists, which
was incorrect and could cause crashes during recovery from
an expired lease.

MFC after: 1 week


207349 28-Apr-2010 rmacklem

Delete a diagnostic statement that is no longer useful from
the experimental NFS client.

MFC after: 1 week


207170 24-Apr-2010 rmacklem

An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.

MFC after: 1 week


207082 22-Apr-2010 rmacklem

When the experimental NFS client is handling an NFSv4 server reboot
with delegations enabled, the recovery could fail if the renew
thread is trying to return a delegation, since it will not do the
recovery. This patch fixes the above by having nfscl_recalldeleg()
fail with the I/O operations returning EIO, so that they will be
attempted later. Most of the patch consists of adding an argument
to various functions to indicate the delegation recall case where
this needs to be done.

MFC after: 1 week


206880 20-Apr-2010 rmacklem

For the experimental NFS client doing an NFSv4 mount,
set the NFSCLFLAGS_RECVRINPROG while doing recovery from an expired
lease in a manner similar to r206818 for server reboot recovery.
This will prevent the function that acquires stateids for I/O
operations from acquiring out of date stateids during recovery.
Also, fix up mutex locking on the nfsc_flags field.

MFC after: 1 week


206818 18-Apr-2010 rmacklem

Avoid extraneous recovery cycles in the experimental NFS client
when an NFSv4 server reboots, by doing two things.
1 - Make the function that acquires a stateid for I/O operations
block until recovery is complete, so that it doesn't acquire
out of date stateids.
2 - Only allow a recovery once every 1/2 of a lease duration, since
the NFSv4 server must provide a recovery grace period of at
least a lease duration. This should avoid recoveries caused
by an out of date stateid that was acquired for an I/O op.
just before a recovery cycle started.

MFC after: 1 week
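
Illustrative sketch only (not the committed code): item 2 above is a
simple rate limit. Assuming times are kept in seconds, a hypothetical
predicate for "at most one recovery per half lease duration" could
look like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical predicate: allow a new recovery cycle only if at least
 * half of the lease duration has elapsed since the previous one.
 */
static bool
recovery_allowed(int64_t now_sec, int64_t last_recovery_sec, int64_t lease_sec)
{
	assert(lease_sec >= 0);
	return (now_sec - last_recovery_sec >= lease_sec / 2);
}
```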


206690 15-Apr-2010 rmacklem

Add mutex lock calls to 2 cases in the experimental NFS client's
renew thread where they were missing.

MFC after: 1 week


206688 15-Apr-2010 rmacklem

The experimental NFS client was not filling in recovery credentials
for opens done locally in the client when a delegation for the file
was held. This could cause the client to crash in crsetgroups() when
recovering from a server crash/reboot. This patch fills in the
recovery credentials for this case, in order to avoid the client crash.
Also, add KASSERT()s to the credential copy functions, to catch any
other cases where the credentials aren't filled in correctly.

MFC after: 1 week


203303 31-Jan-2010 rmacklem

Patch the experimental NFS client so that there is a timeout
for negative name cache entries in a manner analogous to
r202767 for the regular NFS client. Also, make the code in
nfs_lookup() compatible with that of the regular client
and replace the sysctl variable that enabled negative name
caching with the mount point option.

MFC after: 2 weeks


203119 28-Jan-2010 rmacklem

Patch the experimental NFS client in a manner analogous to
r203072 for the regular NFS client. Also, delete two fields
of struct nfsmount that are not used by the FreeBSD port of
the client.

MFC after: 2 weeks


201439 03-Jan-2010 rmacklem

Fix three related problems in the experimental nfs client when
checking for conflicts w.r.t. byte range locks for NFSv4.
1 - Return 0 instead of EACCES when a conflict is found, for F_GETLK.
2 - Check for "same file" when checking for a conflict.
3 - Don't check for a conflict for the F_UNLCK case.


201345 31-Dec-2009 rmacklem

Fix the experimental NFS client so that it can create Unix
domain sockets on an NFSv4 mount point. It was generating
incorrect XDR in the request for this case.

Tested by: infofarmer
MFC after: 2 weeks


201029 26-Dec-2009 rmacklem

When porting the experimental nfs subsystem to the FreeBSD8 krpc,
I added 3 functions that were already in the experimental client
under different names. This patch deletes the functions in the
experimental client and renames the calls to use the other set.
(This is just removal of duplicated code and does not fix any bug.)

MFC after: 2 weeks


200069 03-Dec-2009 trasz

Remove unneeded ifdefs.

Reviewed by: rmacklem


199189 11-Nov-2009 jh

The create verifier used by the FreeBSD NFS client is suboptimal because the
first part of a verifier is set to the first IP address from
V_in_ifaddrhead list. This address is typically the loopback address
making the first part of the verifier practically non-unique. The second
part of the verifier is initialized to zero making its initial value
non-unique too.

This commit changes the strategy for create verifier initialization:
just initialize it to a random value. Also move verifier handling into
its own function and use a mutex to protect the variable.

This change is a candidate for porting to sys/nfsclient.

Reviewed by: jhb, rmacklem
Approved by: trasz (mentor)


198291 20-Oct-2009 jh

Unloading of the nfscl module is unsupported because newnfslock doesn't
support unloading. It's not trivial to implement newnfslock unloading so
for now just admit that unloading is unsupported and refuse to attempt
unload in all nfscl module event handlers.

Reviewed by: rmacklem
Approved by: trasz (mentor)


198290 20-Oct-2009 jh

Fix ordering of nfscl_modevent() and ncl_uninit(). nfscl_modevent() must
be called after ncl_uninit() when unloading the nfscl module because
ncl_uninit() uses ncl_iod_mutex which is destroyed in nfscl_modevent().

Reviewed by: rmacklem
Approved by: trasz (mentor)


198289 20-Oct-2009 jh

Fix comment typos.

Reviewed by: rmacklem
Approved by: trasz (mentor)


197048 09-Sep-2009 rmacklem

Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs
vnodes, since these nodes are not linked into the mount queue and,
as such, the vn_lock() cannot cause a deadlock so LORs are harmless.

Suggested by: kib
Approved by: kib (mentor)
MFC after: 3 days


196503 24-Aug-2009 zec

Fix NFS panics with options VIMAGE kernels by appropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by: rmacklem (in part)
Reviewed by: rmacklem, rwatson
Discussed with: dfr, bz
Approved by: re (rwatson), julian (mentor)
MFC after: 3 days


196332 17-Aug-2009 rmacklem

Apply the same patch as r196205 for nfs_upgrade_lock() and
nfs_downgrade_lock() to the experimental nfs client.

Approved by: re (kensmith), kib (mentor)


195943 29-Jul-2009 rmacklem

Fix the experimental nfs client so that it only calls ncl_vinvalbuf()
for NFSv2 and not NFSv4 when nfscl_mustflush() returns 0. Since
nfscl_mustflush() only returns 0 when there is a valid write delegation
issued to the client, it only affects the case of an NFSv4 mount with
callbacks/delegations enabled.

Approved by: re (kensmith), kib (mentor)


195825 22-Jul-2009 rmacklem

When vfs.newnfs.callback_addr is set to an IPv4 address, the
experimental NFSv4 client might try and use it as an IPv6 address,
breaking callbacks. The fix simply initializes the isinet6 variable
for this case.

Approved by: re (kensmith), kib (mentor)


195821 22-Jul-2009 rmacklem

Add changes to the experimental nfs client to use the PBDRY flag for
msleep(9) when a vnode lock or similar may be held. The changes are
just a clone of the changes applied to the regular nfs client by
r195703.

Approved by: re (kensmith), kib (mentor)


195819 22-Jul-2009 rmacklem

When using an NFSv4 mount in the experimental nfs client with delegations
being issued from the server, there was a case where an Open issued locally
based on the delegation would be released before the associated vnode
became inactive. If the delegation was recalled after the open was released,
an Open against the server would not have been acquired and subsequent I/O
operations would need to use the special stateid of all zeros. This patch
fixes that case.

Approved by: re (kensmith), kib (mentor)


195762 19-Jul-2009 rmacklem

Fix two bugs in the experimental nfs client:
- When the root vnode was acquired during mounting, mnt_stat.f_iosize was
still set to 0, so getnewvnode() would set bo_bsize == 0. This would
confuse getblk(), so that it always returned the first block causing
the problem when the root directory of the mount point was greater
than one block in size. It was fixed by setting mnt_stat.f_iosize to
NFS_DIRBLKSIZ before calling ncl_nget() to acquire the root vnode.
- NFSMNT_INT was being set temporarily while the initial connect to a
server was being done. This erroneously configured the krpc for
interruptible RPCs, which caused problems because signals weren't
being masked off as they would have been for interruptible mounts.
This code was deleted to fix the problem. Since mount_nfs does an
NFS null RPC before the mount system call, connections to the server
should work ok.

Tested by: swell dot k at gmail dot com
Approved by: re (kensmith), kib (mentor)


195704 14-Jul-2009 rmacklem

Fix the experimental nfs client so that it does not cause a
"share->excl" panic when doing a lookup of dotdot at the root
of a server's file system. The patch avoids calling vn_lock()
for that case, since nfscl_nget() has already acquired a lock
for the vnode.

Approved by: re (kensmith), kib (mentor)


195699 14-Jul-2009 rwatson

Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator. Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...). This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack. Virtualized global variables are
tagged at compile-time, placing them in a special linker set, which is
loaded into a contiguous region of kernel memory. Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy. Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address. When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by: bz
Reviewed by: bz, zec
Discussed with: gnn, jamie, jeff, jhb, julian, sam
Suggested by: peter
Approved by: re (kensmith)
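
The accessor idea described above can be sketched in user-space C. This
is an illustration under stated assumptions, not the kernel code: the
names sk_ref, sk_vnet, sk_curvnet and SK_VNET() are hypothetical, and a
struct stands in for the linker set only to guarantee that the
reference copies are contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the linker-set region holding the reference copies.
 * The values are arbitrary examples. */
static struct {
	int sk_ttl;
	int sk_mss;
} sk_ref = { 64, 536 };

/* One instance per virtual network stack, with its own data region. */
struct sk_vnet {
	char *base;
};

/* Hypothetical "current vnet" pointer used by the accessor. */
static struct sk_vnet *sk_curvnet;

/* Create a stack instance, initialized from the reference copy. */
static struct sk_vnet *
sk_vnet_create(void)
{
	struct sk_vnet *v = malloc(sizeof(*v));

	if (v == NULL)
		return (NULL);
	v->base = malloc(sizeof(sk_ref));
	if (v->base == NULL) {
		free(v);
		return (NULL);
	}
	memcpy(v->base, &sk_ref, sizeof(sk_ref));
	return (v);
}

static void
sk_vnet_destroy(struct sk_vnet *v)
{
	free(v->base);
	free(v);
}

/* Accessor: offset of the symbol within the reference region, applied
 * to the current vnet's base address. */
#define SK_VNET(type, sym)						\
	(*(type *)(sk_curvnet->base +					\
	    ((const char *)&sk_ref.sym - (const char *)&sk_ref)))
```

Writing through SK_VNET() while one instance is current leaves the
other instances and the reference copy untouched, which is the
isolation property the commit message describes.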


195641 12-Jul-2009 rmacklem

Fix the handling of dotdot in lookup for the experimental nfs client
in a manner analogous to the change in r195294 for the regular nfs client.

Approved by: re (kensmith), kib (mentor)


195510 09-Jul-2009 rmacklem

Since the nfscl_getclose() function both decremented open counts and,
optionally, created a separate list of NFSv4 opens to be closed, it
was possible for the associated OpenOwner to be free'd before the Open
was closed. The problem was that the Open was taken off the OpenOwner
list before the Close RPC was done and OpenOwners can be free'd once the
list is empty. This patch separates out the case of doing the Close RPC
into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose()
so that it closes a single open instead of a list of them. This avoids
removing the Open from the OpenOwner list before doing the Close RPC.

Approved by: re (kensmith), kib (mentor)


194951 25-Jun-2009 rwatson

Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the
in_ifaddrhead and INADDR_HASH address lists.

Previously, these lists were used unsynchronized as they were effectively
never changed in steady state, but we've seen increasing reports of
writer-writer races on very busy VPN servers as core count has gone up
(and similar configurations where address lists change frequently and
concurrently).

For the time being, use rwlocks rather than rmlocks in order to take
advantage of their better lock debugging support. As a result, we don't
enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion
is complete and a performance analysis has been done. This means that one
class of reader-writer races still exists.

MFC after: 6 weeks
Reviewed by: bz


194541 20-Jun-2009 rmacklem

Replace RPCAUTH_UNIXGIDS with NFS_MAXGRPS so that nfscbd.c will build.

Approved by: kib (mentor)


194523 20-Jun-2009 rmacklem

Change the size of the nfsc_groups[] array in the experimental nfs
client to RPCAUTH_UNIXGIDS + 1 (17), since that is what can go on
the wire for AUTH_SYS authentication.

Reviewed by: brooks
Approved by: kib (mentor)


194498 19-Jun-2009 brooks

Rework the credential code to support larger values of NGROUPS and
NGROUPS_MAX, eliminate ABI dependencies on them, and raise them to 1024
and 1023 respectively. (Previously they were equal, but under a close
reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it
is the number of supplemental groups, not total number of groups.)

The bulk of the change consists of converting the struct ucred member
cr_groups from a static array to a pointer. Do the equivalent in
kinfo_proc.

Introduce new interfaces crcopysafe() and crsetgroups() for duplicating
a process credential before modifying it and for setting group lists
respectively. Both interfaces take care of the details of allocating
the groups array. crsetgroups() takes care of truncating the group list
to the current maximum (NGROUPS) if necessary. In the future,
crsetgroups() may be responsible for ensuring invariants such as sorting
the supplemental groups to allow groupmember() to be implemented as a
binary search.

Because we can not change struct xucred without breaking application
ABIs, we leave it alone and introduce a new XU_NGROUPS value which is
always 16 and is to be used, or NGRPS as appropriate, for things such as
NFS which need to use no more than 16 groups. When feasible, truncate
the group list rather than generating an error.

Minor changes:
- Reduce the number of hand rolled versions of groupmember().
- Do not assign to both cr_gid and cr_groups[0].
- Modify ipfw to cache ucreds instead of part of their contents since
they are immutable once referenced by more than one entity.

Submitted by: Isilon Systems (initial implementation)
X-MFC after: never
PR: bin/113398 kern/133867
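
The truncation behaviour and the possible future sorted-list invariant
can be sketched as follows. This is a user-space illustration only: the
sk_* names, the cap of 16, and the convention that groups[0] is kept
apart from the sorted supplemental groups are assumptions, not the
crsetgroups() or groupmember() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SK_NGROUPS 16		/* hypothetical cap for the sketch */

typedef unsigned int sk_gid_t;

struct sk_cred {
	sk_gid_t groups[SK_NGROUPS];
	int ngroups;
};

static int
sk_gid_cmp(const void *a, const void *b)
{
	sk_gid_t x = *(const sk_gid_t *)a;
	sk_gid_t y = *(const sk_gid_t *)b;

	return ((x > y) - (x < y));
}

/* Copy a group list, truncating rather than failing when it is too
 * long, then sort the supplemental groups (everything after slot 0). */
static void
sk_setgroups(struct sk_cred *cr, int n, const sk_gid_t *gids)
{
	int i;

	if (n > SK_NGROUPS)
		n = SK_NGROUPS;
	if (n < 0)
		n = 0;
	for (i = 0; i < n; i++)
		cr->groups[i] = gids[i];
	cr->ngroups = n;
	if (n > 2)
		qsort(&cr->groups[1], (size_t)(n - 1), sizeof(sk_gid_t),
		    sk_gid_cmp);
}

/* Membership test: check slot 0 directly, then binary search the
 * sorted supplemental groups. */
static int
sk_groupmember(const struct sk_cred *cr, sk_gid_t gid)
{
	if (cr->ngroups == 0)
		return (0);
	if (cr->groups[0] == gid)
		return (1);
	return (bsearch(&gid, &cr->groups[1], (size_t)(cr->ngroups - 1),
	    sizeof(sk_gid_t), sk_gid_cmp) != NULL);
}
```

Keeping the sort inside the setter means every caller of the membership
test can rely on the invariant without re-checking it.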


194425 18-Jun-2009 alc

Fix some of the style errors in *getpages().


194408 17-Jun-2009 rmacklem

Add the SVC_RELEASE(xprt), as required by r194407.

Approved by: kib (mentor)


194368 17-Jun-2009 bz

Add explicit includes for jail.h to the files that need them and
remove the "hidden" one from vimage.h.


194363 17-Jun-2009 rmacklem

Fix handling of ".." in nfs_lookup() for the forced dismount case
by cribbing the change made to the regular nfs client in r194358.

Approved by: kib (mentor)


194118 13-Jun-2009 jamie

Rename the host-related prison fields to be the same as the host.*
parameters they represent, and the variables they replaced, instead of
abbreviated versions of them.

Approved by: bz (mentor)


194117 13-Jun-2009 jamie

Use getcredhostuuid instead of accessing the prison directly.

Approved by: bz (mentor)


193955 10-Jun-2009 rmacklem

This commit is analogous to r193952, but for the experimental nfs
subsystem. Add a test for VI_DOOMED just after ncl_upgrade_vnlock() in
ncl_bioread_check_cons(). This is required since it is possible
for the vnode to be vgonel()'d while in ncl_upgrade_vnlock() when
a forced dismount is in progress. Also, move the check for VI_DOOMED
in ncl_vinvalbuf() down to after ncl_upgrade_vnlock() and replace the
out of date comment for it.

Approved by: kib (mentor)


193837 09-Jun-2009 rmacklem

Since vn_lock() with the LK_RETRY flag never returns an error
for FreeBSD-CURRENT, the code that checked for and returned the
error was broken. Change it to check for VI_DOOMED set after
vn_lock() and return an error for that case. I believe this
should only happen for forced dismounts.

Approved by: kib (mentor)
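
The pattern described above, sketched in user-space C. Everything here
is a stand-in: sk_vnode, SK_VI_DOOMED, sk_vn_lock_retry() and the
choice of ENOENT as the returned error are assumptions made for
illustration, not the kernel definitions or the committed code.

```c
#include <assert.h>
#include <errno.h>

#define SK_VI_DOOMED 0x1	/* hypothetical flag bit for the sketch */

struct sk_vnode {
	int flags;
	int locked;
};

/* Models a retrying lock call: it always succeeds, even when the
 * vnode has been reclaimed, so its return value carries no signal. */
static int
sk_vn_lock_retry(struct sk_vnode *vp)
{
	vp->locked = 1;
	return (0);
}

/* Lock, then test the doomed flag instead of the lock's return value. */
static int
sk_lock_and_check(struct sk_vnode *vp)
{
	(void)sk_vn_lock_retry(vp);
	if ((vp->flags & SK_VI_DOOMED) != 0) {
		vp->locked = 0;		/* drop the lock before failing */
		return (ENOENT);	/* assumed error value */
	}
	return (0);
}
```

The point of the sketch is that the error check moves from the lock
call's return value, which is always 0 here, to the vnode's state after
the lock has been acquired.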


193735 08-Jun-2009 rmacklem

Fix nfscl_getcl() so that it doesn't crash when it is called to
do an NFSv4 Close operation with the cred argument NULL. Also,
clarify what NULL arguments mean in the function's comment.

Approved by: kib (mentor)


193187 31-May-2009 alc

nfs_write() can use the recently introduced vfs_bio_set_valid() instead of
vfs_bio_set_validclean(), thereby avoiding the page queues lock.

Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.


193162 31-May-2009 zec

Unbreak options VIMAGE kernel builds.

Approved by: julian (mentor)


193125 30-May-2009 rmacklem

Add a check for v_type == VREG to the recently modified code that
does NFSv4 Closes in the experimental client's VOP_INACTIVE().
I also replaced a bunch of ap->a_vp with a local copy of vp,
because I thought that made it more readable.

Approved by: kib (mentor)


193066 29-May-2009 jamie

Place hostnames and similar information fully under the prison system.
The system hostname is now stored in prison0, and the global variable
"hostname" has been removed, as has the hostname_mtx mutex. Jails may
have their own host information, or they may inherit it from the
parent/system. The proper way to read the hostname is via
getcredhostname(), which will copy either the hostname associated with
the passed cred, or the system hostname if you pass NULL. The system
hostname can still be accessed directly (and without locking) at
prison0.pr_host, but that should be avoided where possible.

The "similar information" referred to is domainname, hostid, and
hostuuid, which have also become prison parameters and had their
associated global variables removed.

Approved by: bz (mentor)


192986 28-May-2009 alc

Make *getpages()s' assertion on the state of each page's dirty bits
stricter.


192928 27-May-2009 rmacklem

Fix handling of NFSv4 Close operations in ncl_inactive(). Only
do them for NFSv4 and flush writes to the server before doing
the Close(s), as required. Also, use the a_td argument instead of
curthread.

Approved by: kib (mentor)


192675 24-May-2009 rmacklem

Fix the experimental nfsv4 client so that it works for the
case of a kerberized mount without a host based principal
name. This will only work for mounts being done by a user
other than root. Support for a host based principal name
will not work until proposed changes to the rpcsec_gss part
of the krpc are committed. It now builds for "options KGSSAPI".

Approved by: kib (mentor)


192613 22-May-2009 rmacklem

Change the sysctl_base argument to svcpool_create() to NULL for
client side callbacks so that leaf names are not re-used,
since they are already being used by the server.

Approved by: kib (mentor)


192601 22-May-2009 rmacklem

Fix the name of the module common to the client and server
in the experimental nfs subsystem to the correct one for
the MODULE_DEPEND() macro.

Approved by: kib (mentor)


192585 22-May-2009 rmacklem

Modify the mount handling code in the experimental nfs client to
use the newer nmount() style arguments, as is used by mount_nfs.c.
This prepares the kernel code for the use of a mount_nfs.c with
changes for the experimental client integrated into it.

Approved by: kib (mentor)


192582 22-May-2009 rmacklem

Change the code in the experimental nfs client to avoid flushing
writes upon close when a write delegation is held by the client.
This should be safe to do, now that nfsv4 Close operations are
delayed until ncl_inactive() is called for the vnode.

Approved by: kib (mentor)


192337 18-May-2009 rmacklem

Change the experimental NFSv4 client so that it does not do
the NFSv4 Close operations until ncl_inactive(). This is
necessary so that the Open StateIDs are available for doing
I/O on mmap'd files after VOP_CLOSE(). I also changed some
indentation for the nfscl_getclose() function.

Approved by: kib (mentor)


192245 17-May-2009 alc

Merge r191964: Eliminate a case of unnecessary page queues locking.


192231 16-May-2009 rmacklem

Changed sys/fs/nfs_clbio.c in the same way Alan Cox changed
sys/nfsclient/nfs_bio.c for r192134, so that the sources stay
in sync.

Approved by: kib (mentor)


192145 15-May-2009 rmacklem

Modify the diskless booting code in sys/fs/nfsclient to be compatible
with what is in sys/nfsclient, so that it will at least build now.

Approved by: kib (mentor)


192121 14-May-2009 rmacklem

Apply changes to the experimental nfs server so that it uses the security
flavors as exported in FreeBSD-CURRENT. This allows it to use a
slightly modified mountd.c instead of a different utility.

Approved by: kib (mentor)


192065 13-May-2009 rmacklem

Apply a one line change to nfs_clbio.c (which is largely a copy
of sys/nfsclient/nfs_bio.c) to track the change recently committed
by alc for nfs_bio.c.

Approved by: kib (mentor)


191990 11-May-2009 attilio

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. The VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules
need to be recompiled. Bump __FreeBSD_version in order to signal this
situation.


191783 04-May-2009 rmacklem

Add the experimental nfs subtree to the kernel, that includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that calls generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.

Approved by: kib (mentor)