History log of /freebsd-current/sys/fs/nfsclient/nfs_clrpcops.c
Revision Date Author Comments
# 6251027c 25-Apr-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Do not use nfso_own for delayed nfsrpc_doclose()

When an initial attempt to close an NFSv4 lock returns NFSERR_DELAY,
the open structure is put on a list for delayed closing. When this
is done, the nfso_own field is set to NULL, so it cannot be used by
nfsrpc_doclose().

Without this patch, the NFSv4 client can crash when a NFSv4 server
replies NFSERR_DELAY to a Close operation. Fortunately, most extant
NFSv4 servers do not do this. This patch avoids the crash for any
that do return NFSERR_DELAY for Close.

Found during a IETF bakeathon testing event this week.

MFC after: 5 days


# 8efba70d 25-Apr-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Revert part of commit 196787f79e67

Commit 196787f79e67 erroneously assumed that the client code for
Open/Claim_deleg_cur_FH was broken, but it was not.
It was actually wireshark that was broken and indicated
that the correct XDR was bogus.

This reverts the part of 196787f79e67 that changed the arguments for
Open/Claim_deleg_cur_FH.

Found during the IETF bakeathon testing event this week.

MFC after: 3 days


# cc760de2 11-Jan-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Only update atime for Copy when noatime is not specified

Commit 57ce37f9dcd0 modified the NFSv4.2 Copy operation so that
it will update atime on the infd file whenever possible.
This is done by adding a Setattr of TimeAccess for the
input file.

This patch disables this change for the case of an NFSv4.2
mount with the "noatime" mount option, which avoids the
additional Setattr of TimeAccess operation.

MFC after: 1 week


# 6fa843f6 11-Dec-2023 Mark Johnston <markj@FreeBSD.org>

nfsclient: Propagate copyin() errors from nfsm_uiombuf()

Approved by: so
Security: SA-23:18.nfsclient
Reviewed by: rmacklem
Sponsored by: The FreeBSD Foundation


# 0a958aa1 03-Dec-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix comment for commit 6aded1e6b2e5

Commit 6aded1e6b2e5 fixed a rare case when handling an NFSv4
Rename reply when delegations are in use. This patch fixes the
associated comment.

MFC after: 2 weeks


# 6aded1e6 03-Dec-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix processing of a rare Rename reply case

When delegations are enabled (they are not by default in
the FreeBSD NFSv4 server), rename will check for and return
delegations. If the second of these DelegReturn operations
were to fail (they rarely do), then the code would not retry
the rename with returning delegations, as it is intended to do.

The patch fixes the problem, since the DelegReturn reply status
is the second iteration of the loop and not the first iteration.

As noted, this bug would have rarely manifested a problem, since
DelegReturn operations do not normally fail.

MFC after: 2 weeks


# dd7d42a1 23-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl/kgssapi: Fix Kerberized NFS mounts to pNFS servers

During recent testing related to the IETF NFSv4 Bakeathon, it was
discovered that Kerberized NFSv4.1/4.2 mounts to pNFS servers
(sec=krb5[ip],pnfs mount options) was broken.
The FreeBSD client was using the "service principal" for
the MDS to try and establish a rpcsec_gss credential for a DS,
which is incorrect. (A "service principal" looks like
"nfs@<fqdn-of-server>" and the <fqdn-of-server> for the DS is not
the same as the MDS for most pNFS servers.)

To fix this, the rpcsec_gss code needs to be able to do a
reverse DNS lookup of the DS's IP address. A new kgssapi upcall
to the gssd(8) daemon is added by this patch to do the reverse DNS
along with a new rpcsec_gss function to generate the "service
principal".

A separate patch to the gssd(8) will be committed, so that this
patch will fix the problem. Without the gssd(8) patch, the new
upcall fails and current/incorrect behaviour remains.

This bug only affects the rare case of a Kerberized (sec=krb5[ip],pnfs)
mount using pNFS.

This patch changes the internal KAPI between the kgssapi and
nfscl modules, but since I did a version bump a few days ago,
I will not do one this time.

MFC after: 1 month


# 14bbf4fe 21-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Handle a Getattr failure with NFSERR_DELAY following Open

During testing at a recent IETF NFSv4 Bakeathon, a non-FreeBSD
server was rebooted. After the reboot, the FreeBSD client sent
an Open/Claim_previous with a Getattr after the Open in the same
compound. The Open/Claim_previous was done to recover the Open
and a Delegation for for a file. The Open succeeded, but the
Getattr after the Open failed with NFSERR_DELAY. This resulted
in the FreeBSD client retrying the entire RPC over and over again,
until the server's recovery grace period ended. Since the Open
succeeded, there was no need to retry the entire RPC.

This patch modifies the NFSv4 client side recovery Open/Claim_previous
RPC reply handling to deal with this case. With this patch, the
Getattr reply of NFSERR_DELAY is ignored and the successful Open
reply is processed.

This bug will not normally affect users, since this non-FreeBSD
server is not widely used (it may not even have shipped to any
customers).

MFC after: 1 month


# 196787f7 20-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Use Claim_Null_FH and Claim_Deleg_Cur_FH

For NFSv4.1/4.2, there are two new options for the Open operation.
These two options use the file handle for the file instead of the
file handle for the directory plus a file name. By doing so, the
client code is simplified (it no longer needs the "nfsv4node" structure
attached to the NFS vnode). It also avoids problems caused by another
NFS client (or process running locally in the NFS server) doing a
rename or remove of the file name between the Lookup and Open.

Unfortunately, there was a bug (fixed recently by commit X)
in the NFS server which mis-parsed the Claim_Deleg_Cur_FH
arguments. To allow this patch to work with the broken FreeBSD
NFSv4.1/4.2 server, NFSMNTP_BUGGYFBSDSRV is defined and is set
when a correctly formatted Claim_Deleg_Cur_FH fails with NFSERR_EXPIRED.
(This is what the old, broken NFS server does, since it erroneously
uses the Getattr arguments as a stateID.) Once this flag is set,
the client fills in a stateID, to make the broken NFS server happy.

Tested at a recent IETF NFSv4 Bakeathon.

MFC after: 1 month


# 57ce37f9 18-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make NFSv4.2 Copy set atime on infd

RFC7862 does not specify infile atime behaviour when a NFSv4.2 Copy
operation is performed. Since the collective opinion of a mailing
list discussion (on freebsd-hackers@) seemed to indicate that
copy_file_range(2) should update atime on the infd,
even if there is no data copied, this
patch attempts to ensure that behaviour.

For Copy, it preceeds the Copy operation with a Setattr of
TimeAccess_Set(NFSv4. speak for atime) for the invp. For the case
where no data will be copied, it does a Setattr RPC to set
TimeAccess_Set for the invp.

A __FreeBSD_version bump will be done as a separate commit, since
this patch changes the internal interface between the nfscommon and
nfscl modules.

MFC after: 1 month


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# ba8cc6d7 12-Mar-2023 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use __enum_uint8 for vtype and vstate

This whacks hackery around only reading v_type once.

Bump __FreeBSD_version to 1400093


# 695d87ba 28-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make coverity happy

Coverity does not like code that checks a function's
return value sometimes. Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

A recent commit deleted "(void)"s in front of nfsm_fhtom().
This commit puts them back in.

Reported by: emaste
MFC after: 3 months


# 1512579a 27-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make coverity happy

Coverity does not like code that checks a function's
return value sometimes. Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

Reported by: emaste
MFC after: 3 months


# 896516e5 16-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts

Without this patch, a Kerberized NFSv4.1/4.2 mount must provide
a Kerberos credential for the client at mount time. This credential
is typically referred to as a "machine credential". It can be
created one of two ways:
- The user (usually root) has a valid TGT at the time the mount
is done and this becomes the machine credential.
There are two problems with this.
1 - The user doing the mount must have a valid TGT for a user
principal at mount time. As such, the mount cannot be put
in fstab(5) or similar.
2 - When the TGT expires, the mount breaks.
- The client machine has a service principal in its default keytab
file and this service principal (typically called a host-based
initiator credential) is used as the machine credential.
There are problems with this approach as well:
1 - There is a certain amount of administrative overhead creating
the service principal for the NFS client, creating a keytab
entry for this principal and then copying the keytab entry
into the client's default keytab file via some secure means.
2 - The NFS client must have a fixed, well known, DNS name, since
that FQDN is in the service principal name as the instance.

This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which
allows the state maintenance operations to be performed by any
authentication mechanism, to do these operations via AUTH_SYS
instead of RPCSEC_GSS (Kerberos). As such, neither of the above
mechanisms is needed.

It is hoped that this option will encourage adoption of Kerberized
NFS mounts using TLS, to provide a more secure NFS mount.

This new NFSv4.1/4.2 mount option, called "syskrb5" must be used
with "sec=krb5[ip]" to avoid the need for either of the above
Kerberos setups to be done by the client.

Note that all file access/modification operations still require
users on the NFS client to have a valid TGT recognized by the
NFSv4.1/4.2 server. As such, this option allows, at most, a
malicious client to do some sort of DOS attack.

Although not required, use of "tls" with this new option is
encouraged, since it provides on-the-wire encryption plus,
optionally, client identity verification via a X.509
certificate provided to the server during TLS handshake.
Alternately, "sec=krb5p" does provide on-the-wire
encryption of file data.

A mount_nfs(8) man page update will be done in a separate commit.

Discussed on: freebsd-current@
MFC after: 3 months


# d4a11b3e 11-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix CreateSession for an established ClientID

Commit 981ef32230b2 added optional use of the session
slots marked bad to recover a new session when all
slots are marked bad. The recovery worked against
a FreeBSD NFSv4.1/4.2 server, but not a Linux one.
It turns out that it was a bug in the FreeBSD client
and not the Linux server.

This patch fixes the client so that DeleteSession
followed by CreateSession after receiving a
NFSERR_BADSESSION error reply works against the
Linux server (and conforms to the RFC).

This also implies that the FreeBSD NFSv4.1/4.2
server needs to be fixed in a future commit.
Without the fix, the FreeBSD server does a full
recovery, including creation of a new ClientID,
but since "intr" mounts were broken, this does
not result in a regression.

This patch only affects the case where a CreateSession
is done for an already confirmed ClientID, which was
not being done prior to commit 981ef32230b2.

PR: 260011
MFC after: 2 weeks


# 2adb3074 11-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Replace "cred" with NULL to cleanup code

Commit 326bcf9394c7 added a new "cred" argument to nfscl_reqstart().
Fsinfo is a NFSv3 RPC and since the "cred" argument is not
used for NFSv3, it does not matter what is passed in.
However, to be consistent with the rest of the patch, change the
argument to NULL.

This patch should not result in a semantics change.

PR: 260011
MFC after: 2 weeks


# 8f4a5fc6 10-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Do not call nfscl_hasexpired() for NFSv4.1/4.2

Commit 981ef32230b2 enabled marking of potentially bad
session slots when an RPC is interrupted if the "intr"
mount option is used. As such, it no longer makes
sense to call nfscl_hasexpired() for I/O operations that
reply NFSERR_BADSTATEID for NFSv4.1/4.2, which does a full
recovery of NFSv4 open state, destroying all byte range locks.
Recovery of open state should not be usually needed, since
the session slot has been marked potentially bad and,
although opens for the process that has been terminated via
a signal may be broken, locks for other processes will still
be valid.

This patch disables calls to nfscl_hasexpired for NFSv4.1/4.2
mounts, when I/O RPCs receive NFSERR_BADSTATEID replies.
It does not affect the behaviour of NFSv4.0 mounts nor
hard (non "intr") mounts.

PR: 260011
MFC after: 2 weeks


# 627f1555 09-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Initialize nfsess_badslots to zero

Commit 40ada74ee1da added a field to mark bad session slots.
This patch ensures that the field is initialized to 0.

PR: 260011
MFC after: 2 weeks


# dff31ae1 09-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Move nfsrpc_destroysession into nfscommon

This patch moves nfsrpc_destroysession() into nfscommon.ko
and also modifies its arguments slightly. This will allow
the function to be called from nfsv4_sequencelookup() in
a future commit.

This patch should not result in a semantics change.

PR: 260011
MFC after: 2 weeks


# 326bcf93 08-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a cred argument to nfscl_reqstart()

To deal with broken session slots caused by the use of the
"soft" and/or "intr" mount options, nfsv4_sequencelookup()
will be modified to track the potentially broken session
slots. Then, when all session slots are potentially
broken, do a DeleteSession operation, so that the NFSv4
server will reply NFSERR_BADSESSION to uses of the session.
These changes will be done in future commits. However,
to do the DeleteSession RPC, a "cred" argument is needed
for nfscl_reqstart(). This patch adds this argument,
which is unused at this time. If the argument is NULL,
it indicates that DeleteSession should not be done
(usually because the RPC does not use sessions).

This patch should not cause any semantics change.

PR: 260011
MFC after: 2 weeks


# be7b87de 08-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix setting of nfsess_defunct for nfscl_hasexpired()

Commit a7bb120f8b87 added a printf for the case where recovery
has not marked the session defunct by setting nfsess_defunct
to 1. It turns out that nfscl_hasexpired() calls
nfsrpc_setclient() directly, without setting nfsess_defunct.
This patch replaces the printf with code that sets
nfsess_defunct to 1 to handle this case.

If SIGTERM is issued to a process when it is doing I/O on
an "intr" mount, the NFSv4 server may reply NFSERR_BADSTATEID,
due to the Open being prematurely closed.
This can result in a call to nfscl_hasexpired() to do a
recovery.

This would explain at least one hang described in the PR.

PR: 260011
MFC after: 2 weeks


# 746974c0 23-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by not using the vnode_vtype() macro

The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.

This commit should not result in a semantics change.


# 6d25ea6d 18-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing #if(n)def APPLE

The definition of "APPLE" was used by the Mac OSX port.
For FreeBSD, this definition is never used, so remove
the references to it to clean up the code.

This commit should not result in a semantics change.


# 3c4266ed 17-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c and
nfs_clstate.c.

This commit should not result in a semantics change.


# 1e70163c 17-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c and
nfs_clvfsops.c. Future commits will do the same for other functions.

This commit should not result in a semantics change.


# c692ea40 16-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# af6665e0 16-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# 8cb42d69 15-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# da47c186 15-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# 1c665e95 14-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# 41c029d5 13-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
defined in nfs_clrpcops.c and called in nfs_clvnops.c.
Future commits will do the same for other functions.

This commit should not result in a semantics change.


# a7bb120f 27-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a diagnostic printf() for a "should never happen" case

When a NFSv4.1/4.2 session to the NFS server (not a pNFS DS) is
replaced, the old session should always be marked defunct by
nfsess_defunct being set non-zero.

However, the hang reported by the PR suggests that this might
be the case.

This patch adds a printf() to indicate this has somehow happened.

PR: 260011
MFC after: 2 weeks


# 425e5c73 27-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Do not handle NFSERR_BADSESSION in operation code

The NFSERR_BADSESSION reply from a NFSv4.1/4.2 server
is handled by newnfs_request(). It should not be handled
separately after newnfs_request() has returned.

These two cases were spotted during code inspection.
One of them should only redo what newnfs_request() already
did by the same "nfscl" thread. The other might have
resulted in recovery being done twice, but the code is
only used for "pnfs" mounts, so that would be rare.
Also, since NFSERR_BADSESSION should only be replied by
a server after the server reboots, this would be extremely
rare.

MFC after: 2 weeks


# 70910e4b 03-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Acquire a refcount on "cred" for mirrored pNFS RPCs

When the NFSv4.1/4.2 client is doing a pnfs mount to
mirrored DS(s), asynchronous threads are used to do the
RPCs against the DS(s) concurrently. If a DS is slow
to reply, it is possible for the "cred" to be free'd
before the asynchronous thread is done with it, causing
a panic/crash.

This patch fixes the problem by acquiring a refcount on
the "cred" while it is being used by the asynchronous thread
for a DS RPC. This bug was found during a recent IETF
NFSv4 testing event.

This bug only affects "pnfs" mounts to mirrored pNFS
servers.

MFC after: 2 weeks


# 5218d82c 30-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for a NFSv4 AppendWrite RPC

For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.

Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.

This patch modifies the NFSv4 client to use an Appendwrite
RPC, which does a Verify to check the file's size before
doing the Write. This avoids the need for a Getattr RPC
to preceed this RPC and reduces the RPC count by half for
IO_APPEND writes, so long as the client knows the file's
size.

The nfsd structure was moved from the stack to be malloc()'d,
since the kernel stack limit was being exceeded.

While here, fix the types of a few variables, although
there should not be any semantics change caused by these
type changes.


# 32c3e0f0 15-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for assorted functions
local to nfs_clrpcops.c.
Future commits will do the same for other functions.


# 068fc057 14-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for nfscl_nget().
Future commits will do the same for other functions.


# 4ad3423b 13-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for nfscl_loadattrcache().
Future commits will do the same for other functions.


# 5580e5bd 10-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for nfscl_request().
Future commits will do the same for other functions.


# 38c3cf6a 09-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing unused arguments

The "void *stuff" (also called fstuff and dstuff) argument
was used by the Mac OSX port. For FreeBSD, this argument
is always NULL, so remove it to clean up the code.

This commit gets rid of "stuff" for nfscl_postop_attr().
Future commits will do the same for other functions.


# 21de450a 08-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for a NFSv4 AppendWrite RPC

For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.

Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.

This patch modifies nfscl_wcc_data() to optionally
acquire the file's size, for use with an AppendWrite.
Although the "stuff" arguments are always NULL
(these were used for the Mac OSX port and should be
cleared out someday), make the argument to
nfscl_wcc_data() explicitly NULL for clarity.

This patch does not cause any semantics change until
the AppendWrite is added in a future commit.


# 57014f21 13-Mar-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix NFSv4.1/4.2 Lookup+Open RPC

Use of the Lookup+Open RPC is currently disabled,
due to a problem detected during testing. This
patch fixes this problem. The problem was that
nfscl_postop_attr() does not parse the attributes
if nd_repstat != 0. It also would parse the
return status for the operation, where the
Lookup+Open code had already parsed it.

The first change in the patch does not make any
semantics change, but makes the code identical
to what is done later in the function, so that
it is apparent that the semantics should be the
same in both places.

Lookup+Open remains disabled while further
testing is being done, so this patch has no
effect at this time.


# a91a5784 11-Jan-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Do not accept audit/alarm ACEs for the NFSv4 server

The UFS and ZFS file systems only support Allow/Deny ACEs
in the NFSv4 ACLs. This patch does not allow the server
to parse Audit/Alarm ACEs. The NFSv4 client is still
allowed to pase Audit/Alarm ACEs, since non-FreeBSD NFSv4
servers may use them.

This patch should not have a significant effect, since the
UFS and ZFS file systems will not handle these ACEs anyhow.
It simply serves as an additional "safety belt" for the
NFSv4 server.

MFC after: 2 weeks


# 5da9b3b0 11-Jan-2022 Rick Macklem <rmacklem@FreeBSD.org>

Revert "nfscommon: Add arguments for support of the dacl attribute"

This reverts commit 0fa074b53e7c22157dcb41aaa25a33abc8118f26.

I now see that the implementation of the "dacl" operation
requires that the NFSv4 server to "automatic inheritance"
and I do not plan on doing this. As such, this patch is
harmless, but unneeded.


# 0fa074b5 26-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscommon: Add arguments for support of the dacl attribute

NFSv4.1/4.2 has an alternative to the acl attribute, called
dacl, that includes support for the ACL_ENTRY_INHERITED flag,
called NFSV4ACE_INHERITED in NFSv4.

This patch adds a dacl argument to nfsrv_buildacl(),
nfsrv_dissectacl() and nfsrv_dissectace(), so that they
will handle NFSV4ACE_INHERITED when dacl == true.

Since these functions are always called with dacl == false
for this patch, semantics should not have changed.
A future patch will add support for dacl.

MFC after: 2 weeks


# 24947b70 12-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix must_commit handling for mirrored pNFS mounts

For pNFS mounts to mirrored Flexible File layout pNFS servers,
the "must_commit" component in the nfsclwritedsdorpc
structure must be checked and the "must_commit" argument passed
into nfscl_doiods() must be updated. Technically, only writes to
the DS with a writeverf change must be redone, but since this
occurrence will be rare, the must_commit argument to nfscl_doiosd()
is set to 1, so all writes to all DSs will be redone.

This bug would affect few, since use of mirrored pNFS servers
is rare and "writeverf" rarely changes. Normally "writeverf"
only changes when a NFS server reboots.

MFC after: 2 weeks


# ead50c94 11-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix must_commit/writeverf handling for Direct I/O

Without this patch, the KASSERT(must_commit == 0,..) can be
triggered by the writeverf in the Direct I/O write reply changing.
This is not a situation that should cause a panic(). Correct
handling is to ignore the change in "writeverf" for Direct
I/O, since it is done with NFSWRITE_FILESYNC.

This patch modifies the semantics of the "must_commit"
argument slightly, allowing an initial value of 2 to indicate
that a change in "writeverf" should be ignored.
It also fixes the KASSERT()s.

This bug would affect few, since Direct I/O is not enabled
by default and "writeverf" rarely changes. Normally "writeverf"
only changes when a NFS server reboots, however I found the
bug when testing against a Linux 5.15.1 kernel nfsd, which
replied to a NFSWRITE_FILESYNC write with a "writeverf" of all
0x0 bytes.

MFC after: 2 weeks


# ab639f23 09-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Check for an error return from nfsrv_getattrbits()

There were two places where the client code did not check
for a parse error return from nfsrv_getattrbits().

This patch fixes both of these cases.

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260272
MFC after: 2 weeks


# 22f7bcb5 26-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Sanity check irdcnt in nfsrpc_createsession

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 259996
MFC after: 2 weeks


# 44744f75 11-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a LayoutError RPC for NFSv4.2 pNFS mounts

If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error. This patch adds the same to the
FreeBSD NFSv4.2 pNFS client, to maintain Linux compatible
behaviour, particlularily for non-FreeBSD pNFS servers.

MFC after: 2 weeks


# 2be41784 30-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

PR#259071 provides a test program that fails for the NFS client.
Testing with it, there appears to be a race between Lookup
and VOPs like Setattr-of-size, where Lookup ends up loading
stale attributes (including what might be the wrong file size)
into the NFS vnode's attribute cache.

The race occurs when the modifying VOP (which holds a lock
on the vnode), blocks the acquisition of the vnode in Lookup,
after the RPC (with now potentially stale attributes).

Here's what seems to happen:
Child Parent

does stat(), which does
VOP_LOOKUP(), doing the Lookup
RPC with the directory vnode
locked, acquiring file attributes
valid at this point in time

blocks waiting for locked file does ftruncate(), which
vnode does VOP_SETATTR() of Size,
changing the file's size
while holding an exclusive
lock on the file's vnode
releases the vnode lock
acquires file vnode and fills in
now stale attributes including
the old wrong Size
does a read() which returns
wrong data size

This patch fixes the problem by saving a timestamp in the NFS vnode
in the VOPs that modify the file (Setattr-of-size, Allocate).
Then lookup/readdirplus compares that timestamp with the time just
before starting the RPC after it has acquired the file's vnode.
If the modifying RPC occurred during the Lookup, the attributes
in the RPC reply are discarded, since they might be stale.

With this patch the test program works as expected.

Note that the test program does not fail on a July stable/12,
although this race is in the NFS client code. I suspect a
fairly recent change to the name caching code exposed this
bug.

PR: 259071
Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D32635


# 23024f00 25-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a missing delegation lock release

There was a case in nfscl_doiods() where the function would return
without releasing the delegation shared lock, if it was aquired by
the call to nfscl_getstateid(). This patch adds that release.

I have never observed a failure due to this missing release, so I
do not know if it ever happens in practice. However, since the pNFS
client is not yet heavily used, it might be the case.

Found by code inspection during a recent NFSv4 IETF working group
testing event.

MFC after: 2 week


# a4667e09 19-Oct-2021 Mark Johnston <markj@FreeBSD.org>

Convert vm_page_alloc() callers to use vm_page_alloc_noobj().

Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986


# 52dee2bc 18-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Handle NFSv4.1/4.2 Close RPC NFSERR_DELAY replies better

Without this patch, if a NFSv4.1/4.2 server replies NFSERR_DELAY to
a Close operation, the client loops retrying the Close while holding
a shared lock on the clientID. This shared lock blocks returns of
delegations, even though the server has issued a CB_RECALL to request
the delegation return.

This patch delays doing a retry of a Close that received a reply of
NFSERR_DELAY until after the shared lock on the clientID is released,
for NFSv4.1/4.2. To fix this for NFSv4.0 would be very difficult and
since the only known NFSv4 server to reply NFSERR_DELAY to Close only
does NFSv4.1/4.2, this fix is hoped to be sufficient.

This problem was detected during a recent IETF working group NFSv4
testing event.

MFC after: 2 week


# d95c0a12 17-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Modify Close RPC so that it does not use "owner" for NFSv4.1/4.2

This patch modifies the function that does the Close RPC (nfsrpc_closerpc)
so that it does not use the open_owner (nfso_own) for NFSv4.1/4.2.
Use of the seqid in the open_owner structure is only needed for NFSv4.0.
Same applies to a NFSERR_STALESTATEID reply, which should only happen
for NFSv4.0. This allows nfsrpc_closerpc() to be called when nfso_own
is no longer valid. This, in turn, allows nfsrpc_closerpc() to be called
after the shared lock on the clientID is released, for NFSv4.1/4.2.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after: 2 week


# e2aab5e2 16-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Move release of the clientID lock into nfscl_doclose()

This patch moves release of the shared clientID lock from nfsrpc_close()
just after the nfscl_doclose() call to the end of nfscl_doclose() call.
This does make the code cleaner, since the shared lock is acquired at
the beginning of nfscl_doclose(). The only semantics change is that
the code no longer drops and reaquires the NFSCLSTATELOCK() mutex,
which I do not believe will have a negative effect on the NFSv4 client.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after: 2 week


# 77c595ce 15-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add an argument to nfscl_tryclose()

This patch adds a new argument to nfscl_tryclose() to indicate
whether or not it should loop when a NFSERR_DELAY reply is received
from the NFSv4 server. Since this new argument is always passed in
as "true" at this time, no semantics change should occur.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after: 2 week


# 6495766a 14-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Restructure nfscl_freeopen() slightly

This patch factors the unlinking of the nfsclopen structure out of
nfscl_freeopen() into a separate function called nfscl_unlinkopen().
It also adds a new argument to nfscl_freeopen() to conditionally do
the unlink. Since this new argument is always passed in as "true"
at this time, no semantics change should occur.

This is being done to prepare the code for a future patch that fixes
the case where an NFSv4.1/4.2 server replies NFSERR_DELAY to a Close
operation.

MFC after: 2 week


# 24af0fcd 13-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make nfscl_getlayout() acquire the correct pNFS layout

Without this patch, if a pNFS read layout has already been acquired
for a file, writes would be redirected to the Metadata Server (MDS),
because nfscl_getlayout() would not acquire a read/write layout for
the file. This happened because there was no "mode" argument to
nfscl_getlayout() to indicate whether reading or writing was being done.
Since doing I/O through the Metadata Server is not encouraged for some
pNFS servers, it is preferable to get a read/write layout for writes
instead of redirecting the write to the MDS.

This patch adds a access mode argument to nfscl_getlayout() and
nfsrpc_getlayout(), so that nfscl_getlayout() knows to acquire a read/write
layout for writing, even if a read layout has already been acquired.
This patch only affects NFSv4.1/4.2 client behaviour when pNFS ("pnfs" mount
option against a server that supports pNFS) is in use.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after: 2 week


# 120b20bd 11-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix a deadlock related to the NFSv4 clientID lock

Without this patch, it is possible for a process doing an NFSv4
Open/create of a file to block to allow another process
to acquire the exclusive lock on the clientID when holding
a shared lock on the clientID. As such, both processes
deadlock, with one wanting the exclusive lock, while the
other holds the shared lock. This deadlock is unlikely to occur
unless delegations are in use on the NFSv4 mount.

This patch fixes the problem by not deferring to the process
waiting for the exclusive lock when a shared lock (reference cnt)
is already held by the process.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after: 1 week


# 55089ef4 11-Sep-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make vfs.nfs.maxcopyrange larger by default

As of commit 103b207536f9, the NFSv4.2 server will limit the size
of a Copy operation based upon a 1 second timeout. The Linux 5.2
kernel server also limits Copy operation size to 4Mbytes.
As such, the NFSv4.2 client can attempt a large Copy without
resulting in a long RPC RTT for these servers.

This patch changes vfs.nfs.maxcopyrange to 64bits and sets
the default to the maximum possible size of SSIZE_MAX, since
a larger size makes the Copy operation more efficient and
allows for copying to complete with fewer RPCs.
The sysctl may be need to be made smaller for other non-FreeBSD
NFSv4.2 servers.

MFC after: 2 weeks


# 08b9cc31 27-Aug-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a VOP_DEALLOCATE() for the NFSv4.2 client

This patch adds a VOP_DEALLOCATE() to the NFS client.
For NFSv4.2 servers that support the Deallocate operation,
it is used. Otherwise, it falls back on calling
vop_stddeallocate().

Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D31640


# 3ad1e1c1 11-Aug-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a Lookup+Open RPC for NFSv4.1/4.2

This patch adds a Lookup+Open compound RPC to the NFSv4.1/4.2
NFS client, which can be used by nfs_lookup() so that a
subsequent Open RPC is not required.
It uses the cn_flags OPENREAD, OPENWRITE added by commit c18c74a87c15.
This reduced the number of RPCs by about 15% for a kernel
build over NFS.

For now, use of Lookup+Open is only done when the "oneopenown"
mount option is used. It may be possible for Lookup+Open to
be used for non-oneopenown NFSv4.1/4.2 mounts, but that will
require extensive further testing to determine if it works.

While here, I've added the changes to the nfscommon module
that are needed to implement the Deallocate NFSv4.2 operation.
This avoids needing another cycle of changes to the internal
KAPI between the NFS modules.

This commit has changed the internal KAPI between the NFS
modules and, as such, all need to be rebuilt from sources.
I have not bumped __FreeBSD_version, since it was bumped a
few days ago.


# efea1bc1 28-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Cache an open stateid for the "oneopenown" mount option

For NFSv4.1/4.2, if the "oneopenown" mount option is used,
there is, at most, only one open stateid for each NFS vnode.
When an open stateid for a file is acquired, set a pointer to
the open structure in the NFS vnode. This pointer can be used to
acquire the open stateid without searching the open linked list
when the following is true:
- No delegations have been issued for the file. Since delegations
can outlive an NFS vnode for a file, use the global
NFSMNTP_DELEGISSUED flag on the mount to determine this.
- No lock stateid has been issued for the file. To determine
this, a new NFS vnode flag called NMIGHTBELOCKED is set when a lock
stateid is issued, which can then be tested.

When this open structure pointer can be used, it avoids the need to
acquire the NFSCLSTATELOCK() and searching the open structure list for
an open. The NFSCLSTATELOCK() can be highly contended when there are
a lot of opens issued for the NFSv4.1/4.2 mount.

This patch only affects NFSv4.1/4.2 mounts when the "oneopenown"
mount option is used.

MFC after: 2 weeks


# 7f5508fe 14-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Avoid KASSERT() panic in cache_enter_time()

Commit 844aa31c6d87 added cache_enter_time_flags(), specifically
so that the NFS client could specify that cache enter replace
any stale entry for the same name. Doing so avoids a KASSERT()
panic() in cache_enter_time(), as reported by the PR.

This patch uses cache_enter_time_flags() for Readdirplus, to
avoid the panic(), since it is impossible for the NFS client
to know if another client (or a local process on the NFS server)
has replaced a file with another file of the same name.

This patch only affects NFS mounts that use the "rdirplus"
mount option.

There may be other places in the NFS client where this needs
to be done, but no panic() has been observed during testing.

PR: 257043
MFC after: 2 weeks


# 1e0a518d 08-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a Linux compatible "nconnect" mount option

Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to created for a mount,
instead of just one TCP connection.

A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated to-gether with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.

One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write). The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30970


# aed98fa5 15-Jun-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make NFSv4.0 client acquisition NFSv4.1/4.2 compatible

When the NFSv4.0 client was implemented, acquisition of a clientid
via SetClientID/SetClientIDConfirm was done upon the first Open,
since that was when it was needed. NFSv4.1/4.2 acquires the clientid
during mount (via ExchangeID/CreateSession), since the associated
session is required during mount.

This patch modifies the NFSv4.0 mount so that it acquires the
clientid during mount. This simplifies the code and makes it
easy to implement "find the highest minor version supported by
the NFSv4 server", which will be done for the default minorversion
in a future commit.
The "start_renewthread" argument for nfscl_getcl() is replaced
by "tryminvers", which will be used by the aforementioned
future commit.

MFC after: 2 weeks


# dd02d9d6 07-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for va_birthtime to NFSv4

There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30156


# 0755df1e 03-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: fix typo in a comment

MFC after: 2 weeks


# 7763814f 11-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsv4 client: do the BindConnectionToSession as required

During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it. For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.

Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.

This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.

If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.

PR: 254840
Comments by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29475


# 5666643a 13-Mar-2021 Gordon Bergling <gbe@FreeBSD.org>

Fix some common typos in comments

- occured -> occurred
- normaly -> normally
- controling -> controlling
- fileds -> fields
- insterted -> inserted
- outputing -> outputting

MFC after: 1 week


# c04199af 02-Mar-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsclient: Fix ReadDS/WriteDS/CommitDS nfsstats RPC counts for a NFSv3 DS

During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing I/O DS operations on a Flexible File Layout pNFS
server. For an NFSv3 DS, the Read/Write/Commit nfsstats were incremented
instead of the ReadDS/WriteDS/CommitDS counts.
This patch fixes this.

Only the RPC counts reported by nfsstat(1) were affected by this bug,
the I/O operations were performed correctly.

MFC after: 2 weeks


# 94f2e42f 01-Mar-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsclient: Fix the stripe unit size for a File Layout pNFS layout

During a recent virtual NFSv4 testing event, a bug in the FreeBSD client
was detected when doing a File Layout pNFS DS I/O operation.
The size of the I/O operation was smaller than expected.
The I/O size is specified as a stripe unit size in bits 6->31 of nflh_util
in the layout. I had misinterpreted RFC5661 and had shifted the value
right by 6 bits. The correct interpretation is to use the value as
presented (it is always an exact multiple of 64), clearing bits 0->5.
This patch fixes this.

Without the patch, I/O through the DSs work, but the I/O size is 1/64th
of what is optimal.

MFC after: 2 weeks


# 3fe2c68b 27-Feb-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsclient: fix panic in cache_enter_time()

Juraj Lutter (otis@) reported a panic "dvp != vp not true" in
cache_enter_time() called from the NFS client's nfsrpc_readdirplus()
function.
This is specific to an NFSv3 mount with the "rdirplus" mount
option. Unlike NFSv4, NFSv3 replies to ReaddirPlus
includes entries for the current directory.

This trivial patch avoids doing a cache_enter_time()
call for the current directory to avoid the panic.

Reported by: otis
Tested by: otis
Reviewed by: mjg
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D28969


# 586ee69f 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

fs: clean up empty lines in .c and .h files


# 808306dd 17-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Delete the unused "use_ext" argument to nfscl_reqstart().

This is a partial revert of r363210, since the "use_ext" argument added
by that commit is not actually useful.

This patch should not result in any semantics change.


# 02511d21 10-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add an argument to newnfs_connect() that indicates use TLS for the connection.

For NFSv4.0, the server creates a server->client TCP connection for callbacks.
If the client mount on the server is using TLS, enable TLS for this callback
TCP connection.
TLS connections from clients will not be supported until the kernel RPC
changes are committed.

Since this changes the internal ABI between the NFS kernel modules that
will require a version bump, delete newnfs_trimtrailing(), which is no
longer used.

Since LCL_TLSCB is not yet set, these changes should not have any semantic
affect at this time.


# cfaafa79 24-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add support for ext_pgs mbufs to nfsm_uiombuflist() and nfsm_split().

This patch uses a slightly different algorithm for nfsm_uiombuflist() for
the non-ext_pgs case, where a variable called "mcp" is maintained, pointing to
the current location that mbuf data can be filled into. This avoids use of
mtod(mp, char *) + mp->m_len to calculate the location, since this does
not work for ext_pgs mbufs and I think it makes the algorithm more readable.
This change should not result in semantic changes for the non-ext_pgs case.
The patch also deletes come unneeded code.

It also adds support for anonymous page ext_pgs mbufs to nfsm_split().

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.
At this time for this case, use of ext_pgs mbufs cannot be enabled, since
ktls_encrypt() replaces the unencrypted data with encrypted data in place.

Until such time as this can be enabled, there should be no semantic change.
Also, note that this code is only used by the NFS client for a mirrored pNFS
server.


# 9516bcdf 22-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Modify writing to mirrored pNFS DSs to prepare for use of ext_pgs mbufs.

This patch modifies writing to mirrored pNFS DSs slightly so that there is
only one m_copym() call for a mirrored pair instead of two of them.
This call replaces the custom nfsm_copym() call, which is no longer needed
and deleted by this patch. The patch does introduce a new nfsm_split()
function that only calls m_split() for the non-ext_pgs case.
The semantics of nfsm_uiombuflist() is changed to include code that nul
pads the generated mbuf list. This was done by nfsm_copym() prior to this patch.

The main reason for this change is that it allows the data to be a list
of ext_pgs mbufs, since the m_copym() is for the entire mbuf list.
This support will be added in a future commit.

This patch only affects writing to mirrored flexible file layout pNFS servers.


# 7477442f 14-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix the pNFS flexible file layout client for servers with small write size.

The code in nfscl_dofflayout() loops when a flexible file layout server
provides a small write data limit (no extant server is known to do this).
If/when it looped, it erroneously reused the "drpc" argument for the
mirror worker thread, corrupting it.
This patch fixes the problem by only using the calling thread after the
first loop iteration.

Found during testing by simulating a server with a small write size.

Since no extant pNFS server is known to provide a small write size,
this fix it not needed in practice at this time.

MFC after: 2 weeks


# 4476c1de 25-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add a boolean argument to nfscl_reqstart() to indicate that ext_pgs mbufs
should be used.

For KERN_TLS (and possibly some other future network interface) the mbuf
list passed into sosend() must be ext_pgs mbufs. The krpc could simply
copy all the mbuf data into ext_pgs mbufs before calling sosend(), but
that would be inefficient for large RPC messages.
This patch adds an argument to nfscl_reqstart() to indicate that it should
fill the RPC message into ext_pgs mbufs.
It also adds fields to "struct nfsrv_descript" needed for building NFS RPC
messages in ext_pgs mbufs, along with new flags for this.

Since the argument is always "false", this commit should not result in any
semantic change. However, this commit prepares the code
for future commits that will add support for building of NFS RPC messages
in ext_pgs mbufs.


# eea79fde 17-Jun-2020 Alan Somers <asomers@FreeBSD.org>

Remove vfs_statfs and vnode_mount macros from NFS

These macro definitions are no longer needed as the NFS OSX port is long
dead. The vfs_statfs macro conflicts with the vfsops field of the same
name.

Submitted by: shivank@
Reviewed by: rmacklem
MFC after: 2 weeks
Sponsored by: Google, Inc. (GSoC 2020)
Differential Revision: https://reviews.freebsd.org/D25263


# b9cc3262 12-May-2020 Ryan Moeller <freqlabs@FreeBSD.org>

nfs: Remove APPLESTATIC macro

It is no longer useful.

Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24811


# 32033b3d 08-May-2020 Ryan Moeller <freqlabs@FreeBSD.org>

Remove APPLEKEXT ifndefs

They are no longer useful.

Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24752


# 5ecf33c6 27-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Get rid of uio_XXX macros used for the Mac OS/X port.

The NFS code had a bunch of Mac OS/X accessor functions named uio_XXX
left over from the port to Mac OS/X. Since that port is long forgotten,
replace the calls with the code generated by the FreeBSD macros for these
in nfskpiport.h. This allows the macros to be deleted from nfskpiport.h
and I think makes the code more readable.

This patch should not result in any semantic change.


# 897d7d45 22-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Make the NFSv4.n client's recovery from NFSERR_BADSESSION RFC5661 conformant.

RFC5661 specifies that a client's recovery upon receipt of NFSERR_BADSESSION
should first consist of a CreateSession operation using the extant ClientID.
If that fails, then a full recovery beginning with the ExchangeID operation
is to be done.
Without this patch, the FreeBSD client did not attempt the CreateSession
operation with the extant ClientID and went directly to a full recovery
beginning with ExchangeID. I have had this patch several years, but since
no extant NFSv4.n server required the CreateSession with extant ClientID,
I have never committed it.
I an committing it now, since I suspect some future NFSv4.n server will
require this and it should not negatively impact recovery for extant NFSv4.n
servers, since they should all return NFSERR_STATECLIENTID for this first
CreateSession.

The patched client has been tested for recovery against both the FreeBSD
and Linux NFSv4.n servers and no problems have been observed.

MFC after: 1 month


# 0bda1ddd 15-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv4.2 extended attribute support for remove extended attrbute.

I missed the "atomic" field of the RemoveExtendedAttribute operation's
reply when I implemented it. It worked between FreeBSD client and server,
since it was missed for both, but it did not conform to RFC 8276.
This patch adds the field for both client and server.

Thanks go to Frank for doing interoperability testing of the extended
attribute support against patches for Linux.

Submitted by: Frank van der Linden <fllinden@amazon.com>
Reported by: Frank van der Linden <fllinden@amazon.com>


# fb8ed4c5 14-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv2 extended attribute support to handle 0 length attributes.

I did not realize that zero length attributes are allowed, but they are.
This patch fixes the NFSv4.2 client and server to handle zero length
extended attributes correctly.

Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version)
Reported by: Frank van der Linden <fllinder@amazon.com>


# e3e7c612 11-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.

This is the final patch of this series and the macros should now be
able to be deleted from the .h files in a future commit.


# f808cf72 13-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Silence some "might not be initialized" warnings for riscv64.

None of these case were actually using the variable(s) uninitialized, but
I figured that silencing the warnings via initializing them made sense.

Some of these predated r355677.


# c057a378 12-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add support for NFSv4.2 to the NFS client and server.

This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes
(RFC-8276) to the NFS client and server.
NFSv4.2 is comprised of several optional features that can be supported
in addition to NFSv4.1. This patch adds the following optional features:
- posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED)
- posix_fallocate()
- intra server file range copying via the copy_file_range(2) syscall
--> Avoiding data tranfer over the wire to/from the NFS client.
- lseek(SEEK_DATA/SEEK_HOLE)
- Extended attribute syscalls for "user" namespace attributes as defined
by RFC-8276.

Although this patch is fairly large, it should not affect support for
the other versions of NFS. However it does add two new sysctls that allow
a sysadmin to limit which minor versions of NFSv4 a server supports, allowing
a sysadmin to disable NFSv4.2.

Unfortunately, when the NFS stats structure was last revised, it was assumed
that there would be no additional operations added beyond what was
specified in RFC-7862. However RFC-8276 did add additional operations,
forcing the NFS stats structure to revised again. It now has extra unused
entries in all arrays, so that future extensions to NFSv4.2 can be
accomodated without revising this structure again.

A future commit will update nfsstat(1) to report counts for the new NFSv4.2
specific operations/procedures.

This patch affects the internal interface between the nfscommon, nfscl and
nfsd modules and, as such, they all must be upgraded simultaneously.
I will do a version bump (although arguably not needed), due to this.

This code has survived a "make universe" but has not been built with a
recent GCC. If you encounter build problems, please email me.

Relnotes: yes


# 5d85e12f 23-Sep-2019 Rick Macklem <rmacklem@FreeBSD.org>

Replace all mtx_lock()/mtx_unlock() on n_mtx with the macros.

For a long time, some places in the NFS code have locked/unlocked the
NFS node lock with the macros NFSLOCKNODE()/NFSUNLOCKNODE() whereas
others have simply used mtx_lock()/mtx_unlock().
Since the NFS node mutex needs to change to an sx lock so it can be held when
vnode_pager_setsize() is called, replace all occurrences of mtx_lock/mtx_unlock
with the macros to simply making the change to an sx lock in future commit.
There is no semantic change as a result of this commit.

I am not sure if the change to an sx lock will be MFC'd soon, so I put
an MFC of 1 week on this commit so that it could be MFC'd with that commit.

Suggested by: kib
MFC after: 1 week


# 2df8bd90 12-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Drop unused 'p' argument to nfsv4_strtogid().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 0658ac39 12-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Drop unused 'p' argument to nfsv4_strtouid().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# f86bce17 22-Nov-2018 Rick Macklem <rmacklem@FreeBSD.org>

Make sure the NFS readdir client fills in all "struct dirent" data.

The NFS client code (nfsrpc_readdir() and nfsrpc_readdirplus()) wasn't
filling in parts of the readdir reply, such as d_pad[01] and the bytes
at the end of d_name within d_reclen. As such, data left in a buffer cache
block could be leaked to userland in the readdir reply.
This patch makes sure all of the data is filled in.

Reported by: Thomas Barabosch, Fraunhofer FKIE
Reviewed by: kib, markj
MFC after: 2 weeks


# 1493c2ee 02-Nov-2018 Brooks Davis <brooks@FreeBSD.org>

Make vop_symlink take a const target path.

This will enable callers to take const paths as part of syscall
decleration improvements.

Where doing so is easy and non-distruptive carry the const through
implementations. In UFS the value is passed to an interface that must
take non-const values. In ZFS, const poisoning would touch code shared
with upstream and it's not worth adding diffs.

Bump __FreeBSD_version for external API consumers.

Reviewed by: kib (prior version)
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D17805


# ac0d6495 02-Aug-2018 Rick Macklem <rmacklem@FreeBSD.org>

Silence newer gcc warnings.

Newer versions of gcc generate "might not be initialized" warnings for
several variables in nfsrpc_doiods(). I have checked and all of these
variables are assigned values before they are used.
In the one case of "tdrpc", it could have passed garbage as an argument
to nfscl_dofflayoutio() when mirrorcnt is one. However nfscl_dofflayoutio() only
uses the argument when mirrorcnt > 1, so it wasn't actually broken.
This patch initializes "tdrpc" to avoid confusion and initializes the rest
to make the compiler happy.

Requested by: mmacy


# 5da38824 15-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Shut down the TCP connection to a DS in the pNFS client when Renew fails.

When a NFSv4.1 client mount using pNFS detects a failure trying to do a
Renew (actually just a Sequence operation), the code would simply try
again and again and again every 30sec.
This would tie up the "nfscl" thread, which should also be doing other
things like Renews on other DSs and the MDS.
This patch adds code which closes down the TCP connection and marks it
defunct when Renew detects an failure to communicate with the DS, so
further Renews will not be attempted until a new working TCP connection to
the DS is established.
It also makes the call to nfscl_cancelreqs() unconditional, since
nfscl_cancelreqs() checks the NFSCLDS_SAMECONN flag and does so while holding
the lock.
This fix only applies to the NFSv4.1 client whne using pNFS and without it
the only effect would have been an "nfscl" thread busy doing Renew attempts
on an unresponsive DS.

MFC after: 2 weeks


# 89c64a3a 14-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix the pNFS client when mirrors aren't on the same machine.

Without this patch, the client side NFSv4.1 pNFS code erroneously did writes
and commits to both DS mirrors using the TCP connection of the first one.
For my test setup this worked, since I have both DSs running on the same
machine, but it would have failed when the DSs are on separate machines.
This patch fixes the code to use the correct TCP connection for each DS.
This patch should only affect the NFSv4.1 client when using "pnfs" mounts
to mirrored DSs.

MFC after: 2 weeks


# 83f526de 12-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Change the pNFS client so that it does not report an NFSERR_STALE from
an I/O attempt on a DS to the server via LayoutReturn.

The current FreeBSD client can generate these errors for an operational
DS while doing a recovery of a mirror after a mirrored DS has been repaired.
I am not sure why these errors occur, but my best current guess is a race
between the Layout Recall issued by the kernel code run from pnfsdscopymr(8)
and a Read operation on the DS for the file bing copied.
The errrors are not fatal, since the client falls back on doing I/O through
the MDS, which can do the I/O successfully as a proxy. (The fact that the
MDS can do this indicates that the file does still exist on the functioning
DS.)
This patch only affects behaviour of the pNFS client and only when using
Flexible File layouts.

MFC after: 2 weeks


# a6fed5f5 12-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4.1 pNFS client to use separate TCP connections for DSs.

Without this patch, the NFSv4.1 pNFS client shared a single TCP connection
for all DSs that resided on the same machine. This made disabling one of
the DSs impossible. Although unlikely, it is possible that the storage
subsystem has failed in such a way that the storage for one DS on a machine
is no longer functioning correctly, but the storage used by another DS on
the same machine is still ok. For this case, it would be nice if a system
can fail one of the DSs without failing them all.
This patch changes the default behaviour to use separate TCP connections
for each DS even if they reside on the same machine.
I do not believe that this will be a problem for extant pNFS servers, but
a sysctl can be set to restore the old behaviour if this change causes a
problem for an extant pNFS server.
This patch only affects the NFSv4.1 pNFS client.

MFC after: 2 weeks


# 2e35b8fe 22-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Change the NFSv4.1 pNFS client so that it returns the DS error in layoutreturn.

When the NFSv4.1 pNFS client gets an error for a DS I/O operation using a
Flexible File layout, it returns the layout with an error.
This patch changes the code slightly, so that it returns the layout for all
errors except EACCES and lets the MDS decide what to do based on the error.
It also makes a couple of changes to nfscl_layoutrecall() to ensure that
the first layoutreturn(s) will have the error in the reply.
Plus, the patch adds a wakeup() so that the "nfscl" thread won't wait 1sec
before doing the LayoutReturn.
Tested against the pNFS service.
This patch should not affect non-pNFS use of the client.
The unused "dsp" argument will be used by a future patch that disables the
connection to the DS when possible.

MFC after: 2 weeks


# 2bad6424 17-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Make the pNFS NFSv4.1 client return a Flexible File layout upon error.

The Flexible File layout LayoutReturn operation has argument fields where
an I/O error encountered when attempting I/O on a DS can be reported back
to the MDS.
This patch adds code to the client to do this for the Flexible File layout
mirrored case.
This patch should only affect mounts using the "pnfs" option against servers
that support the Flexible File layout.

MFC after: 2 weeks


# 90d2dfab 12-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Merge the pNFS server code from projects/pnfs-planb-server into head.

This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".

Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write oeprations from all the other NFSv4.1
Metadata operations. It is hoped that this separation allows a pNFS service
to be configured that exceeds the limits of a single NFS server for either
storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so that
the data storage file for an MDS file will be mirrored on two or more of
the DSs.
When this is used, failure of a DS will not stop the pNFS service and a
failed DS can be recovered once repaired while the pNFS service continues
to operate. Although two way mirroring would be the norm, it is possible
to set a mirroring level of up to four or the number of DSs, whichever is
less.
The Metadata server will always be a single point of failure,
just as a single NFS server is.

A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree on the MDS directly
on the MDS server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attrbute stores the information that
the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
and Change attributes for the file, so that the MDS doesn't need to
acquire the attributes from the DS for every Getattr operation.
For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories. The name of this file is the file handle
of the file on the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger.
It can be increased when the "nfsd" daemon is not running on the MDS,
once the "dsK" directories are created.

For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allows it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
is. The critical bits of information returned by the FreeBSD
server is the IP address of the DS and, for the Flexible
File layout, that NFSv4.1 is to be used and that it is
"tightly coupled".
There is a "deviceid" which identifies the DeviceInfo.
Layout - This is per file and can be recalled by the server when it
is no longer valid. For the FreeBSD server, there is support
for two types of layout, call File and Flexible File layout.
Both allow the client to do I/O on the DS via NFSv4.1 I/O
operations. The Flexible File layout is a more recent variant
that allows specification of mirrors, where the client is
expected to do writes to all mirrors to maintain them in a
consistent state. The Flexible File layout also allows the
client to report I/O errors for a DS back to the MDS.
The Flexible File layout supports two variants referred to as
"tightly coupled" vs "loosely coupled". The FreeBSD server always
uses the "tightly coupled" variant where the client uses the
same credentials to do I/O on the DS as it would on the MDS.
For the "loosely coupled" variant, the layout specifies a
synthetic user/group that the client uses to do I/O on the DS.
The FreeBSD server does not do striping and always returns
layouts for the entire file. The critical information in a layout
is Read vs Read/Writea and DeviceID(s) that identify which
DS(s) the data is stored on.

At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
are done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.

Tested by: james.rose@framestore.com
Relnotes: yes


# dec8894b 01-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix the default number of threads for Flex File layout pNFS client I/O.

The intent was that the default would be based on number of CPUs, but the
code disabled using taskqueue() by default.
This code is only executed when mounting a NFSv4.1 server that supports the
Flexible File layout for pNFS and, since such servers are rare, this change
shouldn't result in a POLA violation.
(The FreeBSD pNFS server is still a project and the only other one that
uses Flexible File layout is being developed by Primary Data and I don't
know if they have even shipped any to customers yet.)
Found while testing the pNFS server.


# b7faa59d 20-May-2018 Matt Macy <mmacy@FreeBSD.org>

nfsclient: warnings cleanups


# 222daa42 25-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

style: Remove remaining deprecated MALLOC/FREE macros

Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions). Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by: jhb
Reviewed by: cy, imp, markj, rmacklem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14035


# caa7e52f 26-Dec-2017 Eitan Adler <eadler@FreeBSD.org>

kernel: Fix several typos and minor errors

- duplicate words
- typos
- references to old versions of FreeBSD

Reviewed by: imp, benno


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# f49c813c 16-Oct-2017 Rick Macklem <rmacklem@FreeBSD.org>

Use taskqueue(9) to do writes/commits to mirrored DSs concurrently.

When the NFSv4.1 pNFS client is using a Flexible File Layout specifying
mirrored Data Servers, it must do the writes and commits to all mirrors.
This patch modifies the client to use a taskqueue to perform these writes
and commits concurrently.
The number of threads can't be changed for taskqueue(9), so it is set
to 4 * mp_ncpus by default, but this can be overridden by setting the
sysctl vfs.nfs.pnfsiothreads.

Differential Revision: https://reviews.freebsd.org/D12632


# b949cc41 05-Oct-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add Flex File Layout support to the NFSv4.1 pNFS client.

This patch adds support for the Flexible File Layout to the pNFS client.
Although the patch is rather large, it should only affect NFS mounts
using the "pnfs" option against pNFS servers that do not support File
Layout.
There are still a couple of things missing from the Flexible File Layout
client implementation:
- The code does not yet do a LayoutReturn with I/O error stats when
I/O error(s) occur when attempting to do I/O on a DS.
This will be fixed in a future commit, since it is important for the
MDS to know that I/O on a DS is failing.
- The current code does writes and commits to mirror DSs serially.
Making them happen concurrently will be done in a future commit,
after discussion on freebsd-current@ on the best way to do this.
- The code does not handle NFSv4.0 DSs. Since there is no extant pNFS
server that implements NFSv4.0 DSs and NFSv4.1 DSs makes more sense
now, I don't intend to implement this until there is a need for it.
There is support for NFSv4.1 and NFSv3 DSs.


# be3d32ad 28-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Change nfsv4_getipaddr() and nfsrpc_fillsa() to not use sockaddr_storage.

This patch changes nfsv4_getipaddr() and nfsrpc_fillsa() to use
a sockaddr_in * and sockaddr_in6 * instead of sockaddr_storage, to
avoid allocating the latter on the stack. It also moves the nfsrpc_fillsa()
call to after the completion of parsing of the DeviceInfo reply from
the server. This patch is in preparation for addition of Flex File
Layout support in a future commit.
It only affects the "pnfs" NFSv4.1 client mount option and should not
have changed its semantics.


# a8462c58 26-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add major and minor version arguments to nfscl_reqstart().

This patch adds "vers" and "minorvers" arguments to nfscl_reqstart().
The patch always passes them in as "0" and that implies no change
in semantics. These arguments will be used by a future commit that
adds support for the Flexible File Layout.


# 0f29b829 19-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Make the nfsrpc_layoutget() function a static.

Make the NFSv4 pNFS client function nfsrpc_layoutget() a static, since it
is only used in sys/fs/nfsclient/nfs_clrpcops.c.
This prepares the code for future patches that add Flex File layout
support.


# b0932afa 19-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Simplify nfsrpc_layoutreturn() args.

Simplify nfsrpc_layoutreturn() args. in preparation for the addition
of Flex File layout support, since File layout uses a 0 length field.
Flex Files does use a longer field, but that will be added in a
subsequent commit.


# ab118d04 19-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Simplify nfsrpc_layoutcommit() args.

Simplify nfsrpc_layoutcommit() args. in preparation for the addition
of Flex File layout support, since it also uses a 0 length field.


# ccf03825 17-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix bogus FREAD with NFSV4OPEN_ACCESSREAD. No functional change.

The code in nfscl_doflayoutio() bogusly used FREAD instead of
NFSV4OPEN_ACCESSREAD. Since both happen to be defined as "1", this
worked and the patch doesn't result in a functional change.
Found by inspection during development of Flex File Layout support.

MFC after: 2 weeks


# 23e148e9 28-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix possible crash for the NFSv4.1 pNFS client.

If the nfsrpc_createlayoutrpc() call in nfsrpc_getcreatelayout() fails,
the code used nfhpp when it might be set NULL. This patch checks for
the error cases (laystat != 0) and avoids using nfhpp for the failure case.
This would only affect NFSv4.1 mounts with the "pnfs" option.
Found while testing the "umount -N" patch not yet in head.

MFC after: 2 weeks


# f8181b5e 20-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.

r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.

MFC after: 1 week


# 06ea10c6 20-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

Revert r321308. I'll commit a better fix soon.


# a9d104fd 20-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.

r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for
the NFSv4.1 session. If rsize,wsize are not specified as options, the
value of nm_rsize, nm_wsize is 0 at session creation, resulting in
values for request/response that are too small.
This patch fixes the problem. A workaround is to specify rsize=N,wsize=N
mount options explicitly, so they are set before session creation.
This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.

MFC after: 1 week


# 81b07aac 25-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add support to the NFSv4.1/pNFS client for commits through the DS.

A NFSv4.1/pNFS server using File Layout can specify that Commit operations
are to be done against the DS instead of MDS. Since no extant pNFS
server did this, the code was untested and "#ifdef notyet".
The FreeBSD pNFS server I am developing does specify that Commits be done
through the DS, so the code has been enabled/tested.
This patch should only affect the case of a pNFS server that specfies
Commits through the DS.

PR: 219551
MFC after: 2 weeks


# a351e99c 24-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add two new compound RPCs to the NFSv4.1/pNFS client.

When the NFSv4.1 client is doing pNFS, it needs to get an Open and
a Layout for every file it will be doing I/O on. The current code
does two separate RPCs to get these. This patch adds two new compounds
that do the both the Open and LayoutGet in the same RPC, reducing the
RPC count.
It also factors out the code that sets up and parses the LayoutGet operation
into separate functions, so that the code doesn't get duplicated for
these new RPCs.
This patch is fairly large, but should only affect the NFSv4.1 client
when the "pnfs" option is specified.

PR: 219550
MFC after: 2 weeks


# 95ac7f1a 18-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFS client/server so that it actually uses the 64bit ino_t filenos.

The code still doesn't use d_off. That will come in a future commit.
The code also removes the checks for servers returning a fileno that
doesn't fit in 32bits, since that should work ok now.
Bump __FreeBSD_version since this patch changes the interface between
the NFS kernel modules.

Reviewed by: kib


# d1c5e240 17-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf.

By making MAXBCACHEBUF a tunable, it can be increased to allow for
larger read/write data sizes for the NFS client.
The tunable is limited to MAXPHYS, which is currently 128K.
Making MAXPHYS a tunable or increasing its value is being discussed,
since it would be nice to support a read/write data size of 1Mbyte
for the NFS client when mounting the AmazonEFS file service.

Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D10991


# 69921123 23-May-2017 Konstantin Belousov <kib@FreeBSD.org>

Commit the 64-bit inode project.

Extend the ino_t, dev_t, nlink_t types to 64-bit ints. Modify
struct dirent layout to add d_off, increase the size of d_fileno
to 64-bits, increase the size of d_namlen to 16-bits, and change
the required alignment. Increase struct statfs f_mntfromname[] and
f_mntonname[] array length MNAMELEN to 1024.

ABI breakage is mitigated by providing compatibility using versioned
symbols, ingenious use of the existing padding in structures, and
by employing other tricks. Unfortunately, not everything can be
fixed, especially outside the base system. For instance, third-party
APIs which pass struct stat around are broken in backward and
forward incompatible ways.

Kinfo sysctl MIBs ABI is changed in backward-compatible way, but
there is no general mechanism to handle other sysctl MIBS which
return structures where the layout has changed. It was considered
that the breakage is either in the management interfaces, where we
usually allow ABI slip, or is not important.

Struct xvnode changed layout, no compat shims are provided.

For struct xtty, dev_t tty device member was reduced to uint32_t.
It was decided that keeping ABI compat in this case is more useful
than reporting 64-bit dev_t, for the sake of pstat.

Update note: strictly follow the instructions in UPDATING. Build
and install the new kernel with COMPAT_FREEBSD11 option enabled,
then reboot, and only then install new world.

Credits: The 64-bit inode project, also known as ino64, started life
many years ago as a project by Gleb Kurtsou (gleb). Kirk McKusick
(mckusick) then picked up and updated the patch, and acted as a
flag-waver. Feedback, suggestions, and discussions were carried
by Ed Maste (emaste), John Baldwin (jhb), Jilles Tjoelker (jilles),
and Rick Macklem (rmacklem). Kris Moore (kris) performed an initial
ports investigation followed by an exp-run by Antoine Brodin (antoine).
Essential and all-embracing testing was done by Peter Holm (pho).
The heavy lifting of coordinating all these efforts and bringing the
project to completion were done by Konstantin Belousov (kib).

Sponsored by: The FreeBSD Foundation (emaste, kib)
Differential revision: https://reviews.freebsd.org/D10439


# 845eb84c 28-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4.1/pNFS client to ask for a maximum length of layout.

The code specified the length of a layout as INT64_MAX instead of
UINT64_MAX. This could result in getting a layout for less than the
full file for extremely large files. Although having little practical
effect, this patch corrects this in the code.
Detected during recent testing of the pNFS server.

MFC after: 2 weeks


# 6406db24 23-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Make the NFSv4 client to use a write open for reading if allowed by the server.

An NFSv4 server has the option of allowing a Read to be done using a Write
Open. If this is not allowed, the server will return NFSERR_OPENMODE.
This patch attempts the read with a write open and then disables this
if the server replies NFSERR_OPENMODE.
This change will avoid some uses of the special stateids. This will be
useful for pNFS/DS Reads, since they cannot use special stateids.
It will also be useful for any NFSv4 server that does not support reading
via the special stateids. It has been tested against both types of NFSv4 server.

MFC after: 2 weeks


# b845c29a 23-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Don't set the connection-back-channel flag for DS sessions.

The NFSv4.1/pNFS client does not use/need a backchannel for the Data Server (DS)
sessions, so the flag should only be set for MetaData Server (MDS) sessions.
This patch should have been a part of r317275.

MFC after: 2 weeks


# c20a7210 22-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix some krpc leaks for the NFSv4.1/pNFS client.

The NFSv4.1/pNFS client wasn't doing a newnfs_disconnect() call for the
connection to the Data Server (DS) under some circumstances. The main
effect of this was a leak of malloc'd structures in the krpc. This patch
adds the newnfs_disconnect() calls to fix this.
Detected during recent testing against the pNFS server under development.

MFC after: 2 weeks


# 037a2012 13-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add an NFSv4.1 mount option for "use one openowner".

Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number
of open_owner4s. This patch adds a mount option called "oneopenown" that
can be used for NFSv4.1 mounts to make the client do all Opens with the
same open_owner4 string. This option can only be used with NFSv4.1 and
may not work correctly when Delegations are is use.

Reported by: cperciva
Tested by: cperciva
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D8988


# 83a37350 08-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix parsing failure for NFSv4 Setattr operation for failed case.

If an operation that preceeds a Setattr in an NFSv4 compound fails,
there is no bitmap of attributes to parse. Without this patch, the
parsing would fail and return EBADRPC instead of the correct failure
error. This could break recovery from a server crash/reboot.

Tested by: cperciva
PR: 215883
MFC after: 2 weeks


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# b2fc0141 23-Dec-2016 Rick Macklem <rmacklem@FreeBSD.org>

Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.

For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
it would fail the RPC/syscall instead of initiating recovery and then
looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
until after a new session was created for a NFS4ERR_BAD_SESSION error,
it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
structure (which is always the current metadata session) was slightly
racey. With changes for the above problems it became more racey, so all
uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
and, as such, this wouldn't have caused problems, the allocation of a
session structure was 1 byte smaller than it should have been.
(Null termination byte for the string not included in byte count.)

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.

Reported by: cperciva
Tested by: cperciva
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D8745


# a96c9b30 29-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

NFS: spelling fixes on comments.

No funcional change.


# 84aa8a8a 11-Apr-2016 Rick Macklem <rmacklem@FreeBSD.org>

Bruce Evans reported that there was a performance regression between
the old and new NFS clients. He did a good job of isolating the problem
which was caused by the new NFS client not setting the post write mtime
correctly. The new NFS client code was cloned from the old client, but
was incorrect, because the mtime in the nfs vnode's cache wasn't yet
updated. This patch fixes this problem. The patch also adds missing mutex
locking.

Reported and tested by: bde
MFC after: 2 weeks


# 74b8d63d 10-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Cleanup unnecessary semicolons from the kernel.

Found with devel/coccinelle.


# d3bf8f64 15-Jan-2016 Alexander V. Chernikov <melifaro@FreeBSD.org>

Make nfscl_getmyip() use new routing KPI.

* Use standard IPv6 SAS instead of rt->rt_ifa address.
* Make address lookup work for IPv6 LLA.
* Save address into buffer provided by caller instead of using static vars.

Discussed with: rmacklem


# b179878d 20-Nov-2015 Rick Macklem <rmacklem@FreeBSD.org>

Revert r283330 since it broke directory caching in the client.
At this time I cannot see a way to fix directory caching when it
has partial blocks in the buffer cache, due to the fact that the
syscall's uio_offset won't stay the same as the lblkno * NFS_DIRBLKSIZ
offset.

Reported by: bde
MFC after: 2 weeks


# d189dcb6 02-Jul-2015 Rick Macklem <rmacklem@FreeBSD.org>

Alex Burlyga reported a POLA violation for the new NFS client as
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more that one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.

Reported by: alex.burlyga.ietf@gmail.com
Tested by: alex.burlyga.ietf@gmail.com
MFC after: 2 weeks


# 262a8428 23-May-2015 Rick Macklem <rmacklem@FreeBSD.org>

The NFS client generated directory block(s) with d_fileno == 0
so that it would not return less data than requested.
Since returning less directory data than requested is not a problem
for FreeBSD and even UFS no longer returns directory structures
with d_fileno == 0, this patch stops the client from doing this.
Although entries with d_fileno == 0 should not be a problem,
the man pages no longer document that these entries should be
ignored, so there was a concern that these entries might be an
issue in the future.

Suggested by: trasz
Tested by: trasz
MFC after: 2 weeks


# 2f39c910 20-Apr-2015 Pedro F. Giffuni <pfg@FreeBSD.org>

Prevent a double free.

This is similar to r281756 so set the ptr NULL after free as a safety belt
against future changes.

Obtained from: HardenedBSD (b2e77ced9ae213d358b44d98f552d9ae4636ecac)
Submitted by: Oliver Pinter
Revewed by: rmacklem


# a3a4b110 19-Apr-2015 Pedro F. Giffuni <pfg@FreeBSD.org>

nfsrpc_createv4: fix double free.

Reported by: Oliver Pinter, clang static checker
Obtained from: HardenedBSD (commit 63cac77c42c0c3fc67da62f97d5ab651d52ae707)
Reviewed by: rmacklem
MFC after: 5 days


# 9eeef746 21-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

Fixes mkdir for the NFSv2 client that was broken by r264705.

Reported by: bdrewery
MFC after: 2 weeks


# c3e4a726 20-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4 client create/mkdir RPC so that it acquires
post-create/mkdir directory attributes. This allows the RPC to
name cache the newly created directory and reduces the lookup RPC
count for applications creating a lot of directories.

MFC after: 2 weeks


# de1a42bd 19-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4 client open/create RPC so that it acquires
post-open/create directory attributes. This allows the RPC to
name cache the newly created file and reduces the lookup RPC
count by about 10% for software builds.

MFC after: 2 weeks


# a6f8e64e 18-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

Modify the Lookup RPC for NFSv4 so that it acquires directory
attributes. This allows the client to cache directory names
when they are looked up, reducing the Lookup RPC count by
about 40% for software builds.

MFC after: 2 weeks


# 6168020f 27-Jan-2013 Konstantin Belousov <kib@FreeBSD.org>

Be conservative and do not try to consume more bytes than was
requested from the server for the read operation. Server shall not
reply with too large size, but client should be resilent too.

Reviewed by: rmacklem
MFC after: 1 week


# 1f60bfd8 08-Dec-2012 Rick Macklem <rmacklem@FreeBSD.org>

Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detecting during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.


# f4e2c07e 09-Sep-2012 Rick Macklem <rmacklem@FreeBSD.org>

Add a simple printf() based debug facility to the new nfs client.
Use it for a printf() that can be harmlessly generated for mmap()'d
files. It will be used extensively for the NFSv4.1 client.
Debugging printf()s are enabled by setting vfs.nfs.debuglevel to
a non-zero value. The higher the value, the more debugging printf()s.

Reviewed by: jhb
MFC after: 2 weeks


# 5e99212d 02-Mar-2012 Rick Macklem <rmacklem@FreeBSD.org>

Post r230394, the Lookup RPC counts for both NFS clients increased
significantly. Upon investigation this was caused by name cache
misses for lookups of "..". For name cache entries for non-".."
directories, the cache entry serves double duty. It maps both the
named directory plus ".." for the parent of the directory. As such,
two ctime values (one for each of the directory and its parent) need
to be saved in the name cache entry.
This patch adds an entry for ctime of the parent directory to the
name cache. It also adds an additional uma zone for large entries
with this time value, in order to minimize memory wastage.
As well, it fixes a couple of cases where the mtime of the parent
directory was being saved instead of ctime for positive name cache
entries. With this patch, Lookup RPC counts return to values similar
to pre-r230394 kernels.

Reported by: bde
Discussed with: kib
Reviewed by: jhb
MFC after: 2 weeks


# 5aefb4cb 20-Jan-2012 John Baldwin <jhb@FreeBSD.org>

Close a race in NFS lookup processing that could result in stale name cache
entries on one client when a directory was renamed on another client. The
root cause for the stale entry being trusted is that each per-vnode nfsnode
structure has a single 'n_ctime' timestamp used to validate positive name
cache entries. However, if there are multiple entries for a single vnode,
they all share a single timestamp. To fix this, extend the name cache
to allow filesystems to optionally store a timestamp value in each name
cache entry. The NFS clients now fetch the timestamp associated with
each name cache entry and use that to validate cache hits instead of the
timestamps previously stored in the nfsnode. Another part of the fix is
that the NFS clients now use timestamps from the post-op attributes of
RPCs when adding name cache entries rather than pulling the timestamps out
of the file's attribute cache. The latter is subject to races with other
lookups updating the attribute cache concurrently. Some more details:
- Add a variant of nfsm_postop_attr() to the old NFS client that can return
a vattr structure with a copy of the post-op attributes.
- Handle lookups of "." as a special case in the NFS clients since the name
cache does not store name cache entries for ".", so we cannot get a
useful timestamp. It didn't really make much sense to recheck the
attributes on the the directory to validate the namecache hit for "."
anyway.
- ABI compat shims for the name cache routines are present in this commit
so that it is safe to MFC.

MFC after: 2 weeks


# f7258644 07-Jan-2012 Rick Macklem <rmacklem@FreeBSD.org>

opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after: 2 weeks


# f855a3c5 22-Dec-2011 Rick Macklem <rmacklem@FreeBSD.org>

During investigation of an NFSv4 client crash reported by glebius@,
jhb@ spotted that nfscl_getstateid() might modify credentials when
called from nfsrpc_read() for the case where p != NULL, whereas
nfsrpc_read() only did a crdup() to get new credentials for p == NULL.
This bug was introduced by r195510, since pre-r195510 nfscl_getstateid()
only modified credentials for the p == NULL case. This patch modifies
nfsrpc_read()/nfsrpc_write() so that they do crdup() for the p != NULL case.
It is conceivable that this bug caused the crash reported by glebius@, but
that will not be determined for some time, since the crash occurred after
about 1month of operation.

Tested by: glebius
Reviewed by: jhb
MFC after: 2 weeks


# 03423552 20-Nov-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add two arguments to the nfsrpc_rellockown() function in the NFSv4
client. This does not change the client's behaviour, but prepares
the code so that nfsrpc_rellockown() can be called elsewhere in a
future commit.

MFC after: 2 weeks


# 1171f21d 03-Jul-2011 Rick Macklem <rmacklem@FreeBSD.org>

Modify the new NFSv4 client so that it appends a file handle
to the lock_owner4 string that goes on the wire. Also, add
code to do a ReleaseLockOwner Op on the lock_owner4 string
before a Close. Apparently not all NFSv4 servers handle multiple
instances of the same lock_owner4 string, at least not in a
compatible way. This patch avoids having multiple instances,
except for one unusual case, which will be fixed by a future commit.
Found at the recent NFSv4 interoperability Bakeathon.

Tested by: tdh at excfb.com
MFC after: 2 weeks


# 4875024b 28-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFSv4 client so that it doesn't fill the cached
mode attribute in as 0 when doing writes. The change adds
the Mode attribute plus the others except Owner and Owner_group
to the list requested by the NFSv4 Write Operation. This fixed
a problem where an executable file built by "cc" would get mode
0111 instead of 0755 for some NFSv4 servers.
Found at the recent NFSv4 interoperability Bakeathon.

Tested by: tdh at excfb.com
MFC after: 2 weeks


# f8f4e256 05-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

The new NFSv4 client was erroneously using "p" instead of
"p_leader" for the "id" for POSIX byte range locking. I think
this would only have affected processes created by rfork(2)
with the RFTHREAD flag specified. This patch fixes that by
passing the "id" down through the various functions from
nfs_advlock().

MFC after: 2 weeks


# 147206ae 25-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFS client so that it correctly sets the "must_commit"
argument for a write RPC when it succeeds for the first one and
fails for a subsequent RPC within the same call to the function.
This makes it compatible with the old NFS client for this case.

MFC after: 2 weeks


# b1297f14 19-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Modify the offset + size checks for read and write in the
experimental NFS client to take care of overflows. Thanks
go to dillon at apollo.backplane.com for providing the
snippet of code that does this.

MFC after: 2 weeks


# 58c969c8 18-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix up handling of the nfsmount structure in read and write
within the experimental NFS client. Mostly add mutex locking
and use the same rsize, wsize during the operation by keeping
a local copy of it. This is another change that brings it
closer to the regular NFS client.

MFC after: 2 weeks


# a8bafa5d 18-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Revert r220761 since, as kib@ pointed out, the case of
adding the check to nfsrpc_close() isn't useful. Also,
the check in nfscl_getcl() must be more involved, since
it needs to check before and after the acquisition of
the refcnt on nfsc_lock, while the mutex that protects
the client state data is held.


# be8b35ed 17-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add checks for MNTK_UNMOUNTF at the beginning of three
functions, so that threads don't get stuck in them during
a forced dismount. nfs_sync/VFS_SYNC() needs this, since it is
called by dounmount() before VFS_UNMOUNT(). The nfscl_nget()
case makes sure that a thread doing an VOP_OPEN() or
VOP_ADVLOCK() call doesn't get blocked before attempting
the RPC. Attempting RPCs don't block, since they all
fail once a forced dismount is in progress.
The third one at the beginning of nfsrpc_close()
is done so threads don't get blocked while doing VOP_INACTIVE()
as the vnodes are cleared out.
With these three changes plus a change to the umount(1)
command so that it doesn't do "sync()" for the forced case
seem to make forced dismounts work for the experimental NFS
client.

MFC after: 2 weeks


# f5613c1d 16-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix readdirplus in the experimental NFS client so that it
skips over ".." to avoid a LOR race with nfs_lookup(). This
fix is analagous to r138256 in the regular NFS client.

MFC after: 2 weeks


# 4b3a38ec 16-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add a lktype flags argument to nfscl_nget() and ncl_nget() in the
experimental NFS client so that its nfs_lookup() function can use
cn_lkflags in a manner analagous to the regular NFS client.

MFC after: 2 weeks


# a09001a8 14-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental NFSv4 server so that it uses VOP_PATHCONF()
to determine if a file system supports NFSv4 ACLs. Since
VOP_PATHCONF() must be called with a locked vnode, the function
is called before nfsvno_fillattr() and the result is passed in
as an extra argument.

MFC after: 2 weeks


# 07c0c166 14-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Modify the experimental NFSv4 server so that it handles
crossing of server mount points properly. The functions
nfsvno_fillattr() and nfsv4_fillattr() were modified to
take the extra arguments that are the mount point, a flag
to indicate that it is a file system root and the mounted
on fileno. The mount point argument needs to be busy when
nfsvno_fillattr() is called, since the vp argument is not
locked.

Reviewed by: kib
MFC after: 2 weeks


# 418802a9 29-Mar-2011 Zack Kirsch <zack@FreeBSD.org>

This patch fixes the Experimental NFS client to properly deal with 32 bit or 64
bit fileid's in NFSv2 and NFSv3. Without this fix, invalid casting (and sign
extension) was creating problems for any fileid greater than 2^31.

We discovered this because we have test clusters with more than 2 billion
allocated files and 64-bit ino_t's (and friend structures).

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 8e27c182 07-Sep-2010 John Baldwin <jhb@FreeBSD.org>

Store the full timestamp when caching timestamps of files and
directories for purposes of validating name cache entries. This
closes races where two updates to a file or directory within the same
second could result in stale entries in the name cache. While here,
remove the 'n_expiry' field as it is no longer used.

Reviewed by: rmacklem
MFC after: 1 week


# 86836fcf 13-Jul-2010 Rick Macklem <rmacklem@FreeBSD.org>

For the experimental NFSv4 client, make sure that attributes that
predate the issue of a delegation are not cached once the delegation
is held. This is necessary, since cached attributes remain valid
while the delegation is held.

MFC after: 2 weeks


# b38f7723 12-Jun-2010 Konstantin Belousov <kib@FreeBSD.org>

In NFS clients, instead of inconsistently using #ifdef
DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer
KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.

Submitted by: Mikolaj Golub <to.my.trociny gmail com>
MFC after: 2 weeks


# 5783c9fe 05-May-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r207349
Delete a diagnostic statement that is no longer useful from
the experimental NFS client.


# 227b9ebe 30-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r207170
An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.


# 52d8bf89 29-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r207082
When the experimental NFS client is handling an NFSv4 server reboot
with delegations enabled, the recovery could fail if the renew
thread is trying to return a delegation, since it will not do the
recovery. This patch fixes the above by having nfscl_recalldeleg()
fail with the I/O operations returning EIO, so that they will be
attempted later. Most of the patch consists of adding an argument
to various functions to indicate the delegation recall case where
this needs to be done.


# cb8a84e0 28-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

Delete a diagnostic statement that is no longer useful from
the experimental NFS client.

MFC after: 1 week


# 23f929df 24-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.

MFC after: 1 week


# 4a9c979c 22-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r206688
The experimental NFS client was not filling in recovery credentials
for opens done locally in the client when a delegation for the file
was held. This could cause the client to crash in crsetgroups() when
recovering from a server crash/reboot. This patch fills in the
recovery credentials for this case, in order to avoid the client crash.
Also, add KASSERT()s to the credential copy functions, to catch any
other cases where the credentials aren't filled in correctly.


# 67c5c2d2 22-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

When the experimental NFS client is handling an NFSv4 server reboot
with delegations enabled, the recovery could fail if the renew
thread is trying to return a delegation, since it will not do the
recovery. This patch fixes the above by having nfscl_recalldeleg()
fail with the I/O operations returning EIO, so that they will be
attempted later. Most of the patch consists of adding an argument
to various functions to indicate the delegation recall case where
this needs to be done.

MFC after: 1 week


# 55909abf 15-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

The experimental NFS client was not filling in recovery credentials
for opens done locally in the client when a delegation for the file
was held. This could cause the client to crash in crsetgroups() when
recovering from a server crash/reboot. This patch fills in the
recovery credentials for this case, in order to avoid the client crash.
Also, add KASSERT()s to the credential copy functions, to catch any
other cases where the credentials aren't filled in correctly.

MFC after: 1 week


# ea6ffaa7 14-Jan-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r201345
Fix the experimental NFS client so that it can create Unix
domain sockets on an NFSv4 mount point. It was generating
incorrect XDR in the request for this case.

Tested by: infofarmer


# 5d60668c 31-Dec-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental NFS client so that it can create Unix
domain sockets on an NFSv4 mount point. It was generating
incorrect XDR in the request for this case.

Tested by: infofarmer
MFC after: 2 weeks


# 74991298 03-Dec-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Remove unneeded ifdefs.

Reviewed by: rmacklem


# 93a15e20 22-Jul-2009 Rick Macklem <rmacklem@FreeBSD.org>

When vfs.newnfs.callback_addr is set to an IPv4 address, the
experimental NFSv4 client might try and use it as an IPv6 address,
breaking callbacks. The fix simply initializes the isinet6 variable
for this case.

Approved by: re (kensmith), kib (mentor)


# 9ca27b56 09-Jul-2009 Rick Macklem <rmacklem@FreeBSD.org>

Since the nfscl_getclose() function both decremented open counts and,
optionally, created a separate list of NFSv4 opens to be closed, it
was possible for the associated OpenOwner to be free'd before the Open
was closed. The problem was that the Open was taken off the OpenOwner
list before the Close RPC was done and OpenOwners can be free'd once the
list is empty. This patch separates out the case of doing the Close RPC
into a separate function called nfscl_doclose() and simplifies nfsrpc_doclose()
so that it closes a single open instead of a list of them. This avoids
removing the Open from the OpenOwner list before doing the Close RPC.

Approved by: re (kensmith), kib (mentor)


# 47a59856 18-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Change the experimental NFSv4 client so that it does not do
the NFSv4 Close operations until ncl_inactive(). This is
necessary so that the Open StateIDs are available for doing
I/O on mmap'd files after VOP_CLOSE(). I also changed some
indentation for the nfscl_getclose() function.

Approved by: kib (mentor)


# 9ec7b004 04-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add the experimental nfs subtree to the kernel, that includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that call generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.

Approved by: kib (mentor)