History log of /freebsd-current/sys/fs/nfs/nfs_commonkrpc.c
Revision Date Author Comments
# a5581308 26-Dec-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix handling of expired Kerberos credentials (NFSv4.1/4.2)

If the NFS server detects that the Kerberos credentials provided
by a NFSv4.1/4.2 mount using sec=krb5[ip] have expired, the NFS
server replies with a krpc layer error of RPC_AUTHERROR.
When this happened, the client erroneously left the NFSv4.1/4.2
session slot busy, so that it could not be used by other RPCs.
If this happened for all session slots, the mount point would
hang.

This patch fixes the problem by releasing the session slot
and resetting its sequence# upon receiving a RPC_AUTHERROR
reply.

This bug only affects NFSv4.1/4.2 mounts using sec=krb5[ip],
but has existed since NFSv4.1 client support was added to
FreeBSD.

So, why has the bug remained undetected for so long?
I cannot be sure, but I suspect that, often, the client detected
the Kerberos credential expiration before attempting the RPC.
For this case, the client would not do the RPC and, as such,
there would be no busy session slot. Also, no hang would
occur until all session slots are busied (64 for a FreeBSD
client/server), so many cases of the bug probably went undetected?
Also, use of sec=krb5[ip] mounts are not that common.

PR: 275905
Tested by: Lexi <lexi.freebsd@le-fay.org>
MFC after: 1 week


# dd7d42a1 23-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl/kgssapi: Fix Kerberized NFS mounts to pNFS servers

During recent testing related to the IETF NFSv4 Bakeathon, it was
discovered that Kerberized NFSv4.1/4.2 mounts to pNFS servers
(sec=krb5[ip],pnfs mount options) was broken.
The FreeBSD client was using the "service principal" for
the MDS to try and establish a rpcsec_gss credential for a DS,
which is incorrect. (A "service principal" looks like
"nfs@<fqdn-of-server>" and the <fqdn-of-server> for the DS is not
the same as the MDS for most pNFS servers.)

To fix this, the rpcsec_gss code needs to be able to do a
reverse DNS lookup of the DS's IP address. A new kgssapi upcall
to the gssd(8) daemon is added by this patch to do the reverse DNS
along with a new rpcsec_gss function to generate the "service
principal".

A separate patch to the gssd(8) will be committed, so that this
patch will fix the problem. Without the gssd(8) patch, the new
upcall fails and current/incorrect behaviour remains.

This bug only affects the rare case of a Kerberized (sec=krb5[ip],pnfs)
mount using pNFS.

This patch changes the internal KAPI between the kgssapi and
nfscl modules, but since I did a version bump a few days ago,
I will not do one this time.

MFC after: 1 month


# c4e29825 18-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Handle the NFSERR_RETRYUNCACHEDREP error from a NFSv4 server

In a recent email list discussion related to NFSv4 mount problems
against a non-FreeBSD NFSv4 server, the reporter of the issue noted
that the server had replied 10068 (NFSERR_RETRYUNCACHEDREP). This
did not seem related to the mount problem, but I had never seen this
error before. It indicates that an RPC retry after a new TCP
connection has been established failed because the server did not
cache the reply. Since this should only happen for idempotent
operations, redoing the RPC should be safe.

This patch modifies the NFSv4.1/4.2 client to redo the RPC instead
of considering the server error fatal. It should only affect the
unusual case where TCP connections to NFSv4 servers are breaking
without the NFSv4 server rebooting.

Reported by: J David <j.devid.lists@gmail.com>
MFC after: 2 weeks


# db7257ef 17-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix a server crash

PR#274346 reports a crash which appears to be caused by a NULL default session
being destroyed. This patch should avoid the crash.

Tested by: Joshua Kinard <freebsd@kumba.dev>
PR: 274346
MFC after: 2 weeks


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 896516e5 16-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts

Without this patch, a Kerberized NFSv4.1/4.2 mount must provide
a Kerberos credential for the client at mount time. This credential
is typically referred to as a "machine credential". It can be
created one of two ways:
- The user (usually root) has a valid TGT at the time the mount
is done and this becomes the machine credential.
There are two problems with this.
1 - The user doing the mount must have a valid TGT for a user
principal at mount time. As such, the mount cannot be put
in fstab(5) or similar.
2 - When the TGT expires, the mount breaks.
- The client machine has a service principal in its default keytab
file and this service principal (typically called a host-based
initiator credential) is used as the machine credential.
There are problems with this approach as well:
1 - There is a certain amount of administrative overhead creating
the service principal for the NFS client, creating a keytab
entry for this principal and then copying the keytab entry
into the client's default keytab file via some secure means.
2 - The NFS client must have a fixed, well known, DNS name, since
that FQDN is in the service principal name as the instance.

This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which
allows the state maintenance operations to be performed by any
authentication mechanism, to do these operations via AUTH_SYS
instead of RPCSEC_GSS (Kerberos). As such, neither of the above
mechanisms is needed.

It is hoped that this option will encourage adoption of Kerberized
NFS mounts using TLS, to provide a more secure NFS mount.

This new NFSv4.1/4.2 mount option, called "syskrb5" must be used
with "sec=krb5[ip]" to avoid the need for either of the above
Kerberos setups to be done by the client.

Note that all file access/modification operations still require
users on the NFS client to have a valid TGT recognized by the
NFSv4.1/4.2 server. As such, this option allows, at most, a
malicious client to do some sort of DOS attack.

Although not required, use of "tls" with this new option is
encouraged, since it provides on-the-wire encryption plus,
optionally, client identity verification via a X.509
certificate provided to the server during TLS handshake.
Alternately, "sec=krb5p" does provide on-the-wire
encryption of file data.

A mount_nfs(8) man page update will be done in a separate commit.

Discussed on: freebsd-current@
MFC after: 3 months


# a63b5d48 16-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscommon: Revert use of nfsstatsv1_p in nfs_commonkrpc.c

Commit 9d329bbc9aea converted a lot of accesses to nfsstatsv1
to use nfsstatsv1_p instead. However, the accesses in
nfs_commonkrpc.c are for client side and should not be
converted. This patch puts them back in the correct
pre-commit 9d329bbc9aea form.

MFC after: 3 months


# b039ca07 15-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Wrap nfsstatsv1_p in the NFSD_VNET() macro

Commit 7344856e3a6d added a lot of macros that will front end
vnet macros so that nfsd(8) can run in vnet prison.
The nfsstatsv1_p variable got missed. This patch wraps all
uses of nfsstatsv1_p with the NFSD_VNET() macro.
The NFSD_VNET() macro is still a null macro.

MFC after: 3 months


# 9d329bbc 13-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Continue adding macros so nfsd can run in a vnet prison

Commit 7344856e3a6d added a lot of macros that will front end
vnet macros so that nfsd(8) can run in vnet prison.
This patch adds some more of them and also a lot of uses of
nfsstatsv1_p instead of nfsstatsv1. nfsstatsv1_p points to
nfsstatsv1 for prison0, but will point to a malloc'd structure
for other prisons.

It also puts nfsstatsv1_p in nfscommon.ko instead of nfsd.ko.

MFC after: 3 months


# 0685c73c 28-Aug-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a console message for session recovery

The NFSv4.1/4.2 client does recovery when it receives a
NFSERR_BADSESSION reply from the server. If the server has
not rebooted, this is often caused by multiple clients using
the same /etc/hostid and, as such, not being recognized as
different clients by the server.

This trivial patch adds a console message to suggest that
client's /etc/hostid's need to be checked for uniqueness.

MFC after: 2 weeks


# fb29f817 27-Aug-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION

When the NFSv4.1/4.2 client is handling a server error
of NFSERR_BADSESSION, it retries RPCs with a new session.
Without this patch, the nd_slotid was not being updated
for the new session.

This would result in a bogus console message like
"Wrong session srvslot=X slot=Y" and then it would
free the incorrect slot, often generating a
"freeing free slot!!" console message as well.

This patch fixes the problem.

Note that FreeBSD NFSv4.1/4.2 servers only
generate a NFSERR_BADSESSION error after a reboot
or after a client does a DestroySession operation.

PR: 260011
MFC after: 1 week


# f2dfe607 27-Aug-2022 Rick Macklem <rmacklem@FreeBSD.org>

Revert "nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION"

Revert this commit, since I now have a better fix to commit.

This reverts commit 8e59ec29e47f6ec64f54ddd88cab388ae536f0ff.


# 8e59ec29 25-Aug-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION

When the NFSv4.1/4.2 client is handling a server error
of NFSERR_BADSESSION, it retries RPCs with a new session.
Without this patch, the nd_slotid was not being updated
for the new session.

This would result in a bogus console message like
"Wrong session srvslot=X slot=Y" and then it would
free the incorrect slot, often generating a
"freeing free slot!!" console message as well.

This patch fixes the problem.

Note that FreeBSD NFSv4.1/4.2 servers only
generate a NFSERR_BADSESSION error after a reboot
or after a client does a DestroySession operation.

PR: 260011
MFC after: 1 week


# 2b612c9d 25-Aug-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix handling of a bad session slot (NFSv4.1/4.2)

When a session has been marked defunct by the server
sending a NFSERR_BADSESSION reply to the NFSv4.1/4.2
client, nfsv4_sequencelookup() returns NFSERR_BADSESSION
without actually assigning a session slot.
Without this patch, newnfs_request() would erroneously
free slot 0.

This could result in the slot being reused prematurely,
but most likely just generated a "freeing free slot!!"
console message.

This patch fixes the code to not do the erroneous
freeing of the slot for this case.

PR: 260011
MFC after: 1 week


# 981ef322 10-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Enable detection of bad session slots

To deal with broken session slots caused by the use of the
"soft" and/or "intr" mount options, nfsv4_sequencelookup()
has been modified to track the potentially broken session
slots (commit 40ada74ee1da). Then, when all session slots
are potentially broken, nfsv4_sequencelookup() does a
DeleteSession operation, so that the NFSv4.1/4.2 server will
reply NFSERR_BADSESSION to uses of the session.
The client will then recover by doing a CreateSession to
acquire a new session.

This patch adds the code that marks potentially bad
slots, so that the above semantics become functional.
It has been successfully tested against a FreeBSD
NFSv4.1/4.2 server, but does not work against a Linux 5.15
NFSv4.1/4.2 server. (The Linux 5.15 server creates
a new session with the same sessionid as the destroyed
one and, as such, keeps returning NFSERR_BADSESSION.
I believe this is a bug in the Linux server.)

However, this should not cause a regression and will
make "intr" mounts fairly usable against the NFSv4.1/4.2
servers where it works.

PR: 260011
MFC after: 2 weeks


# 40ada74e 09-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add optional support for slots marked bad

This patch adds support for session slots marked bad
to nfsv4_sequencelookup(). An additional boolean
argument indicates if the check for slots marked bad
should be done.

The "cred" argument added to nfscl_reqstart() by
commit 326bcf9394c7 is now passed into nfsv4_setquence()
so that it can optionally set the boolean argument
for nfsv4_sequencelookup(). When optionally enabled,
nfsv4_setsequence() will do a DestroySession when all
slots are marked bad.

Since the code that marks slots bad is not yet committed,
this patch should not result in a semantics change.

PR: 260011
MFC after: 2 weeks


# 1c15c8c0 27-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Sanity check the Sequence slotid in reply

The slotid in the Sequence reply must be the same as
in the request. Check that it is the same and log
a console message if it is not, plus set it to the
correct value.

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260071
MFC after: 2 weeks


# 80e5955b 04-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix NFSv4.1/4.2 pnfs mounts using nconnect

When a mount with the "pnfs" and "nconnect" options specified
does an I/O operation, it erroneously uses a TCP connection
to the MDS when it is meant to be a DS operation and, as such,
needs to use a TCP connection to the DS. This patch fixes this.

When the "pnfs" and "nconnect" options are specified for a
NFSv4.1/4.2 mount, there probably should be N connections
established to each DS for I/O RPCs. This is a fair amount
of work and may be done in a future commit.

This problem was found during a recent IETF NFSv4 working
group testing event.

MFC after: 2 weeks


# ae49051c 03-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix forced dismount when "nconnect" is specified

When a forced dismount is done and the "nconnect" mount
option was used, the additional connections must be closed.
This patch does that.

Found during a recent IETF NFSv4 working group testing event.

MFC after: 2 weeks


# 5a95a6e8 01-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Use a smaller initial delay time for NFSERR_DELAY

For NFS RPCs that receive a NFSERR_DELAY reply, the delay time
is initially 1sec and then increases exponentially to NFS_TRYLATERDEL.
It was found that this delay time is excessive for some NFSv4
servers, which work well with a 1msec delay.
A 1sec delay resulted in very slow performance for Remove and
Rename when delegations and pNFS were enabled.

This patch decreases the initial delay time to 1msec.

Found during a recent IETF NFSv4 working group testing event.

MFC after: 2 weeks


# 1e0a518d 08-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a Linux compatible "nconnect" mount option

Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to created for a mount,
instead of just one TCP connection.

A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated to-gether with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.

One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write). The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30970


# c5f4772c 30-Jun-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Improve "Consider increasing kern.ipc.maxsockbuf" message

When the setting of kern.ipc.maxsockbuf is less than what is
desired for I/O based on vfs.maxbcachebuf and vfs.nfs.bufpackets,
a console message of "Consider increasing kern.ipc.maxsockbuf".
is printed.

This patch modifies the message to provide a suggested value
for kern.ipc.maxsockbuf.
Note that the setting is only needed when the NFS rsize/wsize
is set to vfs.maxbcachebuf.

While here, make nfs_bufpackets global, so that it can be used
by a future patch that adds a sysctl to set the NFS server's
maximum I/O size. Also, remove "sizeof(u_int32_t)" from the maximum
packet length, since NFS_MAXXDR is already an "overestimate"
of the actual length.

MFC after: 2 weeks


# fc0dc940 18-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Reduce the callback timeout to 800msec

Recent discussion on the nfsv4@ietf.org mailing list confirmed
that an NFSv4 server should reply to an RPC in less than 1second.
If an NFSv4 RPC requires a delegation be recalled,
the server will attempt a CB_RECALL callback.
If the client is not responsive, the RPC reply will be delayed
until the callback times out.
Without this patch, the timeout is set to 4 seconds (set in
ticks, but used as seconds), resulting in the RPC reply taking over 4sec.
This patch redefines the constant as being in milliseconds and it
implements that for a value of 800msec, to ensure the RPC
reply is sent in less than 1second.

This patch only affects mounts from clients when delegations
are enabled on the server and the client is unresponsive to callbacks.

MFC after: 2 weeks


# f5ff282b 26-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: fix the handling of NFSERR_DELAY for Open/LayoutGet RPCs

For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that
have both Open and LayoutGet operations in them.
If the pNFS server were tp reply NFSERR_DELAY for one of these
compounds, the retry after a delay cannot be handled by
newnfs_request(), since there is a reference held on the open
state for the Open operation in them.

Fix this by adding these RPCs to the "don't do delay here"
list in newnfs_request().

This patch is only needed if the mount is using pNFS (the "pnfs"
mount option) and probably only matters if the MDS server
is issuing delegations as well as pNFS layouts.

Found by code inspection.

MFC after: 2 weeks


# 87597731 26-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: fix the slot sequence# when a callback fails

Commit 4281bfec3628 patched the server so that the
callback session slot would be free'd for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot to be advanced such that the client
end will reply NFSERR_SEQMISORDERED.

To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case. For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.

Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15second delays if
delegations are enabled. This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15seconds to recover
the back channel after doing so.

MFC after: 2 weeks


# 665b1365 21-Dec-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add a new "tlscertname" NFS mount option.

When using NFS-over-TLS, an NFS client can optionally provide an X.509
certificate to the server during the TLS handshake. For some situations,
such as different NFS servers or different certificates being mapped
to different user credentials on the NFS server, there may be a need
for different mounts to provide different certificates.

This new mount option called "tlscertname" may be used to specify a
non-default certificate be provided. This alernate certificate will
be stored in /etc/rpc.tlsclntd in a file with a name based on what is
provided by this mount option.


# 586ee69f 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

fs: clean up empty lines in .c and .h files


# 6e4b6ff8 27-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add flags to enable NFS over TLS to the NFS client and server.

An Internet Draft titled "Towards Remote Procedure Call Encryption By Default"
(soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS
as a specific application case.
Various commits prepared the NFS code to use KERN_TLS, mainly enabling use
of ext_pgs mbufs for large RPC messages.
r364475 added TLS support to the kernel RPC.

This commit (which is the final one for kernel changes required to do
NFS over TLS) adds support for three export flags:
MNT_EXTLS - Requires a TLS connection.
MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid
X.509 certificate during TLS handshake.
MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a
valid X.509 certificate with "user@domain" in the otherName
field of the SubjectAltName during TLS handshake.
Without these export options, clients are permitted, but not required, to
use TLS.

For the client, a new nmount(2) option called "tls" makes the client do
a STARTTLS Null RPC and TLS handshake for all TCP connections used for the
mount. The CLSET_TLS client control option is used to indicate to the kernel RPC
that this should be done.

Unless the above export flags or "tls" option is used, semantics should
not change for the NFS client nor server.

For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client }
or rpctlssd(8) daemon { for server } must be running.


# 02511d21 10-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add an argument to newnfs_connect() that indicates use TLS for the connection.

For NFSv4.0, the server creates a server->client TCP connection for callbacks.
If the client mount on the server is using TLS, enable TLS for this callback
TCP connection.
TLS connections from clients will not be supported until the kernel RPC
changes are committed.

Since this changes the internal ABI between the NFS kernel modules that
will require a version bump, delete newnfs_trimtrailing(), which is no
longer used.

Since LCL_TLSCB is not yet set, these changes should not have any semantic
affect at this time.


# e3e7c612 11-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.

This is the final patch of this series and the macros should now be
able to be deleted from the .h files in a future commit.


# 9f6624d3 11-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.


# 355b3b7f 26-Mar-2020 Mark Johnston <markj@FreeBSD.org>

Simplify td_ucred handling in newnfs_connect().

No functional change intended.

MFC after: 1 week


# 8e1906f7 07-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Fix kernel handling of a NFSERR_MINORVERSMISMATCH NFSv4 server reply.

When an NFSv4 server replies NFSERR_MINORVERSMISMATCH, it does not generate
a status result for the first operation in the compound. Without this
patch, this will result in a bogus EBADXDR error return.
Returning EBADXDR is relatively harmless, but a correct reply of
NFSERR_MINORVERSMISMATCH is needed by the pNFS client to select the correct
minor version to use for a File Layout DS now that there can be NFSv4.2
DS servers.

mount_nfs.c still needs to be fixed for this, although how the mount fails
is only useful to help sysadmins isolate why a mount fails.

Found during testing of the NFSv4.2 client and server.

MFC after: 2 weeks


# 02c8dd7d 04-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Revert r320698, since the related userland changes were reverted by r338192.

r338192 reverted the changes to nfsuserd so that it could use an AF_LOCAL
socket, since it resulted in a vnode locking panic().
Post r338192 nfsuserd daemons use the old AF_INET socket for upcalls and
do not use these kernel changes.
I left them in for a while, so that nfsuserd daemons built from head sources
between r320757 (Jul. 6, 2017) and r338192 (Aug. 22, 2018) would need them
by default.
This only affects head, since the changes were never MFC'd.
I will add an UPDATING entry, since an nfsuserd daemon built from head
sources between r320757 and r338192 will not run unless the "-use-udpsock"
option is specified. (This command line option is only in the affected
revisions of the nfsuserd daemon.)

I suspect few will be affected by this, since most who run systems built
from head sources (not stable or releases) will have rebuilt their nfsuserd
daemon from sources post r338192 (Aug. 22, 2018)

This is being reverted in preparation for an update to include AF_INET6
support to the code.


# 93df87f2 07-Aug-2018 Rick Macklem <rmacklem@FreeBSD.org>

Allow newnfs_request() to retry all callback RPCs with an NFSERR_DELAY reply.

The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY,
but exempts certain NFSv4 operations. However, for callback RPCs, there
should not be any exemptions at this time. The code would have erroneously
exempted the CBRECALL callback, since it has the same operation number as
the CLOSE operation.
This patch fixes this by checking for a callback RPC (indicated by clp != NULL)
and not checking for exempt operations for callbacks.
This would have only affected the NFSv4 server when delegations are enabled
(they are not enabled by default) and the client replies to CBRECALL with
NFSERR_DELAY. This may never actually happen.
Spotted during code inspection.

MFC after: 2 weeks


# cecf6c6e 20-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Set CLSET_TIMEOUT on TCP connections to pNFS DSs.

Use CLSET_TIMEOUT to set the timeout for connections to DSs instead of
specifying a timeout on each RPC. This is done so that SO_SNDTIMEO
is set on the TCP socket as well as specifying a time limit when
waiting for an RPC reply. Useful if the send queue for the TCP
connection has become constipated, due to a failed DS.
The choice of lease_duration / 4 is fairly arbitrary, but seems to work
ok, with a lower bound of 10sec.
For client connections to a DS, set the retry limit to vfs.nfsd.dsretries,
which is 2 by default.
This patch should only affect pNFS connections to DSs.
This patch requires r336542.

MFC after: 2 weeks


# ba6cce3a 22-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix the handling of NFSv4.1 sessions for "soft" mounts.

When a "soft" mount is used for NFSv4.1, an RPC that fails without completing
will leave a slot in the NFSv4.1 session in an indeterminate state.
As such, all that can be done is free up the slot while making is no longer
usable.
A "soft" NFSv4.1 mount is not recommended in general, since it will leave
Open/Lock state in an indeterminate state. An exception is a pNFS mount of
a DS, since there are no Opens/Locks done for them except file creates
where loss of the Open state does not matter.
The patch also makes connections to DSs soft, so that they will fail when
a DS is non-functional or network partitioned, allowing the pNFS MDS to disable
the DS for a mirrored configuration.
This patch should not affect normal "hard" NFSv4.1 mounts.

MFC after: 2 weeks


# 755e4b79 17-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Revert r335263, since it can cause crashes in unusual circumstances.
This needs to be fixed in a different way.


# 46d30d3d 16-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix NFSv4.1 client side handling of "soft,retrans=2" mounts.

Normally "soft,retrans=2" cannot be safely used on NFSv4 mounts, since
the RPC can fail and leave the open/lock state in an undefined state.
Doing I/O on a pNFS DS is an exception to this, since no open/lock state
is maintained on the DS server.
It is useful to do "soft,retrans=2" connections to a DS when it is mirrored,
so that the client can detect failure of the DS. As such, mounts from the MDS
to the DSs should use these mount options when mirroring is enabled.
However, the NFSv4.1 client still leaves the session in an undefined state
when this happens.
This patch fixes the problem by setting the session defunct, so it will
no longer be used.
The patch also sets "retries=2" on the connections done by the client to
a DS, which is the internal equivalent of "soft,retrans=2".
The client does not know if the server implements mirroring at connection
time, but always doing this should be safe, since it will fall back on doing
I/O via the MDS as a proxy when there is a failure doing an I/O RPC to the DS.

This patch should not affect non-pNFS client mounts.

MFC after: 2 weeks


# 73b1879c 11-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Add a couple of safety belt checks to the NFSv4.1 client related to sessions.

There were a couple of cases in newnfs_request() that it assumed that it
was an NFSv4.1 mount with a session. This should always be the case when
a Sequence operation is in the reply or the server replies NFSERR_BADSESSION.
However, if a server was broken and sent an erroneous reply, these safety
belt checks should avoid trouble.
The one check required a small tweak to nfsmnt_mdssession() so that it
returns NULL when there is no session instead of the offset of the field
in the structure (0x8 for i386).
This patch should have no effect on normal operation of the client.
Found by inspection during pNFS server development.

MFC after: 2 weeks


# 222daa42 25-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

style: Remove remaining deprecated MALLOC/FREE macros

Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions). Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by: jhb
Reviewed by: cy, imp, markj, rmacklem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14035


# 151ba793 24-Dec-2017 Alexander Kabaev <kan@FreeBSD.org>

Do pass removing some write-only variables from the kernel.

This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 63918d38 10-Oct-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix forced dismount when a pNFS mount is hung on a DS.

When a "pnfs" NFSv4.1 mount is hung because of an unresponsive DS,
a forced dismount wouldn't work, because the RPC socket for the DS
was not being closed. This patch fixes this.
This will only affect "pnfs" mounts where the pNFS server's DS
is unresponsive (crashed or network partitioned or...).
Found during testing of the pNFS server.

MFC after: 2 weeks


# 16f300fa 27-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing.

This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".

Suggested by: kib
MFC after: 2 weeks


# 25d694a6 05-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add support for AF_LOCAL socket upcalls to the nfsuserd daemon.

This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon
that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL
sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are
in use.

Suggested by: dfr
PR: 205193


# ee791357 19-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add the definition of maxbcachebuf to sys/buf.h.

r320070 removed the definition of maxbcachebuf from sys/param.h to
fix the build for arm.
This patch adds the definition of maxbcachebuf to sys/buf.h, which
should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S.

Suggested by: kib
MFC after: 2 weeks


# 1d9f01b1 17-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Take "extern int maxbcachebuf" out of sys/param.h, since it breaks the
arm build.

In the arm build, elf_note.S includes sys/param.h and then does an
elf macro called ELFNOTE(). Although the compile error doesn't make
sense to me, I believe it just means that an "extern ..." can't exist
in param.h for this inclusion case.
I suspect adding #if !defined(LOCORE) might fix the build, but this
commit just takes the definition out.
I will ask freebsd-current@ what is the best was to deal with this
and do a subsequent commit after that.

Reported by: melounmichal@gmail.com


# d1c5e240 17-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf.

By making MAXBCACHEBUF a tunable, it can be increased to allow for
larger read/write data sizes for the NFS client.
The tunable is limited to MAXPHYS, which is currently 128K.
Making MAXPHYS a tunable or increasing its value is being discussed,
since it would be nice to support a read/write data size of 1Mbyte
for the NFS client when mounting the AmazonEFS file service.

Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D10991


# 0596f343 21-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Don't set ND_NOMOREDATA for a failed Setattr operation (NFSv4).

The NFSv4 Setattr operation always has reply data even when it fails,
so don't set the ND_NOMOREDATA for it. This would only affect unusual
cases where Setattr fails and the RPC code wants to parse the rest of
the compound. Detected during recent development related to the pNFS server.

MFC after: 2 weeks


# 40f8ff48 21-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Don't create a backchannel for a DS connection.

An NFSv4.1 client connection to a Data Server (DS) should not have a
backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel
for this case.
Found during recent testing with the pNFS server under development.

MFC after: 2 weeks


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# b2fc0141 23-Dec-2016 Rick Macklem <rmacklem@FreeBSD.org>

Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.

For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
it would fail the RPC/syscall instead of initiating recovery and then
looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
until after a new session was created for a NFS4ERR_BAD_SESSION error,
it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
structure (which is always the current metadata session) was slightly
racey. With changes for the above problems it became more racey, so all
uses of this head pointer was wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
and, as such, this wouldn't have caused problems, the allocation of a
session structure was 1 byte smaller than it should have been.
(Null termination byte for the string not included in byte count.)

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extension testing he did to help isolate/fix
these problems.

Reported by: cperciva
Tested by: cperciva
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D8745


# 1b819cf2 12-Aug-2016 Rick Macklem <rmacklem@FreeBSD.org>

Update the nfsstats structure to include the changes needed by
the patch in D1626 plus changes so that it includes counts for
NFSv4.1 (and the draft of NFSv4.2).
Also, make all the counts uint64_t and add a vers field at the
beginning, so that future revisions can easily be implemented.
There is code in place to handle the old vesion of the nfsstats
structure for backwards binary compatibility.

Subsequent commits will update nfsstat(8) to use the new fields.

Submitted by: will (earlier version)
Reviewed by: ken
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D1626


# 02abd400 19-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

kernel: use our nitems() macro when it is available through param.h.

No functional change, only trivial cases are done in this sweep,

Discussed in: freebsd-current


# c15882f0 22-Dec-2014 Rick Macklem <rmacklem@FreeBSD.org>

Remove the old NFS client and server from head,
which means that the NFSCLIENT and NFSSERVER
kernel options will no longer work. This commit
only removes the kernel components. Removal of
unused code in the user utilities will be done
later. This commit does not include an addition
to UPDATING, but that will be committed in a
few minutes.

Discussed on: freebsd-fs


# c59e4cc3 01-Jul-2014 Rick Macklem <rmacklem@FreeBSD.org>

Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.

MFC after: 1 month


# 54366c0b 25-Nov-2013 Attilio Rao <attilio@FreeBSD.org>

- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging
option, unbreak the lock tracing release semantic by embedding
calls to LOCKSTAT_PROFILE_RELEASE_LOCK() direclty in the inlined
version of the releasing functions for mutex, rwlock and sxlock.
Failing to do so skips the lockstat_probe_func invokation for
unlocking.
- As part of the LOCKSTAT support is inlined in mutex operation, for
kernel compiled without lock debugging options, potentially every
consumer must be compiled including opt_kdtrace.h.
Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the
dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES
is linked there and it is only used as a compile-time stub [0].

[0] immediately shows some new bug as DTRACE-derived support for debug
in sfxge is broken and it was never really tested. As it was not
including correctly opt_kdtrace.h before it was never enabled so it
was kept broken for a while. Fix this by using a protection stub,
leaving sfxge driver authors the responsibility for fixing it
appropriately [1].

Sponsored by: EMC / Isilon storage division
Discussed with: rstone
[0] Reported by: rstone
[1] Discussed with: philip


# cc085ba8 03-Nov-2013 Rick Macklem <rmacklem@FreeBSD.org>

During code inspection, I spotted that there was a code path where
CLNT_CONTROL() would be called on "client" after it was
released via CLNT_RELEASE(). It was unlikely that this
code path gets executed and I have not heard of any problem
report caused by this bug. This patch fixes the code so that
this cannot happen.

MFC after: 2 months


# 88a2437a 08-Jul-2013 Rick Macklem <rmacklem@FreeBSD.org>

Add support for host-based (Kerberos 5 service principal) initiator
credentials to the kernel rpc. Modify the NFSv4 client to add
support for the gssname and allgssname mount options to use this
capability. Requires the gssd daemon to be running with the "-h" option.

Reviewed by: jhb


# d96b98a3 17-Apr-2013 Kenneth D. Merry <ken@FreeBSD.org>

Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes). It does not currently work for NFSv4 requests. They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately. Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads. This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered. Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS. Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once. ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking. This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables. The read offset window calculation has been slightly
modified as well. Instead of having separate bins, each file
handle has a rolling window of bin_shift size. This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c
when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
Bring in changes from Rick Macklem to newnfs_realign that
allow it to operate in blocking (M_WAITOK) or non-blocking
(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
Bring in a change from Rick Macklem to allow telling
nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
Bring in changes from Rick Macklem to create a new
nfsm_dissect_nonblock() inline function and
NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
Setup the new NFS server's RPC thread pool so that it will
call the FHA code.

Add the malloc flag argument to newnfs_realign().

Unstaticize newnfs_nfsv3_procid[] so that we can use it in
the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
In nfsd_fhtovp(), if we're starting a write, check to see
whether the underlying filesystem supports shared writes.
If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
Remove all code that is specific to the NFS server
implementation. Anything that is server-specific is now
accessed through a callback supplied by that server's FHA
shim in the new softc.

There are now separate sysctls and tunables for the FHA
implementations for the old and new NFS servers. The new
NFS server has its tunables under vfs.nfsd.fha, the old
NFS server's tunables are under vfs.nfsrv.fha as before.

In fha_extract_info(), use callouts for all server-specific
code. Getting file handles and offsets is now done in the
individual server's shim module.

In fha_hash_entry_choose_thread(), change the way we decide
whether two reads are in proximity to each other.
Previously, the calculation was a simple shift operation to
see whether the offsets were in the same power of 2 bucket.
The issue was that there would be a bucket (and therefore
thread) transition, even if the reads were in close
proximity. When there is a thread transition, reads wind
up going somewhat out of order, and ZFS gets confused.

The new calculation simply tries to see whether the offsets
are within 1 << bin_shift of each other. If they are, the
reads will be sent to the same thread.

The effect of this change is that for sequential reads, if
the client doesn't exceed the max_reqs_per_nfsd parameter
and the bin_shift is set to a reasonable value (22, or
4MB works well in my tests), the reads in any sequential
stream will largely be confined to a single thread.

Change fha_assign() so that it takes a softc argument. It
is now called from the individual server's shim code, which
will pass in the softc.

Change fhe_stats_sysctl() so that it takes a softc
parameter. It is now called from the individual server's
shim code. Add the current offset to the list of things
printed out about each active thread.

Change the num_reads and num_writes counters in the
fha_hash_entry structure to 32-bit values, and rename them
num_rw and num_exclusive, respectively, to reflect their
changed usage.

Add an enable sysctl and tunable that allows the user to
disable the FHA code (when vfs.XXX.fha.enable = 0). This
is useful for before/after performance comparisons.

nfs_fha.h:
Move most structure definitions out of nfs_fha.c and into
the header file, so that the individual server shims can
see them.

Change the default bin_shift to 22 (4MB) instead of 18
(256K). Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
Add shims for the old and new NFS servers to interface with
the FHA code, and callbacks for the

The shims contain all of the code and definitions that are
specific to the NFS servers.

They setup the server-specific callbacks and set the server
name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
Configure the RPC code to call fhaold_assign() instead of
fha_assign().

sys/modules/nfsd/Makefile:
Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
Add nfs_fha_old.c.

Reviewed by: rmacklem
Sponsored by: Spectra Logic
MFC after: 2 weeks


# 593efaf9 21-Feb-2013 John Baldwin <jhb@FreeBSD.org>

Further refine the handling of stop signals in the NFS client. The
changes in r246417 were incomplete as they did not add explicit calls to
sigdeferstop() around all the places that previously passed SBDRY to
_sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from
getblk() resulting in sigdeferstop() recursing. Rather than manually
deferring stop signals in specific places, change the VFS_*() and VOP_*()
methods to defer stop signals for filesystems which request this behavior
via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than
a MNTK flag so that it works properly with VFS_MOUNT() when the mount is
not yet fully constructed. For now, only the NFS clients are set this new
flag in VFS_SET().

A few other related changes:
- Add an assertion to ensure that TDF_SBDRY doesn't leak to userland.
- When a lookup request uses VOP_READLINK() to follow a symlink, mark
the request as being on behalf of the thread performing the lookup
(cnp_thread) rather than using a NULL thread pointer. This causes
NFS to properly handle signals during this VOP on an interruptible
mount.

PR: kern/176179
Reported by: Russell Cattelan (sigdeferstop() recursion)
Reviewed by: kib
MFC after: 1 month


# a120a7a3 06-Feb-2013 John Baldwin <jhb@FreeBSD.org>

Rework the handling of stop signals in the NFS client. The changes in
195702, 195703, and 195821 prevented a thread from suspending while holding
locks inside of NFS by forcing the thread to fail sleeps with EINTR or
ERESTART but defer the thread suspension to the user boundary. However,
this had the effect that stopping a process during an NFS request could
abort the request and trigger EINTR errors that were visible to userland
processes (previously the thread would have suspended and completed the
request once it was resumed).

This change instead effectively masks stop signals while in the NFS client.
It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot
be masked directly. Also, instead of setting PBDRY on individual sleeps,
the NFS client now sets the TDF_SBDRY flag around each NFS request and
stop signals are masked for all sleeps during that region (the previous
change missed sleeps in lockmgr locks). The end result is that stop
signals sent to threads performing an NFS request are completely
ignored until after the NFS request has finished processing and the
thread prepares to return to userland. This restores the behavior of
stop signals being transparent to userland processes while still
preventing threads from suspending while holding NFS locks.

Reviewed by: kib
MFC after: 1 month


# a89a2c8b 25-Jan-2013 John Baldwin <jhb@FreeBSD.org>

Further cleanups to use of timestamps in NFS:
- Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds
portion of wall-time stamps to manage timeouts on events.
- Remove unused nd_starttime from the per-request structure in the new
NFS server.
- Use nanotime() for the modification time on a delegation to get as
precise a time as possible.
- Use time_second instead of extracting the second from a call to
getmicrotime().

Submitted by: bde (3)
Reviewed by: bde, rmacklem
MFC after: 2 weeks


# 6910d7a0 15-Jan-2013 John Baldwin <jhb@FreeBSD.org>

- More properly handle interrupted NFS requests on an interruptible mount
by returning an error of EINTR rather than EACCES.
- While here, bring back some (but not all) of the NFS RPC statistics lost
when krpc was committed.

Reviewed by: rmacklem
MFC after: 1 week


# 1f60bfd8 08-Dec-2012 Rick Macklem <rmacklem@FreeBSD.org>

Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detecting during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.


# 23b35663 19-Jan-2012 Rick Macklem <rmacklem@FreeBSD.org>

Martin Cracauer reported a problem to freebsd-current@ under the
subject "Data corruption over NFS in -current". During investigation
of this, I came across an ugly bogusity in the new NFS client where
it replaced the cr_uid with the one used for the mount. This was
done so that "system operations" like the NFSv4 Renew would be
performed as the user that did the mount. However, if any other
thread shares the credential with the one doing this operation,
it could do an RPC (or just about anything else) as the wrong cr_uid.
This patch fixes the above, by using the mount credentials instead of
the one provided as an argument for this case. It appears
to have fixed Martin's problem.
This patch is needed for NFSv4 mounts and NFSv3 mounts against
some non-FreeBSD servers that do not put post operation attributes
in the NFSv3 Statfs RPC reply.

Tested by: Martin Cracauer (cracauer at cons.org)
Reviewed by: jhb
MFC after: 2 weeks


# f7258644 07-Jan-2012 Rick Macklem <rmacklem@FreeBSD.org>

opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after: 2 weeks


# 713f46ac 20-Dec-2011 Rick Macklem <rmacklem@FreeBSD.org>

jwd@ reported a problem via email where the old NFS client would
get a reply of EEXIST from an NFS server when a Mkdir RPC was retried,
for an NFS over UDP mount.
Upon investigation, it was found that the client was retransmitting
the Mkdir RPC request over UDP, but with a different xid. As such,
the retransmitted message would miss the Duplicate Request Cache
in the server, causing it to reply EEXIST. The kernel client side
UDP rpc code has two timers. The first one causes a retransmit using
the same xid and socket and was set to a fixed value of 3seconds.
(The default can be overridden via CLSET_RETRY_TIMEOUT.)
The second one creates a new socket and xid and should be larger
than the first. However, both NFS clients were setting the second
timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to
1second, so the first timer would never time out.
This patch fixes both NFS clients so that they set the first timer
using nm_timeo and makes the second timer larger than the first one.

Reported by: jwd
Tested by: jwd
Reviewed by: jhb
MFC after: 2 weeks


# 6a536cee 16-Jul-2011 Rick Macklem <rmacklem@FreeBSD.org>

The new NFSv4 client handled NFSERR_GRACE as a fatal error
for the remove and rename operations. Some NFSv4 servers will
report NFSERR_GRACE for these operations. This patch changes
the behaviour of the client so that it handles NFSERR_GRACE
like NFSERR_DELAY for non-state related operations like
remove and rename. It also exempts the delegreturn operation
from handling within newnfs_request() for NFSERR_DELAY/NFSERR_GRACE
so that it can handle NFSERR_GRACE in the same manner as before.
This problem was resolved thanks to discussion with bfields at fieldses.org.
The problem was identified at the recent NFSv4 ineroperability
bakeathon.

MFC after: 2 weeks


# a9285ae5 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Add DEXITCODE plumbing to NFS.

Isilon has the concept of an in-memory exit-code ring that saves the last exit
code of a function and allows for stack tracing. This is very helpful when
debugging tough issues.

This patch is essentially a no-op for BSD at this point, until we upstream
the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS
function that returns an errno error code. A number of code paths were also
reorganized to have single exit paths, to reduce code duplication.

Submitted by: David Kwan <dkwan@isilon.com>
Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# 7bb55def 22-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Plug an mbuf leak in the new NFS client that occurred when a
server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc.
This affected both NFSv3 and NFSv4. Found during testing
at the recent NFSv4 interoperability Bakeathon.

MFC after: 2 weeks


# 72b7c8dd 22-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFSv4 client so that it uses the same uid as
was used for doing a mount when performing system operations
on AUTH_SYS mounts. This resolved an issue when mounting
a Linux server. Found during testing at the recent
NFSv4 interoperability Bakeathon.

MFC after: 2 weeks


# 7e7fd7d1 19-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the kgssapi so that it can be loaded as a module. Currently
the NFS subsystems use five of the rpcsec_gss/kgssapi entry points,
but since it was not obvious which others might be useful, all
nineteen were included. Basically the nineteen entry points are
set in a structure called rpc_gss_entries and inline functions
defined in sys/rpc/rpcsec_gss.h check for the entry points being
non-NULL and then call them. A default value is returned otherwise.
Requested by rwatson.

Reviewed by: jhb
MFC after: 2 weeks


# 8f0e65c9 18-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add DTrace support to the new NFS client. This is essentially
cloned from the old NFS client, plus additions for NFSv4. A
review of this code is in progress, however it was felt by the
reviewer that it could go in now, before code slush. Any changes
required by the review can be committed as bug fixes later.


# 1f376590 15-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Change the sysctl naming for the old and new NFS clients
to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes
the default nfs client use vfs.nfs.xxx after r221124.


# ebd9ef33 17-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Get rid of the "nfscl: consider increasing kern.ipc.maxsockbuf"
message that was generated when doing experimental NFS client
mounts. I put that message in because the krpc would hang with
the default size for mounts that used large rsize/wsize values.
Since the bug that caused these hangs was fixed by r213756,
I think the message is no longer needed.

MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 44066061 16-May-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r207764
Patch the experimental NFS client so that it works for NFSv2
by adding the necessary mapping from NFSv3 procedure numbers
to NFSv2 procedure numbers when doing NFSv2 RPCs.


# 23d9efa7 07-May-2010 Rick Macklem <rmacklem@FreeBSD.org>

Patch the experimental NFS client so that it works for NFSv2
by adding the necessary mapping from NFSv3 procedure numbers
to NFSv2 procedure numbers when doing NFSv2 RPCs.

MFC after: 1 week


# 227b9ebe 30-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r207170
An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.


# 23f929df 24-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs
during the grace period after startup. This grace period must
be at least the lease duration, which is typically 1-2 minutes.
It seems prudent for the experimental NFS client to wait a few
seconds before retrying such an RPC, so that the server isn't
flooded with non-recovery RPCs during recovery. This patch adds
an argument to nfs_catnap() to implement a 5 second delay
for this case.

MFC after: 1 week


# 089f366a 12-Jul-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add calls to the experimental nfs client for the case of an "intr" mount,
so that signals that aren't supposed to terminate RPCs in progress are
masked off during the RPC.

Approved by: re (kensmith), kib (mentor)


# 05c965a2 24-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Crib the realign function out of nfs_krpc.c and add a call
to it for the client side reply. Hopefully this fixes the
problem with using the new krpc for arm for the experimental
nfs client.

Approved by: kib (mentor)


# 63bde62e 23-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental nfsv4 client so that it works for the
case of a kerberized mount without a host based principal
name. This will only work for mounts being done by a user
other than root. Support for a host based principal name
will not work until proposed changes to the rpcsec_gss part
of the krpc are committed. It now builds for "options KGSSAPI".

Approved by: kib (mentor)


# e2b84e03 22-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the rpc_gss_secfind() call in nfs_commonkrpc.c so that
the code will build when "options KGSSAPI" is specified
without requiring the proposed changes that add host based
initiator principal support. It will not handle the case where
the client uses a host based initiator principal until those
changes are committed. The code that uses those changes is
#ifdef'd notyet until the krpc rpcsec_changes are committed.

Approved by: kib (mentor)


# 86ce6a83 21-May-2009 Robert Watson <rwatson@FreeBSD.org>

Remove the unmaintained University of Michigan NFSv4 client from 8.x
prior to 8.0-RELEASE. Rick Macklem's new and more feature-rich NFSv234
client and server are replacing it.

Discussed with: rmacklem


# 15e8331f 15-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fixed the Null callback RPCs so that they work with the new krpc. This
required two changes: setting the program and version numbers before
connect and fixing the handling of the Null Rpc case in newnfs_request().

Approved by: kib (mentor)


# 9ec7b004 04-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add the experimental nfs subtree to the kernel, that includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that call generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.

Approved by: kib (mentor)