History log of /freebsd-current/sys/fs/nfs/nfs_commonsubs.c
Revision Date Author Comments
# cc760de2 11-Jan-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Only update atime for Copy when noatime is not specified

Commit 57ce37f9dcd0 modified the NFSv4.2 Copy operation so that
it will update atime on the infd file whenever possible.
This is done by adding a Setattr of TimeAccess for the
input file.

This patch disables this change for the case of an NFSv4.2
mount with the "noatime" mount option, which avoids the
additional Setattr of TimeAccess operation.

MFC after: 1 week


# b484bcd5 22-Dec-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix handling of a copyout() error reply

If vfs.nfs.nfs_directio_enable is set non-zero (the default is
zero) and a file on an NFS mount is read after being opened
with O_DIRECT | O_RDONLY, a call to nfsm_mbufuio() calls
copyout() without checking for an error return.
If copyout() returns EFAULT, this would not work correctly.

Only the call path
VOP_READ()->ncl_readrpc()->nfsrpc_read()->nfsrpc_readrpc()
will do this and the error return for EFAULT will
be returned back to VOP_READ().

This patch adds the error check to nfsm_mbufuio().

Reviewed by: markj
Differential Revision: https://reviews.freebsd.org/D43160
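
The error check described in b484bcd5 can be illustrated with a small
userland sketch. This is not the nfsm_mbufuio() code: sketch_copyout(),
struct sketch_chunk and sketch_copy_chunks() are hypothetical names, and
sketch_copyout() only imitates copyout(9) by returning EFAULT for a NULL
destination. The point is that the loop stops at the first failure and
hands the error back to its caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for copyout(9): EFAULT for a bad destination. */
static int
sketch_copyout(const void *src, void *dst, size_t len)
{
	if (dst == NULL)
		return (EFAULT);
	memcpy(dst, src, len);
	return (0);
}

/* Hypothetical stand-in for one mbuf's worth of reply data. */
struct sketch_chunk {
	const char *data;
	size_t len;
};

/*
 * Copy every chunk to dst, checking each copy for an error.
 * On failure, the error is returned immediately and *copiedp holds
 * the number of bytes copied before the failure.
 */
static int
sketch_copy_chunks(const struct sketch_chunk *chunks, size_t nchunks,
    char *dst, size_t *copiedp)
{
	size_t off = 0;
	int error;

	assert(copiedp != NULL);
	*copiedp = 0;
	for (size_t i = 0; i < nchunks; i++) {
		error = sketch_copyout(chunks[i].data,
		    dst != NULL ? dst + off : NULL, chunks[i].len);
		if (error != 0)
			return (error);	/* propagate, do not ignore */
		off += chunks[i].len;
		*copiedp = off;
	}
	return (0);
}
```

The caller is assumed to supply a destination large enough for all
chunks; the sketch does no bounds checking of its own.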


# 57ce37f9 18-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make NFSv4.2 Copy set atime on infd

RFC7862 does not specify infile atime behaviour when a NFSv4.2 Copy
operation is performed. Since the collective opinion of a mailing
list discussion (on freebsd-hackers@) seemed to indicate that
copy_file_range(2) should update atime on the infd,
even if there is no data copied, this
patch attempts to ensure that behaviour.

For Copy, it precedes the Copy operation with a Setattr of
TimeAccess_Set (NFSv4-speak for atime) for the invp. For the case
where no data will be copied, it does a Setattr RPC to set
TimeAccess_Set for the invp.

A __FreeBSD_version bump will be done as a separate commit, since
this patch changes the internal interface between the nfscommon and
nfscl modules.

MFC after: 1 month


# db7257ef 17-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix a server crash

PR#274346 reports a crash which appears to be caused by a NULL default session
being destroyed. This patch should avoid the crash.

Tested by: Joshua Kinard <freebsd@kumba.dev>
PR: 274346
MFC after: 2 weeks


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# ba8cc6d7 12-Mar-2023 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use __enum_uint8 for vtype and vstate

This whacks hackery around only reading v_type once.

Bump __FreeBSD_version to 1400093


# 4adb28c0 07-Apr-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix support for doing Null RPCs

Although the NFS client does not currently perform Null RPCs,
this fix is needed if/when it might do so.
Found during testing of experimental code that uses Null RPCs
to maintain/monitor TCP connections for "nconnect" mounts.

MFC after: 3 months


# f4179ad4 01-Apr-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscommon: Add support for an NFSv4 operation bitmap

NFSv4.1/4.2 uses operation bitmaps for various operations,
such as the SP4_MACH_CRED case for ExchangeID.
This patch adds support for operation bitmaps so that
support for SP4_MACH_CRED can be added to the NFSv4.1/4.2
server in a future commit.

This commit should not change any NFSv4.1/4.2 semantics.

MFC after: 3 months


# 695d87ba 28-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make coverity happy

Coverity does not like code that checks a function's
return value sometimes. Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

A recent commit deleted "(void)"s in front of nfsm_fhtom().
This commit puts them back in.

Reported by: emaste
MFC after: 3 months


# 896516e5 16-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts

Without this patch, a Kerberized NFSv4.1/4.2 mount must provide
a Kerberos credential for the client at mount time. This credential
is typically referred to as a "machine credential". It can be
created one of two ways:
- The user (usually root) has a valid TGT at the time the mount
  is done and this becomes the machine credential.
  There are two problems with this.
  1 - The user doing the mount must have a valid TGT for a user
      principal at mount time. As such, the mount cannot be put
      in fstab(5) or similar.
  2 - When the TGT expires, the mount breaks.
- The client machine has a service principal in its default keytab
  file and this service principal (typically called a host-based
  initiator credential) is used as the machine credential.
  There are problems with this approach as well:
  1 - There is a certain amount of administrative overhead creating
      the service principal for the NFS client, creating a keytab
      entry for this principal and then copying the keytab entry
      into the client's default keytab file via some secure means.
  2 - The NFS client must have a fixed, well known, DNS name, since
      that FQDN is in the service principal name as the instance.

This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which
allows the state maintenance operations to be performed by any
authentication mechanism, to do these operations via AUTH_SYS
instead of RPCSEC_GSS (Kerberos). As such, neither of the above
mechanisms is needed.

It is hoped that this option will encourage adoption of Kerberized
NFS mounts using TLS, to provide a more secure NFS mount.

This new NFSv4.1/4.2 mount option, called "syskrb5" must be used
with "sec=krb5[ip]" to avoid the need for either of the above
Kerberos setups to be done by the client.

Note that all file access/modification operations still require
users on the NFS client to have a valid TGT recognized by the
NFSv4.1/4.2 server. As such, this option allows, at most, a
malicious client to do some sort of DOS attack.

Although not required, use of "tls" with this new option is
encouraged, since it provides on-the-wire encryption plus,
optionally, client identity verification via an X.509
certificate provided to the server during TLS handshake.
Alternately, "sec=krb5p" does provide on-the-wire
encryption of file data.

A mount_nfs(8) man page update will be done in a separate commit.

Discussed on: freebsd-current@
MFC after: 3 months


# f0db2b60 14-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Continue adding macros so nfsd can run in a vnet prison

Commit 7344856e3a6d added a lot of macros that will front end
vnet macros so that nfsd(8) can run in a vnet prison.
This patch adds some more, to allow the nfsuserd(8) daemon to
run in a vnet prison, once the macros map to vnet ones.
This is the last commit for NFSD_VNET_xxx macros, but there are
still some for KRPC_VNET_xxx and KGSS_VNET_xxx to allow the
rpc.tlsservd(8) and gssd(8) daemons to run in a vnet prison.

MFC after: 3 months


# bf312482 08-Nov-2022 Gordon Bergling <gbe@FreeBSD.org>

nfs: Fix common typos in source code comments

- s/attrbute/attribute/

MFC after: 3 days


# 8b43388c 23-Sep-2022 Zhenlei Huang <zlei.huang@gmail.com>

nfscl: Fix parameter order in the calls to MGET().

Reviewed by: imp, rmacklem
Differential Revision: https://reviews.freebsd.org/D36644


# 117cea02 28-Aug-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Fix setup of Sequence when all slots marked bad

Commit 40ada74ee1da modified the NFSv4.1/4.2 client so
that it would issue a DestroySession to the server when
all session slots are marked bad. Once this is done,
the Sequence operation should get a NFSERR_BADSESSION
reply from the server.

Without this patch, the code was setting ND_HASSLOTID
when, in fact, there was no slot marked in use by
nfsv4_sequencelookup(). This would result in the
code freeing a slot not in use. The effect of this
was minimal, since the session was already destroyed.

This patch fixes the code so that it does not set
ND_HASSLOTID for this case.

MFC after: 2 weeks


# 40ada74e 09-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add optional support for slots marked bad

This patch adds support for session slots marked bad
to nfsv4_sequencelookup(). An additional boolean
argument indicates if the check for slots marked bad
should be done.

The "cred" argument added to nfscl_reqstart() by
commit 326bcf9394c7 is now passed into nfsv4_setsequence()
so that it can optionally set the boolean argument
for nfsv4_sequencelookup(). When optionally enabled,
nfsv4_setsequence() will do a DestroySession when all
slots are marked bad.

Since the code that marks slots bad is not yet committed,
this patch should not result in a semantics change.

PR: 260011
MFC after: 2 weeks


# dff31ae1 09-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Move nfsrpc_destroysession into nfscommon

This patch moves nfsrpc_destroysession() into nfscommon.ko
and also modifies its arguments slightly. This will allow
the function to be called from nfsv4_sequencelookup() in
a future commit.

This patch should not result in a semantics change.

PR: 260011
MFC after: 2 weeks


# 326bcf93 08-Jul-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a cred argument to nfscl_reqstart()

To deal with broken session slots caused by the use of the
"soft" and/or "intr" mount options, nfsv4_sequencelookup()
will be modified to track the potentially broken session
slots. Then, when all session slots are potentially
broken, do a DestroySession operation, so that the NFSv4
server will reply NFSERR_BADSESSION to uses of the session.
These changes will be done in future commits. However,
to do the DestroySession RPC, a "cred" argument is needed
for nfscl_reqstart(). This patch adds this argument,
which is unused at this time. If the argument is NULL,
it indicates that DestroySession should not be done
(usually because the RPC does not use sessions).

This patch should not cause any semantics change.

PR: 260011
MFC after: 2 weeks


# 1ebc14c9 24-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscommon: Clean up the code by not using the vnode_vtype() macro

The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.

This commit should not result in a semantics change.


# 6d25ea6d 18-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing #if(n)def APPLE

The definition of "APPLE" was used by the Mac OSX port.
For FreeBSD, this definition is never used, so remove
the references to it to clean up the code.

This commit should not result in a semantics change.


# ef4edb70 04-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add a sanity check for Owner/OwnerGroup string length

Robert Morris reported that, if a client sends an absurdly
large Owner/OwnerGroup string, the kernel malloc() for the
large size string can block forever.

This patch adds a sanity limit for Owner/OwnerGroup string
length. Since the RFCs do not specify any limit and FreeBSD
can handle a group name greater than 1Kbyte, the limit is
set at a generous 10Kbytes.

Reported by: rtm@lcs.mit.edu
PR: 260546
MFC after: 2 weeks
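
The sanity limit in ef4edb70 amounts to rejecting an over-long string
before allocating memory for it. A minimal userland sketch follows,
assuming a hypothetical limit macro SKETCH_MAXOWNERLEN set to the
10Kbytes mentioned above and a hypothetical helper sketch_dup_owner().
The EINVAL return is a placeholder, not the error the server actually
replies with.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAXOWNERLEN	(10 * 1024)	/* hypothetical name for the limit */

/*
 * Copy an Owner/OwnerGroup string of "len" bytes into a freshly
 * allocated, NUL-terminated buffer.  The length is checked before
 * malloc() so that an absurd length never reaches the allocator.
 */
static int
sketch_dup_owner(const char *wire, size_t len, char **outp)
{
	char *s;

	assert(outp != NULL);
	*outp = NULL;
	if (len > SKETCH_MAXOWNERLEN)
		return (EINVAL);	/* placeholder error value */
	s = malloc(len + 1);
	if (s == NULL)
		return (ENOMEM);
	memcpy(s, wire, len);
	s[len] = '\0';
	*outp = s;
	return (0);
}
```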


# 21de450a 08-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for a NFSv4 AppendWrite RPC

For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.

Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct, followed by the Write operation.

This patch modifies nfscl_wcc_data() to optionally
acquire the file's size, for use with an AppendWrite.
Although the "stuff" arguments are always NULL
(these were used for the Mac OSX port and should be
cleared out someday), make the argument to
nfscl_wcc_data() explicitly NULL for clarity.

This patch does not cause any semantics change until
the AppendWrite is added in a future commit.


# 330aa8ac 05-Apr-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for a NFSv4 AppendWrite RPC

For IO_APPEND VOP_WRITE()s, the code first does a
Getattr RPC to acquire the file's size, before it
can do the Write RPC.

Although NFS does not have an append write operation,
an NFSv4 compound can use a Verify operation to check
that the client's notion of the file's size is
correct before doing the Write operation.

This patch prepares the NFSv4 client for such an
RPC, which will be added in a future commit.

This patch does not cause any semantics change.


# a91a5784 11-Jan-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Do not accept audit/alarm ACEs for the NFSv4 server

The UFS and ZFS file systems only support Allow/Deny ACEs
in the NFSv4 ACLs. This patch does not allow the server
to parse Audit/Alarm ACEs. The NFSv4 client is still
allowed to parse Audit/Alarm ACEs, since non-FreeBSD NFSv4
servers may use them.

This patch should not have a significant effect, since the
UFS and ZFS file systems will not handle these ACEs anyhow.
It simply serves as an additional "safety belt" for the
NFSv4 server.

MFC after: 2 weeks


# 5da9b3b0 11-Jan-2022 Rick Macklem <rmacklem@FreeBSD.org>

Revert "nfscommon: Add arguments for support of the dacl attribute"

This reverts commit 0fa074b53e7c22157dcb41aaa25a33abc8118f26.

I now see that the implementation of the "dacl" attribute
requires the NFSv4 server to do "automatic inheritance"
and I do not plan on doing this. As such, this patch is
harmless, but unneeded.


# 3455c738 09-Jan-2022 Alexander Motin <mav@FreeBSD.org>

nfsd: Reduce callouts rate.

Before this, callouts were scheduled twice a second even if nfsd was
never used. This reduces the rate to ~1Hz and schedules them only
after nfsd is first started.

MFC after: 2 weeks


# 0fa074b5 26-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscommon: Add arguments for support of the dacl attribute

NFSv4.1/4.2 has an alternative to the acl attribute, called
dacl, that includes support for the ACL_ENTRY_INHERITED flag,
called NFSV4ACE_INHERITED in NFSv4.

This patch adds a dacl argument to nfsrv_buildacl(),
nfsrv_dissectacl() and nfsrv_dissectace(), so that they
will handle NFSV4ACE_INHERITED when dacl == true.

Since these functions are always called with dacl == false
for this patch, semantics should not have changed.
A future patch will add support for dacl.

MFC after: 2 weeks


# 2d90ef47 04-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix Verify for attributes like FilesAvail

When the Verify operation calls nfsv4_loadattr(), it provides
the "struct statfs" information that can be used for doing a
compare for FilesAvail, FilesFree, FilesTotal, SpaceAvail,
SpaceFree and SpaceTotal. However, the code erroneously
used the "struct nfsstatfs *" argument that is NULL.
This patch fixes these cases to use the correct argument
structure. For the case of FilesAvail, the code in
nfsv4_fillattr() was factored out into a separate function
called nfsv4_filesavail(), so that it can be called from
nfsv4_loadattr() as well as nfsv4_fillattr().

In fact, most of the code in nfsv4_filesavail() is old
OpenBSD code that does not build/run on FreeBSD, but I
left it in place, in case it is of some use someday.

I am not aware of any extant NFSv4 client that does Verify
on these attributes.

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260176
MFC after: 2 weeks


# 480be96e 04-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Sanity check the Layouttype count

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260155
MFC after: 2 weeks


# db0ac6de 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.


# fd020f19 01-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Sanity check the ACL attribute

When an ACL is presented to the NFSv4 server in
Setattr or Verify, parsing of the ACL assumed a
sane acecnt and sane sizes for the "who" strings.
This patch adds sanity checks for these.

The patch also fixes handling of an error
return from nfsrv_dissectacl() for one broken
case.

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260111
MFC after: 2 weeks


# 638b90a1 28-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfs: Quiet a few "unused" warnings

For most of these warnings, the variable is loaded
with data parsed out of an RPC message. In case
the data is useful in the future, I just marked
these with __unused.


# 44744f75 11-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a LayoutError RPC for NFSv4.2 pNFS mounts

If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error. This patch adds the same to the
FreeBSD NFSv4.2 pNFS client, to maintain Linux compatible
behaviour, particularly for non-FreeBSD pNFS servers.

MFC after: 2 weeks


# d70ca5b0 08-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix f_bavail and f_ffree for NFSv4 when negative

Since the NFS Space_available and Files_available are unsigned,
the NFSv3 server sets them to 0 when negative, so that they
do not appear to be large positive values for non-FreeBSD clients.
This patch fixes the NFSv4 server to do the same.

Found during a recent IETF NFSv4 working group testing event.

MFC after: 2 weeks


# a4667e09 19-Oct-2021 Mark Johnston <markj@FreeBSD.org>

Convert vm_page_alloc() callers to use vm_page_alloc_noobj().

Remove page zeroing code from consumers and stop specifying
VM_ALLOC_NOOBJ. In a few places, also convert an allocation loop to
simply use VM_ALLOC_WAITOK.

Similarly, convert vm_page_alloc_domain() callers.

Note that callers are now responsible for assigning the pindex.

Reviewed by: alc, hselasky, kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31986


# 55089ef4 11-Sep-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make vfs.nfs.maxcopyrange larger by default

As of commit 103b207536f9, the NFSv4.2 server will limit the size
of a Copy operation based upon a 1 second timeout. The Linux 5.2
kernel server also limits Copy operation size to 4Mbytes.
As such, the NFSv4.2 client can attempt a large Copy without
resulting in a long RPC RTT for these servers.

This patch changes vfs.nfs.maxcopyrange to 64bits and sets
the default to the maximum possible size of SSIZE_MAX, since
a larger size makes the Copy operation more efficient and
allows for copying to complete with fewer RPCs.
The sysctl may need to be made smaller for other non-FreeBSD
NFSv4.2 servers.

MFC after: 2 weeks


# 3ad1e1c1 11-Aug-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a Lookup+Open RPC for NFSv4.1/4.2

This patch adds a Lookup+Open compound RPC to the NFSv4.1/4.2
NFS client, which can be used by nfs_lookup() so that a
subsequent Open RPC is not required.
It uses the cn_flags OPENREAD, OPENWRITE added by commit c18c74a87c15.
This reduced the number of RPCs by about 15% for a kernel
build over NFS.

For now, use of Lookup+Open is only done when the "oneopenown"
mount option is used. It may be possible for Lookup+Open to
be used for non-oneopenown NFSv4.1/4.2 mounts, but that will
require extensive further testing to determine if it works.

While here, I've added the changes to the nfscommon module
that are needed to implement the Deallocate NFSv4.2 operation.
This avoids needing another cycle of changes to the internal
KAPI between the NFS modules.

This commit has changed the internal KAPI between the NFS
modules and, as such, all need to be rebuilt from sources.
I have not bumped __FreeBSD_version, since it was bumped a
few days ago.


# ee29e6f3 16-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add sysctl to set maximum I/O size up to 1Mbyte

Since MAXPHYS now allows the FreeBSD NFS client
to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio
so that the maximum NFS server I/O size can be set up to 1Mbyte.
The Linux NFS client can also do 1Mbyte I/O operations.

The default of 128Kbytes for the maximum I/O size has
not been changed for two reasons:
- kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O
- The limited benchmarking I can do actually shows a drop in I/O rate
when the I/O size is above 256Kbytes.
However, daveb@spectralogic.com reports seeing an increase
in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client.

Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30826


# 1e0a518d 08-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a Linux compatible "nconnect" mount option

Linux has had an "nconnect" NFS mount option for some time.
It specifies that N (up to 16) TCP connections are to be created for a mount,
instead of just one TCP connection.

A discussion on freebsd-net@ indicated that this could improve
client<-->server network bandwidth, if either the client or server
have one of the following:
- multiple network ports aggregated together with lagg/lacp.
- a fast NIC that is using multiple queues
It does result in using more IP port#s and might increase server
peak load for a client.

One difference from the Linux implementation is that this implementation
uses the first TCP connection for all RPCs composed of small messages
and uses the additional TCP connections for RPCs that normally have
large messages (Read/Readdir/Write). The Linux implementation spreads
all RPCs across all TCP connections in a round robin fashion, whereas
this implementation spreads Read/Readdir/Write across the additional
TCP connections in a round robin fashion.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30970


# 947bd247 30-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add support for the NFSv4.1/4.2 Secinfo_no_name operation

The Linux client is now attempting to use the Secinfo_no_name
operation for NFSv4.1/4.2 mounts. Although it does not seem to
mind the NFSERR_NOTSUPP reply, adding support for it seems
reasonable.

I also noticed that "savflag" needed to be 64bits in
nfsrvd_secinfo() since nd_flag is now 64bits, so I changed
the declaration of it there. I also added code to set "vp" NULL
after performing Secinfo/Secinfo_no_name, since these
operations consume the current FH, which is represented
by "vp" in nfsrvd_compound().

Fixing when the server replies NFSERR_WRONGSEC so that
it conforms to RFC5661 Sec. 2.6 still needs to be done
in a future commit.

MFC after: 2 weeks


# dd02d9d6 07-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for va_birthtime to NFSv4

There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also enables the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30156


# 87597731 26-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: fix the slot sequence# when a callback fails

Commit 4281bfec3628 patched the server so that the
callback session slot would be free'd for reuse when
a callback attempt fails.
However, this can often result in the sequence# for
the session slot to be advanced such that the client
end will reply NFSERR_SEQMISORDERED.

To avoid the NFSERR_SEQMISORDERED client reply,
this patch negates the sequence# advance for the
case where the callback has failed.
The common case is a failed back channel, where
the callback cannot be sent to the client, and
not advancing the sequence# is correct for this
case. For the uncommon case where the client's
reply to the callback is lost, not advancing the
sequence# will indicate to the client that the
next callback is a retry and not a new callback.
But, since the FreeBSD server always sets "csa_cachethis"
false in the callback sequence operation, a retry
and a new callback should be handled the same way
by the client, so this should not matter.

Until you have this patch in your NFSv4.1/4.2 server,
you should consider avoiding the use of delegations.
Even with this patch, interoperation with the
Linux NFSv4.1/4.2 client in kernel versions prior
to 5.3 can result in frequent 15second delays if
delegations are enabled. This occurs because, for
kernels prior to 5.3, the Linux client does a TCP
reconnect every time it sees multiple concurrent
callbacks and then it takes 15seconds to recover
the back channel after doing so.

MFC after: 2 weeks


# 78ffcb86 19-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscommon: fix function name in comment

MFC after: 2 weeks


# 34256484 15-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

Revert "nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661"

This reverts commit 9edaceca8165e2864267547311daf145bb520270.

It turns out that the Linux client intentionally does an NFSv4.1
RPC with only a Sequence operation in it and with "seqid + 1"
for the slot. This is used to re-synchronize the slot's seqid
and the client expects the NFS4ERR_SEQ_MISORDERED error reply.

As such, revert the patch, so that the server remains RFC5661
compliant.


# 9edaceca 11-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: cut the Linux NFSv4.1/4.2 some slack w.r.t. RFC5661

Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Sometimes, after some Linux NFSv4.1/4.2 clients establish
a new TCP connection, they will advance the sequence number
for a session slot by 2 instead of 1.
RFC5661 specifies that a server should reply
NFS4ERR_SEQ_MISORDERED for this case.
This might result in a system call error in the client and
seems to disable future use of the slot by the client.
Since advancing the sequence number by 2 seems harmless,
allow this case if vfs.nfs.linuxseqsesshack is non-zero.

Note that, if the order of RPCs is actually reversed,
a subsequent RPC with a smaller sequence number value
for the slot will be received. This will result in
a NFS4ERR_SEQ_MISORDERED reply.
This has not been observed during testing.
Setting vfs.nfs.linuxseqsesshack to 0 will provide
RFC5661 compliant behaviour.

This fix affects the fairly rare case where a NFSv4
Linux client does a TCP reconnect and then apparently
erroneously increments the sequence number for the
session slot twice during the reconnect cycle.

PR: 254816
MFC after: 2 weeks


# 7763814f 11-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsv4 client: do the BindConnectionToSession as required

During a recent testing event, it was reported that the NFSv4.1/4.2
server erroneously bound the back channel to a new TCP connection.
RFC5661 specifies that the fore channel is implicitly bound to a
new TCP connection when an RPC with Sequence (almost any of them)
is done on it. For the back channel to be bound to the new TCP
connection, an explicit BindConnectionToSession must be done as
the first RPC on the new connection.

Since new TCP connections are created by the "reconnect" layer
(sys/rpc/clnt_rc.c) of the krpc, this patch adds an optional
upcall done by the krpc whenever a new connection is created.
The patch also adds the specific upcall function that does a
BindConnectionToSession and configures the krpc to call it
when required.

This is necessary for correct interoperability with NFSv4.1/NFSv4.2
servers when the nfscbd daemon is running.

If doing NFSv4.1/NFSv4.2 mounts without this patch, it is
recommended that the nfscbd daemon not be running and that
the "pnfs" mount option not be specified.

PR: 254840
Comments by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29475


# 22cefe3d 10-Apr-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: fix replies from session cache for multiple retries

Recent testing of network partitioning a FreeBSD NFSv4.1
server from a Linux NFSv4.1 client identified problems
with both the FreeBSD server and Linux client.

Commit 05a39c2c1c18 fixed replying with the cached reply in
the session slot if the session slot sequence# is the same.
However, the code uses the reply and, as such,
will fail for a subsequent retry of the RPC.
A subsequent retry would be an extremely rare event,
but this patch fixes this, so long as m_copym(..M_NOWAIT)
does not fail, which should also be a rare event.

This fix affects the exceedingly rare case where a NFSv4
client retries a non-idempotent RPC, such as a lock
operation, multiple times. Note that retries only occur
after the client has needed to create a new TCP connection,
with a new TCP connection for each retry.

MFC after: 2 weeks


# 5f742d38 19-Mar-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsv4 client: fix forced dismount when sleeping on nfsv4lck

During a recent NFSv4 testing event a test server caused a hang
where "umount -N" failed. The renew thread was sleeping on "nfsv4lck"
and the "umount" was sleeping, waiting for the renew thread to
terminate.

This is the first of two patches that is hoped to fix the renew thread
so that it will terminate when "umount -N" is done on the mount.

nfsv4_lock() checks for forced dismount, but only after it wakes up
from msleep(). Without this patch, a wakeup() call was required.
This patch adds a 1second timeout on the msleep(), so that it will
wake up and see the forced dismount flag. Normally a wakeup()
will occur in less than 1second, but if a premature return from
msleep() does occur, it will simply loop around and msleep() again.

While here, replace the nfsmsleep() wrapper that was used for portability
with the actual msleep() call and make the same change for nfsv4_getref().

MFC after: 2 weeks


# 52e63ec2 17-Dec-2020 Brooks Davis <brooks@FreeBSD.org>

VFS_QUOTACTL: Remove needless casts of arg

The argument is a void * so there's no need to cast it to caddr_t.

Update documentation to match function declaration.

Reviewed by: freqlabs
Obtained from: CheriBSD
MFC after: 1 week
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D27093


# 586ee69f 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

fs: clean up empty lines in .c and .h files


# 808306dd 17-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Delete the unused "use_ext" argument to nfscl_reqstart().

This is a partial revert of r363210, since the "use_ext" argument added
by that commit is not actually useful.

This patch should not result in any semantics change.


# 02511d21 10-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add an argument to newnfs_connect() that indicates use TLS for the connection.

For NFSv4.0, the server creates a server->client TCP connection for callbacks.
If the client mount on the server is using TLS, enable TLS for this callback
TCP connection.
TLS connections from clients will not be supported until the kernel RPC
changes are committed.

Since this changes the internal ABI between the NFS kernel modules that
will require a version bump, delete newnfs_trimtrailing(), which is no
longer used.

Since LCL_TLSCB is not yet set, these changes should not have any semantic
effect at this time.


# 194d8704 26-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv4 client so that it checks for support of TimeCreate before
trying to set it.

r362490 added support for setting of the TimeCreate (va_birthtime) attribute,
but it does so without checking to see if the server supports the attribute.
This could result in NFSERR_ATTRNOTSUPP error replies to the Setattr operation.
This patch adds code to check that the server supports TimeCreate before
attempting to do a Setattr of it to avoid these error returns.


# 022346fa 06-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add support for ext_pgs mbufs to nfsrvd_rephead().

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.


# 34fc29e0 05-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add support for ext_pgs mbufs to nfsm_strtom().

Also, add a new function nfsm_add_ext_pgs() which will either add a page
or add a new ext_pgs mbuf with a page to the mbuf list. Used by nfsm_strtom().
This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.


# dccb5806 03-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add support for ext_pgs mbufs to nfscl_reqstart() and nfsm_set().

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Since ND_EXTPG is never set yet, there is no semantic change at this time.


# 4476c1de 25-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add a boolean argument to nfscl_reqstart() to indicate that ext_pgs mbufs
should be used.

For KERN_TLS (and possibly some other future network interface) the mbuf
list passed into sosend() must be ext_pgs mbufs. The krpc could simply
copy all the mbuf data into ext_pgs mbufs before calling sosend(), but
that would be inefficient for large RPC messages.
This patch adds an argument to nfscl_reqstart() to indicate that it should
fill the RPC message into ext_pgs mbufs.
It also adds fields to "struct nfsrv_descript" needed for building NFS RPC
messages in ext_pgs mbufs, along with new flags for this.

Since the argument is always "false", this commit should not result in any
semantic change. However, this commit prepares the code
for future commits that will add support for building of NFS RPC messages
in ext_pgs mbufs.


# c07782e1 22-Jun-2020 Doug Rabson <dfr@FreeBSD.org>

Add some missing parts for supporting va_birthtime.

Reviewed by: rmacklem


# eea79fde 17-Jun-2020 Alan Somers <asomers@FreeBSD.org>

Remove vfs_statfs and vnode_mount macros from NFS

These macro definitions are no longer needed as the NFS OSX port is long
dead. The vfs_statfs macro conflicts with the vfsops field of the same
name.

Submitted by: shivank@
Reviewed by: rmacklem
MFC after: 2 weeks
Sponsored by: Google, Inc. (GSoC 2020)
Differential Revision: https://reviews.freebsd.org/D25263


# 3900c114 14-Jun-2020 Doug Rabson <dfr@FreeBSD.org>

Add support for the timecreate attribute

This maps to the va_birthtime VFS attribute.


# 3d7650f0 17-May-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add a function nfsm_set() to initialize "struct nfsrv_descript" for building
mbuf lists.

This function is currently trivial, but that will change when
support for building NFS messages in ext_pgs mbufs is added.
Adding support for ext_pgs mbufs is needed for KERN_TLS, which will
be used to implement nfs-over-tls.


# b9cc3262 12-May-2020 Ryan Moeller <freqlabs@FreeBSD.org>

nfs: Remove APPLESTATIC macro

It is no longer useful.

Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24811


# 32033b3d 08-May-2020 Ryan Moeller <freqlabs@FreeBSD.org>

Remove APPLEKEXT ifndefs

They are no longer useful.

Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24752


# 04d6c514 05-May-2020 Rick Macklem <rmacklem@FreeBSD.org>

Delete unused function newnfs_trimleading.

The NFS function called newnfs_trimleading() has not been used by the
code in a long time. To give you a clue, it still had a K&R style function
declaration.
Delete it, since it is just cruft, as a part of the NFS mbuf handling
cleanup in preparation for adding ext_pgs mbuf support.
The ext_pgs mbuf support for the build/send side is needed by
nfs-over-tls.


# 3973ef1d 04-May-2020 Rick Macklem <rmacklem@FreeBSD.org>

Revert r360514, to avoid unnecessary churn of the sources.

r360514 prepared the NFS code for changes to handle ext_pgs mbufs on
the receive side. However, at this time, KERN_TLS does not pass
ext_pgs mbufs up through soreceive(). As such, at this time, only
the send/build side of the NFS mbuf code needs to handle ext_pgs mbufs.
Revert r360514 since the rather extensive changes required for receive
side ext_pgs mbufs are not yet needed.
This avoids unnecessary churn of the sources.


# 0c9cd5ca 30-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Factor some code out of nfsm_dissct() into separate functions.

Factoring some of the code in nfsm_dissct() out into separate functions
allows these functions to be used elsewhere in the NFS mbuf handling code.
Other uses of these functions will be done in future commits.
It also makes it easier to add support for ext_pgs mbufs, which is needed
for nfs-over-tls under development in base/projects/nfs-over-tls.

Although the algorithm in nfsm_dissct() is somewhat re-written by this
patch, the semantics of nfsm_dissct() should not have changed.


# e4a458bb 24-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Remove Mac OS/X macros that did nothing for FreeBSD.

The macros CAST_USER_ADDR_T() and CAST_DOWN() were used for the Mac OS/X
port. The first of these macros was a no-op for FreeBSD and the second
is no longer used.
This patch gets rid of them. It also deletes the "mbuf_t" typedef which
is no longer used in the FreeBSD code from nfskpiport.h

This patch should not change semantics.


# ae070589 17-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Replace all instances of the typedef mbuf_t with "struct mbuf *".

The typedef mbuf_t was used for the Mac OS/X port of the code long ago.
Since this port is no longer used and the use of mbuf_t obscures what
the code does (and is not consistent with style(9)), it is no longer needed.
This patch replaces all instances of mbuf_t with "struct mbuf *", so that
it is no longer used.

This patch should not result in any semantic change.


# c948a17a 09-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.
This conversion will be committed one file at a time.


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# c057a378 12-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add support for NFSv4.2 to the NFS client and server.

This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes
(RFC-8276) to the NFS client and server.
NFSv4.2 is comprised of several optional features that can be supported
in addition to NFSv4.1. This patch adds the following optional features:
- posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED)
- posix_fallocate()
- intra server file range copying via the copy_file_range(2) syscall
--> Avoiding data transfer over the wire to/from the NFS client.
- lseek(SEEK_DATA/SEEK_HOLE)
- Extended attribute syscalls for "user" namespace attributes as defined
by RFC-8276.

Although this patch is fairly large, it should not affect support for
the other versions of NFS. However it does add two new sysctls that allow
a sysadmin to limit which minor versions of NFSv4 a server supports, allowing
a sysadmin to disable NFSv4.2.

Unfortunately, when the NFS stats structure was last revised, it was assumed
that there would be no additional operations added beyond what was
specified in RFC-7862. However RFC-8276 did add additional operations,
forcing the NFS stats structure to be revised again. It now has extra unused
entries in all arrays, so that future extensions to NFSv4.2 can be
accommodated without revising this structure again.

A future commit will update nfsstat(1) to report counts for the new NFSv4.2
specific operations/procedures.

This patch affects the internal interface between the nfscommon, nfscl and
nfsd modules and, as such, they all must be upgraded simultaneously.
I will do a version bump (although arguably not needed), due to this.

This code has survived a "make universe" but has not been built with a
recent GCC. If you encounter build problems, please email me.

Relnotes: yes


# e1cda5ee 28-Nov-2019 Rick Macklem <rmacklem@FreeBSD.org>

Fix two races while handling nfsuserd daemon start/stop.

A crash was reported where the nr_client field was NULL during an upcall
to the nfsuserd daemon. Since nr_client == NULL only occurs when the
nfsuserd daemon is being shut down, it appeared to be caused by a race
between doing an upcall and the daemon shutting down.
By inspection two races were identified:
1 - The nfsrv_nfsuserd variable is used to indicate whether or not the
daemon is running. However it did not handle the intermediate phase
where the daemon is starting or stopping.

This was fixed by making nfsrv_nfsuserd tri-state and having the
functions that are called during start/stop obey the intermediate
state.

2 - nfsrv_nfsuserd was checked to see that the daemon was running at
the beginning of an upcall, but nothing prevented the daemon from
being shut down while an upcall was still in progress.
This race probably caused the crash.

The patch fixes this by adding a count of upcalls in progress and
having the shut down function delay until this count goes to zero
before getting rid of nr_client and related data used by an upcall.
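
A rough userland sketch of the two fixes above, assuming a pthread mutex
and condition variable in place of the kernel's own locking. All names
here (daemon_state, upcall_begin() and so on) are hypothetical and are
not the identifiers used in nfs_commonsubs.c:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical tri-state flag; the kernel variable is nfsrv_nfsuserd. */
enum daemon_state { DAEMON_NOTRUNNING, DAEMON_STARTSTOP, DAEMON_RUNNING };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t idle = PTHREAD_COND_INITIALIZER;
static enum daemon_state state = DAEMON_NOTRUNNING;
static int upcalls_in_progress = 0;

/* Start the daemon; fails unless it is fully stopped. */
bool
daemon_start(void)
{
	bool ok = false;

	pthread_mutex_lock(&lock);
	if (state == DAEMON_NOTRUNNING) {
		state = DAEMON_STARTSTOP;
		/* ... set up the data used by upcalls here ... */
		state = DAEMON_RUNNING;
		ok = true;
	}
	pthread_mutex_unlock(&lock);
	return (ok);
}

/* Begin an upcall; refused while the daemon is stopped or in transition. */
bool
upcall_begin(void)
{
	bool ok;

	pthread_mutex_lock(&lock);
	ok = (state == DAEMON_RUNNING);
	if (ok)
		upcalls_in_progress++;
	pthread_mutex_unlock(&lock);
	return (ok);
}

/* Finish an upcall and wake a waiting daemon_stop() when none remain. */
void
upcall_end(void)
{
	pthread_mutex_lock(&lock);
	assert(upcalls_in_progress > 0);
	if (--upcalls_in_progress == 0)
		pthread_cond_broadcast(&idle);
	pthread_mutex_unlock(&lock);
}

/* Stop the daemon, delaying teardown until no upcall is in progress. */
bool
daemon_stop(void)
{
	pthread_mutex_lock(&lock);
	if (state != DAEMON_RUNNING) {
		pthread_mutex_unlock(&lock);
		return (false);
	}
	state = DAEMON_STARTSTOP;	/* new upcalls are refused from here */
	while (upcalls_in_progress > 0)
		pthread_cond_wait(&idle, &lock);
	/* ... now safe to free the data used by upcalls ... */
	state = DAEMON_NOTRUNNING;
	pthread_mutex_unlock(&lock);
	return (true);
}
```

The intermediate state closes the first race (callers can tell "starting
or stopping" apart from "running"), and the counter closes the second
(teardown cannot overlap an upcall that has already begun).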

Tested by: avg (Panzura QA)
Reported by: avg
Reviewed by: avg
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D22377


# a6f77c9a 21-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add #ifdef INET as requested by bz@.


# ea5776ec 18-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv4.0 server so that it does not support NFSv4.1 attributes.

During inspection of a packet trace, I noticed that an NFSv4.0 mount
reported that it supported attributes that are only defined for NFSv4.1.
In practice, this bug appears to be benign, since NFSv4.0 clients will
not use attributes that were added for NFSv4.1.
However, this was not correct and this patch fixes the NFSv4.0 server
so that it only supports attributes defined for NFSv4.0.
It also adds a definition for NFSv4.1 attributes that can only be set,
although it is only defined as 0 for now.
This is in anticipation of the addition of support for the NFSv4.1 mode+mask
attribute soon.

MFC after: 2 weeks


# 80405bcf 06-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add INET6 support for the upcalls to the nfsuserd daemon.

The kernel code uses UDP to do upcalls to the nfsuserd(8) daemon to get
updates to the username<->uid and groupname<->gid mappings.
A change to AF_LOCAL last year had to be reverted, since it could result
in vnode locking issues on the AF_LOCAL socket.
This patch adds INET6 support and the required #ifdef INET and INET6
to the code.

Requested by: bz
PR: 205193
Reviewed by: bz, rgrimes
MFC after: 2 weeks
Differential Revision: http://reviews.freebsd.org/D19218


# 02c8dd7d 04-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Revert r320698, since the related userland changes were reverted by r338192.

r338192 reverted the changes to nfsuserd so that it could use an AF_LOCAL
socket, since it resulted in a vnode locking panic().
Post r338192 nfsuserd daemons use the old AF_INET socket for upcalls and
do not use these kernel changes.
I left them in for a while, so that nfsuserd daemons built from head sources
between r320757 (Jul. 6, 2017) and r338192 (Aug. 22, 2018) would need them
by default.
This only affects head, since the changes were never MFC'd.
I will add an UPDATING entry, since an nfsuserd daemon built from head
sources between r320757 and r338192 will not run unless the "-use-udpsock"
option is specified. (This command line option is only in the affected
revisions of the nfsuserd daemon.)

I suspect few will be affected by this, since most who run systems built
from head sources (not stable or releases) will have rebuilt their nfsuserd
daemon from sources post r338192 (Aug. 22, 2018)

This is being reverted in preparation for an update to include AF_INET6
support to the code.


# 2df8bd90 12-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Drop unused 'p' argument to nfsv4_strtogid().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# c703cba8 12-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Drop unused 'p' argument to nfsv4_gidtostr().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 0658ac39 12-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Drop unused 'p' argument to nfsv4_strtouid().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 0f86b94a 12-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Drop unused 'p' argument to nfsv4_uidtostr().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# f32bf292 12-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Drop unused 'p' argument to nfsrv_getuser().

Reviewed by: rmacklem
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19455


# cc426dd3 11-Dec-2018 Mateusz Guzik <mjg@FreeBSD.org>

Remove unused argument to priv_check_cred.

Patch mostly generated with coccinelle:

@@
expression E1,E2;
@@

- priv_check_cred(E1,E2,0)
+ priv_check_cred(E1,E2)

Sponsored by: The FreeBSD Foundation


# 778f2983 19-Nov-2018 Rick Macklem <rmacklem@FreeBSD.org>

nfsm_advance() would panic() when the offs argument was negative.
The code assumed that this would indicate a corrupted mbuf chain, but
it could simply be caused by bogus RPC message data.
This patch replaces the panic() with a printf() plus error return.
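
The idea can be illustrated with a simplified userland sketch. The types
and the function cursor_advance() are hypothetical stand-ins for the mbuf
chain and nfsm_advance(), and EBADMSG is used only as a placeholder error
value; the real function's signature and error code are not shown here:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an mbuf chain: a list of buffer lengths. */
struct buf {
	struct buf *next;
	size_t len;
};

/* Current position: a buffer and a byte offset within it. */
struct cursor {
	struct buf *cur;
	size_t pos;
};

/*
 * Advance the cursor by offs bytes.  A negative offs may simply come
 * from bogus message data, so report it and return an error rather
 * than treating it as fatal corruption.
 */
int
cursor_advance(struct cursor *c, long offs)
{
	size_t left;

	assert(c != NULL);
	if (offs < 0) {
		fprintf(stderr, "cursor_advance: negative offs %ld\n", offs);
		return (EBADMSG);
	}
	while (offs > 0) {
		if (c->cur == NULL)
			return (EBADMSG);	/* ran off the end of the chain */
		left = c->cur->len - c->pos;
		if ((size_t)offs < left) {
			c->pos += (size_t)offs;
			return (0);
		}
		offs -= (long)left;
		c->cur = c->cur->next;
		c->pos = 0;
	}
	return (0);
}
```

On the error paths the caller gets a non-zero return and can reject the
RPC message instead of taking the whole system down.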

MFC after: 1 week


# 3e5ba2e1 17-Aug-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix LORs between vn_start_write() and vn_lock() in the pNFS server.

When coding the pNFS server, I added several vn_start_write() calls done
while the vnode was locked, not realizing I had introduced LORs and
possible deadlock when an exported file system on the MDS is suspended.
This patch fixes this by removing the added vn_start_write() calls and
modifying the code so that the extant vn_start_write() call before the
NFS RPC/operation is done when needed by the pNFS server.
Flags are changed so that LayoutCommit and LayoutReturn now get a
vn_start_write() done for them.
When the pNFS server is enabled, the code now also changes the flags for
Getattr, so that the vn_start_write() is done for Getattr, since it may
need to do a vn_set_extattr(). The nfs_writerpc flag array was made global
to the NFS server and renamed nfsrv_writerpc, which is consistent naming
for globals in the NFS server.
Thanks go to kib@ for reporting that doing vn_start_write() while the vnode is
locked results in a LOR.
This patch only affects the behaviour of the pNFS server.


# 2f32675c 02-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Add an optional feature to the pNFS server.

Without this patch, the pNFS server distributes the data storage files across
all of the specified DSs.
A tester noted that it would be nice if a system administrator could control
which DSs are used to store the file data for a given exported MDS file system.
This patch adds the kernel support to do this. It also makes a slight semantic
change to nfsv4_findmirror(), since some uses of it no longer require that
the DS being searched for have a current mirror.
A patch that will be committed in a few minutes will modify the nfsd daemon
to support this feature.
The patch should only affect sites using the pNFS server (specified via the
"-p" command line option for nfsd).

Suggested by: james.rose@framestore.com


# 9f4c522e 22-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Set the slotid and ND_HASSLOTID flag for NFSv4.1 sequenced operations.

Most NFSv4.1 compound RPCs start with a Sequence operation. For these
cases, save the slotid and note that it is saved by setting ND_HASSLOTID.
This is used by r335568 to free up the session slot and disable it.

MFC after: 2 weeks


# c338c94d 14-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Move four functions in nfscl.ko to nfscommon.ko.

Four functions nfscl_reqstart(), nfscl_fillsattr(), nfsm_stateidtom()
and nfsmnt_mdssession() are now called from within the nfsd.
As such, they needed to be moved from nfscl.ko to nfscommon.ko so that
nfsd.ko would load when nfscl.ko wasn't loaded.

Reported by: herbert@gojira.at


# 90d2dfab 12-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Merge the pNFS server code from projects/pnfs-planb-server into head.

This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".

Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write operations from all the other NFSv4.1
Metadata operations. It is hoped that this separation allows a pNFS service
to be configured that exceeds the limits of a single NFS server for either
storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so that
the data storage file for an MDS file will be mirrored on two or more of
the DSs.
When this is used, failure of a DS will not stop the pNFS service and a
failed DS can be recovered once repaired while the pNFS service continues
to operate. Although two way mirroring would be the norm, it is possible
to set a mirroring level of up to four or the number of DSs, whichever is
less.
The Metadata server will always be a single point of failure,
just as a single NFS server is.

A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree on the MDS directly
on the MDS server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attribute stores the information that
the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
and Change attributes for the file, so that the MDS doesn't need to
acquire the attributes from the DS for every Getattr operation.
For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories. The name of this file is the file handle
of the file on the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger.
It can be increased when the "nfsd" daemon is not running on the MDS,
once the "dsK" directories are created.
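
As an illustration of the naming scheme described above, here is a small
sketch that turns the bytes of a file handle into a hexadecimal file name.
The function fh_to_hexname() is hypothetical, and the use of lowercase
digits is an assumption; how the "dsNN" subdirectory is chosen is not
covered by this sketch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write the bytes of fh as hex digits into out, NUL-terminated.
 * Returns 0 on success, -1 if out is too small.
 * Lowercase digits are an assumption of this sketch.
 */
int
fh_to_hexname(const unsigned char *fh, size_t fhlen, char *out, size_t outlen)
{
	static const char digits[] = "0123456789abcdef";
	size_t i;

	assert(fh != NULL && out != NULL);
	if (outlen < 2 * fhlen + 1)
		return (-1);
	for (i = 0; i < fhlen; i++) {
		out[2 * i] = digits[fh[i] >> 4];
		out[2 * i + 1] = digits[fh[i] & 0x0f];
	}
	out[2 * fhlen] = '\0';
	return (0);
}
```

Because each byte maps to exactly two digits, distinct file handles of the
same length always give distinct names.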

For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allows it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
is. The critical bits of information returned by the FreeBSD
server is the IP address of the DS and, for the Flexible
File layout, that NFSv4.1 is to be used and that it is
"tightly coupled".
There is a "deviceid" which identifies the DeviceInfo.
Layout - This is per file and can be recalled by the server when it
is no longer valid. For the FreeBSD server, there is support
for two types of layout, called File and Flexible File layout.
Both allow the client to do I/O on the DS via NFSv4.1 I/O
operations. The Flexible File layout is a more recent variant
that allows specification of mirrors, where the client is
expected to do writes to all mirrors to maintain them in a
consistent state. The Flexible File layout also allows the
client to report I/O errors for a DS back to the MDS.
The Flexible File layout supports two variants referred to as
"tightly coupled" vs "loosely coupled". The FreeBSD server always
uses the "tightly coupled" variant where the client uses the
same credentials to do I/O on the DS as it would on the MDS.
For the "loosely coupled" variant, the layout specifies a
synthetic user/group that the client uses to do I/O on the DS.
The FreeBSD server does not do striping and always returns
layouts for the entire file. The critical information in a layout
is Read vs Read/Write and DeviceID(s) that identify which
DS(s) the data is stored on.

At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
are done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.

Tested by: james.rose@framestore.com
Relnotes: yes


# 9442a64e 01-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Add the BindConnectiontoSession operation to the NFSv4.1 server.

Under some fairly unusual circumstances, the Linux NFSv4.1 client is
doing a BindConnectiontoSession operation for TCP connections.
It is also used by the ESXi6.5 NFSv4.1 client.
This patch adds this operation to the NFSv4.1 server.

Reported by: andreas.nagy@frequentis.com
Tested by: andreas.nagy@frequentis.com
MFC after: 2 weeks


# b97b91b5 25-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

nfs: Remove NFSSOCKADDRALLOC, NFSSOCKADDRFREE macros

They were just thin wrappers over malloc(9) w/ M_ZERO and free(9).

Discussed with: rmacklem, markj
Sponsored by: Dell EMC Isilon


# 222daa42 25-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

style: Remove remaining deprecated MALLOC/FREE macros

Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions). Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by: jhb
Reviewed by: cy, imp, markj, rmacklem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14035


# 151ba793 24-Dec-2017 Alexander Kabaev <kan@FreeBSD.org>

Do a pass removing some write-only variables from the kernel.

This reduces noise when kernel is compiled by newer GCC versions,
such as one used by external toolchain ports.

Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial)
Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c)
Differential Revision: https://reviews.freebsd.org/D10385


# 55384243 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Replace one more LINK_MAX with NFS_LINK_MAX missed in r326991.

Sponsored by: Chelsio Communications


# a0a073b1 19-Dec-2017 John Baldwin <jhb@FreeBSD.org>

Update NFS to handle larger link counts post ino64.

- Define a NFS_LINK_MAX as UINT32_MAX to match the wire protocol.
- Use NFS_LINK_MAX instead of LINK_MAX as the fallback value reported
for a PATHCONF RPC by the NFS server.
- Use NFS_LINK_MAX instead of LINK_MAX as the default value reported
by the NFS client pathconf() if not overridden by the NFS server.
- When reading the link count out of an RPC reply, read the full 32
bits instead of the lower 16 bits.

Reviewed by: rmacklem (earlier version)
Sponsored by: Chelsio Communications


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# be3d32ad 28-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Change nfsv4_getipaddr() and nfsrpc_fillsa() to not use sockaddr_storage.

This patch changes nfsv4_getipaddr() and nfsrpc_fillsa() to use
a sockaddr_in * and sockaddr_in6 * instead of sockaddr_storage, to
avoid allocating the latter on the stack. It also moves the nfsrpc_fillsa()
call to after the completion of parsing of the DeviceInfo reply from
the server. This patch is in preparation for addition of Flex File
Layout support in a future commit.
It only affects the "pnfs" NFSv4.1 client mount option and should not
have changed its semantics.


# 16f300fa 27-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing.

This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".

Suggested by: kib
MFC after: 2 weeks


# 1d2fef9b 19-Jul-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring.
It applies to both NFS client and NFS server, and is useful for both.
This is different from vfs.nfsd.enable_stringtouid, which is specific
to server side.

Reviewed by: rmacklem@
MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 25d694a6 05-Jul-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add support for AF_LOCAL socket upcalls to the nfsuserd daemon.

This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon
that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL
sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are
in use.

Suggested by: dfr
PR: 205193


# 3c264086 27-Jun-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Revert part of r320359, as suggested by rmacklem@. That case is only used
for nfsuserd -manage-gids and shouldn't depend on sysctl.

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 6a3450e1 26-Jun-2017 Edward Tomasz Napierala <trasz@FreeBSD.org>

Add vfs.nfsd.nfsd_enable_uidtostring, which works just like
vfs.nfsd.nfsd_enable_stringtouid, but in reverse - when set to 1,
it forces the NFSv4 server to return numeric UIDs and GIDs instead
of "user@domain" strings. This helps with clients that can't
translate returned identifiers, eg when rerooting.

The same can be achieved by just never running nfsuserd(8),
but the sysctl is useful to toggle the behaviour back and forth
without rebooting.
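
A minimal sketch of the behaviour this sysctl selects, assuming a
hypothetical helper format_owner(); the real server code and its name
lookup are not shown. Falling back to the numeric form when no name is
known is a choice made for this sketch only:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format an owner attribute either as a numeric string or as
 * "user@domain".  numeric_only plays the role of the sysctl.
 * Returns 0 on success, -1 if out is too small.
 */
int
format_owner(unsigned int uid, const char *name, const char *domain,
    int numeric_only, char *out, size_t outlen)
{
	int n;

	assert(out != NULL && outlen > 0);
	if (numeric_only || name == NULL || domain == NULL)
		n = snprintf(out, outlen, "%u", uid);
	else
		n = snprintf(out, outlen, "%s@%s", name, domain);
	return ((n >= 0 && (size_t)n < outlen) ? 0 : -1);
}
```

With numeric_only set, uid 1001 is returned as "1001" regardless of any
known name; otherwise the same uid with name "alice" and domain
"example.org" is returned as "alice@example.org".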

Reviewed by: rmacklem (earlier version)
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11326


# a351e99c 24-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Add two new compound RPCs to the NFSv4.1/pNFS client.

When the NFSv4.1 client is doing pNFS, it needs to get an Open and
a Layout for every file it will be doing I/O on. The current code
does two separate RPCs to get these. This patch adds two new compounds
that do both the Open and LayoutGet in the same RPC, reducing the
RPC count.
It also factors out the code that sets up and parses the LayoutGet operation
into separate functions, so that the code doesn't get duplicated for
these new RPCs.
This patch is fairly large, but should only affect the NFSv4.1 client
when the "pnfs" option is specified.

PR: 219550
MFC after: 2 weeks


# 95ac7f1a 18-Jun-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFS client/server so that it actually uses the 64bit ino_t filenos.

The code still doesn't use d_off. That will come in a future commit.
The code also removes the checks for servers returning a fileno that
doesn't fit in 32bits, since that should work ok now.
Bump __FreeBSD_version since this patch changes the interface between
the NFS kernel modules.

Reviewed by: kib


# 8c1d0d9c 21-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Set default uid/gid to nobody/nogroup for NFSv4 mapping.

The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. The default
values of 0 meant that setting a group to "wheel" would fail even when
done by root.
It also adds a definition of GID_NOGROUP to sys/conf.h.

Discussed on: freebsd-current@
MFC after: 2 weeks


# b843ada7 21-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Revert r317240. I didn't realize there were defined constants for
uid/gid values in sys/conf.h. I will do another commit using those.


# 1350db17 20-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Set default uid/gid to nobody/nogroup for NFSv4 mapping.

The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon.
However, they were 0 until the nfsuserd(8) was run. Since it is
possible to use NFSv4 without running the nfsuserd(8) daemon, set them
to nobody/nogroup initially.
Without this patch, the values would be set by the nfsuserd(8) daemon
and left changed even if the nfsuserd(8) daemon was killed. Also, the default
values of 0 meant that setting a group to "wheel" would fail even when
done by root and this patch fixes this issue.

MFC after: 2 weeks


# fb556791 10-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Set initial values for nfsstatfs in the NFSv4 client.

The AmazonEFS NFSv4.1 server does not support the FILES_FREE and FILES_TOTAL
attributes. As such, an NFSv4.1 mount to the server would return garbage
for these values. This patch initializes the fields of the nfsstatfs structure,
so that "df" and friends will at least return consistent bogus values.
This patch should have effect when mounting other NFSv4.1 servers.

Reported by: cperciva
MFC after: 2 weeks


# 2242bc81 09-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv4.1 client for NFSERR_BADSESSION recovery via ReclaimComplete.

For the ReclaimComplete operation, the RPC layer should not loop on
NFSERR_BADSESSION. If it does, the recovery thread (nfscl) can get stuck
looping and will not do a recovery.
This patch fixes it so it does not loop. This bug only affects NFSv4.1 and
only when a server reboots.

Tested by: cperciva
PR: 215886
MFC after: 2 weeks


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber clause 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistence on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 2f304845 05-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not allocate struct statfs on kernel stack.

Right now the size of the structure is 472 bytes on amd64, which is
already large, and stack allocations are undesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# b2fc0141 23-Dec-2016 Rick Macklem <rmacklem@FreeBSD.org>

Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.

For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure
that indicates that the server has lost session/open/lock state.
However, recent testing by cperciva@ against the AmazonEFS server found
several problems with client recovery from this due to it generating this
failure frequently.
Briefly, the problems fixed are:
- If all session slots were in use at the time of the failure, some processes
would continue to loop waiting for a slot on the old session forever.
- If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION,
it would fail the RPC/syscall instead of initiating recovery and then
looping to retry the RPC.
- If a successful reply to an RPC for an old session wasn't processed
until after a new session was created for a NFS4ERR_BAD_SESSION error,
it would erroneously update the new session and corrupt it.
- The use of the first element of the session list in the nfs mount
structure (which is always the current metadata session) was slightly
racy. With changes for the above problems it became more racy, so all
uses of this head pointer were wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
- Although the kernel malloc() usually allocates more bytes than requested
and, as such, this wouldn't have caused problems, the allocation of a
session structure was 1 byte smaller than it should have been.
(Null termination byte for the string not included in byte count.)
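
The last item above (an allocation that omitted the terminating NUL byte) can be illustrated with a minimal userland sketch. The structure and function names here (example_session, session_alloc_size, session_create) are hypothetical and are not the kernel's session structure or allocation code; the sketch only shows why the byte count needs the extra "+ 1".

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure ending in a variable-length string. */
struct example_session {
	int	refcnt;
	char	name[];		/* NUL-terminated, allocated with the struct */
};

/* Bytes needed for the structure plus the string and its terminator. */
static size_t
session_alloc_size(const char *name)
{
	/* The "+ 1" covers the terminating NUL byte. */
	return (sizeof(struct example_session) + strlen(name) + 1);
}

/* Allocate a session and copy the name in; returns NULL on failure. */
static struct example_session *
session_create(const char *name)
{
	struct example_session *sep;

	sep = malloc(session_alloc_size(name));
	if (sep == NULL)
		return (NULL);
	sep->refcnt = 1;
	strcpy(sep->name, name);	/* copies the NUL as well */
	return (sep);
}
```

Without the "+ 1", the strcpy() would write one byte past the requested size, which is harmless only when the allocator happens to round the request up.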

There are probably still problems with a pNFS data server that fails
with NFS4ERR_BAD_SESSION, but I have no server that does this to test
against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet.

Although this patch is fairly large, it should only affect the handling
of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server.
Thanks go to cperciva@ for the extensive testing he did to help isolate/fix
these problems.

Reported by: cperciva
Tested by: cperciva
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D8745


# 63659ba6 15-Nov-2016 Colin Percival <cperciva@FreeBSD.org>

Reduce NFS "NFSv4( mounted on)? fileid > 32bits" log spam.

Rather than printing a warning for every time we receive a fileid > 2^32
from the NFS server, count warnings and print at most one of each warning
type per minute, e.g.,

Nov 15 05:17:34 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (24730 occurrences)
Nov 15 05:17:56 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (178 occurrences)
Nov 15 05:18:53 ip-172-30-1-221 kernel: NFSv4 fileid > 32bits (7582 occurrences)
Nov 15 05:18:58 ip-172-30-1-221 kernel: NFSv4 mounted on fileid > 32bits (23 occurrences)

A buildworld with an NFS mounted /usr/obj can otherwise result in
hundreds of thousands of lines being printed, which seems unnecessarily
verbose.
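
A minimal userland sketch of this kind of counting rate limiter follows. It is not the code from this commit: the structure warn_limiter, the function warn_ratelimited(), and the choice to format into a caller-supplied buffer are assumptions made for illustration, and the 60 second interval is taken from the "one per minute" description above.

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical per-warning-type state; not the kernel's actual structure. */
struct warn_limiter {
	time_t		last_print;	/* time of last emitted line, 0 = never */
	unsigned long	pending;	/* occurrences since the last emitted line */
};

/*
 * Record one occurrence at time "now". At most one line is produced per
 * 60 seconds. Returns 1 if a line was written into buf, 0 otherwise.
 */
static int
warn_ratelimited(struct warn_limiter *wl, const char *msg, time_t now,
    char *buf, size_t buflen)
{
	wl->pending++;
	if (wl->last_print != 0 && now - wl->last_print < 60)
		return (0);	/* too soon; just keep counting */
	snprintf(buf, buflen, "%s (%lu occurrences)", msg, wl->pending);
	wl->last_print = now;
	wl->pending = 0;
	return (1);
}
```

In this sketch each warning type would get its own warn_limiter, so the "fileid" and "mounted on fileid" messages are counted and limited independently.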

When ino_t becomes a 64-bit type, these printfs will no longer be needed
(and the problems associated with truncating 64-bit fileids to generate
32-bit inode numbers will also go away).

Reviewed by: rmacklem
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8523


# 8edac6ee 06-May-2016 Ed Maste <emaste@FreeBSD.org>

Add nid_namelen bounds check to nfssvc system call

This is only allowed by root and only used by the nfs daemon, which
should not provide an incorrect value. However, it's still good
practice to validate data provided by userland.
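
As a rough illustration of such a check, the sketch below rejects a user-supplied length that is non-positive or larger than a fixed limit. The limit EXAMPLE_MAXNAMELEN, the structure example_idargs, and the function check_namelen() are hypothetical; the log does not state the actual bound or where the check was placed.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical limit for illustration; the real bound is not stated here. */
#define	EXAMPLE_MAXNAMELEN	128

/* Hypothetical stand-in for the argument structure passed from userland. */
struct example_idargs {
	const char	*nid_name;
	int		 nid_namelen;
};

/* Return 0 if the length is usable, EINVAL otherwise. */
static int
check_namelen(const struct example_idargs *args)
{
	if (args->nid_namelen <= 0 || args->nid_namelen > EXAMPLE_MAXNAMELEN)
		return (EINVAL);
	return (0);
}
```

The point of checking before use is that the length would otherwise drive an allocation and a copy from user space.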

PR: 206626
Reported by: CTurt <cturt@hardenedbsd.org>
Reviewed by: rmacklem
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D6201


# 74b8d63d 10-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Cleanup unnecessary semicolons from the kernel.

Found with devel/coccinelle.


# 65171ebb 01-Dec-2015 Rick Macklem <rmacklem@FreeBSD.org>

Fix the memory leak that occurs when the nfscommon.ko module is unloaded.
This leak was introduced by r291527.
Since the nfscommon.ko module is rarely unloaded, this leak would not
have been much of an issue.

MFC after: 2 weeks


# 84be7e09 30-Nov-2015 Rick Macklem <rmacklem@FreeBSD.org>

Add kernel support to the NFS server for the "-manage-gids"
option that will be added to the nfsuserd daemon in a future
commit. It modifies the cache used by NFSv4 for name<-->id
translation (both username/uid and group/gid) to support this.
When "-manage-gids" is set, the server looks up each uid
for the RPC and uses the list of groups cached in the server
instead of the list of groups provided in the RPC request.
The cached group list is acquired for the cache by the nfsuserd
daemon via getgrouplist(3).
This avoids the 16 groups limit for the list in the RPC request.
Since the cache is now used for every RPC when "-manage-gids"
is enabled, the code also modifies the cache to use a separate
mutex for each hash list instead of a single global mutex.

Suggested by: jpaetzel
Tested by: jpaetzel
MFC after: 2 weeks


# 6d659a5d 19-Dec-2014 Benno Rice <benno@FreeBSD.org>

Adjust the test of a KASSERT to better match the intent.

This assertion was added in r246213 as a guard against corrupted mbufs
arriving from drivers, the key distinguishing factor of said mbufs being
that they had a negative length. Given we're in a while loop specifically
designed to skip over zero-length mbufs, panicking on a zero-length mbuf
seems incorrect.
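
The intent can be sketched in userland C as below. The structure buf and the function first_nonempty() are simplified, hypothetical stand-ins (not struct mbuf or the actual loop), and the standard assert() plays the role of KASSERT: a zero length is skipped, and only a negative length trips the assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified buffer chain; not the kernel's struct mbuf. */
struct buf {
	struct buf	*next;
	int		 len;
};

/*
 * Skip zero-length buffers and return the first one holding data, or
 * NULL if the chain has none. A zero length is legal and simply skipped;
 * only a negative length indicates corruption.
 */
static struct buf *
first_nonempty(struct buf *b)
{
	while (b != NULL) {
		assert(b->len >= 0);
		if (b->len > 0)
			break;
		b = b->next;
	}
	return (b);
}
```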

No objection from: kib


# d8a5961f 02-Oct-2014 Marcelo Araujo <araujo@FreeBSD.org>

Fix failures and warnings reported by newpynfs20090424 test tool.
This fix addresses only issues with the pynfs reports; none of these
issues are known to create problems for extant real clients.

Submitted by: Bart Hsiao <bart.hsiao@gmail.com>
Reworked by: myself
Reviewed by: rmacklem
Approved by: rmacklem
Sponsored by: QNAP Systems Inc.


# c59e4cc3 01-Jul-2014 Rick Macklem <rmacklem@FreeBSD.org>

Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.

MFC after: 1 month


# ca20bd92 02-May-2014 Rick Macklem <rmacklem@FreeBSD.org>

The new draft specification for NFSv4.0 specifies that a server
should either accept owner and owner_group strings that are just
the digits of the uid/gid or return NFS4ERR_BADOWNER.
This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can
be set to make the server accept numeric strings. It
also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid
strings are not enabled. This fixes the server for recent Linux
nfs4 clients that use numeric uid/gid strings by default.

Reported and tested by: craigyk@gmail.com
MFC after: 2 weeks


# a6f8e64e 18-Apr-2014 Rick Macklem <rmacklem@FreeBSD.org>

Modify the Lookup RPC for NFSv4 so that it acquires directory
attributes. This allows the client to cache directory names
when they are looked up, reducing the Lookup RPC count by
about 40% for software builds.

MFC after: 2 weeks


# b921158a 23-Dec-2013 Rick Macklem <rmacklem@FreeBSD.org>

The NFSv4 client was passing both the p and cred arguments to
nfsv4_fillattr() as NULLs for the Getattr callback. This caused
nfsv4_fillattr() to not fill in the Change attribute for the reply.
I believe this was a violation of the RFC, but had little effect on
server behaviour. This patch passes a non-NULL p argument to fix this.

MFC after: 1 week


# 42b6336a 09-Nov-2013 Rick Macklem <rmacklem@FreeBSD.org>

Fix an NFSv4.1 client specific case where a forced dismount would hang.
The hang occurred in nfsv4_setsequence() when it couldn't find an
available session slot and is fixed by checking for a forced dismount
in progress and just returning for this case.

MFC after: 1 month


# a36b76a7 20-Jul-2013 Rick Macklem <rmacklem@FreeBSD.org>

The NFSv4 server incorrectly assumed that the high order words of
the attribute bitmap argument would be non-zero. This caused an
interoperability problem for a recent patch to the Linux NFSv4 client.
The Linux folks have changed their patch to avoid this, but this
patch fixes the problem on the server.

Reported and tested by: Andre Heider (a.heider@gmail.com)
MFC after: 3 days


# d96b98a3 17-Apr-2013 Kenneth D. Merry <ken@FreeBSD.org>

Revamp the old NFS server's File Handle Affinity (FHA) code so that
it will work with either the old or new server.

The FHA code keeps a cache of currently active file handles for
NFSv2 and v3 requests, so that read and write requests for the same
file are directed to the same group of threads (reads) or thread
(writes). It does not currently work for NFSv4 requests. They are
more complex, and will take more work to support.

This improves read-ahead performance, especially with ZFS, if the
FHA tuning parameters are configured appropriately. Without the
FHA code, concurrent reads that are part of a sequential read from
a file will be directed to separate NFS threads. This has the
effect of confusing the ZFS zfetch (prefetch) code and makes
sequential reads significantly slower with clients like Linux that
do a lot of prefetching.

The FHA code has also been updated to direct write requests to nearby
file offsets to the same thread in the same way it batches reads,
and the FHA code will now also send writes to multiple threads when
needed.

This improves sequential write performance in ZFS, because writes
to a file are now more ordered. Since NFS writes (generally
less than 64K) are smaller than the typical ZFS record size
(usually 128K), out of order NFS writes to the same block can
trigger a read in ZFS. Sending them down the same thread increases
the odds of their being in order.

In order for multiple write threads per file in the FHA code to be
useful, writes in the NFS server have been changed to use a LK_SHARED
vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem
doesn't allow multiple writers to a file at once. ZFS is currently
the only filesystem that allows multiple writers to a file, because
it has internal file range locking. This change does not affect the
NFSv4 code.

This improves random write performance to a single file in ZFS, since
we can now have multiple writers inside ZFS at one time.

I have changed the default tuning parameters to a 22 bit (4MB)
window size (from 256K) and unlimited commands per thread as a
result of my benchmarking with ZFS.

The FHA code has been updated to allow configuring the tuning
parameters from loader tunable variables in addition to sysctl
variables. The read offset window calculation has been slightly
modified as well. Instead of having separate bins, each file
handle has a rolling window of bin_shift size. This minimizes
glitches in throughput when shifting from one bin to another.

sys/conf/files:
Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c
when either the old or the new NFS server is built.

sys/fs/nfs/nfsport.h,
sys/fs/nfs/nfs_commonport.c:
Bring in changes from Rick Macklem to newnfs_realign that
allow it to operate in blocking (M_WAITOK) or non-blocking
(M_NOWAIT) mode.

sys/fs/nfs/nfs_commonsubs.c,
sys/fs/nfs/nfs_var.h:
Bring in a change from Rick Macklem to allow telling
nfsm_dissect() whether or not to wait for mallocs.

sys/fs/nfs/nfsm_subs.h:
Bring in changes from Rick Macklem to create a new
nfsm_dissect_nonblock() inline function and
NFSM_DISSECT_NONBLOCK() macro.

sys/fs/nfs/nfs_commonkrpc.c,
sys/fs/nfsclient/nfs_clkrpc.c:
Add the malloc wait flag to a newnfs_realign() call.

sys/fs/nfsserver/nfs_nfsdkrpc.c:
Setup the new NFS server's RPC thread pool so that it will
call the FHA code.

Add the malloc flag argument to newnfs_realign().

Unstaticize newnfs_nfsv3_procid[] so that we can use it in
the FHA code.

sys/fs/nfsserver/nfs_nfsdsocket.c:
In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types
that use the LK_SHARED lock type.

sys/fs/nfsserver/nfs_nfsdport.c:
In nfsd_fhtovp(), if we're starting a write, check to see
whether the underlying filesystem supports shared writes.
If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

sys/nfsserver/nfs_fha.c:
Remove all code that is specific to the NFS server
implementation. Anything that is server-specific is now
accessed through a callback supplied by that server's FHA
shim in the new softc.

There are now separate sysctls and tunables for the FHA
implementations for the old and new NFS servers. The new
NFS server has its tunables under vfs.nfsd.fha, the old
NFS server's tunables are under vfs.nfsrv.fha as before.

In fha_extract_info(), use callouts for all server-specific
code. Getting file handles and offsets is now done in the
individual server's shim module.

In fha_hash_entry_choose_thread(), change the way we decide
whether two reads are in proximity to each other.
Previously, the calculation was a simple shift operation to
see whether the offsets were in the same power of 2 bucket.
The issue was that there would be a bucket (and therefore
thread) transition, even if the reads were in close
proximity. When there is a thread transition, reads wind
up going somewhat out of order, and ZFS gets confused.

The new calculation simply tries to see whether the offsets
are within 1 << bin_shift of each other. If they are, the
reads will be sent to the same thread.
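
The difference between the two calculations can be sketched as follows. The function names same_bucket() and within_window() are hypothetical, and whether the real comparison is strict or inclusive at exactly 1 << bin_shift is an assumption (strict is used here).

```c
#include <stdint.h>

/* Old test as described: are both offsets in the same power-of-2 bucket? */
static int
same_bucket(uint64_t a, uint64_t b, unsigned int bin_shift)
{
	return ((a >> bin_shift) == (b >> bin_shift));
}

/* New test as described: are the offsets within 1 << bin_shift of each other? */
static int
within_window(uint64_t a, uint64_t b, unsigned int bin_shift)
{
	uint64_t diff = (a > b) ? a - b : b - a;

	return (diff < ((uint64_t)1 << bin_shift));
}
```

With bin_shift set to 22, two reads 64K apart that straddle a 4MB boundary fail the old test but pass the new one, which is the bucket transition the text above describes.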

The effect of this change is that for sequential reads, if
the client doesn't exceed the max_reqs_per_nfsd parameter
and the bin_shift is set to a reasonable value (22, or
4MB works well in my tests), the reads in any sequential
stream will largely be confined to a single thread.

Change fha_assign() so that it takes a softc argument. It
is now called from the individual server's shim code, which
will pass in the softc.

Change fhe_stats_sysctl() so that it takes a softc
parameter. It is now called from the individual server's
shim code. Add the current offset to the list of things
printed out about each active thread.

Change the num_reads and num_writes counters in the
fha_hash_entry structure to 32-bit values, and rename them
num_rw and num_exclusive, respectively, to reflect their
changed usage.

Add an enable sysctl and tunable that allows the user to
disable the FHA code (when vfs.XXX.fha.enable = 0). This
is useful for before/after performance comparisons.

nfs_fha.h:
Move most structure definitions out of nfs_fha.c and into
the header file, so that the individual server shims can
see them.

Change the default bin_shift to 22 (4MB) instead of 18
(256K). Allow unlimited commands per thread.

sys/nfsserver/nfs_fha_old.c,
sys/nfsserver/nfs_fha_old.h,
sys/fs/nfsserver/nfs_fha_new.c,
sys/fs/nfsserver/nfs_fha_new.h:
Add shims for the old and new NFS servers to interface with
the FHA code, and callbacks for the

The shims contain all of the code and definitions that are
specific to the NFS servers.

They setup the server-specific callbacks and set the server
name for the sysctl and loader tunable variables.

sys/nfsserver/nfs_srvkrpc.c:
Configure the RPC code to call fhaold_assign() instead of
fha_assign().

sys/modules/nfsd/Makefile:
Add nfs_fha.c and nfs_fha_new.c.

sys/modules/nfsserver/Makefile:
Add nfs_fha_old.c.

Reviewed by: rmacklem
Sponsored by: Spectra Logic
MFC after: 2 weeks


# dd603523 01-Feb-2013 Konstantin Belousov <kib@FreeBSD.org>

Assert that the mbuf in the chain has a sane length. The proper place for
this check is somewhere in the network code, but this assertion
has already proven to be useful in catching what seem to be driver bugs
causing NFS to scramble random memory.

Discussed with: rmacklem
MFC after: 1 week


# 5055536e 16-Jan-2013 John Baldwin <jhb@FreeBSD.org>

Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes()
instead of comparing the desired time against the current time as a
heuristic.

Reviewed by: rmacklem
MFC after: 1 week


# 1f60bfd8 08-Dec-2012 Rick Macklem <rmacklem@FreeBSD.org>

Move the NFSv4.1 client patches over from projects/nfsv4.1-client
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detected during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.


# eb1b1807 05-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# c52005a3 19-Sep-2012 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4 client so that it can handle owner
and owner_group strings that consist entirely of
digits, interpreting them as the uid/gid number.
This change was needed since new (>= 3.3) Linux
servers reply with these strings by default.
This change is mandated by the rfc3530bis draft.
Reported on freebsd-stable@ under the Subject
heading "Problem with Linux >= 3.3 as NFSv4 server"
by Norbert Aschendorff on Aug. 20, 2012.
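
A minimal sketch of recognizing an all-digits owner string is shown below. The function numeric_owner() is hypothetical and is not the client's actual routine; rejecting values that do not fit in 32 bits is an assumption made so the example has defined behaviour on overflow.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * If str[0..len) consists entirely of decimal digits and the value fits
 * in 32 bits, store it in *idp and return 1. Otherwise return 0 so the
 * caller can fall back to the usual name-to-id mapping.
 */
static int
numeric_owner(const char *str, size_t len, uint32_t *idp)
{
	uint64_t val = 0;
	size_t i;

	if (len == 0)
		return (0);
	for (i = 0; i < len; i++) {
		if (str[i] < '0' || str[i] > '9')
			return (0);	/* e.g. "user@domain" */
		val = val * 10 + (uint64_t)(str[i] - '0');
		if (val > UINT32_MAX)
			return (0);
	}
	*idp = (uint32_t)val;
	return (1);
}
```

The length is passed explicitly because owner strings arrive as counted opaque data rather than as NUL-terminated C strings.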

Tested by: norbert.aschendorff at yahoo.de
Reviewed by: jhb
MFC after: 2 weeks


# f7258644 07-Jan-2012 Rick Macklem <rmacklem@FreeBSD.org>

opt_inet6.h was missing from some files in the new NFS subsystem.
The effect of this was, for clients mounted via inet6 addresses,
that the DRC cache would never have a hit in the server. It also
broke NFSv4 callbacks when an inet6 address was the only one available
in the client. This patch fixes the above, plus deletes opt_inet6.h
from a couple of files it is not needed for.

MFC after: 2 weeks


# 061c683c 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Revert revision 224079 as Rick pointed out that I would be calling VOP_PATHCONF
without the vnode lock held.

Implicitly approved by: zml (mentor)


# a9285ae5 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Add DEXITCODE plumbing to NFS.

Isilon has the concept of an in-memory exit-code ring that saves the last exit
code of a function and allows for stack tracing. This is very helpful when
debugging tough issues.

This patch is essentially a no-op for BSD at this point, until we upstream
the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS
function that returns an errno error code. A number of code paths were also
reorganized to have single exit paths, to reduce code duplication.

Submitted by: David Kwan <dkwan@isilon.com>
Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# a9989634 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# 98f234f3 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# 51c099f5 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Change loadattr and fillattr to ask the file system for the pathconf variable.

Small modification where VOP_PATHCONF was being called directly.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# b008a72c 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# ff29f3b2 27-May-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFS client so that it handles NFSv4 state
correctly during a forced dismount. This required that
the exclusive and shared (refcnt) sleep lock functions check
for MNTK_UMOUNTF before sleeping, so that they won't block
while nfscl_umount() is getting rid of the state. As
such, a "struct mount *" argument was added to the locking
functions. I believe the only remaining case where a forced
dismount can get hung in the kernel is when a thread is
already attempting to do a TCP connect to a dead server
when the krpc client structure called nr_client is NULL.
This will only happen just after a "mount -u" with options
that force a new TCP connection is done, so it shouldn't
be a problem in practice.

MFC after: 2 weeks


# a09001a8 14-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental NFSv4 server so that it uses VOP_PATHCONF()
to determine if a file system supports NFSv4 ACLs. Since
VOP_PATHCONF() must be called with a locked vnode, the function
is called before nfsvno_fillattr() and the result is passed in
as an extra argument.

MFC after: 2 weeks


# 07c0c166 14-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Modify the experimental NFSv4 server so that it handles
crossing of server mount points properly. The functions
nfsvno_fillattr() and nfsv4_fillattr() were modified to
take the extra arguments that are the mount point, a flag
to indicate that it is a file system root and the mounted
on fileno. The mount point argument needs to be busy when
nfsvno_fillattr() is called, since the vp argument is not
locked.

Reviewed by: kib
MFC after: 2 weeks


# 8207db3e 18-Jan-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental NFSv4 server so that it uses VOP_ACCESSX()
to check for VREAD_ACL instead of VOP_ACCESS().

MFC after: 3 days


# 5a12538b 01-Jan-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add support for shared vnode locks for the Read operation
in the experimental NFSv4 server.

Reviewed by: kib
MFC after: 2 weeks


# 17891d00 25-Dec-2010 Rick Macklem <rmacklem@FreeBSD.org>

Modify the experimental NFS server so that it uses LK_SHARED
for RPC operations when it can. Since VFS_FHTOVP() currently
always gets an exclusively locked vnode and is usually called
at the beginning of each RPC, the RPCs for a given vnode will
still be serialized. As such, passing a lock type argument to
VFS_FHTOVP() would be preferable to doing the vn_lock() with
LK_DOWNGRADE after the VFS_FHTOVP() call.

Reviewed by: kib
MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 2ec3f925 28-Aug-2010 Rick Macklem <rmacklem@FreeBSD.org>

The timer routine in the experimental NFS server did not acquire
the correct mutex when checking nfsv4root_lock. Although this
could be fixed by adding mutex lock/unlock calls, zack.kirsch at
isilon.com suggested a better fix that uses a non-blocking
acquisition of a reference count on nfsv4root_lock. This fix
allows the weird NFSLOCKSTATE(); NFSUNLOCKSTATE(); synchronization
to be deleted. This patch applies this fix.

Tested by: zack.kirsch at isilon.com
MFC after: 2 weeks


# 066adacf 13-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r205941
This patch should fix handling of byte range locks locally
on the server for the experimental nfs server. When enabled
by setting vfs.newnfs.locallocks_enable to non-zero, the
experimental nfs server will now acquire byte range locks
on the file on behalf of NFSv4 clients, such that lock
conflicts between the NFSv4 clients and processes running
locally on the server, will be recognized and handled correctly.


# a43fcbe3 30-Mar-2010 Rick Macklem <rmacklem@FreeBSD.org>

This patch should fix handling of byte range locks locally
on the server for the experimental nfs server. When enabled
by setting vfs.newnfs.locallocks_enable to non-zero, the
experimental nfs server will now acquire byte range locks
on the file on behalf of NFSv4 clients, such that lock
conflicts between the NFSv4 clients and processes running
locally on the server, will be recognized and handled correctly.

MFC after: 2 weeks


# 74991298 03-Dec-2009 Edward Tomasz Napierala <trasz@FreeBSD.org>

Remove unneeded ifdefs.

Reviewed by: rmacklem


# c3e22f83 26-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental nfs subsystem so that it builds with the
current NFSv4 ACLs, as defined in sys/acl.h. It still needs a
way to test a mount point for NFSv4 ACL support before it will
work. Until then, the NFSHASNFS4ACL() macro just always returns 0.

Approved by: kib (mentor)


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and related parts no longer take the
context, since it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

The VFS KPI is heavily changed by this commit, so third-party modules need
to be recompiled. Bump __FreeBSD_version in order to signal this
situation.


# 9ec7b004 04-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add the experimental nfs subtree to the kernel, that includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that calls generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.

Approved by: kib (mentor)