History log of /freebsd-current/sys/fs/nfsserver/nfs_nfsdserv.c
Revision Date Author Comments
# e2c9fad2 04-Jun-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix delegation handled for atomic upgrade

For NFSv4.1/4.2, an atomic upgrade of a delegation from a
read delegation to a write delegation is allowed and can
result in signoficantly improved performance.

This patch adds support for this atomic upgrade, plus fixes
a couple of other delegation related bugs. Since there were
three cases where delegations were being issued, the patch
factors this out into a separate function called
nfsrv_issuedelegations().

This patch should only affect the NFSv4.1/4.2 behaviour
when delegations are enabled, which is not the default.

MFC after: 1 month


# 3f65000b 04-May-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix Link conformance with RFC8881 for delegations

RFC8881 specifies that, when a Link operation occurs on an
NFSv4, that file delegations issued to other clients must
be recalled. Discovered during a recent discussion on nfsv4@ietf.org.

Although I have not observed a problem caused by not doing
the required delegation recall, it is definitely required
by the RFC, so this patch makes the server do the recall.

Tested during a recent NFSv4 IETF Bakeathon event.

MFC after: 1 week


# 54c3aa02 25-Apr-2024 Rick Macklem <rmacklem@FreeBSD.org>

Revert "nfsd: Fix NFSv4.1/4.2 Claim_Deleg_Cur_FH"

This reverts commit f300335d9aebf2e99862bf783978bd44ede23550.

It turns out that the old code was correct and it was wireshark
that was broken and indicated that the RPC's XDR was bogus.
Found during IETF bakeathon testing this week.


# 748f56c5 15-Mar-2024 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add a sysctl to limit NFSv4.2 Copy RPC size

NFSv4.2 supports a Copy operation, which avoids file data being
read to the client and then written back to the server, if both
input and output files are on the same NFSv4.2 mount for
copy_file_range(2).

Unfortunately, this Copy operation can take a long time under
certain circumstances. If this occurs concurrently with a RPC
that requires an exclusive lock on the nfsd such as ExchangeID
done for a new mount, the result can be an nfsd "stall" until
the Copy completes.

This patch adds a sysctl that can be set to limit the size of
a Copy operation or, if set to 0, disable Copy operations.

The use of this sysctl and other ways to avoid Copy operations
taking too long will be documented in the nfsd.4 man page by
a separate commit.

MFC after: 2 weeks


# f300335d 19-Oct-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix NFSv4.1/4.2 Claim_Deleg_Cur_FH

When I implemented a test patch using Open Claim_Deleg_Cur_FH
I discovered that the NFSv4.1/4.2 server was broken for this
Open option. Fortunately it is never used by the FreeBSD
client and never used by other clients unless delegations
are enabled. (The FreeBSD NFSv4 server does not have delegations
enabled by default.)

Claim_Deleg_Cur_FH was broken because the code mistakenly
assumed a stateID argument, which is not the case.
This patch fixes the bug by changing the XDR parser to not
expect a stateID and to fill most of the stateID in from the
clientID. The clientID is the first two elements of the "other"
array for the stateID and is sufficient to identify which
client the delegation is issued to. Since there is only one
delegation issued to a client per file, this is sufficient to
locate the correct delegation.

If you are running non-FreeBSD NFSv4.1/4.2 mounts against the
FreeBSD server, you need this patch if you have delegations enabled.

PR: 274574
MFC after: 2 weeks


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# ba8cc6d7 12-Mar-2023 Mateusz Guzik <mjg@FreeBSD.org>

vfs: use __enum_uint8 for vtype and vstate

This whacks hackery around only reading v_type once.

Bump __FreeBSD_version to 1400093


# ff2f1f69 07-Apr-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add support for the SP4_MACH_CRED case in ExchangeID

Commit f4179ad46fa4 added support for operation bitmaps for
NFSv4.1/4.2. This commit uses those to implement the SP4_MACH_CRED
case for the NFSv4.1/4.2 ExchangeID operation since the Linux
NFSv4.1/4.2 client is now using this for Kerberized mounts.
The Linux Kerberized NFSv4.1/4.2 mounts currently work without
support for this because Linux will fall back to SP4_NONE,
but there is no guarantee this fallback will work forever.

This commit only affects Kerberized NFSv4.1/4.2 mounts from
Linux at this time.

MFC after: 3 months


# 695d87ba 28-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Make coverity happy

Coverity does not like code that checks a function's
return value sometimes. Add "(void)" in front of the
function when the return value does not matter to try
and make it happy.

A recent commit deleted "(void)"s in front of nfsm_fhtom().
This commit puts them back in.

Reported by: emaste
MFC after: 3 months


# 896516e5 16-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts

Without this patch, a Kerberized NFSv4.1/4.2 mount must provide
a Kerberos credential for the client at mount time. This credential
is typically referred to as a "machine credential". It can be
created one of two ways:
- The user (usually root) has a valid TGT at the time the mount
is done and this becomes the machine credential.
There are two problems with this.
1 - The user doing the mount must have a valid TGT for a user
principal at mount time. As such, the mount cannot be put
in fstab(5) or similar.
2 - When the TGT expires, the mount breaks.
- The client machine has a service principal in its default keytab
file and this service principal (typically called a host-based
initiator credential) is used as the machine credential.
There are problems with this approach as well:
1 - There is a certain amount of administrative overhead creating
the service principal for the NFS client, creating a keytab
entry for this principal and then copying the keytab entry
into the client's default keytab file via some secure means.
2 - The NFS client must have a fixed, well known, DNS name, since
that FQDN is in the service principal name as the instance.

This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which
allows the state maintenance operations to be performed by any
authentication mechanism, to do these operations via AUTH_SYS
instead of RPCSEC_GSS (Kerberos). As such, neither of the above
mechanisms is needed.

It is hoped that this option will encourage adoption of Kerberized
NFS mounts using TLS, to provide a more secure NFS mount.

This new NFSv4.1/4.2 mount option, called "syskrb5" must be used
with "sec=krb5[ip]" to avoid the need for either of the above
Kerberos setups to be done by the client.

Note that all file access/modification operations still require
users on the NFS client to have a valid TGT recognized by the
NFSv4.1/4.2 server. As such, this option allows, at most, a
malicious client to do some sort of DOS attack.

Although not required, use of "tls" with this new option is
encouraged, since it provides on-the-wire encryption plus,
optionally, client identity verification via a X.509
certificate provided to the server during TLS handshake.
Alternately, "sec=krb5p" does provide on-the-wire
encryption of file data.

A mount_nfs(8) man page update will be done in a separate commit.

Discussed on: freebsd-current@
MFC after: 3 months


# ded5f295 08-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix handling of the error case for nfsvno_open

Using done_namei instead of ni_startdir did not
fix the crashes reported in the PR. Upon looking
more closely at the code, the only case where the
code near the end of nfsvno_open() needs to be
executed is when nfsvno_namei() has succeeded,
but a subsequent error was detected.

This patch uses done_namei to indicate this case.

Also, nfsvno_relpathbuf() should only be called for
this case and not whenever nfsvno_open() is called
with nd_repstat != 0. A bug was introduced here when
the HASBUF flag was deleted.

Reviewed by: mjg
PR: 268971
Tested by: ish@amail.plala.or.jp
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D38430


# dcfa3ee4 12-Jan-2023 Rick Macklem <rmacklem@FreeBSD.org>

nfsserver: Fix vrele() panic in nfsvno_open()

Commit 65127e982b94 removed a check for ni_startdir != NULL.
This allowed the vrele(ndp->ni_dvp) to be called with
a NULL argument.

This patch adds a new boolean argument to nfsvno_open()
that can be checked instead of ni_startdir, since mjg@ requested
that ni_startdir not be used. (Discussed in PR#268828.)

PR: 268828
Reviewed by: mjg
Differential Revision: https://reviews.freebsd.org/D38032


# 6fd6a0e3 23-Dec-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Handle file systems without a VOP_VPTOFH()

Unlike NFSv3, the NFSv4 server follows mount points
within the file system tree below the NFSv4 root directory.
If there is a file system mounted within this subtree
that returns EOPNOTSUPP for VOP_VPTOFH(), the NFSv4 server
would return an error for the mount point entry.
This resulted in an "I/O error" report from the Linux NFSv4
client. It also put an error code in the Readdir reply
that is not defined in the NFSv4 RFCs.

For the FreeBSD NFSv4 client, the entry with the error would
be ignored, which I think is reasonable behaviour for a
mounted file system that can never be exported.

This patch changes the NFSv4 server behaviour to ignore the
mount point entry and not send it in the Readdir reply.
It also changes the behaviour of Lookup for the entry so
that it replies ENOENT for the mount point directory, so
that it is consistent with no entry in the Readdir reply.

With these two changes, the Linux client behaviour is the
same as the FreeBSD client behaviour. It also avoids
putting an unknown error on the wire to the client.

MFC after: 1 week


# 65127e98 09-Nov-2022 Mateusz Guzik <mjg@FreeBSD.org>

nfs: stop using SAVESTART

Only the name is wanted which is already always provided.

Reviewed by: rmacklem
Tested by: pho, rmacklem
Differential Revision: https://reviews.freebsd.org/D34470


# bf312482 08-Nov-2022 Gordon Bergling <gbe@FreeBSD.org>

nfs: Fix common typos in source code comments

- s/attrbute/attribute/

MFC after: 3 days


# 5b5b7e2c 17-Sep-2022 Mateusz Guzik <mjg@FreeBSD.org>

vfs: always retain path buffer after lookup

This removes some of the complexity needed to maintain HASBUF and
allows for removing injecting SAVENAME by filesystems.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D36542


# 5d3fe02c 22-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Clean up the code by not using the vnode_vtype() macro

The vnode_vtype() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
avoid using it to clean up the code.

This commit should not result in a semantics change.


# 0586a129 20-Jun-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Clean up the code by removing vfs_flags() macro

The vfs_flags() macro was used to make the code compatible
with Mac OSX, for the Mac OSX port.
For FreeBSD, this macro just obscured the code, so
remove it to clean up the code.

This commit should not result in a semantics change.


# 47d75c29 01-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add a sanity check to SecinfoNoname for file type

Robert Morris reported that, for the case of SecinfoNoname
with the Parent option, providing a non-directory could
cause a crash.

This patch adds a sanity check for v_type == VDIR for
this case, to avoid the crash.

Reported by: rtm@lcs.mit.edu
PR: 260300
MFC after: 2 weeks


# e2fe58d6 02-Feb-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Allow file owners to perform Open(Delegate_cur)

Commit b0b7d978b6a8 changed the NFSv4 server's default
behaviour to check the file's mode or ACL for permission to
open the file, to be Linux and Solaris compatible.
However, it turns out that Linux makes an exception for
the case of Claim_delegate_cur(_fh).

When a NFSv4 client is returning a delegation, it must
acquire Opens against the server to replace the ones
done locally in the client. The client does this via
an Open operation with Claim_delegate_cur(_fh). If
this operation fails, due to a change to the file's
mode or ACL after the delegation was issued, the
client does not have any way to retain the open.

As such, the Linux client allows the file's owner
to perform an Open with Claim_delegate_cur(_fh)
no matter what the mode or ACL allows.

This patch makes the FreeBSD server allow this case,
to be Linux compatible.

This patch only affects the case where delegations
are enabled, which is not the default.

MFC after: 2 weeks


# db0ac6de 02-Dec-2021 Cy Schubert <cy@FreeBSD.org>

Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"

This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing
changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.

A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.


# 33d0be8a 01-Dec-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Do not try to cache a reply for NFSERR_BADSLOT

When nfsrv_checksequence() replies NFSERR_BADSLOT,
the value of nd_slotid is not valid. As such, the
reply cannot be cached in the session.
Do not set ND_HASSEQUENCE for this case.

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260076
MFC after: 2 weeks


# 638b90a1 28-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfs: Quiet a few "unused" warnings

For most of these warnings, the variable is loaded
with data parsed out of an RPC messages. In case
the data is useful in the future, I just marked
these with __unused.


# 5b430a13 26-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Sanity check the len argument for ListXattr

The check for the original len being >= retlen needs to
be done before the "if (nd->nd_repstat == 0)" code, so
that it can be reported as too small.

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260046
MFC after: 2 weeks


# bdd57cbb 26-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add checks for layout errors in LayoutReturn

For a LayoutReturn when using the Flexible File Layout,
error reports may be provided in the request.
Sanity check the size of these error reports and
check that they exist before calling nfsrv_flexlayouterr().

Reported by: rtm@lcs.mit.edu
Tested by: rtm@lcs.mit.edu
PR: 260012
MFC after: 2 weeks


# f8dc0630 08-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix the NFSv4.2 pNFS MDS server for NFSERR_NOSPC via LayoutError

If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the MDS server to
tell it about the error and keeps retrying, doing repeated
LayoutGets to the MDS and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up unless
a subsequent LayoutGet request is sent a NFSERR_NOSPC
reply.
The looping problem still occurs for NFSv4.1 mounts, but no
fix for this is known at this time.

This patch changes the pNFS MDS server to reply to LayoutGet
operations with NFSERR_NOSPC once a LayoutError reports the
problem, until the DS has available space. This keeps the Linux
NFSv4.2 from looping.

Found during recent testing because of issues w.r.t. a DS
being out of space found during a recent IEFT NFSv4 working
group testing event.

MFC after: 2 weeks


# a7e014ee 07-Nov-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix the NFSv4 pNFS MDS server for DS NFSERR_NOSPC

If a pNFS server's DS runs out of disk space, it replies
NFSERR_NOSPC to the client doing writing. For the Linux
client, it then sends a LayoutError RPC to the server to
tell it about the error and keeps retrying, doing repeated
LayoutGet and Write RPCs to the DS. The Linux client is
"stuck" until disk space on the DS is free'd up.
For a mirrored server configuration, the first mirror that
ran out of space was taken offline. This does not make
much sense, since the other mirror(s) will run out of space
soon and the fix is a manual cleanup up disk space.

This patch changes the pNFS server to not disable a mirror
for the mirrored case when this occurs.

Further work is needed, since the Linux client expects the
MDS to reply NFSERR_NOSPC to LayoutGets once the DS is out
of space. Without this further change, the above mentioned
looping occurs.

Found during a recent IEFT NFSv4 working group testing event.

MFC after: 2 weeks


# dfe887b7 10-Oct-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Disable the NFSv4.2 Allocate operation by default

Some exported file systems, such as ZFS ones, cannot do VOP_ALLOCATE().
Since an NFSv4.2 server must either support the Allocate operation for
all file systems or not support it at all, define a sysctl called
vfs.nfsd.enable_v42allocate to enable the Allocate operation.
This sysctl is false by default and can only be set true if all
exported file systems (or all DSs for a pNFS server) can perform
VOP_ALLOCATE().

Unfortunately, there is no way to know if a ZFS file system will
be exported once the nfsd is operational, even if there are none
exported when the nfsd is started up, so enabling Allocate must
be done manually for a server configuration.

This problem was detected during a recent NFSv4 interoperability
testing event held by the IETF working group.

MFC after: 2 weeks


# ef7d2c1f 01-Oct-2021 Mateusz Guzik <mjg@FreeBSD.org>

nfs: eliminate thread argument from nfsvno_namei

This is a step towards retiring struct componentname cn_thread

Reviewed by: rmacklem
Differential Revision: https://reviews.freebsd.org/D32267


# 272c4a4d 14-Sep-2021 Alexander Motin <mav@FreeBSD.org>

Allow setting NFS server scope and owner.

By default NFS server reports as scope and owner major the host UUID
value and zero for owner minor. It works good in case of standalone
server. But in case of CARP-based HA cluster failover the values
should remain persistent, otherwise some clients like VMware ESXi
get confused by the change and fail to reconnect automatically.

The patch makes server scope, major owner and minor owner values
configurable via sysctls. If not set (by default) the host UUID
value is still used.

Reviewed by: rmacklem
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31952


# f1c8811d 08-Sep-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix build after commit 103b207536f9 for 32bit arches

MFC after: 2 weeks


# 103b2075 08-Sep-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Use the COPY_FILE_RANGE_TIMEO1SEC flag

Although it is not specified in the RFCs, the concept that
the NFSv4 server should reply to an RPC request within a
reasonable time is accepted practice within the NFSv4 community.

Without this patch, the NFSv4.2 server attempts to reply to
a Copy operation within 1 second by limiting the copy to
vfs.nfs.maxcopyrange bytes (default 10Mbytes). This is crude at
best, given the large variation in I/O subsystem performance.

This patch uses the COPY_FILE_RANGE_TIMEO1SEC flag added by
commit c5128c48df3c to limit the reply time for a Copy
operation to approximately 1 second.

MFC after: 2 weeks


# bb958dcf 26-Aug-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add support for the NFSv4.2 Deallocate operation

The recently added VOP_DEALLOCATE(9) VOP call allows
implementation of the Deallocate NFSv4.2 operation.

Since the Deallocate operation is a single succeed/fail
operation, the call to VOP_DEALLOCATE(9) loops so long
as progress is being made. It calls maybe_yield()
between loop iterations to allow other processes
to preempt it.

Where RFC 7862 underspecifies behaviour, the code
is written to be Linux NFSv4.2 server compatible.

Reviewed by: khng
Differential Revision: https://reviews.freebsd.org/D31624


# 06afb53b 12-Aug-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix sanity check for NFSv4.2 Allocate operations

The NFSv4.2 Allocate operation sanity checks the aa_offset
and aa_length arguments. Since they are assigned to variables
of type off_t (signed) it was possible for them to be negative.
It was also possible for aa_offset+aa_length to exceed OFF_MAX
when stored in lo_end, which is uint64_t.

This patch adds checks for these cases to the sanity check.

Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31511


# ee29e6f3 16-Jul-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add sysctl to set maximum I/O size up to 1Mbyte

Since MAXPHYS now allows the FreeBSD NFS client
to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio
so that the maximum NFS server I/O size can be set up to 1Mbyte.
The Linux NFS client can also do 1Mbyte I/O operations.

The default of 128Kbytes for the maximum I/O size has
not been changed for two reasons:
- kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O
- The limited benchmarking I can do actually shows a drop in I/O rate
when the I/O size is above 256Kbytes.
However, daveb@spectralogic.com reports seeing an increase
in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client.

Reviewed by: asomers
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30826


# a5df139e 05-Jun-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix when NFSERR_WRONGSEC may be replied to NFSv4 clients

Commit d224f05fcfc1 pre-parsed the next operation number for
the put file handle operations. This patch uses this next
operation number, plus the type of the file handle being set by
the put file handle operation, to implement the rules in RFC5661
Sec. 2.6 with respect to replying NFSERR_WRONGSEC.

This patch also adds a check to see if NFSERR_WRONGSEC should be
replied when about to perform Lookup, Lookupp or Open with a file
name component, so that the NFSERR_WRONGSEC reply is done for
these operations, as required by RFC5661 Sec. 2.6.

This patch does not have any practical effect for the FreeBSD NFSv4
client and I believe that the same is true for the Linux client,
since NFSERR_WRONGSEC is considered a fatal error at this time.

MFC after: 2 weeks


# 56e9d8e3 04-Jun-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix NFSv4.1/4.2 Secinfo_no_name when security flavors empty

Commit 947bd2479ba9 added support for the Secinfo_no_name operation.
When a non-exported file system is being traversed, the list of
security flavors is empty. It turns out that the Linux client
mount attempt fails when the security flavors list in the
Secinfo_no_name reply is empty.

This patch modifies Secinfo/Secinfo_no_name so that it replies
with all four security flavors when the list is empty.
This fixes Linux NFSv4.1/4.2 mounts when the file system at
the NFSv4 root (as specified on a V4: exports(5) line) is
not exported.

MFC after: 2 weeks


# 984c71f9 02-Jun-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Fix the failure return for non-fh NFSv4 operations

Without this patch, nfsd_checkrootexp() returns failure
and then the NFSv4 operation would reply NFSERR_WRONGSEC.
RFC5661 Sec. 2.6 only allows a few NFSv4 operations, none
of which call nfsv4_checktootexp(), to return NFSERR_WRONGSEC.
This patch modifies nfsd_checkrootexp() to return the
error instead of a boolean and sets the returned error to an RPC
layer AUTH_ERR, as discussed on nfsv4@ietf.org.
The patch also fixes nfsd_errmap() so that the pseudo
error NFSERR_AUTHERR is handled correctly such that an RPC layer
AUTH_ERR is replied to the NFSv4 client.

The two new "enum auth_stat" values have not yet been assigned
by IANA, but are the expected next two values.

The effect on extant NFSv4 clients of this change appears
limited to reporting a different failure error when a
mount that does not use adequate security is attempted.

MFC after: 2 weeks


# 1d4afcac 31-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Delete extraneous NFSv4 root checks

There are several NFSv4.1/4.2 server operation functions which
have unneeded checks for the NFSv4 root being set up.
The checks are not needed because the operations always follow
a Sequence operation, which performs the check.

This patch deletes these checks, simplifying the code so
that a future patch that fixes the checks to conform with
RFC5661 Sec. 2.6 will be less extension.

MFC after: 2 weeks


# 947bd247 30-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add support for the NFSv4.1/4.2 Secinfo_no_name operation

The Linux client is now attempting to use the Secinfo_no_name
operation for NFSv4.1/4.2 mounts. Although it does not seem to
mind the NFSERR_NOTSUPP reply, adding support for it seems
reasonable.

I also noticed that "savflag" needed to be 64bits in
nfsrvd_secinfo() since nd_flag in now 64bits, so I changed
the declaration of it there. I also added code to set "vp" NULL
after performing Secinfo/Secinfo_no_name, since these
operations consume the current FH, which is represented
by "vp" in nfsrvd_compound().

Fixing when the server replies NFSERR_WRONGSEC so that
it conforms to RFC5661 Sec. 2.6 still needs to be done
in a future commit.

MFC after: 2 weeks


# d80a903a 20-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add support for CLAIM_DELEG_PREV_FH to the NFSv4.1/4.2 Open

Commit b3d4c70dc60f added support for CLAIM_DELEG_CUR_FH to Open.
While doing this, I noticed that CLAIM_DELEG_PREV_FH support
could be added the same way. Although I am not aware of any extant
NFSv4.1/4.2 client that uses this claim type, it seems prudent to add
support for this variant of Open to the NFSv4.1/4.2 server.

This patch does not affect mounts from extant NFSv4.1/4.2 clients,
as far as I know.

MFC after: 2 weeks


# b3d4c70d 18-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Add support for CLAIM_DELEG_CUR_FH to the NFSv4.1/4.2 Open

The Linux NFSv4.1/4.2 client now uses the CLAIM_DELEG_CUR_FH
variant of the Open operation when delegations are recalled and
the client has a local open of the file. This patch adds
support for this variant of Open to the NFSv4.1/4.2 server.

This patch only affects mounts from Linux clients when delegations
are enabled on the server.

MFC after: 2 weeks


# dd02d9d6 07-May-2021 Rick Macklem <rmacklem@FreeBSD.org>

nfscl: Add support for va_birthtime to NFSv4

There is a NFSv4 file attribute called TimeCreate
that can be used for va_birthtime.
r362175 added some support for use of TimeCreate.
This patch completes support of va_birthtime by adding
support for setting this attribute to the server.
It also eanbles the client to
acquire and set the attribute for a NFSv4
server that supports the attribute.

Reviewed by: markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30156


# 586ee69f 01-Sep-2020 Mateusz Guzik <mjg@FreeBSD.org>

fs: clean up empty lines in .c and .h files


# 0bb42627 31-Aug-2020 Eric van Gyzen <vangyzen@FreeBSD.org>

Fix nfsrvd_locku memory leak

Coverity detected memory leak fix.

Submitted by: bret_ketchum@dell.com
Reported by: Coverity
Reviewed by: rmacklem
MFC after: 2 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D26231


# 6e4b6ff8 27-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add flags to enable NFS over TLS to the NFS client and server.

An Internet Draft titled "Towards Remote Procedure Call Encryption By Default"
(soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS
as a specific application case.
Various commits prepared the NFS code to use KERN_TLS, mainly enabling use
of ext_pgs mbufs for large RPC messages.
r364475 added TLS support to the kernel RPC.

This commit (which is the final one for kernel changes required to do
NFS over TLS) adds support for three export flags:
MNT_EXTLS - Requires a TLS connection.
MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid
X.509 certificate during TLS handshake.
MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a
valid X.509 certificate with "user@domain" in the otherName
field of the SubjectAltName during TLS handshake.
Without these export options, clients are permitted, but not required, to
use TLS.

For the client, a new nmount(2) option called "tls" makes the client do
a STARTTLS Null RPC and TLS handshake for all TCP connections used for the
mount. The CLSET_TLS client control option is used to indicate to the kernel RPC
that this should be done.

Unless the above export flags or "tls" option is used, semantics should
not change for the NFS client nor server.

For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client }
or rpctlssd(8) daemon { for server } must be running.


# cb889ce6 31-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add optional support for ext_pgs mbufs to the NFS server's read, readlink
and getxattr operations.

This patch optionally enables generation of read, readlink and getxattr replies
in ext_pgs mbufs. Since neither of ND_EXTPG or ND_TLS are currently ever set,
there is no change in semantics at this time.
It also corrects the message in a couple of panic()s that should never occur.

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated
to handle TLS.


# 18a48314 25-Jul-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add support for ext_pgs mbufs to nfsrv_adj().

This patch uses a slightly different algorithm for nfsrv_adj()
since ext_pgs mbuf lists are not permitted to have m_len == 0 mbufs.
As such, the code now frees mbufs after the adjustment in the list instead
of setting their m_len field to 0.
Since mbuf(s) may be trimmed off the tail of the list, the function now
returns a pointer to the last mbuf in the list. This saves the caller
from needing to use m_last() to find the last mbuf.
It also implies that it might return a nul list, which required a check for
that in nfsrvd_readlink().

This is another in the series of commits that add support to the NFS client
and server for building RPC messages in ext_pgs mbufs with anonymous pages.
This is useful so that the entire mbuf list does not need to be
copied before calling sosend() when NFS over TLS is enabled.

Use of ext_pgs mbufs will not be enabled until the kernel RPC is updated
to handle TLS.


# eea79fde 17-Jun-2020 Alan Somers <asomers@FreeBSD.org>

Remove vfs_statfs and vnode_mount macros from NFS

These macro definitions are no longer needed as the NFS OSX port is long
dead. The vfs_statfs macro conflicts with the vfsops field of the same
name.

Submitted by: shivank@
Reviewed by: rmacklem
MFC after: 2 weeks
Sponsored by: Google, Inc. (GSoC 2020)
Differential Revision: https://reviews.freebsd.org/D25263


# b9cc3262 12-May-2020 Ryan Moeller <freqlabs@FreeBSD.org>

nfs: Remove APPLESTATIC macro

It is no longer useful.

Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24811


# 32033b3d 08-May-2020 Ryan Moeller <freqlabs@FreeBSD.org>

Remove APPLEKEXT ifndefs

They are no longer useful.

Reviewed by: rmacklem
Approved by: mav (mentor)
MFC after: 1 week
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D24752


# ae070589 17-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Replace all instances of the typedef mbuf_t with "struct mbuf *".

The typedef mbuf_t was used for the Mac OS/X port of the code long ago.
Since this port is no longer used and the use of mbuf_t obscures what
the code does (and is not consistent with style(9)), it is no longer needed.
This patch replaces all instances of mbuf_t with "struct mbuf *", so that
it is no longer used.

This patch should not result in any semantic change.


# 0bda1ddd 15-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv4.2 extended attribute support for remove extended attrbute.

I missed the "atomic" field of the RemoveExtendedAttribute operation's
reply when I implemented it. It worked between FreeBSD client and server,
since it was missed for both, but it did not conform to RFC 8276.
This patch adds the field for both client and server.

Thanks go to Frank for doing interoperability testing of the extended
attribute support against patches for Linux.

Submitted by: Frank van der Linden <fllinden@amazon.com>
Reported by: Frank van der Linden <fllinden@amazon.com>


# fb8ed4c5 14-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv2 extended attribute support to handle 0 length attributes.

I did not realize that zero length attributes are allowed, but they are.
This patch fixes the NFSv4.2 client and server to handle zero length
extended attributes correctly.

Submitted by: Frank van der Linden <fllinden@amazon.com> (earlier version)
Reported by: Frank van der Linden <fllinder@amazon.com>


# 9f6624d3 11-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Replace mbuf macros with the code they would generate in the NFS code.

When the code was ported to Mac OS/X, mbuf handling functions were
converted to using the Mac OS/X accessor functions. For FreeBSD, they
are a simple set of macros in sys/fs/nfs/nfskpiport.h.
Since porting to Mac OS/X is no longer a consideration, replacement of
these macros with the code generated by them makes the code more
readable.
When support for external page mbufs is added as needed by the KERN_TLS,
the patch becomes simpler if done without the macros.

This patch should not result in any semantic change.


# b0b7d978 07-Apr-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix an interoperability issue w.r.t. the Linux client and the NFSv4 server.

Luoqi Chen reported a problem on freebsd-fs@ where a Linux NFSv4 client
was able to open and write to a file when the file's permissions were
not set to allow the owner write access.

Since NFS servers check file permissions on every write RPC, it is standard
practice to allow the owner of the file to do writes, regardless of
file permissions. This provides POSIX like behaviour, since POSIX only
checks permissions upon open(2).
The traditional way NFS clients handle this is to check access via the
Access operation/RPC and use that to determine if an open(2) on the
client is allowed.

It appears that, for NFSv4, the Linux client expects the NFSv4 Open (not a
POSIX open) operation to fail with NFSERR_ACCES if the file is not being
created and file permissions do not allow owner access, unlike NFSv3.
Since both the Linux and OpenSolaris NFSv4 servers seem to exhibit this
behaviour, this patch changes the FreeBSD NFSv4 server to do the same.
A sysctl called vfs.nfsd.v4openaccess can be set to 0 to return the
NFSv4 server to its previous behaviour.

Since both the Linux and FreeBSD NFSv4 clients seem to exhibit correct
behaviour with the access check for file owner in Open enabled, it is enabled
by default.

Reported by: luoqi.chen@gmail.com
MFC after: 2 weeks


# b249ce48 03-Jan-2020 Mateusz Guzik <mjg@FreeBSD.org>

vfs: drop the mostly unused flags argument from VOP_UNLOCK

Filesystems which want to use it in limited capacity can employ the
VOP_UNLOCK_FLAGS macro.

Reviewed by: kib (previous version)
Differential Revision: https://reviews.freebsd.org/D21427


# f808cf72 13-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Silence some "might not be initialized" warnings for riscv64.

None of these case were actually using the variable(s) uninitialized, but
I figured that silencing the warnings via initializing them made sense.

Some of these predated r355677.


# c057a378 12-Dec-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add support for NFSv4.2 to the NFS client and server.

This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes
(RFC-8276) to the NFS client and server.
NFSv4.2 is comprised of several optional features that can be supported
in addition to NFSv4.1. This patch adds the following optional features:
- posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED)
- posix_fallocate()
- intra server file range copying via the copy_file_range(2) syscall
--> Avoiding data tranfer over the wire to/from the NFS client.
- lseek(SEEK_DATA/SEEK_HOLE)
- Extended attribute syscalls for "user" namespace attributes as defined
by RFC-8276.

Although this patch is fairly large, it should not affect support for
the other versions of NFS. However it does add two new sysctls that allow
a sysadmin to limit which minor versions of NFSv4 a server supports, allowing
a sysadmin to disable NFSv4.2.

Unfortunately, when the NFS stats structure was last revised, it was assumed
that there would be no additional operations added beyond what was
specified in RFC-7862. However RFC-8276 did add additional operations,
forcing the NFS stats structure to revised again. It now has extra unused
entries in all arrays, so that future extensions to NFSv4.2 can be
accomodated without revising this structure again.

A future commit will update nfsstat(1) to report counts for the new NFSv4.2
specific operations/procedures.

This patch affects the internal interface between the nfscommon, nfscl and
nfsd modules and, as such, they all must be upgraded simultaneously.
I will do a version bump (although arguably not needed), due to this.

This code has survived a "make universe" but has not been built with a
recent GCC. If you encounter build problems, please email me.

Relnotes: yes


# abd80ddb 08-Dec-2019 Mateusz Guzik <mjg@FreeBSD.org>

vfs: introduce v_irflag and make v_type smaller

The current vnode layout is not smp-friendly by having frequently read data
avoidably sharing cachelines with very frequently modified fields. In
particular v_iflag inspected for VI_DOOMED can be found in the same line with
v_usecount. Instead make it available in the same cacheline as the v_op, v_data
and v_type which all get read all the time.

v_type is avoidably 4 bytes while the necessary data will easily fit in 1.
Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new
flag field with a new value: VIRF_DOOMED.

Reviewed by: kib, jeff
Differential Revision: https://reviews.freebsd.org/D22715


# b4372164 19-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add support for the ModeSetMasked attribute to the NFSv4.1 server.

I do not know of an extant NFSv4.1 client that currently does a Setattr
operation for the ModeSetMasked, but it has been discussed on the linux-nfs
mailing list.
This patch adds support for doing a Setattr of ModeSetMasked, so that it
will work for any future NFSv4.1 client that chooses to do so.
Tested via a hacked FreeBSD NFSv4.1 client.

MFC after: 2 weeks


# b4645807 19-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Replace "vp" with NULL to make the code more readable.

At the time of this nfsv4_sattr() call, "vp == NULL", so this patch doesn't
change the semantics, but I think it makes the code more readable.
It also makes it consistent with the nfsv4_sattr() call a few lines above
this one. Found during code inspection.

MFC after: 2 weeks


# ed2f1001 13-Apr-2019 Rick Macklem <rmacklem@FreeBSD.org>

Add support for INET6 addresses to the kernel code that dumps open/lock state.

PR#223036 reported that INET6 callback addresses were not printed by
nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure,
so that nfsdumpstate(8) can print them out, post-r346190.
The patch also includes the addition of #ifdef INET, INET6 as requested
by bz@.

PR: 223036
Reviewed by: bz, rgrimes
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D19839


# 01c27978 04-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Don't pass td to nfsvno_open().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 127152fe 04-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Don't pass td to nfsvno_createsub().

MFC after: 2 weeks
Sponsored by: DARPA, AFRL


# 5edc9102 04-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Don't pass td to nfsd_fhtovp(), it's unused.

Reviewed by: rmacklem (earlier version)
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19421


# af444b18 04-Mar-2019 Edward Tomasz Napierala <trasz@FreeBSD.org>

Push down the thread argument in NFS server code, using curthread
instead of passing it explicitly. No functional changes

Reviewed by: rmacklem (earlier version)
MFC after: 2 weeks
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D19419


# 8014c971 29-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Silence newer gcc warnings.

Newer versions of gcc generate "set, but not used" warnings in the NFS server.
Add __unused macros to silence these warnings.

Requested by: mmacy


# a3e709cd 28-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7.

I believe that a ReclaimComplete with rca_one_fs == TRUE is only
to be used after a file system has been transferred to a different
file server. However, RFC5661 is somewhat vague w.r.t. this and
the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE
and one with ReclaimComplete with rca_one_fs == FALSE.
Therefore, just ignore the rca_one_fs == TRUE operation and return
NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP.
This allows the ESXi 6.7 NFSv4.1 client to do a mount.
After discussion on the NFSv4 IETF working group mailing list, doing this
along with setting a flag to note that a ReclaimComplete with rca_one_fs TRUE
was an appropriate way to handle this.
The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was
done may be used to disable replies of NFS4ERR_GRACE for non-reclaim
state operations in a future commit.

This patch along with r332790, r334492 and r336357 allow ESXi 6.7 NFSv4.1 mounts
work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are
violations of RFC-5661 and should not be used.

Reported by: andreas.nagy@frequentis.com
Tested by: andreas.nagy@frequentis.com, daniel@ftml.net (earlier version)
MFC after: 2 weeks
Relnotes: yes


# 5d54f186 16-Jul-2018 Rick Macklem <rmacklem@FreeBSD.org>

Modify the reasons for not issuing a delegation in the NFSv4.1 server.

The ESXi NFSv4.1 client will generate warning messages when the reason for
not issuing a delegation is two. Two refers to a resource limit and I do
not see why it would be considered invalid. However it probably was not the
best choice of reason for not issuing a delegation.
This patch changes the reasons used to ones that the ESXi client doesn't
complain about. This change does not affect the FreeBSD client and does
not appear to affect behaviour of the Linux NFSv4.1 client.
RFC5661 defines these "reasons" but does not give any guidance w.r.t. which
ones are more appropriate to return to a client.

Tested by: andreas.nagy@frequentis.com
PR: 226650
MFC after: 2 weeks


# 90d2dfab 12-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Merge the pNFS server code from projects/pnfs-planb-server into head.

This code merge adds a pNFS service to the NFSv4.1 server. Although it is
a large commit it should not affect behaviour for a non-pNFS NFS server.
Some documentation on how this works can be found at:
http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt
and will hopefully be turned into a proper document soon.
This is a merge of the kernel code. Userland and man page changes will
come soon, once the dust settles on this merge.
It has passed a "make universe", so I hope it will not cause build problems.
It also adds NFSv4.1 server support for the "current stateid".

Here is a brief overview of the pNFS service:
A pNFS service separates the Read/Write oeprations from all the other NFSv4.1
Metadata operations. It is hoped that this separation allows a pNFS service
to be configured that exceeds the limits of a single NFS server for either
storage capacity and/or I/O bandwidth.
It is possible to configure mirroring within the data servers (DSs) so that
the data storage file for an MDS file will be mirrored on two or more of
the DSs.
When this is used, failure of a DS will not stop the pNFS service and a
failed DS can be recovered once repaired while the pNFS service continues
to operate. Although two way mirroring would be the norm, it is possible
to set a mirroring level of up to four or the number of DSs, whichever is
less.
The Metadata server will always be a single point of failure,
just as a single NFS server is.

A Plan B pNFS service consists of a single MetaData Server (MDS) and K
Data Servers (DS), all of which are recent FreeBSD systems.
Clients will mount the MDS as they would a single NFS server.
When files are created, the MDS creates a file tree identical to what a
single NFS server creates, except that all the regular (VREG) files will
be empty. As such, if you look at the exported tree on the MDS directly
on the MDS server (not via an NFS mount), the files will all be of size 0.
Each of these files will also have two extended attributes in the system
attribute name space:
pnfsd.dsfile - This extended attrbute stores the information that
the MDS needs to find the data storage file(s) on DS(s) for this file.
pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime
and Change attributes for the file, so that the MDS doesn't need to
acquire the attributes from the DS for every Getattr operation.
For each regular (VREG) file, the MDS creates a data storage file on one
(or more if mirroring is enabled) of the DSs in one of the "dsNN"
subdirectories. The name of this file is the file handle
of the file on the MDS in hexadecimal so that the name is unique.
The DSs use subdirectories named "ds0" to "dsN" so that no one directory
gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize
on the MDS, with the default being 20.
For production servers that will store a lot of files, this value should
probably be much larger.
It can be increased when the "nfsd" daemon is not running on the MDS,
once the "dsK" directories are created.

For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces
of information to the client that allows it to do I/O directly to the DS.
DeviceInfo - This is relatively static information that defines what a DS
is. The critical bits of information returned by the FreeBSD
server is the IP address of the DS and, for the Flexible
File layout, that NFSv4.1 is to be used and that it is
"tightly coupled".
There is a "deviceid" which identifies the DeviceInfo.
Layout - This is per file and can be recalled by the server when it
is no longer valid. For the FreeBSD server, there is support
for two types of layout, call File and Flexible File layout.
Both allow the client to do I/O on the DS via NFSv4.1 I/O
operations. The Flexible File layout is a more recent variant
that allows specification of mirrors, where the client is
expected to do writes to all mirrors to maintain them in a
consistent state. The Flexible File layout also allows the
client to report I/O errors for a DS back to the MDS.
The Flexible File layout supports two variants referred to as
"tightly coupled" vs "loosely coupled". The FreeBSD server always
uses the "tightly coupled" variant where the client uses the
same credentials to do I/O on the DS as it would on the MDS.
For the "loosely coupled" variant, the layout specifies a
synthetic user/group that the client uses to do I/O on the DS.
The FreeBSD server does not do striping and always returns
layouts for the entire file. The critical information in a layout
is Read vs Read/Writea and DeviceID(s) that identify which
DS(s) the data is stored on.

At this time, the MDS generates File Layout layouts to NFSv4.1 clients
that know how to do pNFS for the non-mirrored DS case unless the sysctl
vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File
layouts are generated.
The mirrored DS configuration always generates Flexible File layouts.
For NFS clients that do not support NFSv4.1 pNFS, all I/O operations
are done against the MDS which acts as a proxy for the appropriate DS(s).
When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy.
If the DS is on the same machine, the MDS/DS will do the RPC on the DS as
a proxy and so on, until the machine runs out of some resource, such as
session slots or mbufs.
As such, DSs must be separate systems from the MDS.

Tested by: james.rose@framestore.com
Relnotes: yes


# 9442a64e 01-Jun-2018 Rick Macklem <rmacklem@FreeBSD.org>

Add the BindConnectiontoSession operation to the NFSv4.1 server.

Under some fairly unusual circumstances, the Linux NFSv4.1 client is
doing a BindConnectiontoSession operation for TCP connections.
It is also used by the ESXi6.5 NFSv4.1 client.
This patch adds this operation to the NFSv4.1 server.

Reported by: andreas.nagy@frequentis.com
Tested by: andreas.nagy@frequentis.com
MFC after: 2 weeks


# 8932a483 13-May-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID.

In the reply to an ExchangeID operation, the NFSv4.1 server returns a
"scope" value (eir_server_scope). If this value is the same, it indicates
that two servers share state, which is never the case for FreeBSD servers.
As such, the value needs to be unique and it was without this patch.
However, I just found out that it is not supposed to change when the
server reboots and without this patch, it did change.
This patch fixes eir_server_scope so that it does not change when the
server is rebooted.
The only affect not having this patch has is that Linux clients don't
reclaim opens and locks after a server reboot, which meant they lost
any byte range locks held before the server rebooted.
It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not
affected by this bug.

MFC after: 1 week


# 5d4835e4 11-May-2018 Rick Macklem <rmacklem@FreeBSD.org>

Add support for the TestStateID operation to the NFSv4.1 server.

The Linux client now uses the TestStateID operation, so this patch adds
support for it to the NFSv4.1 server. The FreeBSD client never uses this
operation, so it should not be affected.

MFC after: 2 months


# 6269d663 19-Apr-2018 Rick Macklem <rmacklem@FreeBSD.org>

Fix OpenDowngrade for NFSv4.1 if a client sets the OPEN_SHARE_ACCESS_WANT* bits.

The NFSv4.1 RFC specifies that the OPEN_SHARE_ACCESS_WANT bits can be set
in the OpenDowngrade share_access argument and are basically ignored.
I do not know of a extant NFSv4.1 client that does this, but this little
patch fixes it just in case.
It also changes the error from NFSERR_BADXDR to NFSERR_INVAL since the NFSv4.1
RFC specifies this as the error to be returned if bogus bits are set.
(The NFSv4.0 RFC didn't specify any error for this, so the error reply can
be changed for NFSv4.0 as well.)
Found by inspection while looking at a problem with OpenDowngrade reported
for the ESXi 6.5 NFSv4.1 client.

Reported by: andreas.nagy@frequentis.com
PR: 227214
MFC after: 1 week


# b97b91b5 25-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

nfs: Remove NFSSOCKADDRALLOC, NFSSOCKADDRFREE macros

They were just thin wrappers over malloc(9) w/ M_ZERO and free(9).

Discussed with: rmacklem, markj
Sponsored by: Dell EMC Isilon


# 222daa42 25-Jan-2018 Conrad Meyer <cem@FreeBSD.org>

style: Remove remaining deprecated MALLOC/FREE macros

Mechanically replace uses of MALLOC/FREE with appropriate invocations of
malloc(9) / free(9) (a series of sed expressions). Something like:

* MALLOC(a, b, ... -> a = malloc(...
* FREE( -> free(
* free((caddr_t) -> free(

No functional change.

For now, punt on modifying contrib ipfilter code, leaving a definition of
the macro in its KMALLOC().

Reported by: jhb
Reviewed by: cy, imp, markj, rmacklem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14035


# b1288166 17-Jan-2018 John Baldwin <jhb@FreeBSD.org>

Use long for the last argument to VOP_PATHCONF rather than a register_t.

pathconf(2) and fpathconf(2) both return a long. The kern_[f]pathconf()
functions now accept a pointer to a long value rather than modifying
td_retval directly. Instead, the system calls explicitly store the
returned long value in td_retval[0].

Requested by: bde
Reviewed by: kib
Sponsored by: Chelsio Communications


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 57ef3db3 15-Oct-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix the client IP address reported by nfsdumpstate for 64bit arch and NFSv4.1.

The client IP address was not being reported for some NFSv4 mounts by
nfsdumpstate. Upon investigation, two problems were found for mounts
using IPv4. One was that the code (originally written and tested on i386)
assumed that a "u_long" was a "uint32_t" and would exactly store an
IPv4 host address. Not correct for 64bit arches.
Also, for NFSv4.1 mounts, the field was not being filled in. This was
basically correct, because NFSv4.1 does not use a callback address.
However, it meant that nfsdumpstate could not report the client IP addr.
This patch should fix both of these issues.
For IPv6, the address will still not be reported. The original NFSv4 RFC
only specified IPv4 callback addresses. I think this has changed and, if so,
a future commit to fix reporting of IPv6 addresses will be needed.

Reported by: manu
PR: 223036
MFC after: 2 weeks


# ce8d06fe 24-Sep-2017 Rick Macklem <rmacklem@FreeBSD.org>

Change a panic to an error return.

There was a panic() in the NFS server's write operation that didn't
need to be a panic() and could just be an error return.
This patch makes that change.
Found by code inspection during development of the pNFS service.

MFC after: 2 weeks


# fbbd9655 28-Feb-2017 Warner Losh <imp@FreeBSD.org>

Renumber copyright clause 4

Renumber cluase 4 to 3, per what everybody else did when BSD granted
them permission to remove clause 3. My insistance on keeping the same
numbering for legal reasons is too pedantic, so give up on that point.

Submitted by: Jan Schaumann <jschauma@stevens.edu>
Pull Request: https://github.com/freebsd/freebsd/pull/96


# 2f304845 05-Jan-2017 Konstantin Belousov <kib@FreeBSD.org>

Do not allocate struct statfs on kernel stack.

Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.

Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week


# 5ecc225f 11-May-2016 Conrad Meyer <cem@FreeBSD.org>

nfsd: Fix use-after-free in NFS4 lock test service

Trivial use-after-free where stp was freed too soon in the non-error path.
To fix, simply move its release to the end of the routine.

Reported by: Coverity
CID: 1006105
Sponsored by: EMC / Isilon Storage Division


# 74b8d63d 10-Apr-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

Cleanup unnecessary semicolons from the kernel.

Found with devel/coccinelle.


# 1f54e596 27-May-2015 Rick Macklem <rmacklem@FreeBSD.org>

Make the size of the hash tables used by the NFSv4 server tunable.
No appreciable change in performance was observed after increasing
the sizes of these tables and then testing with a single client.
However, there was an email that indicated high CPU overheads for
a heavily loaded NFSv4 and it is hoped that increasing the sizes
of the hash tables via these tunables might help.
The tables remain the same size by default.

Differential Revision: https://reviews.freebsd.org/D2596
MFC after: 2 weeks


# 66e80f77 16-Apr-2015 Rick Macklem <rmacklem@FreeBSD.org>

mav@ has found that NFS servers exporting ZFS file systems
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.

Tested by: mav
Reviewed by: mav
MFC after: 2 weeks


# 6c21f6ed 18-Dec-2014 Konstantin Belousov <kib@FreeBSD.org>

The VOP_LOOKUP() implementations for CREATE op do not put the name
into namecache, to avoid cache trashing when doing large operations.
E.g., tar archive extraction is not usually followed by access to many
of the files created.

Right now, each VOP_LOOKUP() implementation explicitely knowns about
this quirk and tests for both MAKEENTRY flag presence and op != CREATE
to make the call to cache_enter(). Centralize the handling of the
quirk into VFS, by deciding to cache only by MAKEENTRY flag in VOP.
VFS now sets NOCACHE flag for CREATE namei() calls.

Note that the change in semantic is backward-compatible and could be
merged to the stable branch, and is compatible with non-changed
third-party filesystems which correctly handle MAKEENTRY.

Suggested by: Chris Torek <torek@pi-coral.com>
Reviewed by: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks


# d8a5961f 02-Oct-2014 Marcelo Araujo <araujo@FreeBSD.org>

Fix failures and warnings reported by newpynfs20090424 test tool.
This fix addresses only issues with the pynfs reports, none of these
issues are know to create problems for extant real clients.

Submitted by: Bart Hsiao <bart.hsiao@gmail.com>
Reworked by: myself
Reviewed by: rmacklem
Approved by: rmacklem
Sponsored by: QNAP Systems Inc.


# c59e4cc3 01-Jul-2014 Rick Macklem <rmacklem@FreeBSD.org>

Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.

MFC after: 1 month


# ca4defd5 06-Jun-2014 Rick Macklem <rmacklem@FreeBSD.org>

The new NFS server would not allow a hard link to be
created to a symlink. This restriction (which was
inherited from OpenBSD) is not required by the NFS RFCs.
Since this is allowed by the old NFS server, it is a
POLA violation to not allow it. This patch modifies the
new NFS server to allow this.

Reported by: jhb
Reviewed by: jhb
MFC after: 3 days


# 25bfde79 08-Apr-2014 Xin LI <delphij@FreeBSD.org>

Fix NFS deadlock vulnerability. [SA-14:05]

Fix "Heartbleed" vulnerability and ECDSA Cache Side-channel
Attack in OpenSSL. [SA-14:06]


# e4558aac 18-Jan-2013 Xin LI <delphij@FreeBSD.org>

Make it possible to force async at server side on new NFS server, similar
to the old one's nfs.nfsrv.async.

Please note that by enabling this option (default is disabled), the system
could potentionally have silent data corruption if the server crashes
before write is committed to non-volatile storage, as the client side have
no way to tell if the data is already written.

Submitted by: rmacklem
MFC after: 2 weeks


# de67b496 20-Aug-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the NFSv4 server so that it returns NFSERR_SYMLINK when
an attempt to do an Open operation on any type of file other
than VREG is done. A recent discussion on the IETF working group's
mailing list (nfsv4@ietf.org) decided that NFSERR_SYMLINK
should be returned for all non-regular files and not just symlinks,
so that the Linux client would work correctly.
This change does not affect the FreeBSD NFSv4 client and is not
believed to have a negative effect on other NFSv4 clients.

Reviewed by: zkirsch
Approved by: re (kib)
MFC after: 2 weeks


# 06521fbb 03-Aug-2011 Zack Kirsch <zack@FreeBSD.org>

Fix an NFS server issue where it was not correctly setting the eof flag when a
READ had hit the end of the file. Also, clean up some cruft in the code.

Approved by: re (kib)
Reviewed by: rmacklem
MFC after: 2 weeks


# 6b3dfc6a 31-Jul-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix rename in the new NFS server so that it does not require a
recursive vnode lock on the directory for the case where the
new file name is in the same directory as the old one. The patch
handles this as a special case, recognized by the new directory
having the same file handle as the old one and just VREF()s the old
dir vnode for this case, instead of doing a second VFS_FHTOVP() to get it.
This is required so that the server will work for file systems like
msdosfs, that do not support recursive vnode locking.
This problem was discovered during recent testing by pho@
when exporting an msdosfs file system via the new NFS server.

Tested by: pho
Reviewed by: zkirsch
Approved by: re (kib)
MFC after: 2 weeks


# a9285ae5 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Add DEXITCODE plumbing to NFS.

Isilon has the concept of an in-memory exit-code ring that saves the last exit
code of a function and allows for stack tracing. This is very helpful when
debugging tough issues.

This patch is essentially a no-op for BSD at this point, until we upstream
the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS
function that returns an errno error code. A number of code paths were also
reorganized to have single exit paths, to reduce code duplication.

Submitted by: David Kwan <dkwan@isilon.com>
Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# a9989634 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# 98f234f3 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# c383087c 16-Jul-2011 Zack Kirsch <zack@FreeBSD.org>

Remove unnecessary thread pointer from VOPLOCK macros and current users.

Reviewed by: rmacklem
Approved by: zml (mentor)
MFC after: 2 weeks


# 53f476ca 21-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFSv4 server so that it checks for VREAD_ACL when
a client does a Getattr for an ACL and not VREAD_ATTRIBUTES.
This was found during the recent NFSv4 interoperability Bakeathon.

MFC after: 2 weeks


# 37b88c2d 20-Jun-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the new NFSv4 server so that it only allows Lookup of
directories and symbolic links when traversing non-exported
file systems. Found during the recent NFSv4 interoperability
Bakeathon.

MFC after: 2 weeks


# a09001a8 14-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental NFSv4 server so that it uses VOP_PATHCONF()
to determine if a file system supports NFSv4 ACLs. Since
VOP_PATHCONF() must be called with a locked vnode, the function
is called before nfsvno_fillattr() and the result is passed in
as an extra argument.

MFC after: 2 weeks


# 07c0c166 14-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Modify the experimental NFSv4 server so that it handles
crossing of server mount points properly. The functions
nfsvno_fillattr() and nfsv4_fillattr() were modified to
take the extra arguments that are the mount point, a flag
to indicate that it is a file system root and the mounted
on fileno. The mount point argument needs to be busy when
nfsvno_fillattr() is called, since the vp argument is not
locked.

Reviewed by: kib
MFC after: 2 weeks


# 8974bc2f 06-Jan-2011 Rick Macklem <rmacklem@FreeBSD.org>

Since the VFS_LOCK_GIANT() code in the experimental NFS
server is broken and the major file systems are now all
mpsafe, modify the server so that it will only export
mpsafe file systems. This was discussed on freebsd-fs@
and removes a fair bit of crufty code.

MFC after: 12 days


# 81f78d99 02-Jan-2011 Rick Macklem <rmacklem@FreeBSD.org>

Modify the experimental NFSv4 server so that the lookup
ops return a locked vnode. This ensures that the associated mount
point will always be valid for the code that follows the operation.
Also add a couple of additional checks
for non-error to the other functions that create file objects.

MFC after: 2 weeks


# c9aad40f 02-Jan-2011 Rick Macklem <rmacklem@FreeBSD.org>

Delete some cruft from the experimental NFS server that was
only used by the OpenBSD port for its pseudo-fs.

MFC after: 2 weeks


# 629fa50e 02-Jan-2011 Rick Macklem <rmacklem@FreeBSD.org>

Add checks for VI_DOOMED and vn_lock() failures to the
experimental NFS server, to handle the case where an
exported file system is forced dismounted while an RPC
is in progress. Further commits will fix the cases where
a mount point is used when the associated vnode isn't locked.

Reviewed by: kib
MFC after: 2 weeks


# 17891d00 25-Dec-2010 Rick Macklem <rmacklem@FreeBSD.org>

Modify the experimental NFS server so that it uses LK_SHARED
for RPC operations when it can. Since VFS_FHTOVP() currently
always gets an exclusively locked vnode and is usually called
at the beginning of each RPC, the RPCs for a given vnode will
still be serialized. As such, passing a lock type argument to
VFS_FHTOVP() would be preferable to doing the vn_lock() with
LK_DOWNGRADE after the VFS_FHTOVP() call.

Reviewed by: kib
MFC after: 2 weeks


# 0cf42b62 24-Dec-2010 Rick Macklem <rmacklem@FreeBSD.org>

Add an argument to nfsvno_getattr() in the experimental
NFS server, so that it can avoid calling VOP_ISLOCKED()
when the vnode is known to be locked. This will allow
LK_SHARED to be used for these cases, which happen to
be all the cases that can use LK_SHARED. This does not
fix any bug, but it reduces the number of calls to
VOP_ISLOCKED() and prepares the code so that it can be
switched to using LK_SHARED in a future patch.

Reviewed by: kib
MFC after: 2 weeks


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# 0c58adb2 19-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r206236
Harden the experimental NFS server a little, by adding range
checks on the length of the client's open/lock owner name. Also,
add free()'s for one case where they were missing and would
have caused a leak if NFSERR_BADXDR had been replied. Probably
never happens, but the leak is now plugged, just in case.


# 2d2fef10 15-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r206061
Add SAVENAME to the cn_flags for all cases in the experimental
NFS server for the CREATE cn_nameiop where SAVESTART isn't set.
I was not aware that this needed to be done by the caller until
recently.


# 2a45247c 05-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

Harden the experimental NFS server a little, by adding range
checks on the length of the client's open/lock owner name. Also,
add free()'s for one case where they were missing and would
have caused a leak if NFSERR_BADXDR had been replied. Probably
never happens, but the leak is now plugged, just in case.

MFC after: 2 weeks


# f61786cb 01-Apr-2010 Rick Macklem <rmacklem@FreeBSD.org>

Add SAVENAME to the cn_flags for all cases in the experimental
NFS server for the CREATE cn_nameiop where SAVESTART isn't set.
I was not aware that this needed to be done by the caller until
recently.

Tested by: lampa AT fit.vutbr.cz (link case)
Submitted by: lampa AT fit.vutbr.cz (link case)
MFC after: 2 weeks


# d3db09cb 08-Jan-2010 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r200999
Modify the experimental server so that it uses VOP_ACCESSX().
This is necessary in order to enable NFSv4 ACL support. The
argument to nfsvno_accchk() was changed to an accmode_t and
the function nfsrv_aclaccess() was no longer needed and,
therefore, deleted.

Reviewed by: trasz


# 8da45f2c 25-Dec-2009 Rick Macklem <rmacklem@FreeBSD.org>

Modify the experimental server so that it uses VOP_ACCESSX().
This is necessary in order to enable NFSv4 ACL support. The
argument to nfsvno_accchk() was changed to an accmode_t and
the function nfsrv_aclaccess() was no longer needed and,
therefore, deleted.

Reviewed by: trasz
MFC after: 2 weeks


# 52b239b0 08-Dec-2009 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r199616
Patch the experimental NFS server is a manner analagous to
r197525, so that the creation verifier is handled correctly
in va_atime for 64bit architectures. There were two problems.
One was that the code incorrectly assumed that
sizeof (struct timespec) == 8 and the other was that the tv_sec
field needs to be assigned from a signed 32bit integer, so that
sign extension occurs on 64bit architectures. This is required
for correct operation when exporting ZFS volumes.

Tested by: gerrit at pmp.uni-hannover.de
Reviewed by: pjd


# 086f6e0c 20-Nov-2009 Rick Macklem <rmacklem@FreeBSD.org>

Patch the experimental NFS server is a manner analagous to
r197525, so that the creation verifier is handled correctly
in va_atime for 64bit architectures. There were two problems.
One was that the code incorrectly assumed that
sizeof (struct timespec) == 8 and the other was that the tv_sec
field needs to be assigned from a signed 32bit integer, so that
sign extension occurs on 64bit architectures. This is required
for correct operation when exporting ZFS volumes.

Reviewed by: pjd
MFC after: 2 weeks


# c3e22f83 26-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix the experimental nfs subsystem so that it builds with the
current NFSv4 ACLs, as defined in sys/acl.h. It still needs a
way to test a mount point for NFSv4 ACL support before it will
work. Until then, the NFSHASNFS4ACL() macro just always returns 0.

Approved by: kib (mentor)


# b1cfc0d9 24-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add NFSv4 root export checks to the DelegPurge, Renew and
ReleaseLockOwner operations analagous to what is already
in place for SetClientID and SetClientIDConfirm. These are
the five NFSv4 operations that do not use file handle(s),
so the checks are done using the NFSv4 root export entries
in /etc/exports.

Approved by: kib (mentor)


# 98ad4453 14-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Apply changes to the experimental nfs server so that it uses the security
flavors as exported in FreeBSD-CURRENT. This allows it to use a
slightly modified mountd.c instead of a different utility.

Approved by: kib (mentor)


# dfd233ed 11-May-2009 Attilio Rao <attilio@FreeBSD.org>

Remove the thread argument from the FSD (File-System Dependent) parts of
the VFS. Now all the VFS_* functions and relating parts don't want the
context as long as it always refers to curthread.

In some points, in particular when dealing with VOPs and functions living
in the same namespace (eg. vflush) which still need to be converted,
pass curthread explicitly in order to retain the old behaviour.
Such loose ends will be fixed ASAP.

While here fix a bug: now, UFS_EXTATTR can be compiled alone without the
UFS_EXTATTR_AUTOSTART option.

VFS KPI is heavilly changed by this commit so thirdy parts modules needs
to be recompiled. Bump __FreeBSD_version in order to signal such
situation.


# 9ec7b004 04-May-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add the experimental nfs subtree to the kernel, that includes
support for NFSv4 as well as NFSv2 and 3.
It lives in 3 subdirs under sys/fs:
nfs - functions that are common to the client and server
nfsclient - a mutation of sys/nfsclient that call generic functions
to do RPCs and handle state. As such, it retains the
buffer cache handling characteristics and vnode semantics that
are found in sys/nfsclient, for the most part.
nfsserver - the server. It includes a DRC designed specifically for
NFSv4, that is used instead of the generic DRC in sys/rpc.
The build glue will be checked in later, so at this point, it
consists of 3 new subdirs that should not affect kernel building.

Approved by: kib (mentor)