Cross Reference: /freebsd-current/sys/fs/nfsserver/nfs

History log of /freebsd-current/sys/fs/nfsserver/nfs_nfsdstate.c
Revision	Date	Author	Comments
# e2c9fad2	04-Jun-2024	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix delegation handled for atomic upgrade For NFSv4.1/4.2, an atomic upgrade of a delegation from a read delegation to a write delegation is allowed and can result in signoficantly improved performance. This patch adds support for this atomic upgrade, plus fixes a couple of other delegation related bugs. Since there were three cases where delegations were being issued, the patch factors this out into a separate function called nfsrv_issuedelegations(). This patch should only affect the NFSv4.1/4.2 behaviour when delegations are enabled, which is not the default. MFC after: 1 month
# 54c3aa02	25-Apr-2024	Rick Macklem <rmacklem@FreeBSD.org>	Revert "nfsd: Fix NFSv4.1/4.2 Claim_Deleg_Cur_FH" This reverts commit f300335d9aebf2e99862bf783978bd44ede23550. It turns out that the old code was correct and it was wireshark that was broken and indicated that the RPC's XDR was bogus. Found during IETF bakeathon testing this week.
# f300335d	19-Oct-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix NFSv4.1/4.2 Claim_Deleg_Cur_FH When I implemented a test patch using Open Claim_Deleg_Cur_FH I discovered that the NFSv4.1/4.2 server was broken for this Open option. Fortunately it is never used by the FreeBSD client and never used by other clients unless delegations are enabled. (The FreeBSD NFSv4 server does not have delegations enabled by default.) Claim_Deleg_Cur_FH was broken because the code mistakenly assumed a stateID argument, which is not the case. This patch fixes the bug by changing the XDR parser to not expect a stateID and to fill most of the stateID in from the clientID. The clientID is the first two elements of the "other" array for the stateID and is sufficient to identify which client the delegation is issued to. Since there is only one delegation issued to a client per file, this is sufficient to locate the correct delegation. If you are running non-FreeBSD NFSv4.1/4.2 mounts against the FreeBSD server, you need this patch if you have delegations enabled. PR: 274574 MFC after: 2 weeks
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 11892bc7	02-Aug-2023	Gordon Bergling <gbe@FreeBSD.org>	nfsserver: Fix a typo in a source code comment - s/restared/restarted/ MFC after: 3 days
# 4d846d26	10-May-2023	Warner Losh <imp@FreeBSD.org>	spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause. Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
# ff2f1f69	07-Apr-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Add support for the SP4_MACH_CRED case in ExchangeID Commit f4179ad46fa4 added support for operation bitmaps for NFSv4.1/4.2. This commit uses those to implement the SP4_MACH_CRED case for the NFSv4.1/4.2 ExchangeID operation since the Linux NFSv4.1/4.2 client is now using this for Kerberized mounts. The Linux Kerberized NFSv4.1/4.2 mounts currently work without support for this because Linux will fall back to SP4_NONE, but there is no guarantee this fallback will work forever. This commit only affects Kerberized NFSv4.1/4.2 mounts from Linux at this time. MFC after: 3 months
# 695d87ba	28-Mar-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Make coverity happy Coverity does not like code that checks a function's return value sometimes. Add "(void)" in front of the function when the return value does not matter to try and make it happy. A recent commit deleted "(void)"s in front of nfsm_fhtom(). This commit puts them back in. Reported by: emaste MFC after: 3 months
# 896516e5	16-Mar-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts Without this patch, a Kerberized NFSv4.1/4.2 mount must provide a Kerberos credential for the client at mount time. This credential is typically referred to as a "machine credential". It can be created one of two ways: - The user (usually root) has a valid TGT at the time the mount is done and this becomes the machine credential. There are two problems with this. 1 - The user doing the mount must have a valid TGT for a user principal at mount time. As such, the mount cannot be put in fstab(5) or similar. 2 - When the TGT expires, the mount breaks. - The client machine has a service principal in its default keytab file and this service principal (typically called a host-based initiator credential) is used as the machine credential. There are problems with this approach as well: 1 - There is a certain amount of administrative overhead creating the service principal for the NFS client, creating a keytab entry for this principal and then copying the keytab entry into the client's default keytab file via some secure means. 2 - The NFS client must have a fixed, well known, DNS name, since that FQDN is in the service principal name as the instance. This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which allows the state maintenance operations to be performed by any authentication mechanism, to do these operations via AUTH_SYS instead of RPCSEC_GSS (Kerberos). As such, neither of the above mechanisms is needed. It is hoped that this option will encourage adoption of Kerberized NFS mounts using TLS, to provide a more secure NFS mount. This new NFSv4.1/4.2 mount option, called "syskrb5" must be used with "sec=krb5[ip]" to avoid the need for either of the above Kerberos setups to be done by the client. Note that all file access/modification operations still require users on the NFS client to have a valid TGT recognized by the NFSv4.1/4.2 server. As such, this option allows, at most, a malicious client to do some sort of DOS attack. Although not required, use of "tls" with this new option is encouraged, since it provides on-the-wire encryption plus, optionally, client identity verification via a X.509 certificate provided to the server during TLS handshake. Alternately, "sec=krb5p" does provide on-the-wire encryption of file data. A mount_nfs(8) man page update will be done in a separate commit. Discussed on: freebsd-current@ MFC after: 3 months
# b039ca07	15-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Wrap nfsstatsv1_p in the NFSD_VNET() macro Commit 7344856e3a6d added a lot of macros that will front end vnet macros so that nfsd(8) can run in vnet prison. The nfsstatsv1_p variable got missed. This patch wraps all uses of nfsstatsv1_p with the NFSD_VNET() macro. The NFSD_VNET() macro is still a null macro. MFC after: 3 months
# 7e44856e	11-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Prepare the NFS server code to run in a vnet prison This patch defines null macros that can be used to apply the vnet macros for global variables and SYSCTL flags. It also applies these macros to many of the global variables and some of the SYSCTLs. Since the macros do nothing, these changes should not result in semantics changes, although the changes are large in number. The patch does change several global variables that were arrays or structures to pointers to same. For these variables, modified initialization and cleanup code malloc's and free's the arrays/structures. This was done so that the vnet footprint would be about 300bytes when the macros are defined as vnet macros, allowing nfsd.ko to load dynamically. I believe the comments in D37519 have been addressed, although it has never been reviewed, due in part to the large size of the patch. This is the first of a series of patches that will put D37519 in main. Once everything is in main, the macros will be defined as front end macros to the vnet ones. MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D37519
# aa41036e	22-Jun-2022	Elliott Mitchell <ehem+freebsd@m5p.com>	nfsserver: purge EOL release compatibility Remove now-obsolete FreeBSD 4.x and earlier ifdef. Reviewed by: imp Pull Request: https://github.com/freebsd/freebsd-src/pull/603 Differential Revision: https://reviews.freebsd.org/D35560
# 5a0050e6	15-Jan-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsserver: Fix handling of SP4_NONE For NFSv4.1/4.2, when the client specifies SP4_NONE for state protection in the ExchangeID operation arguments, the server MUST allow the state management operations for any user credentials. (I misread the RFC and thought that SP4_NONE meant "at the server's discression" and not MUST be allowed.) This means that the "sec=XXX" field of the "V4:" exports(5) line only applies to NFSv4.0. This patch fixes the server to always allow state management operations for SP4_NONE, which is the only state management option currently supported. (I have patches that add support for SP4_MACH_CRED to the server. These will be in a future commit.) In practice, this bug does not seem to have caused interoperability problems. MFC after: 2 weeks
# a75d1ddd	17-Sep-2022	Mateusz Guzik <mjg@FreeBSD.org>	vfs: introduce V_PCATCH to stop abusing PCATCH
# b875d4f5	27-Aug-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Update console message for no session found The NFSv4.1/4.2 server generates a console message that indicates that there is no session. I was until recently perplexed w.r.t. how this could occur. It turns out that the common cause is multiple NFS clients with the same /etc/hostid. The host uuid is used by the FreeBSD NFSv4.1/4.2 client as a unique identifier for the client. If multiple clients use the same host uuid, this indicates to the NFSv4.1/4.2 server that they are the same client and confusion occurs. This trivial patch modifies the console message to suggest that the client's /etc/hostid needs to be checked for uniqueness. Reviewed by: asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D36377
# 088ba435	13-Jul-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix CreateSession for an established ClientID I mis-read the RFC w.r.t. handling of the sequenceid when a CreateSession is done after the initial one that confirms the ClientID. Fortunately this does not affect most extant NFSv4.1/4.2 clients, since they only acquire a single session for TCP for a ClientID (Solaris might be an exception?). This patch fixes the server to handle this case, where the RFC requires the sequenceid be incremented for each CreateSession and is required to reply to a retried CreateSession with a cached reply. It adds a field to nfsclient called lc_prevsess, which caches the sessionid, which is the only field in a CreateSession reply that will change for a retry, to implement this reply cache. The recent commits up to d4a11b3e3bdd that mark session slots bad when "intr" and/or "soft" mounts are used by the client needs this server patch. Without this patch, the client will do a full recovery, including a new ClientID, losing all byte range locks. However, prior to the recent client commits, the client would hang when all session slots were bad, so even without this patch it is not a regression. PR: 260011 MFC after: 2 weeks
# 40ada74e	09-Jul-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add optional support for slots marked bad This patch adds support for session slots marked bad to nfsv4_sequencelookup(). An additional boolean argument indicates if the check for slots marked bad should be done. The "cred" argument added to nfscl_reqstart() by commit 326bcf9394c7 is now passed into nfsv4_setquence() so that it can optionally set the boolean argument for nfsv4_sequencelookup(). When optionally enabled, nfsv4_setsequence() will do a DestroySession when all slots are marked bad. Since the code that marks slots bad is not yet committed, this patch should not result in a semantics change. PR: 260011 MFC after: 2 weeks
# 5d3fe02c	22-Jun-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Clean up the code by not using the vnode_vtype() macro The vnode_vtype() macro was used to make the code compatible with Mac OSX, for the Mac OSX port. For FreeBSD, this macro just obscured the code, so avoid using it to clean up the code. This commit should not result in a semantics change.
# 271f6d52	02-May-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix session slot freeing for NFSv4.1/4.2 Without this patch the NFSv4.1/4.2 server erroneously always frees session slot zero for callbacks. This only affects 4.1/4.2 mounts if the server has delegations enabled or is a pNFS configuration. Even for those cases, the effect is mainly to only use slot 0 for callbacks, serializing all of them. There is a slight chance that callbacks will fail if the client performs them in a different order than received on the TCP connection. If this bug affects your server, you will see console messages like: newnfs_request: Bad session slot This patch fixes the problem. Found during a recent IETF NFSv4 testing event. PR: 263728 MFC after: 2 weeks
# 17a56f3f	09-Feb-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Reply NFSERR_SEQMISORDERED for bogus seqid argument The ESXi NFSv4.1 client bogusly sends the wrong value for the csa_sequence argument for a Create_session operation. RFC8881 requires this value to be the same as the sequence reply from the ExchangeID operation most recently done for the client ID. Without this patch, the server replies NFSERR_STALECLIENTID, which is the correct response for an NFSv4.0 SetClientIDConfirm but is not the correct error for NFSv4.1/4.2, which is specified as NFSERR_SEQMISORDERED in RFC8881. This patch fixes this. This change does not fix the issue reported in the PR, where the ESXi client loops, attempting ExchangeID/Create_session repeatedly. Reported by: asomers Tested by: asomers PR: 261291 MFC after: 1 week
# c302f889	13-Dec-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Limit parsing of layout errors to maxcnt bytes This patch decrements maxcnt by the appropriate number of bytes during parsing and checks to see if there is data remaining. If not, it just returns from nfsrv_flexlayouterr() without further processing. This prevents the tl pointer from running off the end of the error data pointed at by layp, if there are flaws in the data. Reported by: rtm@lcs.mit.edu Tested by: rtm@lcs.mit.edu PR: 260293 MFC after: 2 weeks
# bdd57cbb	26-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Add checks for layout errors in LayoutReturn For a LayoutReturn when using the Flexible File Layout, error reports may be provided in the request. Sanity check the size of these error reports and check that they exist before calling nfsrv_flexlayouterr(). Reported by: rtm@lcs.mit.edu Tested by: rtm@lcs.mit.edu PR: 260012 MFC after: 2 weeks
# 7e1d3eef	25-Nov-2021	Mateusz Guzik <mjg@FreeBSD.org>	vfs: remove the unused thread argument from NDINIT* See b4a58fbf640409a1 ("vfs: remove cn_thread") Bump __FreeBSD_version to 1400043.
# ce9676de	12-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	pNFS: Add nfsstats counters for number of Layouts For pNFS, Layouts are issued by the server to indicate where a file's data resides on the DS(s). This patch adds counters for how many layouts are allocated to the nfsstatsv1 structure, using two reserved fields. MFC after: 2 weeks
# f8dc0630	08-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix the NFSv4.2 pNFS MDS server for NFSERR_NOSPC via LayoutError If a pNFS server's DS runs out of disk space, it replies NFSERR_NOSPC to the client doing writing. For the Linux client, it then sends a LayoutError RPC to the MDS server to tell it about the error and keeps retrying, doing repeated LayoutGets to the MDS and Write RPCs to the DS. The Linux client is "stuck" until disk space on the DS is free'd up unless a subsequent LayoutGet request is sent a NFSERR_NOSPC reply. The looping problem still occurs for NFSv4.1 mounts, but no fix for this is known at this time. This patch changes the pNFS MDS server to reply to LayoutGet operations with NFSERR_NOSPC once a LayoutError reports the problem, until the DS has available space. This keeps the Linux NFSv4.2 from looping. Found during recent testing because of issues w.r.t. a DS being out of space found during a recent IEFT NFSv4 working group testing event. MFC after: 2 weeks
# a7e014ee	07-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix the NFSv4 pNFS MDS server for DS NFSERR_NOSPC If a pNFS server's DS runs out of disk space, it replies NFSERR_NOSPC to the client doing writing. For the Linux client, it then sends a LayoutError RPC to the server to tell it about the error and keeps retrying, doing repeated LayoutGet and Write RPCs to the DS. The Linux client is "stuck" until disk space on the DS is free'd up. For a mirrored server configuration, the first mirror that ran out of space was taken offline. This does not make much sense, since the other mirror(s) will run out of space soon and the fix is a manual cleanup up disk space. This patch changes the pNFS server to not disable a mirror for the mirrored case when this occurs. Further work is needed, since the Linux client expects the MDS to reply NFSERR_NOSPC to LayoutGets once the DS is out of space. Without this further change, the above mentioned looping occurs. Found during a recent IEFT NFSv4 working group testing event. MFC after: 2 weeks
# ee29e6f3	16-Jul-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Add sysctl to set maximum I/O size up to 1Mbyte Since MAXPHYS now allows the FreeBSD NFS client to do 1Mbyte I/O operations, add a sysctl called vfs.nfsd.srvmaxio so that the maximum NFS server I/O size can be set up to 1Mbyte. The Linux NFS client can also do 1Mbyte I/O operations. The default of 128Kbytes for the maximum I/O size has not been changed for two reasons: - kern.ipc.maxsockbuf must be increased to support 1Mbyte I/O - The limited benchmarking I can do actually shows a drop in I/O rate when the I/O size is above 256Kbytes. However, daveb@spectralogic.com reports seeing an increase in I/O rate for the 1Mbyte I/O size vs 128Kbytes using a Linux client. Reviewed by: asomers MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30826
# 1e0a518d	08-Jul-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add a Linux compatible "nconnect" mount option Linux has had an "nconnect" NFS mount option for some time. It specifies that N (up to 16) TCP connections are to created for a mount, instead of just one TCP connection. A discussion on freebsd-net@ indicated that this could improve client<-->server network bandwidth, if either the client or server have one of the following: - multiple network ports aggregated to-gether with lagg/lacp. - a fast NIC that is using multiple queues It does result in using more IP port#s and might increase server peak load for a client. One difference from the Linux implementation is that this implementation uses the first TCP connection for all RPCs composed of small messages and uses the additional TCP connections for RPCs that normally have large messages (Read/Readdir/Write). The Linux implementation spreads all RPCs across all TCP connections in a round robin fashion, whereas this implementation spreads Read/Readdir/Write across the additional TCP connections in a round robin fashion. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30970
# e1a907a2	11-Jun-2021	Rick Macklem <rmacklem@FreeBSD.org>	krpc: Acquire ref count of CLIENT for backchannel use Michael Dexter <editor@callfortesting.org> reported a crash in FreeNAS, where the first argument to clnt_bck_svccall() was no longer valid. This argument is a pointer to the callback CLIENT structure, which is free'd when the associated NFSv4 ClientID is free'd. This appears to have occurred because a callback reply was still in the socket receive queue when the CLIENT structure was free'd. This patch acquires a reference count on the CLIENT that is not CLNT_RELEASE()'d until the socket structure is destroyed. This should guarantee that the CLIENT structure is still valid when clnt_bck_svccall() is called. It also adds a check for closed or closing to clnt_bck_svccall() so that it will not process the callback RPC reply message after the ClientID is free'd. Comments by: mav MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30153
# 46269d66	16-May-2021	Rick Macklem <rmacklem@FreeBSD.org>	NFSv4 server: Re-establish the delegation recall timeout Commit 7a606f280a3e allowed the server to do retries of CB_RECALL callbacks every couple of seconds. This was needed to allow the Linux client to re-establish the back channel. However this patch broke the delegation timeout check, such that it would just keep retrying CB_RECALLS. If the client has crashed or been network patitioned from the server, this continues until the client TCP reconnects to the server and re-establishes the back channel. This patch modifies the code such that it still times out the delegation recall after some minutes, so that the server will allow the conflicting client request once the delegation times out. This patch only affects the NFSv4 server when delegations are enabled and a NFSv4 client that holds a delegation has crashed or been network partitioned from the server for at least several minutes when a delegation needs to be recalled. MFC after: 2 weeks
# 87597731	26-Apr-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: fix the slot sequence# when a callback fails Commit 4281bfec3628 patched the server so that the callback session slot would be free'd for reuse when a callback attempt fails. However, this can often result in the sequence# for the session slot to be advanced such that the client end will reply NFSERR_SEQMISORDERED. To avoid the NFSERR_SEQMISORDERED client reply, this patch negates the sequence# advance for the case where the callback has failed. The common case is a failed back channel, where the callback cannot be sent to the client, and not advancing the sequence# is correct for this case. For the uncommon case where the client's reply to the callback is lost, not advancing the sequence# will indicate to the client that the next callback is a retry and not a new callback. But, since the FreeBSD server always sets "csa_cachethis" false in the callback sequence operation, a retry and a new callback should be handled the same way by the client, so this should not matter. Until you have this patch in your NFSv4.1/4.2 server, you should consider avoiding the use of delegations. Even with this patch, interoperation with the Linux NFSv4.1/4.2 client in kernel versions prior to 5.3 can result in frequent 15second delays if delegations are enabled. This occurs because, for kernels prior to 5.3, the Linux client does a TCP reconnect every time it sees multiple concurrent callbacks and then it takes 15seconds to recover the back channel after doing so. MFC after: 2 weeks
# 4281bfec	23-Apr-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: fix session slot handling for failed callbacks When the NFSv4.1/4.2 server does a callback to a client on the back channel, it will use a session slot in the back channel session. If the back channel has failed, the callback will fail and, without this patch, the session slot will not be released. As more callbacks are attempted, all session slots can become busy and then the nfsd thread gets stuck waiting for a back channel session slot. This patch frees the session slot upon callback failure to avoid this problem. Without this patch, the problem can be avoided by leaving delegations disabled in the NFS server. MFC after: 2 weeks
# 5a89498d	19-Apr-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: fix stripe size reply for the File Layout pNFS server At a recent testing event I found out that I had misinterpreted RFC5661 where it describes the stripe size in the File Layout's nfl_util field. This patch fixes the pNFS File Layout server so that it returns the correct value to the NFSv4.1/4.2 pNFS enabled client. This affects almost no one, since pNFS server configurations are rare and the extant pNFS aware NFS clients seemed to function correctly despite the erroneous stripe size. It might be needed for correct behaviour if a recent Linux client mounts a FreeBSD pNFS server configuration that is using File Layout (non-mirrored configuration). MFC after: 2 weeks
# 7a606f28	04-Apr-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: make the server repeat CB_RECALL every couple of seconds Commit 01ae8969a9ee stopped the NFSv4.1/4.2 server from implicitly binding the back channel to a new TCP connection so that it conforms to RFC5661, for NFSv4.1/4.2. An effect of this for the Linux NFS client is that it will do a BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN set in a sequence reply. This will fix the back channel, but the first attempt at a callback like CB_RECALL will already have failed. Without this patch, a CB_RECALL will not be retried and that can result in a 5 minute delay until the delegation times out. This patch modifies the code so that it will retry the CB_RECALL every couple of seconds, often avoiding the 5 minute delay. This is not critical for correct behaviour, but avoids the 5 minute delay for the case where the Linux client re-binds the back channel via BindConnectionToSession. MFC after: 2 weeks
# 6f2addd8	04-Apr-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: fix BindConnectionToSession so that it clears "cb path down" Commit 01ae8969a9ee stopped the NFSv4.1/4.2 server from implicitly binding the back channel to a new TCP connection so that it conforms to RFC5661, for NFSv4.1/4.2. An effect of this for the Linux NFS client is that it will do a BindConnectionToSession when it sees NFSV4SEQ_CBPATHDOWN set in a sequence reply. It will do this for every RPC reply until it no longer sees the flag. Without that patch, this will happen until the client does an Open, which will clear LCL_CBDOWN. This patch clears LCL_CBDOWN right away, so that NFSV4SEQ_CBPATHDOWN will no longer be sent to the client in Sequence replies and the Linux client will not repeat the BindConnectionToSession RPCs. This is not critical for correct behaviour, but reduces RPC overheads for cases where the Open will not be done for a while. MFC after: 2 weeks
# 01ae8969	30-Mar-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: do not implicitly bind the back channel for NFSv4.1/4.2 mounts The NFSv4.1 (and 4.2 on 13) server incorrectly binds a new TCP connection to the back channel when first used by an RPC with a Sequence op in it (almost all of them). RFC5661 specifies that only the fore channel should be bound. This was done because early clients (including FreeBSD) did not do the required BindConnectionToSession RPC. Unfortunately, this breaks the Linux client when the "nconnects" mount option is used, since the server may do a callback on the incorrect TCP connection. This patch converts the server behaviour to that required by the RFC. It also makes the server test/indicate failure of the back channel more aggressively. Until this patch is applied to the server, the "nconnects" mount option is not recommended for a Linux NFSv4.1/4.2 client mount to the FreeBSD server. Reported by: bcodding@redhat.com Tested by: bcodding@redhat.com PR: 254560 MFC after: 1 week
# dc78533a	01-Jan-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: fix NFSv4.0 seqid handling for ERELOOKUP Commit 774a36851e0e fixed the NFS server so that it could handle ERELOOKUP returns from VOP calls by redoing the operation/RPC. However, for NFSv4.0, redoing an Open would increment the open_owner's seqid multiple times, breaking the protocol. This patch sets a new flag called ND_ERELOOKUP on the RPC when a redo is in progress. Then the code that increments the seqid avoids the seqid increment/check when the flag is set, since it indicates this has already been done for the Open.
# ff45b9fc	26-Sep-2020	Rick Macklem <rmacklem@FreeBSD.org>	Bjorn reported a problem where the Linux NFSv4.1 client is using an open_to_lock_owner4 when that lock_owner4 has already been created by a previous open_to_lock_owner4. This caused the NFS server to reply NFSERR_INVAL. For NFSv4.0, this is an error, although the updated NFSv4.0 RFC7530 notes that the correct error reply is NFSERR_BADSEQID (RFC3530 did not specify what error to return). For NFSv4.1, it is not obvious whether or not this is allowed by RFC5661, but the NFSv4.1 server can handle this case without error. This patch changes the NFSv4.1 (and NFSv4.2) server to handle multiple uses of the same lock_owner in open_to_lock_owner so that it now correctly interoperates with the Linux NFS client. It also changes the error returned for NFSv4.0 to be NFSERR_BADSEQID. Thanks go to Bjorn for diagnosing this and testing the patch. He also provided a program that I could use to reproduce the problem. Tested by: bj@cebitec.uni-bielefeld.de (Bjorn Fischer) PR: 249567 Reported by: bj@cebitec.uni-bielefeld.de (Bjorn Fischer) MFC after: 3 days
# 58dd2b52	18-Sep-2020	Rick Macklem <rmacklem@FreeBSD.org>	Fix a LOR between the NFS server and server side krpc. Recent testing of the NFS-over-TLS code found a LOR between the mutex lock used for sessions and the sleep lock used for server side krpc socket structures in nfsrv_checksequence(). This was fixed by r365789. A similar bug exists in nfsrv_bindconnsess(), where SVC_RELEASE() is called while mutexes are held. This patch applies a fix similar to r365789, moving the SVC_RELEASE() call down to after the mutexes are released. This patch fixes the problem by moving the SVC_RELEASE() call in nfsrv_checksequence() down a few lines to below where the mutex is released. MFC after: 1 week
# a5c55410	15-Sep-2020	Rick Macklem <rmacklem@FreeBSD.org>	Fix a LOR between the NFS server and server side krpc. Recent testing of the NFS-over-TLS code found a LOR between the mutex lock used for sessions and the sleep lock used for server side krpc socket structures. The code in nfsrv_checksequence() would call SVC_RELEASE() with the mutex held. Normally this is ok, since all that happens is SVC_RELEASE() decrements a reference count. However, if the socket has just been shut down, SVC_RELEASE() drops the reference count to 0 and acquires a sleep lock during destruction of the server side krpc structure. This patch fixes the problem by moving the SVC_RELEASE() call in nfsrv_checksequence() down a few lines to below where the mutex is released. MFC after: 1 week
# 2848d6d4	13-Sep-2020	Rick Macklem <rmacklem@FreeBSD.org>	Fix a case where the NFSv4.0 server might crash if delegations are enabled. asomers@ reported a crash on an NFSv4.0 server with a backtrace of: kdb_backtrace vpanic panic nfsrv_docallback nfsrv_checkgetattr nfsrvd_getattr nfsrvd_dorpc nfssvc_program svc_run_internal svc_thread_start fork_exit fork_trampoline where the panic message was "docallb", which indicates that a callback was attempted when the ClientID is unconfirmed. This would not normally occur, but it is possible to have an unconfirmed ClientID structure with delegation structure(s) chained off it if the client were to issue a SetClientID with the same "id" but different "verifier" after acquiring delegations on the previously confirmed ClientID. The bug appears to be that nfsrv_checkgetattr() failed to check for this uncommon case of an unconfirmed ClientID with a delegation structure that no longer refers to a delegation the client knows about. This patch adds a check for this case, handling it as if no delegation exists, which is the case when the above occurs. Although difficult to reproduce, this change should avoid the panic(). PR: 249127 Reported by: asomers Reviewed by: asomers MFC after: 1 week Differential Revision: https://reviews.freebbsd.org/D26342
# 586ee69f	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	fs: clean up empty lines in .c and .h files
# 02511d21	10-Aug-2020	Rick Macklem <rmacklem@FreeBSD.org>	Add an argument to newnfs_connect() that indicates use TLS for the connection. For NFSv4.0, the server creates a server->client TCP connection for callbacks. If the client mount on the server is using TLS, enable TLS for this callback TCP connection. TLS connections from clients will not be supported until the kernel RPC changes are committed. Since this changes the internal ABI between the NFS kernel modules that will require a version bump, delete newnfs_trimtrailing(), which is no longer used. Since LCL_TLSCB is not yet set, these changes should not have any semantic affect at this time.
# 245bfd34	20-May-2020	Ryan Moeller <freqlabs@FreeBSD.org>	Deduplicate fsid comparisons Comparing fsid_t objects requires internal knowledge of the fsid structure and yet this is duplicated across a number of places in the code. Simplify by creating a fsidcmp function (macro). Reviewed by: mjg, rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24749
# b9cc3262	12-May-2020	Ryan Moeller <freqlabs@FreeBSD.org>	nfs: Remove APPLESTATIC macro It is no longer useful. Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24811
# 32033b3d	08-May-2020	Ryan Moeller <freqlabs@FreeBSD.org>	Remove APPLEKEXT ifndefs They are no longer useful. Reviewed by: rmacklem Approved by: mav (mentor) MFC after: 1 week Sponsored by: iXsystems, Inc. Differential Revision: https://reviews.freebsd.org/D24752
# ae070589	17-Apr-2020	Rick Macklem <rmacklem@FreeBSD.org>	Replace all instances of the typedef mbuf_t with "struct mbuf ". The typedef mbuf_t was used for the Mac OS/X port of the code long ago. Since this port is no longer used and the use of mbuf_t obscures what the code does (and is not consistent with style(9)), it is no longer needed. This patch replaces all instances of mbuf_t with "struct mbuf ", so that it is no longer used. This patch should not result in any semantic change.
# 9f6624d3	11-Apr-2020	Rick Macklem <rmacklem@FreeBSD.org>	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change.
# 76fd19b0	06-Apr-2020	Rick Macklem <rmacklem@FreeBSD.org>	Fix noisy NFSv4 server printf. Peter reported that his dmesg was getting cluttered with nfsrv_cache_session: no session messages when he rebooted his NFS server and they did not seem useful. He was correct, in that these messages are "normal" and expected when NFSv4.1 or NFSv4.2 are mounted and the server is rebooted. This patch silences the printf() during the grace period after a reboot. It also adds the client IP address to the printf(), so that the message is more useful if/when it occurs. If this happens outside of the server's grace period, it does indicate something is not working correctly. Instead of adding yet another nd_XXX argument, the arguments for nfsrv_cache_session() were simplified to take a "struct nfsrv_descript *". Reported by: pen@lysator.liu.se MFC after: 2 weeks
# 60a09a94	26-Jan-2020	Rick Macklem <rmacklem@FreeBSD.org>	Fix a crash in the NFSv4 server. The PR reported a crash that occurred when a file was removed while client(s) were actively doing lock operations on it. Since nfsvno_getvp() will return NULL when the file does not exist, the bug was obvious and easy to fix via this patch. It is a little surprising that this wasn't found sooner, but I guess the above case rarely occurs. Tested by: iron.udjin@gmail.com PR: 242768 Reported by: iron.udjin@gmail.com MFC after: 2 weeks
# b249ce48	03-Jan-2020	Mateusz Guzik <mjg@FreeBSD.org>	vfs: drop the mostly unused flags argument from VOP_UNLOCK Filesystems which want to use it in limited capacity can employ the VOP_UNLOCK_FLAGS macro. Reviewed by: kib (previous version) Differential Revision: https://reviews.freebsd.org/D21427
# f808cf72	13-Dec-2019	Rick Macklem <rmacklem@FreeBSD.org>	Silence some "might not be initialized" warnings for riscv64. None of these case were actually using the variable(s) uninitialized, but I figured that silencing the warnings via initializing them made sense. Some of these predated r355677.
# c057a378	12-Dec-2019	Rick Macklem <rmacklem@FreeBSD.org>	Add support for NFSv4.2 to the NFS client and server. This patch adds support for NFSv4.2 (RFC-7862) and Extended Attributes (RFC-8276) to the NFS client and server. NFSv4.2 is comprised of several optional features that can be supported in addition to NFSv4.1. This patch adds the following optional features: - posix_fadvise(POSIX_FADV_WILLNEED/POSIX_FADV_DONTNEED) - posix_fallocate() - intra server file range copying via the copy_file_range(2) syscall --> Avoiding data tranfer over the wire to/from the NFS client. - lseek(SEEK_DATA/SEEK_HOLE) - Extended attribute syscalls for "user" namespace attributes as defined by RFC-8276. Although this patch is fairly large, it should not affect support for the other versions of NFS. However it does add two new sysctls that allow a sysadmin to limit which minor versions of NFSv4 a server supports, allowing a sysadmin to disable NFSv4.2. Unfortunately, when the NFS stats structure was last revised, it was assumed that there would be no additional operations added beyond what was specified in RFC-7862. However RFC-8276 did add additional operations, forcing the NFS stats structure to revised again. It now has extra unused entries in all arrays, so that future extensions to NFSv4.2 can be accomodated without revising this structure again. A future commit will update nfsstat(1) to report counts for the new NFSv4.2 specific operations/procedures. This patch affects the internal interface between the nfscommon, nfscl and nfsd modules and, as such, they all must be upgraded simultaneously. I will do a version bump (although arguably not needed), due to this. This code has survived a "make universe" but has not been built with a recent GCC. If you encounter build problems, please email me. Relnotes: yes
# abd80ddb	08-Dec-2019	Mateusz Guzik <mjg@FreeBSD.org>	vfs: introduce v_irflag and make v_type smaller The current vnode layout is not smp-friendly by having frequently read data avoidably sharing cachelines with very frequently modified fields. In particular v_iflag inspected for VI_DOOMED can be found in the same line with v_usecount. Instead make it available in the same cacheline as the v_op, v_data and v_type which all get read all the time. v_type is avoidably 4 bytes while the necessary data will easily fit in 1. Shrinking it frees up 3 bytes, 2 of which get used here to introduce a new flag field with a new value: VIRF_DOOMED. Reviewed by: kib, jeff Differential Revision: https://reviews.freebsd.org/D22715
# 2e670777	04-Sep-2019	Rick Macklem <rmacklem@FreeBSD.org>	Delete the unused "nd" argument for nfsrv_checkdsattr(). The "nd" argument for nfsrv_checkdsattr() is no longer used by the function. This patch deletes it. This allows subsequent patches to delete the "nd" argument from nfsrv_proxyds(), since it's only use of "nd" was to pass it to nfsrv_checkdsattr(). The same will then be true for nfsvno_getattr(), which passes "nd" to nfsrv_proxyds(). Getting rid of the "nd" argument from nfsvno_getattr() avoids confusion over why it might need "nd". This patch is trivial and does not have any semantic effect. Found by inspection while working on the NFSv4.2 server.
# ed2f1001	13-Apr-2019	Rick Macklem <rmacklem@FreeBSD.org>	Add support for INET6 addresses to the kernel code that dumps open/lock state. PR#223036 reported that INET6 callback addresses were not printed by nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure, so that nfsdumpstate(8) can print them out, post-r346190. The patch also includes the addition of #ifdef INET, INET6 as requested by bz@. PR: 223036 Reviewed by: bz, rgrimes MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D19839
# fdab4d3b	18-Aug-2018	Rick Macklem <rmacklem@FreeBSD.org>	Fix LORs between vn_start_write() and vn_lock() in nfsrv_copymr(). When coding the pNFS server, I added vn_start_write() calls in nfsrv_copymr() done while the vnodes were locked, not realizing I had introduced LORs and possible deadlock when an exported file system on the MDS is suspended. This patch fixes the LORs by moving the vn_start_write() calls up to before where the vnodes are locked. For "tvp", the vn_start_write() probaby isn't necessary, because NFS mounts can't be suspended. However, I think doing so is harmless. Thanks go to kib@ for letting me know that I had introduced these LORs. This patch only affects the behaviour of the pNFS server when pnfsdscopymr(8) is used to recover a mirrored DS.
# 41df1b5b	08-Aug-2018	Rick Macklem <rmacklem@FreeBSD.org>	Assorted fixes to handling of LayoutRecall callbacks, mostly error handling. After a re-read of the appropriate section of RFC5661, I decided that a few things should be changed related to LayoutRecall callback handling. Here are the things fixed by this patch. - For two of the three cases that LayoutRecall is done, I now think setting the clora_changed argument false is correct. - All errors other than NFSERR_DELAY returned by LayoutRecall appear permanent, so don't retry for any of them. (NFSERR_DELAY is retried by newnfs_request(), so it is not affected by this patch.) - Instead of waiting "forever" (actually until the process is SIGTERM'd) for Layouts to be returned during a mirror copy, fail and return ENXIO after about 1minute. Waiting for a <ctrl>C made sense when pnfsdscopymr() was done by itself, but did not make sense when done via find(1). This patch only affects the pNFS server.
# 25705dd5	05-Aug-2018	Rick Macklem <rmacklem@FreeBSD.org>	Copy all bits of a file handle in case there is padding in the structure. At least on x86, fhandle_t is a packed structure, so I believe an assignment will copy all the bits. However, for some current/future architectures, there might be padding in the structure that doesn't get copied via an assignment. Since NFS assumes a file handle is an opaque blob of bits that can be compared via memcmp()/bcmp(), all the bits including any padding must be copied. This patch replaces the assignments with a call to a byte copy function. Spotted during code inspection.
# 8014c971	29-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Silence newer gcc warnings. Newer versions of gcc generate "set, but not used" warnings in the NFS server. Add __unused macros to silence these warnings. Requested by: mmacy
# a3e709cd	28-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7. I believe that a ReclaimComplete with rca_one_fs == TRUE is only to be used after a file system has been transferred to a different file server. However, RFC5661 is somewhat vague w.r.t. this and the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE and one with ReclaimComplete with rca_one_fs == FALSE. Therefore, just ignore the rca_one_fs == TRUE operation and return NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP. This allows the ESXi 6.7 NFSv4.1 client to do a mount. After discussion on the NFSv4 IETF working group mailing list, doing this along with setting a flag to note that a ReclaimComplete with rca_one_fs TRUE was an appropriate way to handle this. The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was done may be used to disable replies of NFS4ERR_GRACE for non-reclaim state operations in a future commit. This patch along with r332790, r334492 and r336357 allow ESXi 6.7 NFSv4.1 mounts work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are violations of RFC-5661 and should not be used. Reported by: andreas.nagy@frequentis.com Tested by: andreas.nagy@frequentis.com, daniel@ftml.net (earlier version) MFC after: 2 weeks Relnotes: yes
# 5d54f186	16-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Modify the reasons for not issuing a delegation in the NFSv4.1 server. The ESXi NFSv4.1 client will generate warning messages when the reason for not issuing a delegation is two. Two refers to a resource limit and I do not see why it would be considered invalid. However it probably was not the best choice of reason for not issuing a delegation. This patch changes the reasons used to ones that the ESXi client doesn't complain about. This change does not affect the FreeBSD client and does not appear to affect behaviour of the Linux NFSv4.1 client. RFC5661 defines these "reasons" but does not give any guidance w.r.t. which ones are more appropriate to return to a client. Tested by: andreas.nagy@frequentis.com PR: 226650 MFC after: 2 weeks
# de9a1a70	09-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add support for a "forced" pnfsdskill to the pNFS server kernel code. The pnfsdskill(8) command will normally fail if there is no valid mirror for the DS to be disabled. However, a system administrator may need to disable a DS which does not have a valid mirror so that the nfsd threads can be terminated. This patch adds the kernel code needed by pnfsdskill(8) to implement this "forced" case of disabling a DS. This patch only affects the pNFS server.
# acc6e58d	08-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Fix the kernel part of pnfsdscopymr() to handle holes in the file being copied. If a mirrored DS is being recovered that has a lot of large sparse files, pnfsdscopymr(8) would use a lot of space on the recovered mirror since it would write the "holes" in the file being mirrored. This patch adds code to check for a "hole" and skip doing the write. The check is done on a "per PNFSDS_COPYSIZ size block", which is currently 64K. I think that most file server file systems will be using a blocksize at least this large. If the file server is using a smaller blocksize and smaller holes need to be preserved, PNFSDS_COPYSIZ could be decreased. The block of 0s is malloc()d, since pnfsdcopymr(8) should be an infrequent occurrence.
# 5b500ea9	06-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Change the pNFS server so that it does not disable a mirrored DS for an NFSERR_STALE error reported via a LayoutReturn. The current FreeBSD client can generate these errors for an operational DS while doing a recovery of a mirror after a mirrored DS has been repaired. I am not sure why these errors occur, but my best current guess is a race between the Layout Recall issued by the kernel code run from pnfsdscopymr(8) and a Read operation on the DS for the file bing copied. The errors are not fatal, since the client falls back on doing I/O through the MDS, which can do the I/O successfully as a proxy. (The fact that the MDS can do this indicates that the file does still exist on the functioning DS.) This change only affects the pNFS server and only when a client does a LayoutReturn with the NFSERR_STALE error report.
# ff3b992f	04-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Fix the pNFS server so that it handles the "#mds_path" check for mirrors. The recently added feature of the pNFS server will set an fsid for the MDS file system to define the file system a DS should store files for. For a case where a DS handling all file systems has failed, it was possible for the code to check for a mirror with a specified fs, even though nfsdev_mdsisset was 0, possibly causing a false successful check for a mirror. This patch adds a check for nfsdev_mdsisset != 0 to avoid this. It only affects the pNFS server for a rare case. Found via code inspection.
# 2f32675c	02-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add an optional feature to the pNFS server. Without this patch, the pNFS server distributes the data storage files across all of the specified DSs. A tester noted that it would be nice if a system administrator could control which DSs are used to store the file data for a given exported MDS file system. This patch adds the kernel support to do this. It also makes a slight semantic change to nfsv4_findmirror(), since some uses of it no longer require that the DS being searched for have a current mirror. A patch that will be committed in a few minutes will modify the nfsd daemon to support this feature. The patch should only affect sites using the pNFS server (specified via the "-p" command line option for nfsd. Suggested by: james.rose@framestore.com
# c16f407e	21-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add a counter to limit the number of disabled DSs for a mirrored pNFS MDS. This patch adds a counter that limits the number of disabled mirrored DSs to mirror level - 1. It also makes a small change that keeps a Write that has failed with EACCES when attempted by a client to a DS from disabling the DS. This patch only affects the pNFS server.
# 90d2dfab	12-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Merge the pNFS server code from projects/pnfs-planb-server into head. This code merge adds a pNFS service to the NFSv4.1 server. Although it is a large commit it should not affect behaviour for a non-pNFS NFS server. Some documentation on how this works can be found at: http://people.freebsd.org/~rmacklem/pnfs-planb-setup.txt and will hopefully be turned into a proper document soon. This is a merge of the kernel code. Userland and man page changes will come soon, once the dust settles on this merge. It has passed a "make universe", so I hope it will not cause build problems. It also adds NFSv4.1 server support for the "current stateid". Here is a brief overview of the pNFS service: A pNFS service separates the Read/Write oeprations from all the other NFSv4.1 Metadata operations. It is hoped that this separation allows a pNFS service to be configured that exceeds the limits of a single NFS server for either storage capacity and/or I/O bandwidth. It is possible to configure mirroring within the data servers (DSs) so that the data storage file for an MDS file will be mirrored on two or more of the DSs. When this is used, failure of a DS will not stop the pNFS service and a failed DS can be recovered once repaired while the pNFS service continues to operate. Although two way mirroring would be the norm, it is possible to set a mirroring level of up to four or the number of DSs, whichever is less. The Metadata server will always be a single point of failure, just as a single NFS server is. A Plan B pNFS service consists of a single MetaData Server (MDS) and K Data Servers (DS), all of which are recent FreeBSD systems. Clients will mount the MDS as they would a single NFS server. When files are created, the MDS creates a file tree identical to what a single NFS server creates, except that all the regular (VREG) files will be empty. As such, if you look at the exported tree on the MDS directly on the MDS server (not via an NFS mount), the files will all be of size 0. Each of these files will also have two extended attributes in the system attribute name space: pnfsd.dsfile - This extended attrbute stores the information that the MDS needs to find the data storage file(s) on DS(s) for this file. pnfsd.dsattr - This extended attribute stores the Size, AccessTime, ModifyTime and Change attributes for the file, so that the MDS doesn't need to acquire the attributes from the DS for every Getattr operation. For each regular (VREG) file, the MDS creates a data storage file on one (or more if mirroring is enabled) of the DSs in one of the "dsNN" subdirectories. The name of this file is the file handle of the file on the MDS in hexadecimal so that the name is unique. The DSs use subdirectories named "ds0" to "dsN" so that no one directory gets too large. The value of "N" is set via the sysctl vfs.nfsd.dsdirsize on the MDS, with the default being 20. For production servers that will store a lot of files, this value should probably be much larger. It can be increased when the "nfsd" daemon is not running on the MDS, once the "dsK" directories are created. For pNFS aware NFSv4.1 clients, the FreeBSD server will return two pieces of information to the client that allows it to do I/O directly to the DS. DeviceInfo - This is relatively static information that defines what a DS is. The critical bits of information returned by the FreeBSD server is the IP address of the DS and, for the Flexible File layout, that NFSv4.1 is to be used and that it is "tightly coupled". There is a "deviceid" which identifies the DeviceInfo. Layout - This is per file and can be recalled by the server when it is no longer valid. For the FreeBSD server, there is support for two types of layout, call File and Flexible File layout. Both allow the client to do I/O on the DS via NFSv4.1 I/O operations. The Flexible File layout is a more recent variant that allows specification of mirrors, where the client is expected to do writes to all mirrors to maintain them in a consistent state. The Flexible File layout also allows the client to report I/O errors for a DS back to the MDS. The Flexible File layout supports two variants referred to as "tightly coupled" vs "loosely coupled". The FreeBSD server always uses the "tightly coupled" variant where the client uses the same credentials to do I/O on the DS as it would on the MDS. For the "loosely coupled" variant, the layout specifies a synthetic user/group that the client uses to do I/O on the DS. The FreeBSD server does not do striping and always returns layouts for the entire file. The critical information in a layout is Read vs Read/Writea and DeviceID(s) that identify which DS(s) the data is stored on. At this time, the MDS generates File Layout layouts to NFSv4.1 clients that know how to do pNFS for the non-mirrored DS case unless the sysctl vfs.nfsd.default_flexfile is set non-zero, in which case Flexible File layouts are generated. The mirrored DS configuration always generates Flexible File layouts. For NFS clients that do not support NFSv4.1 pNFS, all I/O operations are done against the MDS which acts as a proxy for the appropriate DS(s). When the MDS receives an I/O RPC, it will do the RPC on the DS as a proxy. If the DS is on the same machine, the MDS/DS will do the RPC on the DS as a proxy and so on, until the machine runs out of some resource, such as session slots or mbufs. As such, DSs must be separate systems from the MDS. Tested by: james.rose@framestore.com Relnotes: yes
# 9442a64e	01-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add the BindConnectiontoSession operation to the NFSv4.1 server. Under some fairly unusual circumstances, the Linux NFSv4.1 client is doing a BindConnectiontoSession operation for TCP connections. It is also used by the ESXi6.5 NFSv4.1 client. This patch adds this operation to the NFSv4.1 server. Reported by: andreas.nagy@frequentis.com Tested by: andreas.nagy@frequentis.com MFC after: 2 weeks
# 440e2f9e	30-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	Strengthen locking for the NFSv4.1 server DestroySession operation. If a client did a DestroySession on a session while it was still in use, the server might try to use the session structure after it is free'd. I think the client has violated RFC5661 if it does this, but this patch makes DestroySession block all other nfsd threads so no thread could be using the session when it is free'd. After the DestroySession, nfsd threads will not be able to find the session. The patch also adds a check for nd_sessionid being set, although if that was not the case it would have been all 0s and unlikely to have a false match. This might fix the crashes described in PR#228497 for the FreeNAS server. PR: 228497 MFC after: 1 week
# 04b19055	17-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add a missing nfsrv_freesession() call for an unlikely failure case. Since NFSv4.1 clients normally create a single session which supports both fore and back channels, it is unlikely that a callback will fail due to a lack of a back channel. However, if this failure occurred, the session wasn't being dereferenced and would never be free'd. Found by inspection during pNFS server development. Tested by: andreas.nagy@frequentis.com MFC after: 2 months
# 0ebe2634	15-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	End grace for the NFSv4 server if all mounts do ReclaimComplete. The NFSv4 protocol requires that the server only allow reclaim of state and not issue any new open/lock state for a grace period after booting. The NFSv4.0 protocol required this grace period to be greater than the lease duration (over 2minutes). For NFSv4.1, the client tells the server that it has done reclaiming state by doing a ReclaimComplete operation. If all NFSv4 clients are NFSv4.1, the grace period can end once all the clients have done ReclaimComplete, shortening the time period considerably. This patch does this. If there are any NFSv4.0 mounts, the grace period will still be over 2minutes. This change is only an optimization and does not affect correct operation. Tested by: andreas.nagy@frequentis.com MFC after: 2 months
# 0f13d146	12-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	Fix a slow leak of session structures in the NFSv4.1 server. For a fairly rare case of a client doing an ExchangeID after a hard reboot, the old confirmed clientid still exists, but some clients use a new co_verifier. For this case, the server was not freeing up the sessions on the old confirmed clientid. This patch fixes this case. It also adds two LIST_INIT() macros, which are actually no-ops, since the structure is malloc()d with M_ZERO so the pointer is already set to NULL. It should have minimal impact, since the only way I could exercise this code path was by doing a hard power cycle (pulling the plus) on a machine running Linux with a NFSv4.1 mount on the server. Originally spotted during testing of the ESXi 6.5 client. Tested by: andreas.nagy@frequentis.com MFC after: 2 months
# bb343696	12-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK. When an NFSv4.1 session is busy due to a callback being in progress, nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK. The only effect this has is that the DestroySession operation will report the failure for this case and this probably has little or no effect on a client. Spotted by inspection and no failures related to this have been reported. MFC after: 2 months
# 5d4835e4	11-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add support for the TestStateID operation to the NFSv4.1 server. The Linux client now uses the TestStateID operation, so this patch adds support for it to the NFSv4.1 server. The FreeBSD client never uses this operation, so it should not be affected. MFC after: 2 months
# 7427a9f1	02-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	Revert r333183, since I am not sure that just initializing the list is the correct thing to do and that is already done without this commit.
# 858bb2fc	02-May-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add two missing LIST_INIT()s. This patch adds two missing LIST_INIT()s. Found by inspection. In practice, these are currently no-ops, since the structure they are in is malloc'd with M_ZERO and all LIST_INIT does is set the pointer in the list head to NULL. (In other words, the M_ZERO has already correctly initialized it.) MFC after: 2 months
# b97b91b5	25-Jan-2018	Conrad Meyer <cem@FreeBSD.org>	nfs: Remove NFSSOCKADDRALLOC, NFSSOCKADDRFREE macros They were just thin wrappers over malloc(9) w/ M_ZERO and free(9). Discussed with: rmacklem, markj Sponsored by: Dell EMC Isilon
# 222daa42	25-Jan-2018	Conrad Meyer <cem@FreeBSD.org>	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035
# becffb36	18-Jan-2018	Emmanuel Vadot <manu@FreeBSD.org>	nfs: Do not printf each time a lock structure is freed during module unload There can be a lot of those structures and printing a line each time we free one on module unload. MFC after: 3 days
# 151ba793	24-Dec-2017	Alexander Kabaev <kan@FreeBSD.org>	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385
# b07bc39a	04-Dec-2017	Rick Macklem <rmacklem@FreeBSD.org>	Avoid the overhead of acquiring a lock in nfsrv_checkgetattr() when there are no write delegations issued. manu@ reported on the freebsd-current@ mailing list that there was a significant performance hit in nfsrv_checkgetattr() caused by the acquisition/release of a state lock, even when there were no write delegations issued. This patch add a count of outstanding issued write delegations to the NFSv4 server. This count allows nfsrv_checkgetattr() to return without acquiring any lock when the count is 0, avoiding the performance hit for the case where no write delegations are issued. Reported by: manu Reviewed by: kib MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D13327
# d63027b6	27-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys/fs: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
# 57ef3db3	15-Oct-2017	Rick Macklem <rmacklem@FreeBSD.org>	Fix the client IP address reported by nfsdumpstate for 64bit arch and NFSv4.1. The client IP address was not being reported for some NFSv4 mounts by nfsdumpstate. Upon investigation, two problems were found for mounts using IPv4. One was that the code (originally written and tested on i386) assumed that a "u_long" was a "uint32_t" and would exactly store an IPv4 host address. Not correct for 64bit arches. Also, for NFSv4.1 mounts, the field was not being filled in. This was basically correct, because NFSv4.1 does not use a callback address. However, it meant that nfsdumpstate could not report the client IP addr. This patch should fix both of these issues. For IPv6, the address will still not be reported. The original NFSv4 RFC only specified IPv4 callback addresses. I think this has changed and, if so, a future commit to fix reporting of IPv6 addresses will be needed. Reported by: manu PR: 223036 MFC after: 2 weeks
# 858f6fe3	24-Apr-2017	Rick Macklem <rmacklem@FreeBSD.org>	Allow use of a write open stateid for reading in the NFSv4 server. The NFSv4 RFCs give a server the option of allowing the use of an open stateid for write access to be used for a Read operation. This patch enables this by default and adds a sysctl to disable it, for anyone who does not want this capability. Allowing this is particularily useful for a pNFS Data Server (DS), since they are not permitted to allow the use of special stateids. Discovered during recent testing of the pNFS server under development. MFC after: 2 weeks
# b4b4b530	28-Jan-2017	Baptiste Daroussin <bapt@FreeBSD.org>	Revert crap accidentally committed
# 814aaaa7	28-Jan-2017	Baptiste Daroussin <bapt@FreeBSD.org>	Revert r312923 a better approach will be taken later
# a5d19b81	05-Dec-2016	Rick Macklem <rmacklem@FreeBSD.org>	Fix the NFSv4.1 server for Open reclaim after a reboot. The NFSv4.1 server failed to update the nfs-stablerestart file for a client when the client was issued its first Open. As such, recovery of Opens after a server reboot failed with NFSERR_NOGRACE. This patch fixes this. It also changes the code so that it malloc()'s the 1024 byte array instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1. Note that this bug only affected NFSv4.1 and only when clients attempted to reclaim Opens after a server reboot. MFC after: 2 weeks
# dcb19c38	20-Oct-2016	Rick Macklem <rmacklem@FreeBSD.org>	A problem w.r.t. interoperation between the FreeBSD NFSv4.1 server with delegations enabled and the Linux NFSv4.1 client was reported in reviews.freebsd.org/D7891. I believe that the FreeBSD server behaviour conforms to the RFC and that the Linux client has a bug. Therefore, I do not think the proposed patch is appropriate. When nfsrv_writedelegifpos is non-zero, the FreeBSD server will issue a write delegation for a read open if possible. The Linux client then erroneously assumes that the credentials used for the read open can write the file. This patch reverses the default value for nfsrv_writedelegifpos to 0 so that the default behaviour is Linux compatible and adds a sysctl that can be used to set nfsrv_writedelegifpos. This change should only affect users that are mounting a FreeBSD server with delegations enabled (they are not enabled by default) with a Linux NFSv4.1 client mount. Reported by: fatih.acar@gandi.net Tested by: fatih.acar@gandi.net MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D7891
# 1b819cf2	12-Aug-2016	Rick Macklem <rmacklem@FreeBSD.org>	Update the nfsstats structure to include the changes needed by the patch in D1626 plus changes so that it includes counts for NFSv4.1 (and the draft of NFSv4.2). Also, make all the counts uint64_t and add a vers field at the beginning, so that future revisions can easily be implemented. There is code in place to handle the old vesion of the nfsstats structure for backwards binary compatibility. Subsequent commits will update nfsstat(8) to use the new fields. Submitted by: will (earlier version) Reviewed by: ken MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D1626
# a96c9b30	29-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	NFS: spelling fixes on comments. No funcional change.
# 0533d726	22-Apr-2016	Rick Macklem <rmacklem@FreeBSD.org>	Fix a LOR in the NFSv4.1 server. The ordering of acquisition of the state and session mutexes was reversed in two cases executed when an NFSv4.1 client created/freed a session. Since clients will typically do this only when mounting and dismounting, the likelyhood of causing a deadlock was low but possible. This can only occur for NFSv4.1 mounts, since the others do not use sessions. This was detected while testing the pNFS server/client where the client crashed during dismounting. The patch also reorders the unlocks, although that isn't necessary for correct operation. MFC after: 2 weeks
# a0962bf8	21-Nov-2015	Rick Macklem <rmacklem@FreeBSD.org>	When the nfsd threads are terminated, the NFSv4 server state (opens, locks, etc) is retained, which I believe is correct behaviour. However, for NFSv4.1, the server also retained a reference to the xprt (RPC transport socket structure) for the backchannel. This caused svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed a socket upcall to occur after the mutexes in the svcpool were destroyed, causing a crash. This patch fixes the code so that the backchannel xprt structure is dereferenced just before svcpool_destroy() is called, so the code does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall. Tested by: g_amanakis@yahoo.com PR: 204340 MFC after: 2 weeks
# 29dc40b6	14-Aug-2015	Rick Macklem <rmacklem@FreeBSD.org>	For the case where an NFSv4.1 ExchangeID operation has the client identifier that already has a confirmed ClientID, the nfsrv_setclient() function would not fill in the clientidp being returned. As such, the value of ClientID returned would be whatever garbage was on the stack. An NFSv4.1 client would not normally do this, but it appears that it can happen for certain Linux clients. When it happens, the client persistently retries the ExchangeID and Create_session after Create_session fails when it uses the bogus clientid. With this patch, the correct clientid is replied. This problem was identified in a packet trace supplied by Ahmed Kamal via email. Reported by: email.ahmedkamal@googlemail.com MFC after: 2 weeks
# 25f37276	29-Jul-2015	Rick Macklem <rmacklem@FreeBSD.org>	This patch fixes a problem where, if the NFSv4 server has a previous unconfirmed clientid structure for the same client on the last hash list, this old entry would not be removed/deleted. I do not think this bug would have caused serious problems, since the new entry would have been before the old one on the list. This old entry would have eventually been scavenged/removed. Detected while reading the code looking for another bug. MFC after: 3 days
# 1f54e596	27-May-2015	Rick Macklem <rmacklem@FreeBSD.org>	Make the size of the hash tables used by the NFSv4 server tunable. No appreciable change in performance was observed after increasing the sizes of these tables and then testing with a single client. However, there was an email that indicated high CPU overheads for a heavily loaded NFSv4 and it is hoped that increasing the sizes of the hash tables via these tunables might help. The tables remain the same size by default. Differential Revision: https://reviews.freebsd.org/D2596 MFC after: 2 weeks
# 52f1bb38	24-Dec-2014	Rick Macklem <rmacklem@FreeBSD.org>	A deadlock in the NFSv4 server with vfs.nfsd.enable_locallocks=1 was reported via email. This was caused by a LOR between the sleep lock used to serialize the local locking (nfsrv_locklf()) and locking the vnode. I believe this patch fixes the problem by delaying relocking of the vnode until the sleep lock is unlocked (nfsrv_unlocklf()). To avoid nfsvno_advlock() having the side effect of unlocking the vnode, unlocking the vnode was moved to before the functions that call nfsvno_advlock(). It shouldn't affect the execution of the default case where vfs.nfsd.enable_locallocks=0. Reported by: loic.blot@unix-experience.fr Discussed with: kib MFC after: 1 week
# d8a5961f	02-Oct-2014	Marcelo Araujo <araujo@FreeBSD.org>	Fix failures and warnings reported by newpynfs20090424 test tool. This fix addresses only issues with the pynfs reports, none of these issues are know to create problems for extant real clients. Submitted by: Bart Hsiao <bart.hsiao@gmail.com> Reworked by: myself Reviewed by: rmacklem Approved by: rmacklem Sponsored by: QNAP Systems Inc.
# c59e4cc3	01-Jul-2014	Rick Macklem <rmacklem@FreeBSD.org>	Merge the NFSv4.1 server code in projects/nfsv4.1-server over into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server. MFC after: 1 month
# ab7f2410	23-Apr-2014	Rick Macklem <rmacklem@FreeBSD.org>	Remove an unnecessary level of indirection for an argument. This simplifies the code and should avoid the clang sparc port from generating an abort() call. Requested by: rdivacky Submitted by: jhb MFC after: 2 weeks
# 43a213bb	24-Dec-2013	Rick Macklem <rmacklem@FreeBSD.org>	The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch. MFC after: 1 week
# a89a2c8b	25-Jan-2013	John Baldwin <jhb@FreeBSD.org>	Further cleanups to use of timestamps in NFS: - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime(). Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks
# 1f60bfd8	08-Dec-2012	Rick Macklem <rmacklem@FreeBSD.org>	Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detecting during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.
# eb1b1807	05-Dec-2012	Gleb Smirnoff <glebius@FreeBSD.org>	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
# 2108487e	12-May-2012	Rick Macklem <rmacklem@FreeBSD.org>	Fix two cases in the new NFS server where a tsleep() is used, when the code should actually protect the tested variable with a mutex. Since the tsleep()s had a 10sec timeout, the race would have only delayed the allocation of a new clientid for a client. The sleeps will also rarely occur, since having a callback in progress when a client acquires a new clientid, is unlikely. in practice, since having a callback in progress when a fresh clientid is being acquired by a client is unlikely. MFC after: 1 month
# 526d0bd5	20-Feb-2012	Konstantin Belousov <kib@FreeBSD.org>	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
# 5b79362b	13-Jan-2012	Rick Macklem <rmacklem@FreeBSD.org>	Tai Horgan reported via email that there were two places in the new NFSv4 server where the code follows the wrong list. Fortunately, for these fairly rare cases, the lc_stateid[] lists are normally empty. This patch fixes the code to follow the correct list. Reported by: tai.horgan at isilon.com Discussed with: zack MFC after: 2 weeks
# a9285ae5	16-Jul-2011	Zack Kirsch <zack@FreeBSD.org>	Add DEXITCODE plumbing to NFS. Isilon has the concept of an in-memory exit-code ring that saves the last exit code of a function and allows for stack tracing. This is very helpful when debugging tough issues. This patch is essentially a no-op for BSD at this point, until we upstream the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS function that returns an errno error code. A number of code paths were also reorganized to have single exit paths, to reduce code duplication. Submitted by: David Kwan <dkwan@isilon.com> Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 68347a92	16-Jul-2011	Zack Kirsch <zack@FreeBSD.org>	Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# a9989634	16-Jul-2011	Zack Kirsch <zack@FreeBSD.org>	Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 98f234f3	16-Jul-2011	Zack Kirsch <zack@FreeBSD.org>	Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# ff29f3b2	27-May-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the new NFS client so that it handles NFSv4 state correctly during a forced dismount. This required that the exclusive and shared (refcnt) sleep lock functions check for MNTK_UMOUNTF before sleeping, so that they won't block while nfscl_umount() is getting rid of the state. As such, a "struct mount *" argument was added to the locking functions. I believe the only remaining case where a forced dismount can get hung in the kernel is when a thread is already attempting to do a TCP connect to a dead server when the krpc client structure called nr_client is NULL. This will only happen just after a "mount -u" with options that force a new TCP connection is done, so it shouldn't be a problem in practice. MFC after: 2 weeks
# 806e2e4b	10-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add some cleanup code to the module unload operation for the experimental NFS server, so that it doesn't leak memory when unloaded. However, unloading the NFSv4 server is not recommended, since all NFSv4 state will be lost by the unload and clients will have to recover the state after a server reload/restart as if the server crashed/rebooted. MFC after: 2 weeks
# 5f73287a	14-Jan-2011	Rick Macklem <rmacklem@FreeBSD.org>	Modify the experimental NFSv4 server so that it posts a SIGUSR2 signal to the master nfsd daemon whenever the stable restart file has been modified. This will allow the master nfsd daemon to maintain an up to date backup copy of the file. This is enabled via the nfssvc() syscall, so that older nfsd daemons will not be signaled. Reviewed by: jhb MFC after: 1 week
# 770b49a3	12-Jan-2011	Zack Kirsch <zack@FreeBSD.org>	In the experimental NFS server, when converting an open-owner to a lock-owner, start at sequence id 1 instead of 0, to match up with both Solaris and Linux. Reviewed by: rmacklem Approved by: zml (mentor)
# fbf0af3f	06-Jan-2011	Rick Macklem <rmacklem@FreeBSD.org>	Delete the NFS_STARTWRITE() and NFS_ENDWRITE() macros that obscured vn_start_write() and vn_finished_write() for the old OpenBSD port, since most uses have been replaced by the correct calls. MFC after: 12 days
# 629fa50e	02-Jan-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add checks for VI_DOOMED and vn_lock() failures to the experimental NFS server, to handle the case where an exported file system is forced dismounted while an RPC is in progress. Further commits will fix the cases where a mount point is used when the associated vnode isn't locked. Reviewed by: kib MFC after: 2 weeks
# 5a12538b	01-Jan-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add support for shared vnode locks for the Read operation in the experimental NFSv4 server. Reviewed by: kib MFC after: 2 weeks
# d6ec8427	17-Dec-2010	Rick Macklem <rmacklem@FreeBSD.org>	Fix two vnode locking problems in nfsd_recalldelegation() in the experimental NFSv4 server. The first was a bogus use of VOP_ISLOCKED() in a KASSERT() and the second was the need to lock the vnode for the nfsrv_checkremove() call. Also, delete a "__unused" that was bogus, since the argument is used. Reviewed by: zack.kirsch at isilon.com MFC after: 2 weeks
# b4a8d952	09-Dec-2010	Rick Macklem <rmacklem@FreeBSD.org>	Disable attempts to establish a callback connection from the experimental NFSv4 server to a NFSv4 client when delegations are not being issued, even if the client advertises a callback path. This avoids a problem where a Linux client advertises a callback path that doesn't work, due to a firewall, and then times out an Open attempt before the FreeBSD server gives up its callback connection attempt. (Suggested by drb at karlov.mff.cuni.cz to fix the Linux client problem that he reported on the fs-stable mailing list.) The server should probably have a 1sec timeout on callback connection attempts when there are no delegations issued to the client, but that patch will require changes to the krpc and this serves as a work around until then. Tested by: drb at karlov.mff.cuni.cz MFC after: 5 days
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# db0a33d2	11-Oct-2010	Rick Macklem <rmacklem@FreeBSD.org>	Try and make the nfsrv_localunlock() function in the experimental NFSv4 server more readable. Mostly changes to comments, but a case of >= is changed to >, since == can never happen. Also, I've added a couple of KASSERT()s and a slight optimization, since once the "else if" case happens, subsequent locks in the list can't have any effect. None of these changes fixes any known bug. MFC after: 2 weeks
# a212c01a	18-Sep-2010	Rick Macklem <rmacklem@FreeBSD.org>	Fix nfsrv_freeallnfslocks() in the experimental NFSv4 server so that it frees local locks correctly upon close. In order for nfsrv_localunlock() to work correctly, the lock can no longer be in the lockowner's stateid list. As such, nfsrv_freenfslock() has to be called before nfsrv_localunlock(), to get rid of the lock structure on the lockowner's stateid list. This only affected operation when local locks (vfs.newnfs.enable_locallocks=1) are enabled, which is not the default at this time. MFC after: 1 week
# 2c6d0e01	10-Sep-2010	Rick Macklem <rmacklem@FreeBSD.org>	This patch applies one of the two fixes suggested by zack.kirsch at isilon.com for a race between nfsrv_freeopen() and nfsrv_getlockfile() in the experimental NFS server that he found during testing. Although nfsrv_freeopen() holds a sleep lock on the lock file structure when called with cansleep != 0, nfsrv_getlockfile() could still search the list, once it acquired the NFSLOCKSTATE() mutex. I believe that acquiring the mutex in nfsrv_freeopen() fixes the race. MFC after: 2 weeks
# b5cb66df	28-Aug-2010	Rick Macklem <rmacklem@FreeBSD.org>	Add acquisition of a reference count on nfsv4root_lock to the nfsd_recalldelegation() function, since this function is called by nfsd threads when they are handling NFSv2 or NFSv3 RPCs, where no reference count would have been acquired. MFC after: 2 weeks
# 2ec3f925	28-Aug-2010	Rick Macklem <rmacklem@FreeBSD.org>	The timer routine in the experimental NFS server did not acquire the correct mutex when checking nfsv4root_lock. Although this could be fixed by adding mutex lock/unlock calls, zack.kirsch at isilon.com suggested a better fix that uses a non-blocking acquisition of a reference count on nfsv4root_lock. This fix allows the weird NFSLOCKSTATE(); NFSUNLOCKSTATE(); synchronization to be deleted. This patch applies this fix. Tested by: zack.kirsch at isilon.com MFC after: 2 weeks
# 2cf552b1	16-Jul-2010	Rick Macklem <rmacklem@FreeBSD.org>	Patch the experimental NFSv4 server so that it acquires a reference count on nfsv4rootfs_lock when dumping state, since these functions are not called by nfsd threads. Without this reference count, it is possible for an nfsd thread to acquire an exclusive lock on nfsv4rootfs_lock while the dump is in progress and then change the lists, potentially causing a crash. Reported by: zack.kirsch at isilon.com MFC after: 2 weeks
# 866e6c5a	15-Jul-2010	Rick Macklem <rmacklem@FreeBSD.org>	Delete comments related to soft clock interrupts that don't apply to the FreeBSD port of the experimental NFSv4 server. Submitted by: zack.kirsch at isilon.com MFC after: 2 weeks
# 63f6e5bf	14-Jul-2010	Rick Macklem <rmacklem@FreeBSD.org>	This patch fixes a bug in the experimental NFSv4 server where it released a reference count on nfsv4rootfs_lock erroneously when administrative revocation of state was done. Submitted by: zack.kirsch at isilon.com MFC after: 2 weeks
# 95b1c51b	13-Jul-2010	Rick Macklem <rmacklem@FreeBSD.org>	Fix a bogus comment that mentions lru lists that don't exist. Reported by: zack.kirsch at isilon.com MFC after: 2 weeks
# 227b9ebe	30-Apr-2010	Rick Macklem <rmacklem@FreeBSD.org>	MFC: r207170 An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs during the grace period after startup. This grace period must be at least the lease duration, which is typically 1-2 minutes. It seems prudent for the experimental NFS client to wait a few seconds before retrying such an RPC, so that the server isn't flooded with non-recovery RPCs during recovery. This patch adds an argument to nfs_catnap() to implement a 5 second delay for this case.
# 23f929df	24-Apr-2010	Rick Macklem <rmacklem@FreeBSD.org>	An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs during the grace period after startup. This grace period must be at least the lease duration, which is typically 1-2 minutes. It seems prudent for the experimental NFS client to wait a few seconds before retrying such an RPC, so that the server isn't flooded with non-recovery RPCs during recovery. This patch adds an argument to nfs_catnap() to implement a 5 second delay for this case. MFC after: 1 week
# 066adacf	13-Apr-2010	Rick Macklem <rmacklem@FreeBSD.org>	MFC: r205941 This patch should fix handling of byte range locks locally on the server for the experimental nfs server. When enabled by setting vfs.newnfs.locallocks_enable to non-zero, the experimental nfs server will now acquire byte range locks on the file on behalf of NFSv4 clients, such that lock conflicts between the NFSv4 clients and processes running locally on the server, will be recognized and handled correctly.
# a43fcbe3	30-Mar-2010	Rick Macklem <rmacklem@FreeBSD.org>	This patch should fix handling of byte range locks locally on the server for the experimental nfs server. When enabled by setting vfs.newnfs.locallocks_enable to non-zero, the experimental nfs server will now acquire byte range locks on the file on behalf of NFSv4 clients, such that lock conflicts between the NFSv4 clients and processes running locally on the server, will be recognized and handled correctly. MFC after: 2 weeks
# 3ed680d8	19-Feb-2010	Rick Macklem <rmacklem@FreeBSD.org>	MFC: r203849 Change the default value for vfs.newnfs.enable_locallocks to 0 for the experimental NFS server, since local locking is known to be broken and the patch to fix it is still a work in progress.
# 9c360f22	13-Feb-2010	Rick Macklem <rmacklem@FreeBSD.org>	Change the default value for vfs.newnfs.enable_locallocks to 0 for the experimental NFS server, since local locking is known to be broken and the patch to fix it is still a work in progress. MFC after: 5 days
# 8c271f67	17-Jan-2010	Rick Macklem <rmacklem@FreeBSD.org>	MFC: r201442 The test for "same client" for the experimental nfs server over NFSv4 was broken w.r.t. byte range lock conflicts when it was the same client and the request used the open_to_lock_owner4 case, since lckstp->ls_clp was not set. This patch fixes it by using "clp" instead of "lckstp->ls_clp".
# 39682934	03-Jan-2010	Rick Macklem <rmacklem@FreeBSD.org>	The test for "same client" for the experimental nfs server over NFSv4 was broken w.r.t. byte range lock conflicts when it was the same client and the request used the open_to_lock_owner4 case, since lckstp->ls_clp was not set. This patch fixes it by using "clp" instead of "lckstp->ls_clp". MFC after: 2 weeks
# 838d9858	19-Jun-2009	Brooks Davis <brooks@FreeBSD.org>	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
# 47b7dc99	16-Jun-2009	Rick Macklem <rmacklem@FreeBSD.org>	Remove the "int *" typecast for the aresid argument to vn_rdwr() and change the type of the argument from size_t to int. This should avoid issues on 64bit architectures. Suggested by: kib Approved by: kib (mentor)
# ed41a7cc	22-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Change the printf of r192595 to identify the function, as requested by Sam. Approved by: kib (mentor)
# 47617400	22-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Modified the printf message of r192590 to remove the possible DOS attack, as suggested by Sam. Approved by: kib (mentor)
# 8757104e	22-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Change the comment at the beginning of the function to reflect the change from panic() to printf() done by r192588.
# 199685bc	22-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Change the reboot panic that would have occurred if clientid numbers wrapped around to a printf() warning of a possible DOS attack, in the experimental nfsv4 server. Approved by: kib (mentor)
# bac9ff34	21-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Fix the comment at line 3711 to be consistent with the change applied for r192537. Approved by: kib (mentor)
# 29e890f1	20-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Although it should never happen, all the nfsv4 server can do when it runs out of clientids is reboot. I had replaced cpu_reboot() with printf(), since cpu_reboot() doesn't exist for sparc64. This change replaces the printf() with panic(), so the reboot would occur for this highly unlikely occurrence. Approved by: kib (mentor)
# 15e8331f	15-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Fixed the Null callback RPCs so that they work with the new krpc. This required two changes: setting the program and version numbers before connect and fixing the handling of the Null Rpc case in newnfs_request(). Approved by: kib (mentor)
# 9ec7b004	04-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Add the experimental nfs subtree to the kernel, that includes support for NFSv4 as well as NFSv2 and 3. It lives in 3 subdirs under sys/fs: nfs - functions that are common to the client and server nfsclient - a mutation of sys/nfsclient that call generic functions to do RPCs and handle state. As such, it retains the buffer cache handling characteristics and vnode semantics that are found in sys/nfsclient, for the most part. nfsserver - the server. It includes a DRC designed specifically for NFSv4, that is used instead of the generic DRC in sys/rpc. The build glue will be checked in later, so at this point, it consists of 3 new subdirs that should not affect kernel building. Approved by: kib (mentor)