History log of /freebsd-current/sys/fs/nfs/nfs_commonkrpc.c
Revision	Date	Author	Comments
# a5581308	26-Dec-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix handling of expired Kerberos credentials (NFSv4.1/4.2) If the NFS server detects that the Kerberos credentials provided by a NFSv4.1/4.2 mount using sec=krb5[ip] have expired, the NFS server replies with a krpc layer error of RPC_AUTHERROR. When this happened, the client erroneously left the NFSv4.1/4.2 session slot busy, so that it could not be used by other RPCs. If this happened for all session slots, the mount point would hang. This patch fixes the problem by releasing the session slot and resetting its sequence# upon receiving a RPC_AUTHERROR reply. This bug only affects NFSv4.1/4.2 mounts using sec=krb5[ip], but has existed since NFSv4.1 client support was added to FreeBSD. So, why has the bug remained undetected for so long? I cannot be sure, but I suspect that, often, the client detected the Kerberos credential expiration before attempting the RPC. For this case, the client would not do the RPC and, as such, there would be no busy session slot. Also, no hang would occur until all session slots are busied (64 for a FreeBSD client/server), so many cases of the bug probably went undetected. Also, use of sec=krb5[ip] mounts is not that common. PR: 275905 Tested by: Lexi <lexi.freebsd@le-fay.org> MFC after: 1 week
# dd7d42a1	23-Oct-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl/kgssapi: Fix Kerberized NFS mounts to pNFS servers During recent testing related to the IETF NFSv4 Bakeathon, it was discovered that Kerberized NFSv4.1/4.2 mounts to pNFS servers (sec=krb5[ip],pnfs mount options) were broken. The FreeBSD client was using the "service principal" for the MDS to try and establish a rpcsec_gss credential for a DS, which is incorrect. (A "service principal" looks like "nfs@<fqdn-of-server>" and the <fqdn-of-server> for the DS is not the same as the MDS for most pNFS servers.) To fix this, the rpcsec_gss code needs to be able to do a reverse DNS lookup of the DS's IP address. A new kgssapi upcall to the gssd(8) daemon is added by this patch to do the reverse DNS along with a new rpcsec_gss function to generate the "service principal". A separate patch to the gssd(8) will be committed, so that this patch will fix the problem. Without the gssd(8) patch, the new upcall fails and current/incorrect behaviour remains. This bug only affects the rare case of a Kerberized (sec=krb5[ip],pnfs) mount using pNFS. This patch changes the internal KAPI between the kgssapi and nfscl modules, but since I did a version bump a few days ago, I will not do one this time. MFC after: 1 month
# c4e29825	18-Oct-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Handle the NFSERR_RETRYUNCACHEDREP error from a NFSv4 server In a recent email list discussion related to NFSv4 mount problems against a non-FreeBSD NFSv4 server, the reporter of the issue noted that the server had replied 10068 (NFSERR_RETRYUNCACHEDREP). This did not seem related to the mount problem, but I had never seen this error before. It indicates that an RPC retry after a new TCP connection has been established failed because the server did not cache the reply. Since this should only happen for idempotent operations, redoing the RPC should be safe. This patch modifies the NFSv4.1/4.2 client to redo the RPC instead of considering the server error fatal. It should only affect the unusual case where TCP connections to NFSv4 servers are breaking without the NFSv4 server rebooting. Reported by: J David <j.devid.lists@gmail.com> MFC after: 2 weeks
# db7257ef	17-Oct-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Fix a server crash PR#274346 reports a crash which appears to be caused by a NULL default session being destroyed. This patch should avoid the crash. Tested by: Joshua Kinard <freebsd@kumba.dev> PR: 274346 MFC after: 2 weeks
# 685dc743	16-Aug-2023	Warner Losh <imp@FreeBSD.org>	sys: Remove $FreeBSD$: one-line .c pattern Remove /^[\s]__FBSDID$"\$FreeBSD\$"$;?\s*\n/
# 896516e5	16-Mar-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add a new NFSv4.1/4.2 mount option for Kerberized mounts Without this patch, a Kerberized NFSv4.1/4.2 mount must provide a Kerberos credential for the client at mount time. This credential is typically referred to as a "machine credential". It can be created one of two ways: - The user (usually root) has a valid TGT at the time the mount is done and this becomes the machine credential. There are two problems with this. 1 - The user doing the mount must have a valid TGT for a user principal at mount time. As such, the mount cannot be put in fstab(5) or similar. 2 - When the TGT expires, the mount breaks. - The client machine has a service principal in its default keytab file and this service principal (typically called a host-based initiator credential) is used as the machine credential. There are problems with this approach as well: 1 - There is a certain amount of administrative overhead creating the service principal for the NFS client, creating a keytab entry for this principal and then copying the keytab entry into the client's default keytab file via some secure means. 2 - The NFS client must have a fixed, well known, DNS name, since that FQDN is in the service principal name as the instance. This patch uses a feature of NFSv4.1/4.2 called SP4_NONE, which allows the state maintenance operations to be performed by any authentication mechanism, to do these operations via AUTH_SYS instead of RPCSEC_GSS (Kerberos). As such, neither of the above mechanisms is needed. It is hoped that this option will encourage adoption of Kerberized NFS mounts using TLS, to provide a more secure NFS mount. This new NFSv4.1/4.2 mount option, called "syskrb5" must be used with "sec=krb5[ip]" to avoid the need for either of the above Kerberos setups to be done by the client. Note that all file access/modification operations still require users on the NFS client to have a valid TGT recognized by the NFSv4.1/4.2 server. 
As such, this option allows, at most, a malicious client to do some sort of DOS attack. Although not required, use of "tls" with this new option is encouraged, since it provides on-the-wire encryption plus, optionally, client identity verification via a X.509 certificate provided to the server during TLS handshake. Alternately, "sec=krb5p" does provide on-the-wire encryption of file data. A mount_nfs(8) man page update will be done in a separate commit. Discussed on: freebsd-current@ MFC after: 3 months
# a63b5d48	16-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfscommon: Revert use of nfsstatsv1_p in nfs_commonkrpc.c Commit 9d329bbc9aea converted a lot of accesses to nfsstatsv1 to use nfsstatsv1_p instead. However, the accesses in nfs_commonkrpc.c are for client side and should not be converted. This patch puts them back in the correct pre-commit 9d329bbc9aea form. MFC after: 3 months
# b039ca07	15-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Wrap nfsstatsv1_p in the NFSD_VNET() macro Commit 7344856e3a6d added a lot of macros that will front end vnet macros so that nfsd(8) can run in vnet prison. The nfsstatsv1_p variable got missed. This patch wraps all uses of nfsstatsv1_p with the NFSD_VNET() macro. The NFSD_VNET() macro is still a null macro. MFC after: 3 months
# 9d329bbc	13-Feb-2023	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Continue adding macros so nfsd can run in a vnet prison Commit 7344856e3a6d added a lot of macros that will front end vnet macros so that nfsd(8) can run in vnet prison. This patch adds some more of them and also a lot of uses of nfsstatsv1_p instead of nfsstatsv1. nfsstatsv1_p points to nfsstatsv1 for prison0, but will point to a malloc'd structure for other prisons. It also puts nfsstatsv1_p in nfscommon.ko instead of nfsd.ko. MFC after: 3 months
# 0685c73c	28-Aug-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add a console message for session recovery The NFSv4.1/4.2 client does recovery when it receives a NFSERR_BADSESSION reply from the server. If the server has not rebooted, this is often caused by multiple clients using the same /etc/hostid and, as such, not being recognized as different clients by the server. This trivial patch adds a console message to suggest that client's /etc/hostid's need to be checked for uniqueness. MFC after: 2 weeks
# fb29f817	27-Aug-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION When the NFSv4.1/4.2 client is handling a server error of NFSERR_BADSESSION, it retries RPCs with a new session. Without this patch, the nd_slotid was not being updated for the new session. This would result in a bogus console message like "Wrong session srvslot=X slot=Y" and then it would free the incorrect slot, often generating a "freeing free slot!!" console message as well. This patch fixes the problem. Note that FreeBSD NFSv4.1/4.2 servers only generate a NFSERR_BADSESSION error after a reboot or after a client does a DestroySession operation. PR: 260011 MFC after: 1 week
# f2dfe607	27-Aug-2022	Rick Macklem <rmacklem@FreeBSD.org>	Revert "nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION" Revert this commit, since I now have a better fix to commit. This reverts commit 8e59ec29e47f6ec64f54ddd88cab388ae536f0ff.
# 8e59ec29	25-Aug-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix handling of nd_slotid while handling NFSERR_BADSESSION When the NFSv4.1/4.2 client is handling a server error of NFSERR_BADSESSION, it retries RPCs with a new session. Without this patch, the nd_slotid was not being updated for the new session. This would result in a bogus console message like "Wrong session srvslot=X slot=Y" and then it would free the incorrect slot, often generating a "freeing free slot!!" console message as well. This patch fixes the problem. Note that FreeBSD NFSv4.1/4.2 servers only generate a NFSERR_BADSESSION error after a reboot or after a client does a DestroySession operation. PR: 260011 MFC after: 1 week
# 2b612c9d	25-Aug-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix handling of a bad session slot (NFSv4.1/4.2) When a session has been marked defunct by the server sending a NFSERR_BADSESSION reply to the NFSv4.1/4.2 client, nfsv4_sequencelookup() returns NFSERR_BADSESSION without actually assigning a session slot. Without this patch, newnfs_request() would erroneously free slot 0. This could result in the slot being reused prematurely, but most likely just generated a "freeing free slot!!" console message. This patch fixes the code to not do the erroneous freeing of the slot for this case. PR: 260011 MFC after: 1 week
# 981ef322	10-Jul-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Enable detection of bad session slots To deal with broken session slots caused by the use of the "soft" and/or "intr" mount options, nfsv4_sequencelookup() has been modified to track the potentially broken session slots (commit 40ada74ee1da). Then, when all session slots are potentially broken, nfsv4_sequencelookup() does a DestroySession operation, so that the NFSv4.1/4.2 server will reply NFSERR_BADSESSION to uses of the session. The client will then recover by doing a CreateSession to acquire a new session. This patch adds the code that marks potentially bad slots, so that the above semantics become functional. It has been successfully tested against a FreeBSD NFSv4.1/4.2 server, but does not work against a Linux 5.15 NFSv4.1/4.2 server. (The Linux 5.15 server creates a new session with the same sessionid as the destroyed one and, as such, keeps returning NFSERR_BADSESSION. I believe this is a bug in the Linux server.) However, this should not cause a regression and will make "intr" mounts fairly usable against the NFSv4.1/4.2 servers where it works. PR: 260011 MFC after: 2 weeks
# 40ada74e	09-Jul-2022	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add optional support for slots marked bad This patch adds support for session slots marked bad to nfsv4_sequencelookup(). An additional boolean argument indicates if the check for slots marked bad should be done. The "cred" argument added to nfscl_reqstart() by commit 326bcf9394c7 is now passed into nfsv4_setsequence() so that it can optionally set the boolean argument for nfsv4_sequencelookup(). When optionally enabled, nfsv4_setsequence() will do a DestroySession when all slots are marked bad. Since the code that marks slots bad is not yet committed, this patch should not result in a semantics change. PR: 260011 MFC after: 2 weeks
# 1c15c8c0	27-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Sanity check the Sequence slotid in reply The slotid in the Sequence reply must be the same as in the request. Check that it is the same and log a console message if it is not, plus set it to the correct value. Reported by: rtm@lcs.mit.edu Tested by: rtm@lcs.mit.edu PR: 260071 MFC after: 2 weeks
# 80e5955b	04-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix NFSv4.1/4.2 pnfs mounts using nconnect When a mount with the "pnfs" and "nconnect" options specified does an I/O operation, it erroneously uses a TCP connection to the MDS when it is meant to be a DS operation and, as such, needs to use a TCP connection to the DS. This patch fixes this. When the "pnfs" and "nconnect" options are specified for a NFSv4.1/4.2 mount, there probably should be N connections established to each DS for I/O RPCs. This is a fair amount of work and may be done in a future commit. This problem was found during a recent IETF NFSv4 working group testing event. MFC after: 2 weeks
# ae49051c	03-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Fix forced dismount when "nconnect" is specified When a forced dismount is done and the "nconnect" mount option was used, the additional connections must be closed. This patch does that. Found during a recent IETF NFSv4 working group testing event. MFC after: 2 weeks
# 5a95a6e8	01-Nov-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Use a smaller initial delay time for NFSERR_DELAY For NFS RPCs that receive a NFSERR_DELAY reply, the delay time is initially 1sec and then increases exponentially to NFS_TRYLATERDEL. It was found that this delay time is excessive for some NFSv4 servers, which work well with a 1msec delay. A 1sec delay resulted in very slow performance for Remove and Rename when delegations and pNFS were enabled. This patch decreases the initial delay time to 1msec. Found during a recent IETF NFSv4 working group testing event. MFC after: 2 weeks
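The backoff described in commit 5a95a6e8 can be sketched in user-space C as follows. This is only an illustration, not the kernel code: the function and macro names are hypothetical, the doubling factor is an assumption (the commit says only that the delay "increases exponentially"), and the 15000 msec cap is an assumed stand-in for NFS_TRYLATERDEL.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration; not taken from the FreeBSD sources. */
#define SKETCH_INITIAL_DELAY_MS	1u	/* initial delay named in the commit */
#define SKETCH_MAX_DELAY_MS	15000u	/* assumed cap standing in for NFS_TRYLATERDEL */

/*
 * Return the delay to use after the current one. The delay doubles on
 * each NFSERR_DELAY reply (the factor is an assumption) and is clamped
 * at the cap.
 */
static uint32_t
sketch_next_delay_ms(uint32_t cur_ms)
{
	assert(cur_ms >= SKETCH_INITIAL_DELAY_MS);
	if (cur_ms >= SKETCH_MAX_DELAY_MS / 2)
		return (SKETCH_MAX_DELAY_MS);
	return (cur_ms * 2);
}
```

Under these assumptions a caller would start at SKETCH_INITIAL_DELAY_MS, sleep, retry the RPC, and feed the previous delay back in on each further NFSERR_DELAY reply.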
# 1e0a518d	08-Jul-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Add a Linux compatible "nconnect" mount option Linux has had an "nconnect" NFS mount option for some time. It specifies that N (up to 16) TCP connections are to be created for a mount, instead of just one TCP connection. A discussion on freebsd-net@ indicated that this could improve client<-->server network bandwidth, if either the client or server have one of the following: - multiple network ports aggregated together with lagg/lacp. - a fast NIC that is using multiple queues It does result in using more IP port#s and might increase server peak load for a client. One difference from the Linux implementation is that this implementation uses the first TCP connection for all RPCs composed of small messages and uses the additional TCP connections for RPCs that normally have large messages (Read/Readdir/Write). The Linux implementation spreads all RPCs across all TCP connections in a round robin fashion, whereas this implementation spreads Read/Readdir/Write across the additional TCP connections in a round robin fashion. Reviewed by: markj MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D30970
# c5f4772c	30-Jun-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: Improve "Consider increasing kern.ipc.maxsockbuf" message When the setting of kern.ipc.maxsockbuf is less than what is desired for I/O based on vfs.maxbcachebuf and vfs.nfs.bufpackets, a console message of "Consider increasing kern.ipc.maxsockbuf" is printed. This patch modifies the message to provide a suggested value for kern.ipc.maxsockbuf. Note that the setting is only needed when the NFS rsize/wsize is set to vfs.maxbcachebuf. While here, make nfs_bufpackets global, so that it can be used by a future patch that adds a sysctl to set the NFS server's maximum I/O size. Also, remove "sizeof(u_int32_t)" from the maximum packet length, since NFS_MAXXDR is already an "overestimate" of the actual length. MFC after: 2 weeks
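The connection selection policy described in commit 1e0a518d might look roughly like the sketch below. It is a user-space model, not the kernel implementation: the structure, field, and function names are hypothetical. Small RPCs always go to the first connection, while Read/Readdir/Write rotate over the additional ones.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-mount state; names are illustrative only. */
struct sketch_mount {
	int nconnect;	/* total TCP connections, 1..16 */
	int next_extra;	/* round-robin cursor over connections 1..nconnect-1 */
};

/*
 * Pick a connection index for an RPC. Small RPCs always use connection 0;
 * large ones (Read/Readdir/Write) rotate over the additional connections.
 * With a single connection, everything uses connection 0.
 */
static int
sketch_pick_conn(struct sketch_mount *m, bool large_rpc)
{
	int idx;

	assert(m->nconnect >= 1 && m->nconnect <= 16);
	if (!large_rpc || m->nconnect == 1)
		return (0);
	idx = 1 + m->next_extra;
	m->next_extra = (m->next_extra + 1) % (m->nconnect - 1);
	return (idx);
}
```

With nconnect set to 3 in this model, successive large RPCs would use connections 1, 2, 1, 2 and so on, and connection 0 would carry only the small RPCs.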
# fc0dc940	18-May-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: Reduce the callback timeout to 800msec Recent discussion on the nfsv4@ietf.org mailing list confirmed that an NFSv4 server should reply to an RPC in less than 1second. If an NFSv4 RPC requires a delegation be recalled, the server will attempt a CB_RECALL callback. If the client is not responsive, the RPC reply will be delayed until the callback times out. Without this patch, the timeout is set to 4 seconds (set in ticks, but used as seconds), resulting in the RPC reply taking over 4sec. This patch redefines the constant as being in milliseconds and it implements that for a value of 800msec, to ensure the RPC reply is sent in less than 1second. This patch only affects mounts from clients when delegations are enabled on the server and the client is unresponsive to callbacks. MFC after: 2 weeks
# f5ff282b	26-Apr-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfscl: fix the handling of NFSERR_DELAY for Open/LayoutGet RPCs For a pNFS mount, the NFSv4.1/4.2 client uses compound RPCs that have both Open and LayoutGet operations in them. If the pNFS server were to reply NFSERR_DELAY for one of these compounds, the retry after a delay cannot be handled by newnfs_request(), since there is a reference held on the open state for the Open operation in them. Fix this by adding these RPCs to the "don't do delay here" list in newnfs_request(). This patch is only needed if the mount is using pNFS (the "pnfs" mount option) and probably only matters if the MDS server is issuing delegations as well as pNFS layouts. Found by code inspection. MFC after: 2 weeks
# 87597731	26-Apr-2021	Rick Macklem <rmacklem@FreeBSD.org>	nfsd: fix the slot sequence# when a callback fails Commit 4281bfec3628 patched the server so that the callback session slot would be free'd for reuse when a callback attempt fails. However, this can often result in the sequence# for the session slot to be advanced such that the client end will reply NFSERR_SEQMISORDERED. To avoid the NFSERR_SEQMISORDERED client reply, this patch negates the sequence# advance for the case where the callback has failed. The common case is a failed back channel, where the callback cannot be sent to the client, and not advancing the sequence# is correct for this case. For the uncommon case where the client's reply to the callback is lost, not advancing the sequence# will indicate to the client that the next callback is a retry and not a new callback. But, since the FreeBSD server always sets "csa_cachethis" false in the callback sequence operation, a retry and a new callback should be handled the same way by the client, so this should not matter. Until you have this patch in your NFSv4.1/4.2 server, you should consider avoiding the use of delegations. Even with this patch, interoperation with the Linux NFSv4.1/4.2 client in kernel versions prior to 5.3 can result in frequent 15second delays if delegations are enabled. This occurs because, for kernels prior to 5.3, the Linux client does a TCP reconnect every time it sees multiple concurrent callbacks and then it takes 15seconds to recover the back channel after doing so. MFC after: 2 weeks
# 665b1365	21-Dec-2020	Rick Macklem <rmacklem@FreeBSD.org>	Add a new "tlscertname" NFS mount option. When using NFS-over-TLS, an NFS client can optionally provide an X.509 certificate to the server during the TLS handshake. For some situations, such as different NFS servers or different certificates being mapped to different user credentials on the NFS server, there may be a need for different mounts to provide different certificates. This new mount option called "tlscertname" may be used to specify a non-default certificate be provided. This alternate certificate will be stored in /etc/rpc.tlsclntd in a file with a name based on what is provided by this mount option.
# 586ee69f	01-Sep-2020	Mateusz Guzik <mjg@FreeBSD.org>	fs: clean up empty lines in .c and .h files
# 6e4b6ff8	27-Aug-2020	Rick Macklem <rmacklem@FreeBSD.org>	Add flags to enable NFS over TLS to the NFS client and server. An Internet Draft titled "Towards Remote Procedure Call Encryption By Default" (soon to be an RFC I think) describes how Sun RPC is to use TLS with NFS as a specific application case. Various commits prepared the NFS code to use KERN_TLS, mainly enabling use of ext_pgs mbufs for large RPC messages. r364475 added TLS support to the kernel RPC. This commit (which is the final one for kernel changes required to do NFS over TLS) adds support for three export flags: MNT_EXTLS - Requires a TLS connection. MNT_EXTLSCERT - Requires a TLS connection where the client presents a valid X.509 certificate during TLS handshake. MNT_EXTLSCERTUSER - Requires a TLS connection where the client presents a valid X.509 certificate with "user@domain" in the otherName field of the SubjectAltName during TLS handshake. Without these export options, clients are permitted, but not required, to use TLS. For the client, a new nmount(2) option called "tls" makes the client do a STARTTLS Null RPC and TLS handshake for all TCP connections used for the mount. The CLSET_TLS client control option is used to indicate to the kernel RPC that this should be done. Unless the above export flags or "tls" option is used, semantics should not change for the NFS client nor server. For NFS over TLS to work, the userspace daemons rpctlscd(8) { for client } or rpctlssd(8) daemon { for server } must be running.
# 02511d21	10-Aug-2020	Rick Macklem <rmacklem@FreeBSD.org>	Add an argument to newnfs_connect() that indicates use TLS for the connection. For NFSv4.0, the server creates a server->client TCP connection for callbacks. If the client mount on the server is using TLS, enable TLS for this callback TCP connection. TLS connections from clients will not be supported until the kernel RPC changes are committed. Since this changes the internal ABI between the NFS kernel modules that will require a version bump, delete newnfs_trimtrailing(), which is no longer used. Since LCL_TLSCB is not yet set, these changes should not have any semantic effect at this time.
# e3e7c612	11-Apr-2020	Rick Macklem <rmacklem@FreeBSD.org>	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change. This is the final patch of this series and the macros should now be able to be deleted from the .h files in a future commit.
# 9f6624d3	11-Apr-2020	Rick Macklem <rmacklem@FreeBSD.org>	Replace mbuf macros with the code they would generate in the NFS code. When the code was ported to Mac OS/X, mbuf handling functions were converted to using the Mac OS/X accessor functions. For FreeBSD, they are a simple set of macros in sys/fs/nfs/nfskpiport.h. Since porting to Mac OS/X is no longer a consideration, replacement of these macros with the code generated by them makes the code more readable. When support for external page mbufs is added as needed by the KERN_TLS, the patch becomes simpler if done without the macros. This patch should not result in any semantic change.
# 355b3b7f	26-Mar-2020	Mark Johnston <markj@FreeBSD.org>	Simplify td_ucred handling in newnfs_connect(). No functional change intended. MFC after: 1 week
# 8e1906f7	07-Dec-2019	Rick Macklem <rmacklem@FreeBSD.org>	Fix kernel handling of a NFSERR_MINORVERSMISMATCH NFSv4 server reply. When an NFSv4 server replies NFSERR_MINORVERSMISMATCH, it does not generate a status result for the first operation in the compound. Without this patch, this will result in a bogus EBADXDR error return. Returning EBADXDR is relatively harmless, but a correct reply of NFSERR_MINORVERSMISMATCH is needed by the pNFS client to select the correct minor version to use for a File Layout DS now that there can be NFSv4.2 DS servers. mount_nfs.c still needs to be fixed for this, although how the mount fails is only useful to help sysadmins isolate why a mount fails. Found during testing of the NFSv4.2 client and server. MFC after: 2 weeks
# 02c8dd7d	04-Apr-2019	Rick Macklem <rmacklem@FreeBSD.org>	Revert r320698, since the related userland changes were reverted by r338192. r338192 reverted the changes to nfsuserd so that it could use an AF_LOCAL socket, since it resulted in a vnode locking panic(). Post r338192 nfsuserd daemons use the old AF_INET socket for upcalls and do not use these kernel changes. I left them in for a while, so that nfsuserd daemons built from head sources between r320757 (Jul. 6, 2017) and r338192 (Aug. 22, 2018) would need them by default. This only affects head, since the changes were never MFC'd. I will add an UPDATING entry, since an nfsuserd daemon built from head sources between r320757 and r338192 will not run unless the "-use-udpsock" option is specified. (This command line option is only in the affected revisions of the nfsuserd daemon.) I suspect few will be affected by this, since most who run systems built from head sources (not stable or releases) will have rebuilt their nfsuserd daemon from sources post r338192 (Aug. 22, 2018) This is being reverted in preparation for an update to include AF_INET6 support to the code.
# 93df87f2	07-Aug-2018	Rick Macklem <rmacklem@FreeBSD.org>	Allow newnfs_request() to retry all callback RPCs with an NFSERR_DELAY reply. The code in newnfs_request() retries RPCs that get a reply of NFSERR_DELAY, but exempts certain NFSv4 operations. However, for callback RPCs, there should not be any exemptions at this time. The code would have erroneously exempted the CBRECALL callback, since it has the same operation number as the CLOSE operation. This patch fixes this by checking for a callback RPC (indicated by clp != NULL) and not checking for exempt operations for callbacks. This would have only affected the NFSv4 server when delegations are enabled (they are not enabled by default) and the client replies to CBRECALL with NFSERR_DELAY. This may never actually happen. Spotted during code inspection. MFC after: 2 weeks
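The fix in commit 93df87f2 comes down to checking for a callback RPC before consulting the list of exempt operations, because callback and forward operation numbers overlap. The sketch below models only that ordering. The names are hypothetical, the exempt list is reduced to Close alone, and the numeric values are used only to show the collision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative operation numbers; both share one value, as the commit notes. */
#define SKETCH_OP_CLOSE		4
#define SKETCH_OP_CBRECALL	4

/* Stand-in for the callback client; non-NULL means "this is a callback RPC". */
struct sketch_client {
	int unused;
};

/*
 * Decide whether an RPC that got NFSERR_DELAY should be retried here.
 * Callback RPCs are never exempt, so that test must come first; without
 * it, CBRECALL would be mistaken for the exempt Close operation.
 */
static bool
sketch_retry_on_delay(const struct sketch_client *clp, int op)
{
	assert(op >= 0);
	if (clp != NULL)
		return (true);
	/* Forward RPCs: only Close is modeled as exempt in this sketch. */
	return (op != SKETCH_OP_CLOSE);
}
```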
# cecf6c6e	20-Jul-2018	Rick Macklem <rmacklem@FreeBSD.org>	Set CLSET_TIMEOUT on TCP connections to pNFS DSs. Use CLSET_TIMEOUT to set the timeout for connections to DSs instead of specifying a timeout on each RPC. This is done so that SO_SNDTIMEO is set on the TCP socket as well as specifying a time limit when waiting for an RPC reply. Useful if the send queue for the TCP connection has become constipated, due to a failed DS. The choice of lease_duration / 4 is fairly arbitrary, but seems to work ok, with a lower bound of 10sec. For client connections to a DS, set the retry limit to vfs.nfsd.dsretries, which is 2 by default. This patch should only affect pNFS connections to DSs. This patch requires r336542. MFC after: 2 weeks
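The timeout choice described in commit cecf6c6e (lease_duration / 4 with a lower bound of 10sec) can be written as a small helper. This is a sketch under the assumption that the lease duration is held in whole seconds; the function name is hypothetical and the real code passes the result to CLSET_TIMEOUT rather than returning it.

```c
#include <assert.h>

/* Lower bound named in the commit message. */
#define SKETCH_DS_MIN_TIMEOUT_SEC	10

/*
 * Compute the per-connection timeout for a DS: a quarter of the lease
 * duration, but never less than 10 seconds.
 */
static int
sketch_ds_timeout_sec(int lease_duration_sec)
{
	int timeo;

	assert(lease_duration_sec > 0);
	timeo = lease_duration_sec / 4;
	if (timeo < SKETCH_DS_MIN_TIMEOUT_SEC)
		timeo = SKETCH_DS_MIN_TIMEOUT_SEC;
	return (timeo);
}
```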
# ba6cce3a	22-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Fix the handling of NFSv4.1 sessions for "soft" mounts. When a "soft" mount is used for NFSv4.1, an RPC that fails without completing will leave a slot in the NFSv4.1 session in an indeterminate state. As such, all that can be done is free up the slot while making it no longer usable. A "soft" NFSv4.1 mount is not recommended in general, since it will leave Open/Lock state in an indeterminate state. An exception is a pNFS mount of a DS, since there are no Opens/Locks done for them except file creates where loss of the Open state does not matter. The patch also makes connections to DSs soft, so that they will fail when a DS is non-functional or network partitioned, allowing the pNFS MDS to disable the DS for a mirrored configuration. This patch should not affect normal "hard" NFSv4.1 mounts. MFC after: 2 weeks
# 755e4b79	17-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Revert r335263, since it can cause crashes in unusual circumstances. This needs to be fixed in a different way.
# 46d30d3d	16-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Fix NFSv4.1 client side handling of "soft,retrans=2" mounts. Normally "soft,retrans=2" cannot be safely used on NFSv4 mounts, since the RPC can fail and leave the open/lock state in an undefined state. Doing I/O on a pNFS DS is an exception to this, since no open/lock state is maintained on the DS server. It is useful to do "soft,retrans=2" connections to a DS when it is mirrored, so that the client can detect failure of the DS. As such, mounts from the MDS to the DSs should use these mount options when mirroring is enabled. However, the NFSv4.1 client still leaves the session in an undefined state when this happens. This patch fixes the problem by setting the session defunct, so it will no longer be used. The patch also sets "retries=2" on the connections done by the client to a DS, which is the internal equivalent of "soft,retrans=2". The client does not know if the server implements mirroring at connection time, but always doing this should be safe, since it will fall back on doing I/O via the MDS as a proxy when there is a failure doing an I/O RPC to the DS. This patch should not affect non-pNFS client mounts. MFC after: 2 weeks
# 73b1879c	11-Jun-2018	Rick Macklem <rmacklem@FreeBSD.org>	Add a couple of safety belt checks to the NFSv4.1 client related to sessions. There were a couple of cases in newnfs_request() where it assumed that it was an NFSv4.1 mount with a session. This should always be the case when a Sequence operation is in the reply or the server replies NFSERR_BADSESSION. However, if a server was broken and sent an erroneous reply, these safety belt checks should avoid trouble. The one check required a small tweak to nfsmnt_mdssession() so that it returns NULL when there is no session instead of the offset of the field in the structure (0x8 for i386). This patch should have no effect on normal operation of the client. Found by inspection during pNFS server development. MFC after: 2 weeks
# 222daa42	25-Jan-2018	Conrad Meyer <cem@FreeBSD.org>	style: Remove remaining deprecated MALLOC/FREE macros Mechanically replace uses of MALLOC/FREE with appropriate invocations of malloc(9) / free(9) (a series of sed expressions). Something like: * MALLOC(a, b, ... -> a = malloc(... * FREE( -> free( * free((caddr_t) -> free( No functional change. For now, punt on modifying contrib ipfilter code, leaving a definition of the macro in its KMALLOC(). Reported by: jhb Reviewed by: cy, imp, markj, rmacklem Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D14035
# 151ba793	24-Dec-2017	Alexander Kabaev <kan@FreeBSD.org>	Do pass removing some write-only variables from the kernel. This reduces noise when kernel is compiled by newer GCC versions, such as one used by external toolchain ports. Reviewed by: kib, andrew(sys/arm and sys/arm64), emaste(partial), erj(partial) Reviewed by: jhb (sys/dev/pci/* sys/kern/vfs_aio.c and sys/kern/kern_synch.c) Differential Revision: https://reviews.freebsd.org/D10385
# 51369649	20-Nov-2017	Pedro F. Giffuni <pfg@FreeBSD.org>	sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, supersede or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
# 63918d38	10-Oct-2017	Rick Macklem <rmacklem@FreeBSD.org>	Fix forced dismount when a pNFS mount is hung on a DS. When a "pnfs" NFSv4.1 mount is hung because of an unresponsive DS, a forced dismount wouldn't work, because the RPC socket for the DS was not being closed. This patch fixes this. This will only affect "pnfs" mounts where the pNFS server's DS is unresponsive (crashed or network partitioned or...). Found during testing of the pNFS server. MFC after: 2 weeks
# 16f300fa	27-Jul-2017	Rick Macklem <rmacklem@FreeBSD.org>	Replace the checks for MNTK_UNMOUNTF with a macro that does the same thing. This patch defines a macro that checks for MNTK_UNMOUNTF and replaces explicit checks with this macro. It has no effect on semantics, but prepares the code for a future patch where there will also be a NFS specific flag for "forced dismount about to occur". Suggested by: kib MFC after: 2 weeks
# 25d694a6	05-Jul-2017	Rick Macklem <rmacklem@FreeBSD.org>	Add support for AF_LOCAL socket upcalls to the nfsuserd daemon. This patch adds support for AF_LOCAL socket upcalls to an nfsuserd daemon that supports them. A future patch to the nfsuserd daemon will use AF_LOCAL sockets to avoid a problem when using upcalls to 127.0.0.1 if jails are in use. Suggested by: dfr PR: 205193
# ee791357	19-Jun-2017	Rick Macklem <rmacklem@FreeBSD.org>	Add the definition of maxbcachebuf to sys/buf.h. r320070 removed the definition of maxbcachebuf from sys/param.h to fix the build for arm. This patch adds the definition of maxbcachebuf to sys/buf.h, which should be ok, since sys/buf.h is not being included in arm/arm/elf_note.S. Suggested by: kib MFC after: 2 weeks
# 1d9f01b1	17-Jun-2017	Rick Macklem <rmacklem@FreeBSD.org>	Take "extern int maxbcachebuf" out of sys/param.h, since it breaks the arm build. In the arm build, elf_note.S includes sys/param.h and then does an elf macro called ELFNOTE(). Although the compile error doesn't make sense to me, I believe it just means that an "extern ..." can't exist in param.h for this inclusion case. I suspect adding #if !defined(LOCORE) might fix the build, but this commit just takes the definition out. I will ask freebsd-current@ what is the best way to deal with this and do a subsequent commit after that. Reported by: melounmichal@gmail.com
# d1c5e240	17-Jun-2017	Rick Macklem <rmacklem@FreeBSD.org>	Make MAXBCACHEBUF a tunable called vfs.maxbcachebuf. By making MAXBCACHEBUF a tunable, it can be increased to allow for larger read/write data sizes for the NFS client. The tunable is limited to MAXPHYS, which is currently 128K. Making MAXPHYS a tunable or increasing its value is being discussed, since it would be nice to support a read/write data size of 1Mbyte for the NFS client when mounting the AmazonEFS file service. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D10991
# 0596f343	21-Apr-2017	Rick Macklem <rmacklem@FreeBSD.org>	Don't set ND_NOMOREDATA for a failed Setattr operation (NFSv4). The NFSv4 Setattr operation always has reply data even when it fails, so don't set the ND_NOMOREDATA for it. This would only affect unusual cases where Setattr fails and the RPC code wants to parse the rest of the compound. Detected during recent development related to the pNFS server. MFC after: 2 weeks
# 40f8ff48	21-Apr-2017	Rick Macklem <rmacklem@FreeBSD.org>	Don't create a backchannel for a DS connection. An NFSv4.1 client connection to a Data Server (DS) should not have a backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel for this case. Found during recent testing with the pNFS server under development. MFC after: 2 weeks
# fbbd9655	28-Feb-2017	Warner Losh <imp@FreeBSD.org>	Renumber copyright clause 4 Renumber clause 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistence on keeping the same numbering for legal reasons is too pedantic, so give up on that point. Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
# b2fc0141	23-Dec-2016	Rick Macklem <rmacklem@FreeBSD.org>	Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors. For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure that indicates that the server has lost session/open/lock state. However, recent testing by cperciva@ against the AmazonEFS server found several problems with client recovery from this due to it generating this failure frequently. Briefly, the problems fixed are: - If all session slots were in use at the time of the failure, some processes would continue to loop waiting for a slot on the old session forever. - If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION, it would fail the RPC/syscall instead of initiating recovery and then looping to retry the RPC. - If a successful reply to an RPC for an old session wasn't processed until after a new session was created for a NFS4ERR_BAD_SESSION error, it would erroneously update the new session and corrupt it. - The use of the first element of the session list in the nfs mount structure (which is always the current metadata session) was slightly racy. With changes for the above problems it became more racy, so all uses of this head pointer were wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT(). - Although the kernel malloc() usually allocates more bytes than requested and, as such, this wouldn't have caused problems, the allocation of a session structure was 1 byte smaller than it should have been. (Null termination byte for the string not included in byte count.) There are probably still problems with a pNFS data server that fails with NFS4ERR_BAD_SESSION, but I have no server that does this to test against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet. Although this patch is fairly large, it should only affect the handling of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server. Thanks go to cperciva@ for the extensive testing he did to help isolate/fix these problems. 
Reported by: cperciva Tested by: cperciva MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D8745
# 1b819cf2	12-Aug-2016	Rick Macklem <rmacklem@FreeBSD.org>	Update the nfsstats structure to include the changes needed by the patch in D1626 plus changes so that it includes counts for NFSv4.1 (and the draft of NFSv4.2). Also, make all the counts uint64_t and add a vers field at the beginning, so that future revisions can easily be implemented. There is code in place to handle the old version of the nfsstats structure for backwards binary compatibility. Subsequent commits will update nfsstat(8) to use the new fields. Submitted by: will (earlier version) Reviewed by: ken MFC after: 1 month Relnotes: yes Differential Revision: https://reviews.freebsd.org/D1626
# 02abd400	19-Apr-2016	Pedro F. Giffuni <pfg@FreeBSD.org>	kernel: use our nitems() macro when it is available through param.h. No functional change; only trivial cases are done in this sweep. Discussed in: freebsd-current
# c15882f0	22-Dec-2014	Rick Macklem <rmacklem@FreeBSD.org>	Remove the old NFS client and server from head, which means that the NFSCLIENT and NFSSERVER kernel options will no longer work. This commit only removes the kernel components. Removal of unused code in the user utilities will be done later. This commit does not include an addition to UPDATING, but that will be committed in a few minutes. Discussed on: freebsd-fs
# c59e4cc3	01-Jul-2014	Rick Macklem <rmacklem@FreeBSD.org>	Merge the NFSv4.1 server code in projects/nfsv4.1-server over into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server. MFC after: 1 month
# 54366c0b	25-Nov-2013	Attilio Rao <attilio@FreeBSD.org>	- For kernel compiled only with KDTRACE_HOOKS and not any lock debugging option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
# cc085ba8	03-Nov-2013	Rick Macklem <rmacklem@FreeBSD.org>	During code inspection, I spotted that there was a code path where CLNT_CONTROL() would be called on "client" after it was released via CLNT_RELEASE(). It was unlikely that this code path gets executed and I have not heard of any problem report caused by this bug. This patch fixes the code so that this cannot happen. MFC after: 2 months
# 88a2437a	08-Jul-2013	Rick Macklem <rmacklem@FreeBSD.org>	Add support for host-based (Kerberos 5 service principal) initiator credentials to the kernel rpc. Modify the NFSv4 client to add support for the gssname and allgssname mount options to use this capability. Requires the gssd daemon to be running with the "-h" option. Reviewed by: jhb
# d96b98a3	17-Apr-2013	Kenneth D. Merry <ken@FreeBSD.org>	Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. 
This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. 
sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. 
Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
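The read-proximity change described in the d96b98a3 entry above can be illustrated with a small user-space sketch. This is not the code from nfs_fha.c: the function names same_bucket_old() and within_window_new() are hypothetical, the strict "less than" comparison is an assumption, and only bin_shift and the "within 1 << bin_shift of each other" rule are taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Old behaviour as described in the commit message: two offsets are
 * "close" only if a simple shift puts them in the same power-of-2 bucket.
 * (Hypothetical helper name, for illustration only.)
 */
static bool
same_bucket_old(uint64_t a, uint64_t b, unsigned bin_shift)
{
	return ((a >> bin_shift) == (b >> bin_shift));
}

/*
 * New behaviour as described: two offsets are "close" if they lie within
 * 1 << bin_shift of each other, regardless of bucket boundaries.
 * The strict '<' is an assumption; the commit message does not say
 * whether the boundary itself is included.
 */
static bool
within_window_new(uint64_t a, uint64_t b, unsigned bin_shift)
{
	uint64_t dist = (a > b) ? a - b : b - a;

	return (dist < ((uint64_t)1 << bin_shift));
}
```

With bin_shift set to 22 (4MB), offsets 4MB - 4KB and 4MB fall into different buckets under the old test, which would cause the thread transition the message describes, but they count as close under the rolling-window test.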
# 593efaf9	21-Feb-2013	John Baldwin <jhb@FreeBSD.org>	Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk() resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_() and VOP_() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than a MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET(). A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount. PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month
# a120a7a3	06-Feb-2013	John Baldwin <jhb@FreeBSD.org>	Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed). This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks. Reviewed by: kib MFC after: 1 month
# a89a2c8b	25-Jan-2013	John Baldwin <jhb@FreeBSD.org>	Further cleanups to use of timestamps in NFS: - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime(). Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks
# 6910d7a0	15-Jan-2013	John Baldwin <jhb@FreeBSD.org>	- More properly handle interrupted NFS requests on an interruptible mount by returning an error of EINTR rather than EACCES. - While here, bring back some (but not all) of the NFS RPC statistics lost when krpc was committed. Reviewed by: rmacklem MFC after: 1 week
# 1f60bfd8	08-Dec-2012	Rick Macklem <rmacklem@FreeBSD.org>	Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detected during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.
# 23b35663	19-Jan-2012	Rick Macklem <rmacklem@FreeBSD.org>	Martin Cracauer reported a problem to freebsd-current@ under the subject "Data corruption over NFS in -current". During investigation of this, I came across an ugly bogusity in the new NFS client where it replaced the cr_uid with the one used for the mount. This was done so that "system operations" like the NFSv4 Renew would be performed as the user that did the mount. However, if any other thread shares the credential with the one doing this operation, it could do an RPC (or just about anything else) as the wrong cr_uid. This patch fixes the above, by using the mount credentials instead of the one provided as an argument for this case. It appears to have fixed Martin's problem. This patch is needed for NFSv4 mounts and NFSv3 mounts against some non-FreeBSD servers that do not put post operation attributes in the NFSv3 Statfs RPC reply. Tested by: Martin Cracauer (cracauer at cons.org) Reviewed by: jhb MFC after: 2 weeks
# f7258644	07-Jan-2012	Rick Macklem <rmacklem@FreeBSD.org>	opt_inet6.h was missing from some files in the new NFS subsystem. The effect of this was, for clients mounted via inet6 addresses, that the DRC cache would never have a hit in the server. It also broke NFSv4 callbacks when an inet6 address was the only one available in the client. This patch fixes the above, plus deletes opt_inet6.h from a couple of files it is not needed for. MFC after: 2 weeks
# 713f46ac	20-Dec-2011	Rick Macklem <rmacklem@FreeBSD.org>	jwd@ reported a problem via email where the old NFS client would get a reply of EEXIST from an NFS server when a Mkdir RPC was retried, for an NFS over UDP mount. Upon investigation, it was found that the client was retransmitting the Mkdir RPC request over UDP, but with a different xid. As such, the retransmitted message would miss the Duplicate Request Cache in the server, causing it to reply EEXIST. The kernel client side UDP rpc code has two timers. The first one causes a retransmit using the same xid and socket and was set to a fixed value of 3 seconds. (The default can be overridden via CLSET_RETRY_TIMEOUT.) The second one creates a new socket and xid and should be larger than the first. However, both NFS clients were setting the second timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to 1 second, so the first timer would never time out. This patch fixes both NFS clients so that they set the first timer using nm_timeo and makes the second timer larger than the first one. Reported by: jwd Tested by: jwd Reviewed by: jhb MFC after: 2 weeks
# 6a536cee	16-Jul-2011	Rick Macklem <rmacklem@FreeBSD.org>	The new NFSv4 client handled NFSERR_GRACE as a fatal error for the remove and rename operations. Some NFSv4 servers will report NFSERR_GRACE for these operations. This patch changes the behaviour of the client so that it handles NFSERR_GRACE like NFSERR_DELAY for non-state related operations like remove and rename. It also exempts the delegreturn operation from handling within newnfs_request() for NFSERR_DELAY/NFSERR_GRACE so that it can handle NFSERR_GRACE in the same manner as before. This problem was resolved thanks to discussion with bfields at fieldses.org. The problem was identified at the recent NFSv4 interoperability bakeathon. MFC after: 2 weeks
# a9285ae5	16-Jul-2011	Zack Kirsch <zack@FreeBSD.org>	Add DEXITCODE plumbing to NFS. Isilon has the concept of an in-memory exit-code ring that saves the last exit code of a function and allows for stack tracing. This is very helpful when debugging tough issues. This patch is essentially a no-op for BSD at this point, until we upstream the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS function that returns an errno error code. A number of code paths were also reorganized to have single exit paths, to reduce code duplication. Submitted by: David Kwan <dkwan@isilon.com> Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
# 7bb55def	22-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Plug an mbuf leak in the new NFS client that occurred when a server replied NFS3ERR_JUKEBOX/NFS4ERR_DELAY to an rpc. This affected both NFSv3 and NFSv4. Found during testing at the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks
# 72b7c8dd	22-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the new NFSv4 client so that it uses the same uid as was used for doing a mount when performing system operations on AUTH_SYS mounts. This resolved an issue when mounting a Linux server. Found during testing at the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks
# 7e7fd7d1	19-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson. Reviewed by: jhb MFC after: 2 weeks
# 8f0e65c9	18-Jun-2011	Rick Macklem <rmacklem@FreeBSD.org>	Add DTrace support to the new NFS client. This is essentially cloned from the old NFS client, plus additions for NFSv4. A review of this code is in progress, however it was felt by the reviewer that it could go in now, before code slush. Any changes required by the review can be committed as bug fixes later.
# 1f376590	15-May-2011	Rick Macklem <rmacklem@FreeBSD.org>	Change the sysctl naming for the old and new NFS clients to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes the default nfs client use vfs.nfs.xxx after r221124.
# ebd9ef33	17-Apr-2011	Rick Macklem <rmacklem@FreeBSD.org>	Get rid of the "nfscl: consider increasing kern.ipc.maxsockbuf" message that was generated when doing experimental NFS client mounts. I put that message in because the krpc would hang with the default size for mounts that used large rsize/wsize values. Since the bug that caused these hangs was fixed by r213756, I think the message is no longer needed. MFC after: 2 weeks
# a7d5f7eb	19-Oct-2010	Jamie Gritton <jamie@FreeBSD.org>	A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
# 44066061	16-May-2010	Rick Macklem <rmacklem@FreeBSD.org>	MFC: r207764 Patch the experimental NFS client so that it works for NFSv2 by adding the necessary mapping from NFSv3 procedure numbers to NFSv2 procedure numbers when doing NFSv2 RPCs.
# 23d9efa7	07-May-2010	Rick Macklem <rmacklem@FreeBSD.org>	Patch the experimental NFS client so that it works for NFSv2 by adding the necessary mapping from NFSv3 procedure numbers to NFSv2 procedure numbers when doing NFSv2 RPCs. MFC after: 1 week
# 227b9ebe	30-Apr-2010	Rick Macklem <rmacklem@FreeBSD.org>	MFC: r207170 An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs during the grace period after startup. This grace period must be at least the lease duration, which is typically 1-2 minutes. It seems prudent for the experimental NFS client to wait a few seconds before retrying such an RPC, so that the server isn't flooded with non-recovery RPCs during recovery. This patch adds an argument to nfs_catnap() to implement a 5 second delay for this case.
# 23f929df	24-Apr-2010	Rick Macklem <rmacklem@FreeBSD.org>	An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs during the grace period after startup. This grace period must be at least the lease duration, which is typically 1-2 minutes. It seems prudent for the experimental NFS client to wait a few seconds before retrying such an RPC, so that the server isn't flooded with non-recovery RPCs during recovery. This patch adds an argument to nfs_catnap() to implement a 5 second delay for this case. MFC after: 1 week
# 089f366a	12-Jul-2009	Rick Macklem <rmacklem@FreeBSD.org>	Add calls to the experimental nfs client for the case of an "intr" mount, so that signals that aren't supposed to terminate RPCs in progress are masked off during the RPC. Approved by: re (kensmith), kib (mentor)
# 05c965a2	24-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Crib the realign function out of nfs_krpc.c and add a call to it for the client side reply. Hopefully this fixes the problem with using the new krpc for arm for the experimental nfs client. Approved by: kib (mentor)
# 63bde62e	23-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Fix the experimental nfsv4 client so that it works for the case of a kerberized mount without a host based principal name. This will only work for mounts being done by a user other than root. Support for a host based principal name will not work until proposed changes to the rpcsec_gss part of the krpc are committed. It now builds for "options KGSSAPI". Approved by: kib (mentor)
# e2b84e03	22-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Fix the rpc_gss_secfind() call in nfs_commonkrpc.c so that the code will build when "options KGSSAPI" is specified without requiring the proposed changes that add host based initiator principal support. It will not handle the case where the client uses a host based initiator principal until those changes are committed. The code that uses those changes is #ifdef'd notyet until the krpc rpcsec_changes are committed. Approved by: kib (mentor)
# 86ce6a83	21-May-2009	Robert Watson <rwatson@FreeBSD.org>	Remove the unmaintained University of Michigan NFSv4 client from 8.x prior to 8.0-RELEASE. Rick Macklem's new and more feature-rich NFSv234 client and server are replacing it. Discussed with: rmacklem
# 15e8331f	15-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Fixed the Null callback RPCs so that they work with the new krpc. This required two changes: setting the program and version numbers before connect and fixing the handling of the Null Rpc case in newnfs_request(). Approved by: kib (mentor)
# 9ec7b004	04-May-2009	Rick Macklem <rmacklem@FreeBSD.org>	Add the experimental nfs subtree to the kernel, that includes support for NFSv4 as well as NFSv2 and 3. It lives in 3 subdirs under sys/fs: nfs - functions that are common to the client and server nfsclient - a mutation of sys/nfsclient that call generic functions to do RPCs and handle state. As such, it retains the buffer cache handling characteristics and vnode semantics that are found in sys/nfsclient, for the most part. nfsserver - the server. It includes a DRC designed specifically for NFSv4, that is used instead of the generic DRC in sys/rpc. The build glue will be checked in later, so at this point, it consists of 3 new subdirs that should not affect kernel building. Approved by: kib (mentor)