Cross Reference: /freebsd-10-stable/sys/fs/nfsserver/

History log of /freebsd-10-stable/sys/fs/nfsserver/
Revision	Date	Author	Comments (<<< Hide modified files) (Show modified files >>>)
358035	17-Feb-2020	rmacklem	MFC: r357149 Fix a crash in the NFSv4 server. The PR reported a crash that occurred when a file was removed while client(s) were actively doing lock operations on it. Since nfsvno_getvp() will return NULL when the file does not exist, the bug was obvious and easy to fix via this patch. It is a little surprising that this wasn't found sooner, but I guess the above case rarely occurs. PR: 242768 nfs_nfsdstate.c
347040	03-May-2019	rmacklem	MFC: r346365 Fix the NFSv4.0 server so that it does not support NFSv4.1 attributes. During inspection of a packet trace, I noticed that an NFSv4.0 mount reported that it supported attributes that are only defined for NFSv4.1. In practice, this bug appears to be benign, since NFSv4.0 clients will not use attributes that were added for NFSv4.1. However, this was not correct and this patch fixes the NFSv4.0 server so that it only supports attributes defined for NFSv4.0. It also adds a definition for NFSv4.1 attributes that can only be set, although it is only defined as 0 for now. This is anticipation of the addition of support for the NFSv4.1 mode+mask attribute soon. /freebsd-10-stable/sys/fs/nfs/nfs.h /freebsd-10-stable/sys/fs/nfs/nfs_commonsubs.c /freebsd-10-stable/sys/fs/nfs/nfsproto.h nfs_nfsdport.c
346779	27-Apr-2019	rmacklem	MFC: r346191 Add support for INET6 addresses to the kernel code that dumps open/lock state. PR#223036 reported that INET6 callback addresses were not printed by nfsdumpstate(8). This kernel patch adds INET6 addresses to the dump structure, so that nfsdumpstate(8) can print them out, post-r346190. nfs_nfsdserv.c nfs_nfsdstate.c /freebsd-10-stable/sys/modules/nfsd/Makefile
340184	06-Nov-2018	avg	MFC r339595: nfsrvd_readdirplus: for some errors, do not fail the entire request Sponsored by: Panzura nfs_nfsdport.c
338132	21-Aug-2018	rmacklem	MFC: r336839 Modify the NFSv4.1 server so that it allows ReclaimComplete as done by ESXi 6.7. I believe that a ReclaimComplete with rca_one_fs == TRUE is only to be used after a file system has been transferred to a different file server. However, RFC5661 is somewhat vague w.r.t. this and the ESXi 6.7 client does both a ReclaimComplete with rca_one_fs == TRUE and one with ReclaimComplete with rca_one_fs == FALSE. Therefore, just ignore the rca_one_fs == TRUE operation and return NFS_OK without doing anything instead of replying NFS4ERR_NOTSUPP. This allows the ESXi 6.7 NFSv4.1 client to do a mount. After discussion on the NFSv4 IETF working group mailing list, doing this along with setting a flag to note that a ReclaimComplete with rca_one_fs TRUE was an appropriate way to handle this. The flag that indicates that a ReclaimComplete with rca_one_fs == TRUE was done may be used to disable replies of NFS4ERR_GRACE for non-reclaim state operations in a future commit. This patch along with r332790, r334492 and r336357 allow ESXi 6.7 NFSv4.1 mounts work ok. ESX 6.5 NFSv4.1 mounts do not work well, due to what I believe are violations of RFC-5661 and should not be used. /freebsd-10-stable/sys/fs/nfs/nfs.h /freebsd-10-stable/sys/fs/nfs/nfs_var.h nfs_nfsdserv.c nfs_nfsdstate.c
337058	01-Aug-2018	rmacklem	MFC: r336357 Modify the reasons for not issuing a delegation in the NFSv4.1 server. The ESXi NFSv4.1 client will generate warning messages when the reason for not issuing a delegation is two. Two refers to a resource limit and I do not see why it would be considered invalid. However it probably was not the best choice of reason for not issuing a delegation. This patch changes the reasons used to ones that the ESXi client doesn't complain about. This change does not affect the FreeBSD client and does not appear to affect behaviour of the Linux NFSv4.1 client. RFC5661 defines these "reasons" but does not give any guidance w.r.t. which ones are more appropriate to return to a client. /freebsd-10-stable/sys/fs/nfs/nfsproto.h nfs_nfsdserv.c nfs_nfsdstate.c
337006	31-Jul-2018	rmacklem	MFC: r336215 Ignore the cookie verifier for NFSv4.1 when the cookie is 0. RFC5661 states that the cookie verifier should be 0 when the cookie is 0. However, the wording is somewhat unclear and a recent discussion on the nfsv4@ietf.org mailing list indicated that the NFSv4 server should ignore the cookie verifier's value when the dirctory offset cookie is 0. This patch deletes the check for this that would return NFSERR_BAD_COOKIE when the verifier was not 0. This was found during testing of the ESXi client against the NFSv4.1 server. nfs_nfsdport.c
336846	28-Jul-2018	rmacklem	MFC: r334492 Add the BindConnectiontoSession operation to the NFSv4.1 server. Under some fairly unusual circumstances, the Linux NFSv4.1 client is doing a BindConnectiontoSession operation for TCP connections. It is also used by the ESXi6.5 NFSv4.1 client. This patch adds this operation to the NFSv4.1 server. PR: 226493 /freebsd-10-stable/sys/fs/nfs/nfs.h /freebsd-10-stable/sys/fs/nfs/nfs_commonsubs.c /freebsd-10-stable/sys/fs/nfs/nfs_var.h /freebsd-10-stable/sys/fs/nfs/nfsproto.h nfs_nfsdserv.c nfs_nfsdsocket.c nfs_nfsdstate.c
336518	19-Jul-2018	rmacklem	MFC: r333766 Add a missing nfsrv_freesession() call for an unlikely failure case. Since NFSv4.1 clients normally create a single session which supports both fore and back channels, it is unlikely that a callback will fail due to a lack of a back channel. However, if this failure occurred, the session wasn't being dereferenced and would never be free'd. Found by inspection during pNFS server development. nfs_nfsdstate.c
336422	17-Jul-2018	rmacklem	MFC: r333645 End grace for the NFSv4 server if all mounts do ReclaimComplete. The NFSv4 protocol requires that the server only allow reclaim of state and not issue any new open/lock state for a grace period after booting. The NFSv4.0 protocol required this grace period to be greater than the lease duration (over 2minutes). For NFSv4.1, the client tells the server that it has done reclaiming state by doing a ReclaimComplete operation. If all NFSv4 clients are NFSv4.1, the grace period can end once all the clients have done ReclaimComplete, shortening the time period considerably. This patch does this. If there are any NFSv4.0 mounts, the grace period will still be over 2minutes. This change is only an optimization and does not affect correct operation. /freebsd-10-stable/sys/fs/nfs/nfsport.h nfs_nfsdstate.c
336234	12-Jul-2018	rmacklem	MFC: r333579 The NFSv4.1 server should return NFSERR_BACKCHANBUSY instead of NFS_OK. When an NFSv4.1 session is busy due to a callback being in progress, nfsrv_freesession() should return NFSERR_BACKCHANBUSY instead of NFS_OK. The only effect this has is that the DestroySession operation will report the failure for this case and this probably has little or no effect on a client. Spotted by inspection and no failures related to this have been reported. nfs_nfsdstate.c
336179	10-Jul-2018	rmacklem	MFC: r333508 Add support for the TestStateID operation to the NFSv4.1 server. The Linux client now uses the TestStateID operation, so this patch adds support for it to the NFSv4.1 server. The FreeBSD client never uses this operation, so it should not be affected. /freebsd-10-stable/sys/fs/nfs/nfs_var.h nfs_nfsdserv.c nfs_nfsdsocket.c nfs_nfsdstate.c
334741	06-Jun-2018	rmacklem	MFC: r333580 Fix a slow leak of session structures in the NFSv4.1 server. For a fairly rare case of a client doing an ExchangeID after a hard reboot, the old confirmed clientid still exists, but some clients use a new co_verifier. For this case, the server was not freeing up the sessions on the old confirmed clientid. This patch fixes this case. It also adds two LIST_INIT() macros, which are actually no-ops, since the structure is malloc()d with M_ZERO so the pointer is already set to NULL. It should have minimal impact, since the only way I could exercise this code path was by doing a hard power cycle (pulling the plus) on a machine running Linux with a NFSv4.1 mount on the server. Originally spotted during testing of the ESXi 6.5 client. PR: 228497 nfs_nfsdstate.c
334699	06-Jun-2018	rmacklem	MFC: r334396 Strengthen locking for the NFSv4.1 server DestroySession operation. If a client did a DestroySession on a session while it was still in use, the server might try to use the session structure after it is free'd. I think the client has violated RFC5661 if it does this, but this patch makes DestroySession block all other nfsd threads so no thread could be using the session when it is free'd. After the DestroySession, nfsd threads will not be able to find the session. The patch also adds a check for nd_sessionid being set, although if that was not the case it would have been all 0s and unlikely to have a false match. This might fix the crashes described in PR#228497 for the FreeNAS server. PR: 228497 nfs_nfsdstate.c
334633	04-Jun-2018	rmacklem	MFC: r333592 Fix the eir_server_scope reply argument for NFSv4.1 ExchangeID. In the reply to an ExchangeID operation, the NFSv4.1 server returns a "scope" value (eir_server_scope). If this value is the same, it indicates that two servers share state, which is never the case for FreeBSD servers. As such, the value needs to be unique and it was without this patch. However, I just found out that it is not supposed to change when the server reboots and without this patch, it did change. This patch fixes eir_server_scope so that it does not change when the server is rebooted. The only affect not having this patch has is that Linux clients don't reclaim opens and locks after a server reboot, which meant they lost any byte range locks held before the server rebooted. It only affects NFSv4.1 mounts and the FreeBSD NFSv4.1 client was not affected by this bug. nfs_nfsdserv.c
327008	19-Dec-2017	rmacklem	MFC: r326544 Avoid the overhead of acquiring a lock in nfsrv_checkgetattr() when there are no write delegations issued. manu@ reported on the freebsd-current@ mailing list that there was a significant performance hit in nfsrv_checkgetattr() caused by the acquisition/release of a state lock, even when there were no write delegations issued. This patch add a count of outstanding issued write delegations to the NFSv4 server. This count allows nfsrv_checkgetattr() to return without acquiring any lock when the count is 0, avoiding the performance hit for the case where no write delegations are issued. nfs_nfsdstate.c
325448	05-Nov-2017	rmacklem	MFC: r324639 Fix the client IP address reported by nfsdumpstate for 64bit arch and NFSv4.1. The client IP address was not being reported for some NFSv4 mounts by nfsdumpstate. Upon investigation, two problems were found for mounts using IPv4. One was that the code (originally written and tested on i386) assumed that a "u_long" was a "uint32_t" and would exactly store an IPv4 host address. Not correct for 64bit arches. Also, for NFSv4.1 mounts, the field was not being filled in. This was basically correct, because NFSv4.1 does not use a callback address. However, it meant that nfsdumpstate could not report the client IP addr. This patch should fix both of these issues. For IPv6, the address will still not be reported. The original NFSv4 RFC only specified IPv4 callback addresses. I think this has changed and, if so, a future commit to fix reporting of IPv6 addresses will be needed. nfs_nfsdserv.c nfs_nfsdstate.c
324544	11-Oct-2017	rmacklem	MFC: r323978 Change a panic to an error return. There was a panic() in the NFS server's write operation that didn't need to be a panic() and could just be an error return. This patch makes that change. Found by code inspection during development of the pNFS service. nfs_nfsdserv.c
322138	07-Aug-2017	mav	MFC r321794: Improve FHA locality control for NFS read/write requests. This change adds two new tunables, allowing to control serialization for read and write NFS requests separately. It does not change the default behavior since there are too many factors to consider, but gives additional space for further experiments and tuning. The main motivation for this change is very low write speed in case of ZFS with sync=always or when NFS clients requests sychronous operation, when every separate request has to be written/flushed to ZIL, and requests are processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows to increase ZIL throughput by several times by coalescing writes and cache flushes. There is a worry that doing it may increase data fragmentation on disks, but I suppose it should not happen for pool with SLOG. Sponsored by: iXsystems, Inc. nfs_fha_new.c /freebsd-10-stable/sys/nfs/nfs_fha.c /freebsd-10-stable/sys/nfs/nfs_fha.h
317986	08-May-2017	rmacklem	MFC: r317382 Allow use of a write open stateid for reading in the NFSv4 server. The NFSv4 RFCs give a server the option of allowing the use of an open stateid for write access to be used for a Read operation. This patch enables this by default and adds a sysctl to disable it, for anyone who does not want this capability. Allowing this is particularily useful for a pNFS Data Server (DS), since they are not permitted to allow the use of special stateids. Discovered during recent testing of the pNFS server under development. nfs_nfsdstate.c
317914	07-May-2017	rmacklem	MFC: r317236 Fix the setting of atime for Linux client NFSv4 mounts. The FreeBSD NFSv4 server did not set the attribute bit for TimeAccess in the reply to an Open with exclusive_create, as required by the RFCs. (This is required since the FreeBSD NFS server stores the create_verifier in the va_atime attribute.) As such, the Linux NFSv4 client did not set the TimeAccess (atime) in the Setattr done in an RPC after the one with the Open/exclusive_create. This patch fixes the server to set the TimeAccess bit in the reply. I believe that storing the create_verifier in an extended attribute for file systems that support extended attributes might be a good idea, but I will wait for a discussion of this on the freebsd-fs@ email list before considering committing a patch to do this. nfs_nfsdport.c
314034	21-Feb-2017	avg	MFC r313735: add svcpool_close to handle killed nfsd threads PR: 204340 Reported by: Panzura Reviewed by: rmacklem Approved by: rmacklem nfs_nfsdkrpc.c /freebsd-10-stable/sys/rpc/svc.c /freebsd-10-stable/sys/rpc/svc.h
312421	19-Jan-2017	jpaetzel	MFC 311122 Workaround NFS bug with readdirplus when there are greater than 1 billion files in a filesystem. Reviewed by: kib nfs_nfsdport.c
310303	19-Dec-2016	rmacklem	MFC: r309566 Fix the NFSv4.1 server for Open reclaim after a reboot. The NFSv4.1 server failed to update the nfs-stablerestart file for a client when the client was issued its first Open. As such, recovery of Opens after a server reboot failed with NFSERR_NOGRACE. This patch fixes this. It also changes the code so that it malloc()'s the 1024 byte array instead of allocating it on the kernel stack for both NFSv4.0 and NFSv4.1. Note that this bug only affected NFSv4.1 and only when clients attempted to reclaim Opens after a server reboot. nfs_nfsdstate.c
308241	03-Nov-2016	rmacklem	MFC: r307694 A problem w.r.t. interoperation between the FreeBSD NFSv4.1 server with delegations enabled and the Linux NFSv4.1 client was reported in reviews.freebsd.org/D7891. I believe that the FreeBSD server behaviour conforms to the RFC and that the Linux client has a bug. Therefore, I do not think the proposed patch is appropriate. When nfsrv_writedelegifpos is non-zero, the FreeBSD server will issue a write delegation for a read open if possible. The Linux client then erroneously assumes that the credentials used for the read open can write the file. This patch reverses the default value for nfsrv_writedelegifpos to 0 so that the default behaviour is Linux compatible and adds a sysctl that can be used to set nfsrv_writedelegifpos. This change should only affect users that are mounting a FreeBSD server with delegations enabled (they are not enabled by default) with a Linux NFSv4.1 client mount. nfs_nfsdstate.c
306663	03-Oct-2016	rmacklem	Revert r306659 since the userland changes won't merge and this would break the build. /freebsd-10-stable/sys/fs/nfs/nfs_commonkrpc.c /freebsd-10-stable/sys/fs/nfs/nfs_commonport.c /freebsd-10-stable/sys/fs/nfs/nfsport.h /freebsd-10-stable/sys/fs/nfs/nfsproto.h /freebsd-10-stable/sys/fs/nfsclient/nfs_clbio.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clcomsubs.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clstate.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clsubs.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvfsops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvnops.c nfs_nfsdcache.c nfs_nfsdport.c nfs_nfsdsocket.c nfs_nfsdstate.c
306659	03-Oct-2016	rmacklem	MFC: r304026 Update the nfsstats structure to include the changes needed by the patch in D1626 plus changes so that it includes counts for NFSv4.1 (and the draft of NFSv4.2). Also, make all the counts uint64_t and add a vers field at the beginning, so that future revisions can easily be implemented. There is code in place to handle the old vesion of the nfsstats structure for backwards binary compatibility. Subsequent commits will update nfsstat(8) to use the new fields. /freebsd-10-stable/sys/fs/nfs/nfs_commonkrpc.c /freebsd-10-stable/sys/fs/nfs/nfs_commonport.c /freebsd-10-stable/sys/fs/nfs/nfsport.h /freebsd-10-stable/sys/fs/nfs/nfsproto.h /freebsd-10-stable/sys/fs/nfsclient/nfs_clbio.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clcomsubs.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clstate.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clsubs.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvfsops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvnops.c nfs_nfsdcache.c nfs_nfsdport.c nfs_nfsdsocket.c nfs_nfsdstate.c
300778	26-May-2016	rmacklem	MFC: r299514 Fix use-after-free in NFS4 lock test service. Trivial use-after-free where stp was freed too soon in the non-error path. To fix, simply move its release to the end of the routine. nfs_nfsdserv.c
300379	21-May-2016	rmacklem	MFC: r299226 Don't increment srvrpccnt[] for the NFSv4.1 operations. When support for NFSv4.1 was added to the NFS server, it broke the server rpc count stats, since newnfsstats.srvrpccnt[] doesn't have entries for the new NFSv4.1 operations. Without this patch, the code was incrementing bogus entries in newnfsstats for the new NFSv4.1 operations. This patch is an interim fix. The nfsstats structure needs to be updated and that will come in a future commit. nfs_nfsdsocket.c
300254	20-May-2016	rmacklem	MFC: r299201 Give mountd -S priority over outstanding RPC requests when suspending the nfsd. It was reported via email that under certain heavy RPC loads long delays before the exports would be updated was observed when using "mountd -S". This patch reverses the priority between the exclusive lock request to suspend the nfsd threads and the shared lock request for performing RPCs. As such, when mountd attempts to suspend the nfsd threads, it gets priority over outstanding RPC requests to do this. I suspect that the case reported was an artificial test load, but this patch did fix the problem for the reporter. nfs_nfsdkrpc.c
299223	07-May-2016	rmacklem	MFC: r298523 Allow the NFSv4 server to reply NFSERR_WRONGSEC for the SetClientID operation. It was reported via email that a Linux client couldn't do a Kerberized NFS mount when only "sec=krb5" was specified for the exports. The Linux client attempted a mount via krb5i and the server replied NFSERR_SERVERFAULT. Although NFSERR_WRONGSEC isn't listed as an error for SetClientID, I think it is the correct reply, so this patch enables that. I do not know if this fixes the mount attempt, but adding "krb5i" to the list of allowed security flavours does allow the mount to work. nfs_nfsdsubs.c
299222	07-May-2016	rmacklem	MFC: r298495 Fix a LOR in the NFSv4.1 server. The ordering of acquisition of the state and session mutexes was reversed in two cases executed when an NFSv4.1 client created/freed a session. Since clients will typically do this only when mounting and dismounting, the likelyhood of causing a deadlock was low but possible. This can only occur for NFSv4.1 mounts, since the others do not use sessions. This was detected while testing the pNFS server/client where the client crashed during dismounting. The patch also reorders the unlocks, although that isn't necessary for correct operation. /freebsd-10-stable/sys/fs/nfs/nfsrvstate.h nfs_nfsdstate.c
299208	07-May-2016	rmacklem	MFC: r297869 If the VOP_SETATTR() call that saves the exclusive create verifier failed, the NFS server would leave the newly created vnode locked. This could result in a file system that would not unmount and processes wedged, waiting for the file to be unlocked. Since this VOP_SETATTR() never fails for most file systems, this bug doesn't normally manifest itself. I found it during testing of an exported GlusterFS file system, which can fail. This patch adds the vput() and changes the error to the correct NFS one. nfs_nfsdport.c
292223	14-Dec-2015	rmacklem	MFC: r291527 Add kernel support to the NFS server for the "-manage-gids" option that will be added to the nfsuserd daemon in a future commit. It modifies the cache used by NFSv4 for name<-->id translation (both username/uid and group/gid) to support this. When "-manage-gids" is set, the server looks up each uid for the RPC and uses the list of groups cached in the server instead of the list of groups provided in the RPC request. The cached group list is acquired for the cache by the nfsuserd daemon via getgrouplist(3). This avoids the 16 groups limit for the list in the RPC request. Since the cache is now used for every RPC when "-manage-gids" is enabled, the code also modifies the cache to use a separate mutex for each hash list instead of a single global mutex. /freebsd-10-stable/sys/fs/nfs/nfs.h /freebsd-10-stable/sys/fs/nfs/nfs_commonport.c /freebsd-10-stable/sys/fs/nfs/nfs_commonsubs.c /freebsd-10-stable/sys/fs/nfs/nfs_var.h /freebsd-10-stable/sys/fs/nfs/nfsrvstate.h nfs_nfsdport.c /freebsd-10-stable/sys/nfs/nfssvc.h
291869	05-Dec-2015	rmacklem	MFC: r291150 When the nfsd threads are terminated, the NFSv4 server state (opens, locks, etc) is retained, which I believe is correct behaviour. However, for NFSv4.1, the server also retained a reference to the xprt (RPC transport socket structure) for the backchannel. This caused svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed a socket upcall to occur after the mutexes in the svcpool were destroyed, causing a crash. This patch fixes the code so that the backchannel xprt structure is dereferenced just before svcpool_destroy() is called, so the code does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall. /freebsd-10-stable/sys/fs/nfs/nfs_var.h nfs_nfsdkrpc.c nfs_nfsdstate.c
287267	28-Aug-2015	rmacklem	MFC: r286790 For the case where an NFSv4.1 ExchangeID operation has the client identifier that already has a confirmed ClientID, the nfsrv_setclient() function would not fill in the clientidp being returned. As such, the value of ClientID returned would be whatever garbage was on the stack. This patch fixes the problem by filling in these fields. nfs_nfsdstate.c
286164	01-Aug-2015	rmacklem	MFC: r286046 This patch fixes a problem where, if the NFSv4 server has a previous unconfirmed clientid structure for the same client on the last hash list, this old entry would not be removed/deleted. I do not think this bug would have caused serious problems, since the new entry would have been before the old one on the list. This old entry would have eventually been scavenged/removed. nfs_nfsdstate.c
284334	12-Jun-2015	rmacklem	MFC: r283753 Make the NFS server use shared vnode locks for a few cases that are allowed by the VFS/VOP interface instead of using exclusive locks. nfs_nfsdsocket.c
284318	12-Jun-2015	rmacklem	Add TUNABLE_INT() macros so that the tunables MFC'd from head as r284216 are set via /boot/loader.conf in stable/10. This is a direct commit to stable/10 because TUNABLE_INT() is deprecated in head. nfs_nfsdstate.c
284216	10-Jun-2015	rmacklem	MFC: r283635 Make the size of the hash tables used by the NFSv4 server tunable. No appreciable change in performance was observed after increasing the sizes of these tables and then testing with a single client. However, there was an email that indicated high CPU overheads for a heavily loaded NFSv4 and it is hoped that increasing the sizes of the hash tables via these tunables might help. The tables remain the same size by default. /freebsd-10-stable/sys/fs/nfs/nfs.h /freebsd-10-stable/sys/fs/nfs/nfsdport.h /freebsd-10-stable/sys/fs/nfs/nfsrvstate.h nfs_nfsdport.c nfs_nfsdserv.c nfs_nfsdsocket.c nfs_nfsdstate.c nfs_nfsdsubs.c
284199	10-Jun-2015	kib	MFC r283600: Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup while owning vnode lock. On MFC, for KBI stability, td_su member was moved to the end of the struct thread. nfs_nfsdkrpc.c /freebsd-10-stable/sys/kern/kern_exit.c /freebsd-10-stable/sys/kern/kern_kthread.c /freebsd-10-stable/sys/kern/kern_thr.c /freebsd-10-stable/sys/kern/subr_trap.c /freebsd-10-stable/sys/sys/proc.h /freebsd-10-stable/sys/sys/systm.h /freebsd-10-stable/sys/ufs/ffs/ffs_softdep.c
282340	02-May-2015	rmacklem	MFC: r281962 Fix the NFS server's handling of a bogus NFSv2 ROOT RPC. The ROOT RPC is deprecated in the NFSv2 RFC, RFC-1094 and should never be used by a client. nfs_nfsdkrpc.c
282271	30-Apr-2015	rmacklem	MFC: r281628 mav@ has found that NFS servers exporting ZFS file systems can perform better when using a 128K read/write data size. This patch changes NFS_MAXDATA from 64K to 128K so that clients can use 128K for NFS mounts to allow this. The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so that it is clear that it applies to the NFS server side only. It also avoids a name conflict with the NFS_MAXDATA defined in rpcsvc/nfs_prot.h, that is used for userland RPC. /freebsd-10-stable/sys/fs/nfs/nfs.h /freebsd-10-stable/sys/fs/nfs/nfs_commonport.c /freebsd-10-stable/sys/fs/nfs/nfsproto.h nfs_nfsdserv.c
282270	30-Apr-2015	rmacklem	MFC: r281562 File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache. /freebsd-10-stable/sys/fs/ext2fs/ext2_vfsops.c /freebsd-10-stable/sys/fs/fuse/fuse_vfsops.c /freebsd-10-stable/sys/fs/msdosfs/msdosfs_vfsops.c /freebsd-10-stable/sys/fs/nandfs/nandfs_vfsops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvfsops.c nfs_nfsdport.c /freebsd-10-stable/sys/fs/nullfs/null_vfsops.c /freebsd-10-stable/sys/kern/vfs_subr.c /freebsd-10-stable/sys/sys/mount.h /freebsd-10-stable/sys/ufs/ffs/ffs_vfsops.c
280258	19-Mar-2015	rwatson	Merge r263233 from HEAD to stable/10: Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. Sponsored by: Google, Inc. /freebsd-10-stable/sys/amd64/amd64/sys_machdep.c /freebsd-10-stable/sys/amd64/linux32/linux32_machdep.c /freebsd-10-stable/sys/arm/arm/sys_machdep.c /freebsd-10-stable/sys/cam/ctl/ctl_frontend_iscsi.c /freebsd-10-stable/sys/cddl/compat/opensolaris/sys/file.h /freebsd-10-stable/sys/compat/freebsd32/freebsd32_capability.c /freebsd-10-stable/sys/compat/freebsd32/freebsd32_ioctl.c /freebsd-10-stable/sys/compat/freebsd32/freebsd32_misc.c /freebsd-10-stable/sys/compat/linux/linux_file.c /freebsd-10-stable/sys/compat/linux/linux_ioctl.c /freebsd-10-stable/sys/compat/linux/linux_socket.c /freebsd-10-stable/sys/compat/svr4/svr4_fcntl.c /freebsd-10-stable/sys/compat/svr4/svr4_filio.c /freebsd-10-stable/sys/compat/svr4/svr4_ioctl.c /freebsd-10-stable/sys/compat/svr4/svr4_misc.c /freebsd-10-stable/sys/compat/svr4/svr4_stream.c /freebsd-10-stable/sys/dev/aac/aac_linux.c /freebsd-10-stable/sys/dev/aacraid/aacraid_linux.c /freebsd-10-stable/sys/dev/amr/amr_linux.c /freebsd-10-stable/sys/dev/filemon/filemon.c /freebsd-10-stable/sys/dev/hwpmc/hwpmc_logging.c /freebsd-10-stable/sys/dev/ipmi/ipmi_linux.c /freebsd-10-stable/sys/dev/iscsi/icl.c /freebsd-10-stable/sys/dev/iscsi/icl_proxy.c /freebsd-10-stable/sys/dev/iscsi_initiator/iscsi.c /freebsd-10-stable/sys/dev/mfi/mfi_linux.c /freebsd-10-stable/sys/dev/tdfx/tdfx_linux.c /freebsd-10-stable/sys/fs/fdescfs/fdesc_vnops.c /freebsd-10-stable/sys/fs/fuse/fuse_vfsops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clport.c nfs_nfsdport.c /freebsd-10-stable/sys/i386/i386/sys_machdep.c /freebsd-10-stable/sys/i386/ibcs2/ibcs2_fcntl.c /freebsd-10-stable/sys/i386/ibcs2/ibcs2_ioctl.c /freebsd-10-stable/sys/i386/ibcs2/ibcs2_misc.c /freebsd-10-stable/sys/i386/linux/linux_machdep.c /freebsd-10-stable/sys/kern/imgact_elf.c /freebsd-10-stable/sys/kern/kern_descrip.c /freebsd-10-stable/sys/kern/kern_event.c /freebsd-10-stable/sys/kern/kern_exec.c /freebsd-10-stable/sys/kern/kern_exit.c /freebsd-10-stable/sys/kern/kern_ktrace.c /freebsd-10-stable/sys/kern/kern_sig.c /freebsd-10-stable/sys/kern/kern_sysctl.c /freebsd-10-stable/sys/kern/subr_capability.c /freebsd-10-stable/sys/kern/subr_syscall.c /freebsd-10-stable/sys/kern/subr_trap.c /freebsd-10-stable/sys/kern/sys_capability.c /freebsd-10-stable/sys/kern/sys_generic.c /freebsd-10-stable/sys/kern/sys_procdesc.c /freebsd-10-stable/sys/kern/tty.c /freebsd-10-stable/sys/kern/uipc_mqueue.c /freebsd-10-stable/sys/kern/uipc_sem.c /freebsd-10-stable/sys/kern/uipc_shm.c /freebsd-10-stable/sys/kern/uipc_syscalls.c /freebsd-10-stable/sys/kern/uipc_usrreq.c /freebsd-10-stable/sys/kern/vfs_acl.c /freebsd-10-stable/sys/kern/vfs_aio.c /freebsd-10-stable/sys/kern/vfs_extattr.c /freebsd-10-stable/sys/kern/vfs_lookup.c /freebsd-10-stable/sys/kern/vfs_syscalls.c /freebsd-10-stable/sys/netsmb/smb_dev.c /freebsd-10-stable/sys/nfsserver/nfs_srvkrpc.c /freebsd-10-stable/sys/security/mac/mac_syscalls.c /freebsd-10-stable/sys/sparc64/sparc64/sys_machdep.c /freebsd-10-stable/sys/ufs/ffs/ffs_alloc.c /freebsd-10-stable/sys/vm/vm_mmap.c
276500	01-Jan-2015	kib	MFC r275897: Set NOCACHE flag for CREATE namei() calls, do not specially handle MAKEENTRY in VOP_LOOKUP(). /freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c /freebsd-10-stable/sys/fs/ext2fs/ext2_lookup.c /freebsd-10-stable/sys/fs/fuse/fuse_vnops.c /freebsd-10-stable/sys/fs/msdosfs/msdosfs_lookup.c /freebsd-10-stable/sys/fs/nandfs/nandfs_vnops.c /freebsd-10-stable/sys/fs/nfsclient/nfs_clvnops.c nfs_nfsdserv.c /freebsd-10-stable/sys/fs/tmpfs/tmpfs_vnops.c /freebsd-10-stable/sys/fs/unionfs/union_subr.c /freebsd-10-stable/sys/fs/unionfs/union_vnops.c /freebsd-10-stable/sys/kern/uipc_usrreq.c /freebsd-10-stable/sys/kern/vfs_syscalls.c /freebsd-10-stable/sys/kern/vfs_vnops.c /freebsd-10-stable/sys/nfsclient/nfs_vnops.c /freebsd-10-stable/sys/nfsserver/nfs_serv.c /freebsd-10-stable/sys/ufs/ffs/ffs_snapshot.c /freebsd-10-stable/sys/ufs/ufs/ufs_lookup.c
276437	31-Dec-2014	rmacklem	MFC: r276193 A deadlock in the NFSv4 server with vfs.nfsd.enable_locallocks=1 was reported via email. This was caused by a LOR between the sleep lock used to serialize the local locking (nfsrv_locklf()) and locking the vnode. I believe this patch fixes the problem by delaying relocking of the vnode until the sleep lock is unlocked (nfsrv_unlocklf()). To avoid nfsvno_advlock() having the side effect of unlocking the vnode, unlocking the vnode was moved to before the functions that call nfsvno_advlock(). It shouldn't affect the execution of the default case where vfs.nfsd.enable_locallocks=0. nfs_nfsdport.c nfs_nfsdstate.c
274027	03-Nov-2014	kib	MFC r273727: Original commit message was Allow the vfs.nfsd knobs to be set from loader.conf (or using kenv(8)). This is useful when nfsd is loaded as module. As I understand, automatic fetch from kenv does not work in stable/10. Merge the change still, to reduce code difference. nfs_nfsdkrpc.c
273877	31-Oct-2014	araujo	MFC r273159: Add two sysctl(8) to enable/disable NFSv4 server to check when setting user nobody and/or setting group nogroup as owner of a file or directory. Usually at the client side, if there is an username that is not in the client's passwd database, some clients will send 'nobody@<your.dns.domain>' in the wire and the NFSv4 server will treat it as an ERROR. However, if you have a valid user nobody in your passwd database, the NFSv4 server will treat it as a NFSERR_BADOWNER as its believes the client doesn't has the username mapped. Submitted by: Loic Blot <loic.blot@unix-experience.fr> Reviewed by: rmacklem Approved by: rmacklem Sponsored by: QNAP Systems Inc. nfs_nfsdsubs.c
273736	27-Oct-2014	hselasky	MFC r263710, r273377, r273378, r273423 and r273455: - De-vnet hash sizes and hash masks. - Fix multiple issues related to arguments passed to SYSCTL macros. Sponsored by: Mellanox Technologies /freebsd-10-stable/sys/amd64/amd64/fpu.c /freebsd-10-stable/sys/arm/arm/busdma_machdep-v6.c /freebsd-10-stable/sys/arm/arm/busdma_machdep.c /freebsd-10-stable/sys/cam/scsi/scsi_sa.c /freebsd-10-stable/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_znode.c /freebsd-10-stable/sys/cddl/dev/dtrace/dtrace_sysctl.c /freebsd-10-stable/sys/compat/ndis/kern_ndis.c /freebsd-10-stable/sys/dev/acpi_support/acpi_asus.c /freebsd-10-stable/sys/dev/acpi_support/acpi_asus_wmi.c /freebsd-10-stable/sys/dev/acpi_support/acpi_hp.c /freebsd-10-stable/sys/dev/acpi_support/acpi_ibm.c /freebsd-10-stable/sys/dev/acpi_support/acpi_rapidstart.c /freebsd-10-stable/sys/dev/acpi_support/acpi_sony.c /freebsd-10-stable/sys/dev/bxe/bxe.c /freebsd-10-stable/sys/dev/cxgb/cxgb_sge.c /freebsd-10-stable/sys/dev/cxgbe/t4_main.c /freebsd-10-stable/sys/dev/e1000/if_em.c /freebsd-10-stable/sys/dev/e1000/if_igb.c /freebsd-10-stable/sys/dev/e1000/if_lem.c /freebsd-10-stable/sys/dev/hatm/if_hatm.c /freebsd-10-stable/sys/dev/ixgbe/ixgbe.c /freebsd-10-stable/sys/dev/ixgbe/ixv.c /freebsd-10-stable/sys/dev/ixl/if_ixl.c /freebsd-10-stable/sys/dev/mpr/mpr.c /freebsd-10-stable/sys/dev/mps/mps.c /freebsd-10-stable/sys/dev/mrsas/mrsas.c /freebsd-10-stable/sys/dev/mrsas/mrsas.h /freebsd-10-stable/sys/dev/mxge/if_mxge.c /freebsd-10-stable/sys/dev/oce/oce_sysctl.c /freebsd-10-stable/sys/dev/qlxgb/qla_os.c /freebsd-10-stable/sys/dev/qlxgbe/ql_os.c /freebsd-10-stable/sys/dev/rt/if_rt.c /freebsd-10-stable/sys/dev/sound/pci/hda/hdaa.c /freebsd-10-stable/sys/dev/vxge/vxge.c /freebsd-10-stable/sys/dev/xen/netfront/netfront.c /freebsd-10-stable/sys/fs/devfs/devfs_devs.c /freebsd-10-stable/sys/fs/fuse/fuse_main.c /freebsd-10-stable/sys/fs/fuse/fuse_vfsops.c nfs_nfsdkrpc.c /freebsd-10-stable/sys/geom/geom_kern.c /freebsd-10-stable/sys/kern/kern_cpuset.c /freebsd-10-stable/sys/kern/kern_descrip.c /freebsd-10-stable/sys/kern/kern_mib.c /freebsd-10-stable/sys/kern/kern_synch.c /freebsd-10-stable/sys/kern/subr_devstat.c /freebsd-10-stable/sys/kern/subr_kdb.c /freebsd-10-stable/sys/kern/subr_uio.c /freebsd-10-stable/sys/kern/vfs_cache.c /freebsd-10-stable/sys/mips/mips/busdma_machdep.c /freebsd-10-stable/sys/net/if_lagg.c /freebsd-10-stable/sys/net/pfvar.h /freebsd-10-stable/sys/net80211/ieee80211_ht.c /freebsd-10-stable/sys/net80211/ieee80211_hwmp.c /freebsd-10-stable/sys/net80211/ieee80211_mesh.c /freebsd-10-stable/sys/net80211/ieee80211_superg.c /freebsd-10-stable/sys/netgraph/bluetooth/common/ng_bluetooth.c /freebsd-10-stable/sys/netgraph/ng_base.c /freebsd-10-stable/sys/netgraph/ng_socket.c /freebsd-10-stable/sys/netinet/cc/cc_chd.c /freebsd-10-stable/sys/netinet/tcp_reass.c /freebsd-10-stable/sys/netipsec/ipsec.h /freebsd-10-stable/sys/netipx/ipx_proto.c /freebsd-10-stable/sys/netpfil/pf/if_pfsync.c /freebsd-10-stable/sys/netpfil/pf/pf.c /freebsd-10-stable/sys/netpfil/pf/pf_ioctl.c /freebsd-10-stable/sys/ofed/drivers/net/mlx4/mlx4_en.h /freebsd-10-stable/sys/powerpc/powermac/fcu.c /freebsd-10-stable/sys/powerpc/powermac/smu.c /freebsd-10-stable/sys/powerpc/powerpc/busdma_machdep.c /freebsd-10-stable/sys/powerpc/powerpc/cpu.c /freebsd-10-stable/sys/sys/sysctl.h /freebsd-10-stable/sys/vm/memguard.c /freebsd-10-stable/sys/vm/vm_kern.c /freebsd-10-stable/sys/x86/x86/busdma_bounce.c
270066	16-Aug-2014	rmacklem	MFC: r269771 Change the NFS server's printf related to hitting the DRC cache's flood level so that it suggests increasing vfs.nfsd.tcphighwater. nfs_nfsdsocket.c
269655	07-Aug-2014	kib	MFC r269347: Do not generate 1000 unique lock names for nfsrc hash chain locks. Shorten the names of some nfs mutexes. /freebsd-10-stable/sys/fs/nfs/nfsrvcache.h nfs_nfsdport.c
269452	03-Aug-2014	rmacklem	MFC: r268273 The new NFSv3 server did not generate directory postop attributes for the reply to ReaddirPlus when the server failed within the loop that calls VFS_VGET(). This failure is most likely an error return from VFS_VGET() caused by a bogus d_fileno that was truncated to 32bits. This patch fixes the server so that it will return directory postop attributes for the failure. It does not fix the underlying issue caused by d_fileno being uint32_t when a file system like ZFS generates a fileno that is greater than 32bits. nfs_nfsdport.c
269398	01-Aug-2014	rmacklem	MFC: r268115 Merge the NFSv4.1 server code in projects/nfsv4.1-server over into head. The code is not believed to have any effect on the semantics of non-NFSv4.1 server behaviour. It is a rather large merge, but I am hoping that there will not be any regressions for the NFS server. /freebsd-10-stable/sys/conf/files /freebsd-10-stable/sys/fs/nfs/nfs.h /freebsd-10-stable/sys/fs/nfs/nfs_commonkrpc.c /freebsd-10-stable/sys/fs/nfs/nfs_commonport.c /freebsd-10-stable/sys/fs/nfs/nfs_commonsubs.c /freebsd-10-stable/sys/fs/nfs/nfs_var.h /freebsd-10-stable/sys/fs/nfs/nfsclstate.h /freebsd-10-stable/sys/fs/nfs/nfsdport.h /freebsd-10-stable/sys/fs/nfs/nfsport.h /freebsd-10-stable/sys/fs/nfs/nfsproto.h /freebsd-10-stable/sys/fs/nfs/nfsrvstate.h /freebsd-10-stable/sys/fs/nfsclient/nfs_clstate.c nfs_nfsdcache.c nfs_nfsdkrpc.c nfs_nfsdport.c nfs_nfsdserv.c nfs_nfsdsocket.c nfs_nfsdstate.c nfs_nfsdsubs.c /freebsd-10-stable/sys/modules/krpc/Makefile /freebsd-10-stable/sys/rpc/clnt_bck.c /freebsd-10-stable/sys/rpc/krpc.h /freebsd-10-stable/sys/rpc/svc.h /freebsd-10-stable/sys/rpc/svc_vc.c
268961	21-Jul-2014	bdrewery	MFC r268114: Change NFS readdir() to only ignore cookies preceding the given offset for UFS rather than for all but ZFS. nfs_nfsdport.c /freebsd-10-stable/sys/nfsserver/nfs_serv.c
267343	10-Jun-2014	rmacklem	MFC: r267191 The new NFS server would not allow a hard link to be created to a symlink. This restriction (which was inherited from OpenBSD) is not required by the NFS RFCs. Since this is allowed by the old NFS server, it is a POLA violation to not allow it. This patch modifies the new NFS server to allow this. nfs_nfsdserv.c
265714	08-May-2014	rmacklem	MFC: r265252 The new draft specification for NFSv4.0 specifies that a server should either accept owner and owner_group strings that are just the digits of the uid/gid or return NFS4ERR_BADOWNER. This patch adds a sysctl vfs.nfsd.enable_stringtouid, which can be set to enable the server w.r.t. accepting numeric string. It also ensures that NFS4ERR_BADOWNER is returned if numeric uid/gid strings are not enabled. This fixes the server for recent Linux nfs4 clients that use numeric uid/gid strings by default. /freebsd-10-stable/sys/fs/nfs/nfs_commonsubs.c nfs_nfsdport.c
265667	08-May-2014	rmacklem	MFC: r264888 The PR reported that the old NFS server did not set uio_td == NULL for the VOP_READ() call. This patch fixes both the old and new server for this case. nfs_nfsdport.c /freebsd-10-stable/sys/nfsserver/nfs_serv.c
265621	07-May-2014	rmacklem	MFC: r264845 Remove an unnecessary level of indirection for an argument. This simplifies the code and should avoid the clang sparc port from generating an abort() call. nfs_nfsdstate.c
264266	08-Apr-2014	delphij	Fix NFS deadlock vulnerability. [SA-14:05] Fix "Heartbleed" vulnerability and ECDSA Cache Side-channel Attack in OpenSSL. [SA-14:06] /freebsd-10-stable/crypto/openssl/crypto/bn/bn.h /freebsd-10-stable/crypto/openssl/crypto/bn/bn_lib.c /freebsd-10-stable/crypto/openssl/crypto/ec/ec2_mult.c /freebsd-10-stable/crypto/openssl/ssl/d1_both.c /freebsd-10-stable/crypto/openssl/ssl/t1_lib.c nfs_nfsdserv.c
261055	22-Jan-2014	mav	MFC r260229, r260258, r260367, r260390, r260459, r260648: Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping though all the cache, and as result allows to do it every time. - Indroduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would broke some KBIs, while I don't know other consumers for it aside NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc. /freebsd-10-stable/sys/fs/nfs/nfs_var.h /freebsd-10-stable/sys/fs/nfs/nfsrvcache.h nfs_nfsdcache.c nfs_nfsdkrpc.c nfs_nfsdport.c nfs_nfsdsubs.c /freebsd-10-stable/sys/rpc/svc.c /freebsd-10-stable/sys/rpc/svc.h /freebsd-10-stable/sys/rpc/svc_dg.c /freebsd-10-stable/sys/rpc/svc_vc.c
261051	22-Jan-2014	mav	MFC r259877: Slightly simplify expiration logic introduced in r254337. - Do not update the histogram for items we are any way deleting from cache. - Do not update the histogram if nfsrc_tcphighwater is not set. - Remove some extra math operations. nfs_nfsdcache.c
261049	22-Jan-2014	mav	MFC r259765: Fix RPC server threads file handle affinity to work better with ZFS. Instead of taking 8 specific bytes of file handle to identify file during RPC thread affitinity handling, use trivial hash of the full file handle. ZFS's struct zfid_short does not have padding field after the length field, as result, originally picked 8 bytes are loosing lower 16 bits of object ID, causing many false matches and unneeded requests affinity to same thread. This fix substantially improves NFS server latency and scalability in SPEC NFS benchmark by more flexible use of multiple NFS threads. nfs_fha_new.c /freebsd-10-stable/sys/nfs/nfs_fha.c /freebsd-10-stable/sys/nfs/nfs_fha.h /freebsd-10-stable/sys/nfsserver/nfs_fha_old.c
260159	01-Jan-2014	rmacklem	MFC: r259854 The NFSv4 server would call VOP_SETATTR() with a shared locked vnode when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch. /freebsd-10-stable/sys/fs/nfs/nfs_var.h nfs_nfsdport.c nfs_nfsdstate.c
260144	31-Dec-2013	rmacklem	MFC: r259845 An intermittent problem with NFSv4 exporting of ZFS snapshots was reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot file entry instead of VOP_LOOKUP(). This would not occur for NFSv3, since it will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning of the directory, whereas NFSv4 does not check "." or "..". This patch adds a call to VFS_VGET() for the directory being read to check for ENOTSUPP. I also observed that the mount_on_fileid and fsid attributes were not correct at the snapshot's auto mountpoints when looking at packet traces for the Readdir. This patch fixes the attributes by doing a check for different v_mount structure, even if the vnode v_mountedhere is not set. nfs_nfsdport.c
256281	10-Oct-2013	gjb	Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle. Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
255219	05-Sep-2013	pjd	Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
254337	14-Aug-2013	rmacklem	Fix several performance related issues in the new NFS server's DRC for NFS over TCP. - Increase the size of the hash tables. - Create a separate mutex for each hash list of the TCP hash table. - Single thread the code that deletes stale cache entries. - Add a tunable called vfs.nfsd.tcphighwater, which can be increased to allow the cache to grow larger, avoiding the overhead of frequent scans to delete stale cache entries. (The default value will result in frequent scans to delete stale cache entries, analagous to what the pre-patched code does.) - Add a tunable called vfs.nfsd.cachetcp that can be used to disable DRC caching for NFS over TCP, since the old NFS server didn't DRC cache TCP. It also adjusts the size of nfsrc_floodlevel dynamically, so that it is always greater than vfs.nfsd.tcphighwater. For UDP the algorithm remains the same as the pre-patched code, but the tunable vfs.nfsd.udphighwater can be used to allow the cache to grow larger and reduce the overhead caused by frequent scans for stale entries. UDP also uses a larger hash table size than the pre-patched code. Reported by: wollman Tested by: wollman (earlier version of patch) Submitted by: ivoras (earlier patch) Reviewed by: jhb (earlier version of patch) MFC after: 1 month
251171	31-May-2013	jeff	- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
250657	15-May-2013	des	Fix typo in comment. Submitted by: Alex Weber <alexwebr@gmail.com> MFC after: 1 week
250055	29-Apr-2013	des	Fix a bug that allows NFS clients to issue READDIR on files. PR: kern/178016 Security: CVE-2013-3266 Security: FreeBSD-SA-13:05.nfsserver
249596	17-Apr-2013	ken	Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to sys/nfs, since it is now shared by the two NFS servers. Suggested by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
249592	17-Apr-2013	ken	Revamp the old NFS server's File Handle Affinity (FHA) code so that it will work with either the old or new server. The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support. This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching. The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed. This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order. In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code. This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time. I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS. The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another. sys/conf/files: Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built. sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c: Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode. sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h: Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs. sys/fs/nfs/nfsm_subs.h: Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro. sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c: Add the malloc wait flag to a newnfs_realign() call. sys/fs/nfsserver/nfs_nfsdkrpc.c: Setup the new NFS server's RPC thread pool so that it will call the FHA code. Add the malloc flag argument to newnfs_realign(). Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code. sys/fs/nfsserver/nfs_nfsdsocket.c: In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type. sys/fs/nfsserver/nfs_nfsdport.c: In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE. sys/nfsserver/nfs_fha.c: Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc. There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before. In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module. In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread. Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc. Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread. Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage. Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons. nfs_fha.h: Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them. Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread. sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h: Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables. sys/nfsserver/nfs_srvkrpc.c: Configure the RPC code to call fhaold_assign() instead of fha_assign(). sys/modules/nfsd/Makefile: Add nfs_fha.c and nfs_fha_new.c. sys/modules/nfsserver/Makefile: Add nfs_fha_old.c. Reviewed by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
248084	09-Mar-2013	attilio	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. At this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI results heavilly broken by this commit. Thirdy part ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
247602	02-Mar-2013	pjd	Merge Capsicum overhaul: - Capability is no longer separate descriptor type. Now every descriptor has set of its own capability rights. - The cap_new(2) system call is left, but it is no longer documented and should not be used in new code. - The new syscall cap_rights_limit(2) should be used instead of cap_new(2), which limits capability rights of the given descriptor without creating a new one. - The cap_getrights(2) syscall is renamed to cap_rights_get(2). - If CAP_IOCTL capability right is present we can further reduce allowed ioctls list with the new cap_ioctls_limit(2) syscall. List of allowed ioctls can be retrived with cap_ioctls_get(2) syscall. - If CAP_FCNTL capability right is present we can further reduce fcntls that can be used with the new cap_fcntls_limit(2) syscall and retrive them with cap_fcntls_get(2). - To support ioctl and fcntl white-listing the filedesc structure was heavly modified. - The audit subsystem, kdump and procstat tools were updated to recognize new syscalls. - Capability rights were revised and eventhough I tried hard to provide backward API and ABI compatibility there are some incompatible changes that are described in detail below: CAP_CREATE old behaviour: - Allow for openat(2)+O_CREAT. - Allow for linkat(2). - Allow for symlinkat(2). CAP_CREATE new behaviour: - Allow for openat(2)+O_CREAT. Added CAP_LINKAT: - Allow for linkat(2). ABI: Reuses CAP_RMDIR bit. - Allow to be target for renameat(2). Added CAP_SYMLINKAT: - Allow for symlinkat(2). Removed CAP_DELETE. Old behaviour: - Allow for unlinkat(2) when removing non-directory object. - Allow to be source for renameat(2). Removed CAP_RMDIR. Old behaviour: - Allow for unlinkat(2) when removing directory. Added CAP_RENAMEAT: - Required for source directory for the renameat(2) syscall. Added CAP_UNLINKAT (effectively it replaces CAP_DELETE and CAP_RMDIR): - Allow for unlinkat(2) on any object. - Required if target of renameat(2) exists and will be removed by this call. Removed CAP_MAPEXEC. CAP_MMAP old behaviour: - Allow for mmap(2) with any combination of PROT_NONE, PROT_READ and PROT_WRITE. CAP_MMAP new behaviour: - Allow for mmap(2)+PROT_NONE. Added CAP_MMAP_R: - Allow for mmap(PROT_READ). Added CAP_MMAP_W: - Allow for mmap(PROT_WRITE). Added CAP_MMAP_X: - Allow for mmap(PROT_EXEC). Added CAP_MMAP_RW: - Allow for mmap(PROT_READ \| PROT_WRITE). Added CAP_MMAP_RX: - Allow for mmap(PROT_READ \| PROT_EXEC). Added CAP_MMAP_WX: - Allow for mmap(PROT_WRITE \| PROT_EXEC). Added CAP_MMAP_RWX: - Allow for mmap(PROT_READ \| PROT_WRITE \| PROT_EXEC). Renamed CAP_MKDIR to CAP_MKDIRAT. Renamed CAP_MKFIFO to CAP_MKFIFOAT. Renamed CAP_MKNODE to CAP_MKNODEAT. CAP_READ old behaviour: - Allow pread(2). - Disallow read(2), readv(2) (if there is no CAP_SEEK). CAP_READ new behaviour: - Allow read(2), readv(2). - Disallow pread(2) (CAP_SEEK was also required). CAP_WRITE old behaviour: - Allow pwrite(2). - Disallow write(2), writev(2) (if there is no CAP_SEEK). CAP_WRITE new behaviour: - Allow write(2), writev(2). - Disallow pwrite(2) (CAP_SEEK was also required). Added convinient defines: #define CAP_PREAD (CAP_SEEK \| CAP_READ) #define CAP_PWRITE (CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_R (CAP_MMAP \| CAP_SEEK \| CAP_READ) #define CAP_MMAP_W (CAP_MMAP \| CAP_SEEK \| CAP_WRITE) #define CAP_MMAP_X (CAP_MMAP \| CAP_SEEK \| 0x0000000000000008ULL) #define CAP_MMAP_RW (CAP_MMAP_R \| CAP_MMAP_W) #define CAP_MMAP_RX (CAP_MMAP_R \| CAP_MMAP_X) #define CAP_MMAP_WX (CAP_MMAP_W \| CAP_MMAP_X) #define CAP_MMAP_RWX (CAP_MMAP_R \| CAP_MMAP_W \| CAP_MMAP_X) #define CAP_RECV CAP_READ #define CAP_SEND CAP_WRITE #define CAP_SOCK_CLIENT \ (CAP_CONNECT \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| CAP_GETSOCKOPT \| \ CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| CAP_SETSOCKOPT \| CAP_SHUTDOWN) #define CAP_SOCK_SERVER \ (CAP_ACCEPT \| CAP_BIND \| CAP_GETPEERNAME \| CAP_GETSOCKNAME \| \ CAP_GETSOCKOPT \| CAP_LISTEN \| CAP_PEELOFF \| CAP_RECV \| CAP_SEND \| \ CAP_SETSOCKOPT \| CAP_SHUTDOWN) Added defines for backward API compatibility: #define CAP_MAPEXEC CAP_MMAP_X #define CAP_DELETE CAP_UNLINKAT #define CAP_MKDIR CAP_MKDIRAT #define CAP_RMDIR CAP_UNLINKAT #define CAP_MKFIFO CAP_MKFIFOAT #define CAP_MKNOD CAP_MKNODAT #define CAP_SOCK_ALL (CAP_SOCK_CLIENT \| CAP_SOCK_SERVER) Sponsored by: The FreeBSD Foundation Reviewed by: Christoph Mallon <christoph.mallon@gmx.de> Many aspects discussed with: rwatson, benl, jonathan ABI compatibility discussed with: kib
245909	25-Jan-2013	jhb	Further cleanups to use of timestamps in NFS: - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime(). Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks
245613	18-Jan-2013	delphij	Make it possible to force async at server side on new NFS server, similar to the old one's nfs.nfsrv.async. Please note that by enabling this option (default is disabled), the system could potentionally have silent data corruption if the server crashes before write is committed to non-volatile storage, as the client side have no way to tell if the data is already written. Submitted by: rmacklem MFC after: 2 weeks
245611	18-Jan-2013	jhb	Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS. Reviewed by: rmacklem, bde MFC after: 1 week
244042	08-Dec-2012	rmacklem	Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detecting during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.
243882	05-Dec-2012	glebius	Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys. Exceptions: - sys/contrib not touched - sys/mbuf.h edited manually
241896	22-Oct-2012	kib	Remove the support for using non-mpsafe filesystem modules. In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
241561	14-Oct-2012	rmacklem	Add two new options to the nfssvc(2) syscall that allow processes running as root to suspend/resume execution of the kernel nfsd threads. An earlier version of this patch was tested by Vincent Hoffman (vince at unsane.co.uk) and John Hickey (jh at deterlab.net). Reviewed by: kib MFC after: 2 weeks
241025	28-Sep-2012	kib	Fix the mis-handling of the VV_TEXT on the nullfs vnodes. If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
240720	20-Sep-2012	rmacklem	Modify the NFSv4 client so that it can handle owner and owner_group strings that consist entirely of digits, interpreting them as the uid/gid number. This change was needed since new (>= 3.3) Linux servers reply with these strings by default. This change is mandated by the rfc3530bis draft. Reported on freebsd-stable@ under the Subject heading "Problem with Linux >= 3.3 as NFSv4 server" by Norbert Aschendorff on Aug. 20, 2012. Tested by: norbert.aschendorff at yahoo.de Reviewed by: jhb MFC after: 2 weeks
235381	12-May-2012	rmacklem	Fix two cases in the new NFS server where a tsleep() is used, when the code should actually protect the tested variable with a mutex. Since the tsleep()s had a 10sec timeout, the race would have only delayed the allocation of a new clientid for a client. The sleeps will also rarely occur, since having a callback in progress when a client acquires a new clientid, is unlikely. in practice, since having a callback in progress when a fresh clientid is being acquired by a client is unlikely. MFC after: 1 month
235136	08-May-2012	jwd	Use the common api helper routine instead of freeing the namei buffer directly. Approved by: rmacklem (mentor) MFC after: 1 month
234740	27-Apr-2012	rmacklem	Fix a leak of namei lookup path buffers that occurs when a ZFS volume is exported via the new NFS server. The leak occurred because the new NFS server code didn't handle the case where a file system sets the SAVENAME flag in its VOP_LOOKUP() and ZFS does this for the DELETE case. Tested by: Oliver Brandmueller (ob at gruft.de), hrs PR: kern/167266 MFC after: 1 month
234482	20-Apr-2012	mckusick	This change creates a new list of active vnodes associated with a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
232467	03-Mar-2012	rmacklem	The name caching changes of r230394 exposed an intermittent bug in the new NFS server for NFSv4, where it would report ENOENT when the file actually existed on the server. This turned out to be caused by not initializing ni_topdir before calling lookup() and there was a rare case where the value on the stack location assigned to ni_topdir happened to be a pointer to a ".." entry, such that "dp == ndp->ni_topdir" succeeded in lookup(). This patch initializes ni_topdir to fix the problem. MFC after: 5 days
232050	23-Feb-2012	rmacklem	hrs@ reported a panic to freebsd-stable@ under the subject line "panic in 8.3-PRERELEASE" on Feb. 22, 2012. This panic was caused by use of a mix of tsleep() and msleep() calls on the same event in the new NFS server DRC code. It did "mtx_unlock(); tsleep();" in two places, which kib@ noted introduced a slight risk that the wakeup() would occur before the tsleep(), resulting in a 10sec delay before waking up. This patch fixes the problem by replacing "mtx_unlock(); tsleep();" with mtx_sleep(..PDROP..). It also changes a nfsmsleep() call to mtx_sleep() so that the code uses mtx_sleep() consistently within the file. Tested by: hrs (in progress) Reviewed by: jhb MFC after: 5 days
231949	21-Feb-2012	kib	Fix found places where uio_resid is truncated to int. Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
231805	16-Feb-2012	rmacklem	Delete a couple of out of date comments that are no longer true in the new NFS client. Requested by: bde MFC after: 1 week
230100	14-Jan-2012	rmacklem	Tai Horgan reported via email that there were two places in the new NFSv4 server where the code follows the wrong list. Fortunately, for these fairly rare cases, the lc_stateid[] lists are normally empty. This patch fixes the code to follow the correct list. Reported by: tai.horgan at isilon.com Discussed with: zack MFC after: 2 weeks
228560	16-Dec-2011	rmacklem	Patch the new NFS server in a manner analagous to r228520 for the old NFS server, so that it correctly handles a count == 0 argument for Commit. PR: kern/118126 MFC after: 2 weeks
228260	04-Dec-2011	rmacklem	This patch adds a sysctl to the NFSv4 server which optionally disables the check for a UTF-8 compliant file name. Enabling this sysctl results in an NFSv4 server that is non-RFC3530 compliant, therefore it is not enabled by default. However, enabling this sysctl results in NFSv3 compatible behaviour and fixes the problem reported by "dan at sunsaturn.com" to freebsd-current@ on Nov. 14, 2011 under the subject "NFSV4 readlink_stat". Tested by: dan at sunsaturn.com Reviewed by: zack MFC after: 2 weeks
228185	01-Dec-2011	jhb	Enhance the sequential access heuristic used to perform readahead in the NFS server and reuse it for writes as well to allow writes to the backing store to be clustered. - Use a prime number for the size of the heuristic table (1017 is not prime). - Move the logic to locate a heuristic entry from the table and compute the sequential count out of VOP_READ() and into a separate routine. - Use the logic from sequential_heuristic() in vfs_vnops.c to update the seqcount when a sequential access is performed rather than just increasing seqcount by 1. This lets the clustering count ramp up faster. - Allow for some reordering of RPCs and if it is detected leave the current seqcount as-is rather than dropping back to a seqcount of 1. Also, when out of order access is encountered, cut seqcount in half rather than dropping it all the way back to 1 to further aid with reordering. - Fix the new NFS server to properly update the next offset after a successful VOP_READ() so that the readahead actually works. Some of these changes came from an earlier patch by Bjorn Gronwall that was forwarded to me by bde@. Discussed with: bde, rmacklem, fs@ Submitted by: Bjorn Gronwall (1, 4) MFC after: 2 weeks
227809	22-Nov-2011	rmacklem	This patch enables the new/default NFS server's use of shared vnode locking for read, readdir, readlink, getattr and access. It is hoped that this will improve server performance for these operations, since they will no longer be serialized for a given file/vnode.
225617	16-Sep-2011	kmacy	In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
225356	03-Sep-2011	rmacklem	Fix the NFS servers so that they can do a Lookup of "..", which requires that ni_strictrelative be set to 0, post-r224810. Tested by: swills (earlier version), geo dot liaskos at gmail.com Approved by: re (kib)
225049	20-Aug-2011	rmacklem	Fix the NFSv4 server so that it returns NFSERR_SYMLINK when an attempt to do an Open operation on any type of file other than VREG is done. A recent discussion on the IETF working group's mailing list (nfsv4@ietf.org) decided that NFSERR_SYMLINK should be returned for all non-regular files and not just symlinks, so that the Linux client would work correctly. This change does not affect the FreeBSD NFSv4 client and is not believed to have a negative effect on other NFSv4 clients. Reviewed by: zkirsch Approved by: re (kib) MFC after: 2 weeks
224911	16-Aug-2011	jonathan	Fix a merge conflict. r224086 added "goto out"-style error handling to nfssvc_nfsd(), in order to reliably call NFSEXITCODE() before returning. Our Capsicum changes, based on the old "return (error)" model, did not merge nicely. Approved by: re (kib), mentor (rwatson) Sponsored by: Google Inc
224778	11-Aug-2011	rwatson	Second-to-last commit implementing Capsicum capabilities in the FreeBSD kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
224637	03-Aug-2011	zack	Fix an NFS server issue where it was not correctly setting the eof flag when a READ had hit the end of the file. Also, clean up some cruft in the code. Approved by: re (kib) Reviewed by: rmacklem MFC after: 2 weeks
224554	31-Jul-2011	rmacklem	Fix rename in the new NFS server so that it does not require a recursive vnode lock on the directory for the case where the new file name is in the same directory as the old one. The patch handles this as a special case, recognized by the new directory having the same file handle as the old one and just VREF()s the old dir vnode for this case, instead of doing a second VFS_FHTOVP() to get it. This is required so that the server will work for file systems like msdosfs, that do not support recursive vnode locking. This problem was discovered during recent testing by pho@ when exporting an msdosfs file system via the new NFS server. Tested by: pho Reviewed by: zkirsch Approved by: re (kib) MFC after: 2 weeks
224086	16-Jul-2011	zack	Add DEXITCODE plumbing to NFS. Isilon has the concept of an in-memory exit-code ring that saves the last exit code of a function and allows for stack tracing. This is very helpful when debugging tough issues. This patch is essentially a no-op for BSD at this point, until we upstream the dexitcode logic itself. The patch adds DEXITCODE calls to every NFS function that returns an errno error code. A number of code paths were also reorganized to have single exit paths, to reduce code duplication. Submitted by: David Kwan <dkwan@isilon.com> Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224083	16-Jul-2011	zack	Simple find/replace of VOP_ISLOCKED -> NFSVOPISLOCKED. This is done so that NFSVOPISLOCKED can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224082	16-Jul-2011	zack	Simple find/replace of VOP_UNLOCK -> NFSVOPUNLOCK. This is done so that NFSVOPUNLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224081	16-Jul-2011	zack	Simple find/replace of vn_lock -> NFSVOPLOCK. This is done so that NFSVOPLOCK can be modified later to add enhanced logging and assertions. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224080	16-Jul-2011	zack	Remove unnecessary thread pointer from VOPLOCK macros and current users. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224078	16-Jul-2011	zack	Move nfsvno_pathconf to be accessible to sys/fs/nfs; no functionality change. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
224077	16-Jul-2011	zack	Small acl patch to return the aclerror that comes back from nfsrv_dissectacl(). This fixes a problem where ATTRNOTSUPP was being returned instead of BADOWNER. Reviewed by: rmacklem Approved by: zml (mentor) MFC after: 2 weeks
223373	21-Jun-2011	rmacklem	Fix the new NFSv4 server so that it checks for VREAD_ACL when a client does a Getattr for an ACL and not VREAD_ATTRIBUTES. This was found during the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks
223349	20-Jun-2011	rmacklem	Fix the new NFSv4 server so that it only allows Lookup of directories and symbolic links when traversing non-exported file systems. Found during the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks
223348	20-Jun-2011	rmacklem	Fix the new NFSv4 server so that it allows Access and Readlink operations while traversing non-exported file systems. This is required for some non-FreeBSD clients to do NFSv4 mounts. Found during the recent NFSv4 interoperability Bakeathon. MFC after: 2 weeks
223312	19-Jun-2011	rmacklem	Fix a number of places where the new NFS server did not lock the mutex when manipulating rc_flag in the DRC cache. This is believed to fix a hung server that was reported to the freebsd-fs@ list on June 9 under the subject heading "New NFS server stress test hang", where all the threads were waiting for the RC_LOCKED flag to clear. Tested by: jwd at slowblink.com MFC after: 2 weeks
223309	19-Jun-2011	rmacklem	Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson. Reviewed by: jhb MFC after: 2 weeks
222663	04-Jun-2011	rmacklem	Modify the new NFS server so that the NFSv3 Pathconf RPC doesn't return an error when the underlying file system lacks support for any of the four _PC_xxx values used, by falling back to default values. Tested by: avg MFC after: 2 weeks
222389	27-May-2011	rmacklem	Fix the new NFS client so that it handles NFSv4 state correctly during a forced dismount. This required that the exclusive and shared (refcnt) sleep lock functions check for MNTK_UMOUNTF before sleeping, so that they won't block while nfscl_umount() is getting rid of the state. As such, a "struct mount *" argument was added to the locking functions. I believe the only remaining case where a forced dismount can get hung in the kernel is when a thread is already attempting to do a TCP connect to a dead server when the krpc client structure called nr_client is NULL. This will only happen just after a "mount -u" with options that force a new TCP connection is done, so it shouldn't be a problem in practice. MFC after: 2 weeks
222167	22-May-2011	rmacklem	Add a lock flags argument to the VFS_FHTOVP() file system method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
221615	08-May-2011	rmacklem	Change the new NFS server so that it uses vfs.nfsd naming for its sysctls instead of vfs.newnfs. This separates the names from the ones used by the client.
221517	06-May-2011	rmacklem	Change the new NFS server so that it returns 0 when the f_bavail or f_ffree fields of "struct statfs" are negative, since the values that go on the wire are unsigned and will appear to be very large positive values otherwise. This makes the handling of a negative f_bavail compatible with the old/regular NFS server. MFC after: 2 weeks
220648	14-Apr-2011	rmacklem	Fix the experimental NFSv4 server so that it uses VOP_PATHCONF() to determine if a file system supports NFSv4 ACLs. Since VOP_PATHCONF() must be called with a locked vnode, the function is called before nfsvno_fillattr() and the result is passed in as an extra argument. MFC after: 2 weeks
220645	14-Apr-2011	rmacklem	Modify the experimental NFSv4 server so that it handles crossing of server mount points properly. The functions nfsvno_fillattr() and nfsv4_fillattr() were modified to take the extra arguments that are the mount point, a flag to indicate that it is a file system root and the mounted on fileno. The mount point argument needs to be busy when nfsvno_fillattr() is called, since the vp argument is not locked. Reviewed by: kib MFC after: 2 weeks
220546	11-Apr-2011	rmacklem	Vrele ni_startdir in the experimental NFS server for the case of NFSv2 getting an error return from VOP_MKNOD(). Without this patch, the server file system remains busy after an NFSv2 VOP_MKNOD() fails. MFC after: 2 weeks
220530	10-Apr-2011	rmacklem	Add some cleanup code to the module unload operation for the experimental NFS server, so that it doesn't leak memory when unloaded. However, unloading the NFSv4 server is not recommended, since all NFSv4 state will be lost by the unload and clients will have to recover the state after a server reload/restart as if the server crashed/rebooted. MFC after: 2 weeks
220507	10-Apr-2011	rmacklem	Add a VOP_UNLOCK() for the directory, when that is not what VOP_LOOKUP() returned. This fixes a bug in the experimental NFS server for the case where VFS_VGET() fails returning EOPNOTSUPP in the ReaddirPlus RPC, forcing the use of VOP_LOOKUP() instead. MFC after: 2 weeks
219028	25-Feb-2011	netchild	Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/ PMC/SYSV/...). No FreeBSD version bump, the userland application to query the features will be committed last and can serve as an indication of the availablility if needed. Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
218345	05-Feb-2011	alc	Unless "cnt" exceeds MAX_COMMIT_COUNT, nfsrv_commit() and nfsvno_fsync() are incorrectly calling vm_object_page_clean(). They are passing the length of the range rather than the ending offset of the range. Perform the OFF_TO_IDX() conversion in vm_object_page_clean() rather than the callers. Reviewed by: kib MFC after: 3 weeks
217432	14-Jan-2011	rmacklem	Modify the experimental NFSv4 server so that it posts a SIGUSR2 signal to the master nfsd daemon whenever the stable restart file has been modified. This will allow the master nfsd daemon to maintain an up to date backup copy of the file. This is enabled via the nfssvc() syscall, so that older nfsd daemons will not be signaled. Reviewed by: jhb MFC after: 1 week
217336	12-Jan-2011	zack	In the experimental NFS server, when converting an open-owner to a lock-owner, start at sequence id 1 instead of 0, to match up with both Solaris and Linux. Reviewed by: rmacklem Approved by: zml (mentor)
217335	12-Jan-2011	zack	Clean up the experimental NFS server replay cache when the module is unloaded. Reviewed by: rmacklem Approved by: zml (mentor)
217176	09-Jan-2011	rmacklem	Modify readdirplus in the experimental NFS server in a manner analogous to r216633 for the regular server. This change busies the file system so that VFS_VGET() is guaranteed to be using the correct mount point even during a forced dismount attempt. Since nfsd_fhtovp() is not called immediately before readdirplus, the patch is actually a clone of pjd@'s nfs_serv.c.4.patch instead of the one committed in r216633. Reviewed by: kib MFC after: 10 days
217066	06-Jan-2011	rmacklem	Delete the NFS_STARTWRITE() and NFS_ENDWRITE() macros that obscured vn_start_write() and vn_finished_write() for the old OpenBSD port, since most uses have been replaced by the correct calls. MFC after: 12 days
217063	06-Jan-2011	rmacklem	Since the VFS_LOCK_GIANT() code in the experimental NFS server is broken and the major file systems are now all mpsafe, modify the server so that it will only export mpsafe file systems. This was discussed on freebsd-fs@ and removes a fair bit of crufty code. MFC after: 12 days
217023	05-Jan-2011	rmacklem	Modify the experimental NFS server so that it calls vn_start_write() with a non-NULL vp. That way it will find the correct mount point mp and use that mp for the subsequent vn_finished_write() call. Also, it should fail without crashing if the mount point is being forced dismounted because vn_start_write() will set the mp NULL via VOP_GETWRITEMOUNT(). Reviewed by: kib MFC after: 12 days
217017	05-Jan-2011	rmacklem	Fix the experimental NFS server to use vfs_busyfs() instead of vfs_getvfs() so that the mount point is busied for the VFS_FHTOVP() call. This is analagous to r185432 for the regular NFS server. Reviewed by: kib MFC after: 12 days
216931	03-Jan-2011	rmacklem	Fix the nlm so that it no longer depends on the regular nfs client and, as such, can be loaded for the experimental nfs client without the regular client. Reviewed by: jhb MFC after: 2 weeks
216898	03-Jan-2011	rmacklem	Fix the experimental NFS server so that it doesn't leak a reference count on the directory when creating device special files. MFC after: 2 weeks
216897	03-Jan-2011	rmacklem	Modify the experimental NFSv4 server so that the lookup ops return a locked vnode. This ensures that the associated mount point will always be valid for the code that follows the operation. Also add a couple of additional checks for non-error to the other functions that create file objects. MFC after: 2 weeks
216894	02-Jan-2011	rmacklem	Delete some cruft from the experimental NFS server that was only used by the OpenBSD port for its pseudo-fs. MFC after: 2 weeks
216893	02-Jan-2011	rmacklem	Add checks for VI_DOOMED and vn_lock() failures to the experimental NFS server, to handle the case where an exported file system is forced dismounted while an RPC is in progress. Further commits will fix the cases where a mount point is used when the associated vnode isn't locked. Reviewed by: kib MFC after: 2 weeks
216875	01-Jan-2011	rmacklem	Add support for shared vnode locks for the Read operation in the experimental NFSv4 server. Reviewed by: kib MFC after: 2 weeks
216784	28-Dec-2010	rmacklem	Delete the nfsvno_localconflict() function in the experimental NFS server since it is no longer used and is broken. MFC after: 2 weeks
216700	25-Dec-2010	rmacklem	Modify the experimental NFS server so that it uses LK_SHARED for RPC operations when it can. Since VFS_FHTOVP() currently always gets an exclusively locked vnode and is usually called at the beginning of each RPC, the RPCs for a given vnode will still be serialized. As such, passing a lock type argument to VFS_FHTOVP() would be preferable to doing the vn_lock() with LK_DOWNGRADE after the VFS_FHTOVP() call. Reviewed by: kib MFC after: 2 weeks
216693	24-Dec-2010	rmacklem	Add an argument to nfsvno_getattr() in the experimental NFS server, so that it can avoid calling VOP_ISLOCKED() when the vnode is known to be locked. This will allow LK_SHARED to be used for these cases, which happen to be all the cases that can use LK_SHARED. This does not fix any bug, but it reduces the number of calls to VOP_ISLOCKED() and prepares the code so that it can be switched to using LK_SHARED in a future patch. Reviewed by: kib MFC after: 2 weeks
216692	24-Dec-2010	rmacklem	Simplify vnode locking in the expeimental NFS server's readdir functions. In particular, get rid of two bogus VOP_ISLOCKED() calls. Removing the VOP_ISLOCKED() calls is the only actual bug fixed by this patch. Reviewed by: kib MFC after: 2 weeks
216691	24-Dec-2010	rmacklem	Since VOP_READDIR() for ZFS does not return monotonically increasing directory offset cookies, disable the UFS related loop that skips over directory entries at the beginning of the block for the experimental NFS server. This loop is required for UFS since it always returns directory entries starting at the beginning of the block that the requested directory offset is in. In discussion with pjd@ and mckusick@ it seems that this behaviour of UFS should maybe change, with this fix being an interim patch until then. This patch only fixes the experimental server, since pjd@ is working on a patch for the regular server. Discussed with: pjd, mckusick MFC after: 5 days
216510	17-Dec-2010	rmacklem	Fix two vnode locking problems in nfsd_recalldelegation() in the experimental NFSv4 server. The first was a bogus use of VOP_ISLOCKED() in a KASSERT() and the second was the need to lock the vnode for the nfsrv_checkremove() call. Also, delete a "__unused" that was bogus, since the argument is used. Reviewed by: zack.kirsch at isilon.com MFC after: 2 weeks
216330	09-Dec-2010	rmacklem	Disable attempts to establish a callback connection from the experimental NFSv4 server to a NFSv4 client when delegations are not being issued, even if the client advertises a callback path. This avoids a problem where a Linux client advertises a callback path that doesn't work, due to a firewall, and then times out an Open attempt before the FreeBSD server gives up its callback connection attempt. (Suggested by drb at karlov.mff.cuni.cz to fix the Linux client problem that he reported on the fs-stable mailing list.) The server should probably have a 1sec timeout on callback connection attempts when there are no delegations issued to the client, but that patch will require changes to the krpc and this serves as a work around until then. Tested by: drb at karlov.mff.cuni.cz MFC after: 5 days
214255	23-Oct-2010	rmacklem	Modify the experimental NFSv4 server's file handle hash function to use the generic hash32_buf() function. Although adding the bytes seemed sufficient for UFS and ZFS, since most of the bytes are the same for file handles on the same volume, this might not be sufficient for other file systems. Use of a generic function also seems preferable to one specific to NFSv4. Suggested by: gleb.kurtsou at gmail.com MFC after: 10 days
214224	22-Oct-2010	rmacklem	Modify the file handle hash function in the experimental NFS server so that it will work better for non-UFS file systems. The new function simply sums the bytes of the fh_fid field of fhandle_t. MFC after: 10 days
214149	21-Oct-2010	rmacklem	Modify the experimental NFS server in a manner analagous to r214049 for the regular NFS server, so that it will not do a VOP_LOOKUP() of ".." when at the root of a file system when performing a ReaddirPlus RPC. MFC after: 10 days
213712	11-Oct-2010	rmacklem	Try and make the nfsrv_localunlock() function in the experimental NFSv4 server more readable. Mostly changes to comments, but a case of >= is changed to >, since == can never happen. Also, I've added a couple of KASSERT()s and a slight optimization, since once the "else if" case happens, subsequent locks in the list can't have any effect. None of these changes fixes any known bug. MFC after: 2 weeks
212834	19-Sep-2010	rmacklem	Fix nfsrv_freeallnfslocks() in the experimental NFSv4 server so that it frees local locks correctly upon close. In order for nfsrv_localunlock() to work correctly, the lock can no longer be in the lockowner's stateid list. As such, nfsrv_freenfslock() has to be called before nfsrv_localunlock(), to get rid of the lock structure on the lockowner's stateid list. This only affected operation when local locks (vfs.newnfs.enable_locallocks=1) are enabled, which is not the default at this time. MFC after: 1 week
212833	19-Sep-2010	rmacklem	Fix the experimental NFSv4 server so that it performs local VOP_ADVLOCK() unlock operations correctly. It was passing in F_SETLK instead of F_UNLCK as the operation for the unlock case. This only affected operation when local locking (vfs.newnfs.enable_locallocks=1) was enabled. MFC after: 1 week
212443	10-Sep-2010	rmacklem	This patch applies one of the two fixes suggested by zack.kirsch at isilon.com for a race between nfsrv_freeopen() and nfsrv_getlockfile() in the experimental NFS server that he found during testing. Although nfsrv_freeopen() holds a sleep lock on the lock file structure when called with cansleep != 0, nfsrv_getlockfile() could still search the list, once it acquired the NFSLOCKSTATE() mutex. I believe that acquiring the mutex in nfsrv_freeopen() fixes the race. MFC after: 2 weeks
211953	28-Aug-2010	rmacklem	Add acquisition of a reference count on nfsv4root_lock to the nfsd_recalldelegation() function, since this function is called by nfsd threads when they are handling NFSv2 or NFSv3 RPCs, where no reference count would have been acquired. MFC after: 2 weeks
211951	28-Aug-2010	rmacklem	The timer routine in the experimental NFS server did not acquire the correct mutex when checking nfsv4root_lock. Although this could be fixed by adding mutex lock/unlock calls, zack.kirsch at isilon.com suggested a better fix that uses a non-blocking acquisition of a reference count on nfsv4root_lock. This fix allows the weird NFSLOCKSTATE(); NFSUNLOCKSTATE(); synchronization to be deleted. This patch applies this fix. Tested by: zack.kirsch at isilon.com MFC after: 2 weeks
210178	16-Jul-2010	rmacklem	Patch the experimental NFSv4 server so that it acquires a reference count on nfsv4rootfs_lock when dumping state, since these functions are not called by nfsd threads. Without this reference count, it is possible for an nfsd thread to acquire an exclusive lock on nfsv4rootfs_lock while the dump is in progress and then change the lists, potentially causing a crash. Reported by: zack.kirsch at isilon.com MFC after: 2 weeks
210154	16-Jul-2010	rmacklem	Delete comments related to soft clock interrupts that don't apply to the FreeBSD port of the experimental NFSv4 server. Submitted by: zack.kirsch at isilon.com MFC after: 2 weeks
210102	15-Jul-2010	rmacklem	This patch fixes a bug in the experimental NFSv4 server where it released a reference count on nfsv4rootfs_lock erroneously when administrative revocation of state was done. Submitted by: zack.kirsch at isilon.com MFC after: 2 weeks
210030	13-Jul-2010	rmacklem	Fix a bogus comment that mentions lru lists that don't exist. Reported by: zack.kirsch at isilon.com MFC after: 2 weeks
209191	15-Jun-2010	rmacklem	Add MODULE_DEPEND() macros to the experimental NFS client and server so that the modules will load when kernels are built with none of the NFS* configuration options specified. I believe this resolves the problems reported by PR kern/144458 and the email on freebsd-stable@ posted by Dmitry Pryanishnikov on June 13. Tested by: kib PR: kern/144458 Reviewed by: kib MFC after: 1 week
209120	13-Jun-2010	kib	In NFS clients, instead of inconsistently using #ifdef DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server. Submitted by: Mikolaj Golub <to.my.trociny gmail com> MFC after: 2 weeks
207170	24-Apr-2010	rmacklem	An NFSv4 server will reply NFSERR_GRACE for non-recovery RPCs during the grace period after startup. This grace period must be at least the lease duration, which is typically 1-2 minutes. It seems prudent for the experimental NFS client to wait a few seconds before retrying such an RPC, so that the server isn't flooded with non-recovery RPCs during recovery. This patch adds an argument to nfs_catnap() to implement a 5 second delay for this case. MFC after: 1 week
206236	06-Apr-2010	rmacklem	Harden the experimental NFS server a little, by adding range checks on the length of the client's open/lock owner name. Also, add free()'s for one case where they were missing and would have caused a leak if NFSERR_BADXDR had been replied. Probably never happens, but the leak is now plugged, just in case. MFC after: 2 weeks
206170	04-Apr-2010	rmacklem	Harden the experimental NFS server a little, by adding extra checks in the readdir functions for non-positive byte count arguments. For the negative case, set it to the maximum allowable, since it was actually a large positive value (unsigned) on the wire. Also, fix up the readdir function comment a bit. Suggested by: dillon AT apollo.backplane.com MFC after: 2 weeks
206063	02-Apr-2010	rmacklem	For the experimental NFS server, add a call to free the lookup path buffer for one case where it was missing when doing mkdir. This could have conceivably resulted in a leak of a buffer, but a leak was never observed during testing, so I suspect it would have occurred rarely, if ever, in practice. MFC after: 2 weeks
206061	02-Apr-2010	rmacklem	Add SAVENAME to the cn_flags for all cases in the experimental NFS server for the CREATE cn_nameiop where SAVESTART isn't set. I was not aware that this needed to be done by the caller until recently. Tested by: lampa AT fit.vutbr.cz (link case) Submitted by: lampa AT fit.vutbr.cz (link case) MFC after: 2 weeks
205941	30-Mar-2010	rmacklem	This patch should fix handling of byte range locks locally on the server for the experimental nfs server. When enabled by setting vfs.newnfs.locallocks_enable to non-zero, the experimental nfs server will now acquire byte range locks on the file on behalf of NFSv4 clients, such that lock conflicts between the NFSv4 clients and processes running locally on the server, will be recognized and handled correctly. MFC after: 2 weeks
205663	26-Mar-2010	rmacklem	Patch the experimental NFS server in a manner analagous to r205661 for the regular NFS server, to ensure that ESTALE is returned to the client for all errors returned by VFS_FHTOVP(). MFC after: 2 weeks
205010	11-Mar-2010	rwatson	Update nfsrv_getsocksndseq() for changes in TCP internals since FreeBSD 6.x: - so_pcb is now guaranteed to be non-NULL and valid if a valid socket reference is held. - Need to check INP_TIMEWAIT and INP_DROPPED before assuming inp_ppcb is a tcpcb, as it might be a tcptw or NULL otherwise. - tp can never be NULL by the end of the function, so only check TCPS_ESTABLISHED before extracting tcpcb fields. The NFS server arguably incorporates too many assumptions about TCP internals, but fixing that is left for nother day. MFC after: 1 week Reviewed by: bz Reviewed and tested by: rmacklem Sponsored by: Juniper Networks
203849	14-Feb-2010	rmacklem	Change the default value for vfs.newnfs.enable_locallocks to 0 for the experimental NFS server, since local locking is known to be broken and the patch to fix it is still a work in progress. MFC after: 5 days
203848	14-Feb-2010	rmacklem	This fixes the experimental NFS server so that it won't crash in the caching code for IPv6 by fixing a typo that used the incorrect variable. It also fixes the indentation of the statement above it. Reported by: simon AT comsys.ntu-kpi.kiev.ua MFC after: 5 days
201442	03-Jan-2010	rmacklem	The test for "same client" for the experimental nfs server over NFSv4 was broken w.r.t. byte range lock conflicts when it was the same client and the request used the open_to_lock_owner4 case, since lckstp->ls_clp was not set. This patch fixes it by using "clp" instead of "lckstp->ls_clp". MFC after: 2 weeks
200999	25-Dec-2009	rmacklem	Modify the experimental server so that it uses VOP_ACCESSX(). This is necessary in order to enable NFSv4 ACL support. The argument to nfsvno_accchk() was changed to an accmode_t and the function nfsrv_aclaccess() was no longer needed and, therefore, deleted. Reviewed by: trasz MFC after: 2 weeks
200287	08-Dec-2009	delphij	Allow using IPv6 in nfsrvd_sentcache() callback. PR: kern/141289 Submitted by: Petr Lampa <lampa fit vutbr cz> Approved by: rmacklem MFC after: 1 week
199715	23-Nov-2009	rmacklem	Modify the experimental nfs server so that it falls back to using VOP_LOOKUP() when VFS_VGET() returns EOPNOTSUPP in the ReaddirPlus RPC. This patch is based upon one by pjd@ for the regular nfs server which has not yet been committed. It is needed when a ZFS volume is exported and ReaddirPlus (which almost always happens for NFSv4) is performed by a client. The patch also simplifies vnode lock handling somewhat. MFC after: 2 weeks
199616	20-Nov-2009	rmacklem	Patch the experimental NFS server is a manner analagous to r197525, so that the creation verifier is handled correctly in va_atime for 64bit architectures. There were two problems. One was that the code incorrectly assumed that sizeof (struct timespec) == 8 and the other was that the tv_sec field needs to be assigned from a signed 32bit integer, so that sign extension occurs on 64bit architectures. This is required for correct operation when exporting ZFS volumes. Reviewed by: pjd MFC after: 2 weeks
195699	14-Jul-2009	rwatson	Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables. Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile-time, placing the in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of a the kernel linker. Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided. This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS. Bump __FreeBSD_version and update UPDATING. Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
194498	19-Jun-2009	brooks	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867
194408	17-Jun-2009	rmacklem	Add the SVC_RELEASE(xprt), as required by r194407. Approved by: kib (mentor)
194292	16-Jun-2009	rmacklem	Remove the "int *" typecast for the aresid argument to vn_rdwr() and change the type of the argument from size_t to int. This should avoid issues on 64bit architectures. Suggested by: kib Approved by: kib (mentor)
193511	05-Jun-2009	rwatson	Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
193162	31-May-2009	zec	Unbreak options VIMAGE kernel builds. Approved by: julian (mentor)
192861	26-May-2009	rmacklem	Fix the experimental nfs subsystem so that it builds with the current NFSv4 ACLs, as defined in sys/acl.h. It still needs a way to test a mount point for NFSv4 ACL support before it will work. Until then, the NFSHASNFS4ACL() macro just always returns 0. Approved by: kib (mentor)
192782	26-May-2009	rmacklem	Add two sysctl variables to the experimental nfs server, so that the range of versions of NFS handled by the server can be limited. The nfsd daemon must be restarted after these sysctl variables are changed, in order for the change to take effect. Approved by: kib (mentor)
192781	26-May-2009	rmacklem	Fix the handling of NFSv4 Illegal Operation number to conform to RFC3530 (the operation number in the reply must be set to the value for OP_ILLEGAL). Also cleaned up some indentation. Approved by: kib (mentor)
192780	26-May-2009	rmacklem	Fix the experimental nfs server's interface to the new krpc so that it handles the case of a non-exported NFSv4 root correctly. Also, delete handling for the case where nd_repstat is already set in nfs_proc(), since that no longer happens. Approved by: kib (mentor)
192707	25-May-2009	rmacklem	Add NFSv4 root export checks to the DelegPurge, Renew and ReleaseLockOwner operations analagous to what is already in place for SetClientID and SetClientIDConfirm. These are the five NFSv4 operations that do not use file handle(s), so the checks are done using the NFSv4 root export entries in /etc/exports. Approved by: kib (mentor)
192693	24-May-2009	rmacklem	Fix the experimental NFSv4 server so that it handles the case where a client is not allowed NFSv4 access correctly. This restriction is specified in the "V4: ..." line(s) in /etc/exports. Approved by: kib (mentor)
192596	22-May-2009	rmacklem	Change the printf of r192595 to identify the function, as requested by Sam. Approved by: kib (mentor)
192591	22-May-2009	rmacklem	Modified the printf message of r192590 to remove the possible DOS attack, as suggested by Sam. Approved by: kib (mentor)
192589	22-May-2009	rmacklem	Change the comment at the beginning of the function to reflect the change from panic() to printf() done by r192588.
192588	22-May-2009	rmacklem	Change the reboot panic that would have occurred if clientid numbers wrapped around to a printf() warning of a possible DOS attack, in the experimental nfsv4 server. Approved by: kib (mentor)
192574	22-May-2009	rmacklem	Fix the experimental nfs server so that it depends on the nlm, since it now calls nlm_acquire_next_sysid(). Approved by: kib (mentor)
192539	21-May-2009	rmacklem	Fix the comment at line 3711 to be consistent with the change applied for r192537. Approved by: kib (mentor)
192503	21-May-2009	rmacklem	Modify sys/fs/nfsserver/nfs_nfsdport.c to use nlm_acquire_next_sysid() to set the l_sysid for locks correctly. Approved by: kib (mentor)
192463	20-May-2009	rmacklem	Although it should never happen, all the nfsv4 server can do when it runs out of clientids is reboot. I had replaced cpu_reboot() with printf(), since cpu_reboot() doesn't exist for sparc64. This change replaces the printf() with panic(), so the reboot would occur for this highly unlikely occurrence. Approved by: kib (mentor)
192256	17-May-2009	rmacklem	Fix the acquisition of local locks via VOP_ADVLOCK() by the experimental nfsv4 server. It was setting the a_id argument to a fixed value, but that wasn't sufficient for FreeBSD8. Instead, set l_pid and l_sysid to 0 plus set the F_REMOTE flag to indicate that these fields are used to check for same lock owner. Since, for NFSv4, a lockowner is a ClientID plus an up to 1024byte name, it can't be put in l_sysid easily. I also renamed the p variable to td, since it's a thread ptr. Approved by: kib (mentor)
192255	17-May-2009	rmacklem	Added a SYSCTL to sys/fs/nfsserver/nfs_nfsdport.c so that the value of nfsrv_dolocallocks can be changed via sysctl. I also added some non-empty descriptor strings and reformatted some overly long lines. Approved by: kib (mentor)
192181	16-May-2009	rmacklem	Fixed the Null callback RPCs so that they work with the new krpc. This required two changes: setting the program and version numbers before connect and fixing the handling of the Null Rpc case in newnfs_request(). Approved by: kib (mentor)
192121	14-May-2009	rmacklem	Apply changes to the experimental nfs server so that it uses the security flavors as exported in FreeBSD-CURRENT. This allows it to use a slightly modified mountd.c instead of a different utility. Approved by: kib (mentor)
192017	12-May-2009	rmacklem	Modify the experimental nfs server to use the new nfsd_nfsd_args structure for nfsd. Includes a change that clarifies the use of an empty principal name string to indicate AUTH_SYS only. Approved by: kib (mentor)
192000	11-May-2009	rmacklem	Change the name of the nfs server addsock structure from nfsd_args to nfsd_addsock_args, so that it is consistent with the one in sys/nfsserver/nfs.h. Approved by: kib (mentor)
191998	11-May-2009	rmacklem	Modify nfsvno_fhtovp() to ensure that it always sets the credp argument. Returning without credp set could result in a caller doing crfree() on garbage. Reviewed by: kan Approved by: kib (mentor)
191990	11-May-2009	attilio	Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
191940	09-May-2009	kan	Do not embed struct ucred into larger netcred parent structures. Credential might need to hang around longer than its parent and be used outside of mnt_explock scope controlling netcred lifetime. Use separate reference-counted ucred allocated separately instead. While there, extend mnt_explock coverage in vfs_stdexpcheck and clean-up some unused declarations in new NFS code. Reported by: John Hickey PR: kern/133439 Reviewed by: dfr, kib
191783	04-May-2009	rmacklem	Add the experimental nfs subtree to the kernel, that includes support for NFSv4 as well as NFSv2 and 3. It lives in 3 subdirs under sys/fs: nfs - functions that are common to the client and server nfsclient - a mutation of sys/nfsclient that call generic functions to do RPCs and handle state. As such, it retains the buffer cache handling characteristics and vnode semantics that are found in sys/nfsclient, for the most part. nfsserver - the server. It includes a DRC designed specifically for NFSv4, that is used instead of the generic DRC in sys/rpc. The build glue will be checked in later, so at this point, it consists of 3 new subdirs that should not affect kernel building. Approved by: kib (mentor)