Cross Reference: /freebsd-10-stable/sys/rpc/svc.c

History log of /freebsd-10-stable/sys/rpc/svc.c
Revision	Date	Author	Comments
# 336928	30-Jul-2018	rmacklem	MFC: r335866 Fix the server side krpc so that the kernel nfsd threads terminate. Occasionally the kernel nfsd threads would not terminate when a SIGKILL was posted for the kernel process (called nfsd (slave)). When this occurred, the thread associated with the process (called "ismaster") had returned from svc_run_internal() and was sleeping waiting for the other threads to terminate. The other threads (created by kthread_start()) were still in svc_run_internal() handling NFS RPCs. The only way this could occur is for the "ismaster" thread to return from svc_run_internal() without having called svc_exit(). There was only one place in the code where this could happen and this patch stops that from happening. Since the problem is intermittent, I cannot be sure if this has fixed the problem, but I have not seen an occurrence of the problem with this patch applied.
# 314034	21-Feb-2017	avg	MFC r313735: add svcpool_close to handle killed nfsd threads PR: 204340 Reported by: Panzura Reviewed by: rmacklem Approved by: rmacklem
# 303692	02-Aug-2016	ngie	MFstable/11 r303691: MFC r302550,r302551,r302552,r302553: r302550: Deobfuscate cleanup path in clnt_dg_create(..) Similar to r300836 and r301800, cl and cu will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`. Deobfuscating the cleanup path fixes a leak where if cl was NULL and cu was not, cu would not be free'd, and also removes a duplicate test for cl not being NULL. CID: 1007033, 1007344 r302551: Deobfuscate cleanup path in clnt_vc_create(..) Similar to r300836, r301800, and r302550, cl and ct will always be non-NULL as they're allocated using the mem_alloc routines, which always use `malloc(..., M_WAITOK)`. CID: 1007342 r302552: Convert `svc_xprt_alloc(..)` and `svc_xprt_free(..)`'s prototypes to ANSI C style prototypes r302553: Don't test for xpt not being NULL before calling svc_xprt_free(..) svc_xprt_alloc(..) will always return initialized memory as it uses mem_alloc(..) under the covers, which uses malloc(.., M_WAITOK, ..). CID: 1007341
# 301680	08-Jun-2016	ngie	MFC r300625: Remove unnecessary memset(.., 0, ..)'s The mem_alloc macro calls calloc (userspace) / malloc(.., M_WAITOK|M_ZERO) under the covers, so zeroing out memory is already handled by the underlying calls.
# 297342	28-Mar-2016	mav	MFC r297051: Fix incorrect (fortunately bigger) malloc size.
# 291384	27-Nov-2015	mav	MFC r291061: Improve locking of sg_threadcount.
# 290203	30-Oct-2015	wollman	Long-overdue MFC of r280930: Fix overflow bugs in and remove obsolete limit from kernel RPC implementation. The kernel RPC code, which is responsible for the low-level scheduling of incoming NFS requests, contains a throttling mechanism that prevents too much kernel memory from being tied up by NFS requests that are being serviced. When the throttle is engaged, the RPC layer stops servicing incoming NFS sockets, resulting ultimately in backpressure on the clients (if they're using TCP). However, this is a very heavy-handed mechanism as it prevents all clients from making any requests, regardless of how heavy or light they are. (Thus, when engaged, the throttle often prevents clients from even mounting the filesystem.) The throttle mechanism applies specifically to requests that have been received by the RPC layer (from a TCP or UDP socket) and are queued waiting to be serviced by one of the nfsd threads; it does not limit the amount of backlog in the socket buffers. The original implementation limited the total bytes of queued requests to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB. The former limit seems reasonable, since requests queued in the socket buffers and replies being constructed to the requests in progress will all require some amount of network memory, but the 45 MiB limit is plainly ridiculous for modern memory sizes: when running 256 service threads on a busy server, 45 MiB would result in just a single maximum-sized NFS3PROC_WRITE queued per thread before throttling. Removing this limit exposed integer-overflow bugs in the original computation, and related bugs in the routines that actually account for the amount of traffic enqueued for service threads. The old implementation also attempted to reduce accounting overhead by batching updates until each queue is fully drained, but this is prone to livelock, resulting in repeated accumulate-throttle-drain cycles on a busy server. 
Various data types are changed to long or unsigned long; explicit 64-bit types are not used due to the unavailability of 64-bit atomics on many 32-bit platforms, but those platforms also cannot support nmbclusters large enough to cause overflow. This code (in a 10.1 kernel) is presently running on production NFS servers at CSAIL. Summary of this revision: * Removes 45 MiB limit on requests queued for nfsd service threads * Fixes integer-overflow and signedness bugs * Avoids unnecessary throttling by not deferring accounting for completed requests Differential Revision: https://reviews.freebsd.org/D2165 Reviewed by: rmacklem, mav Relnotes: yes Sponsored by: MIT Computer Science & Artificial Intelligence Laboratory
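The limit arithmetic described in r280930 can be sketched as follows. This is an illustrative userspace sketch, not the kernel code: the function names, the `SKETCH_` constants, and the assumed cluster size of 2048 bytes are hypothetical stand-ins for `nmbclusters` and `MCLBYTES`. It shows the old limit (minimum of a quarter of cluster memory and 45 MiB), the limit with the cap removed, the per-thread share, and when the product would no longer fit in a 32-bit signed integer.

```c
#include <stdint.h>

/* Assumed mbuf cluster size in bytes; a stand-in for MCLBYTES. */
#define SKETCH_MCLBYTES 2048LL
/* The 45 MiB cap that the revision removes. */
#define SKETCH_OLD_CAP (45LL * 1024 * 1024)

/* Old limit: min(quarter of cluster memory, 45 MiB), in 64-bit math. */
static long long
sketch_old_limit(long long nmbclusters)
{
	long long quarter = nmbclusters * SKETCH_MCLBYTES / 4;

	return (quarter < SKETCH_OLD_CAP ? quarter : SKETCH_OLD_CAP);
}

/* Limit with the 45 MiB cap removed: only the quarter rule remains. */
static long long
sketch_new_limit(long long nmbclusters)
{
	return (nmbclusters * SKETCH_MCLBYTES / 4);
}

/* Bytes of queued requests available per service thread under a limit. */
static long long
sketch_per_thread(long long limit, long long nthreads)
{
	return (limit / nthreads);
}

/* Nonzero if nmbclusters * cluster size exceeds a 32-bit signed int. */
static int
sketch_overflows_int32(long long nmbclusters)
{
	return (nmbclusters * SKETCH_MCLBYTES > INT32_MAX);
}
```

Under these assumptions, the capped limit divided among 256 threads leaves 184320 bytes (180 KiB) of queued requests per thread, and the uncapped product stops fitting in a 32-bit signed integer once the cluster count reaches 2^20, which is the kind of overflow the commit message says surfaced when the cap was removed.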
# 276272	26-Dec-2014	kib	MFC r275745: Add facility to stop all userspace processes. MFC r275753: Fix gcc build. MFC r275820: Add missed break.
# 275796	15-Dec-2014	kib	MFC r275618: Check for stop condition in nfsd threads.
# 267742	22-Jun-2014	mav	MFC r267228: Split RPC pool threads into a number of smaller semi-isolated groups. The old design with a unified thread pool was good from the point of view of thread utilization, but the single pool-wide mutex became a huge congestion point for systems with many CPUs. To reduce the congestion, create several thread groups within a pool (one group for every 6 CPUs and 12 threads), each group with its own mutex. Each connection is assigned to one of the groups in round-robin fashion during its registration. File affinity code may still move requests between the groups, but otherwise groups are self-contained.
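The round-robin assignment of connections to groups mentioned in r267228 can be sketched as below. This is a minimal sketch under assumptions: `struct sketch_pool`, its fields, and `sketch_assign_group()` are hypothetical names, not the structures in svc.c, and the per-group mutex and any atomic update of the counter are omitted.

```c
/* Hypothetical pool state: group count and a running counter. */
struct sketch_pool {
	unsigned int groupcount;	/* number of thread groups, at least 1 */
	unsigned int nextgroup;		/* advances once per registration */
};

/* Pick the group for a newly registered connection, round-robin. */
static unsigned int
sketch_assign_group(struct sketch_pool *pool)
{
	unsigned int g = pool->nextgroup % pool->groupcount;

	pool->nextgroup++;
	return (g);
}
```

With three groups, successive registrations land in groups 0, 1, 2, and then 0 again, so connections spread evenly without consulting per-group load.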
# 267741	22-Jun-2014	mav	MFC r267223: Remove st_idle variable, duplicating st_xprt.
# 267740	22-Jun-2014	mav	MFC r267221, r267278: Introduce new per-thread lock to protect the list of requests. This slightly simplifies the svc_run_internal() code: if we have processed all the requests in a queue, then we know that a new one will not appear.
# 261571	07-Feb-2014	mav	MFC r261449: Fix lock acquisition in case no request space available, missed in r260097.
# 261055	22-Jan-2014	mav	MFC r260229, r260258, r260367, r260390, r260459, r260648: Rework NFS Duplicate Request Cache cleanup logic. - Introduce additional hash to group requests by hash of sockref. This allows processing TCP acknowledgements without looping through all the cache, and as a result allows doing it every time. - Introduce additional callbacks to notify the application layer about socket disconnection. Without this, the last few requests processed just before socket disconnection never processed their ACKs and were stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would break some KBIs, while I don't know other consumers for it aside from NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only a single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If the network is working well, and so all replies are acknowledged, the cache remains almost empty even after hours of heavy load. Without this change, on the same test the cache was growing to many thousands of requests even with a perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.
# 261054	22-Jan-2014	mav	MFC r260097: Move most of NFS file handle affinity code out of the heavily congested global RPC thread pool lock and protect it with own set of locks. On synthetic benchmarks this improves peak NFS request rate by 40%.
# 261053	22-Jan-2014	mav	MFC r260036: Introduce xprt_inactive_self() -- variant for use when sure that port is assigned to thread. For example, within receive handlers. In that case the function reduces to a single assignment and can avoid locking.
# 261048	22-Jan-2014	mav	MFC r259659, r259662: Remove several linear list traversals per request from RPC server code. Do not insert active ports into pool->sp_active list if they are successfully assigned to some thread. This makes that list include only ports that really require attention, and so traversal can be reduced to simple taking the first one. Remove idle thread from pool->sp_idlethreads list when assigning some work (port of requests) to it. That again makes possible to replace list traversals with simple taking the first element.
# 261046	22-Jan-2014	mav	MFC r258578, r258580, r258581 (by hrs): Replace Sun RPC license in TI-RPC library with a 3-clause BSD license with the explicit permissions.
# 261045	22-Jan-2014	mav	MFC r258132: Some minor tuning to rpc/svc.c: - close cosmetic race in svc_exit(); - do not set wait timeout for idle threads if we have no use for wakeups; - create new requested thread sooner, not only after some another thread wakeup, that may happen later under constant load.