History log of /freebsd-current/sys/rpc/svc.h
Revision Date Author Comments
# a16ff32f 20-Mar-2024 John Baldwin <jhb@FreeBSD.org>

NFS: Request use of TCP_USE_DDP for in-kernel TCP sockets

Since this is an optimization, ignore failures to enable the option.

For the server side, defer enabling DDP until the first non-NULLPROC
RPC is received. This allows TLS handling (which uses NULLPROC RPCs)
to enable TLS offload first.

Reviewed by: rmacklem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D44002


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 2ff63af9 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .h pattern

Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/


# 564ed8e8 22-Aug-2022 Rick Macklem <rmacklem@FreeBSD.org>

nfsd: Allow multiple instances of rpc.tlsservd

During a discussion with someone working on NFS-over-TLS
for a non-FreeBSD platform, we agreed that a single server
daemon for TLS handshakes could become a bottleneck when
an NFS server first boots, if many concurrent NFS-over-TLS
connections are attempted.

This patch modifies the kernel RPC code so that it can
handle multiple rpc.tlsservd daemons. A separate commit
currently under review as D35886 for the rpc.tlsservd
daemon.


# bcd0e31d 28-Dec-2021 John Baldwin <jhb@FreeBSD.org>

sys/rpc: Use C99 fixed-width integer types.

No functional change.

Reviewed by: imp, emaste
Differential Revision: https://reviews.freebsd.org/D33640


# 631504fb 03-Sep-2021 Gordon Bergling <gbe@FreeBSD.org>

Fix a common typo in source code comments

- s/existant/existent/

MFC after: 3 days


# 20d728b5 09-Jul-2021 Mark Johnston <markj@FreeBSD.org>

rpc: Make function tables const

No functional change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# e1a907a2 11-Jun-2021 Rick Macklem <rmacklem@FreeBSD.org>

krpc: Acquire ref count of CLIENT for backchannel use

Michael Dexter <editor@callfortesting.org> reported
a crash in FreeNAS, where the first argument to
clnt_bck_svccall() was no longer valid.
This argument is a pointer to the callback CLIENT
structure, which is free'd when the associated
NFSv4 ClientID is free'd.

This appears to have occurred because a callback
reply was still in the socket receive queue when
the CLIENT structure was free'd.

This patch acquires a reference count on the CLIENT
that is not CLNT_RELEASE()'d until the socket structure
is destroyed. This should guarantee that the CLIENT
structure is still valid when clnt_bck_svccall() is called.
It also adds a check for closed or closing to
clnt_bck_svccall() so that it will not process the callback
RPC reply message after the ClientID is free'd.

Comments by: mav
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D30153


# ab0c29af 21-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add TLS support to the kernel RPC.

An internet draft titled "Towards Remote Procedure Call Encryption By Default"
describes how TLS is to be used for Sun RPC, with NFS as an intended use case.
This patch adds client and server support for this to the kernel RPC,
using KERN_TLS and upcalls to daemons for the handshake, peer reset and
other non-application data record cases.

The upcalls to the daemons use three fields to uniquely identify the
TCP connection. They are the time.tv_sec, time.tv_usec of the connection
establshment, plus a 64bit sequence number. The time fields avoid problems
with re-use of the sequence number after a daemon restart.
For the server side, once a Null RPC with AUTH_TLS is received, kernel
reception on the socket is blocked and an upcall to the rpctlssd(8) daemon
is done to perform the TLS handshake. Upon completion, the completion
status of the handshake is stored in xp_tls as flag bits and the reply to
the Null RPC is sent.
For the client, if CLSET_TLS has been set, a new TCP connection will
send the Null RPC with AUTH_TLS to initiate the handshake. The client
kernel RPC code will then block kernel I/O on the socket and do an upcall
to the rpctlscd(8) daemon to perform the handshake.
If the upcall is successful, ct_rcvstate will be maintained to indicate
if/when an upcall is being done.

If non-application data records are received, the code does an upcall to
the appropriate daemon, which will do a SSL_read() of 0 length to handle
the record(s).

When the socket is being shut down, upcalls are done to the daemons, so
that they can perform SSL_shutdown() calls to perform the "peer reset".

The rpctlssd(8) and rpctlscd(8) daemons require a patched version of the
openssl library and, as such, will not be committed to head at this time.

Although the changes done by this patch are fairly numerous, there should
be no semantics change to the kernel RPC at this time.
A future commit to the NFS code will optionally enable use of TLS for NFS.


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# 90f90687 14-Feb-2017 Andriy Gapon <avg@FreeBSD.org>

add svcpool_close to handle killed nfsd threads

This patch adds a new function to the server krpc called
svcpool_close(). It is similar to svcpool_destroy(), but does not free
the data structures, so that the pool can be used again.

This function is then used instead of svcpool_destroy(),
svcpool_create() when the nfsd threads are killed.

PR: 204340
Reported by: Panzura
Approved by: rmacklem
Obtained from: rmacklem
MFC after: 1 week


# 6244c6e7 05-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/rpc: minor spelling fixes.

No functional change.


# 3c42b5bf 31-Mar-2015 Garrett Wollman <wollman@FreeBSD.org>

Fix overflow bugs in and remove obsolete limit from kernel RPC
implementation.

The kernel RPC code, which is responsible for the low-level scheduling
of incoming NFS requests, contains a throttling mechanism that
prevents too much kernel memory from being tied up by NFS requests
that are being serviced. When the throttle is engaged, the RPC layer
stops servicing incoming NFS sockets, resulting ultimately in
backpressure on the clients (if they're using TCP). However, this is
a very heavy-handed mechanism as it prevents all clients from making
any requests, regardless of how heavy or light they are. (Thus, when
engaged, the throttle often prevents clients from even mounting the
filesystem.) The throttle mechanism applies specifically to requests
that have been received by the RPC layer (from a TCP or UDP socket)
and are queued waiting to be serviced by one of the nfsd threads; it
does not limit the amount of backlog in the socket buffers.

The original implementation limited the total bytes of queued requests
to the minimum of a quarter of (nmbclusters * MCLBYTES) and 45 MiB.
The former limit seems reasonable, since requests queued in the socket
buffers and replies being constructed to the requests in progress will
all require some amount of network memory, but the 45 MiB limit is
plainly ridiculous for modern memory sizes: when running 256 service
threads on a busy server, 45 MiB would result in just a single
maximum-sized NFS3PROC_WRITE queued per thread before throttling.

Removing this limit exposed integer-overflow bugs in the original
computation, and related bugs in the routines that actually account
for the amount of traffic enqueued for service threads. The old
implementation also attempted to reduce accounting overhead by
batching updates until each queue is fully drained, but this is prone
to livelock, resulting in repeated accumulate-throttle-drain cycles on
a busy server. Various data types are changed to long or unsigned
long; explicit 64-bit types are not used due to the unavailability of
64-bit atomics on many 32-bit platforms, but those platforms also
cannot support nmbclusters large enough to cause overflow.

This code (in a 10.1 kernel) is presently running on production NFS
servers at CSAIL.

Summary of this revision:
* Removes 45 MiB limit on requests queued for nfsd service threads
* Fixes integer-overflow and signedness bugs
* Avoids unnecessary throttling by not deferring accounting for
completed requests

Differential Revision: https://reviews.freebsd.org/D2165
Reviewed by: rmacklem, mav
MFC after: 30 days
Relnotes: yes
Sponsored by: MIT Computer Science & Artificial Intelligence Laboratory


# c59e4cc3 01-Jul-2014 Rick Macklem <rmacklem@FreeBSD.org>

Merge the NFSv4.1 server code in projects/nfsv4.1-server over
into head. The code is not believed to have any effect
on the semantics of non-NFSv4.1 server behaviour.
It is a rather large merge, but I am hoping that there will
not be any regressions for the NFS server.

MFC after: 1 month


# b563304c 08-Jun-2014 Alexander Motin <mav@FreeBSD.org>

Split RPC pool threads into number of smaller semi-isolated groups.

Old design with unified thread pool was good from the point of thread
utilization. But single pool-wide mutex became huge congestion point
for systems with many CPUs. To reduce the congestion create several
thread groups within a pool (one group for every 6 CPUs and 12 threads),
each group with own mutex. Each connection during its registration is
assigned to one of the groups in round-robin fashion. File affinify
code may still move requests between the groups, but otherwise groups
are self-contained.

MFC after: 2 weeks
Sponsored by: iXsystems, Inc.


# b5d7fb73 08-Jun-2014 Alexander Motin <mav@FreeBSD.org>

Remove st_idle variable, duplicating st_xprt.

MFC after: 2 weeks


# b776fb2d 08-Jun-2014 Alexander Motin <mav@FreeBSD.org>

Introduce new per-thread lock to protect the list of requests.

This allows to slightly simplify svc_run_internal() code: if we processed
all the requests in a queue, then we know that new one will not appear.

MFC after: 2 weeks


# bcea84bd 08-Jan-2014 Peter Wemm <peter@FreeBSD.org>

Don't expose svc_loss_reg / _unreg to userland as they're kernel-only
additions from r260229 and the SVCPOOL type doesn't exist in userland.


# 0979970a 05-Jan-2014 Alexander Motin <mav@FreeBSD.org>

Fix NULL dereference panic on UDP requests introduced in r260229.


# c809a67a 04-Jan-2014 Alexander Motin <mav@FreeBSD.org>

Replace locks added in r260229 to protect sequence counters with atomics.

New algorithm does not create additional lock congestion, while some races
it includes should not be a problem. Those races may keep requests in DRC
cache for some more time by returning ACK position smaller then actual,
but it still should be able to drop thems when proper ACK finally read.

Races of the original algorithm based on TCP seq number were worse because
they happened when reply sequence number were recorded. After that even
correctly read ACKs could not clean DRC sometimes.


# d473bac7 03-Jan-2014 Alexander Motin <mav@FreeBSD.org>

Rework NFS Duplicate Request Cache cleanup logic.

- Introduce additional hash to group requests by hash of sockref. This
allows to process TCP acknowledgements without looping though all the cache,
and as result allows to do it every time.
- Indroduce additional callbacks to notify application layer about sockets
disconnection. Without this last few requests processed just before socket
disconnection never processed their ACKs and stuck in cache for many hours.
- Implement transport-specific method for tracking reply acknowledgements.
New implementation does not cross multiple stack layers to get the data and
does not have race conditions that previously made some requests stuck
in cache. This could be done more efficiently at sockbuf layer, but that
would broke some KBIs, while I don't know other consumers for it aside NFS.
- Instead of traversing all DRC twice per request, run cleaning only once
per request, and except in some conditions traverse only single hash slot
at a time.

Together this limits NFS DRC growth only to situations of real connectivity
problems. If network is working well, and so all replies are acknowledged,
cache remains almost empty even after hours of heavy load. Without this
change on the same test cache was growing to many thousand requests even
with perfectly working local network.

As another result this reduces CPU time spent on the DRC handling during
SPEC NFS benchmark from about 10% to 0.5%.

Sponsored by: iXsystems, Inc.


# f8fb069d 30-Dec-2013 Alexander Motin <mav@FreeBSD.org>

Move most of NFS file handle affinity code out of the heavily congested
global RPC thread pool lock and protect it with own set of locks.

On synthetic benchmarks this improves peak NFS request rate by 40%.


# 5c42b9dc 29-Dec-2013 Alexander Motin <mav@FreeBSD.org>

Introduce xprt_inactive_self() -- variant for use when sure that port
is assigned to thread. For example, withing receive handlers. In that
case the function reduces to single assignment and can avoid locking.


# ba981145 20-Dec-2013 Alexander Motin <mav@FreeBSD.org>

Remove several linear list traversals per request from RPC server code.

Do not insert active ports into pool->sp_active list if they are success-
fully assigned to some thread. This makes that list include only ports that
really require attention, and so traversal can be reduced to simple taking
the first one.

Remove idle thread from pool->sp_idlethreads list when assigning some
work (port of requests) to it. That again makes possible to replace list
traversals with simple taking the first element.


# 2e322d37 25-Nov-2013 Hiroki Sato <hrs@FreeBSD.org>

Replace Sun RPC license in TI-RPC library with a 3-clause BSD license,
with the explicit permission of Sun Microsystems in 2009.


# e2adc47d 07-Dec-2012 Rick Macklem <rmacklem@FreeBSD.org>

Add support for backchannels to the kernel RPC. Backchannels
are used by NFSv4.1 for callbacks. A backchannel is a connection
established by the client, but used for RPCs done by the server
on the client (callbacks). As a result, this patch mixes some
client side calls in the server side and vice versa. Some
definitions in the .c files were extracted out into a file called
krpc.h, so that they could be included in multiple .c files.
This code has been in projects/nfsv4.1-client for some time.
Although no one has given it a formal review, I believe kib@
has taken a look at it.


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# a4fa5e6d 04-Jun-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix two races in the server side krpc w.r.t upcalls:
Add a flag so that soupcall_clear() is only called once to cancel
an upcall.
Move the test for xprt_registered in the upcall down to after the
mtx_lock() of the pool mutex, to catch the case where it is
unregistered while the upcall is waiting for the mutex.
Also, move the mtx_destroy() of the pool mutex to after SVC_RELEASE(),
so that it isn't destroyed before the upcalls are disabled.

Reviewed by: dfr, jhb
Tested by: pho
Approved by: kib (mentor)


# 201e7488 16-Apr-2009 Rick Macklem <rmacklem@FreeBSD.org>

Added a field to the SVCXPRT structure that the nfsv4 server can
use to identify if the socket is the same one that a cached request
came in on. It is set by nfsrvd_addsock() to a unique value generated
by incrementing an unsigned 64bit static variable for each assignment
and then the value of xp_sockref is tested to see if it is equal to
the value that was saved with the cached reply.

Submitted by: rmacklem
Reviewed by: dfr
Approved by: kib (mentor)


# a9148abd 03-Nov-2008 Doug Rabson <dfr@FreeBSD.org>

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# dfdcada3 26-Mar-2008 Doug Rabson <dfr@FreeBSD.org>

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks