History log of /freebsd-current/sys/rpc/clnt_vc.c
Revision Date Author Comments
# 4ba444de 26-Apr-2024 Rick Macklem <rmacklem@FreeBSD.org>

krpc: Ref cnt the client structures for TLS upcalls

A crash occurred during testing, where the client structures had
already been free'd when the upcall thread tried to lock them.

This patch acquires a reference count on both of the structures
and these are released when the upcall is done, so that the
structures cannot be free'd prematurely. This happened because
the testing is done over a very slow vpn.

Found during a IETF bakeathon testing event this week.

MFC after: 5 days


# e205fd31 09-Apr-2024 Gleb Smirnoff <glebius@FreeBSD.org>

rpc: use new macros to lock socket buffers

Fixes: d80a97def9a1db6f07f5d2e68f7ad62b27918947


# f79a8585 30-Jan-2024 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: garbage collect SS_ISCONFIRMING

Fixes: 8df32b19dee92b5eaa4b488ae78dca6accfcb38e


# 29363fb4 23-Nov-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove ancient SCCS tags.

Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.

Sponsored by: Netflix


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 82512c17 14-Oct-2022 Rick Macklem <rmacklem@FreeBSD.org>

clnt_vc.c: Replace msleep() with pause() to avoid assert panic

An msleep() in clnt_vc.c used a global "fake_wchan" wchan argument
along with the mutex in a CLIENT structure. As such, it was
possible to use different mutexes for the same wchan and
cause a panic assert. Since this is in a rarely executed code
path, the assert panic was only recently observed.

Since "fake_wchan" never gets a wakeup, this msleep() can
be replaced with a pause() to avoid the panic assert,
which is what this patch does.

Reviewed by: kib, markj
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D36977


# 0b4f2ab0 15-May-2022 Rick Macklem <rmacklem@FreeBSD.org>

krpc: Fix NFS-over-TLS for KTLS1.3

When NFS-over-TLS uses KTLS1.3, the client can receive
post-handshake handshake records. These records can be
safely thown away, but are not handled correctly via the
rpctls_ct_handlerecord() upcall to the daemon.

Commit 373511338d95 changed soreceive_generic() so that it
will only return ENXIO for Alert records when MSG_TLSAPPDATA
is specified. As such, the post-handshake handshake
records will be returned to the krpc.

This patch modifies the krpc so that it will throw
these records away, which seems sufficient to make
NFS-over-TLS work with KTLS1.3. This change has
no effect on the use of KTLS1.2, since it does not
generate post-handshake handshake records.

MFC after: 2 weeks


# 43283184 12-May-2022 Gleb Smirnoff <glebius@FreeBSD.org>

sockets: use socket buffer mutexes in struct socket directly

Since c67f3b8b78e the sockbuf mutexes belong to the containing socket,
and socket buffers just point to it. In 74a68313b50 macros that access
this mutex directly were added. Go over the core socket code and
eliminate code that reaches the mutex by dereferencing the sockbuf
compatibility pointer.

This change requires a KPI change, as some functions were given the
sockbuf pointer only without any hint if it is a receive or send buffer.

This change doesn't cover the whole kernel, many protocols still use
compatibility pointers internally. However, it allows operation of a
protocol that doesn't use them.

Reviewed by: markj
Differential revision: https://reviews.freebsd.org/D35152


# 77bc5890 04-Apr-2022 Warner Losh <imp@FreeBSD.org>

clnt_vc_destroy: eliminiate write only variable stat

Sponsored by: Netflix


# e3ba94d4 09-Nov-2021 John Baldwin <jhb@FreeBSD.org>

Don't require the socket lock for sorele().

Previously, sorele() always required the socket lock and dropped the
lock if the released reference was not the last reference. Many
callers locked the socket lock just before calling sorele() resulting
in a wasted lock/unlock when not dropping the last reference.

Move the previous implementation of sorele() into a new
sorele_locked() function and use it instead of sorele() for various
places in uipc_socket.c that called sorele() while already holding the
socket lock.

The sorele() macro now uses refcount_release_if_not_last() try to drop
the socket reference without locking the socket. If that shortcut
fails, it locks the socket and calls sorele_locked().

Reviewed by: kib, markj
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D32741


# 20d728b5 09-Jul-2021 Mark Johnston <markj@FreeBSD.org>

rpc: Make function tables const

No functional change intended.

MFC after: 1 week
Sponsored by: The FreeBSD Foundation


# ab0c29af 21-Aug-2020 Rick Macklem <rmacklem@FreeBSD.org>

Add TLS support to the kernel RPC.

An internet draft titled "Towards Remote Procedure Call Encryption By Default"
describes how TLS is to be used for Sun RPC, with NFS as an intended use case.
This patch adds client and server support for this to the kernel RPC,
using KERN_TLS and upcalls to daemons for the handshake, peer reset and
other non-application data record cases.

The upcalls to the daemons use three fields to uniquely identify the
TCP connection. They are the time.tv_sec, time.tv_usec of the connection
establshment, plus a 64bit sequence number. The time fields avoid problems
with re-use of the sequence number after a daemon restart.
For the server side, once a Null RPC with AUTH_TLS is received, kernel
reception on the socket is blocked and an upcall to the rpctlssd(8) daemon
is done to perform the TLS handshake. Upon completion, the completion
status of the handshake is stored in xp_tls as flag bits and the reply to
the Null RPC is sent.
For the client, if CLSET_TLS has been set, a new TCP connection will
send the Null RPC with AUTH_TLS to initiate the handshake. The client
kernel RPC code will then block kernel I/O on the socket and do an upcall
to the rpctlscd(8) daemon to perform the handshake.
If the upcall is successful, ct_rcvstate will be maintained to indicate
if/when an upcall is being done.

If non-application data records are received, the code does an upcall to
the appropriate daemon, which will do a SSL_read() of 0 length to handle
the record(s).

When the socket is being shut down, upcalls are done to the daemons, so
that they can perform SSL_shutdown() calls to perform the "peer reset".

The rpctlssd(8) and rpctlscd(8) daemons require a patched version of the
openssl library and, as such, will not be committed to head at this time.

Although the changes done by this patch are fairly numerous, there should
be no semantics change to the kernel RPC at this time.
A future commit to the NFS code will optionally enable use of TLS for NFS.


# b94b9a80 20-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Fix up a comment added by r362455.


# 4302e8b6 20-Jun-2020 Rick Macklem <rmacklem@FreeBSD.org>

Modify the way the client side krpc does soreceive() for TCP.

Without this patch, clnt_vc_soupcall() first does a soreceive() for
4 bytes (the Sun RPC over TCP record mark) and then soreceive(s) for
the RPC message.
This first soreceive() almost always results in an mbuf allocation,
since having the 4byte record mark in a separate mbuf in the socket
rcv queue is unlikely.
This is somewhat inefficient and rather odd. It also will not work
for the ktls rx, since the latter returns a TLS record for each
soreceive().

This patch replaces the above with code similar to what the server side
of the krpc does for TCP, where it does a soreceive() for as much data
as possible and then parses RPC messages out of the received data.
A new field of the TCP socket structure called ct_raw is the list of
received mbufs that the RPC message(s) are parsed from.
I think this results in cleaner code and is needed for support of
nfs-over-tls.
It also fixes the code for the case where a server sends an RPC message
in multiple RPC message fragments. Although this is allowed by RFC5531,
no extant NFS server does this. However, it is probably good to fix this
in case some future NFS server does do this.


# 51369649 20-Nov-2017 Pedro F. Giffuni <pfg@FreeBSD.org>

sys: further adoption of SPDX licensing ID tags.

Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
superceed or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.


# dfd174d6 06-May-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix the client side krpc from doing TCP reconnects for ERESTART from sosend().

When sosend() replies ERESTART in the client side krpc, it indicates that
the RPC message hasn't yet been sent and that the send queue is full or
locked while a signal is posted for the process.
Without this patch, this would result in a RPC_CANTSEND reply from
clnt_vc_call(), which would cause clnt_reconnect_call() to create a new
TCP transport connection. For most NFS servers, this wasn't a serious problem,
although it did imply retries of outstanding RPCs, which could possibly
have missed the DRC.
For an NFSv4.1 mount to AmazonEFS, this caused a serious problem, since
AmazonEFS often didn't retain the NFSv4.1 session and would reply with
NFS4ERR_BAD_SESSION. This implies to the client a crash/reboot which
requires open/lock state recovery.

Three options were considered to fix this:
- Return the ERESTART all the way up to the system call boundary and then
have the system call redone. This is fraught with risk, due to convoluted
code paths, asynchronous I/O RPCs etc. cperciva@ worked on this, but it
is still a work in prgress and may not be feasible.
- Set SB_NOINTR for the socket buffer. This fixes the problem, but makes
the sosend() completely non interruptible, which kib@ considered
inappropriate. It also would break forced dismount when a thread
was blocked in sosend().
- Modify the retry loop in clnt_vc_call(), so that it loops for this case
for up to 15sec. Testing showed that the sosend() usually succeeded by
the 2nd retry. The extreme case observed was 111 loop iterations, or
about 100msec of delay.
This third alternative is what is implemented in this patch, since the
change is:
- localized
- straightforward
- forced dismount is not broken by it.

This patch has been tested by cperciva@ extensively against AmazonEFS.

Reported by: cperciva
Tested by: cperciva
MFC after: 2 weeks


# 34f1fddb 10-Apr-2017 Rick Macklem <rmacklem@FreeBSD.org>

Fix a crash during unmount of an NFSv4.1 mount.

Larry Rosenman reported a crash on freebsd-current@ which was caused by
a premature release of the krpc backchannel socket structure.
I believe this was caused by a race between the SVC_RELEASE() in clnt_vc.c
and the xprt_unregister() in the higher layer (clnt_rc.c), which tried
to lock the mutex in the xprt structure and crashed.
This patch fixes this by removing the xprt_unregister() in the clnt_vc
layer and allowing this to always be done by the clnt_rc (higher reconnect
layer).

Reported by: ler@lerctr.org
Tested by: ler@letctr.org
MFC after: 2 weeks


# 7d3db235 11-Jul-2016 Enji Cooper <ngie@FreeBSD.org>

Deobfuscate cleanup path in clnt_vc_create(..)

Similar to r300836, r301800, and r302550, cl and ct will always
be non-NULL as they're allocated using the mem_alloc routines,
which always use `malloc(..., M_WAITOK)`.

MFC after: 1 week
Reported by: Coverity
CID: 1007342
Sponsored by: EMC / Isilon Storage Division


# 6244c6e7 05-May-2016 Pedro F. Giffuni <pfg@FreeBSD.org>

sys/rpc: minor spelling fixes.

No functional change.


# cfa6009e 12-Nov-2014 Gleb Smirnoff <glebius@FreeBSD.org>

In preparation of merging projects/sendfile, transform bare access to
sb_cc member of struct sockbuf to a couple of inline functions:

sbavail() and sbused()

Right now they are equal, but once notion of "not ready socket buffer data",
will be checked in, they are going to be different.

Sponsored by: Netflix
Sponsored by: Nginx, Inc.


# c3e2c655 02-May-2014 Christian Brueffer <brueffer@FreeBSD.org>

Properly free resources in case of error.

CID: 1007032
Found with: Coverity Prevent(tm)
MFC after: 2 weeks


# a6132f60 24-Dec-2013 Dimitry Andric <dim@FreeBSD.org>

Remove some unused static const strings under sys/rpc, which have never
been used since the initial commit (r177633).

MFC after: 3 days


# 2e322d37 25-Nov-2013 Hiroki Sato <hrs@FreeBSD.org>

Replace Sun RPC license in TI-RPC library with a 3-clause BSD license,
with the explicit permission of Sun Microsystems in 2009.


# 3b14c753 13-Mar-2013 John Baldwin <jhb@FreeBSD.org>

Revert 195703 and 195821 as this special stop handling in NFS is now
implemented via VFCF_SBDRY rather than passing PBDRY to individual
sleep calls.


# bd54830b 11-Mar-2013 Gleb Smirnoff <glebius@FreeBSD.org>

Use m_get(), m_gethdr() and m_getcl() instead of historic macros.

Sponsored by: Nginx, Inc.


# e2adc47d 07-Dec-2012 Rick Macklem <rmacklem@FreeBSD.org>

Add support for backchannels to the kernel RPC. Backchannels
are used by NFSv4.1 for callbacks. A backchannel is a connection
established by the client, but used for RPCs done by the server
on the client (callbacks). As a result, this patch mixes some
client side calls in the server side and vice versa. Some
definitions in the .c files were extracted out into a file called
krpc.h, so that they could be included in multiple .c files.
This code has been in projects/nfsv4.1-client for some time.
Although no one has given it a formal review, I believe kib@
has taken a look at it.


# eb1b1807 05-Dec-2012 Gleb Smirnoff <glebius@FreeBSD.org>

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# 0c2222ba 02-Oct-2012 Pedro F. Giffuni <pfg@FreeBSD.org>

libtirpc: be sure to free cl_netid and cl_tp

When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the same
way.

This change matches the reference (OpenSolaris) implementation.

Tested by: David Wolfskill
Obtained from: Bull GNU/Linux NFSv4 Project (libtirpc)
MFC after: 2 weeks


# c148237d 23-Sep-2012 Pedro F. Giffuni <pfg@FreeBSD.org>

Partial revert of r239963:

The following change caused rpc.lockd to exit after startup:
____

libtirpc: be sure to free cl_netid and cl_tp

When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the
same way.
____

MFC after: 3 days
Reported by: David Wolfskill
Tested by: David Wolfskill


# 43981b6c 31-Aug-2012 Pedro F. Giffuni <pfg@FreeBSD.org>

Bring some changes from Bull's NFSv4 libtirpc implementation.

We especifically ignored the glibc compatibility changes
but this should help interaction with Solaris and Linux.
____

Fixed infinite loop in svc_run()
author Steve Dickson
Tue, 10 Jun 2008 12:35:52 -0500 (13:35 -0400)
Fixed infinite loop in svc_run()
____

__rpc_taddr2uaddr_af() assumes the netbuf to always have a
non-zero data. This is a bad assumption and can lead to a
seg-fault. This patch adds a check for zero length and returns
NULL when found.
author Steve Dickson
Mon, 27 Oct 2008 11:46:54 -0500 (12:46 -0400)
____

Changed clnt_spcreateerror() to return clearer
and more concise error messages.
author Steve Dickson
Thu, 20 Nov 2008 08:55:31 -0500 (08:55 -0500)
____

Converted all uid and gid variables of the type uid_t and gid_t.
author Steve Dickson
Wed, 28 Jan 2009 12:44:46 -0500 (12:44 -0500)
____

libtirpc: set r_netid and r_owner in __rpcb_findaddr_timed

These fields in the rpcbind GETADDR call are being passed uninitialized
to CLNT_CALL. In the case of x86_64 at least, this usually leads to a
segfault. On x86, it sometimes causes segfaults and other times causes
garbage to be sent on the wire.

rpcbind generally ignores the r_owner field for calls that come in over
the wire, so it really doesn't matter what we send in that slot. We just
need to send something. The reference implementation from Sun seems to
send a blank string. Have ours follow suit.
author Jeff Layton
Fri, 13 Mar 2009 11:44:16 -0500 (12:44 -0400)
____

libtirpc: be sure to free cl_netid and cl_tp

When creating a client with clnt_tli_create, it uses strdup to copy
strings for these fields if nconf is passed in. clnt_dg_destroy frees
these strings already. Make sure clnt_vc_destroy frees them in the same
way.

author Jeff Layton
Fri, 13 Mar 2009 11:47:36 -0500 (12:47 -0400)

Obtained from: Bull GNU/Linux NFSv4 Project
MFC after: 3 weeks


# 7b67bd9f 27-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

This patch is believed to fix a problem in the kernel rpc for
non-interruptible NFS mounts, where a kernel thread will seem
to be stuck sleeping on "rpccon". The msleep() in clnt_vc_create()
that was waiting to a TCP connect to complete would return ERESTART,
since PCATCH was specified. Then the tsleep() in clnt_reconnect_call()
would sleep for 1 second and then try again and again and...
The patch changes the msleep() in clnt_vc_create() so it only sets
the PCATCH flag for interruptible cases.

Tested by: pho
Reviewed by: jhb
MFC after: 2 weeks


# 5e8eb3cd 12-Apr-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix a couple of mbuf leaks introduced by r217242. I do
not believe that these leaks had a practical impact,
since the situations in which they would have occurred
would have been extremely rare.

MFC after: 2 weeks


# 1fb51a12 16-Feb-2011 Bjoern A. Zeeb <bz@FreeBSD.org>

Mfp4 CH=177274,177280,177284-177285,177297,177324-177325

VNET socket push back:
try to minimize the number of places where we have to switch vnets
and narrow down the time we stay switched. Add assertions to the
socket code to catch possibly unset vnets as seen in r204147.

While this reduces the number of vnet recursion in some places like
NFS, POSIX local sockets and some netgraph, .. recursions are
impossible to fix.

The current expectations are documented at the beginning of
uipc_socket.c along with the other information there.

Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
Reviewed by: jhb
Tested by: zec

Tested by: Mikolaj Golub (to.my.trociny gmail.com)
MFC after: 2 weeks


# 2a1e0fb4 10-Jan-2011 Rick Macklem <rmacklem@FreeBSD.org>

Fix a bug in the client side krpc where it was, sometimes
erroneously, assumed that 4 bytes of data were in the first
mbuf of a list by replacing the bcopy() with m_copydata().
Also, replace the uses of m_pullup(), which can fail for
reasons other than not enough data, with m_copydata().
For the cases where it isn't known that there is enough
data in the mbuf list, check first via m_len and m_length().
This is believed to fix a problem reported by dpd at dpdtech.com
and george+freebsd at m5p.com.

Reviewed by: jhb
MFC after: 8 days


# a7d5f7eb 19-Oct-2010 Jamie Gritton <jamie@FreeBSD.org>

A new jail(8) with a configuration file, to replace the work currently done
by /etc/rc.d/jail.


# cec077bc 12-Oct-2010 Rick Macklem <rmacklem@FreeBSD.org>

Fix the krpc so that it can handle NFSv3,UDP mounts with a read/write
data size greater than 8192. Since soreserve(so, 256*1024, 256*1024)
would always fail for the default value of sb_max, modify clnt_dg.c
so that it uses the calculated values and checks for an error return
from soreserve(). Also, add a check for error return from soreserve()
to clnt_vc.c and change __rpc_get_t_size() to use sb_max_adj instead of
the bogus maxsize == 256*1024.

PR: kern/150910
Reviewed by: jhb
MFC after: 2 weeks


# 43bd8298 15-Nov-2009 Rick Macklem <rmacklem@FreeBSD.org>

MFC: r199053
Add a check for the connection being shut down to the krpc
client just before queuing a request for the connection. The
code already had a check for the connection being shut down
while the request was queued, but not one for the shut down
having been initiated by the server before the request was
in the queue. This fixes some cases of problems w.r.t. reconnecting
to a NFS server that drops inactive TCP connections.

Tested by: Olaf Seibert, Daniel Braniss
Reviewed by: dfr


# f9917533 08-Nov-2009 Rick Macklem <rmacklem@FreeBSD.org>

Add a check for the connection being shut down to the krpc
client just before queuing a request for the connection. The
code already had a check for the connection being shut down
while the request was queued, but not one for the shut down
having been initiated by the server before the request was
in the queue. This appears to fix the problem of slow reconnects
against an NFS server that drops inactive connections reported
by Olaf Seibert, but does not fix the case
where the FreeBSD client generates RST segments at about the
same time as ACKs. This is still a problem that is being
investigated. This patch does not cause a regression for this
case.

Tested by: Olaf Seibert, Daniel Braniss
Reviewed by: dfr
MFC after: 5 days


# 83864c81 28-Aug-2009 Marko Zec <zec@FreeBSD.org>

MFC r196503:

Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by: rmacklem (in part)
Reviewed by: rmacklem, rwatson
Discussed with: dfr, bz
Approved by: re (rwatson), julian (mentor)

Approved by: re (rwatson)


# 0348c661 24-Aug-2009 Marko Zec <zec@FreeBSD.org>

Fix NFS panics with options VIMAGE kernels by apropriately setting curvnet
context inside the RPC code.

Temporarily set td's cred to mount's cred before calling socreate() via
__rpc_nconf2socket().

Submitted by: rmacklem (in part)
Reviewed by: rmacklem, rwatson
Discussed with: dfr, bz
Approved by: re (rwatson), julian (mentor)
MFC after: 3 days


# b35687df 14-Jul-2009 Konstantin Belousov <kib@FreeBSD.org>

Use PBDRY flag for msleep(9) in NFS and NLM when sleeping thread owns
kernel resources that block other threads, like vnode locks. The SIGSTOP
sent to such thread (process, rather) shall not stop it until thread
releases the resources.

Tested by: pho
Reviewed by: jhb
Approved by: re (kensmith)


# 3144f812 04-Jun-2009 Rick Macklem <rmacklem@FreeBSD.org>

Fix upcall races in the client side krpc. For the client side upcall,
holding SOCKBUF_LOCK() isn't sufficient to guarantee that there is
no upcall in progress, since SOCKBUF_LOCK() is released/re-acquired
in the upcall. An upcall reference counter was added to the upcall
structure that is incremented at the beginning of the upcall and
decremented at the end of the upcall. As such, a reference count == 0
when holding the SOCKBUF_LOCK() guarantees there is no upcall in
progress. Add a function that is called just after soupcall_clear(),
which waits until the reference count == 0.
Also, move the mtx_destroy() down to after soupcall_clear(), so that
the mutex is not destroyed before upcalls are done.

Reviewed by: dfr, jhb
Tested by: pho
Approved by: kib (mentor)


# 74fb0ba7 01-Jun-2009 John Baldwin <jhb@FreeBSD.org>

Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer
locked. It is not permissible to call soisconnected() with this lock
held; however, so socket upcalls now return an integer value. The two
possible values are SU_OK and SU_ISCONNECTED. If an upcall returns
SU_ISCONNECTED, then the soisconnected() will be invoked on the
socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls. The
API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from
the receive socket buffer automatically. Note that a SO_SND upcall
should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED
instead of calling soisconnected() directly. They also no longer need
to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal
state machine, but other accept filters no longer have any explicit
knowlege of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock
while invoking soreceive() as a temporary band-aid. The plan for
the future is to add a new flag to allow soreceive() to be called with
the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket
buffer locked. Previously sowakeup() would drop the socket buffer
lock only to call aio_swake() which immediately re-acquired the socket
buffer lock for the duration of the function call.

Discussed with: rwatson, rmacklem


# a9ccfd56 11-Nov-2008 Doug Rabson <dfr@FreeBSD.org>

Add a missing call to mtx_destroy().


# a9148abd 03-Nov-2008 Doug Rabson <dfr@FreeBSD.org>

Implement support for RPCSEC_GSS authentication to both the NFS client
and server. This replaces the RPC implementation of the NFS client and
server with the newer RPC implementation originally developed
(actually ported from the userland sunrpc code) to support the NFS
Lock Manager. I have tested this code extensively and I believe it is
stable and that performance is at least equal to the legacy RPC
implementation.

The NFS code currently contains support for both the new RPC
implementation and the older legacy implementation inherited from the
original NFS codebase. The default is to use the new implementation -
add the NFS_LEGACYRPC option to fall back to the old code. When I
merge this support back to RELENG_7, I will probably change this so
that users have to 'opt in' to get the new code.

To use RPCSEC_GSS on either client or server, you must build a kernel
which includes the KGSSAPI option and the crypto device. On the
userland side, you must build at least a new libc, mountd, mount_nfs
and gssd. You must install new versions of /etc/rc.d/gssd and
/etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.

As long as gssd is running, you should be able to mount an NFS
filesystem from a server that requires RPCSEC_GSS authentication. The
mount itself can happen without any kerberos credentials but all
access to the filesystem will be denied unless the accessing user has
a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There
is currently no support for situations where the ticket file is in a
different place, such as when the user logged in via SSH and has
delegated credentials from that login. This restriction is also
present in Solaris and Linux. In theory, we could improve this in
future, possibly using Brooks Davis' implementation of variant
symlinks.

Supporting RPCSEC_GSS on a server is nearly as simple. You must create
service creds for the server in the form 'nfs/<fqdn>@<REALM>' and
install them in /etc/krb5.keytab. The standard heimdal utility ktutil
makes this fairly easy. After the service creds have been created, you
can add a '-sec=krb5' option to /etc/exports and restart both mountd
and nfsd.

The only other difference an administrator should notice is that nfsd
doesn't fork to create service threads any more. In normal operation,
there will be two nfsd processes, one in userland waiting for TCP
connections and one in the kernel handling requests. The latter
process will create as many kthreads as required - these should be
visible via 'top -H'. The code has some support for varying the number
of service threads according to load but initially at least, nfsd uses
a fixed number of threads according to the value supplied to its '-n'
option.

Sponsored by: Isilon Systems
MFC after: 1 month


# d7f03759 19-Oct-2008 Ulf Lilleengen <lulf@FreeBSD.org>

- Import the HEAD csup code which is the basis for the cvsmode work.


# c675522f 26-Jun-2008 Doug Rabson <dfr@FreeBSD.org>

Re-implement the client side of rpc.lockd in the kernel. This implementation
provides the correct semantics for flock(2) style locks which are used by the
lockf(1) command line tool and the pidfile(3) library. It also implements
recovery from server restarts and ensures that dirty cache blocks are written
to the server before obtaining locks (allowing multiple clients to use file
locking to safely share data).

Sponsored by: Isilon Systems
PR: 94256
MFC after: 2 weeks


# ee31b83a 28-Mar-2008 Doug Rabson <dfr@FreeBSD.org>

Minor changes to improve compatibility with older FreeBSD releases.


# dfdcada3 26-Mar-2008 Doug Rabson <dfr@FreeBSD.org>

Add the new kernel-mode NFS Lock Manager. To use it instead of the
user-mode lock manager, build a kernel with the NFSLOCKD option and
add '-k' to 'rpc_lockd_flags' in rc.conf.

Highlights include:

* Thread-safe kernel RPC client - many threads can use the same RPC
client handle safely with replies being de-multiplexed at the socket
upcall (typically driven directly by the NIC interrupt) and handed
off to whichever thread matches the reply. For UDP sockets, many RPC
clients can share the same socket. This allows the use of a single
privileged UDP port number to talk to an arbitrary number of remote
hosts.

* Single-threaded kernel RPC server. Adding support for multi-threaded
server would be relatively straightforward and would follow
approximately the Solaris KPI. A single thread should be sufficient
for the NLM since it should rarely block in normal operation.

* Kernel mode NLM server supporting cancel requests and granted
callbacks. I've tested the NLM server reasonably extensively - it
passes both my own tests and the NFS Connectathon locking tests
running on Solaris, Mac OS X and Ubuntu Linux.

* Userland NLM client supported. While the NLM server doesn't have
support for the local NFS client's locking needs, it does have to
field async replies and granted callbacks from remote NLMs that the
local client has contacted. We relay these replies to the userland
rpc.lockd over a local domain RPC socket.

* Robust deadlock detection for the local lock manager. In particular
it will detect deadlocks caused by a lock request that covers more
than one blocking request. As required by the NLM protocol, all
deadlock detection happens synchronously - a user is guaranteed that
if a lock request isn't rejected immediately, the lock will
eventually be granted. The old system allowed for a 'deferred
deadlock' condition where a blocked lock request could wake up and
find that some other deadlock-causing lock owner had beaten them to
the lock.

* Since both local and remote locks are managed by the same kernel
locking code, local and remote processes can safely use file locks
for mutual exclusion. Local processes have no fairness advantage
compared to remote processes when contending to lock a region that
has just been unlocked - the local lock manager enforces a strict
first-come first-served model for both local and remote lockers.

Sponsored by: Isilon Systems
PR: 95247 107555 115524 116679
MFC after: 2 weeks