History log of /freebsd-10-stable/sys/nfs/nfs_common.c
Revision Date Author Comments
(<<< Hide modified files)
(Show modified files >>>)
# 256281 10-Oct-2013 gjb

Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.

Approved by: re (implicit)
Sponsored by: The FreeBSD Foundation

# 248318 15-Mar-2013 glebius

Use m_get() and m_getcl() instead of compat macros.


# 245568 17-Jan-2013 jhb

Remove the unused nfs_curusec().


# 243882 05-Dec-2012 glebius

Mechanically substitute flags from historic mbuf allocator with
malloc(9) flags within sys.

Exceptions:

- sys/contrib not touched
- sys/mbuf.h edited manually


# 221973 15-May-2011 rmacklem

Change the sysctl naming for the old and new NFS clients
to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes
the default nfs client use vfs.nfs.xxx after r221124.


# 203968 16-Feb-2010 marius

Factor out the code shared between NFS client and server into its own
module. With r203732 it became apparent that creating the sysctl nodes
twice causes at least a warning, however the whole code shouldn't be
present twice in the first place.

Discussed with: rmacklem


# 203732 09-Feb-2010 marius

- Move nfs_realign() from the NFS client to the shared NFS code and
remove the NFS server version in order to reduce code duplication.
The shared version now uses a second parameter how, which is passed
on to m_get(9) and m_getcl(9) as the server used M_WAIT while the
client requires M_DONTWAIT, and replaces the the previously unused
parameter hsiz.
- Change nfs_realign() to use nfsm_aligned() so as with other NFS code
the alignment check isn't actually performed on platforms without
strict alignment requirements for performance reasons because as the
comment suggests unaligned data only occasionally occurs with TCP.
- Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather
than M_WAIT because it's called with the RPC sp_lock held.

Reviewed by: jhb, rmacklem
MFC after: 1 week


# 203731 09-Feb-2010 marius

Some style(9) fixes


# 195631 12-Jul-2009 marcel

Revert rev 192323 (nfs_common.c only):
The D-cache flushing added here was to deal with I-cache
incoherency observed on ia64. However, the problem was
in the implementation of pmap_enter_object() for ia64:
it was missing I-cache coherency logic for prefaulted
pages. After this got added in rev 195625, testing showed
that no D-cache flushing was required.

The SIGILL that was observed on Book-E (see commit log
for rev 192323) ended up not being related to I-cache
incoherency, but was found to be caused by bad memory.
This discovery further undermined the need for D-cache
flushing in the NFS I/O code, triggering the reversal.

Approved by: re (kensmith)


# 195202 30-Jun-2009 dfr

Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.

Approved by: re


# 192323 18-May-2009 marcel

Add cpu_flush_dcache() for use after non-DMA based I/O so that a
possible future I-cache coherency operation can succeed. On ARM
for example the L1 cache can be (is) virtually mapped, which
means that any I/O that uses temporary mappings will not see the
I-cache made coherent. On ia64 a similar behaviour has been
observed. By flushing the D-cache, execution of binaries backed
by md(4) and/or NFS work reliably.
For Book-E (powerpc), execution over NFS exhibits SIGILL once in
a while as well, though cpu_flush_dcache() hasn't been implemented
yet.

Doing an explicit D-cache flush as part of the non-DMA based I/O
read operation eliminates the need to do it as part of the
I-cache coherency operation itself and as such avoids pessimizing
the DMA-based I/O read operations for which D-cache are already
flushed/invalidated. It also allows future optimizations whereby
the bcopy() followed by the D-cache flush can be integrated in a
single operation, which could be implemented using on-chips DMA
engines, by-passing the D-cache altogether.


# 177599 25-Mar-2008 ru

Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT.
Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true
since the advent of MBUMA.

Reviewed by: arch

There are ongoing disputes as to whether we want to switch to directly using
UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.


# 148008 14-Jul-2005 ps

Fixes for NFS crashes on architectures that require strict alignment.
- Fix nfsm_disct() so that after pulling up data, the remaining data
is aligned if necessary.
- Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the
mbuf (instead of casting m_data to a uint32).

Submitted by: Pyun YongHyeon
Reviewed by: Mohan Srinivasan


# 139823 06-Jan-2005 imp

/* -> /*- for license, minor formatting changes


# 138463 06-Dec-2004 ps

Add non-blocking versions of nfsm_dissect() and friends, for use from
socket callbacks or similar callers, from both the NFS client and the
server.
Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And
nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument.

Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com


# 127977 07-Apr-2004 imp

Remove advertising clause from University of California Regent's
license, per letter dated July 22, 1999 and email from Peter Wemm,
Alan Cox and Robert Watson.

Approved by: core, peter, alc, rwatson


# 111119 19-Feb-2003 imp

Back out M_* changes, per decision of the TRB.

Approved by: trb


# 109623 21-Jan-2003 alfred

Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.
Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.


# 104908 11-Oct-2002 mike

Change iov_base's type from `char *' to the standard `void *'. All
uses of iov_base which assume its type is `char *' (in order to do
pointer arithmetic) have been updated to cast iov_base to `char *'.


# 92784 20-Mar-2002 jeff

Remove unused include.


# 88731 31-Dec-2001 iedowse

When the old nfsm_adv() macro was moved to nfsm_adv_xx(), a '>='
must have been inadvertently changed to '>'. This broke nfsm_adv()
in the case where the advancement count is equal to the amount of
data remaining in the current mbuf. Instead of moving the current
position N bytes forward, nfs_adv() could end up moving it back to
N bytes from the start of the mbuf data.

This should fix the client-side readdirplus problems that have been
reported since September.


# 88091 17-Dec-2001 iedowse

Avoid passing the variable `tl' to functions that just use it for
temporary storage. In the old NFS code it wasn't at all clear if
the value of `tl' was used across or after macro calls, but I'm
fairly confident that the convention was to keep its use local.
Each ex-macro function now uses a local version of this variable,
so all of the double-indirection goes away.

The only exception to the `local use' rule for `tl' is nfsm_clget(),
which is left unchanged by this commit.

Reviewed by: peter


# 84079 28-Sep-2001 peter

Unwind some more macros. NFSMADV() was kinda silly since it was right
next to equivalent m_len adjustments. Move the nfsm_subs.h macros
into groups depending on which phase they are used in, since that
affects the error recovery requirements. Collect some of the common error
checking into a single macro as preparation for unwinding some more.
Have nfs_rephead return a value instead of secretly modifying args.
Remove some unused function arguments that were being passed around.
Clarify nfsm_reply()'s error handling (I hope).


# 84057 27-Sep-2001 peter

Make nfsm_dissect() have an obvious return value.


# 84002 27-Sep-2001 peter

Tidy up nfsm_build usage. This is only partially finished.


# 83651 18-Sep-2001 peter

Cleanup and split of nfs client and server code.
This builds on the top of several repo-copies.


# 83366 12-Sep-2001 julian

KSE Milestone 2
Note ALL MODULES MUST BE RECOMPILED
make the kernel aware that there are smaller units of scheduling than the
process. (but only allow one thread per process at this time).
This is functionally equivalent to teh previousl -current except
that there is a thread associated with each process.

Sorry john! (your next MFC will be a doosie!)

Reviewed by: peter@freebsd.org, dillon@freebsd.org

X-MFC after: ha ha ha ha


# 79224 04-Jul-2001 dillon

With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
(this commit is just the first stage). Also add various GIANT_ macros to
formalize the removal of Giant, making it easy to test in a more piecemeal
fashion. These macros will allow us to test fine-grained locks to a degree
before removing Giant, and also after, and to remove Giant in a piecemeal
fashion via sysctl's on those subsystems which the authors believe can
operate without Giant.


# 78911 28-Jun-2001 jhb

- Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.


# 76827 18-May-2001 alfred

Introduce a global lock for the vm subsystem (vm_mtx).

vm_mtx does not recurse and is required for most low level
vm operations.

faults can not be taken without holding Giant.

Memory subsystems can now call the base page allocators safely.

Almost all atomic ops were removed as they are covered under the
vm mutex.

Alpha and ia64 now need to catch up to i386's trap handlers.

FFS and NFS have been tested, other filesystems will need minor
changes (grabbing the vm lock when twiddling page properties).

Reviewed (partially) by: jake, jhb


# 76117 29-Apr-2001 grog

Revert consequences of changes to mount.h, part 2.

Requested by: bde


# 75858 23-Apr-2001 grog

Correct #includes to work with fixed sys/mount.h.


# 70254 21-Dec-2000 bmilekic

* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.

* Fix a typo in a comment in mbuf.h

* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.


# 68711 14-Nov-2000 mckusick

In preparation for deprecating CIRCLEQ macros in favor of TAILQ
macros which provide the same functionality and are a bit more
efficient, convert use of CIRCLEQ's in NFS to TAILQ's.


# 67486 24-Oct-2000 dwmalone

Problem to avoid processes getting stuck in "vmopar". From Ian's
mail:

The problem seems to originate with NFS's postop_attr
information that is returned with a read or write RPC.
Within a vm_fault context, the code cannot deal with
vnode_pager_setsize() shrinking a vnode.

The workaround in the patch below stops the nfsm_postop_attr()
macro from ever shrinking a vnode. If the new size in the
postop_attr information is smaller, then it just sets the
nfsnode n_attrstamp to 0 to stop the wrong size getting
used in the future. This change only affects postop_attr
attributes; the nfsm_loadattr() macro works as normal.

The change is implemented by adding a new argument to
nfs_loadattrcache() called 'dontshrink'. When this is
non-zero, nfs_loadattrcache() will never reduce the
vnode/nfsnode size; instead it zeros n_attrstamp.

There remain other was processes can get stuck in vmopar.

Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by: dillon
Tested by: Vadim Belman <voland@lflat.org>


# 63788 24-Jul-2000 mckusick

This patch corrects the first round of panics and hangs reported
with the new snapshot code.

Update addaliasu to correctly implement the semantics of the old
checkalias function. When a device vnode first comes into existence,
check to see if an anonymous vnode for the same device was created
at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
creating a new vnode for the device. This corrects a problem which
caused the kernel to panic when taking a snapshot of the root
filesystem.

Change the calling convention of vn_write_suspend_wait() to be the
same as vn_start_write().

Split out softdep_flushworklist() from softdep_flushfiles() so that
it can be used to clear the work queue when suspending filesystem
operations.

Access to buffers becomes recursive so that snapshots can recursively
traverse their indirect blocks using ffs_copyonwrite() when checking
for the need for copy on write when flushing one of their own indirect
blocks. This eliminates a deadlock between the syncer daemon and a
process taking a snapshot.

Ensure that softdep_process_worklist() can never block because of a
snapshot being taken. This eliminates a problem with buffer starvation.

Cleanup change in ffs_sync() which did not synchronously wait when
MNT_WAIT was specified. The result was an unclean filesystem panic
when doing forcible unmount with heavy filesystem I/O in progress.

Return a zero'ed block when reading a block that was not in use at
the time that a snapshot was taken. Normally, these blocks should
never be read. However, the readahead code will occationally read
them which can cause unexpected behavior.

Clean up the debugging code that ensures that no blocks be written
on a filesystem while it is suspended. Snapshots must explicitly
label the blocks that they are writing during the suspension so that
they do not cause a `write on suspended filesystem' panic.

Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
prevent a race condition that would permit the same block to be
copied twice. This change eliminates an unexpected soft updates
inconsistency in fsck caused by the double allocation.

Use bqrelse rather than brelse for buffers that will be needed
soon again by the snapshot code. This improves snapshot performance.


# 60938 26-May-2000 jake

Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by: msmith and others


# 60833 23-May-2000 jake

Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by: phk
Reviewed by: phk
Approved by: mdodd


# 60041 05-May-2000 phk

Separate the struct bio related stuff out of <sys/buf.h> into
<sys/bio.h>.

<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.

Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.

Still a few bogus uses of struct buf to track down.

Repocopy by: peter


# 59391 19-Apr-2000 phk

Remove ~25 unneeded #include <sys/conf.h>
Remove ~60 unneeded #include <sys/malloc.h>


# 57178 13-Feb-2000 peter

Clean up some loose ends in the network code, including the X.25 and ISO
#ifdefs. Clean out unused netisr's and leftover netisr linker set gunk.
Tested on x86 and alpha, including world.

Approved by: jkh


# 55934 13-Jan-2000 dillon

The alpha build cuases the 'nfsuid bloated' warning to occur. Well,
there is nothing we can do about it. In fact, after further review
there simply are not very many instances of the two structures NFS
checks for 'bloat' so I've decided to simply rip the checks out entirely.

Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>


# 54970 21-Dec-1999 alfred

make getfh a standard syscall instead of dependant on having
NFSSERVER defined, useful for userland fileservers that want to
use a filehandle type interface to the filesystem.

Submitted by: Assar Westerlund assar@stacken.kth.se
PR: kern/15452


# 54485 12-Dec-1999 dillon

Fix a number of server-side issues related to aborting badly formed
NFS packets, mainly initializing structure pointers to NULL which
are conditionally freed prior to return.

PR: kern/15249
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>


# 54480 12-Dec-1999 dillon

Synopsis of problem being fixed: Dan Nelson originally reported that
blocks of zeros could wind up in a file written to over NFS by a client.
The problem only occurs a few times per several gigabytes of data. This
problem turned out to be bug #3 below.

bug #1:

B_CLUSTEROK must be cleared when an NFS buffer is reverted from
stage 2 (ready for commit rpc) to stage 1 (ready for write).
Reversions can occur when a dirty NFS buffer is redirtied with new
data.

Otherwise the VFS/BIO system may end up thinking that a stage 1
NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable.

bug #2:

B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short
buffers only occur near the EOF of the file). Change to only set
when the buffer is a full biosize (usually 8K). This bug has no
effect but should be fixed in -current anyway. It need not be
backported.

bug #3:

B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is
typically only called by the update daemon). nfs_flush()
does a multi-pass loop but due to the lack of vnode locking it
is possible for new buffers to be added to the dirtyblkhd list
while a flush operation is going on. This may result in nfs_flush()
setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its
stage 1 write, causing only the commit rpc to be made and thus
causing the contents of the buffer to be thrown away (never sent to
the server).

The patch also contains some cleanup, which only applies to the commit
into -current.

Reviewed by: dg, julian
Originally Reported by: Dan Nelson <dnelson@emsphone.com>


# 51906 03-Oct-1999 phk

Before we start to mess with the VFS name-cache clean things up a little bit:
Isolate the namecache in its own file, and give it a dedicated malloc type.


# 51138 10-Sep-1999 alfred

Seperate the export check in VFS_FHTOVP, exports are now checked via
VFS_CHECKEXP.

Add fh(open|stat|stafs) syscalls to allow userland to query filesystems
based on (network) filehandle.

Obtained from: NetBSD


# 50521 28-Aug-1999 phk

remove unused variables.


# 50477 27-Aug-1999 peter

$Id$ -> $FreeBSD$


# 50405 26-Aug-1999 phk

Simplify the handling of VCHR and VBLK vnodes using the new dev_t:

Make the alias list a SLIST.

Drop the "fast recycling" optimization of vnodes (including
the returning of a prexisting but stale vnode from checkalias).
It doesn't buy us anything now that we don't hardlimit
vnodes anymore.

Rename checkalias2() and checkalias() to addalias() and
addaliasu() - which takes dev_t and udev_t arg respectively.

Make the revoke syscalls use vcount() instead of VALIASED.

Remove VALIASED flag, we don't need it now and it is faster
to traverse the much shorter lists than to maintain the
flag.

vfs_mountedon() can check the dev_t directly, all the vnodes
point to the same one.

Print the devicename in specfs/vprint().

Remove a couple of stale LFS vnode flags.

Remove unimplemented/unused LK_DRAINED;


# 49535 08-Aug-1999 phk

Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,
a few lines into <sys/vnode.h>.

Add a few fields to struct specinfo, paving the way for the fun part.


# 48859 17-Jul-1999 phk

I have not one single time remembered the name of this function correctly
so obviously I gave it the wrong name. s/umakedev/makeudev/g


# 48274 27-Jun-1999 peter

Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly
splbio()/splx()) are #included in time.


# 48225 26-Jun-1999 mckusick

Convert buffer locking from using the B_BUSY and B_WANTED flags to using
lockmgr locks. This commit should be functionally equivalent to the old
semantics. That is, all buffer locking is done with LK_EXCLUSIVE
requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
be done in future commits.


# 48125 23-Jun-1999 julian

Matt's NFS fixes.
Submitted by: Matt Dillon
Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin
3.2 version to follow when tested


# 47751 05-Jun-1999 peter

Various changes lifted from the OpenBSD cvs tree:

txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.

nfs_serv.c: don't call nfsm_adj() with negative values, windows clients
could crash servers when doing a readdir of a large directory.

nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin
loop trying to bind(). Don't clobber a mbuf pointer or we get panics
on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.

nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB.

Obtained from: OpenBSD


# 47028 11-May-1999 phk

Divorce "dev_t" from the "major|minor" bitmap, which is now called
udev_t in the kernel but still called dev_t in userland.

Provide functions to manipulate both types:
major() umajor()
minor() uminor()
makedev() umakedev()
dev2udev() udev2dev()

For now they're functions, they will become in-line functions
after one of the next two steps in this process.

Return major/minor/makedev to macro-hood for userland.

Register a name in cdevsw[] for the "filedescriptor" driver.

In the kernel the udev_t appears in places where we have the
major/minor number combination, (ie: a potential device: we
may not have the driver nor the device), like in inodes, vattr,
cdevsw registration and so on, whereas the dev_t appears where
we carry around a reference to a actual device.

In the future the cdevsw and the aliased-from vnode will be hung
directly from the dev_t, along with up to two softc pointers for
the device driver and a few houskeeping bits. This will essentially
replace the current "alias" check code (same buck, bigger bang).

A little stunt has been provided to try to catch places where the
wrong type is being used (dev_t vs udev_t), if you see something
not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if
it makes a difference. If it does, please try to track it down
(many hands make light work) or at least try to reproduce it
as simply as possible, and describe how to do that.

Without DEVT_FASCIST I belive this patch is a no-op.

Stylistic/posixoid comments about the userland view of the <sys/*.h>
files welcome now, from userland they now contain the end result.

Next planned step: make all dev_t's refer to the same devsw[] which
means convert BLK's to CHR's at the perimeter of the vnodes and
other places where they enter the game (bootdev, mknod, sysctl).


# 44101 17-Feb-1999 bde

Fixed bitrot in NFS_ACDEBUG option.


# 43305 27-Jan-1999 dillon

Fix warnings in preparation for adding -Wall -Wcast-qual to the
kernel compile


# 42957 21-Jan-1999 dillon

This is a rather large commit that encompasses the new swapper,
changes to the VM system to support the new swapper, VM bug
fixes, several VM optimizations, and some additional revamping of the
VM code. The specific bug fixes will be documented with additional
forced commits. This commit is somewhat rough in regards to code
cleanup issues.

Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>


# 42315 05-Jan-1999 eivind

Remove the 'waslocked' parameter to vfs_object_create().


# 41796 14-Dec-1998 dt

Added 3 new errno values, requred by various standards: EOVERFLOW,
ECANCELED, EILSEQ.

Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping.
Reviewed by: Dan Nelson <dnelson@emsphone.com>


# 41591 07-Dec-1998 archie

The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static
and local variables, goto labels, and functions declared but not defined.


# 41026 09-Nov-1998 peter

Remove [apparently] bogus casts to u_long for the vnode_pager_setsize()
second argument. np_size is a 64 bit int, so is the second arg. This
might have caused needless 2G/4G file size problems.

I believe it was Bruce who queried this.


# 40790 31-Oct-1998 peter

Use TAILQ macros for clean/dirty block list processing. Set b_xflags
rather than abusing the list next pointer with a magic number.


# 38894 07-Sep-1998 bde

Made unloading of the nfs LKM sort of work. This is mainly to test
detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually
work for any vfs, since it leaves garbage pointers to memory
allocation control structures.


# 38866 05-Sep-1998 bde

Instantiate `nfs_mount_type' in a standard file so that it is present
when nfs is an LKM. Declare it in a header file. Don't forget to use
it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0
will soon be the mount type number for the first vfs loaded.

NetBSD uses strcmp() to avoid this ugly global.


# 37089 21-Jun-1998 bde

Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore,
code controlled by it did not even compile.)


# 36979 14-Jun-1998 bde

Avoid an egcs pessimization for 64-bit signed division on i386's.
Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit
signed divisions, but egcs optimizes them to a shift and fixup when
the divisor is a constant power of 2. Unfortunately, it generates
a call to __cmpdi2() for the fixup, although all except possibly
ancient versions of gcc and egcs do ordinary 64-bit comparisons
inline.


# 36541 31-May-1998 peter

For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t;
int -> int32_t; u_short -> u_int16_t. Also, use mode_t instead of u_short
for storing modes (mode_t is a u_int16_t).

Obtained from: NetBSD


# 36534 31-May-1998 peter

Prototype support for selectively allowing non-reserved ports on a per
export basis. Needs userland support yet.

Obtained from: NetBSD


# 36519 31-May-1998 peter

#ifdef a diagnostic panic, plus another missed costmetic change.

Obtained from: NetBSD


# 36518 31-May-1998 peter

We have gained 2 more errno's, add them to the NFSv2 mapping table.


# 36511 31-May-1998 peter

Some const's

Obtained from: NetBSD


# 36503 31-May-1998 peter

NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim
of this part of commits is to minimize unnecessary differences between
the other NFS's of similar origin. Yes, there are gratuitous changes here
that the style folks won't like, but it makes the catch-up less difficult.


# 36329 24-May-1998 peter

Convert a couple of large allocations to use zones rather than malloc
for better packing. This means that we can choose better values for the
various hash entries without having to try and get it all to fit within
an artificial power of two limit for malloc's sake.


# 36176 19-May-1998 peter

Allow control of the attribute cache timeouts at mount time.

We had run out of bits in the nfs mount flags, I have moved the internal
state flags into a seperate variable. These are no longer visible via
statfs(), but I don't know of anything that looks at them.


# 35066 06-Apr-1998 phk

Use random() to find our initial xid.


# 34961 30-Mar-1998 phk

Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by: bde


# 33134 06-Feb-1998 eivind

Back out DIAGNOSTIC changes.


# 33108 04-Feb-1998 eivind

Turn DIAGNOSTIC into a new-style option.


# 32011 27-Dec-1997 bde

Unspammed nested include of <vm/vm_zone.h>.


# 31886 19-Dec-1997 bde

Added a used include.

Fixed a gratuitous ANSIism and nearby KNF violations.


# 31016 07-Nov-1997 phk

Remove a bunch of variables which were unused both in GENERIC and LINT.

Found by: -Wunused


# 30813 28-Oct-1997 bde

Removed unused #includes.


# 30808 28-Oct-1997 bde

Don't #include <nfs/nfs.h> in <nfs/nfs_node.h> if KERNEL is defined.
Fixed everything that depended on the nested include.


# 30738 26-Oct-1997 phk

Always initialize the syscall vectors for our "private" syscalls (not
just in the LKM case).
Plug nqnfs_vop_lease_check directly into the default_vnodeop_p table.


# 29653 21-Sep-1997 dyson

Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.


# 29288 10-Sep-1997 phk

unifdef -U__NetBSD__ -D__FreeBSD__


# 28270 16-Aug-1997 wollman

Fix all areas of the system (or at least all those in LINT) to avoid storing
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.


# 27609 22-Jul-1997 dfr

Correct some dumb mistakes in the WebNFS stuff.

Submitted by: bde


# 27446 16-Jul-1997 dfr

Merge WebNFS changes from NetBSD.

Obtained from: NetBSD


# 24626 04-Apr-1997 dfr

Fix various bugs in the locking protocol, allowing proper shared locks
to be used. This should fix the lock panics that people are seeing.


# 22975 22-Feb-1997 peter

Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.


# 22521 10-Feb-1997 dyson

This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.

Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>


# 21673 14-Jan-1997 jkh

Make the long-awaited change from $Id$ to $FreeBSD$

This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.


# 19449 06-Nov-1996 dfr

Improve the queuing algorithms used by NFS' asynchronous i/o. The
existing mechanism uses a global queue for some buffers and the
vp->b_dirtyblkhd queue for others. This turns sequential writes into
randomly ordered writes to the server, affecting both read and write
performance. The existing mechanism also copes badly with hung
servers, tending to block accesses to other servers when all the iods
are waiting for a hung server.

The new mechanism uses a queue for each mount point. All asynchronous
i/o goes through this queue which preserves the ordering of requests.
A simple mechanism ensures that the iods are shared out fairly between
active mount points. This removes the sysctl variable vfs.nfs.dwrite
since the new queueing mechanism removes the old delayed write code
completely.

This should go into the 2.2 branch.


# 18397 19-Sep-1996 nate

In sys/time.h, struct timespec is defined as:

/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};

The correct names of the fields are tv_sec and tv_nsec.

Reminded by: James Drobina <jdrobina@infinet.com>


# 17761 21-Aug-1996 dyson

Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistant
and robust. Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls. The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg


# 17186 16-Jul-1996 dfr

Various fixes from frank@fwi.uva.nl (Frank van der Linden) via
rick@snowhite.cis.uoguelph.ca:

1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is
correctly send to the server. If a buffer was dirtied when it was in
the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left
unchanged and when the buffer was later cleaned, just a commit rpc was
made to the server to complete the previous write. Clearing
B_NEEDCOMMIT ensures that another write is made to the server.

2. If a server returned a server (for whatever reason) returned an
answer to a write RPC that implied that fewer bytes than requested
were written, bad things would happen.

3. The setattr operation passed on the atime in stead of the mtime to
the server. The fix is trivial.

4. XIDs always started at 0, but this caused some servers (older DEC
OSF/1 3.0 so I've been told) who had very long-lasting XID caches to
get confused if, after a reboot of a BSD client, RPCs came in with a
XID that had in the past been used before from that client. Patch is
to use the current time in seconds as a starting point for XIDs. The
patch below is not perfect, because it requires the root fs to be
mounted first. This is because of the check BSD systems do, comparing
FS time to system time.

Reviewed by: Bruce Evans, Terry Lambert.
Obtained from: frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca


# 16634 23-Jun-1996 bde

Don't truncate minor or major numbers in the nfsv3 client.


# 16365 14-Jun-1996 phk

Fix for NFS_NOSERVER

Poul mentioned that he thought this was some kind of timing problem, and
that started me thinking. After a little poking around, I found that
nfs_timer() was completely disabled when NFS_NOSERVER was #defined.
But after looking at nfs_timer(), it seemed like it was something
required by both the client and server code, and disabling it outright
just didn't seem to make any sense. Parts of it relate only to the
NFS server side code, so I disabled those, but I re-enabled the rest
of the function and made sure that it would be called from nfs_init()
(in nfs_subs.c).

With nfs_timer() re-enabled, everything seems to work again. The only
other changes I made were to #ifdef away some variable declarations
in the NFS_NOSERVER case so that gcc would stop complaining about
unused variables.

Reviewed by: phk
Submitted by: Bill Paul <wpaul@skynet.ctr.columbia.edu>


# 13490 19-Jan-1996 dyson

Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
overhead for merged cache.
Efficiency improvement for vfs_cluster. It used to do alot of redundant
calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files. Additionally,
fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources. The pageout code
will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
buffers.


# 13416 13-Jan-1996 phk

Add an option NFS_NOSERVER which saves 100K in the install kernel (or
any other kernel that uses it). Use with option NFS.


# 12911 17-Dec-1995 phk

Staticize.


# 12662 07-Dec-1995 dg

Untangled the vm.h include file spaghetti.


# 11921 29-Oct-1995 phk

Second batch of cleanup changes.
This time mostly making a lot of things static and some unused
variables here and there.


# 10219 24-Aug-1995 dfr

Add support for amd direct maps.

Reviewed by: Thomas Graichen <graichen@sirius.physik.fu-berlin.de>


# 9759 29-Jul-1995 bde

Eliminate sloppy common-style declarations. There should be none left for
the LINT configuation.


# 9588 20-Jul-1995 dg

vnode_pager_alloc() never returns NULL, so don't check for it.


# 9507 13-Jul-1995 dg

NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!

Much needed overhaul of the VM system. Included in this first round of
changes:

1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".

2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.

3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.

4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.

5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.

6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.

7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.

8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.

9) Some almost useless debugging code removed.

10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.

11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.

12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).

13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.

14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)

TODO:

1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.

2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.

3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.

4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.

5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).


# 9456 09-Jul-1995 dg

Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places
that call vnode_pager_alloc() so that a failure return can be dealt with.
This fixes a panic seen on NFS clients when a file being opened is deleted
on the server before the open completes.


# 9356 28-Jun-1995 dg

1) Converted v_vmdata to v_object.
2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs
after vnode_pager_alloc() calls - the object is already guaranteed to be
persistent.
3) Removed some gratuitous casts.


# 9336 27-Jun-1995 dfr

Changes to support version 3 of the NFS protocol.
The version 2 support has been tested (client+server) against FreeBSD-2.0,
IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support
is stable AFAIK.
The version 3 support has been tested with a loopback mount and minimally
against an IRIX 5.3 server. It needs more testing and may have problems.
I have patched amd to support the new variable length filehandles although
it will still only use version 2 of the protocol.

Before booting a kernel with these changes, nfs clients will need to at least
build and install /usr/sbin/mount_nfs. Servers will need to build and
install /usr/sbin/mountd.

NFS diskless support is untested.

Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>


# 9221 14-Jun-1995 joerg

The duplicate information returned in fa_type and fa_mode
is an ambiguity in the NFS version 2 protocol.

VREG should be taken literally as a regular file. If a
server intents to return some type information differently
in the upper bits of the mode field (e.g. for sockets, or
FIFOs), NFSv2 mandates fa_type to be VNON. Anyway, we
leave the examination of the mode bits even in the VREG
case to avoid breakage for bogus servers, but we make sure
that there are actually type bits set in the upper part of
fa_mode (and failing that, trust the va_type field).

NFSv3 cleared the issue, and requires fa_mode to not
contain any type information (while also introduing sockets
and FIFOs for fa_type).

The fix has been tested against a variety of NFS servers.
It fixes problems with the ``Tropic'' NFS server for Windows,
while apparently not breaking anything.

Pointed-out by: scott@zorch.sf-bay.org (Scott Hazen Mueller)


# 8876 30-May-1995 rgrimes

Remove trailing whitespace.


# 8832 29-May-1995 dg

Fixed some serious bugs that resulted in object reference counts not being
handled correctly. This would manifest itself as "object deallocated too
many times" panics and perhaps other strange inconsistencies on NFS servers.

Reviewed by: me, of course
Submitted by: John Dyson


# 7969 21-Apr-1995 dyson

Slight re-ordering of the creation of a vmio object to fix a condition
that can cause NFS I/O failures.


# 7090 16-Mar-1995 bde

Add and move declarations to fix all of the warnings from `gcc -Wimplicit'
(except in netccitt, netiso and netns) and most of the warnings from
`gcc -Wnested-externs'. Fix all the bugs found. There were no serious
ones.


# 6420 15-Feb-1995 phk

YF fix.


# 6417 15-Feb-1995 dg

Fixed two more bugs related to the merged cache changes.

Submitted by: John Dyson


# 6210 06-Feb-1995 dg

Changed order of release of vnode/object to fix a problem where the vnode
is freed with an old object still attached (subsequently causing a panic).
Fixes NFS server panic "object/pager mismatch".

Submitted by: John Dyson


# 5455 09-Jan-1995 dg

These changes embody the support of the fully coherent merged VM buffer cache,
much higher filesystem I/O performance, and much better paging performance. It
represents the culmination of over 6 months of R&D.

The majority of the merged VM/cache work is by John Dyson.

The following highlights the most significant changes. Additionally, there are
(mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to
support the new VM/buffer scheme.

vfs_bio.c:
Significant rewrite of most of vfs_bio to support the merged VM buffer cache
scheme. The scheme is almost fully compatible with the old filesystem
interface. Significant improvement in the number of opportunities for write
clustering.

vfs_cluster.c, vfs_subr.c
Upgrade and performance enhancements in vfs layer code to support merged
VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.

vm_object.c:
Yet more improvements in the collapse code. Elimination of some windows that
can cause list corruption.

vm_pageout.c:
Fixed it, it really works better now. Somehow in 2.0, some "enhancements"
broke the code. This code has been reworked from the ground-up.

vm_fault.c, vm_page.c, pmap.c, vm_object.c
Support for small-block filesystems with merged VM/buffer cache scheme.

pmap.c vm_map.c
Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of
kernel PTs.

vm_glue.c
Much simpler and more effective swapping code. No more gratuitous swapping.

proc.h
Fixed the problem that the p_lock flag was not being cleared on a fork.

swap_pager.c, vnode_pager.c
Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the
code doesn't need it anymore.

machdep.c
Changes to better support the parameter values for the merged VM/buffer cache
scheme.

machdep.c, kern_exec.c, vm_glue.c
Implemented a seperate submap for temporary exec string space and another one
to contain process upages. This eliminates all map fragmentation problems
that previously existed.

ffs_inode.c, ufs_inode.c, ufs_readwrite.c
Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on
busy buffers.

Submitted by: John Dyson and David Greenman


# 3664 17-Oct-1994 phk

This is a bunch of changes from NetBSD. There are a couple of bug-fixes.
But mostly it is changes to use the list-maintenance macros instead of
doing the pointer-gymnastics by hand.

Obtained from: NetBSD


# 3305 02-Oct-1994 phk

Prototyping and general gcc-shutting up. Gcc has one warning now which looks
bad, I will get to it eventually, unless somebody beats me to it.


# 2997 22-Sep-1994 wollman

Make NFS loadable.


# 2979 22-Sep-1994 wollman

More loadable VFS changes:

- Make a number of filesystems work again when they are statically compiled
(blush)

- FIFOs are no longer optional; ``options FIFO'' removed from distributed
config files.


# 1817 02-Aug-1994 dg

Added $Id$


# 1549 25-May-1994 rgrimes

The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.

Reviewed by: Rodney W. Grimes
Submitted by: John Dyson and David Greenman


# 1541 24-May-1994 rgrimes

BSD 4.4 Lite Kernel Sources