History log of /openbsd-current/sys/nfs/nfs_bio.c
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 1.86 01-May-2024 jsg

remove unneeded includes
ok miod@ mpi@


# 1.85 30-Apr-2024 miod

Do not cast off_t to u_long in uvm_vnp_setsize call (only misbehaves on 32-bit
platforms.)

ok mpi@


Revision tags: OPENBSD_6_6_BASE OPENBSD_6_7_BASE OPENBSD_6_8_BASE OPENBSD_6_9_BASE OPENBSD_7_0_BASE OPENBSD_7_1_BASE OPENBSD_7_2_BASE OPENBSD_7_3_BASE OPENBSD_7_4_BASE OPENBSD_7_5_BASE
# 1.84 25-Jul-2019 cheloha

vinvalbuf(9): tlseep -> tsleep_nsec(9); ok millert@


# 1.83 19-Jul-2019 cheloha

getblk(9): tsleep(9) -> tsleep_nsec(9); ok visa@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.82 22-Feb-2017 mpi

Keep local definitions local.

"good work" deraadt@, ok visa@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.81 13-Feb-2016 stefan

Convert to uiomove. From Martin Natano.


Revision tags: OPENBSD_5_8_BASE
# 1.80 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.79 10-Feb-2015 miod

First step towards making uiomove() take a size_t size argument:
- rename uiomove() to uiomovei() and update all its users.
- introduce uiomove(), which is similar to uiomovei() but with a size_t.
- rewrite uiomovei() as an uiomove() wrapper.
ok kettenis@


# 1.78 18-Dec-2014 tedu

delete a whole mess of unnecessary caddr_t casts


# 1.77 14-Nov-2014 tedu

bzero -> memset


Revision tags: OPENBSD_5_6_BASE
# 1.76 08-Jul-2014 deraadt

decouple struct uvmexp into a new file, so that uvm_extern.h and sysctl.h
don't need to be married.
ok guenther miod beck jsing kettenis


Revision tags: OPENBSD_5_5_BASE
# 1.75 14-Sep-2013 guenther

Correct the handling of I/O of >=2^32 bytes and the ktracing there of
by using size_t/ssize_t instead of int/u_int to handle I/O lengths in
uiomove(), vn_fsizechk(), and ktrgenio(). Eliminate the always-zero
'error' argument to ktrgenio() at the same time.


Revision tags: OPENBSD_5_4_BASE
# 1.74 11-Jun-2013 deraadt

final removal of daddr64_t. daddr_t has been 64 bit for a long enough
test period; i think 3 years ago the last bugs fell out.
ok otto beck others


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.73 11-Jul-2012 guenther

If the current offset is strictly less than the process filesize
rlimit, then a write that would take it over the limit should be
clamped, making it a partial write.

ok beck@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ macro@


# 1.71 12-Apr-2010 beck

Don't jump the queue if we have to wait on the client side because
the nfs_bufq is full - instead tsleep waiting for one of our nfsiod's
to free up space for us in the queue so we can enqueue on the end.

ok blambert@, tedu@, oga@


# 1.70 09-Apr-2010 oga

make more bettah. instead of doing:

switch(type) {
case VREG:
/*something */
break;
case VLNK:
/* something */
break;
default:
panic("wtf?");
}

do_something_that_doesn't_change_type();

switch(type) {
case VREG:
/* nowt */
break;
case VLNK:
n = 0;
break;
default:
panic("wtf?");
}

be a bit less silly and replace the second switch with:

if (type == VLNK)
n = 0;

ok beck@, blambert@


# 1.69 09-Apr-2010 oga

In the nfs bio functions, instead of looking at an invalid vnode type,
deciding to do nothing, printing about it and continuing along our merry
way without even erroring the sodding buffer, just panic. by this point
we are liked very fucked up anyway.

found in either edmonton or stockholm then forgotten. ok beck@,
blambert@


Revision tags: OPENBSD_4_7_BASE
# 1.68 19-Oct-2009 jsg

antsy
no binary change apart from nfsm_reqhead() which is clearly correct.

ok thib@


# 1.67 02-Sep-2009 thib

Backout the asyncio/aiod change, as it causes buf's to get hung.
problem noticed by deraadt@

ok beck@


# 1.66 27-Aug-2009 thib

Garbage collect two variables that where set but unused.
Tiny spacing nit.
Fix a typo, pointed out by miod@.


# 1.65 27-Aug-2009 thib

introduce a flag member to struct nfs_aiod, and use flags instead of the exit
and worked members. nad_worked becomes NFSAIOD_WAKEUP, which is set after if
an aiod was removed from the idle list and woken up by nfs_asyncio().

don't rely on tsleep wchans being unique, that is keep going back to sleep if
woken up unless the NFSAIOD_WAKEUP flag is set.

fix a divide by zero crash if nfs.vfs.iothreads is set to 0, as that can happen
when we recalculate the maximum buf's to queue up for each aiod.

in nfs_asyncio() set the nad_mnt to NULL before returning the aiod back to the
idle list in the case where we have already queued up to many bufs, otherwise
we trip an assertion.

minimize the time we are holding the nfs_aiodl_mtx to only when we are inserting
or removing from the lists, with the exception of nfs_set_naiod() as it would
make the loops more complicated and its uncommon in any case.

tested by myself and deraadt@
"fine with me" deraadt@


# 1.64 26-Aug-2009 thib

make sure that an aiod has been removed from the nfs_aiods_idle list
before inserting it back into the list.

crashes debugged with help from deraadt@ who also tested this fix.


# 1.63 20-Aug-2009 thib

Rework the way we do async I/O in nfs. Introduce separate buf queues for
each mount, and when work is "found", peg an aiod to that mount todo the
I/O. Make nfs_asyncio() a bit smarter when deciding when to do asyncio
and when to force it sync, this is done by keeping the aiod's one two lists,
an "idle" and an "all" list, so asyncio is only done when there are aiods
hanging around todo it for us or are already pegged to the mount.

Idea liked by at least beck@ (and I think art@).
Extensive testing done by myself and jasper and a few others on various
arch's.

Ideas/Code from Net/Free.

OK blambert@.


# 1.62 28-Jul-2009 art

Using the buf pointer returned from incore is a really bad idea.
Even if we know that someone safely holds B_BUSY and will not modify
the buf (as was the case in here), we still need to be sure that
the B_BUSY will not be released while we fiddle with the buf.

In this case, it was not safe, since copyout can sleep and whoever was
writing out the buf could finish the write and release the buf which
could then get recycled or unmapped while we slept. Always acquire
B_BUSY ourselves, even when it might give a minor performance penalty.

thib@ ok


# 1.61 22-Jul-2009 thib

remove a comment thats part lie and part stating the obvious.

ok blambert@


# 1.60 20-Jul-2009 thib

(struct foo *)0 -> NULL, every where I could find it.

OK blambert@


Revision tags: OPENBSD_4_6_BASE
# 1.59 23-Jun-2009 jasper

- /dev/drum is long gone; sync comment with reality

ok thib@


# 1.58 19-Mar-2009 oga

We don't count buffercache stats in the B_PHYS case, so fix nfs to not
increment the num{read,write} and pending{read,write} statistics in that
case, since biodone won't change them on completion.

On another note, I'm not sure that we use physical buffers for swapping
over nfs anymore, so this chunk may be superfluous.

beck@ came up with the same diff "So anyway rather than me commiting it
from my copy, I'm giving you the OK and the commit. since it officially
makes you a buffer cache and NFS hacker };-)"


Revision tags: OPENBSD_4_5_BASE
# 1.57 24-Jan-2009 thib

Use a timespec instead of a time_t for the clients nfsnode
mtime, gives us better granularity, helps with cache consistency.

Idea lifted from NetBSD.

OK blambert@


# 1.56 19-Jan-2009 thib

Introduce a macro to invalidate the attribute
cache instead of setting n_attrstamp to 0 directly.

Lift the macro name from NetBSD.
prompted by and OK blambert@


# 1.55 09-Aug-2008 thib

o nfs_vinvalbuf() is always called with the intrflag as 1, and then
checks if the mount is actually interrutable, and if not sets it 0.
remove this argument from nfs_vinvalbuf and just do the checking inside
the function.
o give nfs_vinvalbuf() a makeover so it looks nice. (spacing, casts, &c);
o Actually pass PCATCH too tsleep() if the mount it interrutable.

ok art@, blambert@


# 1.54 08-Aug-2008 blambert

After beck@ changed the way nfsiod's are notified of work, the
nfs_iodwant array became unused. Garbage collect and free up
a few bytes.

ok thib@


Revision tags: OPENBSD_4_4_BASE
# 1.53 25-Jul-2008 beck

much more correct way of dealing with nfs pending reads/writes
ok thib@


# 1.52 23-Jul-2008 beck

Correct cases of mishandling of pending reads and writes to prevent
them going negative - this consists of identifying a number of cases of
IO not going through the buffer cache and marking those buffers with
B_RAW - as well as fixing nfs_bio to show pending writes and reads through
the buffer cache via NFS

still has a problem with mishandling the counters I believe in the
async/sync fallback case where counters stay positive which will be
addressed seperately.

ok tedu@ deraadt@


# 1.51 14-Jun-2008 beck

Ensure each nfsiod can actually enqueue more than one asynchio - this mirrors
the accidental situation that used to happen when it leaked buffers and allowed
the syncer to do it, however this puts a limit on how much of the buffer cache
it is allowed to consume to a sensible amount - improves nfs write performance
since we don't have to do tons of them synch now.

Modifies the existing code to use wakeup_one instead of cruft, and now
all nfsiod's tsleep the same way.

ok thib@ art@


# 1.50 12-Jun-2008 thib

add a statistic bit to count how often we change async to sync

you need to upgrade nfsstat and the relevant header files

ok beck@


# 1.49 12-Jun-2008 art

if (something_complicated)
return (EIO);
return (EIO);
is kinda silly. Don't.

Prettify a bit in the process.

'makes perfect sense' blambert@, ok thib@


# 1.48 12-Jun-2008 thib

Actually return an error in nfs_asyncio() if we fail to process
the buf due too all of the nfs iod's being busy; this downgrades
the write to a sync one and allows to handle this.

ok art@, beck@


# 1.47 11-Jun-2008 blambert

Canonical for() -> queue.h FOREACH macro conversions.
Also, it is historical practice to #include <sys/queue.h>
when using queue.h macros.

ok thib@ krw@

special thanks to krw@ for reminders vice violence


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.46 01-Jun-2007 deraadt

pedro ok'd this ~3500 line diff which removes the vop argument
"ap = v" comments in under 8 seconds, so it must be ok. and it compiles
too.


# 1.45 01-Jun-2007 thib

daddr_t -> daddr64_t;
Basically the usage of daddr_t was to math out arguments to
nfs_getcacheblk, wich calls getblk();

ok deraadt@


Revision tags: OPENBSD_4_1_BASE
# 1.44 29-Nov-2006 miod

Kernel stack can be swapped. This means that stuff that's on the stack
should never be referenced outside the context of the process to which
this stack belongs unless we do the PHOLD/PRELE dance. Loads of code
doesn't follow the rules here. Instead of trying to track down all
offenders and fix this hairy situation, it makes much more sense
to not swap kernel stacks.

From art@, tested by many some time ago.


# 1.43 01-Nov-2006 thib

move the declaration of nfsstats from nfs_bio.c to
nfs_subs.c so it gets pulled in for NFSSERVER only
kernels.

ok deraadt@,krw@


Revision tags: OPENBSD_4_0_BASE
# 1.42 20-Apr-2006 pedro

Remove unused debug code that sneaked in by accident long ago


Revision tags: OPENBSD_3_9_BASE
# 1.41 31-Oct-2005 otto

Fix reading large files; from NetBSD. Somehow this was overlooked
when earlier merges were done. Fixes PR 4250. ok millert@ deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.40 03-Aug-2004 marius

NFS commit coalescion: instead of sending a commit for each block, coalesce
these into larger ranges wherever possible.

this should speed up NFS writes quite a bit.

ok art@ millert@ pedro@ tedu@


# 1.39 21-Jul-2004 marius

kqueue support for NFS, adapted from netbsd.

ok art@ pedro@, "get it in" deraadt@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.38 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: UBC_SYNC_A
# 1.37 13-May-2003 jason

Kill a bunch more commons (very few left =)


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_B
# 1.36 21-May-2002 art

Protect calls to biodone with splbio. Some functions called
by biodone assume splbio (probably just on other filesystems) and some
callbacks from b_iodone assume it too. It's just much safer.
costa@ ok.


Revision tags: OPENBSD_3_1_BASE
# 1.35 08-Feb-2002 csapuntz

There are NFS servers where it's possible to modify a symbolic link. Remove aggressive optimization


# 1.34 16-Jan-2002 ericj

use queue.h macro's
remove register


# 1.33 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.32 14-Dec-2001 art

branches: 1.32.2;
Workaround a compiler bug on m68k.


# 1.31 10-Dec-2001 art

Merge in struct uvm_vnode into struct vnode.


# 1.30 30-Nov-2001 art

Whooops.
Stop returning EINPROGRESS now that the caller doesn't understand it
anymore.


# 1.29 30-Nov-2001 csapuntz

Call buf_cleanout, which handles wakeups


# 1.28 29-Nov-2001 art

Make sure the nfs vnodes are on the syncer worklist.


# 1.27 29-Nov-2001 art

Make sure the whole buffer is initialized before calling bgetvp.
Recommended by csapuntz@


# 1.26 29-Nov-2001 art

Correctly handle b_vp with bgetvp and brelvp in {get,put}pages.
Prevents panics caused by vnodes being recycled under our feet.


# 1.25 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.24 15-Nov-2001 art

Remove creds from struct buf, move the creds that nfs need into the nfs node.
While in the area, convert nfs node allocation from malloc to pool and do
some cleanups.
Based on the UBC changes in NetBSD. niklas@ ok.


# 1.23 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.22 27-Jun-2001 art

Remove old vm.


# 1.21 25-Jun-2001 csapuntz

Get rid of some dead code caused by the last commit


# 1.20 25-Jun-2001 csapuntz

Remove NQNFS


# 1.19 25-Jun-2001 csapuntz

Get rid of old directory caching scheme which caused persistent duplicates.

Still not correct for NFSv3 but that's hard.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Feb-2001 csapuntz

Change the B_DELWRI flag using buf_dirty and buf_undirty instead of
manually twiddling it. This allows the buffer cache to more easily
keep track of dirty buffers and decide when it is appropriate to speed
up the syncer.

Insipired by FreeBSD.
Look over by art@


# 1.17 23-Feb-2001 csapuntz

Remove the clustering fields from the vnodes and place them in the
file system inode instead


Revision tags: OPENBSD_2_8_BASE
# 1.16 23-Jun-2000 mickey

remove obsolete vtrace guts; art@


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.15 26-Feb-1999 art

branches: 1.15.6;
compatibility with uvm vnode pager


Revision tags: OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.14 02-Dec-1997 csapuntz

More splbio()'s added so that reassignbuf can do its thing.


# 1.13 06-Nov-1997 csapuntz

Updates for VFS Lite 2 + soft update.


Revision tags: OPENBSD_2_2_BASE
# 1.12 06-Oct-1997 deraadt

back out vfs lite2 till after 2.2


# 1.11 06-Oct-1997 csapuntz

VFS Lite2 Changes


Revision tags: OPENBSD_2_0_BASE OPENBSD_2_1_BASE
# 1.10 27-Jul-1996 deraadt

fvdl; Don't mistake a non-async block that needs to be commited for an
interrupted write.


# 1.9 21-Jul-1996 tholo

Ensure we never use more than one callout table slot


# 1.8 14-Jun-1996 tholo

Keep dirty list used by in-kernel update(8) in sync with buffers


# 1.7 28-May-1996 deraadt

sync


# 1.6 21-Apr-1996 deraadt

partial sync with netbsd 960418, more to come


# 1.5 17-Apr-1996 mickey

Minor cleanups. Checked against Lite2.
(NetBSD's was really just a Lite2's, but w/ 64bit support)


# 1.4 31-Mar-1996 mickey

From NetBSD: NFSv3 import (tomorrow's Net's kernel)
Open's patches kept in. i'll possibly take a look at Lite2 soon,
is there smth usefull ?..


# 1.3 29-Feb-1996 niklas

From NetBSD: merge with 960217 (still NFSv2)


# 1.2 08-Jan-1996 dm

graichen@freebsd.org: fixed -type:=direct mounts in amd


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.84 25-Jul-2019 cheloha

vinvalbuf(9): tlseep -> tsleep_nsec(9); ok millert@


# 1.83 19-Jul-2019 cheloha

getblk(9): tsleep(9) -> tsleep_nsec(9); ok visa@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.82 22-Feb-2017 mpi

Keep local definitions local.

"good work" deraadt@, ok visa@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.81 13-Feb-2016 stefan

Convert to uiomove. From Martin Natano.


Revision tags: OPENBSD_5_8_BASE
# 1.80 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.79 10-Feb-2015 miod

First step towards making uiomove() take a size_t size argument:
- rename uiomove() to uiomovei() and update all its users.
- introduce uiomove(), which is similar to uiomovei() but with a size_t.
- rewrite uiomovei() as an uiomove() wrapper.
ok kettenis@


# 1.78 18-Dec-2014 tedu

delete a whole mess of unnecessary caddr_t casts


# 1.77 14-Nov-2014 tedu

bzero -> memset


Revision tags: OPENBSD_5_6_BASE
# 1.76 08-Jul-2014 deraadt

decouple struct uvmexp into a new file, so that uvm_extern.h and sysctl.h
don't need to be married.
ok guenther miod beck jsing kettenis


Revision tags: OPENBSD_5_5_BASE
# 1.75 14-Sep-2013 guenther

Correct the handling of I/O of >=2^32 bytes and the ktracing there of
by using size_t/ssize_t instead of int/u_int to handle I/O lengths in
uiomove(), vn_fsizechk(), and ktrgenio(). Eliminate the always-zero
'error' argument to ktrgenio() at the same time.


Revision tags: OPENBSD_5_4_BASE
# 1.74 11-Jun-2013 deraadt

final removal of daddr64_t. daddr_t has been 64 bit for a long enough
test period; i think 3 years ago the last bugs fell out.
ok otto beck others


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.73 11-Jul-2012 guenther

If the current offset is strictly less than the process filesize
rlimit, then a write that would take it over the limit should be
clamped, making it a partial write.

ok beck@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ macro@


# 1.71 12-Apr-2010 beck

Don't jump the queue if we have to wait on the client side because
the nfs_bufq is full - instead tsleep waiting for one of our nfsiod's
to free up space for us in the queue so we can enqueue on the end.

ok blambert@, tedu@, oga@


# 1.70 09-Apr-2010 oga

make more bettah. instead of doing:

switch(type) {
case VREG:
/*something */
break;
case VLNK:
/* something */
break;
default:
panic("wtf?");
}

do_something_that_doesn't_change_type();

switch(type) {
case VREG:
/* nowt */
break;
case VLNK:
n = 0;
break;
default:
panic("wtf?");
}

be a bit less silly and replace the second switch with:

if (type == VLNK)
n = 0;

ok beck@, blambert@


# 1.69 09-Apr-2010 oga

In the nfs bio functions, instead of looking at an invalid vnode type,
deciding to do nothing, printing about it and continuing along our merry
way without even erroring the sodding buffer, just panic. by this point
we are liked very fucked up anyway.

found in either edmonton or stockholm then forgotten. ok beck@,
blambert@


Revision tags: OPENBSD_4_7_BASE
# 1.68 19-Oct-2009 jsg

antsy
no binary change apart from nfsm_reqhead() which is clearly correct.

ok thib@


# 1.67 02-Sep-2009 thib

Backout the asyncio/aiod change, as it causes buf's to get hung.
problem noticed by deraadt@

ok beck@


# 1.66 27-Aug-2009 thib

Garbage collect two variables that where set but unused.
Tiny spacing nit.
Fix a typo, pointed out by miod@.


# 1.65 27-Aug-2009 thib

introduce a flag member to struct nfs_aiod, and use flags instead of the exit
and worked members. nad_worked becomes NFSAIOD_WAKEUP, which is set after if
an aiod was removed from the idle list and woken up by nfs_asyncio().

don't rely on tsleep wchans being unique, that is keep going back to sleep if
woken up unless the NFSAIOD_WAKEUP flag is set.

fix a divide by zero crash if nfs.vfs.iothreads is set to 0, as that can happen
when we recalculate the maximum buf's to queue up for each aiod.

in nfs_asyncio() set the nad_mnt to NULL before returning the aiod back to the
idle list in the case where we have already queued up to many bufs, otherwise
we trip an assertion.

minimize the time we are holding the nfs_aiodl_mtx to only when we are inserting
or removing from the lists, with the exception of nfs_set_naiod() as it would
make the loops more complicated and its uncommon in any case.

tested by myself and deraadt@
"fine with me" deraadt@


# 1.64 26-Aug-2009 thib

make sure that an aiod has been removed from the nfs_aiods_idle list
before inserting it back into the list.

crashes debugged with help from deraadt@ who also tested this fix.


# 1.63 20-Aug-2009 thib

Rework the way we do async I/O in nfs. Introduce separate buf queues for
each mount, and when work is "found", peg an aiod to that mount todo the
I/O. Make nfs_asyncio() a bit smarter when deciding when to do asyncio
and when to force it sync, this is done by keeping the aiod's one two lists,
an "idle" and an "all" list, so asyncio is only done when there are aiods
hanging around todo it for us or are already pegged to the mount.

Idea liked by at least beck@ (and I think art@).
Extensive testing done by myself and jasper and a few others on various
arch's.

Ideas/Code from Net/Free.

OK blambert@.


# 1.62 28-Jul-2009 art

Using the buf pointer returned from incore is a really bad idea.
Even if we know that someone safely holds B_BUSY and will not modify
the buf (as was the case in here), we still need to be sure that
the B_BUSY will not be released while we fiddle with the buf.

In this case, it was not safe, since copyout can sleep and whoever was
writing out the buf could finish the write and release the buf which
could then get recycled or unmapped while we slept. Always acquire
B_BUSY ourselves, even when it might give a minor performance penalty.

thib@ ok


# 1.61 22-Jul-2009 thib

remove a comment thats part lie and part stating the obvious.

ok blambert@


# 1.60 20-Jul-2009 thib

(struct foo *)0 -> NULL, every where I could find it.

OK blambert@


Revision tags: OPENBSD_4_6_BASE
# 1.59 23-Jun-2009 jasper

- /dev/drum is long gone; sync comment with reality

ok thib@


# 1.58 19-Mar-2009 oga

We don't count buffercache stats in the B_PHYS case, so fix nfs to not
increment the num{read,write} and pending{read,write} statistics in that
case, since biodone won't change them on completion.

On another note, I'm not sure that we use physical buffers for swapping
over nfs anymore, so this chunk may be superfluous.

beck@ came up with the same diff "So anyway rather than me commiting it
from my copy, I'm giving you the OK and the commit. since it officially
makes you a buffer cache and NFS hacker };-)"


Revision tags: OPENBSD_4_5_BASE
# 1.57 24-Jan-2009 thib

Use a timespec instead of a time_t for the clients nfsnode
mtime, gives us better granularity, helps with cache consistency.

Idea lifted from NetBSD.

OK blambert@


# 1.56 19-Jan-2009 thib

Introduce a macro to invalidate the attribute
cache instead of setting n_attrstamp to 0 directly.

Lift the macro name from NetBSD.
prompted by and OK blambert@


# 1.55 09-Aug-2008 thib

o nfs_vinvalbuf() is always called with the intrflag as 1, and then
checks if the mount is actually interrutable, and if not sets it 0.
remove this argument from nfs_vinvalbuf and just do the checking inside
the function.
o give nfs_vinvalbuf() a makeover so it looks nice. (spacing, casts, &c);
o Actually pass PCATCH too tsleep() if the mount it interrutable.

ok art@, blambert@


# 1.54 08-Aug-2008 blambert

After beck@ changed the way nfsiod's are notified of work, the
nfs_iodwant array became unused. Garbage collect and free up
a few bytes.

ok thib@


Revision tags: OPENBSD_4_4_BASE
# 1.53 25-Jul-2008 beck

much more correct way of dealing with nfs pending reads/writes
ok thib@


# 1.52 23-Jul-2008 beck

Correct cases of mishandling of pending reads and writes to prevent
them going negative - this consists of identifying a number of cases of
IO not going through the buffer cache and marking those buffers with
B_RAW - as well as fixing nfs_bio to show pending writes and reads through
the buffer cache via NFS

still has a problem with mishandling the counters I believe in the
async/sync fallback case where counters stay positive which will be
addressed seperately.

ok tedu@ deraadt@


# 1.51 14-Jun-2008 beck

Ensure each nfsiod can actually enqueue more than one asynchio - this mirrors
the accidental situation that used to happen when it leaked buffers and allowed
the syncer to do it, however this puts a limit on how much of the buffer cache
it is allowed to consume to a sensible amount - improves nfs write performance
since we don't have to do tons of them synch now.

Modifies the existing code to use wakeup_one instead of cruft, and now
all nfsiod's tsleep the same way.

ok thib@ art@


# 1.50 12-Jun-2008 thib

add a statistic bit to count how often we change async to sync

you need to upgrade nfsstat and the relevant header files

ok beck@


# 1.49 12-Jun-2008 art

if (something_complicated)
return (EIO);
return (EIO);
is kinda silly. Don't.

Prettify a bit in the process.

'makes perfect sense' blambert@, ok thib@


# 1.48 12-Jun-2008 thib

Actually return an error in nfs_asyncio() if we fail to process
the buf due too all of the nfs iod's being busy; this downgrades
the write to a sync one and allows to handle this.

ok art@, beck@


# 1.47 11-Jun-2008 blambert

Canonical for() -> queue.h FOREACH macro conversions.
Also, it is historical practice to #include <sys/queue.h>
when using queue.h macros.

ok thib@ krw@

special thanks to krw@ for reminders vice violence


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.46 01-Jun-2007 deraadt

pedro ok'd this ~3500 line diff which removes the vop argument
"ap = v" comments in under 8 seconds, so it must be ok. and it compiles
too.


# 1.45 01-Jun-2007 thib

daddr_t -> daddr64_t;
Basically the usage of daddr_t was to math out arguments to
nfs_getcacheblk, wich calls getblk();

ok deraadt@


Revision tags: OPENBSD_4_1_BASE
# 1.44 29-Nov-2006 miod

Kernel stack can be swapped. This means that stuff that's on the stack
should never be referenced outside the context of the process to which
this stack belongs unless we do the PHOLD/PRELE dance. Loads of code
doesn't follow the rules here. Instead of trying to track down all
offenders and fix this hairy situation, it makes much more sense
to not swap kernel stacks.

From art@, tested by many some time ago.


# 1.43 01-Nov-2006 thib

move the declaration of nfsstats from nfs_bio.c to
nfs_subs.c so it gets pulled in for NFSSERVER only
kernels.

ok deraadt@,krw@


Revision tags: OPENBSD_4_0_BASE
# 1.42 20-Apr-2006 pedro

Remove unused debug code that sneaked in by accident long ago


Revision tags: OPENBSD_3_9_BASE
# 1.41 31-Oct-2005 otto

Fix reading large files; from NetBSD. Somehow this was overlooked
when earlier merges were done. Fixes PR 4250. ok millert@ deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.40 03-Aug-2004 marius

NFS commit coalescion: instead of sending a commit for each block, coalesce
these into larger ranges wherever possible.

this should speed up NFS writes quite a bit.

ok art@ millert@ pedro@ tedu@


# 1.39 21-Jul-2004 marius

kqueue support for NFS, adapted from netbsd.

ok art@ pedro@, "get it in" deraadt@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.38 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: UBC_SYNC_A
# 1.37 13-May-2003 jason

Kill a bunch more commons (very few left =)


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_B
# 1.36 21-May-2002 art

Protect calls to biodone with splbio. Some functions called
by biodone assume splbio (probably just on other filesystems) and some
callbacks from b_iodone assume it too. It's just much safer.
costa@ ok.


Revision tags: OPENBSD_3_1_BASE
# 1.35 08-Feb-2002 csapuntz

There are NFS servers where it's possible to modify a symbolic link. Remove aggressive optimization


# 1.34 16-Jan-2002 ericj

use queue.h macro's
remove register


# 1.33 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.32 14-Dec-2001 art

branches: 1.32.2;
Workaround a compiler bug on m68k.


# 1.31 10-Dec-2001 art

Merge in struct uvm_vnode into struct vnode.


# 1.30 30-Nov-2001 art

Whooops.
Stop returning EINPROGRESS now that the caller doesn't understand it
anymore.


# 1.29 30-Nov-2001 csapuntz

Call buf_cleanout, which handles wakeups


# 1.28 29-Nov-2001 art

Make sure the nfs vnodes are on the syncer worklist.


# 1.27 29-Nov-2001 art

Make sure the whole buffer is initialized before calling bgetvp.
Recommended by csapuntz@


# 1.26 29-Nov-2001 art

Correctly handle b_vp with bgetvp and brelvp in {get,put}pages.
Prevents panics caused by vnodes being recycled under our feet.


# 1.25 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.24 15-Nov-2001 art

Remove creds from struct buf, move the creds that nfs need into the nfs node.
While in the area, convert nfs node allocation from malloc to pool and do
some cleanups.
Based on the UBC changes in NetBSD. niklas@ ok.


# 1.23 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.22 27-Jun-2001 art

Remove old vm.


# 1.21 25-Jun-2001 csapuntz

Get rid of some dead code caused by the last commit


# 1.20 25-Jun-2001 csapuntz

Remove NQNFS


# 1.19 25-Jun-2001 csapuntz

Get rid of old directory caching scheme which caused persistent duplicates.

Still not correct for NFSv3 but that's hard.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Feb-2001 csapuntz

Change the B_DELWRI flag using buf_dirty and buf_undirty instead of
manually twiddling it. This allows the buffer cache to more easily
keep track of dirty buffers and decide when it is appropriate to speed
up the syncer.

Insipired by FreeBSD.
Look over by art@


# 1.17 23-Feb-2001 csapuntz

Remove the clustering fields from the vnodes and place them in the
file system inode instead


Revision tags: OPENBSD_2_8_BASE
# 1.16 23-Jun-2000 mickey

remove obsolete vtrace guts; art@


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.15 26-Feb-1999 art

branches: 1.15.6;
compatibility with uvm vnode pager


Revision tags: OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.14 02-Dec-1997 csapuntz

More splbio()'s added so that reassignbuf can do its thing.


# 1.13 06-Nov-1997 csapuntz

Updates for VFS Lite 2 + soft update.


Revision tags: OPENBSD_2_2_BASE
# 1.12 06-Oct-1997 deraadt

back out vfs lite2 till after 2.2


# 1.11 06-Oct-1997 csapuntz

VFS Lite2 Changes


Revision tags: OPENBSD_2_0_BASE OPENBSD_2_1_BASE
# 1.10 27-Jul-1996 deraadt

fvdl; Don't mistake a non-async block that needs to be commited for an
interrupted write.


# 1.9 21-Jul-1996 tholo

Ensure we never use more than one callout table slot


# 1.8 14-Jun-1996 tholo

Keep dirty list used by in-kernel update(8) in sync with buffers


# 1.7 28-May-1996 deraadt

sync


# 1.6 21-Apr-1996 deraadt

partial sync with netbsd 960418, more to come


# 1.5 17-Apr-1996 mickey

Minor cleanups. Checked against Lite2.
(NetBSD's was really just a Lite2's, but w/ 64bit support)


# 1.4 31-Mar-1996 mickey

From NetBSD: NFSv3 import (tomorrow's Net's kernel)
Open's patches kept in. i'll possibly take a look at Lite2 soon,
is there smth usefull ?..


# 1.3 29-Feb-1996 niklas

From NetBSD: merge with 960217 (still NFSv2)


# 1.2 08-Jan-1996 dm

graichen@freebsd.org: fixed -type:=direct mounts in amd


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


# 1.83 19-Jul-2019 cheloha

getblk(9): tsleep(9) -> tsleep_nsec(9); ok visa@


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE OPENBSD_6_3_BASE OPENBSD_6_4_BASE OPENBSD_6_5_BASE
# 1.82 22-Feb-2017 mpi

Keep local definitions local.

"good work" deraadt@, ok visa@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.81 13-Feb-2016 stefan

Convert to uiomove. From Martin Natano.


Revision tags: OPENBSD_5_8_BASE
# 1.80 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.79 10-Feb-2015 miod

First step towards making uiomove() take a size_t size argument:
- rename uiomove() to uiomovei() and update all its users.
- introduce uiomove(), which is similar to uiomovei() but with a size_t.
- rewrite uiomovei() as an uiomove() wrapper.
ok kettenis@


# 1.78 18-Dec-2014 tedu

delete a whole mess of unnecessary caddr_t casts


# 1.77 14-Nov-2014 tedu

bzero -> memset


Revision tags: OPENBSD_5_6_BASE
# 1.76 08-Jul-2014 deraadt

decouple struct uvmexp into a new file, so that uvm_extern.h and sysctl.h
don't need to be married.
ok guenther miod beck jsing kettenis


Revision tags: OPENBSD_5_5_BASE
# 1.75 14-Sep-2013 guenther

Correct the handling of I/O of >=2^32 bytes and the ktracing there of
by using size_t/ssize_t instead of int/u_int to handle I/O lengths in
uiomove(), vn_fsizechk(), and ktrgenio(). Eliminate the always-zero
'error' argument to ktrgenio() at the same time.


Revision tags: OPENBSD_5_4_BASE
# 1.74 11-Jun-2013 deraadt

final removal of daddr64_t. daddr_t has been 64 bit for a long enough
test period; i think 3 years ago the last bugs fell out.
ok otto beck others


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.73 11-Jul-2012 guenther

If the current offset is strictly less than the process filesize
rlimit, then a write that would take it over the limit should be
clamped, making it a partial write.

ok beck@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ macro@


# 1.71 12-Apr-2010 beck

Don't jump the queue if we have to wait on the client side because
the nfs_bufq is full - instead tsleep waiting for one of our nfsiod's
to free up space for us in the queue so we can enqueue on the end.

ok blambert@, tedu@, oga@


# 1.70 09-Apr-2010 oga

make more bettah. instead of doing:

switch(type) {
case VREG:
/*something */
break;
case VLNK:
/* something */
break;
default:
panic("wtf?");
}

do_something_that_doesn't_change_type();

switch(type) {
case VREG:
/* nowt */
break;
case VLNK:
n = 0;
break;
default:
panic("wtf?");
}

be a bit less silly and replace the second switch with:

if (type == VLNK)
n = 0;

ok beck@, blambert@


# 1.69 09-Apr-2010 oga

In the nfs bio functions, instead of looking at an invalid vnode type,
deciding to do nothing, printing about it and continuing along our merry
way without even erroring the sodding buffer, just panic. by this point
we are liked very fucked up anyway.

found in either edmonton or stockholm then forgotten. ok beck@,
blambert@


Revision tags: OPENBSD_4_7_BASE
# 1.68 19-Oct-2009 jsg

antsy
no binary change apart from nfsm_reqhead() which is clearly correct.

ok thib@


# 1.67 02-Sep-2009 thib

Backout the asyncio/aiod change, as it causes buf's to get hung.
problem noticed by deraadt@

ok beck@


# 1.66 27-Aug-2009 thib

Garbage collect two variables that where set but unused.
Tiny spacing nit.
Fix a typo, pointed out by miod@.


# 1.65 27-Aug-2009 thib

introduce a flag member to struct nfs_aiod, and use flags instead of the exit
and worked members. nad_worked becomes NFSAIOD_WAKEUP, which is set after if
an aiod was removed from the idle list and woken up by nfs_asyncio().

don't rely on tsleep wchans being unique, that is keep going back to sleep if
woken up unless the NFSAIOD_WAKEUP flag is set.

fix a divide by zero crash if nfs.vfs.iothreads is set to 0, as that can happen
when we recalculate the maximum buf's to queue up for each aiod.

in nfs_asyncio() set the nad_mnt to NULL before returning the aiod back to the
idle list in the case where we have already queued up to many bufs, otherwise
we trip an assertion.

minimize the time we are holding the nfs_aiodl_mtx to only when we are inserting
or removing from the lists, with the exception of nfs_set_naiod() as it would
make the loops more complicated and its uncommon in any case.

tested by myself and deraadt@
"fine with me" deraadt@


# 1.64 26-Aug-2009 thib

make sure that an aiod has been removed from the nfs_aiods_idle list
before inserting it back into the list.

crashes debugged with help from deraadt@ who also tested this fix.


# 1.63 20-Aug-2009 thib

Rework the way we do async I/O in nfs. Introduce separate buf queues for
each mount, and when work is "found", peg an aiod to that mount todo the
I/O. Make nfs_asyncio() a bit smarter when deciding when to do asyncio
and when to force it sync, this is done by keeping the aiod's one two lists,
an "idle" and an "all" list, so asyncio is only done when there are aiods
hanging around todo it for us or are already pegged to the mount.

Idea liked by at least beck@ (and I think art@).
Extensive testing done by myself and jasper and a few others on various
arch's.

Ideas/Code from Net/Free.

OK blambert@.


# 1.62 28-Jul-2009 art

Using the buf pointer returned from incore is a really bad idea.
Even if we know that someone safely holds B_BUSY and will not modify
the buf (as was the case in here), we still need to be sure that
the B_BUSY will not be released while we fiddle with the buf.

In this case, it was not safe, since copyout can sleep and whoever was
writing out the buf could finish the write and release the buf which
could then get recycled or unmapped while we slept. Always acquire
B_BUSY ourselves, even when it might give a minor performance penalty.

thib@ ok


# 1.61 22-Jul-2009 thib

remove a comment thats part lie and part stating the obvious.

ok blambert@


# 1.60 20-Jul-2009 thib

(struct foo *)0 -> NULL, every where I could find it.

OK blambert@


Revision tags: OPENBSD_4_6_BASE
# 1.59 23-Jun-2009 jasper

- /dev/drum is long gone; sync comment with reality

ok thib@


# 1.58 19-Mar-2009 oga

We don't count buffercache stats in the B_PHYS case, so fix nfs to not
increment the num{read,write} and pending{read,write} statistics in that
case, since biodone won't change them on completion.

On another note, I'm not sure that we use physical buffers for swapping
over nfs anymore, so this chunk may be superfluous.

beck@ came up with the same diff "So anyway rather than me commiting it
from my copy, I'm giving you the OK and the commit. since it officially
makes you a buffer cache and NFS hacker };-)"


Revision tags: OPENBSD_4_5_BASE
# 1.57 24-Jan-2009 thib

Use a timespec instead of a time_t for the clients nfsnode
mtime, gives us better granularity, helps with cache consistency.

Idea lifted from NetBSD.

OK blambert@


# 1.56 19-Jan-2009 thib

Introduce a macro to invalidate the attribute
cache instead of setting n_attrstamp to 0 directly.

Lift the macro name from NetBSD.
prompted by and OK blambert@


# 1.55 09-Aug-2008 thib

o nfs_vinvalbuf() is always called with the intrflag as 1, and then
checks if the mount is actually interrutable, and if not sets it 0.
remove this argument from nfs_vinvalbuf and just do the checking inside
the function.
o give nfs_vinvalbuf() a makeover so it looks nice. (spacing, casts, &c);
o Actually pass PCATCH too tsleep() if the mount it interrutable.

ok art@, blambert@


# 1.54 08-Aug-2008 blambert

After beck@ changed the way nfsiod's are notified of work, the
nfs_iodwant array became unused. Garbage collect and free up
a few bytes.

ok thib@


Revision tags: OPENBSD_4_4_BASE
# 1.53 25-Jul-2008 beck

much more correct way of dealing with nfs pending reads/writes
ok thib@


# 1.52 23-Jul-2008 beck

Correct cases of mishandling of pending reads and writes to prevent
them going negative - this consists of identifying a number of cases of
IO not going through the buffer cache and marking those buffers with
B_RAW - as well as fixing nfs_bio to show pending writes and reads through
the buffer cache via NFS

still has a problem with mishandling the counters I believe in the
async/sync fallback case where counters stay positive which will be
addressed seperately.

ok tedu@ deraadt@


# 1.51 14-Jun-2008 beck

Ensure each nfsiod can actually enqueue more than one asynchio - this mirrors
the accidental situation that used to happen when it leaked buffers and allowed
the syncer to do it, however this puts a limit on how much of the buffer cache
it is allowed to consume to a sensible amount - improves nfs write performance
since we don't have to do tons of them synch now.

Modifies the existing code to use wakeup_one instead of cruft, and now
all nfsiod's tsleep the same way.

ok thib@ art@


# 1.50 12-Jun-2008 thib

add a statistic bit to count how often we change async to sync

you need to upgrade nfsstat and the relevant header files

ok beck@


# 1.49 12-Jun-2008 art

if (something_complicated)
return (EIO);
return (EIO);
is kinda silly. Don't.

Prettify a bit in the process.

'makes perfect sense' blambert@, ok thib@


# 1.48 12-Jun-2008 thib

Actually return an error in nfs_asyncio() if we fail to process
the buf due too all of the nfs iod's being busy; this downgrades
the write to a sync one and allows to handle this.

ok art@, beck@


# 1.47 11-Jun-2008 blambert

Canonical for() -> queue.h FOREACH macro conversions.
Also, it is historical practice to #include <sys/queue.h>
when using queue.h macros.

ok thib@ krw@

special thanks to krw@ for reminders vice violence


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.46 01-Jun-2007 deraadt

pedro ok'd this ~3500 line diff which removes the vop argument
"ap = v" comments in under 8 seconds, so it must be ok. and it compiles
too.


# 1.45 01-Jun-2007 thib

daddr_t -> daddr64_t;
Basically the usage of daddr_t was to math out arguments to
nfs_getcacheblk, wich calls getblk();

ok deraadt@


Revision tags: OPENBSD_4_1_BASE
# 1.44 29-Nov-2006 miod

Kernel stack can be swapped. This means that stuff that's on the stack
should never be referenced outside the context of the process to which
this stack belongs unless we do the PHOLD/PRELE dance. Loads of code
doesn't follow the rules here. Instead of trying to track down all
offenders and fix this hairy situation, it makes much more sense
to not swap kernel stacks.

From art@, tested by many some time ago.


# 1.43 01-Nov-2006 thib

move the declaration of nfsstats from nfs_bio.c to
nfs_subs.c so it gets pulled in for NFSSERVER only
kernels.

ok deraadt@,krw@


Revision tags: OPENBSD_4_0_BASE
# 1.42 20-Apr-2006 pedro

Remove unused debug code that sneaked in by accident long ago


Revision tags: OPENBSD_3_9_BASE
# 1.41 31-Oct-2005 otto

Fix reading large files; from NetBSD. Somehow this was overlooked
when earlier merges were done. Fixes PR 4250. ok millert@ deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.40 03-Aug-2004 marius

NFS commit coalescion: instead of sending a commit for each block, coalesce
these into larger ranges wherever possible.

this should speed up NFS writes quite a bit.

ok art@ millert@ pedro@ tedu@


# 1.39 21-Jul-2004 marius

kqueue support for NFS, adapted from netbsd.

ok art@ pedro@, "get it in" deraadt@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.38 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: UBC_SYNC_A
# 1.37 13-May-2003 jason

Kill a bunch more commons (very few left =)


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_B
# 1.36 21-May-2002 art

Protect calls to biodone with splbio. Some functions called
by biodone assume splbio (probably just on other filesystems) and some
callbacks from b_iodone assume it too. It's just much safer.
costa@ ok.


Revision tags: OPENBSD_3_1_BASE
# 1.35 08-Feb-2002 csapuntz

There are NFS servers where it's possible to modify a symbolic link. Remove aggressive optimization


# 1.34 16-Jan-2002 ericj

use queue.h macro's
remove register


# 1.33 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.32 14-Dec-2001 art

branches: 1.32.2;
Workaround a compiler bug on m68k.


# 1.31 10-Dec-2001 art

Merge in struct uvm_vnode into struct vnode.


# 1.30 30-Nov-2001 art

Whooops.
Stop returning EINPROGRESS now that the caller doesn't understand it
anymore.


# 1.29 30-Nov-2001 csapuntz

Call buf_cleanout, which handles wakeups


# 1.28 29-Nov-2001 art

Make sure the nfs vnodes are on the syncer worklist.


# 1.27 29-Nov-2001 art

Make sure the whole buffer is initialized before calling bgetvp.
Recommended by csapuntz@


# 1.26 29-Nov-2001 art

Correctly handle b_vp with bgetvp and brelvp in {get,put}pages.
Prevents panics caused by vnodes being recycled under our feet.


# 1.25 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.24 15-Nov-2001 art

Remove creds from struct buf, move the creds that nfs need into the nfs node.
While in the area, convert nfs node allocation from malloc to pool and do
some cleanups.
Based on the UBC changes in NetBSD. niklas@ ok.


# 1.23 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.22 27-Jun-2001 art

Remove old vm.


# 1.21 25-Jun-2001 csapuntz

Get rid of some dead code caused by the last commit


# 1.20 25-Jun-2001 csapuntz

Remove NQNFS


# 1.19 25-Jun-2001 csapuntz

Get rid of old directory caching scheme which caused persistent duplicates.

Still not correct for NFSv3 but that's hard.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Feb-2001 csapuntz

Change the B_DELWRI flag using buf_dirty and buf_undirty instead of
manually twiddling it. This allows the buffer cache to more easily
keep track of dirty buffers and decide when it is appropriate to speed
up the syncer.

Insipired by FreeBSD.
Look over by art@


# 1.17 23-Feb-2001 csapuntz

Remove the clustering fields from the vnodes and place them in the
file system inode instead


Revision tags: OPENBSD_2_8_BASE
# 1.16 23-Jun-2000 mickey

remove obsolete vtrace guts; art@


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.15 26-Feb-1999 art

branches: 1.15.6;
compatibility with uvm vnode pager


Revision tags: OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.14 02-Dec-1997 csapuntz

More splbio()'s added so that reassignbuf can do its thing.


# 1.13 06-Nov-1997 csapuntz

Updates for VFS Lite 2 + soft update.


Revision tags: OPENBSD_2_2_BASE
# 1.12 06-Oct-1997 deraadt

back out vfs lite2 till after 2.2


# 1.11 06-Oct-1997 csapuntz

VFS Lite2 Changes


Revision tags: OPENBSD_2_0_BASE OPENBSD_2_1_BASE
# 1.10 27-Jul-1996 deraadt

fvdl; Don't mistake a non-async block that needs to be commited for an
interrupted write.


# 1.9 21-Jul-1996 tholo

Ensure we never use more than one callout table slot


# 1.8 14-Jun-1996 tholo

Keep dirty list used by in-kernel update(8) in sync with buffers


# 1.7 28-May-1996 deraadt

sync


# 1.6 21-Apr-1996 deraadt

partial sync with netbsd 960418, more to come


# 1.5 17-Apr-1996 mickey

Minor cleanups. Checked against Lite2.
(NetBSD's was really just a Lite2's, but w/ 64bit support)


# 1.4 31-Mar-1996 mickey

From NetBSD: NFSv3 import (tomorrow's Net's kernel)
Open's patches kept in. i'll possibly take a look at Lite2 soon,
is there smth usefull ?..


# 1.3 29-Feb-1996 niklas

From NetBSD: merge with 960217 (still NFSv2)


# 1.2 08-Jan-1996 dm

graichen@freebsd.org: fixed -type:=direct mounts in amd


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision


Revision tags: OPENBSD_6_1_BASE OPENBSD_6_2_BASE
# 1.82 22-Feb-2017 mpi

Keep local definitions local.

"good work" deraadt@, ok visa@


Revision tags: OPENBSD_5_9_BASE OPENBSD_6_0_BASE
# 1.81 13-Feb-2016 stefan

Convert to uiomove. From Martin Natano.


Revision tags: OPENBSD_5_8_BASE
# 1.80 14-Mar-2015 jsg

Remove some includes include-what-you-use claims don't
have any direct symbols used. Tested for indirect use by compiling
amd64/i386/sparc64 kernels.

ok tedu@ deraadt@


Revision tags: OPENBSD_5_7_BASE
# 1.79 10-Feb-2015 miod

First step towards making uiomove() take a size_t size argument:
- rename uiomove() to uiomovei() and update all its users.
- introduce uiomove(), which is similar to uiomovei() but with a size_t.
- rewrite uiomovei() as an uiomove() wrapper.
ok kettenis@


# 1.78 18-Dec-2014 tedu

delete a whole mess of unnecessary caddr_t casts


# 1.77 14-Nov-2014 tedu

bzero -> memset


Revision tags: OPENBSD_5_6_BASE
# 1.76 08-Jul-2014 deraadt

decouple struct uvmexp into a new file, so that uvm_extern.h and sysctl.h
don't need to be married.
ok guenther miod beck jsing kettenis


Revision tags: OPENBSD_5_5_BASE
# 1.75 14-Sep-2013 guenther

Correct the handling of I/O of >=2^32 bytes and the ktracing there of
by using size_t/ssize_t instead of int/u_int to handle I/O lengths in
uiomove(), vn_fsizechk(), and ktrgenio(). Eliminate the always-zero
'error' argument to ktrgenio() at the same time.


Revision tags: OPENBSD_5_4_BASE
# 1.74 11-Jun-2013 deraadt

final removal of daddr64_t. daddr_t has been 64 bit for a long enough
test period; i think 3 years ago the last bugs fell out.
ok otto beck others


Revision tags: OPENBSD_5_2_BASE OPENBSD_5_3_BASE
# 1.73 11-Jul-2012 guenther

If the current offset is strictly less than the process filesize
rlimit, then a write that would take it over the limit should be
clamped, making it a partial write.

ok beck@


Revision tags: OPENBSD_4_8_BASE OPENBSD_4_9_BASE OPENBSD_5_0_BASE OPENBSD_5_1_BASE
# 1.72 07-Aug-2010 krw

No "\n" needed at the end of panic() strings.

Bogus chunks pointed out by matthew@ and miod@. No cookies for
marco@ and jasper@.

ok deraadt@ miod@ matthew@ jasper@ macro@


# 1.71 12-Apr-2010 beck

Don't jump the queue if we have to wait on the client side because
the nfs_bufq is full - instead tsleep waiting for one of our nfsiod's
to free up space for us in the queue so we can enqueue on the end.

ok blambert@, tedu@, oga@


# 1.70 09-Apr-2010 oga

make more bettah. instead of doing:

switch(type) {
case VREG:
/*something */
break;
case VLNK:
/* something */
break;
default:
panic("wtf?");
}

do_something_that_doesn't_change_type();

switch(type) {
case VREG:
/* nowt */
break;
case VLNK:
n = 0;
break;
default:
panic("wtf?");
}

be a bit less silly and replace the second switch with:

if (type == VLNK)
n = 0;

ok beck@, blambert@


# 1.69 09-Apr-2010 oga

In the nfs bio functions, instead of looking at an invalid vnode type,
deciding to do nothing, printing about it and continuing along our merry
way without even erroring the sodding buffer, just panic. by this point
we are liked very fucked up anyway.

found in either edmonton or stockholm then forgotten. ok beck@,
blambert@


Revision tags: OPENBSD_4_7_BASE
# 1.68 19-Oct-2009 jsg

antsy
no binary change apart from nfsm_reqhead() which is clearly correct.

ok thib@


# 1.67 02-Sep-2009 thib

Backout the asyncio/aiod change, as it causes buf's to get hung.
problem noticed by deraadt@

ok beck@


# 1.66 27-Aug-2009 thib

Garbage collect two variables that where set but unused.
Tiny spacing nit.
Fix a typo, pointed out by miod@.


# 1.65 27-Aug-2009 thib

introduce a flag member to struct nfs_aiod, and use flags instead of the exit
and worked members. nad_worked becomes NFSAIOD_WAKEUP, which is set after if
an aiod was removed from the idle list and woken up by nfs_asyncio().

don't rely on tsleep wchans being unique, that is keep going back to sleep if
woken up unless the NFSAIOD_WAKEUP flag is set.

fix a divide by zero crash if nfs.vfs.iothreads is set to 0, as that can happen
when we recalculate the maximum buf's to queue up for each aiod.

in nfs_asyncio() set the nad_mnt to NULL before returning the aiod back to the
idle list in the case where we have already queued up to many bufs, otherwise
we trip an assertion.

minimize the time we are holding the nfs_aiodl_mtx to only when we are inserting
or removing from the lists, with the exception of nfs_set_naiod() as it would
make the loops more complicated and its uncommon in any case.

tested by myself and deraadt@
"fine with me" deraadt@


# 1.64 26-Aug-2009 thib

make sure that an aiod has been removed from the nfs_aiods_idle list
before inserting it back into the list.

crashes debugged with help from deraadt@ who also tested this fix.


# 1.63 20-Aug-2009 thib

Rework the way we do async I/O in nfs. Introduce separate buf queues for
each mount, and when work is "found", peg an aiod to that mount todo the
I/O. Make nfs_asyncio() a bit smarter when deciding when to do asyncio
and when to force it sync, this is done by keeping the aiod's one two lists,
an "idle" and an "all" list, so asyncio is only done when there are aiods
hanging around todo it for us or are already pegged to the mount.

Idea liked by at least beck@ (and I think art@).
Extensive testing done by myself and jasper and a few others on various
arch's.

Ideas/Code from Net/Free.

OK blambert@.


# 1.62 28-Jul-2009 art

Using the buf pointer returned from incore is a really bad idea.
Even if we know that someone safely holds B_BUSY and will not modify
the buf (as was the case in here), we still need to be sure that
the B_BUSY will not be released while we fiddle with the buf.

In this case, it was not safe, since copyout can sleep and whoever was
writing out the buf could finish the write and release the buf which
could then get recycled or unmapped while we slept. Always acquire
B_BUSY ourselves, even when it might give a minor performance penalty.

thib@ ok


# 1.61 22-Jul-2009 thib

remove a comment thats part lie and part stating the obvious.

ok blambert@


# 1.60 20-Jul-2009 thib

(struct foo *)0 -> NULL, every where I could find it.

OK blambert@


Revision tags: OPENBSD_4_6_BASE
# 1.59 23-Jun-2009 jasper

- /dev/drum is long gone; sync comment with reality

ok thib@


# 1.58 19-Mar-2009 oga

We don't count buffercache stats in the B_PHYS case, so fix nfs to not
increment the num{read,write} and pending{read,write} statistics in that
case, since biodone won't change them on completion.

On another note, I'm not sure that we use physical buffers for swapping
over nfs anymore, so this chunk may be superfluous.

beck@ came up with the same diff "So anyway rather than me commiting it
from my copy, I'm giving you the OK and the commit. since it officially
makes you a buffer cache and NFS hacker };-)"


Revision tags: OPENBSD_4_5_BASE
# 1.57 24-Jan-2009 thib

Use a timespec instead of a time_t for the clients nfsnode
mtime, gives us better granularity, helps with cache consistency.

Idea lifted from NetBSD.

OK blambert@


# 1.56 19-Jan-2009 thib

Introduce a macro to invalidate the attribute
cache instead of setting n_attrstamp to 0 directly.

Lift the macro name from NetBSD.
prompted by and OK blambert@


# 1.55 09-Aug-2008 thib

o nfs_vinvalbuf() is always called with the intrflag as 1, and then
checks if the mount is actually interrutable, and if not sets it 0.
remove this argument from nfs_vinvalbuf and just do the checking inside
the function.
o give nfs_vinvalbuf() a makeover so it looks nice. (spacing, casts, &c);
o Actually pass PCATCH too tsleep() if the mount it interrutable.

ok art@, blambert@


# 1.54 08-Aug-2008 blambert

After beck@ changed the way nfsiod's are notified of work, the
nfs_iodwant array became unused. Garbage collect and free up
a few bytes.

ok thib@


Revision tags: OPENBSD_4_4_BASE
# 1.53 25-Jul-2008 beck

much more correct way of dealing with nfs pending reads/writes
ok thib@


# 1.52 23-Jul-2008 beck

Correct cases of mishandling of pending reads and writes to prevent
them going negative - this consists of identifying a number of cases of
IO not going through the buffer cache and marking those buffers with
B_RAW - as well as fixing nfs_bio to show pending writes and reads through
the buffer cache via NFS

still has a problem with mishandling the counters I believe in the
async/sync fallback case where counters stay positive which will be
addressed seperately.

ok tedu@ deraadt@


# 1.51 14-Jun-2008 beck

Ensure each nfsiod can actually enqueue more than one asynchio - this mirrors
the accidental situation that used to happen when it leaked buffers and allowed
the syncer to do it, however this puts a limit on how much of the buffer cache
it is allowed to consume to a sensible amount - improves nfs write performance
since we don't have to do tons of them synch now.

Modifies the existing code to use wakeup_one instead of cruft, and now
all nfsiod's tsleep the same way.

ok thib@ art@


# 1.50 12-Jun-2008 thib

add a statistic bit to count how often we change async to sync

you need to upgrade nfsstat and the relevant header files

ok beck@


# 1.49 12-Jun-2008 art

if (something_complicated)
return (EIO);
return (EIO);
is kinda silly. Don't.

Prettify a bit in the process.

'makes perfect sense' blambert@, ok thib@


# 1.48 12-Jun-2008 thib

Actually return an error in nfs_asyncio() if we fail to process
the buf due too all of the nfs iod's being busy; this downgrades
the write to a sync one and allows to handle this.

ok art@, beck@


# 1.47 11-Jun-2008 blambert

Canonical for() -> queue.h FOREACH macro conversions.
Also, it is historical practice to #include <sys/queue.h>
when using queue.h macros.

ok thib@ krw@

special thanks to krw@ for reminders vice violence


Revision tags: OPENBSD_4_2_BASE OPENBSD_4_3_BASE
# 1.46 01-Jun-2007 deraadt

pedro ok'd this ~3500 line diff which removes the vop argument
"ap = v" comments in under 8 seconds, so it must be ok. and it compiles
too.


# 1.45 01-Jun-2007 thib

daddr_t -> daddr64_t;
Basically the usage of daddr_t was to math out arguments to
nfs_getcacheblk, wich calls getblk();

ok deraadt@


Revision tags: OPENBSD_4_1_BASE
# 1.44 29-Nov-2006 miod

Kernel stack can be swapped. This means that stuff that's on the stack
should never be referenced outside the context of the process to which
this stack belongs unless we do the PHOLD/PRELE dance. Loads of code
doesn't follow the rules here. Instead of trying to track down all
offenders and fix this hairy situation, it makes much more sense
to not swap kernel stacks.

From art@, tested by many some time ago.


# 1.43 01-Nov-2006 thib

move the declaration of nfsstats from nfs_bio.c to
nfs_subs.c so it gets pulled in for NFSSERVER only
kernels.

ok deraadt@,krw@


Revision tags: OPENBSD_4_0_BASE
# 1.42 20-Apr-2006 pedro

Remove unused debug code that sneaked in by accident long ago


Revision tags: OPENBSD_3_9_BASE
# 1.41 31-Oct-2005 otto

Fix reading large files; from NetBSD. Somehow this was overlooked
when earlier merges were done. Fixes PR 4250. ok millert@ deraadt@


Revision tags: OPENBSD_3_6_BASE OPENBSD_3_7_BASE OPENBSD_3_8_BASE
# 1.40 03-Aug-2004 marius

NFS commit coalescion: instead of sending a commit for each block, coalesce
these into larger ranges wherever possible.

this should speed up NFS writes quite a bit.

ok art@ millert@ pedro@ tedu@


# 1.39 21-Jul-2004 marius

kqueue support for NFS, adapted from netbsd.

ok art@ pedro@, "get it in" deraadt@


Revision tags: OPENBSD_3_4_BASE OPENBSD_3_5_BASE SMP_SYNC_A SMP_SYNC_B
# 1.38 02-Jun-2003 millert

Remove the advertising clause in the UCB license which Berkeley
rescinded 22 July 1999. Proofed by myself and Theo.


Revision tags: UBC_SYNC_A
# 1.37 13-May-2003 jason

Kill a bunch more commons (very few left =)


Revision tags: OPENBSD_3_2_BASE OPENBSD_3_3_BASE UBC_SYNC_B
# 1.36 21-May-2002 art

Protect calls to biodone with splbio. Some functions called
by biodone assume splbio (probably just on other filesystems) and some
callbacks from b_iodone assume it too. It's just much safer.
costa@ ok.


Revision tags: OPENBSD_3_1_BASE
# 1.35 08-Feb-2002 csapuntz

There are NFS servers where it's possible to modify a symbolic link. Remove aggressive optimization


# 1.34 16-Jan-2002 ericj

use queue.h macro's
remove register


# 1.33 19-Dec-2001 art

UBC was a disaster. It worked very good when it worked, but on some
machines or some configurations or in some phase of the moon (we actually
don't know when or why) files disappeared. Since we've not been able to
track down the problem in two weeks intense debugging and we need -current
to be stable, back out everything to a state it had before UBC.

We apologise for the inconvenience.


Revision tags: UBC_BASE
# 1.32 14-Dec-2001 art

branches: 1.32.2;
Workaround a compiler bug on m68k.


# 1.31 10-Dec-2001 art

Merge in struct uvm_vnode into struct vnode.


# 1.30 30-Nov-2001 art

Whooops.
Stop returning EINPROGRESS now that the caller doesn't understand it
anymore.


# 1.29 30-Nov-2001 csapuntz

Call buf_cleanout, which handles wakeups


# 1.28 29-Nov-2001 art

Make sure the nfs vnodes are on the syncer worklist.


# 1.27 29-Nov-2001 art

Make sure the whole buffer is initialized before calling bgetvp.
Recommended by csapuntz@


# 1.26 29-Nov-2001 art

Correctly handle b_vp with bgetvp and brelvp in {get,put}pages.
Prevents panics caused by vnodes being recycled under our feet.


# 1.25 27-Nov-2001 art

Merge in the unified buffer cache code as found in NetBSD 2001/03/10. The
code is written mostly by Chuck Silvers <chuq@chuq.com>/<chs@netbsd.org>.

Tested for the past few weeks by many developers, should be in a pretty stable
state, but will require optimizations and additional cleanups.


# 1.24 15-Nov-2001 art

Remove creds from struct buf, move the creds that nfs need into the nfs node.
While in the area, convert nfs node allocation from malloc to pool and do
some cleanups.
Based on the UBC changes in NetBSD. niklas@ ok.


# 1.23 06-Nov-2001 miod

Replace inclusion of <vm/foo.h> with the correct <uvm/bar.h> when necessary.
(Look ma, I might have broken the tree)


Revision tags: OPENBSD_3_0_BASE
# 1.22 27-Jun-2001 art

Remove old vm.


# 1.21 25-Jun-2001 csapuntz

Get rid of some dead code caused by the last commit


# 1.20 25-Jun-2001 csapuntz

Remove NQNFS


# 1.19 25-Jun-2001 csapuntz

Get rid of old directory caching scheme which caused persistent duplicates.

Still not correct for NFSv3 but that's hard.


Revision tags: OPENBSD_2_9_BASE
# 1.18 23-Feb-2001 csapuntz

Change the B_DELWRI flag using buf_dirty and buf_undirty instead of
manually twiddling it. This allows the buffer cache to more easily
keep track of dirty buffers and decide when it is appropriate to speed
up the syncer.

Insipired by FreeBSD.
Look over by art@


# 1.17 23-Feb-2001 csapuntz

Remove the clustering fields from the vnodes and place them in the
file system inode instead


Revision tags: OPENBSD_2_8_BASE
# 1.16 23-Jun-2000 mickey

remove obsolete vtrace guts; art@


Revision tags: OPENBSD_2_5_BASE OPENBSD_2_6_BASE OPENBSD_2_7_BASE SMP_BASE kame_19991208
# 1.15 26-Feb-1999 art

branches: 1.15.6;
compatibility with uvm vnode pager


Revision tags: OPENBSD_2_3_BASE OPENBSD_2_4_BASE
# 1.14 02-Dec-1997 csapuntz

More splbio()'s added so that reassignbuf can do its thing.


# 1.13 06-Nov-1997 csapuntz

Updates for VFS Lite 2 + soft update.


Revision tags: OPENBSD_2_2_BASE
# 1.12 06-Oct-1997 deraadt

back out vfs lite2 till after 2.2


# 1.11 06-Oct-1997 csapuntz

VFS Lite2 Changes


Revision tags: OPENBSD_2_0_BASE OPENBSD_2_1_BASE
# 1.10 27-Jul-1996 deraadt

fvdl; Don't mistake a non-async block that needs to be commited for an
interrupted write.


# 1.9 21-Jul-1996 tholo

Ensure we never use more than one callout table slot


# 1.8 14-Jun-1996 tholo

Keep dirty list used by in-kernel update(8) in sync with buffers


# 1.7 28-May-1996 deraadt

sync


# 1.6 21-Apr-1996 deraadt

partial sync with netbsd 960418, more to come


# 1.5 17-Apr-1996 mickey

Minor cleanups. Checked against Lite2.
(NetBSD's was really just a Lite2's, but w/ 64bit support)


# 1.4 31-Mar-1996 mickey

From NetBSD: NFSv3 import (tomorrow's Net's kernel)
Open's patches kept in. i'll possibly take a look at Lite2 soon,
is there smth usefull ?..


# 1.3 29-Feb-1996 niklas

From NetBSD: merge with 960217 (still NFSv2)


# 1.2 08-Jan-1996 dm

graichen@freebsd.org: fixed -type:=direct mounts in amd


# 1.1 18-Oct-1995 deraadt

branches: 1.1.1;
Initial revision