322138 |
07-Aug-2017 |
mav |
MFC r321794: Improve FHA locality control for NFS read/write requests.
This change adds two new tunables, allowing to control serialization for read and write NFS requests separately. It does not change the default behavior since there are too many factors to consider, but gives additional space for further experiments and tuning.
The main motivation for this change is very low write speed in case of ZFS with sync=always or when NFS clients requests sychronous operation, when every separate request has to be written/flushed to ZIL, and requests are processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows to increase ZIL throughput by several times by coalescing writes and cache flushes. There is a worry that doing it may increase data fragmentation on disks, but I suppose it should not happen for pool with SLOG.
Sponsored by: iXsystems, Inc. |
301057 |
31-May-2016 |
ian |
MFC r297323,r297324, r297325, r297326:
Set only one default route for nfsroot mount, the one associated with the interface that will be used to mount the rootfs (and never a self-ip proxy arp route). Made up of the following related changes...
Set ifctx->gotrootpath=1 only when the root path came from the dhcp/bootp server (and not when it came from a fallback method such as the ROOTDEVNAME option). This makes the code in bootpc_init() choose the first interface that provided a rootpath name. Previously it was choosing the first interface that got an IP address, which could be on a different and potentially unreachable subnet than the server providing the rootfs.
If the rootpath name actually does come from a fallback source, then the code continues to use the first interface in the list that got configured. Note that this wasn't directly reported in the PR cited below, but was discovered while working on that PR.
Switch bootpc_adjust_interface() from returning int to void. Its one caller doesn't check for errors, and all the errors that can happen result in it calling panic anyway, except for one that's really more of a warning (and is going to disappear on an upcoming commit anyway).
Stop setting the default route to the IP of the interface itself when the bootp/dhcp server doesn't provide a router option. Doing so prevents setting defaultrouter=<ip> in rc.conf (it fails because there's already a bogus default route installed by bootpc_init).
When an admin wants to use this style of proxy arp on an interface, the proper mechanism is to set the "use-lease-addr-for-default-route" flag in the dhcp server config. That causes the lease address to be delivered in the routers option, and the normal handling of the routers option will then install the self-ip as the default route.
Do not try to install a default route for each interface found, because only the first one will actually work and all the others just result in errors (which would get printed but otherwise ignored).
Instead, wait until we make a choice of which interface will be used to mount the rootfs, and install the default route associated with it (if any). After doing the md_mount() call to obtain the needed info, remove the default route again, and transcribe the route info into the nfs_diskless structure. If the system eventually chooses to mount the nfs rootfs, the default route will be installed again when the nfs_diskless code re-initializes the interface.
PR: 187094 |
301056 |
31-May-2016 |
ian |
MFC r297147, r297148, r297149, r297150, r297151:
Make both the loader and kernel use the interface-mtu option if the dhcp server provides it. Made up of these (semi-)related changes...
[kernel...] If the dhcp server provides an interface-mtu option, parse the value and set that mtu on the interface.
[libstand...]
Garbage collect the bswap routines from libstand, use sys/endian.h.
If the dhcp server delivers an interface-mtu option, parse it and store the value in a new global intf_mtu for use by the application.
[loader...]
If the dhcp server provided an interface-mtu option, transcribe the value to the boot.netif.mtu env var, which will be picked up by pre-existing code in nfs_mountroot() and used to configure the interface accordingly.
PR: 187094 |
292223 |
14-Dec-2015 |
rmacklem |
MFC: r291527 Add kernel support to the NFS server for the "-manage-gids" option that will be added to the nfsuserd daemon in a future commit. It modifies the cache used by NFSv4 for name<-->id translation (both username/uid and group/gid) to support this. When "-manage-gids" is set, the server looks up each uid for the RPC and uses the list of groups cached in the server instead of the list of groups provided in the RPC request. The cached group list is acquired for the cache by the nfsuserd daemon via getgrouplist(3). This avoids the 16 groups limit for the list in the RPC request. Since the cache is now used for every RPC when "-manage-gids" is enabled, the code also modifies the cache to use a separate mutex for each hash list instead of a single global mutex. |
267753 |
22-Jun-2014 |
mav |
MFC r267479: Fix/improve fhe_stats sysctl output. |
267740 |
22-Jun-2014 |
mav |
MFC r267221, r267278: Introduce new per-thread lock to protect the list of requests.
This allows to slightly simplify svc_run_internal() code: if we processed all the requests in a queue, then we know that new one will not appear. |
263478 |
21-Mar-2014 |
glebius |
Merge r262763, r262767, r262771, r262806 from head: - Remove rt_metrics_lite and simply put its members into rtentry. - Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This removes another cache trashing ++ from packet forwarding path. - Create zini/fini methods for the rtentry UMA zone. Via initialize mutex and counter in them. - Fix reporting of rmx_pksent to routing socket. - Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode. |
261054 |
22-Jan-2014 |
mav |
MFC r260097: Move most of NFS file handle affinity code out of the heavily congested global RPC thread pool lock and protect it with own set of locks.
On synthetic benchmarks this improves peak NFS request rate by 40%. |
261049 |
22-Jan-2014 |
mav |
MFC r259765: Fix RPC server threads file handle affinity to work better with ZFS.
Instead of taking 8 specific bytes of file handle to identify file during RPC thread affitinity handling, use trivial hash of the full file handle. ZFS's struct zfid_short does not have padding field after the length field, as result, originally picked 8 bytes are loosing lower 16 bits of object ID, causing many false matches and unneeded requests affinity to same thread. This fix substantially improves NFS server latency and scalability in SPEC NFS benchmark by more flexible use of multiple NFS threads. |
261048 |
22-Jan-2014 |
mav |
MFC r259659, r259662: Remove several linear list traversals per request from RPC server code.
Do not insert active ports into pool->sp_active list if they are success- fully assigned to some thread. This makes that list include only ports that really require attention, and so traversal can be reduced to simple taking the first one.
Remove idle thread from pool->sp_idlethreads list when assigning some work (port of requests) to it. That again makes possible to replace list traversals with simple taking the first element. |
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
253847 |
31-Jul-2013 |
ian |
Changes to allow using BOOTP_NFSROOT and mounting an nfs root filesystem other than the one specified by the BOOTP server. This configures NFS using the BOOTP protocol while also respecting other root-path options such as setting vfs.root.mountfrom in the environment or using the RB_DFLTROOT boot option. It allows you to override the root path provided by the server, or to supply a root path when the server provides IP configuration but no root path info.
This maintains the historical BOOTP_NFSROOT behavior of panicking on a failure to mount the root path provided by the server, unless you've provided an alternative via the ROOTDEVNAME kernel option or by setting vfs.root.mountfrom. The behavior of panicking when given no other options is preserved because it amounts to a bit of a retry loop that could eventually recover from a transient network or server problem.
The user can now override the root path from loader(8) even if the kernel is compiled with BOOTP_NFSROOT. If vfs.root.mountfrom is set in the environment it is used unconditionally -- it always overrides the BOOTP info. If it begins with [old]nfs: then the BOOTP code uses it instead of the server-provided info. If it specifies some other filesystem then the bootp code will not panic like it used to and the code in vfs_mountroot.c will invoke the right filesystem to do the mount.
If the kernel is compiled with the ROOTDEVNAME option, then that name is used by the BOOTP code if either * The server doesn't provide a pathname. * The boothowto flags include RB_DFLTROOT. The latter allows the user to compile in alternate path in ROOTDEVNAME such as ufs:/dev/da0s1a and boot from that path by setting boot_dftlroot=1 in loader(8) or using the '-r' option in boot(8).
The one thing not provided here is automatic failover from a server-provided path to a compiled-in one without the user manually requesting that. The code just isn't currently structured in a way that makes that possible with a lot of rewrite. I think the ability to set vfs.root.mountfrom and to use ROOTDEVNAME automatically when the server doesn't provide a name covers the most common needs.
A set of patches submitted by Lars Eggert provided the part I couldn't figure out by myself when I tried to do this last year; many thanks.
Reviewed by: rodrigc
|
249596 |
17-Apr-2013 |
ken |
Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to sys/nfs, since it is now shared by the two NFS servers.
Suggested by: rmacklem Sponsored by: Spectra Logic MFC after: 2 weeks
|
248318 |
15-Mar-2013 |
glebius |
Use m_get() and m_getcl() instead of compat macros.
|
248207 |
12-Mar-2013 |
glebius |
Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2().
Sorry for churn, but better now than later.
|
248196 |
12-Mar-2013 |
glebius |
Use m_get2() to get mbuf of appropriate size.
Sponsored by: Nginx, Inc.
|
245568 |
17-Jan-2013 |
jhb |
Remove the unused nfs_curusec().
|
243882 |
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
243782 |
02-Dec-2012 |
rmacklem |
Add an nfssvc() option to the kernel for the new NFS client which dumps out the actual options being used by an NFS mount. This will be used to implement a "-m" option for nfsstat(1).
Reviewed by: alfred MFC after: 2 weeks
|
241561 |
14-Oct-2012 |
rmacklem |
Add two new options to the nfssvc(2) syscall that allow processes running as root to suspend/resume execution of the kernel nfsd threads. An earlier version of this patch was tested by Vincent Hoffman (vince at unsane.co.uk) and John Hickey (jh at deterlab.net).
Reviewed by: kib MFC after: 2 weeks
|
241394 |
10-Oct-2012 |
kevlo |
Revert previous commit...
Pointyhat to: kevlo (myself)
|
241370 |
09-Oct-2012 |
kevlo |
Prefer NULL over 0 for pointers
|
239337 |
16-Aug-2012 |
gonzo |
- Typo fix - style(9) fix
Spotted by: kib@, Andrey Zonov
|
239318 |
16-Aug-2012 |
gonzo |
Merge somewhat modified r230399 from projects/armv6:
Add timeout to wait for network controllers to appear when netbooting.
USB ethernet adapter initialization usually is delayed and they're not available immidiately after autoconfiguration. So we need to wait a bit before giving up
Reviewed by: stas@
|
231852 |
17-Feb-2012 |
bz |
Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity.
This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat.
Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
|
228455 |
13-Dec-2011 |
glebius |
Some cleanup of BOOTP code. Initially I wanted to just change the ifioctl() usage, but end up with more changes.
- Use SIOCAIFADDR instead of old rusty SIOCSIFADDR, SIOCSIFBRDADDR and SIOCSIFNETMASK. - Use queue(9) instead of hand made stailq. - Use one socket for all ifioctl() and send/receive operations. - Use __func__ instead of cut-n-paste in logging and panics. - Axe some dead or strange code.
Tested by: gonzo, Stefan Bethke <stb lassitu.de>
|
227293 |
07-Nov-2011 |
ed |
Mark MALLOC_DEFINEs static that have no corresponding MALLOC_DECLAREs.
This means that their use is restricted to a single C file.
|
225617 |
16-Sep-2011 |
kmacy |
In order to maximize the re-usability of kernel code in user space this patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls.
Reviewed by: rwatson Approved by: re (bz)
|
223673 |
29-Jun-2011 |
gber |
Set proper root device name when legacy NFS client is compiled into kernel.
Approved by: cognet (mentor)
|
222813 |
07-Jun-2011 |
attilio |
etire the cpumask_t type and replace it with cpuset_t usage.
This is intended to fix the bug where cpu mask objects are capped to 32. MAXCPU, then, can now arbitrarely bumped to whatever value. Anyway, as long as several structures in the kernel are statically allocated and sized as MAXCPU, it is suggested to keep it as low as possible for the time being.
Technical notes on this commit itself: - More functions to handle with cpuset_t objects are introduced. The most notable are cpusetobj_ffs() (which calculates a ffs(3) for a cpuset_t object), cpusetobj_strprint() (which prepares a string representing a cpuset_t object) and cpusetobj_strscan() (which creates a valid cpuset_t starting from a string representation). - pc_cpumask and pc_other_cpus are target to be removed soon. With the moving from cpumask_t to cpuset_t they are now inefficient and not really useful. Anyway, for the time being, please note that access to pcpu datas is protected by sched_pin() in order to avoid migrating the CPU while reading more than one (possible) word - Please note that size of cpuset_t objects may differ between kernel and userland. While this is not directly related to the patch itself, it is good to understand that concept and possibly use the patch as a reference on how to deal with cpuset_t objects in userland, when accessing kernland members. - KTR_CPUMASK is changed and now is represented through a string, to be set as the example reported in NOTES.
Please additively note that no MAXCPU is bumped in this patch, but private testing has been done until to MAXCPU=128 on a real 8x8x2(htt) machine (amd64).
Please note that the FreeBSD version is not yet bumped because of the upcoming pcpu changes. However, note that this patch is not targeted for MFC.
People to thank for the time spent on this patch: - sbruno, pluknet and Nicholas Esborn (nick AT desert DOT net) tested several revision of the patches and really helped in improving stability of this work. - marius fixed several bugs in the sparc64 implementation and reviewed patches related to ktr. - jeff and jhb discussed the basic approach followed. - kib and marcel made targeted review on some specific part of the patch. - marius, art, nwhitehorn and andreast reviewed MD specific part of the patch. - marius, andreast, gonzo, nwhitehorn and jceel tested MD specific implementations of the patch. - Other people have made contributions on other patches that have been already committed and have been listed separately.
Companies that should be mentioned for having participated at several degrees: - Yahoo! for having offered the machines used for testing on big count of CPUs. - The FreeBSD Foundation for having sponsored my devsummit attendance, which has been instrumental. - Sandvine for having offered offices and infrastructure during development.
(I really hope I didn't forget anyone, if it happened I apologize in advance).
|
221973 |
15-May-2011 |
rmacklem |
Change the sysctl naming for the old and new NFS clients to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes the default nfs client use vfs.nfs.xxx after r221124.
|
221543 |
06-May-2011 |
rmacklem |
Move sys/nfsclient/nfs_kdtrace.h to sys/nfs/nfs_kdtrace.h so it can be used by the new NFS client as well as the old one.
|
221473 |
05-May-2011 |
rmacklem |
Modify the NFS nfssvc(2) syscall so that it allows anyone to get the statistics for the new NFS subsystem.
MFC after: 2 weeks
|
221439 |
04-May-2011 |
rmacklem |
Add kernel support for NFSSVC_ZEROCLTSTATS and NFSSVC_ZEROSRVSTATS so that they can be used by nfsstat(1) to implement the "-z" option for the new NFS subsystem.
MFC after: 2 weeks
|
221438 |
04-May-2011 |
rmacklem |
Revert r221306, since NFSSVC_ZEROSTATS zero'd both client and server stats, when separate modifiers for NFSSVC_GETSTATS for each of client and server stats is what it required by nfsstat(1).
|
221436 |
04-May-2011 |
ru |
Implemented a mount option "nocto" that disables cache coherency checking at open time. It may improve performance for read-only NFS mounts. Use deliberately.
MFC after: 1 week Reviewed by: rmacklem, jhb (earlier version)
|
221306 |
01-May-2011 |
rmacklem |
Add the kernel support needed to zero out the nfsstats structure for the new NFS subsystem. This will be used by nfsstats.c to implement the "-z" option.
MFC after: 2 weeks
|
221032 |
25-Apr-2011 |
rmacklem |
Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. Move the files used for a diskless NFS root from sys/nfsclient to sys/nfs in preparation for them to be used by both NFS clients. Also, move the declaration of the three global data structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c so that they are defined when either client uses them.
Reviewed by: jhb MFC after: 2 weeks
|
217432 |
14-Jan-2011 |
rmacklem |
Modify the experimental NFSv4 server so that it posts a SIGUSR2 signal to the master nfsd daemon whenever the stable restart file has been modified. This will allow the master nfsd daemon to maintain an up to date backup copy of the file. This is enabled via the nfssvc() syscall, so that older nfsd daemons will not be signaled.
Reviewed by: jhb MFC after: 1 week
|
216931 |
03-Jan-2011 |
rmacklem |
Fix the nlm so that it no longer depends on the regular nfs client and, as such, can be loaded for the experimental nfs client without the regular client.
Reviewed by: jhb MFC after: 2 weeks
|
214053 |
19-Oct-2010 |
rmacklem |
Fix the type of the 3rd argument for nm_getinfo so that it works for architectures like sparc64.
Suggested by: kib MFC after: 2 weeks
|
214048 |
19-Oct-2010 |
rmacklem |
Modify the NFS clients and the NLM so that the NLM can be used by both clients. Since the NLM uses various fields of the nfsmount structure, those fields were extracted and put in a separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h. This structure also has a function pointer for a function that extracts the required information from the mount point and nfs vnode for that particular client, for information stored differently by the clients.
Reviewed by: jhb MFC after: 2 weeks
|
210455 |
24-Jul-2010 |
rmacklem |
Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate module that can be used by both the regular and experimental nfs clients. This fixes the problem reported by jh@ where /dev/nfslock would be registered twice when both nfs clients were used. I also defined the size of the lm_fh field to be the correct value, as it should be the maximum size of an NFSv3 file handle.
Reviewed by: jh MFC after: 2 weeks
|
203968 |
16-Feb-2010 |
marius |
Factor out the code shared between NFS client and server into its own module. With r203732 it became apparent that creating the sysctl nodes twice causes at least a warning, however the whole code shouldn't be present twice in the first place.
Discussed with: rmacklem
|
203732 |
09-Feb-2010 |
marius |
- Move nfs_realign() from the NFS client to the shared NFS code and remove the NFS server version in order to reduce code duplication. The shared version now uses a second parameter how, which is passed on to m_get(9) and m_getcl(9) as the server used M_WAIT while the client requires M_DONTWAIT, and replaces the the previously unused parameter hsiz. - Change nfs_realign() to use nfsm_aligned() so as with other NFS code the alignment check isn't actually performed on platforms without strict alignment requirements for performance reasons because as the comment suggests unaligned data only occasionally occurs with TCP. - Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather than M_WAIT because it's called with the RPC sp_lock held.
Reviewed by: jhb, rmacklem MFC after: 1 week
|
203731 |
09-Feb-2010 |
marius |
Some style(9) fixes
|
195631 |
12-Jul-2009 |
marcel |
Revert rev 192323 (nfs_common.c only): The D-cache flushing added here was to deal with I-cache incoherency observed on ia64. However, the problem was in the implementation of pmap_enter_object() for ia64: it was missing I-cache coherency logic for prefaulted pages. After this got added in rev 195625, testing showed that no D-cache flushing was required.
The SIGILL that was observed on Book-E (see commit log for rev 192323) ended up not being related to I-cache incoherency, but was found to be caused by bad memory. This discovery further undermined the need for D-cache flushing in the NFS I/O code, triggering the reversal.
Approved by: re (kensmith)
|
195202 |
30-Jun-2009 |
dfr |
Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
|
195104 |
27-Jun-2009 |
rwatson |
Replace AUDIT_ARG() with variable argument macros with a set more more specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr).
In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules.
Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
|
192323 |
18-May-2009 |
marcel |
Add cpu_flush_dcache() for use after non-DMA based I/O so that a possible future I-cache coherency operation can succeed. On ARM for example the L1 cache can be (is) virtually mapped, which means that any I/O that uses temporary mappings will not see the I-cache made coherent. On ia64 a similar behaviour has been observed. By flushing the D-cache, execution of binaries backed by md(4) and/or NFS work reliably. For Book-E (powerpc), execution over NFS exhibits SIGILL once in a while as well, though cpu_flush_dcache() hasn't been implemented yet.
Doing an explicit D-cache flush as part of the non-DMA based I/O read operation eliminates the need to do it as part of the I-cache coherency operation itself and as such avoids pessimizing the DMA-based I/O read operations for which D-cache are already flushed/invalidated. It also allows future optimizations whereby the bcopy() followed by the D-cache flush can be integrated in a single operation, which could be implemented using on-chips DMA engines, by-passing the D-cache altogether.
|
190815 |
07-Apr-2009 |
rmacklem |
Adding sys/nfs/nfssvc.h and sys/nfs/nfs_nfssvc.c in preparation for sharing of the nfssvc() system call between nfsserver and the nfsv4 server. Building of nfs_nfssvc.c will be committed later, at the time the .c files in sys/nfsserver are updated. To do so now would result in nfssvc() multiply defined.
Submitted by: rmacklem Reviewed by: dfr Approved by: kib (mentor)
|
177599 |
25-Mar-2008 |
ru |
Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
|
164725 |
28-Nov-2006 |
rees |
NFSv4 client: Add support for va_birthtime Fix va_ctime to use TIME_METADATA, not TIME_CREATE
|
148008 |
14-Jul-2005 |
ps |
Fixes for NFS crashes on architectures that require strict alignment. - Fix nfsm_disct() so that after pulling up data, the remaining data is aligned if necessary. - Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the mbuf (instead of casting m_data to a uint32).
Submitted by: Pyun YongHyeon Reviewed by: Mohan Srinivasan
|
139823 |
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
138463 |
06-Dec-2004 |
ps |
Add non-blocking versions of nfsm_dissect() and friends, for use from socket callbacks or similar callers, from both the NFS client and the server. Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
|
127977 |
07-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson.
Approved by: core, peter, alc, rwatson
|
122698 |
14-Nov-2003 |
alfred |
University of Michigan's Citi NFSv4 kernel client code.
Submitted by: Jim Rees <rees@umich.edu>
|
111119 |
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
109623 |
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
104908 |
11-Oct-2002 |
mike |
Change iov_base's type from `char *' to the standard `void *'. All uses of iov_base which assume its type is `char *' (in order to do pointer arithmetic) have been updated to cast iov_base to `char *'.
|
102999 |
06-Sep-2002 |
peter |
nfsnode.h was moved to ../nfsclient ages ago. I forgot to remove it here.
|
92784 |
20-Mar-2002 |
jeff |
Remove unused include.
|
92741 |
20-Mar-2002 |
alfred |
Remove __P.
|
88731 |
31-Dec-2001 |
iedowse |
When the old nfsm_adv() macro was moved to nfsm_adv_xx(), a '>=' must have been inadvertently changed to '>'. This broke nfsm_adv() in the case where the advancement count is equal to the amount of data remaining in the current mbuf. Instead of moving the current position N bytes forward, nfs_adv() could end up moving it back to N bytes from the start of the mbuf data.
This should fix the client-side readdirplus problems that have been reported since September.
|
88091 |
18-Dec-2001 |
iedowse |
Avoid passing the variable `tl' to functions that just use it for temporary storage. In the old NFS code it wasn't at all clear if the value of `tl' was used across or after macro calls, but I'm fairly confident that the convention was to keep its use local. Each ex-macro function now uses a local version of this variable, so all of the double-indirection goes away.
The only exception to the `local use' rule for `tl' is nfsm_clget(), which is left unchanged by this commit.
Reviewed by: peter
|
84079 |
28-Sep-2001 |
peter |
Unwind some more macros. NFSMADV() was kinda silly since it was right next to equivalent m_len adjustments. Move the nfsm_subs.h macros into groups depending on which phase they are used in, since that affects the error recovery requirements. Collect some of the common error checking into a single macro as preparation for unwinding some more. Have nfs_rephead return a value instead of secretly modifying args. Remove some unused function arguments that were being passed around. Clarify nfsm_reply()'s error handling (I hope).
|
84057 |
27-Sep-2001 |
peter |
Make nfsm_dissect() have an obvious return value.
|
84002 |
27-Sep-2001 |
peter |
Tidy up nfsm_build usage. This is only partially finished.
|
83999 |
26-Sep-2001 |
peter |
Oops, forgot to rm this last time.
|
83651 |
18-Sep-2001 |
peter |
Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
|
83629 |
18-Sep-2001 |
imp |
nfs_strategy calls nfs_asyncio with td as NULL. So add a bandaid that will pass NULL as the struct proc when td is NULL. This has stopped crashing on my machine.
Note: The passing of NULL may be bogus, but I'll let others fix that problem.
Reviewed by: jhb
|
83493 |
15-Sep-2001 |
peter |
Sync some differences that were different between the copies of the files that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
|
83366 |
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
83291 |
10-Sep-2001 |
kris |
Fix some signed/unsigned integer confusion, and add bounds checking of arguments to some functions.
Obtained from: NetBSD Reviewed by: peter MFC after: 2 weeks
|
82702 |
31-Aug-2001 |
dillon |
Pushdown Giant for nfs syscalls (nfssvc())
|
82213 |
23-Aug-2001 |
ache |
Stupid error from my side in prev. commit: || -> &&
|
82204 |
23-Aug-2001 |
ache |
Implement l_len<0 per POSIX check. Check for valid l_whence too.
|
82194 |
23-Aug-2001 |
ache |
Even better move: suppose that server is able to handle SEEK_END, so check arguments for all but not SEEK_END case, leaving SEEK_END handling for server
|
82193 |
23-Aug-2001 |
ache |
Apparently SEEK_END locking not supported by NFS. Previous variant returns EINVAL in that case, change it to EOPNOTSUPP.
|
82190 |
23-Aug-2001 |
ache |
Move <machine/*> after <sys/*>
Pointed by: bde
|
82174 |
23-Aug-2001 |
ache |
adv. lock: detect off_t overflow _before_ it occurse and return EOVERFLOW instead of EINVAL
|
80891 |
01-Aug-2001 |
iedowse |
Fix a client-side memory leak in nfs_flush(). The code allocates a temporary array to store struct buf pointers if the list doesn't fit in a local array. Usually it frees the array when finished, but if it jumps to the 'again' label and the new list does fit in the local array then it can forget to free a previously malloc'd M_TEMP memory.
Move the free() up a line so that it frees any previously allocated memory whether or not it needs to malloc a new array.
Reviewed by: dillon
|
80672 |
30-Jul-2001 |
peter |
Check the filehandle size when mounting.
Obtained from: Constantine Sapuntzakis <csapuntz@openbsd.org>
|
79247 |
04-Jul-2001 |
jhb |
- Sort includes. - Update vmmeter statistics for vnode pagein/pageouts in getpages/putpages.
|
79224 |
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
78911 |
28-Jun-2001 |
jhb |
- Protect the mnt_vnode list with the mntvnode lock. - Use queue(9) macros.
|
77563 |
01-Jun-2001 |
jake |
Unlock the process returned from pfind() if it does not return NULL. This fixes a witness lock violation for nfssvc returning with locks held.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr> PR: kern/27776
|
77183 |
25-May-2001 |
rwatson |
o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid. o Remove p_cred from struct proc; add p_ucred to struct proc, replacing original macro that pointed. p->p_ucred to p->p_cred->pc_ucred. o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc. o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there. o Restruction many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this: newcred = crdup(oldcred); ... p->p_ucred = newcred; crfree(oldcred); It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit. o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem. o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives. o Clean up exit1() so as to do less work in credential cleanup due to pcred removal. o Clean up fork1() so as to do less work in credential cleanup and allocation. o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparision. o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done. o Update credential management calls, such as crfree(), to take into account new ruidinfo reference. o Modify or add the following uid and gid helper routines: change_euid() change_egid() change_ruid() change_rgid() change_svuid() change_svgid() In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements. o CANSIGIO() is simplified to require only credentials, not processes and pcreds. o Remove lots of (p_pcred==NULL) checks. o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully. o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances. o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
|
77086 |
23-May-2001 |
jhb |
Assert Giant is held by the caller rather than getting it and releasing it in getpages/putpages.
|
77031 |
23-May-2001 |
ru |
- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs.
- Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs.
- Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.
- Install header files for the above file systems.
- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
|
76827 |
19-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
76688 |
16-May-2001 |
iedowse |
Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode.
All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can.
This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount.
Reviewed by: phk, bp
|
76166 |
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h form kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
76131 |
29-Apr-2001 |
phk |
Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
|
76118 |
29-Apr-2001 |
alfred |
Remove incorrect comment.
Submitted by: quinot@inf.enst.fr <quinot@inf.enst.fr> PR: kern/26893
|
76117 |
29-Apr-2001 |
grog |
Revert consequences of changes to mount.h, part 2.
Requested by: bde
|
75858 |
23-Apr-2001 |
grog |
Correct #includes to work with fixed sys/mount.h.
|
75692 |
19-Apr-2001 |
alfred |
vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
|
75631 |
17-Apr-2001 |
alfred |
Implement client side NFS locks.
Obtained from: BSD/os Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
|
75580 |
17-Apr-2001 |
phk |
This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing.
This patch takes a more general approach and adds a bp->b_op vector where more methods can be added.
The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
|
75402 |
11-Apr-2001 |
peter |
Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to enable easy access to the hash chain stats. The raw prefixed versions dump an integer array to userland with the chain lengths. This cheats and calls it an array of 'struct int' rather than 'int' or sysctl -a faithfully dumps out the 128K array on an average machine. The non-raw versions return 4 integers: count, number of chains used, maximum chain length, and percentage utilization (fixed point, multiplied by 100). The raw forms are more useful for analyzing the hash distribution, while the other form can be read easily by humans and stats loggers.
|
75218 |
05-Apr-2001 |
rwatson |
o Rather than arbitrarily construct a credential in the nfs_statfs() VFS operation, make use of the calling process's credential. This solution may not be ideal (there are a number of other possible proposals, including making use of the proc0 credential, adding a credential argument to the VFSOP, and switching from a hard-coded ucred to a hard-coded nfscred), it is simple and appears to work. The arguments against using simply crget() are fairly strong: it is the only place in the code (other than a nearly identical invocation in ncp) where crget() is invoked, other than in the process credential creation code; as ucred becomes extensible, this use of crget() without appropriate context results in less and less meaningful credential data. The implementation here will probably be tweaked as a result of experimentation and further exploration of the requirements. In the mean-time, it allows progress to be made in ucred expansion for new security models without causing a crash every time df is used on an NFS mounted file system.
This code has been interop tested against FreeBSD and Solaris NFS servers. While using the process credentials should not introduce interop problems, please let me know if any turn out to exist.
Reviewed by: freebsd-arch
|
74501 |
20-Mar-2001 |
peter |
Use the same API as the example code. Allow the initial hash value to be passed in, as the examples do. Incrementally hash in the dvp->v_id (using the official api) rather than add it. This seems to help power-of-two predictable filename trees where the filenames repeat on a power-of-two cycle and the directory trees have power-of-two components in it. The simple add then mask was causing things like 12000+ entry collision chains while most other entries have between 0 and 3 entries each. This way seems to improve things.
|
74384 |
17-Mar-2001 |
peter |
Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it.
As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32).
The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha.
Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.
|
74381 |
17-Mar-2001 |
peter |
Dramatically improve the **lame** nfs_hash(). This is based on the Fowler / Noll / Vo Hash (http://www.isthe.com/chongo/tech/comp/fnv/).
This improves hash coverage a *massive* amount. We were seeing one set of machines that were using 0.84% of their 131072 entry nfsnode hash buckets with maximum chain lengths of up to ~500 entries. The machine was spending nearly 100% of its time in 'system'. A test with this has pushed the coverage from a few perCent up to 91% utilization with a max chain length of 11.
Submitted by: David Filo
|
73929 |
07-Mar-2001 |
jhb |
Grab the process lock while calling psignal and before calling psignal.
|
73286 |
01-Mar-2001 |
adrian |
Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.
This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work.
* the guts of the mount work has been moved into vfs_mount().
* move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace.
* Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long.
* rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.)
* remove the copyin*() stuff for `path'. `data' still requires copyin*() since its a pointer into userland.
* set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to.
* NOTE: f_mntonname is intiailised with "/" in the case of a root mount.
|
73211 |
28-Feb-2001 |
dillon |
Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be hit on the client side and prevent the server side from retiring writes. Pipeline operations turned off for all READs (no big loss since reads are usually synchronous) and for NFS writes, and left on for the default bwrite(). (MFC expected prior to 4.3 freeze)
Testing by: mjacob, dillon
|
72650 |
18-Feb-2001 |
green |
Switch to using a struct xucred instead of a struct xucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout.
Reviewed by: bde
|
72645 |
18-Feb-2001 |
asmodai |
Preceed/preceeding are not english words. Use precede and preceding.
|
72216 |
09-Feb-2001 |
iedowse |
Fix some problems that were introduced in revision 1.97. Instead of returning an error code to the caller, NFS server op routines must themselves build an error reply and return 0 to the caller.
This is achieved by replacing the erroneous return statements with code that jumps forward to the op function's reply code. We need to be careful to ensure that the 'struct mount' pointer is NULL though, so that the final vn_finished_write() call becomes a no-op.
Reviewed by: mckusick, dillon
|
72200 |
09-Feb-2001 |
bmilekic |
Change and clean the mutex lock interface.
mtx_enter(lock, type) becomes:
mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)
similarily, for releasing a lock, we now have:
mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument.
The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind.
Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two:
MTX_QUIET and MTX_NOSWITCH
The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers:
mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively.
Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case.
Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled.
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those.
Finally, caught up to the interface changes in all sys code.
Contributors: jake, jhb, jasone (in no particular order)
|
71915 |
02-Feb-2001 |
tegge |
Enable use of DHCP extensions. Reviewed by: Per Kristian Hove <Per.Hove@math.ntnu.no>
|
70674 |
04-Jan-2001 |
dillon |
NFS O_EXCL file create semantics temporarily uses file attributes to store the file verifier. The NFS client is supposed to do a SETATTR after a successful O_EXCL open/create to clean up the attributes. FreeBSD's client code was generating a SETATTR rpc but was not generating an access or modification time update within that rpc, leaving the file with a broken access time that solaris chokes on (and it doesn't look very nice when you ls -lua under FreeBSD either!). Fixed.
|
70254 |
21-Dec-2000 |
bmilekic |
* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have became a big problem.
|
69781 |
08-Dec-2000 |
dwmalone |
Convert more malloc+bzero to malloc+M_ZERO.
Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
|
69214 |
26-Nov-2000 |
phk |
Simplify the tprintf() API.
Loose the special <sys/tprintf.h> #include file.
|
68883 |
18-Nov-2000 |
dillon |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
|
68711 |
14-Nov-2000 |
mckusick |
In preparation for deprecating CIRCLEQ macros in favor of TAILQ macros which provide the same functionality and are a bit more efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
|
68186 |
01-Nov-2000 |
eivind |
Give vop_mmap an untimely death. The opportunity to give it a timely death timed out in 1996.
|
67882 |
29-Oct-2000 |
phk |
Remove unneeded #include <sys/proc.h> lines.
|
67834 |
29-Oct-2000 |
tegge |
Reduce kernel stack usage by not having large packets on the stack. Supply correct size parameter to dhcpd. Replace some magic numbers with macro names. Handle more than one interface.
|
67534 |
24-Oct-2000 |
tegge |
Eliminate some bitrot (nonexisting member variable names). Don't use curproc when a proc pointer is available.
|
67531 |
24-Oct-2000 |
tegge |
Style fixes.
|
67529 |
24-Oct-2000 |
tegge |
Make RPC timeout message more readable. Supply proc pointer to sosend.
|
67486 |
24-Oct-2000 |
dwmalone |
Problem to avoid processes getting stuck in "vmopar". From Ian's mail:
The problem seems to originate with NFS's postop_attr information that is returned with a read or write RPC. Within a vm_fault context, the code cannot deal with vnode_pager_setsize() shrinking a vnode.
The workaround in the patch below stops the nfsm_postop_attr() macro from ever shrinking a vnode. If the new size in the postop_attr information is smaller, then it just sets the nfsnode n_attrstamp to 0 to stop the wrong size getting used in the future. This change only affects postop_attr attributes; the nfsm_loadattr() macro works as normal.
The change is implemented by adding a new argument to nfs_loadattrcache() called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never reduce the vnode/nfsnode size; instead it zeros n_attrstamp.
There remain other was processes can get stuck in vmopar.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: dillon Tested by: Vadim Belman <voland@lflat.org>
|
67152 |
15-Oct-2000 |
bp |
Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top of nfs mounts, but there can be side effects because nfs uses shared locks for vnodes.
|
67151 |
15-Oct-2000 |
bp |
Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts).
Give nfs's node lock its own name.
|
66615 |
04-Oct-2000 |
jasone |
Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
|
66355 |
25-Sep-2000 |
bp |
Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or keept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems.
From this point vop_no*lock*() functions maintain only interlock lock. vop_std*lock*() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode.
If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes.
Reviewed in general by: mckusick
|
65557 |
07-Sep-2000 |
jasone |
Major update to the way synchronization is done in the kernel. Highlights include:
* Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.)
* Per-CPU idle processes.
* Interrupts are run in their own separate kernel threads and can be preempted (i386 only).
Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
|
65497 |
05-Sep-2000 |
msmith |
Don't scan for the "right" network interface by shooting in the dark. Assume that the nfs_diskless structure is correctly set up; the provider ought to be getting it right.
|
63788 |
24-Jul-2000 |
mckusick |
This patch corrects the first round of panics and hangs reported with the new snapshot code.
Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem.
Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write().
Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations.
Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot.
Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation.
Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress.
Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior.
Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation.
Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
|
62976 |
11-Jul-2000 |
mckusick |
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed.
Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
|
61619 |
13-Jun-2000 |
ps |
Correctly set the Maximum DHCP Message Size. bootpd now works again as well as ISC dhcpd.
|
60938 |
26-May-2000 |
jake |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
Requested by: msmith and others
|
60833 |
23-May-2000 |
jake |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
Suggested by: phk Reviewed by: phk Approved by: mdodd
|
60141 |
07-May-2000 |
phk |
Include a RFC 1533 "Maximum DHCP Message Size" option in our request.
ISC DHCP will limit the reply length to 64 bytes for bootp replies unless we explicitly tell it we can do more. We tell it that we can do 1200 bytes.
|
60041 |
05-May-2000 |
phk |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
|
59794 |
30-Apr-2000 |
phk |
Remove unneeded #include <vm/vm_zone.h>
Generated by: src/tools/tools/kerninclude
|
59762 |
29-Apr-2000 |
phk |
s/biowait/bufwait/g
Prodded by: several.
|
59391 |
19-Apr-2000 |
phk |
Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>
|
59249 |
15-Apr-2000 |
phk |
Complete the bio/buf divorce for all code below devfs::strategy
Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
|
58934 |
02-Apr-2000 |
phk |
Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
|
58710 |
27-Mar-2000 |
dillon |
Add a sysctl to specify the amount of UDP receive space NFS should reserve, in maximal NFS packets. Originally only 2 packets worth of space was reserved. The default is now 4, which appears to greatly improve performance for slow to mid-speed machines on gigabit networks.
Add documentation and correct some prior documentation.
Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu> Approved by: jkh
|
58349 |
20-Mar-2000 |
phk |
Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
|
58345 |
20-Mar-2000 |
phk |
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
|
57178 |
13-Feb-2000 |
peter |
Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world.
Approved by: jkh
|
56651 |
26-Jan-2000 |
dillon |
Fix catastrophic bug in NQNFS related to UDP mounts. The 'nqhost' struct contains a major union for which lph_slp was being initialized only for TCP connections, but accessed for all types of connections leading to a crash. Also, a conditional controlling an nfs_slplock() call contained an improper paren grouping, causing a second crash in the UDP case.
The nqhost structure has been reorganized and lph_slp has been made a normal structural field rather then a union field, and properly initialized for all connection types.
Approved by: jkh
|
55934 |
13-Jan-2000 |
dillon |
The alpha build cuases the 'nfsuid bloated' warning to occur. Well, there is nothing we can do about it. In fact, after further review there simply are not very many instances of the two structures NFS checks for 'bloat' so I've decided to simply rip the checks out entirely.
Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
|
55679 |
09-Jan-2000 |
shin |
tcp updates to support IPv6. also a small patch to sys/nfs/nfs_socket.c, as max_hdr size change.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
55431 |
05-Jan-2000 |
dillon |
Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better then 'reverse'. The original algorithm did things this way to but at a huge time cost.
Enhance the append interlock for NFS writes to handle intr/soft mounts better.
Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches.
Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather then the kernel default for that option.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
55423 |
05-Jan-2000 |
dillon |
Fix at least one source of the continued 'NFS append race'. close() was calling nfs_flush() and then clearing the NMODIFIED bit. This is not legal since there might still be dirty buffers after the nfs_flush (for example, pending commits). The clearing of this bit in turn prevented a necessary vinvalbuf() from occuring leaving left over dirty buffers even after truncating the file in a new operation. The fix is to simply not clear NMODIFIED.
Also added a sysctl vfs.nfs.nfsv3_commit_on_close which, if set to 1, will cause close() to do a stage 1 write AND a stage 2 commit synchronously. By default only the stage 1 write is done synchronously.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
55206 |
29-Dec-1999 |
peter |
Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.
|
54970 |
21-Dec-1999 |
alfred |
make getfh a standard syscall instead of dependant on having NFSSERVER defined, useful for userland fileservers that want to use a filehandle type interface to the filesystem.
Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452
|
54803 |
19-Dec-1999 |
rwatson |
Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry.
Reviewed by: eivind
|
54799 |
19-Dec-1999 |
green |
M_PREPEND-related cleanups (unregisterifying struct mbuf *s).
|
54785 |
18-Dec-1999 |
dillon |
Fix compilation warning on alpha when converting pointer to integer to generate hash index.
Reviewed by: Andrew Gallatin <gallatin@cs.duke.edu>
|
54693 |
16-Dec-1999 |
dillon |
Have NFS use a snapshot of boottime instead of boottime itself to generate the NFSv3 Version id. boottime itself may change, sometimes once every tick if you are running xntpd, which really throws off clients. Clients will tend to throw away what they believe to be stale data too often, and can get into long loops rewriting the same data over and over again because they believe the server has rebooted over and over again due to the changing version id.
Approved by: jkh
|
54655 |
15-Dec-1999 |
eivind |
Introduce NDFREE (and remove VOP_ABORTOP)
|
54605 |
14-Dec-1999 |
dillon |
Fix two problems: First, fix the append seek position race that can occur due to np->n_size potentially changing if nfs_getcacheblk() blocks in nfs_write().
Second, under -current we must supply the proper bufsize when obtaining buffers that straddle the EOF, but due to the fact that np->n_size can change out from under us it is possible that we may specify the wrong buffer size and wind up truncating dirty data written by another process.
Both problems are solved by implementing nfs_rslock(), which allows us to lock around sensitive buffer cache operations such as those that occur when appending to a file.
It is believed that this race is responsible for causing dirtyoff/dirtyend and (in stable) validoff/validend to exceed the buffer size. Therefore we have now added a warning printf for the dirtyoff/end case in current.
However, we have introduced a new problem which we need to fix at some point, and that is that soft or intr NFS mounts may become uninterruptable from the point of view of process A which is stuck waiting on rslock while process B is stuck doing the rpc. To unstick process A, process B would have to be interrupted first.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
54567 |
13-Dec-1999 |
dillon |
Add a readahead heuristic to the NFS server side code. While the server cannot unilaterally pass data to a client it can reduce the physical disk transaction overhead by reading larger blocks. This results in better pipelining of requests/responses over the network and an almost 100% increase in cpu efficiency on the server. On a 100BaseTX network NFS read performance increases from 8.5 MBytes/sec to 10 MB/sec (maxed out), and cpu efficiency increases from 72% idle to 80% idle on the server.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
54564 |
13-Dec-1999 |
dillon |
PR: kern/15222 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
54536 |
13-Dec-1999 |
dillon |
Fix a timeout deadlock that can occur when the process holding the receive lock hasn't yet managed to send its own request.
PR: kern/15055 Submitted by: Ian Dowse iedowse@maths.tcd.ie
|
54485 |
12-Dec-1999 |
dillon |
Fix a number of server-side issues related to aborting badly formed NFS packets, mainly initializing structure pointers to NULL which are conditionally freed prior to return.
PR: kern/15249 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
54480 |
12-Dec-1999 |
dillon |
Synopsis of problem being fixed: Dan Nelson originally reported that blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below.
bug #1:
B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data.
Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable.
bug #2:
B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported.
bug #3:
B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server).
The patch also contains some cleanup, which only applies to the commit into -current.
Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>
|
54444 |
11-Dec-1999 |
eivind |
Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() gets a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked.
This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS.
Discussed with: grog, mch, peter, phk Reviewed by: peter
|
53937 |
30-Nov-1999 |
dillon |
The symlink implementation could improperly return a NULL vp along with a 0 error code. The problem occured with NFSv2 mounts and also with any NFSv3 mount returning an EEXIST error (which is translated to 0 prior to return). The reply to the rpc only contains the file handle for the no-error case under NFSv3. The error case under NFSv3 and all cases under NFSv2 do *not* return the file handle. The fix is to do a secondary lookup to obtain the file handle and thus be able to generate a return vnode for the situations where the rpc reply does not contain the required information.
The bug was originally introduced when VOP_SYMLINK semantics were changed for -CURRENT. The NFS symlink implementation was not properly modified to go along with the change despite the fact that three people reviewed the code. It took four attempts to get the current fix correct with five people. Is NFS obfuscated? Ha!
Reviewed by: Alfred Perlstein <bright@wintelcom.net> Testing and Discussion: "Viren R.Shah" <viren@rstcorp.com>, Eivind Eklund <eivind@FreeBSD.ORG>, Ian Dowse <iedowse@maths.tcd.ie>
|
53776 |
27-Nov-1999 |
eivind |
Remap the error EEXISTS => 0 *before* using error to determine if we should return a vp.
|
53552 |
22-Nov-1999 |
dillon |
nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization of element [4] in both, which goes beyond the end of the array, leaving [0], [1], [2], and [3]. This bug did not cause any problems since the overrun fields are initialized after the bogus array init but needs to be fixed anyway.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
53463 |
20-Nov-1999 |
eivind |
Fix VOP_MKNOD for loss of WILLRELE. I don't know how I could have missed this in the first place :-(
Noticed by: bde
|
53452 |
20-Nov-1999 |
phk |
struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively.
This removes ugly mp != (void*)&mountlist comparisons.
Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967
|
53131 |
13-Nov-1999 |
eivind |
Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.
|
53101 |
12-Nov-1999 |
eivind |
Remove WILLRELE from VOP_RENAME
|
53095 |
11-Nov-1999 |
dillon |
Remove special case socket sharing code in order to allow nfsd to bind IP addresses to udp/cltp sockets separately.
PR: kern/13049 Reviewed by: David Malone <dwmalone@maths.tcd.ie>, freebsd-current
|
53022 |
08-Nov-1999 |
dillon |
Fix nfssvc_addsock() to not attempt to free a NULL socket structure when returning an error. Bug fix was extracted from the PR. The PR is not yet entirely resolved by this commit.
PR: kern/13049 Reviewed by: Matt Dillon <dillon@freebsd.org> Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
52781 |
01-Nov-1999 |
msmith |
Call bootpc_init before we try to mount an NFS root, if we're configured to use BOOTP for NFS root discovery.
The entire interface setup inside nfs_mountroot is evil, and should die.
|
52635 |
29-Oct-1999 |
phk |
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
52492 |
25-Oct-1999 |
dillon |
Move NFS access cache hits/misses into nfsstats structure so /usr/bin/nfsstat can get to it easily.
|
51906 |
03-Oct-1999 |
phk |
Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
|
51799 |
29-Sep-1999 |
marcel |
Careless use of struct proc *p caused major problems. 'p' is allowed to be NULL in this function (nfs_sigintr). Reorder the statements and guard them all with a single if (p != NULL).
reported, reviewed and tested by: jdp
|
51796 |
29-Sep-1999 |
dillon |
Make FreeBSD less conservative in determining when to return a cookie error for a directory. I have made this change after a great deal of review although I cannot be absolutely sure that this meets the spec.
The issue devolves into whether changes in an underlying (UFS) directory can cause NFS directory blocks to be renumbered. My read of the code indicates that NFS directory blocks will not be renumbered, which means that the cookies should still remain valid after a change is made to the underlying directory. This being the case, a cookie error should not be returned when a change is made to the underlying directory and, instead, the NFS client should rely on mtime detection to invalidate and reload the directory.
The use of mtime is problematic in of itself, due to insufficient resolution, which is why I believe the original conservative error handling was done. Still, there have been dozens of bug reports by people needing solaris<->FreeBSD interoperability and these have to be accomodated.
|
51791 |
29-Sep-1999 |
marcel |
sigset_t change (part 2 of 5) -----------------------------
The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements.
The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t.
struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done though function sigprop(). The latter because the table doesn't holds properties for all signals, but only for the first NSIG signals.
signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace polution.
Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types.
NOTE: kdump (and port linux_kdump) must be recompiled.
Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
|
51475 |
20-Sep-1999 |
dillon |
Add comment to clarify a commit rpc optimization already being performed.
|
51344 |
17-Sep-1999 |
dillon |
Asynchronized client-side nfs_commit. NFS commit operations were previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it.
Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection then lastr ever did.
Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc.
Note that we do not commit the meta-data yet. This works still needs to be done.
Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic.
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
|
51138 |
11-Sep-1999 |
alfred |
Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP.
Add fh(open|stat|stafs) syscalls to allow userland to query filesystems based on (network) filehandle.
Obtained from: NetBSD
|
51068 |
07-Sep-1999 |
alfred |
All unimplemented VFS ops now have entries in kern/vfs_default.c that return reasonable defaults.
This avoids confusing and ugly casting to eopnotsupp or making dummy functions. Bogus casting of filesystem sysctls to eopnotsupp() have been removed.
This should make *_vfsops.c more readable and reduce bloat.
Reviewed by: msmith, eivind Approved by: phk Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
|
50521 |
28-Aug-1999 |
phk |
remove unused variables.
|
50477 |
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
50405 |
26-Aug-1999 |
phk |
Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.
Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore.
Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively.
Make the revoke syscalls use vcount() instead of VALIASED.
Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag.
vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one.
Print the devicename in specfs/vprint().
Remove a couple of stale LFS vnode flags.
Remove unimplemented/unused LK_DRAINED;
|
50053 |
19-Aug-1999 |
peter |
Convert all the nfs macros to do { blah } while (0) to ensure it works correctly in if/else etc. egcs had probably picked up most of the problems here before with "ambiguous braces" etc, but this should increase the robustness a bit. Based on an idea from Eivind Eklund.
|
49945 |
17-Aug-1999 |
alc |
Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page.
Use it.
Submitted by: dillon
|
49659 |
12-Aug-1999 |
dt |
nfs_getcacheblk() can return 0 if the mount is interruptible. It need to be checked by the caller.
Broken in: rev. 1.70 (1999/05/02)
|
49535 |
08-Aug-1999 |
phk |
Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>.
Add a few fields to struct specinfo, paving the way for the fun part.
|
49405 |
04-Aug-1999 |
peter |
Don't over-allocate and over-copy shorter NFSv2 filehandles and then correct the pointers afterwards.
It's kinda bogus that we generate a 24 (?) byte filehandle (2 x int32 fsid and 16 byte VFS fhandle) and pad it out to 64 bytes for NFSv3 with garbage. The whole point of NFSv3's variable filehandle length was to allow for shorter handles, both in memory and over the wire. I plan on taking a shot at fixing this shortly.
|
49302 |
31-Jul-1999 |
msmith |
As described by the submitter:
I did some tcpdumping the other day and noticed that GETATTR calls were frequently followed by an ACCESS call to the same file. The attached patch changes nfs_getattr to fill the access cache as a side effect. This is accomplished by calling ACCESS rather than GETATTR. This implies a modest overhead of 4 bytes in the request and 8 bytes in the response compared to doing a vanilla GETATTR. ... [The patch comprises two parts] The first is the "real" patch, the second counts misses and hits rather than fills and hits. The difference is subtle but important because both nfs_getattr and nfs_access now fill the cache. It also changes the default value of nfsaccess_cache_timeout to better match the attribute cache. IMHO, file timestamps change much more frequently than protection bits.
Submitted by: Bjoern Groenvall <bg@sics.se> Reviewed by: dillon (partially)
|
49239 |
30-Jul-1999 |
wpaul |
Close PR #12651: the hash calculation routine has changed in other parts of the kernel but was not updated in nfs_readdirplusrpc().
|
49237 |
30-Jul-1999 |
wpaul |
Fix two bugs in nfs_readdirplus(). The first is that in some cases, vnodes are locked and never unlocked, which leads to processes starting to wedge up after doing a mount -o nfsv3,tcp,rdirplus foo:/fs /fs; ls /fs. The second is that sometimes cnp is accessed without having been properly initialized: cnp->cn_nameptr points to an earlier name while "len" contains the length of a current name of different size. This leads to an attempt to dereference *(cn->cn_nameptr + len) which will sometimes cause a page fault and a panic.
With these two fixes, client side readdirplus works correctly with FreeBSD, IRIX 6.5.4 and Solaris 2.5.1 and 2.6 servers.
Submitted by: Matthew Dillon <dillon@backplane.com>
|
49232 |
29-Jul-1999 |
wpaul |
Correct the sanity test length calculation in nfsrv_readdirplus(): len is being incremented by 4 bytes too few each time through the loop, which allows more data into the mbuf chain that we really want. In the worst case, when we're using 32K read/write sizes with a TCP client, this causes readdirplus replies to sometimes exceed NFS_MAXPACKET which leads to a panic. This problem cropped up for me using an IRIX 6.5.4 NFSv3 TCP client with 32K read/write sizes, however supposedly it can be triggered by WinNT NFS servers too. In theory, it can probably be triggered by any NFS v3 implementation using TCP as long as it's using the maxiumum block size.
Reviewed by: Matthew Dillon <dillon@backplane.com>
|
49158 |
28-Jul-1999 |
alc |
Clear error in nfsrv_create when we have a valid reply so that that reply is actually transmitted. Submitted by: dillon
|
48859 |
17-Jul-1999 |
phk |
I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g
|
48394 |
01-Jul-1999 |
peter |
Fix warning. va_fsid is udev_t, which is int32_t. No need to use %lx.
|
48362 |
30-Jun-1999 |
julian |
Submitted by: "David E. Cross" <crossd@cs.rpi.edu> Matt missed a line..
|
48357 |
30-Jun-1999 |
julian |
Submitted by: Conrad Minshall <conrad@apple.com> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
The following ugly hack to the exit path of nfs_readlinkrpc() circumvents an Auspex bug: for symlinks longer than 112 (0x70) they return a 1024 byte xdr string - the correct data with many nulls appended. Without this fix namei returns ENAMETOOLONG, at least it does on our source base and on FreeBSD 3.0. Note we do not (and should not) rely upon their null padding.
|
48317 |
28-Jun-1999 |
peter |
Fix a KASSERT() that was negated and lead to: nfs_strategy: buffer 0xxxxx not locked when you attempted to write and had INVARIANTS turned on.
|
48274 |
27-Jun-1999 |
peter |
Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly splbio()/splx()) are #included in time.
|
48225 |
26-Jun-1999 |
mckusick |
Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
|
48125 |
23-Jun-1999 |
julian |
Matt's NFS fixes. Submitted by: Matt Dillon Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin 3.2 version to follow when tested
|
48024 |
19-Jun-1999 |
mjacob |
Thanks to Bruce for noticing this.... compare against the *new* nfsnode's mount point for seeing whether or not the new nfsnode is already in the hash queue. We're pretty much guaranteed that the old nfsnode is already in the hash queue. Wank! Infinite Loop! Looks like just a minor typo.... (ah the influence of fortran ... np && np2... why not nfsnode_the_first && nfsnode_the_second???)...
|
47964 |
16-Jun-1999 |
mckusick |
Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.
|
47954 |
16-Jun-1999 |
mjacob |
Use vput instead of vrele. Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Submitted by: Ville-Pertti Keinonen <will@iki.fi> Obtained from: Matthew Dillon <dillon@apollo.backplane.com>
|
47938 |
15-Jun-1999 |
mjacob |
If we retry this operation from the top of this routine, we need to make sure we've freed any allocated resources (to avoid a memory leak) and and do the right thing with respect to the nfs node hash lock we'd acquired.
|
47751 |
05-Jun-1999 |
peter |
Various changes lifted from the OpenBSD cvs tree:
txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.
nfs_serv.c: don't call nfsm_adj() with negative values, windows clients could crash servers when doing a readdir of a large directory.
nfs_socket.c: Use IP_PORTRANGE to get a priviliged port without a spin loop trying to bind(). Don't clobber a mbuf pointer or we get panics on a NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.
nfs_subs.c: Don't loose st_blocks on NFSv2 mounts when > 2GB.
Obtained from: OpenBSD
|
47750 |
05-Jun-1999 |
peter |
Fix a malloc race
Obtained from: OpenBSD (csapuntz)
|
47749 |
05-Jun-1999 |
peter |
Don't mistake a non-async block that needs to be committed for an interrupted write.
Obtained from: fvdl@NetBSD.org via OpenBSD.
|
47028 |
11-May-1999 |
phk |
Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland.
Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev()
For now they're functions, they will become in-line functions after one of the next two steps in this process.
Return major/minor/makedev to macro-hood for userland.
Register a name in cdevsw[] for the "filedescriptor" driver.
In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device.
In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang).
A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that.
Without DEVT_FASCIST I belive this patch is a no-op.
Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result.
Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
|
46580 |
06-May-1999 |
phk |
remove b_proc from struct buf, it's (now) unused.
Reviewed by: dillon, bde
|
46568 |
06-May-1999 |
peter |
Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
|
46370 |
03-May-1999 |
alc |
All directory accesses must be made with NFS_DIRBLKSIZE chunks to avoid confusing the directory read cookie cache. The nfs_access implementation for v2 mounts attempts to read from the directory if root is the user so that root can't access cached files when the server remaps root to some other user.
Submitted by: Doug Rabson <dfr@nlsystems.com> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
|
46349 |
02-May-1999 |
alc |
The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas.
The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly.
There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE.
Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
|
46155 |
28-Apr-1999 |
phk |
This Implements the mumbled about "Jail" feature.
This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do.
For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers".
Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname.
Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors.
It generally does what one would expect, but setting up a jail still takes a little knowledge.
A few notes:
I have no scripts for setting up a jail, don't ask me for them.
The IP number should be an alias on one of the interfaces.
mount a /proc in each jail, it will make ps more useable.
/proc/<pid>/status tells the hostname of the prison for jailed processes.
Quotas are only sensible if you have a mountpoint per prison.
There are no privisions for stopping resource-hogging.
Some "#ifdef INET" and similar may be missing (send patches!)
If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome!
Tools, comments, patches & documentation most welcome.
Have fun...
Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
|
46112 |
27-Apr-1999 |
phk |
Suser() simplification:
1: s/suser/suser_xxx/
2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with later.
There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
|
45996 |
24-Apr-1999 |
dt |
Fixed printf format errors on alpha.
|
45553 |
10-Apr-1999 |
peter |
Close a potential mbuf and/or mbuf cluster leak in the client-side NFS statfs() code. Free the whole chain, not just the first one.
|
45361 |
06-Apr-1999 |
peter |
Hold nfsd's upages in-core with PHOLD rather than P_NOSWAP.
|
45347 |
05-Apr-1999 |
julian |
Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c
Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>
|
44679 |
12-Mar-1999 |
julian |
Reviewed by: Many at differnt times in differnt parts, including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org>
This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep.
The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)
|
44246 |
25-Feb-1999 |
peter |
Untangle the nfs send and receive queue locking a little. One lock routine was [ab]used for two different things, and you couldn't tell from the wait channel which one had wedged. Catch a few things missing from NFS_NOSERVER.
|
44112 |
18-Feb-1999 |
dfr |
Move the declaration of the vfs.nfs sysctl node outside an ifdef so that it builds if NFS_NOSERVER is defined.
Spotted by: Bruce Evans <bde@zeta.org.au>
|
44101 |
17-Feb-1999 |
bde |
Fixed bitrot in NFS_ACDEBUG option.
|
44078 |
16-Feb-1999 |
dfr |
* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
|
43960 |
13-Feb-1999 |
dillon |
General additional cleanup of VOP API for NFS ops - mainly NFS ignoring the API for freeing up cnp's. This cleanup should not effect nominal operation one way or the other since NFS VOPs just happen to be called with flags that match what it actually does to the NAMEI components it gets. Still, if an NFS error occured, there was probably some memory leakage of NAMEI components with certain NFS VOP ops.
|
43956 |
13-Feb-1999 |
dillon |
PR: kern/9970
Remove incorrect vput() in nfs_link()
|
43705 |
06-Feb-1999 |
dillon |
Flush delayed-write data out prior to issuing a rename rpc. This appears to fix the problem w/ NFSV3 whereby a make installworld would get into high-network-bandwidth situations continuously trying to retry nfs writes that fail with a 'stale file handle' error.
|
43351 |
28-Jan-1999 |
dillon |
Fix warnings related to -Wall -Wcast-qual
|
43311 |
28-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
43309 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile.
This commit includes significant work to proper handle const arguments for the DDB symbol routines.
|
43307 |
27-Jan-1999 |
dillon |
Fix nasty bug in nfs_access(). A conditional was if (a = b) instead of if (a == b).
|
43306 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
43305 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
42957 |
21-Jan-1999 |
dillon |
This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
|
42582 |
12-Jan-1999 |
eivind |
Remove two cases of unused variable sp3.
|
42315 |
05-Jan-1999 |
eivind |
Remove the 'waslocked' parameter to vfs_object_create().
|
42155 |
30-Dec-1998 |
hoek |
Silence -Wtrigraph.
Submitted by: Bradley Dunn <bradley@dunn.org> (pr: kern/8817)
|
42060 |
25-Dec-1998 |
dfr |
Fix for creating files on a Solaris 7 server with NFSv3 (the request was slightly garbled but older servers seemed to understand it).
Reviewed by: David O'Brien <obrien@nuxi.ucdavis.edu>
|
41796 |
14-Dec-1998 |
dt |
Added 3 new errno values, requred by various standards: EOVERFLOW, ECANCELED, EILSEQ.
Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping. Reviewed by: Dan Nelson <dnelson@emsphone.com>
|
41791 |
14-Dec-1998 |
dt |
(Hopefully) fix support for "large" files. Mostly cast block numbers to off_t before they multiplied to block sizes.
|
41619 |
09-Dec-1998 |
eivind |
Remove the if fixed in the last commit; bde quite correctly point out that it can never fail.
|
41606 |
08-Dec-1998 |
eivind |
Fix typo (; in "if (vp == NULL);").
|
41591 |
07-Dec-1998 |
archie |
The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
|
41514 |
04-Dec-1998 |
archie |
Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
|
41488 |
03-Dec-1998 |
dillon |
Make bootp error message slightly more verbose
|
41186 |
15-Nov-1998 |
msmith |
Reimplement the NFS ACCESS RPC cache as an "accelerator" rather than a true cache. If the cached result lets us say "yes", then go with that. If we're not sure, or we think the answer might be "no", go to the wire to be certain. This avoids all of the possible false veto cases, and allows us to key the cached value with just the UID for which the cached value holds, reducing the bloat of the nfsnode structure from 104 bytes to just 12 bytes.
Since the "yes" case is by far the most common, this should still provide a substantial performance improvement. Also default the cache to on, with a conservative timeout (2 seconds). This improves performance if NFS is loaded as a KLD module, as there's not (yet) code to parse an option out of the module arguments to set it, and sysctl doesn't work (yet) for OIDs in modules.
The 'accelerator' mode was suggested by Bjoern Groenvall (bg@sics.se)
Feedback on this would be appreciated as testing has been necessarily limited by Comdex, and it would be valuable to have this in 2.2.8.
|
41138 |
13-Nov-1998 |
msmith |
Avoid a null pointer reference if the target of an NFS rename has been sillrenamed, or if the source vnode doesn't have an associated nfsnode.
Bug report from Andrew Gallatin <gallatin@cs.duke.edu>
|
41132 |
13-Nov-1998 |
dfr |
Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to free() sometimes.
Reviewed by: Eric Haug <ejh@eas.slu.edu>
|
41127 |
13-Nov-1998 |
msmith |
Implement NFS ACCESS RPC result caching.
This yields startling performance increases for NFS clients for many access profiles, due to the fact that ACCESS results are persistently cached in the namecache in many cases.
Note that the code is somewhat conservative in that it requires an exact credential match for a cache hit. This bloats the nfsnode structure by sizeof(struct ucred) (96 bytes). Any less conservative approach opens the possibility for a false veto in eg. setuid applications. Alternative suggestions would be welcomed.
The cache is normally disabled, to activate set the sysctl variable vfs.nfs.access_cache_timeout to a nonzero value. This is the time in seconds that a cached entry will be considered valid; useful values appear to be 2-10 seconds. Performance of the cache can be monitored with the vfs.nfs.access_cache_hits and vfs.nfs.access_cache_hits variables.
|
41026 |
09-Nov-1998 |
peter |
Remove [apparently] bogus casts to u_long for the vnode_pager_setsize() second argument. np_size is a 64 bit int, so is the second arg. This might have caused needless 2G/4G file size problems.
I believe it was Bruce who queried this.
|
40792 |
31-Oct-1998 |
peter |
vm_object_page_clean() last arg changed from TRUE to OBJPC_SYNC. I'm not sure that this is necessary to be a sync write here since a VOP_FSYNC() follows and it will schedule, sort and complete the writes that the vm_object_page_clean() started (as I think I understand things).
|
40790 |
31-Oct-1998 |
peter |
Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
|
39794 |
29-Sep-1998 |
mckusick |
In nfs_link(), check for a cross-device mount *before* looking in the v_data field. Obtained from: Charles Hannum, via Frank van der Linden <frank@wins.uva.nl>
|
39793 |
29-Sep-1998 |
mckusick |
Missing vput when cross-device link error is detected in nfs_link.
|
39792 |
29-Sep-1998 |
mckusick |
During truncation, have to notify the VM about the new size of the NFS file *before* doing the nfs_vinvalbuf operation. Otherwise some invalid data may show up in an mmap.
|
39790 |
29-Sep-1998 |
mckusick |
Frank sez: 'It fixes a problem with servers that return 0 values for some of the fsinfo RPC fields. It is strictly speaking not wrong to do this, as the spec says that "it is expected that a server will make a best effort at supporting all the attributes", but pretty unusual. You guessed it, it's NT servers that do it.' Obtained from: Frank van der Linden <frank@wins.uva.nl>
|
39789 |
29-Sep-1998 |
mckusick |
Do not need (or want) to take a reference on an NFS file that is being deleted due to an forcible unmount. The problem is that vgone calls vclean() which then calls calls nfs_inactive() with VXLOCK set on the vnode. Nfs_inactive() was calling vget() to get a reference on the vnode, which in turn hung on VXLOCK. Nfs_inactive() now checks v_usecount to make sure that the vnode is not coming from vclean() before it does a vget().
|
39788 |
29-Sep-1998 |
mckusick |
The code checks each fragment mark to see if it's valid; if the fragment is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it barfs and, I think, drops the connection.
However, there's no guarantee that in a multi-fragment RPC, all the fragments will be at least as large as NFS_MINPACKET.
In fact, with the version of "tclnfs" we have here, which supports NFS over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's user-mode ONC RPC library), I can *repeatably* cause "tclnfs" to send a request with more than one fragment, one of which is only 8 bytes long. I just do a 3877-byte write to a file, at an offset of 0.
The check that "slp->ns_reclen" is greater than or equal to NFS_MINPACKET serves no useful purpose - if the NFS server code can't handle packets < NFS_MINPACKET bytes, it can't handle them over *any* protocol, so the check has to be done above the RPC-over-TCP layer - and should be removed. Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
|
39782 |
29-Sep-1998 |
mckusick |
Mark directory buffers that have no valid data with B_INVAL so that they are not put in the cache.
|
39781 |
29-Sep-1998 |
mckusick |
When adding data to a buffer, we need to clear the B_NEEDCOMMIT flag which says that the data is on server but not committed.
|
38909 |
07-Sep-1998 |
bde |
Removed statically configured mount type numbers (MOUNT_*) and all references to them.
The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number in their vfsconf struct.
|
38894 |
07-Sep-1998 |
bde |
Made unloading of the nfs LKM sort of work. This is mainly to test detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually work for any vfs, since it leaves garbage pointers to memory allocation control structures.
|
38869 |
05-Sep-1998 |
bde |
Ignore the statically configured vfs type numbers and assign vfs type numbers in vfs attach order (modulo incomplete reuse of old numbers after vfs LKMs are unloaded). This requires reinitializing the sysctl tree (or at least the vfs subtree) for vfs's that support sysctls (currently only nfs). sysctl_order() already handled reinitialization reasonably except it checked for annulled self references in the wrong place.
Fixed sysctls for vfs LKMs.
|
38866 |
05-Sep-1998 |
bde |
Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded.
NetBSD uses strcmp() to avoid this ugly global.
|
38799 |
04-Sep-1998 |
dfr |
Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
|
38718 |
01-Sep-1998 |
luoqi |
Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle NULL, buf free() can't.
|
38482 |
23-Aug-1998 |
wollman |
Yow! Completely change the way socket options are handled, eliminating another specialized mbuf type in the process. Also clean up some of the cruft surrounding IPFW, multicast routing, RSVP, and other ill-explored corners.
|
38412 |
18-Aug-1998 |
bde |
Fixed printf format errors.
|
38299 |
13-Aug-1998 |
dfr |
Protect all modifications to v_numoutput with splbio().
|
38289 |
12-Aug-1998 |
bde |
Don't configure compatibility code for pre-Lite2 mount() calls by default. This code should go away soon.
|
37997 |
01-Aug-1998 |
peter |
If we get an ENOBUFS from the network, it's normally transient network interface congestion (eg: nfs over a ppp link, etc). Don't log these for UDP mounts, and don't cause syscalls to fail with EINTR. This stops the 'nfs send error 55' warnings.
If the error is because the system is really hosed, this is the least of your problems...
|
37649 |
15-Jul-1998 |
bde |
Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this changes is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
|
37393 |
05-Jul-1998 |
dfr |
Use u_int32_t in NQFHHASH instead of u_long.
|
37384 |
04-Jul-1998 |
julian |
VOP_STRATEGY grows an (struct vnode *) argument as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>
|
37339 |
02-Jul-1998 |
kato |
Moved `#ifndef NFS_NOSERVER' after including nfs.h.
|
37291 |
30-Jun-1998 |
jmg |
fix buildworld hopefully be3fore anyone complains...
NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion), but in some cases it looks like nfs keeps a copy of the value in a struct
hash sizes are already ifdef'd KERNEL, so there aren't userland inpact from them...
|
37272 |
30-Jun-1998 |
jmg |
convert some nfs tunables to options, these are: NFS_MINATTRTIMO VREG attrib cache timeout in sec NFS_MAXATTRTIMO NFS_MINDIRATTRTIMO VDIR attrib cache timeout in sec NFS_MAXDIRATTRTIMO NFS_GATHERDELAY Default write gather delay (msec) NFS_UIDHASHSIZ Tune the size of nfssvc_sock with this NFS_WDELAYHASHSIZ and with this NFS_MUIDHASHSIZ Tune the size of nfsmount with this NFS_NOSERVER (already documented in LINT) NFS_DEBUG turn on NFS debugging
also, because NFS_ROOT is used by very different files, it has been renamed to opt_nfsroot.h instead of the old opt_nfs.h....
|
37089 |
21-Jun-1998 |
bde |
Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore, code controlled by it did not even compile.)
|
36979 |
14-Jun-1998 |
bde |
Avoid an egcs pessimization for 64-bit signed division on i386's. Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit signed divisions, but egcs optimizes them to a shift and fixup when the divisor is a constant power of 2. Unfortunately, it generates a call to __cmpdi2() for the fixup, although all except possibly ancient versions of gcc and egcs do ordinary 64-bit comparisons inline.
|
36735 |
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us inline with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
36563 |
01-Jun-1998 |
peter |
Make sure we go a nfs_fsinfo() in get/putpages before calling readrpc/writerpc, since they assume it's already been done. This could break if the first read/write access to a nfs filesystem was an exec() or mmap() instead of a read(), write() syscall. (or statfs()). nfs_getpages() could return an errno (EOPNOTSUPP) instead of a VM_PAGER_* return code. Some layout tweaks for the get/putpages code.
|
36562 |
01-Jun-1998 |
peter |
Fix post-test pre-commit cleanup typo.
|
36561 |
01-Jun-1998 |
peter |
readlink() returns EINVAL rather than EPERM if called on a non-symlink.
|
36560 |
01-Jun-1998 |
peter |
Preset the maximum file size before we get to nfs_fsinfo(), based on an (over?) conservative assumption about what the client can store in it's buffer cache using a signed 32-bit 512-byte block number index. Otherwise it's possible for some file access when maxfilesize = 0 (eg: /usr is nfs mounted and doing an execve()) Pointed out by: bde
XXX It might make sense to do a preemptive nfs_fsinfo() call at mount time.
|
36557 |
01-Jun-1998 |
peter |
Hide more kernel stuff from userland. This stops nethostaddr etc being wanted by mount_nfs.c.
|
36541 |
31-May-1998 |
peter |
For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t; int -> int32_t; u_short -> u_int16_t. Also, use mode_t instead of u_short for storing modes (mode_t is a u_int16_t).
Obtained from: NetBSD
|
36540 |
31-May-1998 |
peter |
Support 'mount -u' remounts. This may require disconnecting and rebinding the socket. Certain mode changes are not allowed.
Obtained from: NetBSD
|
36539 |
31-May-1998 |
peter |
Cut-n-paste glitch
|
36538 |
31-May-1998 |
peter |
xdr encode -1 properly.
Obtained from: NetBSD
|
36537 |
31-May-1998 |
peter |
Fully fill in nfsv2 write rpc requests rather than leaving garbage.
Obtained from: NetBSD
|
36536 |
31-May-1998 |
peter |
Don't silently fail to set file flags.
Obtained from: NetBSD
|
36535 |
31-May-1998 |
peter |
Don't blindly accept the server's preferences if they are too small.
Obtained from: NetBSD
|
36534 |
31-May-1998 |
peter |
Prototype support for selectively allowing non-reserved ports on a per export basis. Needs userland support yet.
Obtained from: NetBSD
|
36533 |
31-May-1998 |
peter |
Hide whiteouts from NFS, since the protocol doesn't support them.
Obtained from: NetBSD
|
36532 |
31-May-1998 |
peter |
NetBSD has a comment that Solaris 2.5 doesn't do verifiers correctly, we have weakened this test already for Digital Unix, so it may be enough for Solaris. It needs to be checked again.
Obtained from: NetBSD
|
36531 |
31-May-1998 |
peter |
Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures, it just makes more work. We pass a copy of the uid/gid with the credentials. (although, this may need to be revisited if a non AUTHUNIX authentication method (such as NFSKERB) ever gets implemented).
Obtained from: NetBSD
|
36530 |
31-May-1998 |
peter |
Use the new SB_UPCALL flag,
Obtained from: NetBSD (but I changed the flag clear order in case).
|
36526 |
31-May-1998 |
peter |
NFS_SMALLFH is defined in nfsproto.h, not sys/mount.h
Obtained from: NetBSD
|
36525 |
31-May-1998 |
peter |
Don't let the user try "rmdir ."
Obtained from: NetBSD
|
36524 |
31-May-1998 |
peter |
Don't let the user try and unlink() a directory on a NFS server.
Obtained from: NetBSD
|
36523 |
31-May-1998 |
peter |
When a write rpc returns an error, break the loop.
Obtained from: NetBSD
|
36522 |
31-May-1998 |
peter |
Don't leak an mbuf when a write rpc returns zero bytes written.
Obtained from: NetBSD
|
36521 |
31-May-1998 |
peter |
#ifdef a diagnostic printf
Obtained from: NetBSD
|
36520 |
31-May-1998 |
peter |
Don't try and free mrep twice on some error conditions.
Obtained from: NetBSD
|
36519 |
31-May-1998 |
peter |
#ifdef a diagnostic panic, plus another missed costmetic change.
Obtained from: NetBSD
|
36518 |
31-May-1998 |
peter |
We have gained 2 more errno's, add them to the NFSv2 mapping table.
|
36517 |
31-May-1998 |
peter |
Missed a cosmetic change that the other BSD's have.
|
36516 |
31-May-1998 |
peter |
oops, nfs_msg() is called from client code too.
|
36515 |
31-May-1998 |
peter |
When we can't reconnect a socket, don't forget to unlock before retrying or we can deadlock.
Obtained from: NetBSD
|
36514 |
31-May-1998 |
peter |
Don't log zero length reads, this can happen during normal operation.
Obtained from: NetBSD
|
36513 |
31-May-1998 |
peter |
Consider for readdir chunk sizes when tuning socket buffer reservations.
Obtained from: NetBSD
|
36512 |
31-May-1998 |
peter |
Refuse READDIR / READDIRPLUS rpc's for non-directories
Obtained from: NetBSD
|
36511 |
31-May-1998 |
peter |
Some const's
Obtained from: NetBSD
|
36503 |
31-May-1998 |
peter |
NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim of this part of commits is to minimize unnecessary differences between the other NFS's of similar origin. Yes, there are gratuitous changes here that the style folks won't like, but it makes the catch-up less difficult.
|
36481 |
31-May-1998 |
peter |
VOP_ABORTUP() appears to be called with the wrong vnode. The other callers that I checked (eg: ufs_link()) do the ABORTOP on the directory rather than the file itself. After Michael Hancock's patches, the abortop doesn't seem all that critial now since something else will free the pathname buffer.
|
36473 |
30-May-1998 |
peter |
When using NFSv3, use the remote server's idea of the maximum file size rather than assuming 2^64. It may not like files that big. :-) On the nfs server, calculate and report the max file size as the point that the block numbers in the cache would turn negative. (ie: 1099511627775 bytes (1TB)).
One of the things I'm worried about however, is that directory offsets are really cookies on a NFSv3 server and can be rather large, especially when/if the server generates the opaque directory cookies by using a local filesystem offset in what comes out as the upper 32 bits of the 64 bit cookie. (a server is free to do this, it could save byte swapping depending on the native 64 bit byte order)
Obtained from: NetBSD
|
36329 |
24-May-1998 |
peter |
Convert a couple of large allocations to use zones rather than malloc for better packing. This means that we can choose better values for the various hash entries without having to try and get it all to fit within an artificial power of two limit for malloc's sake.
|
36251 |
20-May-1998 |
peter |
Only ignore "owner" permissions selectively rather than always. In some cases we ignore it (eg: read/write) to maintain chmod-after-open semantics but in other cases we do care, eg: creating files, access() etc. Never ignore errors from VOP_ACCESS() on immutable files.
This apparently comes from BSDI (from Keith Bostic) via NetBSD.
PR: 5148 Submitted by: Yoshiro MIHIRA <sanpei@yy.cs.keio.ac.jp>
|
36249 |
20-May-1998 |
peter |
s/flags/flag/
|
36248 |
20-May-1998 |
peter |
A cleaner fix for PR#5102, clear nonsense flags at mount time rather than in the core of nfs_bio.c at the 11th hour.
PR: 5102
|
36247 |
20-May-1998 |
peter |
Don't change argp->flags after it's been copied.
|
36176 |
19-May-1998 |
peter |
Allow control of the attribute cache timeouts at mount time.
We had run out of bits in the nfs mount flags, I have moved the internal state flags into a seperate variable. These are no longer visible via statfs(), but I don't know of anything that looks at them.
|
36100 |
16-May-1998 |
bde |
Get timespecs directly instead of via timevals.
|
36099 |
16-May-1998 |
bde |
Don't abuse `+' to combine flags.
|
36098 |
16-May-1998 |
bde |
Backed out rev.1.76. It just added style bugs.
|
36097 |
16-May-1998 |
bde |
Get timespecs directly instead of via timevals.
|
36013 |
13-May-1998 |
peter |
Add missing arg to vget().. Serves me right for committing a 2.2 patch to -current without testing it there.. :-(
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35996 |
13-May-1998 |
peter |
Delete the #if 0 (nearly) duplicate definitions of nfsproto.h. Having these two files that are almost-but-not-quite the same leads to false grep hits, confusion etc.
Only installing one copy with a symlink would be nice but that doesn't work with SHARED=symlinks (it changes the source tree).
|
35994 |
13-May-1998 |
peter |
Hold a reference to the vnode during the sillyrename cleanup. If we block in nfs_vinvalbuf() or the nfs_removeit(), we can have the nfsnode reallocated from underneath us (eg: replaced by a ufs 'struct inode') which can cause disk corruption ('freeing free block' when di_db[5] gets trashed). This is not a cheap fix, but it'll do until the nfsnodes get reference counting and/or locking.
Apparently NetBSD have a similar fix (apparently from BSDI).
I wish all PR's had this much useful detail. :-)
PR: 6611 Submitted by: Stephen Clawson <sclawson@marker.cs.utah.edu>
|
35991 |
13-May-1998 |
peter |
Move the *vpp initialization earlier so that it's set in all error cases. This should stop the 'panic: leaf should not be empty' nfs panic.
PR: 1856 Submitted by: msaitoh@spa.is.uec.ac.jp
|
35823 |
07-May-1998 |
msmith |
In the words of the submitter:
--------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. ---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35769 |
06-May-1998 |
msmith |
As described by the submitter:
Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him.
The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35066 |
06-Apr-1998 |
phk |
Use random() to find our initial xid.
|
34961 |
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't a atomic variable, so splfoo() protection were needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime(0.
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeard time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passwd &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
Reviewed by: bde
|
34930 |
28-Mar-1998 |
steve |
Don't allow the readdirplus routine to be used in NFS V2.
PR: 5102 Reviewed by: msmith Submitted by: Dmitry Kohmanyuk <dk@farm.org>
|
34926 |
28-Mar-1998 |
bde |
Don't depend on <sys/mount.h> including <sys/socket.h>.
|
34924 |
28-Mar-1998 |
bde |
Moved some #includes from <sys/param.h> nearer to where they are actually used.
|
34573 |
14-Mar-1998 |
tegge |
Add a BOOTP_WIRED_TO option, for use on machines with multiple network cards where the first detected card should not be used for bootp. Submitted by: Doug Ambrisko <ambrisko@whistle.com>
|
34572 |
14-Mar-1998 |
tegge |
Update workaround for limitations in the arp code. Adjust the RPC timeout message which occured when the old workaround broke to show the correct IP address.
|
34266 |
08-Mar-1998 |
julian |
Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
|
34206 |
07-Mar-1998 |
dyson |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
34096 |
06-Mar-1998 |
msmith |
Trivial filesystem getpages/putpages implementations, set the second. These should be considered the first steps in a work-in-progress. Submitted by: Terry Lambert <terry@freebsd.org>
|
33964 |
01-Mar-1998 |
msmith |
The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE:
vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink
This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called.
A default vfs_vrele was created for fs implementations that use the standard vnode management routines.
VFS_VRELE implementations were made for the following file systems:
Standard (vfs_vrele) ffs mfs nfs msdosfs devfs ext2fs
Custom union umapfs
Just EOPNOTSUPP fdesc procfs kernfs portal cd9660
These implementations may change as VOP changes are implemented.
In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code.
This will only be done for vnode arguments that are released by the various fs vop implementations.
Wider use of VFS_VRELE will likely require restructuring of the code.
Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
33181 |
09-Feb-1998 |
eivind |
Staticize.
|
33134 |
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
33121 |
05-Feb-1998 |
dyson |
Fix an omission of a line from the previous commit to this file. The problem appeared to be an NFS hang.
|
33108 |
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
33058 |
03-Feb-1998 |
bde |
Added #include of <sys/queue.h> so that this file is more "self"-sufficent.
|
33054 |
03-Feb-1998 |
bde |
Forward declare some structs so that this file is more self-sufficient.
|
32998 |
01-Feb-1998 |
bde |
Moved declaration of `union nethostadr' outside of the KERNEL section, to give pollution compatible with <nfs/nqfs.h>. At least mount_nfs.c previously had to #define KERNEL before including <nfs/nfs.h> to get this pollution, but this gave other pollution.
Moved comment about NFSINT_SIGMASK to immediately before the code that it applies to.
|
32997 |
01-Feb-1998 |
bde |
Forward declare more structs that are used in prototypes here - don't depend on <sys/types.h> forward declaring common ones.
Added an underscore to `sin' in prototypes to avoid warnings for the conflict with the ANSI sin().
|
32937 |
31-Jan-1998 |
dyson |
Change the busy page mgmt, so that when pages are freed, they MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes."
Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.
|
32912 |
31-Jan-1998 |
tegge |
Release the buffer when an error occurs while reading directory entries.
|
32755 |
25-Jan-1998 |
dyson |
Various NFS fixes: Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances.
(Thanks to Bill Paul for help.)
|
32754 |
25-Jan-1998 |
dyson |
Various NFS fixes: Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances.
(Thanks for help from Bill Paul.)
|
32609 |
18-Jan-1998 |
tegge |
Increase the minimum bootp reply packet size from 16 (bogus) to 300 (correct).
|
32358 |
09-Jan-1998 |
eivind |
Make the BOOTP family new-style options (in opt_bootp.h)
|
32350 |
08-Jan-1998 |
eivind |
Make INET a proper option.
This will not make any of object files that LINT create change; there might be differences with INET disabled, but hardly anything compiled before without INET anyway. Now the 'obvious' things will give a proper error if compiled without inet - ipx_ip, ipfw, tcp_debug. The only thing that _should_ work (but can't be made to compile reasonably easily) is sppp :-(
This commit move struct arpcom from <netinet/if_ether.h> to <net/if_arp.h>.
|
32286 |
06-Jan-1998 |
dyson |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
|
32071 |
29-Dec-1997 |
dyson |
Lots of improvements, including restructring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include:
1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync.
Be gentle, and please give me feedback asap.
|
32011 |
27-Dec-1997 |
bde |
Unspammed nested include of <vm/vm_zone.h>.
|
31886 |
20-Dec-1997 |
bde |
Added a used include.
Fixed a gratuitous ANSIism and nearby KNF violations.
|
31617 |
08-Dec-1997 |
dyson |
Various of the ISP users have commented that the 1.41 version of the nfs_bio.c code worked better than the 1.44. This commit reverts the important parts of 1.44 to 1.41, and we will fix it when we can get a handle on the problem.
|
31391 |
24-Nov-1997 |
bde |
Don't call malloc(..., M_WAITOK) at splnet(). Doing so is often a mistake (since softnet interrupts may occur if malloc() waits), and doing it harmlessly but unnecessarily here interfered with detection of the mistaken cases.
|
31132 |
12-Nov-1997 |
julian |
Reviewed by: various.
Ever since I first say the way the mount flags were used I've hated the fact that modes, and events, internal and exported, and short-term and long term flags are all thrown together. Finally it's annoyed me enough.. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. it is not exported to userspace. I have moved some of the non exported flags over to this word. this means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other.. e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.
|
31017 |
07-Nov-1997 |
phk |
Rename some local variables to avoid shadowing other local variables.
Found by: -Wshadow
|
31016 |
07-Nov-1997 |
phk |
Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by: -Wunused
|
30994 |
06-Nov-1997 |
phk |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead.
This fixes a boatload of compiler warning, and removes a lot of cruft from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland struct proc frobbing programs will need recompiled.
|
30813 |
28-Oct-1997 |
bde |
Removed unused #includes.
|
30808 |
28-Oct-1997 |
bde |
Don't #include <nfs/nfs.h> in <nfs/nfs_node.h> if KERNEL is defined. Fixed everything that depended on the nested include.
|
30780 |
27-Oct-1997 |
bde |
Removed unused #includes. The need for most of them went away with recent changes (docluster* and vfs improvements).
|
30743 |
26-Oct-1997 |
phk |
VFS interior redecoration.
Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.
|
30738 |
26-Oct-1997 |
phk |
Always initialize the syscall vectors for our "private" syscalls (not just in the LKM case). Plug nqnfs_vop_lease_check directly into the default_vnodeop_p table.
|
30496 |
16-Oct-1997 |
phk |
VFS clean up "hekto commit"
1. Add defaults for more VOPs VOP_LOCK vop_nolock VOP_ISLOCKED vop_noislocked VOP_UNLOCK vop_nounlock and remove direct reference in filesystems.
2. Rename the nfsv2 vnop tables to improve sorting order.
|
30492 |
16-Oct-1997 |
phk |
Another VFS cleanup "kilo commit"
1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} intereface function, and now lives in the ufsmount structure.
2. Remove VOP_SEEK, it was unused.
3. Add mode default vops:
VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp
And remove identical functionality from filesystems
4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?)
5. Try to make system wide VOP functions have vop_* names.
6. Initialize the um_* vectors in LFS.
(Recompile your LKMS!!!)
|
30474 |
16-Oct-1997 |
phk |
VFS mega cleanup commit (x/N)
1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here.
2. Change VOP_BLKATOFF to a normal function in cd9660.
3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead.
4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done.
5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
|
30439 |
15-Oct-1997 |
phk |
vnops megacommit
1. Use the default function to access all the specfs operations. 2. Use the default function to access all the fifofs operations. 3. Use the default function to access all the ufs operations. 4. Fix VCALL usage in vfs_cache.c 5. Use VOCALL to access specfs functions in devfs_vnops.c 6. Staticize most of the spec and fifofs vnops functions. 7. Make UFS panic if it lacks bits of the underlying storage handling.
|
30434 |
15-Oct-1997 |
phk |
Hmm, realign the vnops into two columns.
|
30431 |
15-Oct-1997 |
phk |
Stylistic overhaul of vnops tables. 1. Remove comment stating the blatantly obvious. 2. Align in two columns. 3. Sort all but the default element alphabetically. 4. Remove XXX comments pointing out entries not needed.
|
30430 |
15-Oct-1997 |
phk |
When the default vnops funtion is vn_default_error(), there is no reason to implement small functions that just return EOPNOTSUPP for things we don't do.
The removed functions only apply to UFS based filesystems anyway.
|
30354 |
12-Oct-1997 |
phk |
Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them.
A couple of finer points by: bde
|
30309 |
11-Oct-1997 |
phk |
Distribute and statizice a lot of the malloc M_* types.
Substantial input from: bde
|
30118 |
05-Oct-1997 |
phk |
Reverse rev 1.56 and rev 1.59. These made NFS too flakey.
|
29653 |
21-Sep-1997 |
dyson |
Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the useage of malloc by half. The Zone allocator will be upgradeable to be able to use per CPU-pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats gathering capabilities, while making most calls inline.
|
29363 |
14-Sep-1997 |
peter |
select -> poll flag missing vnode op table entries
|
29293 |
10-Sep-1997 |
phk |
Don't repeat checks done at general level.
|
29291 |
10-Sep-1997 |
phk |
Remove a couple of stubborn NetBSD #if's.
|
29288 |
10-Sep-1997 |
phk |
unifdef -U__NetBSD__ -D__FreeBSD__
|
29202 |
07-Sep-1997 |
bde |
Removed more vestiges of config-time swap configuration.
|
29024 |
02-Sep-1997 |
bde |
Added used #include - don't depend on <sys/mbuf.h> including <sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
|
28787 |
26-Aug-1997 |
phk |
Uncut&paste cache_lookup().
This unifies several times in theory indentical 50 lines of code.
The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss.
It's still the task of the individual filesystems to populate the namecache with cache_enter().
Filesystems that do not use the namecache will just provide the vop_lookup method as usual.
|
28270 |
16-Aug-1997 |
wollman |
Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
|
27845 |
02-Aug-1997 |
bde |
Removed unused #includes.
|
27609 |
22-Jul-1997 |
dfr |
Correct some dumb mistakes in the WebNFS stuff.
Submitted by: bde
|
27608 |
22-Jul-1997 |
dfr |
Allow NULL cookie verifiers for non-NULL offsets. This is needed for Digital Unix boxes since they appear to always send null verifiers.
|
27446 |
16-Jul-1997 |
dfr |
Merge WebNFS changes from NetBSD.
Obtained from: NetBSD
|
26995 |
27-Jun-1997 |
wpaul |
Fix a condition where nfs_statfs() can precipitate a panic. There is code that says this:
nfsm_request(vp, NFSPROC_FSSTAT, p, cred); if (v3) nfsm_postop_attr(vp, retattr); if (!error) nfsm_dissect(sfp, struct nfs_statfs *, NFSX_STATFS(v3));
The problem here is that if error != 0, nfsm_dissect() will not be called, which leaves sfp == NULL. But nfs_statfs() does not bail out at this point: it continues processing until it tries to dereference sfp, which causes a panic. I was able to generate this crash under the following conditions:
1) Set up a machine as an NFS server and NFS client, with amd running (using NIS maps). /usr/local is exported, though any exported fs can can be used to trigger the bug. 2) Log in as normal user, with home directory mounted from a SunOS 4.1.3 NFS server via amd (along with a few other NFS filesystems from same machine). 3) Su to root and type the following: # mount localhost:/usr/local /mnt # df
To fix the panic, I changed the code to read:
if (!error) { nfsm_dissect(sfp, struct nfs_statfs *, NFSX_STATFS(v3)); } else goto nfsmout;
This is a bit kludgy in that nfsmout is a label defined by the nfsm_subs.h macros, but these macros are themselves more than a little kludgy. This stops the machine from crashing, but does not fix the overall bug: 'error' somehow becomes 5 (EIO) when a statfs() is performed on the locally mounted NFS filesystem. This seems to only happen the first time the filesystem is accesed: on subsequent accesses, it seems to work fine again.
Now, I know there's no practical use in mounting a local filesystem via NFS, but doing it shouldn't cause the system to melt down.
|
26952 |
25-Jun-1997 |
tegge |
Clear nfs_iodwant[myiod] when the nfsiod process exits due to a signal.
|
26929 |
25-Jun-1997 |
dfr |
Avoid small synchronous writes when an application does lots of random-access short writes within a block (e.g. ld).
|
26928 |
25-Jun-1997 |
dfr |
Make nfs_lookup return a NULLVP on error so that DIAGNOSTIC kernels don't panic.
|
26669 |
16-Jun-1997 |
dyson |
Upgrade NFS to support the new vfs_bio resource/buffer management.
|
26637 |
14-Jun-1997 |
bde |
Don't require superuser privileges for creating fifos. The v2 case was broken when support for v3 was introduced in rev.1.16. The v3 case has always been broken in FreeBSD.
Should be in 2.2.
PR: 3838
|
26581 |
12-Jun-1997 |
tegge |
Move commonly used code into static functions in order to reduce kernel bloat.
|
26580 |
12-Jun-1997 |
tegge |
Remove unused routines.
|
26469 |
06-Jun-1997 |
dfr |
Fix a problem caused by removing large numbers of files from a directory which could cause a bad size to be given to uiomove, causing a page fault.
|
26420 |
03-Jun-1997 |
dfr |
Various fixes from NetBSD:
Use u_int for rpc procedure numbers. Some fixes to NQNFS. A rare NULL pointer dereference. Ignore NFSMNT_NOCONN for TCP mounts.
Obtained from: NetBSD
|
26418 |
03-Jun-1997 |
dfr |
Implement the async mount option for NFSv3. This makes NFS pretend that all writes sent to the server were synchronous and therefore no commits are needed. This is the same as the vfs.nfs.async variable on the server but allows each client to choose whether to work this way.
Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e. pretend that the write was synchronous.
|
26410 |
03-Jun-1997 |
dfr |
Fix a problem with nfs_flush where if many B_NEEDCOMMIT buffers are attached to the vnode, some of them could be re-written synchronously (if they overflowed the fixed size array nfs_flush had for them). The fix involves mallocing an array if there are more than its limited size stack buffer.
Reviewed by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
|
26409 |
03-Jun-1997 |
dfr |
Fix some performance problems with the NFS mmap fixes.
|
25952 |
20-May-1997 |
dfr |
Plug a memory leak in nfs_link.
PR: kern/1001
|
25930 |
19-May-1997 |
dfr |
Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly) Reviewed by: dyson
|
25877 |
17-May-1997 |
phk |
Remove redundant check for vp == dvp (done in VFS before calling).
|
25804 |
14-May-1997 |
tegge |
Use same syntax as netboot for root and swap mounts. Handle mount options. Ignore T16 (swap server address) and T6 (DNS server).
|
25785 |
13-May-1997 |
dfr |
Check the B_CLUSTER flag when choosing whether to use unstable or filesync writes.
PR: kern/3438 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
25781 |
13-May-1997 |
dfr |
Don't keep addresses in mbuf chains. This should simplify the next round of network changes from Garret.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
25755 |
12-May-1997 |
tegge |
Use the old nfs arguments in the nfs_diskless structure, to be compatible with boot proms made from the 2.2 source. Convert the nfs arguments when copying to the new diskless structure. Copy the gateway field in the diskless structure.
|
25723 |
11-May-1997 |
tegge |
Bring in some kernel bootp support. This removes the need for netboot to fill in the nfs_diskless structure, at the cost of some kernel bloat. The advantage is that this code works on a wider range of network adapters than netboot. Several new kernel options are documented in LINT. Obtained from: parts of the code comes from NetBSD.
|
25664 |
10-May-1997 |
dfr |
Implement a separate control for write gathering on NFSv3. This is turned off for NFSv3 by default since write gathering seems to reduce performance for NFSv3 by up to 60%.
Add sysctl knobs to control both variables.
|
25663 |
10-May-1997 |
dfr |
Fix a nasty hang connected with write gathering. Also add debug print statements to bits of the server which helped me find the hang.
|
25611 |
09-May-1997 |
dfr |
Prevent a mapped root which appears on the server as e.g. nobody from accessing files which it shouldn't be able to. This required a better approximation of VOP_ACCESS for NFSv2 (NFSv3 already has an ACCESS rpc which is a better solution) and adding a call to VOP_ACCESS from VOP_LOOKUP.
PR: kern/876, kern/2635 Submitted by: David Malone <dwmalone@maths.tcd.ie> (for kern/2635)
|
25610 |
09-May-1997 |
dfr |
Fix memory leak caused by the fact that the directory offset cookies and the sillyrename information are stored in the same place.
|
25459 |
04-May-1997 |
phk |
Now I can even execute "df" on my diskless :-)
|
25453 |
04-May-1997 |
phk |
1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping.
2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries.
3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates.
4. Never reuse namecache enties, malloc new ones when we need it, free old ones when they die. No longer a hard limit on how many we can have.
5. Remove the upper limit on namelength of namecache entries.
6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!)
7. Assign v_id correctly in the face of 32bit rollover.
8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either.
9. Use the vnode freelist as a true LRU list, also for namecache accesses.
10. Reuse vnodes more aggresively but also more selectively, if we can't reuse, malloc a new one. There is no longer a hard limit on their number, they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if still has pages in core or if it is the source of namecache entries (Yes, this does indeed work :-) "." and ".." are not namecache entries any longer...)
11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more.
12. NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now.
Bugs: The namecache statistics no longer includes the hits for ".." and "." hits.
Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be selftuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone.
Future work: Straighten out the namecache statistics.
"desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems.
I have still to find a way to safely free unused vnodes back so their number can shrink when not needed.
There is a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time.
Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.
|
25416 |
03-May-1997 |
phk |
Make nfs roots (diskless) functional again. It may still not be correct, but it is functional.
|
25307 |
30-Apr-1997 |
dfr |
Allow NULL rpcs on non-privileged ports at all times to work around broken clients.
PR: kern/3298 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
25201 |
27-Apr-1997 |
wollman |
The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in.
2) Certain protocol entry points are modified to take a process structure, so they they can easily tell whether or not it is possible to sleep, and also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
|
25089 |
22-Apr-1997 |
dfr |
Fix broken usage of nm_readdirsize and increase the socket buffers for UDP to prevent possible socket overflows.
2.2 candidate.
PR: kern/3304 Reviewed by: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
|
25023 |
19-Apr-1997 |
dfr |
Fix a bug where a program which appended many small records to a file could wind up writing zeros instead of real data when the file is on an NFSv2 mounted directory.
While tracking this bug down, I noticed that nfs_asyncio was waking *all* the iods when a block was written instead of just one per block. Fixing this gives a 25% performance improvment for writes on v2 (less for v3).
Both are 2.2 candidates.
PR: kern/2774
|
25003 |
18-Apr-1997 |
dfr |
Don't allow partial buffers to be cluster-comitted. Zero the b_dirty{off,end} after cluster-comitting a group of buffers.
With these fixes, I was able to complete a 'make world' with remote src and obj directories.
|
24626 |
04-Apr-1997 |
dfr |
Fix various bugs in the locking protocol, allowing proper shared locks to be used. This should fix the lock panics that people are seeing.
|
24577 |
03-Apr-1997 |
dfr |
The code which recovered from a modified directory situation did not check for eof when re-caching the directory. This could cause it to loop forever if a directory was truncated.
|
24381 |
29-Mar-1997 |
bde |
Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs features, and the one thing that it depended on (DIRBLKSIZ) now has conflicting spelling.
|
24378 |
29-Mar-1997 |
bde |
Define our own version of DIRBLKSIZ instead of (ab)using ufs's value. Use the same value of 512 (ufs actually uses DEV_BSIZE). There are too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for nfs, one for ibcs2, one for linux, one for applications, ... I think nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes of all supported file systems. There is also NFS_DIRBLKSIZ, which is different from nfs's DIRBLKSIZ but is sometimes confused with it in comments.
Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs. This came in undocumented with the Lite2 merge although it isn't in Lite2. It required more-bogus #define KERNEL's in fstat and pstat to make the constants visible.
Restored a spelling fix from rev.1.17.
Removed duplicate #defines of all the the NFS mount option flags.
|
24330 |
27-Mar-1997 |
guido |
Add code that will reject nfs requests in teh kernel from nonprivileged ports. This option will be automatically set/cleraed when mount is run without/with the -n option. Reviewed by: Doug Rabson
|
24250 |
25-Mar-1997 |
peter |
Use the correct (relative to the implementation) ordering of args in the VOP_LINK() calls, Closes PR#3064
Submitted by: bde
|
24249 |
25-Mar-1997 |
peter |
The local fs interface does not allow link()/unlink() of directories, do not allow a remote nfs client to cause local fs corruption either.
|
24204 |
24-Mar-1997 |
bde |
Don't include <sys/ioctl.h> in the kernel. Stage 2: include <sys/sockio.h> instead of <sys/ioctl.h> in network files.
|
24101 |
22-Mar-1997 |
bde |
Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
|
23570 |
09-Mar-1997 |
bde |
YAMInTheWrongDirectionF22 (part of rev.1.28.2.3: set B_CLUSTEROK for commits).
|
23218 |
28-Feb-1997 |
bde |
Fixed a panic in nfs_writevp(). Lite2 provided a fix for a silly missing-parentheses bug, but this exposed a misplaced vfs_busy_pages(). This bug cost a factor of 2.5-3 in nfsv3 write performance! It should be fixed in 2.2.
Removed some debugging code that gets triggered often in normal operation. There are still many backwards diagnostics (#define DIAGNOSTIC gives no diagnostics).
Submitted by: vfs_busy_pages() fix by dfr
|
22975 |
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
22928 |
19-Feb-1997 |
kato |
Moved nqnfs_vop_lease_check() inside #ifndef NFS_NOSERVER. So, NFS_NOSERVER kernel can be compiled again.
|
22871 |
18-Feb-1997 |
bde |
Changed `#ifdef COMPAT_PRELITE2' to `#ifndef NO_COMPAT_PRELITE2' so that old nfs mount calls are supported by default.
|
22521 |
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
21673 |
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
21124 |
31-Dec-1996 |
wpaul |
Fix (properly, I hope) 'panic: sillyrename dir' crash that can happen if you do:
% cd /nfsdir % mkdir -p foo/foo % mv foo/foo .
nfs_sillyrename() self-destructs if you try to sillyrename a directory, however nfs_rename() can be coerced into doing just that by the above sequence of commands. To avoid this, nfs_rename() now checks that v_type of the 'destination' vnode != VDIR before attempting the sillyrename. The server correctly handles this particular situation by returning ENOTEMPTY on the rename() attempt.
I asked if this was the correct fix for this on -hackers but nobody ever answered.
This is a 2.2 candidate.
|
20407 |
13-Dec-1996 |
wollman |
Convert the interface address and IP interface address structures to TAILQs. Fix places which referenced these for no good reason that I can see (the references remain, but were fixed to compile again; they are still questionable).
|
19449 |
06-Nov-1996 |
dfr |
Improve the queuing algorithms used by NFS' asynchronous i/o. The existing mechanism uses a global queue for some buffers and the vp->b_dirtyblkhd queue for others. This turns sequential writes into randomly ordered writes to the server, affecting both read and write performance. The existing mechanism also copes badly with hung servers, tending to block accesses to other servers when all the iods are waiting for a hung server.
The new mechanism uses a queue for each mount point. All asynchronous i/o goes through this queue which preserves the ordering of requests. A simple mechanism ensures that the iods are shared out fairly between active mount points. This removes the sysctl variable vfs.nfs.dwrite since the new queueing mechanism removes the old delayed write code completely.
This should go into the 2.2 branch.
|
19070 |
21-Oct-1996 |
dfr |
If a large (>4096 bytes) directory was modified, the old directory contents are discarded, including the cached seek cookies. Unfortunately, if the directory was larger than NFS_DIRBLKSIZ, then this confused nfs_readdirrpc(), making it appear as if the directory was truncated.
Reviewed by: Karl Denninger <karl@Mcs.Net>
|
19058 |
20-Oct-1996 |
phk |
Add four sysctl variables that joerg wanted.
|
18888 |
12-Oct-1996 |
bde |
Staticized `nfs_dwrite'.
|
18866 |
11-Oct-1996 |
dfr |
This fixes a problem with the nfs socket handling code which happens if a single process is performing a large number of requests (in this case writing a large file). The writing process could monopolise the recieve lock and prevent any other processes from recieving their replies.
It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the behaviour which originally pointed out the problem. When a process writes to a file over NFS, it usually arranges for another process (the 'iod') to perform the request. If no iods are available, then it turns the write into a 'delayed write' which is later picked up by the next iod to do a write request for that file. This can cause that particular iod to do a disproportionate number of requests from a single process which can harm performance on some NFS servers. The alternative is to perform the write synchronously in the context of the original writing process if no iod is avaiable for asynchronous writing.
The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and the non-delayed behaviour is selected when vfs.nfs.dwrite=0. The default is vfs.nfs.dwrite=1; if many people tell me that performance is better if vfs.nfs.dwrite=0 then I will change the default.
Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
|
18397 |
19-Sep-1996 |
nate |
In sys/time.h, struct timespec is defined as:
/* * Structure defined by POSIX.4 to be like a timeval. */ struct timespec { time_t ts_sec; /* seconds */ long ts_nsec; /* and nanoseconds */ };
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>
|
18041 |
05-Sep-1996 |
dg |
Release an unneeded reference to a vnode that was gained in a VFS_VGET(). Fixes a readdirplus panic.
Submitted by: Doug Rabson <dfr@render.com>
|
18020 |
03-Sep-1996 |
bde |
Eliminated nested include of <sys/unistd.h> in <sys/file.h> in the kernel. Include it directly in the few places where it is used.
Reduced some #includes of <sys/file.h> to #includes of <sys/fcntl.h> or nothing.
|
17761 |
21-Aug-1996 |
dyson |
Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistant and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc.
Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility.
Reviewed by: davidg
|
17186 |
16-Jul-1996 |
dfr |
Various fixes from frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca:
1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is correctly send to the server. If a buffer was dirtied when it was in the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left unchanged and when the buffer was later cleaned, just a commit rpc was made to the server to complete the previous write. Clearing B_NEEDCOMMIT ensures that another write is made to the server.
2. If a server returned a server (for whatever reason) returned an answer to a write RPC that implied that fewer bytes than requested were written, bad things would happen.
3. The setattr operation passed on the atime in stead of the mtime to the server. The fix is trivial.
4. XIDs always started at 0, but this caused some servers (older DEC OSF/1 3.0 so I've been told) who had very long-lasting XID caches to get confused if, after a reboot of a BSD client, RPCs came in with a XID that had in the past been used before from that client. Patch is to use the current time in seconds as a starting point for XIDs. The patch below is not perfect, because it requires the root fs to be mounted first. This is because of the check BSD systems do, comparing FS time to system time.
Reviewed by: Bruce Evans, Terry Lambert. Obtained from: frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca
|
17096 |
11-Jul-1996 |
wollman |
Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant t -current and doesn't look at all like my current code.
|
16634 |
23-Jun-1996 |
bde |
Don't truncate minor or major numbers in the nfsv3 client.
|
16365 |
14-Jun-1996 |
phk |
Fix for NFS_NOSERVER
Poul mentioned that he thought this was some kind of timing problem, and that started me thinking. After a little poking around, I found that nfs_timer() was completely disabled when NFS_NOSERVER was #defined. But after looking at nfs_timer(), it seemed like it was something required by both the client and server code, and disabling it outright just didn't seem to make any sense. Parts of it relate only to the NFS server side code, so I disabled those, but I re-enabled the rest of the function and made sure that it would be called from nfs_init() (in nfs_subs.c).
With nfs_timer() re-enabled, everything seems to work again. The only other changes I made were to #ifdef away some variable declarations in the NFS_NOSERVER case so that gcc would stop complaining about unused variables.
Reviewed by: phk Submitted by: Bill Paul <wpaul@skynet.ctr.columbia.edu>
|
16312 |
12-Jun-1996 |
dg |
Moved the fsnode MALLOC to before the call to getnewvnode() so that the process won't possibly block before filling in the fsnode pointer (v_data) which might be dereferenced during a sync since the vnode is put on the mnt_vnodelist by getnewvnode.
Pointed out by Matt Day <mday@artisoft.com>
|
16226 |
08-Jun-1996 |
bde |
Fixed a vnode reference leak in nfsrv_rename(). The target inode wasn't released until the file system was unmounted. This bug also affected kern/vfs_syscalls.c but was fixed in rev.1.18 and rev.1.20 there.
Reviewed by: davidg
|
16192 |
08-Jun-1996 |
pst |
Clear flags before using an inactive buffer. This is a kludge, but matches the code in bread().
Reviewed by: bde
|
15543 |
02-May-1996 |
phk |
removed: CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei() ptei() kvtopte() ptetov() ispt() ptetoav() &c &c new: NPDEPG
Major macro cleanup.
|
15480 |
30-Apr-1996 |
bde |
#include <sys/filedesc.h> explicitly instead of depending on it being bogusly included by <sys/socketvar.h>.
|
15479 |
30-Apr-1996 |
bde |
Fixed nfs sysctls. They missed out on the fs -> vfs name changes from Lite2. This broke nfsstat.
|
14093 |
13-Feb-1996 |
wollman |
Kill XNS. While we're at it, fix socreate() to take a process argument. (This was supposed to get committed days ago...)
|
13765 |
30-Jan-1996 |
mpp |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
13625 |
25-Jan-1996 |
bde |
Fixed spelling of s_namlen so that this compiles again.
|
13619 |
24-Jan-1996 |
phk |
Use new printf features rather than local kludges.
|
13612 |
24-Jan-1996 |
mpp |
Add a check to prevent a computation from underflowing and causing a panic due to an attaempt to allocate a buffer for a terabyte or so of data when an attempt is made to create sparse data (e.g. a holey file) more than 1 block past the end of the file.
Note: some other areas of this code need to be looked at, since they might cause problems when the file size exceeds 2GB, due to storing results in ints when the computations are being done with quad sized variables.
Reviewed by: bde
|
13490 |
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
13416 |
13-Jan-1996 |
phk |
Add an option NFS_NOSERVER which saves 100K in the install kernel (or any other kernel that uses it). Use with option NFS.
|
13084 |
28-Dec-1995 |
phk |
Don't print swap server as root server. Submitted by: Mattias.Gronlund@sa.erisoft.se (Mattias Gronlund)
|
12970 |
22-Dec-1995 |
phk |
Move fs.nfs.nfsstats sysctl var back to it's old OID.
|
12911 |
17-Dec-1995 |
phk |
Staticize.
|
12662 |
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
12588 |
03-Dec-1995 |
bde |
Completed function declarations and/or added prototypes and/or moved prototypes to the right place.
|
12457 |
21-Nov-1995 |
bde |
Completed function declarations, added prototypes and removed redundant declarations.
|
12453 |
21-Nov-1995 |
bde |
Completed function declarations and/or added prototypes.
|
12287 |
14-Nov-1995 |
phk |
Get rid of hostnamelen variable.
|
12274 |
14-Nov-1995 |
bde |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls.
Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
12158 |
09-Nov-1995 |
bde |
Introduced a type `vop_t' for vnode operation functions and used it 1138 times (:-() in casts and a few more times in declarations. This change is null for the i386.
The type has to be `typedef int vop_t(void *)' and not `typedef int vop_t()' because `gcc -Wstrict-prototypes' warns about the latter. Since vnode op functions are called with args of different (struct pointer) types, neither of these function types is any use for type checking of the arg, so it would be preferable not to use the complete function type, especially since using the complete type requires adding 1138 casts to avoid compiler warnings and another 40+ casts to reverse the function pointer conversions before calling the functions.
|
12118 |
06-Nov-1995 |
bde |
Replaced bogus macros for dummy devswitch entries by functions. These functions went away:
enosys (hasn't been used for some time) enxio enodev enoioctl (was used only once, actually for a vop)
if_tun.c: Continued cleaning up...
conf.h: Probably fixed the type of d_reset_t. It is hard to tell the correct type because there are no non-dummy device reset functions.
Removed last vestige of ambiguous sleep message strings.
|
11982 |
31-Oct-1995 |
joerg |
Include a prerequisite header (so this is consistent again with the NFSv2 state).
|
11921 |
29-Oct-1995 |
phk |
Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
|
11645 |
22-Oct-1995 |
dg |
Fix order problem: unbusy pages before releasing the buffer.
Submitted by: John Dyson <dyson>
|
11644 |
22-Oct-1995 |
dg |
Moved the filesystem read-only check out of the syscalls and into the filesystem layer, as was done in lite-2. Merged in some other cosmetic changes while I was at it. Rewrote most of msdosfs_access() to be more like ufs_access() and to include the FS read-only check.
Obtained from: partially from 4.4BSD-lite2
|
10551 |
04-Sep-1995 |
dyson |
Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count for VOP_BMAP. Updated affected filesystems...
|
10478 |
30-Aug-1995 |
dfr |
Make nfs diskless work again.
Reviewed by: John Hay <jhay@mikom.csir.co.za>
|
10224 |
24-Aug-1995 |
dg |
Added NFS_ASYNC kernel option. It only has an effect for NFSv2.
|
10223 |
24-Aug-1995 |
dg |
Killed redundant declarations of nfsm_rpchead().
|
10222 |
24-Aug-1995 |
dfr |
Some fixes found using gcc -Wall:
nfsm_rpchead() has been called with the wrong number of args and misplaced args since someone added new args in the middle for nfsv3.
Here's another one that would be important on 64-bit systems. VOP_READDIR takes a `u_int **cookies' arg.
Submitted by: Bruce Evans <bde@zeta.org.au>
|
10219 |
24-Aug-1995 |
dfr |
Add support for amd direct maps.
Reviewed by: Thomas Graichen <graichen@sirius.physik.fu-berlin.de>
|
10027 |
11-Aug-1995 |
dg |
Converted mountlist to a CIRCLEQ.
Partially obtained from: 4.4BSD-Lite2
|
9966 |
06-Aug-1995 |
dg |
Fixed bug where vnode_pager_uncache() wasn't always called when it should be. The result was that the file's space wouldn't be properly freed when it was deleted.
Submitted by: John Dyson
|
9877 |
03-Aug-1995 |
dfr |
Slight changes to locking around VOP_READRIR. Detect in nfsrv_readdirplus when a filesystem soes not support VFS_VGET and return NFSERR_NOTSUPP so that the client will use ordinary readdir instead.
|
9854 |
02-Aug-1995 |
dfr |
Lock the directory vnode before VOP_READDIR in nfsrv_readdirplus
|
9842 |
01-Aug-1995 |
dg |
Removed my special-case hack for VOP_LINK and fixed the problem with the wrong vp's ops vector being used by changing the VOP_LINK's argument order. The special-case hack doesn't go far enough and breaks the generic bypass routine used in some non-leaf filesystems. Pointed out by Kirk McKusick.
|
9759 |
29-Jul-1995 |
bde |
Eliminate sloppy common-style declarations. There should be none left for the LINT configuation.
|
9681 |
24-Jul-1995 |
dfr |
Slightly better fix than previous revision.
Submitted by: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
|
9679 |
24-Jul-1995 |
dfr |
Fix a problem which appeared to truncate a file to the nearest block boundary when it is moved to an NFS filesystem from from another filesystem and /bin/mv failed to set the file ownership during the move.
I believe that this bug is present in STABLE but I have not tested it. The fix would be the same in STABLE even though the code has changed quite considerably in CURRENT.
|
9627 |
22-Jul-1995 |
dg |
Correct my cut-'n-paste job from ffs_vfsops.c and fix up the formatting to be similar.
Submitted by: Bruce Evans
|
9604 |
21-Jul-1995 |
dg |
Implemented an nfs_node hash list lock, similar to what was implemented in ffs_vget(), and for the same reason: to prevent a race condition that results in duplicate vnodes/NFSnodes being allocated.
|
9588 |
20-Jul-1995 |
dg |
vnode_pager_alloc() never returns NULL, so don't check for it.
|
9522 |
13-Jul-1995 |
dfr |
I believe that the following fix to nfs_vnops.c should do the trick w.r.t. the problem "when a file is truncated on the server after being written on a client under NFSv3, the client doesn't see the size drop to zero". (As you noted, the problem is that NMODIFIED wasn't being cleared by nfs_close when it flushed the buffers. After checking through the code, the only place where NMODIFIED was used to test for the possibility of dirty blocks was in nfs_setattr(). The two cases are safe to do when there aren't dirty blocks, so I just took out the tests. Unfortunately, testing for v_dirtyblkhd.lh_first being non-null is not sufficient, since there are times when the code moves blocks to the clean list and then back to the dirty list.)
Submitted by: rick@snowhite.cis.uoguelph.ca
|
9507 |
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has escentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure escentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and non- standard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintain- ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
9456 |
09-Jul-1995 |
dg |
Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
|
9428 |
07-Jul-1995 |
dfr |
Use a consistent blocksize for sizing bufs to avoid panicing the bio system.
|
9365 |
28-Jun-1995 |
dfr |
Use the correct cred for nfs_commit operations.
|
9356 |
28-Jun-1995 |
dg |
1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
|
9354 |
28-Jun-1995 |
dg |
Fixed VOP_LINK argument order botch.
|
9336 |
27-Jun-1995 |
dfr |
Changes to support version 3 of the NFS protocol. The version 2 support has been tested (client+server) against FreeBSD-2.0, IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support is stable AFAIK. The version 3 support has been tested with a loopback mount and minimally against an IRIX 5.3 server. It needs more testing and may have problems. I have patched amd to support the new variable length filehandles although it will still only use version 2 of the protocol.
Before booting a kernel with these changes, nfs clients will need to at least build and install /usr/sbin/mount_nfs. Servers will need to build and install /usr/sbin/mountd.
NFS diskless support is untested.
Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
|
9221 |
14-Jun-1995 |
joerg |
The duplicate information returned in fa_type and fa_mode is an ambiguity in the NFS version 2 protocol.
VREG should be taken literally as a regular file. If a server intents to return some type information differently in the upper bits of the mode field (e.g. for sockets, or FIFOs), NFSv2 mandates fa_type to be VNON. Anyway, we leave the examination of the mode bits even in the VREG case to avoid breakage for bogus servers, but we make sure that there are actually type bits set in the upper part of fa_mode (and failing that, trust the va_type field).
NFSv3 cleared the issue, and requires fa_mode to not contain any type information (while also introduing sockets and FIFOs for fa_type).
The fix has been tested against a variety of NFS servers. It fixes problems with the ``Tropic'' NFS server for Windows, while apparently not breaking anything.
Pointed-out by: scott@zorch.sf-bay.org (Scott Hazen Mueller)
|
9202 |
11-Jun-1995 |
rgrimes |
Merge RELENG_2_0_5 into HEAD
|
8876 |
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
8832 |
29-May-1995 |
dg |
Fixed some serious bugs that resulted in object reference counts not being handled correctly. This would manifest itself as "object deallocated too many times" panics and perhaps other strange inconsistencies on NFS servers.
Reviewed by: me, of course Submitted by: John Dyson
|
8692 |
21-May-1995 |
dg |
Changes to fix the following bugs:
1) Files weren't properly synced on filesystems other than UFS. In some cases, this lead to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping.
Reviewed by: David Greenman Submitted by: John Dyson
|
8504 |
14-May-1995 |
dg |
Changed swap partition handling/allocation so that it doesn't require specific partitions be mentioned in the kernel config file ("swap on foo" is now obsolete).
From Poul-Henning:
The visible effect is this:
As default, unless options "NSWAPDEV=23" is in your config, you will have four swap-devices. You can swapon(2) any block device you feel like, it doesn't have to be in the kernel config.
There is a performance/resource win available by getting the NSWAPDEV right (but only if you have just one swap-device ??), but using that as default would be too restrictive.
The invisible effect is that:
Swap-handling disappears from the $arch part of the kernel. It gets a lot simpler (-145 lines) and cleaner.
Reviewed by: John Dyson, David Greenman Submitted by: Poul-Henning Kamp, with minor changes by me.
|
7969 |
21-Apr-1995 |
dyson |
Slight re-ordering of the creation of a vmio object to fix a condition that can cause NFS I/O failures.
|
7871 |
16-Apr-1995 |
dg |
Various fixes from John Dyson:
1) Rewrote screwy code that uses an incore buffer without making it busy. 2) Use B_CACHE instead of B_DONE in cases where it is appropriate. 3) Minor code optimization.
This *might* fix kern/345 submitted by Heikki Suonsivu.
|
7275 |
23-Mar-1995 |
dg |
Deleted bogus DIAGNOSTIC "nfs_fsync: dirty" message. This can and does happen normally when there is heavy write activity to a file since the vnode isn't locked (NFS plays fast and loose with vnode locks). This change "fixes" PR#267.
|
7160 |
19-Mar-1995 |
dg |
Removed unnecessary call to vnode_pager_uncache(). We automatically clear the VTEXT flag after all mappers have finished with the object.
|
7110 |
17-Mar-1995 |
dg |
Changed some (incorrect) nfsrv_vput()'s back into regular vput()'s. This fixes the last of the known NQNFS problems (until I find more, that is :-)).
|
7095 |
16-Mar-1995 |
wollman |
Add four more filesystem flags:
VFCF_NETWORK (this FS goes over the net) VFCF_READONLY (read-write mounts do not make any sense) VFCF_SYNTHETIC (data in this FS is not real) VFCF_LOOPBACK (this FS aliases something else)
cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is netowkr; procfs, kernfs, and fdesc are synthetic.
|
7090 |
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
6875 |
04-Mar-1995 |
dg |
Removed obsolete vtrace() remnants.
|
6420 |
15-Feb-1995 |
phk |
YF fix.
|
6417 |
15-Feb-1995 |
dg |
Fixed two more bugs related to the merged cache changes.
Submitted by: John Dyson
|
6416 |
15-Feb-1995 |
dg |
Woops, change a nfsrv_vput back into a nfsrv_vrele.
Submitted by: John Dyson
|
6414 |
15-Feb-1995 |
dg |
Fixed three bugs related to the merged cache changes. The bugs likely would make NFS servers flakey - probably the cause of freefall's recent hangs.
Submitted by: John Dyson
|
6361 |
14-Feb-1995 |
phk |
YFfix +int nfsrv_vput __P(( struct vnode * )); +int nfsrv_vrele __P(( struct vnode * )); +int nfsrv_vmio __P(( struct vnode * ));
|
6210 |
06-Feb-1995 |
dg |
Changed order of release of vnode/object to fix a problem where the vnode is freed with an old object still attached (subsequently causing a panic). Fixes NFS server panic "object/pager mismatch".
Submitted by: John Dyson
|
6151 |
03-Feb-1995 |
dg |
Fixed bmap run-length brokeness. Use bmap run-length extension when doing clustered paging.
Submitted by: John Dyson
|
6148 |
03-Feb-1995 |
dg |
Removed a pile of vfs_unbusy_pages()...both unnecessary and wrong - resulted in serious system instability. Changed a B_INVAL to a B_NOCACHE so that buffer data is properly disposed of.
Submitted by: John Dyson, Rick Macklin, and ohki@gssm.otsuka.tsukuba.ac.jp
|
6013 |
29-Jan-1995 |
bde |
Fix longstanding benign type mismatch.
|
5472 |
10-Jan-1995 |
dg |
Fix conversion to/from nfs v2 time in handling microseconds.
Submitted by: rick@snowhite.cis.uoguelph.ca
|
5471 |
10-Jan-1995 |
dg |
Added two missing brelse() calls.
Submitted by: rick@snowhite.cis.uoguelph.ca
|
5455 |
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we dont have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a seperate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
5353 |
03-Jan-1995 |
jkh |
From Gene Stark: If nd->swap_nblks is zero in nfs_mountroot(), then the system comes up without initializing swapdev_vp to an actual vnode pointer. The swap pager assumes a non-NULL value for swapdev_vp.
The fix is to try initializing local swap if no NFS swap space is specified.
|
5013 |
08-Dec-1994 |
phk |
Would you please correct nfs/nfs_vfsops.c so that the ip address of the root filesystem is printed out correctly? It's line 299 in nfs/nfs_vfsops.c.
Reviewed by: phk Submitted by: Luigi Rizzo luigi@iet.unipi.it
|
4067 |
02-Nov-1994 |
wollman |
Forward-declare a few structures to avoid warning messages.
|
3820 |
23-Oct-1994 |
wollman |
Implement fs.nfs MIB variables.
|
3797 |
22-Oct-1994 |
phk |
This is where the action is. I'm still not sure that swap is 100% OK, but it seems to work.
|
3664 |
17-Oct-1994 |
phk |
This is a bunch of changes from NetBSD. There are a couple of bug-fixes. But mostly it is changes to use the list-maintenance macros instead of doing the pointer-gymnastics by hand.
Obtained from: NetBSD
|
3451 |
09-Oct-1994 |
dg |
Got rid of map.h. It's a leftover from the rmap code, and we use rlists. Changed swapmap into swaplist.
|
3396 |
06-Oct-1994 |
dg |
Use tsleep() rather than sleep so that 'ps' is more informative about the wait.
|
3305 |
02-Oct-1994 |
phk |
Prototyping and general gcc-shutting up. Gcc has one warning now which looks bad, I will get to it eventually, unless somebody beats me to it.
|
3167 |
28-Sep-1994 |
dfr |
Make NFS ask the filesystems for directory cookies instead of making them itself.
|
3035 |
23-Sep-1994 |
wollman |
Forgot to commit this change when making NFS loadable.
|
2997 |
22-Sep-1994 |
wollman |
Make NFS loadable.
|
2979 |
22-Sep-1994 |
wollman |
More loadable VFS changes:
- Make a number of filesystems work again when they are statically compiled (blush)
- FIFOs are no longer optional; ``options FIFO'' removed from distributed config files.
|
2946 |
21-Sep-1994 |
wollman |
Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
|
2384 |
29-Aug-1994 |
dg |
"bogus" fixes from 1.1.5 to work around some cache coherency problems.
|
2175 |
21-Aug-1994 |
paul |
More idempotency....... this is fun :-)
|
2152 |
20-Aug-1994 |
dg |
Implemented filesystem clean bit via:
machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful.
cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level.
vfs_conf.c: Removed unused rootfs global.
vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted.
ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted.
ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE().
vfs_syscalls.c: Disallow dismounting root FS via umount syscall.
|
2112 |
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
2059 |
13-Aug-1994 |
dg |
Made the kernel compile cleanly with gcc 2.6.0. Thanks go to Bruce Evans for suggesting a method to detect various versions of gcc.
|
2010 |
10-Aug-1994 |
dg |
Initialize lockf pointer. I missed this when I made NFS use the generic advlock mechanism, and not doing so results in random system crashes.
|
1979 |
09-Aug-1994 |
dg |
Removed some padding bytes from the nfsnode struct to make the structure size a power of 2 again. The system complains otherwise - probably because it wastes space with our malloc scheme otherwise.
|
1960 |
08-Aug-1994 |
dg |
Made lockf advisory locking code generic (rather than ufs specific), and use it in NFS. This is required both for diskless support and for POSIX compliance. Note: the support in NFS is only for the local node.
Submitted by: based on work originally done by Yuval Yurom
|
1937 |
08-Aug-1994 |
dg |
Changed B_AGE policy to work correctly in a world with relatively large buffer caches. The old policy generally ended up caching nothing.
|
1858 |
05-Aug-1994 |
dg |
Converted 'vmunix' to 'kernel'.
|
1828 |
04-Aug-1994 |
dg |
Made NFS attribute cache timeouts kernel config file tunable via NFS_MINATTRTIMO and NFS_MAXATTRTIMO.
|
1817 |
02-Aug-1994 |
dg |
Added $Id$
|
1549 |
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
1542 |
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
1541 |
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|