272461 |
03-Oct-2014 |
gjb |
Copy stable/10@r272459 to releng/10.1 as part of the 10.1-RELEASE process.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
260107 |
30-Dec-2013 |
rmacklem |
MFC: r259084 For software builds, the NFS client does many small synchronous (with FILE_SYNC) writes because non-contiguous byte ranges in the same buffer cache block are being written. This patch adds a new mount option "noncontigwr" which allows the non-contiguous byte ranges to be combined, with the dirty byte range becoming the superset of the bytes that are dirty, if the file has not been file locked. This reduces the number of writes significantly for software builds. The only case where this change might break existing applications is where an application is writing non-overlapping byte ranges within the same buffer cache block of a file from multiple clients concurrently. Since such an application would normally do file locking on the file, avoiding the byte range merge for files that have been file locked should be sufficient for most (maybe all?) cases.
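The merge rule above can be modelled in userland C. This is a simplified sketch, not the nfs_bio.c code. The struct and function names (`dirty_buf`, `merge_dirty`) are hypothetical, and the real client also tracks valid ranges and performs the flush itself.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one buffer cache block's dirty byte range [off, end). */
struct dirty_buf {
	size_t dirtyoff;
	size_t dirtyend;   /* dirtyend == 0 means "nothing dirty yet" */
};

/*
 * Record a write of [off, end) into the block. Returns true if the range was
 * merged, false if the existing dirty range must first be written to the
 * server (the small FILE_SYNC write the commit wants to avoid).
 */
bool merge_dirty(struct dirty_buf *bp, size_t off, size_t end,
    bool noncontigwr, bool file_locked)
{
	if (bp->dirtyend == 0) {
		bp->dirtyoff = off;
		bp->dirtyend = end;
		return true;
	}
	/* Overlapping or adjacent ranges can always be merged. */
	bool contiguous = off <= bp->dirtyend && end >= bp->dirtyoff;
	if (!contiguous && !(noncontigwr && !file_locked))
		return false;
	/* The dirty range becomes the superset of the old and new ranges. */
	if (off < bp->dirtyoff)
		bp->dirtyoff = off;
	if (end > bp->dirtyend)
		bp->dirtyend = end;
	return true;
}
```

The superset can include bytes that were never written locally. That is why the merge is skipped for file-locked files, where another client may own those bytes.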
|
256281 |
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
252673 |
04-Jul-2013 |
rmacklem |
A problem with the old NFS client where large writes to large files would sometimes result in a corrupted file was reported via email. This problem appears to have been caused by r251719 (reverting r251719 fixed the problem). Although I have not been able to reproduce this problem, I suspect it is caused by another thread increasing np->n_size after the mtx_unlock(&np->n_mtx) but before the vnode_pager_setsize() call. Since the np->n_mtx mutex serializes updates to np->n_size, doing the vnode_pager_setsize() with the mutex locked appears to avoid the problem. Unfortunately, vnode_pager_setsize() cannot be called with a mutex held when the new size is smaller. This patch restores semantics close to those before r251719, so that the vnode_pager_setsize() call is delayed until after the mutex is unlocked only when np->n_size is shrinking. Since the file is growing when being written, I believe this will fix the corruption.
Reported by: David G. Lawrence (dg@dglawrence.com) Tested by: David G. Lawrence (pending, to happen soon) Reviewed by: kib MFC after: 1 week
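The ordering rule (grow under the mutex, shrink after releasing it) can be sketched as follows. The mutex is modelled as a boolean so the sketch can record whether it was held during the pager call. `struct node` and `pager_setsize()` are hypothetical stand-ins for the nfsnode and vnode_pager_setsize().

```c
#include <stdbool.h>
#include <stdint.h>

struct node {
	bool     locked;        /* models np->n_mtx being held */
	uint64_t size;          /* models np->n_size */
	uint64_t pager_size;    /* last size passed to the "pager" */
	bool     pager_locked;  /* was the "mutex" held during that call? */
};

/* Stand-in for vnode_pager_setsize(); it only records what happened. */
static void pager_setsize(struct node *np, uint64_t sz)
{
	np->pager_size = sz;
	np->pager_locked = np->locked;
}

void node_setsize(struct node *np, uint64_t newsize)
{
	np->locked = true;                      /* mtx_lock(&np->n_mtx) */
	bool shrinking = newsize < np->size;
	np->size = newsize;
	if (!shrinking)
		pager_setsize(np, newsize);     /* growing: still serialized */
	np->locked = false;                     /* mtx_unlock(&np->n_mtx) */
	if (shrinking)
		pager_setsize(np, newsize);     /* shrinking: no mutex allowed */
}
```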
|
252479 |
01-Jul-2013 |
rmacklem |
A recent version of the oldnfs NFS client in head/current will crash when doing a large write, since m_get2() would return NULL. This patch fixes the problem, since nfsm_uiotombuf() will allocate additional mbufs, as required.
Reported by: sbruno Tested by: sbruno Discussed with: glebius
|
251171 |
31-May-2013 |
jeff |
- Convert the bufobj lock to rwlock. - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG.
Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
|
251089 |
29-May-2013 |
rmacklem |
Add a patch analogous to r248567, r248581, and r251079 to the old NFS client to avoid the panic reported in the PR by doing the vnode_pager_setsize() call after unlocking the mutex.
PR: 177335 MFC after: 2 weeks
|
249630 |
18-Apr-2013 |
rmacklem |
When an NFS unmount occurs, once vflush() writes the last dirty buffer for the last vnode on the mount back to the server, it returns. At that point, the code continues with the unmount, including freeing the NFS-specific part of the mount structure. It is possible that an nfsiod thread will try to check for an empty I/O queue in the NFS-specific part of the mount structure after it has been freed by the unmount. This patch avoids this problem by setting the iodmount entries for the mount back to NULL while holding the mutex in the unmount, and by checking that the appropriate entry is non-NULL after acquiring the mutex in the nfsiod thread.
Reported and tested by: pho Reviewed by: kib MFC after: 2 weeks
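The fix has two sides: the unmount clears the per-nfsiod pointers, and each nfsiod checks its pointer before dereferencing it. The sketch below is single-threaded and omits the mutex that both sides hold in the real code. `NIODS`, `struct nfsmount`, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define NIODS 4                           /* hypothetical nfsiod count */

struct nfsmount { int bufq_len; };        /* hypothetical: only the queue length */

static struct nfsmount *iodmount[NIODS];  /* models the per-nfsiod mount pointers */

/* Unmount side (mutex held in the real code): forget this mount. */
void unmount_clear_iods(struct nfsmount *nmp)
{
	for (int i = 0; i < NIODS; i++)
		if (iodmount[i] == nmp)
			iodmount[i] = NULL;
}

/* nfsiod side (after taking the mutex): only touch the mount if still set. */
bool iod_queue_empty(int iod)
{
	struct nfsmount *nmp = iodmount[iod];
	if (nmp == NULL)
		return true;                  /* mount is gone; nothing to check */
	return nmp->bufq_len == 0;
}
```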
|
249623 |
18-Apr-2013 |
rmacklem |
Both NFS clients can deadlock when using the "rdirplus" mount option. This can occur when an nfsiod thread that already holds a buffer lock attempts to acquire a vnode lock on an entry in the directory (a LOR) when another thread holding the vnode lock is waiting on an nfsiod thread. This patch avoids the deadlock by disabling readahead for this case, so the nfsiod threads never do readdirplus. Since readaheads for directories need the directory offset cookie from the previous read, they cannot normally happen in parallel. As such, testing by jhb@ and myself didn't find any performance degradation when this patch is applied. If there is a case where this results in a significant performance degradation, mounting without the "rdirplus" option can be done to re-enable readahead for directories.
Reported and tested by: jhb Reviewed by: jhb MFC after: 2 weeks
|
248500 |
19-Mar-2013 |
emaste |
Fix remainder calculation when biosize is not a power of 2
In common configurations biosize is a power of two, but is not required to be so. Thanks to markj@ for spotting an additional case beyond my original patch.
Reviewed by: rmacklem@
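The bug class is the mask trick `off & (biosize - 1)`, which equals `off % biosize` only when biosize is a power of two. The sketch below is not the committed expression, which may be written differently (for example as `off - lbn * biosize`). The function names are hypothetical.

```c
#include <stdint.h>

/* Offset within the block; correct only when biosize is a power of two. */
int64_t on_mask(int64_t off, int64_t biosize)
{
	return off & (biosize - 1);
}

/* Offset within the block; correct for any positive biosize. */
int64_t on_mod(int64_t off, int64_t biosize)
{
	return off % biosize;
}
```

With biosize 4096 both forms give 1808 for offset 10000. With biosize 3000 only `on_mod()` gives the correct 1000.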
|
248255 |
13-Mar-2013 |
jhb |
Revert 195703 and 195821 as this special stop handling in NFS is now implemented via VFCF_SBDRY rather than passing PBDRY to individual sleep calls.
|
248207 |
12-Mar-2013 |
glebius |
Functions m_getm2() and m_get2() have different order of arguments, and that can drive someone crazy. While m_get2() is young and not documented yet, change its order of arguments to match m_getm2().
Sorry for churn, but better now than later.
|
248198 |
12-Mar-2013 |
glebius |
- Use m_get2() instead of nfsm_reqhead(). - Use m_get(), m_getcl() instead of historic macros.
Sponsored by: Nginx, Inc.
|
248084 |
09-Mar-2013 |
attilio |
Switch the vm_object mutex to an rwlock. This will enable future optimizations in which the vm_object lock is held in read mode most of the time that the resident pool of pages in the page cache is accessed for reading.
The change is mostly mechanical, but a few notes follow: * The KPI changes as follows: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * Avoiding namespace pollution in vm/vm_pager.h (which already forced consumers to include sys/mutex.h directly to support its inline functions using VM_OBJECT_LOCK()) means that all vm/vm_pager.h consumers must now also include sys/rwlock.h. * ZFS requires a rather convoluted fix to bring FreeBSD rwlocks into the compat layer, because the name clash between the FreeBSD and Solaris versions must be avoided. For this purpose, ZFS redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs.
The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (VirtualBox, for example, comes to mind).
Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho
|
247116 |
21-Feb-2013 |
jhb |
Further refine the handling of stop signals in the NFS client. The changes in r246417 were incomplete as they did not add explicit calls to sigdeferstop() around all the places that previously passed SBDRY to _sleep(). In addition, nfs_getcacheblk() could trigger a write RPC from getblk(), resulting in sigdeferstop() recursing. Rather than manually deferring stop signals in specific places, change the VFS_*() and VOP_*() methods to defer stop signals for filesystems which request this behavior via a new VFCF_SBDRY flag. Note that this has to be a VFC flag rather than an MNTK flag so that it works properly with VFS_MOUNT() when the mount is not yet fully constructed. For now, only the NFS clients set this new flag in VFS_SET().
A few other related changes: - Add an assertion to ensure that TDF_SBDRY doesn't leak to userland. - When a lookup request uses VOP_READLINK() to follow a symlink, mark the request as being on behalf of the thread performing the lookup (cnp_thread) rather than using a NULL thread pointer. This causes NFS to properly handle signals during this VOP on an interruptible mount.
PR: kern/176179 Reported by: Russell Cattelan (sigdeferstop() recursion) Reviewed by: kib MFC after: 1 month
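When deferral is applied in the VFS_*() and VOP_*() wrappers, those wrappers can nest, so the deferral has to tolerate recursion. A common pattern for this is to save the previous state and restore it on exit. The sketch below uses a hypothetical thread-local flag in place of TDF_SBDRY. It is not the kernel's sigdeferstop() interface.

```c
#include <stdbool.h>

/* Hypothetical per-thread flag modelling TDF_SBDRY. */
static _Thread_local bool stop_deferred;

/* Defer stop signals; return the previous state so calls can nest. */
bool defer_stop(void)
{
	bool prev = stop_deferred;
	stop_deferred = true;
	return prev;
}

/* Restore the state saved by the matching defer_stop(). */
void allow_stop(bool prev)
{
	stop_deferred = prev;
}

bool stop_is_deferred(void)
{
	return stop_deferred;
}
```

An inner VOP that calls `defer_stop()` and then `allow_stop()` leaves the outer region's deferral intact.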
|
246417 |
06-Feb-2013 |
jhb |
Rework the handling of stop signals in the NFS client. The changes in 195702, 195703, and 195821 prevented a thread from suspending while holding locks inside of NFS by forcing the thread to fail sleeps with EINTR or ERESTART but defer the thread suspension to the user boundary. However, this had the effect that stopping a process during an NFS request could abort the request and trigger EINTR errors that were visible to userland processes (previously the thread would have suspended and completed the request once it was resumed).
This change instead effectively masks stop signals while in the NFS client. It uses the existing TDF_SBDRY flag to effect this since SIGSTOP cannot be masked directly. Also, instead of setting PBDRY on individual sleeps, the NFS client now sets the TDF_SBDRY flag around each NFS request and stop signals are masked for all sleeps during that region (the previous change missed sleeps in lockmgr locks). The end result is that stop signals sent to threads performing an NFS request are completely ignored until after the NFS request has finished processing and the thread prepares to return to userland. This restores the behavior of stop signals being transparent to userland processes while still preventing threads from suspending while holding NFS locks.
Reviewed by: kib MFC after: 1 month
|
245909 |
25-Jan-2013 |
jhb |
Further cleanups to use of timestamps in NFS: - Use NFSD_MONOSEC (which maps to time_uptime) instead of the seconds portion of wall-time stamps to manage timeouts on events. - Remove unused nd_starttime from the per-request structure in the new NFS server. - Use nanotime() for the modification time on a delegation to get as precise a time as possible. - Use time_second instead of extracting the second from a call to getmicrotime().
Submitted by: bde (3) Reviewed by: bde, rmacklem MFC after: 2 weeks
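The NFSD_MONOSEC change can be illustrated in userland by measuring timeouts against a monotonic clock instead of wall time. Here CLOCK_MONOTONIC plays the role of time_uptime. The helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Seconds since an arbitrary origin; unaffected by settimeofday(). */
time_t monosec(void)
{
	struct timespec ts;
	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec;
}

/* True once `timeout` seconds have elapsed since `start` (both monosec() values). */
bool expired(time_t start, time_t timeout, time_t now)
{
	return now - start >= timeout;
}
```

If a wall-clock value were used for `start`, stepping the system clock backwards could postpone expiry indefinitely. The monotonic value cannot jump that way.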
|
245611 |
18-Jan-2013 |
jhb |
Use vfs_timestamp() to set file timestamps rather than invoking getmicrotime() or getnanotime() directly in NFS.
Reviewed by: rmacklem, bde MFC after: 1 week
|
245508 |
16-Jan-2013 |
jhb |
Use the VA_UTIMES_NULL flag to detect when NULL was passed to utimes() instead of comparing the desired time against the current time as a heuristic.
Reviewed by: rmacklem MFC after: 1 week
|
245476 |
15-Jan-2013 |
jhb |
- More properly handle interrupted NFS requests on an interruptible mount by returning an error of EINTR rather than EACCES. - While here, bring back some (but not all) of the NFS RPC statistics lost when krpc was committed.
Reviewed by: rmacklem MFC after: 1 week
|
244042 |
08-Dec-2012 |
rmacklem |
Move the NFSv4.1 client patches over from projects/nfsv4.1-client to head. I don't think the NFS client behaviour will change unless the new "minorversion=1" mount option is used. It includes basic NFSv4.1 support plus support for pNFS using the Files Layout only. All problems detected during an NFSv4.1 Bakeathon testing event in June 2012 have been resolved in this code and it has been tested against the NFSv4.1 server available to me. Although not reviewed, I believe that kib@ has looked at it.
|
243882 |
05-Dec-2012 |
glebius |
Mechanically substitute flags from historic mbuf allocator with malloc(9) flags within sys.
Exceptions:
- sys/contrib not touched - sys/mbuf.h edited manually
|
243311 |
19-Nov-2012 |
attilio |
The situation described in r16312 has not applied for many years (likely since the VFS received fine-grained locking), but the comment about it in UFS was incorrectly copied into several other filesystems' code.
Remove comments that no longer make sense.
Reviewed by: kib MFC after: 3 days
|
242833 |
09-Nov-2012 |
attilio |
Complete the MPSAFE VFS interface and remove the MNTK_MPSAFE flag. Porters should refer to __FreeBSD_version 1000021 for this change, as it may have happened in the same timeframe.
|
239246 |
14-Aug-2012 |
kib |
Do not leave invalid pages in the object after a short read for network file systems (not only NFS proper). Short reads cause pages other than the requested one, which were not filled by the read response, to stay invalid.
Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success.
Noted and reviewed by: alc MFC after: 1 week
|
239065 |
05-Aug-2012 |
kib |
After the PHYS_TO_VM_PAGE() function was de-inlined, the main reason to pull in vm_param.h was removed. The other big dependency of vm_page.h on vm_param.h is the PA_LOCK* definitions, which are only needed by in-kernel code, because modules use KBI-safe functions to lock the pages.
Stop including vm_param.h in vm_page.h. Include vm_param.h explicitly for the kernel code which needs it.
Suggested and reviewed by: alc MFC after: 2 weeks
|
239040 |
04-Aug-2012 |
kib |
Reduce code duplication and exposure of direct access to struct vm_page oflags by providing the helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other than the requested one, for VOP_GETPAGES().
Reviewed by: alc MFC after: 1 week
|
235332 |
12-May-2012 |
rmacklem |
PR# 165923 reported intermittent write failures for dirty memory mapped pages being written back on an NFS mount. Since any thread can call VOP_PUTPAGES() to write back a dirty page, the credentials of that thread may not have write access to the file on an NFS server. (Often the uid is 0, which may be mapped to "nobody" in the NFS server.) Although there is no completely correct fix for this (NFS servers check access on every write RPC instead of at open/mmap time), this patch avoids the common cases by holding onto a credential that recently opened the file for writing and uses that credential for the write RPCs being done by VOP_PUTPAGES() for both NFS clients.
Tested by: Joel Ray Holveck (joelh at juniper.net) PR: kern/165923 Reviewed by: kib MFC after: 2 weeks
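The idea of caching the credential of the most recent writer can be sketched as below. `struct cred`, the reference counting, and the field names are hypothetical simplifications of the kernel's ucred handling.

```c
#include <stddef.h>

/* Hypothetical credential with a reference count. */
struct cred { int uid; int refs; };

static struct cred *cred_hold(struct cred *cr) { cr->refs++; return cr; }
static void cred_free(struct cred *cr) { cr->refs--; }

/* Hypothetical per-file state: the last credential that opened for writing. */
struct nfsnode { struct cred *writecred; };

/* On an open for writing: remember this credential, dropping any older one. */
void node_open_write(struct nfsnode *np, struct cred *cr)
{
	struct cred *old = np->writecred;
	np->writecred = cred_hold(cr);
	if (old != NULL)
		cred_free(old);
}

/* VOP_PUTPAGES(): prefer the cached writer's credential over the caller's. */
struct cred *putpages_cred(struct nfsnode *np, struct cred *caller)
{
	return np->writecred != NULL ? np->writecred : caller;
}
```

The server still checks access on every write RPC, so this only covers the common case where the last writer's credential is still authorized.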
|
235246 |
10-May-2012 |
mckusick |
Fix mount mutex handling missed in r234386.
|
235052 |
05-May-2012 |
pluknet |
Fix mount mutex handling missed in r234386.
|
234605 |
23-Apr-2012 |
trasz |
Remove unused thread argument from vtruncbuf().
Reviewed by: kib
|
234386 |
17-Apr-2012 |
mckusick |
Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL. The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check whether it is DOOMED or in other possible end-of-life scenarios).
To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head.
The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point).
Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
|
232821 |
11-Mar-2012 |
kib |
Remove fifo.h. The only used function declaration from the header is migrated to sys/vnode.h.
Submitted by: gianni
|
232420 |
03-Mar-2012 |
rmacklem |
Post r230394, the Lookup RPC counts for both NFS clients increased significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels.
Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks
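A name cache entry for a directory "d" found in parent "p" answers two lookups: "d" in p, and ".." in d. Each lookup is validated against the ctime of the vnode it returns, so the entry needs two timestamps. The struct and function names below are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

struct nc_dir_entry {
	time_t ctime;     /* ctime of d when cached: validates lookups of "d" */
	time_t dd_ctime;  /* ctime of p when cached: validates lookups of ".." */
};

/* Lookup of "d" in p returns d, so compare against d's current ctime. */
bool hit_valid_name(const struct nc_dir_entry *ne, time_t d_ctime_now)
{
	return ne->ctime == d_ctime_now;
}

/* Lookup of ".." in d returns p, so compare against p's current ctime. */
bool hit_valid_dotdot(const struct nc_dir_entry *ne, time_t p_ctime_now)
{
	return ne->dd_ctime == p_ctime_now;
}
```

Without the second field, a ".." hit would be checked against d's ctime and would miss whenever the two ctimes differ, which matches the extra Lookup RPCs described above.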
|
232327 |
01-Mar-2012 |
rmacklem |
Fix the NFS clients so that they use copyin() instead of bcopy(), when doing direct I/O. This direct I/O code is not enabled by default.
Submitted by: kib (earlier version) Reviewed by: kib MFC after: 1 week
|
232116 |
24-Feb-2012 |
jhb |
Adjust the nfs_skip_wcc_data_onerr setting so that it does not block post-op attributes for ENOENT errors now that the name caching logic depends on working post-op attributes.
MFC after: 2 weeks
|
231949 |
21-Feb-2012 |
kib |
Fix the places found where uio_resid is truncated to int.
Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows SSIZE_MAX-sized I/O requests from user mode.
Discussed with: bde, das (previous versions) MFC after: 1 month
|
231852 |
17-Feb-2012 |
bz |
Merge multi-FIB IPv6 support from projects/multi-fibv6/head/:
Extend the so far IPv4-only support for multiple routing tables (FIBs) introduced in r178888 to IPv6 providing feature parity.
This includes an extended rtalloc(9) KPI for IPv6, the necessary adjustments to the network stack, and user land support as in netstat.
Sponsored by: Cisco Systems, Inc. Reviewed by: melifaro (basically) MFC after: 10 days
|
231088 |
06-Feb-2012 |
jhb |
Rename cache_lookup_times() to cache_lookup() and retire the old API and ABI stub for cache_lookup().
|
231075 |
06-Feb-2012 |
kib |
The current implementations of sync(2) and the syncer vnode fsync() VOP use the mnt_noasync counter to temporarily remove the MNTK_ASYNC mount option, which is needed to guarantee synchronous completion of the initiated I/O before the syscall or VOP returns. Globally removing the MNTK_ASYNC option is harmful: not only does I/O started from the corresponding thread become synchronous, but so does all I/O initiated on the filesystem during sync(2) or syncer activity.
Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC.
Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons.
Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks
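The change replaces a mount-wide toggle with a per-thread override that a DOINGASYNC()-style check consults. In the sketch below, the thread-local flag and the `SK_MNTK_ASYNC` value are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>

#define SK_MNTK_ASYNC 0x1u  /* stands in for MNTK_ASYNC; value is arbitrary */

struct mount { unsigned mnt_kern_flag; };

/* Hypothetical per-thread "force synchronous I/O" flag. */
static _Thread_local bool td_syncio;

/* Async only if the mount allows it and this thread has not opted out. */
static bool doing_async(const struct mount *mp)
{
	return (mp->mnt_kern_flag & SK_MNTK_ASYNC) != 0 && !td_syncio;
}

/* sync(2)-style pass: force sync I/O for this thread only, then restore. */
bool sync_pass(const struct mount *mp)
{
	bool prev = td_syncio;
	td_syncio = true;
	bool async_here = doing_async(mp);  /* false for the duration of the pass */
	td_syncio = prev;
	return async_here;
}
```

Other threads never observe `td_syncio`, so their I/O on the same mount stays asynchronous during the sync pass.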
|
230803 |
31-Jan-2012 |
rmacklem |
When a "mount -u" switches an NFS mount point from TCP to UDP, any thread doing an I/O RPC with a transfer size greater than NFS_UDPMAXDATA will be hung indefinitely, retrying the RPC. After a discussion on freebsd-fs@, I decided to add a warning message for this case, as suggested by Jeremy Chadwick.
Suggested by: freebsd at jdc.parodius.com (Jeremy Chadwick) MFC after: 2 weeks
|
230605 |
27-Jan-2012 |
rmacklem |
A problem with respect to data read through the buffer cache for both NFS clients was reported to freebsd-fs@ under the subject "NFS corruption in recent HEAD" on Nov. 26, 2011. This problem occurred when a TCP mounted root fs was changed to using UDP. I believe that this problem was caused by the change in mnt_stat.f_iosize that occurred because rsize was decreased to the maximum supported by UDP. This patch fixes the problem by using v_bufobj.bo_bsize instead of f_iosize, since bo_bsize is set to f_iosize when the vnode is allocated but does not change for a given vnode when f_iosize changes.
Reported by: pjd Reviewed by: kib MFC after: 2 weeks
|
230559 |
26-Jan-2012 |
rmacklem |
Revert r230516, since it doesn't really fix the problem.
|
230552 |
25-Jan-2012 |
kib |
Fix remaining calls to cache_enter() in both NFS clients to provide appropriate timestamps. Restore the assertions which verify that NCF_TS is set when timestamp is asked for.
Reviewed by: jhb (previous version) MFC after: 2 weeks
|
230547 |
25-Jan-2012 |
jhb |
Add a timeout on positive name cache entries in the NFS client. That is, we will only trust a positive name cache entry for a specified amount of time before falling back to a LOOKUP RPC, even if the ctime for the file handle matches the cached copy in the name cache entry. The timeout is configured via a new 'nametimeo' mount option and defaults to 60 seconds. It may be set to zero to disable positive name caching entirely.
Reviewed by: rmacklem MFC after: 1 week
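The trust rule for positive entries can be sketched as a ctime match combined with an age limit, where a nametimeo of zero disables positive caching. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical positive name cache entry. */
struct nc_entry {
	time_t ctime;     /* ctime recorded when the entry was added */
	time_t added_at;  /* monotonic seconds when the entry was added */
};

/* Trust a hit only if ctime still matches and the entry is younger than nametimeo. */
bool nc_trust(const struct nc_entry *ne, time_t cur_ctime, time_t now,
    time_t nametimeo)
{
	if (nametimeo == 0)
		return false;              /* positive caching disabled */
	if (ne->ctime != cur_ctime)
		return false;              /* file changed since it was cached */
	return now - ne->added_at < nametimeo;
}
```

When `nc_trust()` returns false, the client falls back to a LOOKUP RPC.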
|
230516 |
25-Jan-2012 |
rmacklem |
If a mount -u is done to either NFS client that switches it from TCP to UDP and the rsize/wsize/readdirsize is greater than NFS_MAXDGRAMDATA, it is possible for a thread doing an I/O RPC to get stuck repeatedly doing retries. This happens because the RPC will use an rsize/wsize/readdirsize that won't work for UDP and, as such, it will keep failing indefinitely. This patch returns an error for this case, to avoid the problem. A discussion on freebsd-fs@ seemed to indicate that returning an error was preferable to silently ignoring the "udp"/"mntudp" option. This problem was discovered while investigating a problem reported by pjd@ via email.
MFC after: 2 weeks
|
230394 |
20-Jan-2012 |
jhb |
Close a race in NFS lookup processing that could result in stale name cache entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC.
MFC after: 2 weeks
|
230249 |
17-Jan-2012 |
mckusick |
Make sure all intermediate variables holding mount flags (mnt_flag) and that all internal kernel calls passing mount flags are declared as uint64_t so that flags in the top 32-bits are not lost.
MFC after: 2 weeks
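The bug class fixed here can be reproduced by passing a flag in the top 32 bits through a 32-bit intermediate. `FLAG_HIGH` is a hypothetical flag chosen for illustration only.

```c
#include <stdint.h>

#define FLAG_HIGH (UINT64_C(1) << 40)  /* hypothetical flag in the top 32 bits */

/* Buggy: a 32-bit intermediate silently drops the high flags. */
uint64_t pass_flags_32(uint64_t flags)
{
	uint32_t tmp = (uint32_t)flags;    /* keeps only the low 32 bits */
	return tmp;
}

/* Fixed: carry the flags as uint64_t end to end. */
uint64_t pass_flags_64(uint64_t flags)
{
	uint64_t tmp = flags;
	return tmp;
}
```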
|
228757 |
21-Dec-2011 |
rmacklem |
jwd@ reported a problem via email where the old NFS client would get a reply of EEXIST from an NFS server when a Mkdir RPC was retried, for an NFS over UDP mount. Upon investigation, it was found that the client was retransmitting the Mkdir RPC request over UDP, but with a different xid. As such, the retransmitted message would miss the Duplicate Request Cache in the server, causing it to reply EEXIST. The kernel client-side UDP RPC code has two timers. The first one causes a retransmit using the same xid and socket and was set to a fixed value of 3 seconds. (The default can be overridden via CLSET_RETRY_TIMEOUT.) The second one creates a new socket and xid and should be larger than the first. However, both NFS clients were setting the second timer to nm_timeo ("timeout=<value>" mount argument), which defaulted to 1 second, so the first timer would never time out. This patch fixes both NFS clients so that they set the first timer using nm_timeo and make the second timer larger than the first one.
Reported by: jwd Tested by: jwd Reviewed by: jhb MFC after: 2 weeks
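The timer relationship can be sketched as follows. The same-xid retransmit can only fire if it expires before the reconnect timer. The struct is hypothetical, and the `3 * nm_timeo + 1` reconnect value is an assumption made for this sketch, not the value chosen in the commit.

```c
#include <stdbool.h>

/* Hypothetical timer pair, in seconds. */
struct udp_timers {
	int retry;      /* resend with the same xid and socket (hits the server's DRC) */
	int reconnect;  /* new socket and new xid; must be longer than retry */
};

/* Assumed formula: any reconnect value strictly larger than retry works. */
struct udp_timers timers_from_timeo(int nm_timeo)
{
	struct udp_timers t;
	t.retry = nm_timeo;
	t.reconnect = 3 * nm_timeo + 1;
	return t;
}

bool retry_can_fire(struct udp_timers t)
{
	return t.retry < t.reconnect;
}
```

The pre-fix configuration corresponds to `{3, 1}`, for which `retry_can_fire()` is false.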
|
228156 |
30-Nov-2011 |
kib |
Rename vm_page_set_valid() to vm_page_set_valid_range(). The vm_page_set_valid() is the most reasonable name for the m->valid accessor.
Reviewed by: attilio, alc
|
227690 |
19-Nov-2011 |
rmacklem |
The old NFS client will crash due to the reply being m_freem()'d twice if the server bogusly returns an error with the NFSERR_RETERR bit (bit 31) set. No actual NFS error has this bit set, but it seems that amd will sometimes do this. This patch makes sure the NFSERR_RETERR bit is cleared to avoid a crash.
PR: kern/153847 MFC after: 2 weeks
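Clearing the bit amounts to one mask before the server's status is used as an error. The constant below encodes "bit 31" as stated in the message. It is defined locally for this sketch and is not copied from the kernel header.

```c
#include <stdint.h>

#define NFSERR_RETERR UINT32_C(0x80000000)  /* bit 31, per the commit message */

/* Strip bit 31 from a server-supplied status before treating it as an error. */
uint32_t sanitize_status(uint32_t status)
{
	return status & ~NFSERR_RETERR;
}
```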
|
227507 |
14-Nov-2011 |
jhb |
Finish making 'wcommitsize' an NFS client mount option.
Reviewed by: rmacklem MFC after: 1 week
|
224733 |
09-Aug-2011 |
jhb |
Merge 220876, 220877, and 221537 from the new NFS client to the old: Allow the NFS client to use a max file size larger than 1TB for v3 mounts. It now allows files up to OFF_MAX subject to whatever limit the server advertises.
Reviewed by: rmacklem Approved by: re (kib) MFC after: 1 week
|
224604 |
02-Aug-2011 |
rmacklem |
Fix a LOR in the NFS client which could cause a deadlock. This was reported to the mailing list freebsd-net@freebsd.org on July 21, 2011 under the subject "LOR with nfsclient sillyrename". The LOR occurred when nfs_inactive() called vrele(sp->s_dvp) while holding the vnode lock on the file in s_dvp. This patch modifies the client so that it performs the vrele(sp->s_dvp) as a separate task to avoid the LOR. This fix was discussed with jhb@ and kib@, who both proposed variations of it.
Tested by: pho, jlott at averesystems.com Submitted by: jhb (earlier version) Reviewed by: kib Approved by: re (kib) MFC after: 2 weeks
|
223309 |
19-Jun-2011 |
rmacklem |
Fix the kgssapi so that it can be loaded as a module. Currently the NFS subsystems use five of the rpcsec_gss/kgssapi entry points, but since it was not obvious which others might be useful, all nineteen were included. Basically the nineteen entry points are set in a structure called rpc_gss_entries and inline functions defined in sys/rpc/rpcsec_gss.h check for the entry points being non-NULL and then call them. A default value is returned otherwise. Requested by rwatson.
Reviewed by: jhb MFC after: 2 weeks
|
222586 |
01-Jun-2011 |
kib |
In the VOP_PUTPAGES() implementations, change the default error from VM_PAGER_AGAIN to VM_PAGER_ERROR for the unwritten pages. Return VM_PAGER_AGAIN for the partially written page. Always forward at least one page in the loop of vm_object_page_clean().
VM_PAGER_ERROR causes the page reactivation and does not clear the page dirty state, so the write is not lost.
The change fixes an infinite loop in vm_object_page_clean() when the filesystem returns permanent errors for some page writes.
Reported and tested by: gavin Reviewed by: alc, rmacklem MFC after: 1 week
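The per-page result rule can be sketched as follows. The enum names and values are hypothetical stand-ins, not the kernel's VM_PAGER_* constants.

```c
#include <stddef.h>

enum pager_rv { PAGER_OK, PAGER_AGAIN, PAGER_ERROR };

/*
 * Fill per-page results after a write of `written` bytes starting at page 0.
 * Fully written pages succeed, the partially written page is retried, and
 * pages not written at all default to an error (they stay dirty).
 */
void set_rtvals(enum pager_rv *rtvals, size_t npages, size_t pagesize,
    size_t written)
{
	for (size_t i = 0; i < npages; i++) {
		size_t start = i * pagesize;
		if (start + pagesize <= written)
			rtvals[i] = PAGER_OK;
		else if (start < written)
			rtvals[i] = PAGER_AGAIN;
		else
			rtvals[i] = PAGER_ERROR;
	}
}
```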
|
222464 |
29-May-2011 |
rmacklem |
Add a check for MNTK_UNMOUNTF at the beginning of nfs_sync() in the old NFS client so that a forced dismount doesn't get stuck in the VFS_SYNC() call that happens before VFS_UNMOUNT() in dounmount(). Analogous to r222329 for the new NFS client. An additional change is needed before forced dismounts will work.
PR: kern/157365 MFC after: 2 weeks
|
222187 |
22-May-2011 |
alc |
Eliminate duplicate #include's.
|
222075 |
18-May-2011 |
rmacklem |
Add a sanity check for the existence of an "addr" option to both NFS clients. This avoids the crash reported by Sergey Kandaurov (pluknet@gmail.com) to the freebsd-fs@ list with subject "[old nfsclient] different nmount() args passed from mount vs mount_nfs" dated May 17, 2011.
Tested by: pluknet at gmail.com (old nfs client) MFC after: 2 weeks
|
221986 |
16-May-2011 |
rmacklem |
Fix a comment that got missed by r221973 which changed the sysctl naming for the old NFS client to vfs.oldnfs.
|
221973 |
15-May-2011 |
rmacklem |
Change the sysctl naming for the old and new NFS clients to vfs.oldnfs.xxx and vfs.nfs.xxx respectively. This makes the default nfs client use vfs.nfs.xxx after r221124.
|
221543 |
06-May-2011 |
rmacklem |
Move sys/nfsclient/nfs_kdtrace.h to sys/nfs/nfs_kdtrace.h so it can be used by the new NFS client as well as the old one.
|
221542 |
06-May-2011 |
rmacklem |
Fix the module dependency in nfs_kdtrace.c for the old NFS client. This should fix a problem reported by Marcus Reid.
|
221436 |
04-May-2011 |
ru |
Implemented a mount option "nocto" that disables cache coherency checking at open time. It may improve performance for read-only NFS mounts. Use deliberately.
MFC after: 1 week Reviewed by: rmacklem, jhb (earlier version)
|
221139 |
27-Apr-2011 |
rmacklem |
Fix module names and dependencies so the NFS clients will load correctly as modules after r221124.
|
221124 |
27-Apr-2011 |
rmacklem |
This patch changes head so that the default NFS client is now the new NFS client (which I guess is no longer experimental). The fstype "newnfs" is now "nfs" and the regular/old NFS client is now fstype "oldnfs". Although mounts via fstype "nfs" will usually work without userland changes, an updated mount_nfs(8) binary is needed for kernels built with "options NFSCL" but not "options NFSCLIENT". Updated mount_nfs(8) and mount(8) binaries are needed to do mounts for fstype "oldnfs". The GENERIC kernel configs have been changed to use options NFSCL and NFSD (the new client and server) instead of NFSCLIENT and NFSSERVER. For kernels being used on diskless NFS root systems, "options NFSCL" must be in the kernel config. Discussed on freebsd-fs@.
|
221066 |
26-Apr-2011 |
rmacklem |
Fix a kernel linking problem introduced by r221032, r221040 when building kernels that don't have "options NFS_ROOT" specified. I plan on moving the functions that use these data structures into the shared code in sys/nfs/nfs_diskless.c in a future commit. At that time, these definitions will no longer be needed in nfs_vfsops.c and nfs_clvfsops.c.
MFC after: 2 weeks
|
221032 |
25-Apr-2011 |
rmacklem |
Fix the experimental NFS client so that it does not bogusly set the f_flags field of "struct statfs". This had the interesting effect of making the NFSv4 mounts "disappear" after r221014, since NFSMNT_NFSV4 and MNT_IGNORE became the same bit. Move the files used for a diskless NFS root from sys/nfsclient to sys/nfs in preparation for them to be used by both NFS clients. Also, move the declaration of the three global data structures from sys/nfsclient/nfs_vfsops.c to sys/nfs/nfs_diskless.c so that they are defined when either client uses them.
Reviewed by: jhb MFC after: 2 weeks
|
221014 |
25-Apr-2011 |
rmacklem |
Modify the experimental NFS client so that it uses the same "struct nfs_args" as the regular NFS client. This is needed so that the old mount(2) syscall will work and it makes sharing of the diskless NFS root code easier. Early in the porting exercise I introduced a new revision of nfs_args, but didn't actually need it, thanks to nmount(2). I re-introduced the NFSMNT_KERB flag, since it does essentially the same thing and the old one would not have been used because it never worked. I also added a few new NFSMNT_xxx flags to sys/nfsclient/nfs_args.h that are used by the experimental NFS client.
MFC after: 2 weeks
|
220595 |
13-Apr-2011 |
ru |
- Fixed nfs_printf() to use vprintf().
- Fixed vfs.nfs.acdebug sysctl's description.
- Fixed panic when compiled with NFS_ACDEBUG.
MFC after: 3 days
|
219028 |
25-Feb-2011 |
netchild |
Add some FEATURE macros for various features (AUDIT/CAM/IPC/KTR/MAC/NFS/NTP/PMC/SYSV/...).
No FreeBSD version bump; the userland application to query the features will be committed last and can serve as an indication of availability if needed.
Sponsored by: Google Summer of Code 2010 Submitted by: kibab Reviewed by: arch@ (parts by rwatson, trasz, jhb) X-MFC after: to be determined in last commit with code from this project
|
218757 |
16-Feb-2011 |
bz |
Mfp4 CH=177274,177280,177284-177285,177297,177324-177325
VNET socket push back: try to minimize the number of places where we have to switch vnets and narrow down the time we stay switched. Add assertions to the socket code to catch possibly unset vnets as seen in r204147.
While this reduces the number of vnet recursions in some places like NFS, POSIX local sockets and some netgraph, .. recursions are impossible to fix.
The current expectations are documented at the beginning of uipc_socket.c along with the other information there.
Sponsored by: The FreeBSD Foundation Sponsored by: CK Software GmbH Reviewed by: jhb Tested by: zec
Tested by: Mikolaj Golub (to.my.trociny gmail.com) MFC after: 2 weeks
|
216931 |
03-Jan-2011 |
rmacklem |
Fix the nlm so that it no longer depends on the regular nfs client and, as such, can be loaded for the experimental nfs client without the regular client.
Reviewed by: jhb MFC after: 2 weeks
|
215548 |
19-Nov-2010 |
kib |
Remove the prtactive variable and related printf()s in the vop_inactive() and vop_reclaim() methods. They seem to be unused, and the reported situation is normal for a forced unmount.
MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c
|
214418 |
27-Oct-2010 |
jh |
Add missing "readahead" to the nfs_opts list.
PR: 151321 Tested by: Simon Walton MFC after: 2 weeks
|
214053 |
19-Oct-2010 |
rmacklem |
Fix the type of the 3rd argument for nm_getinfo so that it works for architectures like sparc64.
Suggested by: kib MFC after: 2 weeks
|
214048 |
19-Oct-2010 |
rmacklem |
Modify the NFS clients and the NLM so that the NLM can be used by both clients. Since the NLM uses various fields of the nfsmount structure, those fields were extracted and put in a separate nfs_mountcommon structure stored in sys/nfs/nfs_mountcommon.h. This structure also has a function pointer for a function that extracts the required information from the mount point and nfs vnode for that particular client, for information stored differently by the clients.
Reviewed by: jhb MFC after: 2 weeks
|
214026 |
18-Oct-2010 |
kib |
Do not synchronously start the nfsiod threads at all. The r212506 fixed the issues with file descriptor locks, but the same problems are present for vnode lock/user map lock.
If nfs_asyncio() cannot find a free nfsiod, schedule a task to create a new nfsiod and return an error. This causes nfs_strategy() to fall back to synchronous I/O, or the read is not started at all in the case of readahead. The caller that holds the vnode lock, and potentially the user map lock, does not wait for kproc_create() to finish, preventing the LORs.
The change effectively reverts r203072, because we never hand off the request to newly created nfsiod thread anymore.
Reviewed by: jhb Tested by: jhb, pluknet MFC after: 3 weeks
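The control flow described above can be sketched in user-space C. The pool size, the names asyncio(), start_io() and iod_release(), and the use of EIO as the error value are assumptions for illustration, not the code in sys/nfsclient; the sketch only shows that the caller never blocks waiting for a new thread and instead falls back to doing the I/O itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NIODS 2			/* hypothetical size of the nfsiod pool */

static bool iod_busy[NIODS];
static int  create_tasks;	/* deferred requests to create a new iod */

/*
 * Hypothetical stand-in for nfs_asyncio(): hand the request to a free
 * iod, or queue a task to create a new one and return an error at once
 * instead of waiting for the new thread to start.
 */
static int
asyncio(void)
{
	for (int i = 0; i < NIODS; i++) {
		if (!iod_busy[i]) {
			iod_busy[i] = true;
			return 0;
		}
	}
	create_tasks++;		/* a taskqueue task in the kernel */
	return EIO;
}

/* Mark an iod idle again once it has finished its request. */
static void
iod_release(int i)
{
	assert(i >= 0 && i < NIODS && iod_busy[i]);
	iod_busy[i] = false;
}

/*
 * Caller in the style of nfs_strategy(): returns true if the I/O was
 * queued to an iod, false if it was performed synchronously here.
 */
static bool
start_io(void)
{
	if (asyncio() == 0)
		return true;
	/* No free iod: do the I/O synchronously in this thread. */
	return false;
}
```

With two iods, the third start_io() call returns false and leaves one creation task queued; once iod_release() frees an iod, asynchronous hand-off works again without any caller having waited.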
|
212506 |
12-Sep-2010 |
kib |
Do not fork nfsiod directly from the vop methods. This causes LORs between vnode lock and several locks needed during fork, like fd lock.
Instead, schedule the task to be executed in the taskqueue context. We are still waiting for the fork to finish, but the thread executing the task runs in a context that does not create real LORs with our vnode lock.
Submitted by: pluknet at gmail com Reviewed by: jhb Tested by: pho MFC after: 3 weeks
|
212293 |
07-Sep-2010 |
jhb |
Store the full timestamp when caching timestamps of files and directories for purposes of validating name cache entries. This closes races where two updates to a file or directory within the same second could result in stale entries in the name cache. While here, remove the 'n_expiry' field as it is no longer used.
Reviewed by: rmacklem MFC after: 1 week
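The race closed by this change can be illustrated with a small user-space sketch, assuming the server reports sub-second timestamps. The helper names are hypothetical and struct timespec stands in for the timestamps the client caches; this is not the client's actual validation code.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Old behaviour: only the seconds of the cached mtime are compared. */
static bool
cache_valid_sec(time_t cached_sec, const struct timespec *cur)
{
	assert(cur != NULL);
	return cached_sec == cur->tv_sec;
}

/* New behaviour: the full timestamp, including nanoseconds, must match. */
static bool
cache_valid_full(const struct timespec *cached, const struct timespec *cur)
{
	assert(cached != NULL && cur != NULL);
	return cached->tv_sec == cur->tv_sec &&
	    cached->tv_nsec == cur->tv_nsec;
}
```

Two updates within the same second leave tv_sec unchanged, so the seconds-only check accepts a stale name cache entry while the full comparison rejects it.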
|
212123 |
01-Sep-2010 |
rmacklem |
Modify nfs_diskless.c to recognize the environment variable boot.nfsroot.nfshandlelen and set the diskless root fs to use NFSv3 and this file handle length when it is set. If this environment variable is not set, the diskless root fs will use NFSv2 and the same defaults as before. This fixes the problem where the diskless nfs root fs had to be on a FreeBSD server for NFSv3 to work, because it did not know the correct file handle length and assumed the size used by FreeBSD. Until pxeboot and loader are replaced by ones built from commits coming soon, boot.nfsroot.nfshandlelen will not be set by them and the diskless root fs will use NFSv2 unless the /etc/fstab entry has the "nfsv3" option specified.
Tested by: danny at cs.huji.ac.il MFC after: 2 weeks
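A sketch of the decision described above, in user-space C. The struct, the function name and the rejection of out-of-range values are hypothetical; the 64-byte limit is the NFSv3 maximum file handle size from RFC 1813. The variable's value is passed in as a string (NULL when unset) rather than read from the kernel environment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NFSV3_FHMAX 64		/* maximum NFSv3 file handle size (RFC 1813) */

/* Hypothetical subset of the diskless root mount settings. */
struct root_mount {
	bool	nfsv3;		/* mount the root fs with NFSv3 */
	int	fhsize;		/* file handle length in bytes */
};

/*
 * 'val' is the value of boot.nfsroot.nfshandlelen, or NULL if the
 * variable is unset.  Returns false if the value cannot be used.
 */
static bool
apply_handlelen(struct root_mount *rm, const char *val)
{
	char *end;
	long len;

	assert(rm != NULL);
	if (val == NULL)
		return true;	/* unset: keep the NFSv2 defaults */
	len = strtol(val, &end, 10);
	if (end == val || *end != '\0' || len <= 0 || len > NFSV3_FHMAX)
		return false;	/* hypothetical: ignore malformed values */
	rm->nfsv3 = true;
	rm->fhsize = (int)len;
	return true;
}
```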
|
211531 |
20-Aug-2010 |
jhb |
Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE and LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier.
Reviewed by: kib MFC after: 3 days
|
210834 |
04-Aug-2010 |
rmacklem |
Add some mutex locking on the nfsnode to the regular NFS client.
Reviewed by: jhb
|
210455 |
24-Jul-2010 |
rmacklem |
Move sys/nfsclient/nfs_lock.c into sys/nfs and build it as a separate module that can be used by both the regular and experimental nfs clients. This fixes the problem reported by jh@ where /dev/nfslock would be registered twice when both nfs clients were used. I also defined the size of the lm_fh field to be the correct value, as it should be the maximum size of an NFSv3 file handle.
Reviewed by: jh MFC after: 2 weeks
|
210136 |
15-Jul-2010 |
jhb |
Retire the NFS access cache timestamp structure. It was used in VOP_OPEN() to avoid sending multiple ACCESS/GETATTR RPCs during a single open() between VOP_LOOKUP() and VOP_OPEN(). Now we always send the RPC in VOP_LOOKUP() and not VOP_OPEN() in the cases that multiple RPCs could be sent.
MFC after: 2 weeks
|
209948 |
12-Jul-2010 |
jhb |
A previous change moved the GETATTR RPC for open() calls that hit in the name cache up into nfs_lookup() instead of nfs_open(). Continue this trend by flushing the attribute cache for leaf nodes in nfs_lookup() during an open() if we do a LOOKUP RPC. For NFSv3 this should generally be a NOP as the attributes are flushed before fetching the post-op attributes from the LOOKUP RPC which most (all?) NFSv3 servers provide, so the post-op attributes should populate the cache.
Now all NFS open() calls will always clear the cached attributes during the nfs_lookup() prior to nfs_open() in the !NMODIFIED case to provide CTOC. As a result, we can remove the conditional flushing of the attribute cache from nfs_open().
Reviewed by: rmacklem, bde MFC after: 2 weeks
|
209946 |
12-Jul-2010 |
jhb |
- Add missing locking around flushing of an NFS node's attribute cache in the NMODIFIED case of nfs_open().
- Cosmetic tweak to simplify an expression in nfs_lookup().
Reviewed by: rmacklem, bde MFC after: 1 week
|
209120 |
13-Jun-2010 |
kib |
In NFS clients, instead of inconsistently using #ifdef DIAGNOSTIC and #ifndef DIAGNOSTIC for debug assertions, prefer KASSERT(). Also change one #ifdef DIAGNOSTIC in the new nfs server.
Submitted by: Mikolaj Golub <to.my.trociny gmail com> MFC after: 2 weeks
|
208605 |
27-May-2010 |
delphij |
Fix build: newnp represents newvp, so call KDTRACE_NFS_ATTRCACHE_FLUSH_DONE() on newvp instead of vp here.
|
208603 |
27-May-2010 |
jhb |
More gracefully handle stale file handles and attributes when opening a file via NFS. Specifically, to satisfy close-to-open-consistency, the NFS client always performs at least one RPC on a file during an open(2) to see if the file has changed. Normally this RPC is an ACCESS or GETATTR RPC that is forced by flushing a file's attribute cache during nfs_open() and then requesting new attributes. However, if the file is noticed to be stale during nfs_open(), the only recourse is to fail the open(2) call with ESTALE. On the other hand, if the ACCESS or GETATTR RPC is sent during nfs_lookup(), then the NFS client can fall back to a LOOKUP RPC to obtain the new file handle in the case that a file has been replaced.
This change causes the NFS client to flush the attribute cache during nfs_lookup() when validating a name cache hit if the attributes fetched during nfs_lookup() can be reused in nfs_open(). This allows the client to open a replaced file via the new file handle the first time that it notices a replaced file rather than failing with ESTALE in some cases.
Reviewed by: rmacklem, bde Reviewed by: mohans (older version) MFC after: 1 week
|
208586 |
27-May-2010 |
cperciva |
Change the current working directory to be inside the jail created by the jail(8) command. [10:04]
Fix a one-NUL-byte buffer overflow in libopie. [10:05]
Correctly sanity-check a buffer length in nfs mount. [10:06]
Approved by: so (cperciva) Approved by: re (kensmith) Security: FreeBSD-SA-10:04.jail Security: FreeBSD-SA-10:05.opie Security: FreeBSD-SA-10:06.nfsclient
|
207746 |
07-May-2010 |
alc |
Push down the page queues lock into vm_page_activate().
|
207728 |
06-May-2010 |
alc |
Eliminate page queues locking around most calls to vm_page_free().
|
207669 |
05-May-2010 |
alc |
Acquire the page lock around all remaining calls to vm_page_free() on managed pages that didn't already have that lock held. (Freeing an unmanaged page, such as the various pmaps use, doesn't require the page lock.)
This allows a change in vm_page_remove()'s locking requirements. It now expects the page lock to be held instead of the page queues lock. Consequently, the page queues lock is no longer required at all by callers to vm_page_rename().
Discussed with: kib
|
207662 |
05-May-2010 |
trasz |
Move checking against RLIMIT_FSIZE into one place, vn_rlimit_fsize().
Reviewed by: kib
|
207584 |
03-May-2010 |
kib |
Lock the page around vm_page_activate() and vm_page_deactivate() calls where it was missed. The wrapped fragments now protect wire_count with page lock.
Reviewed by: alc
|
204063 |
18-Feb-2010 |
pjd |
Simplify code a bit.
|
203968 |
16-Feb-2010 |
marius |
Factor out the code shared between NFS client and server into its own module. With r203732 it became apparent that creating the sysctl nodes twice causes at least a warning, however the whole code shouldn't be present twice in the first place.
Discussed with: rmacklem
|
203732 |
09-Feb-2010 |
marius |
- Move nfs_realign() from the NFS client to the shared NFS code and remove the NFS server version in order to reduce code duplication. The shared version now uses a second parameter, how, which is passed on to m_get(9) and m_getcl(9), since the server used M_WAIT while the client requires M_DONTWAIT; it replaces the previously unused parameter hsiz.
- Change nfs_realign() to use nfsm_aligned() so that, as with other NFS code, the alignment check is not actually performed on platforms without strict alignment requirements, for performance reasons, because, as the comment suggests, unaligned data only occasionally occurs with TCP.
- Change fha_extract_info() to use nfs_realign() with M_DONTWAIT rather than M_WAIT because it is called with the RPC sp_lock held.
Reviewed by: jhb, rmacklem MFC after: 1 week
|
203731 |
09-Feb-2010 |
marius |
Some style(9) fixes
|
203072 |
27-Jan-2010 |
rmacklem |
Fix a race that can occur when nfsiod threads are being created. Without this patch it was possible for a different thread that calls nfs_asyncio() to snatch a newly created nfsiod thread that was intended for another caller of nfs_asyncio(), because the nfs_iod_mtx mutex was unlocked while the new nfsiod thread was created. This patch labels the newly created nfsiod, so that it is not taken by another caller of nfs_asyncio(). This is believed to fix the problem reported on the freebsd-stable email list under the subject: FreeBSD NFS client/Linux NFS server issue.
Tested by: to DOT my DOT trociny AT gmail DOT com Reviewed by: jhb MFC after: 2 weeks
|
202774 |
21-Jan-2010 |
rmacklem |
Fix a typo in a comment introduced by r202767.
MFC after: 2 weeks
|
202767 |
21-Jan-2010 |
rmacklem |
Add a timeout for the negative name cache entries in the NFS client. This avoids a bogus negative name cache entry from persisting forever when another client creates an entry with the same name within the same NFS server time of day clock tick. The mount option negnametimeo can be used to override the default timeout interval on a per-mount-point basis. Setting negnametimeo to 0 disables negative name caching for the mount point. I also fixed one obvious typo where args.timeo should be args.maxgrouplist.
Submitted by: jhb (earlier version) Reviewed by: jhb MFC after: 2 weeks
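A minimal sketch of how a per-mount timeout could bound the lifetime of a negative name cache entry, including the rule that negnametimeo=0 disables negative caching. The structure and function names are hypothetical, not the client's actual code, and the timeout is assumed to be in seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical negative name cache entry. */
struct neg_entry {
	time_t	created;	/* when the failed lookup was cached (seconds) */
};

/*
 * Returns true if a negative entry may still be trusted at time 'now'.
 * A timeout of 0 disables negative name caching for the mount point.
 */
static bool
neg_entry_valid(const struct neg_entry *ne, time_t now, int negnametimeo)
{
	assert(ne != NULL && negnametimeo >= 0);
	if (negnametimeo == 0)
		return false;
	/* Expire the entry once the timeout interval has elapsed. */
	return now - ne->created < negnametimeo;
}
```

Bounding the entry's age means that a file created by another client within the same server clock tick becomes visible after at most one timeout interval, instead of never.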
|
201895 |
09-Jan-2010 |
zec |
Reduce recursions on curvnet, and thus the spamming of the console with warning messages on kernels built with options VIMAGE and VNET_DEBUG enabled.
Reviewed by: bz MFC after: 3 days
|
201758 |
07-Jan-2010 |
mbr |
Remove extraneous semicolons, no functional changes.
Submitted by: Marc Balmer <marc@msys.ch> MFC after: 1 week
|
201044 |
27-Dec-2009 |
bz |
Add missing include to make LINT-VIMAGE build as well.
Found by: test building an MFC candidate X-MFC with: r200471
|
200471 |
13-Dec-2009 |
bz |
Add a few more V_hacks to nfsclient to allow machines with a VIMAGE kernel to boot from NFS. [1]
Note: this is not a full virtualization of nfsclient. It only does what is advertised above and nothing more.
Requested by: public demand [1] Tested by: kris, .. MFC after: 5 days
|
198174 |
16-Oct-2009 |
jhb |
Close a race with caching of -ve name lookups in the NFS client. Specifically, clients only trust -ve cache entries while the directory remains unchanged and discard any -ve cache entries for a directory when they notice that the modification time of a directory entry changes. The race involves two concurrent lookups as follows:
- Thread A does a lookup for file 'foo' which sends a lookup RPC to the server. The lookup fails and the server replies.
- The 'foo' file is created (either by the same client or a different client) updating the modification time on the parent directory of 'foo'.
- Thread B does a lookup for a different file 'bar' which updates the cached attributes of the parent directory of 'foo' to reflect the new modification time after 'foo' was created.
- Thread A finally resumes execution to parse the reply from the NFS server. It adds a -ve cache entry and sets the cached value of the directory's modification time that is used for invalidating -ve cached lookups to the new modification time set by thread B.
At this point, future lookups of 'foo' will honor the -ve cached entry until the cached entry is pushed out of the name cache's LRU or the modification time of the parent directory is changed again by some other change. The fix is to read the directory's modification time before sending the lookup RPC and use that cached modification time when setting the directory's cached modification time. Also, we do not add a -ve cache entry if another thread has added -ve cache entry that set the directory's cached modification time to a newer value than the value we read before sending the lookup RPC.
Reviewed by: rmacklem MFC after: 1 week
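The fix can be sketched in user-space C as follows. The names are hypothetical and a single time_t stands in for the directory's cached modification time; the sketch shows the two rules from the commit message: record the mtime read before the LOOKUP RPC, and skip the -ve entry if another thread already recorded a newer one.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-directory state: the mtime that -ve cache entries
 * for this directory were validated against. */
struct dir_state {
	time_t	neg_mtime;
};

/*
 * Called after a LOOKUP RPC failed.  'mtime_before' is the directory
 * mtime read before the RPC was sent.  Returns true if a -ve entry
 * should be added.
 */
static bool
add_negative_entry(struct dir_state *ds, time_t mtime_before)
{
	assert(ds != NULL);
	/* Another thread already recorded a newer mtime: do not cache. */
	if (ds->neg_mtime > mtime_before)
		return false;
	/* Use the pre-RPC value, not attributes refreshed by others. */
	ds->neg_mtime = mtime_before;
	return true;
}

/* A -ve hit is trusted only while the directory mtime is unchanged. */
static bool
negative_hit_trusted(const struct dir_state *ds, time_t cur_mtime)
{
	assert(ds != NULL);
	return ds->neg_mtime == cur_mtime;
}
```

In the race above, thread A records the pre-RPC mtime, so once 'foo' has been created the directory's current mtime no longer matches and the -ve entry is not honored.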
|
197997 |
12-Oct-2009 |
rwatson |
Add a MODULE_DEPEND() on the NFS client from dtnfsclient so that dtnfsclient can access NFS client symbols.
MFC after: 3 days Discussed with: kib Reported by: markm
|
197235 |
15-Sep-2009 |
qingli |
Revert the previous change for now. Some users report that the patch fixes their issues, but one reports a failure with an NFS root. Revert the change pending further investigation.
Reviewed by: bz MFC after: immediately
|
197212 |
15-Sep-2009 |
qingli |
Simply remove the code instead of using "#if 0".
Pointed out by sam
|
197210 |
15-Sep-2009 |
qingli |
The bootp code installs an interface address and the nfs client module tries to install the same address again. This extra code is removed; the duplication was discovered through the removal of a call to in_ifscrub() in r196714. This call to in_ifscrub() is put back here because the SIOCAIFADDR command can be used to change the prefix length of an existing alias.
Reviewed by: kmacy
|
197048 |
09-Sep-2009 |
rmacklem |
Add LK_NOWITNESS to the vn_lock() calls done on newly created nfs vnodes, since these nodes are not linked into the mount queue and, as such, the vn_lock() cannot cause a deadlock so LORs are harmless.
Suggested by: kib Approved by: kib (mentor) MFC after: 3 days
|
196503 |
24-Aug-2009 |
zec |
Fix NFS panics with options VIMAGE kernels by appropriately setting the curvnet context inside the RPC code.
Temporarily set td's cred to mount's cred before calling socreate() via __rpc_nconf2socket().
Submitted by: rmacklem (in part) Reviewed by: rmacklem, rwatson Discussed with: dfr, bz Approved by: re (rwatson), julian (mentor) MFC after: 3 days
|
196481 |
23-Aug-2009 |
rwatson |
Rework global locks for interface list and index management, correcting several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an sxlock. Either can be held to stabilize the lists and indexes, but both are required to write. This allows the list to be held stable in both network interrupt contexts and sleepable user threads across sleeping memory allocations or device driver interactions. As before, writes to the interface list must occur from sleepable contexts.
Reviewed by: bz, julian MFC after: 3 days
|
196205 |
14-Aug-2009 |
kib |
In nfs_upgrade_vnlock(), assert that the vnode is locked. It is for all paths, as far as I can see, and testing seems to confirm it. Comparison of old_lock with LK_SHARED makes sense only if the vnode is locked by the current thread.
When downgrading, pass LK_RETRY to vn_lock(), since otherwise vn_lock() unlocks the doomed vnode, causing an extra unlock.
Reported and tested by: pho Approved by: re (rwatson) MFC after: 3 weeks
|
196019 |
01-Aug-2009 |
rwatson |
Merge the remainder of kern_vimage.c and vimage.h into vnet.c and vnet.h; we now use jails (rather than vimages) as the abstraction for virtualization management, and what remained was specific to virtual network stacks. Minor cleanups are done in the process, and comments are updated to reflect these changes.
Reviewed by: bz Approved by: re (vimage blanket)
|
195744 |
17-Jul-2009 |
rmacklem |
Patch the regular nfs client in a manner analogous to r195704 for the experimental client. The patch avoids calling vn_lock() for the case where nfs_nget() has acquired the same vnode as dvp, since nfs_nget() has already locked the vnode.
Reviewed by: kib, jhb Approved by: re (kensmith), kib (mentor)
|
195703 |
14-Jul-2009 |
kib |
Use the PBDRY flag for msleep(9) in NFS and NLM when the sleeping thread owns kernel resources that block other threads, like vnode locks. A SIGSTOP sent to such a thread (process, rather) shall not stop it until the thread releases the resources.
Tested by: pho Reviewed by: jhb Approved by: re (kensmith)
|
195699 |
14-Jul-2009 |
rwatson |
Build on Jeff Roberson's linker-set based dynamic per-CPU allocator (DPCPU), as suggested by Peter Wemm, and implement a new per-virtual network stack memory allocator. Modify vnet to use the allocator instead of monolithic global container structures (vinet, ...). This change solves many binary compatibility problems associated with VIMAGE, and restores ELF symbols for virtualized global variables.
Each virtualized global variable exists as a "reference copy", and also once per virtual network stack. Virtualized global variables are tagged at compile time, placing them in a special linker set, which is loaded into a contiguous region of kernel memory. Virtualized global variables in the base kernel are linked as normal, but those in modules are copied and relocated to a reserved portion of the kernel's vnet region with the help of the kernel linker.
Virtualized global variables exist in per-vnet memory set up when the network stack instance is created, and are initialized statically from the reference copy. Run-time access occurs via an accessor macro, which converts from the current vnet and requested symbol to a per-vnet address. When "options VIMAGE" is not compiled into the kernel, normal global ELF symbols will be used instead and indirection is avoided.
This change restores static initialization for network stack global variables, restores support for non-global symbols and types, eliminates the need for many subsystem constructors, eliminates large per-subsystem structures that caused many binary compatibility issues both for monitoring applications (netstat) and kernel modules, removes the per-function INIT_VNET_*() macros throughout the stack, eliminates the need for vnet_symmap ksym(2) munging, and eliminates duplicate definitions of virtualized globals under VIMAGE_GLOBALS.
Bump __FreeBSD_version and update UPDATING.
Portions submitted by: bz Reviewed by: bz, zec Discussed with: gnn, jamie, jeff, jhb, julian, sam Suggested by: peter Approved by: re (kensmith)
|
195294 |
02-Jul-2009 |
kib |
In vn_vget_ino() and its inline equivalents, mnt_ref() the mount point around the sequence that drops the vnode lock and then busies the mount point. Not having a locked vnode or a direct reference to the mp allows a forced unmount to proceed, making the mp unmounted or reused.
Tested by: pho Reviewed by: jeff Approved by: re (kensmith) MFC after: 2 weeks
|
195203 |
30-Jun-2009 |
dfr |
Adjust the internal NFS KPI to avoid the last traces of NFS_LEGACYRPC.
Approved by: re
|
195202 |
30-Jun-2009 |
dfr |
Remove the old kernel RPC implementation and the NFS_LEGACYRPC option.
Approved by: re
|
195181 |
30-Jun-2009 |
jhb |
Fix build with NFS_LEGACYRPC enabled after the socket upcall locking changes.
Approved by: re (kensmith)
|
194951 |
25-Jun-2009 |
rwatson |
Add a new global rwlock, in_ifaddr_lock, which will synchronize use of the in_ifaddrhead and INADDR_HASH address lists.
Previously, these lists were used unsynchronized as they were effectively never changed in steady state, but we've seen increasing reports of writer-writer races on very busy VPN servers as core count has gone up (and similar configurations where address lists change frequently and concurrently).
For the time being, use rwlocks rather than rmlocks in order to take advantage of their better lock debugging support. As a result, we don't enable ip_input()'s read-locking of INADDR_HASH until an rmlock conversion is complete and a performance analysis has been done. This means that one class of reader-writer races still exists.
MFC after: 6 weeks Reviewed by: bz
|
194739 |
23-Jun-2009 |
bz |
After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.
|
194425 |
18-Jun-2009 |
alc |
Fix some of the style errors in *getpages().
|
194358 |
17-Jun-2009 |
kib |
For dotdot lookup in nfs_lookup, inline the vn_vget_ino() to prevent operating on the unmounted mount point and freed mount data in case of forced unmount performed while dvp is unlocked to nget the target vnode.
Add missed calls to m_freem(mrep) there on error exits [1].
Submitted by: rmacklem [1] Tested by: pho MFC after: 2 weeks
|
194118 |
13-Jun-2009 |
jamie |
Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them.
Approved by: bz (mentor)
|
193952 |
10-Jun-2009 |
rmacklem |
Add a test for VI_DOOMED just after nfs_upgrade_vnlock() in nfs_bioread_check_cons(). This is required since it is possible for the vnode to be vgonel()'d while in nfs_upgrade_vnlock() when a forced dismount is in progress. Also, move the check for VI_DOOMED in nfs_vinvalbuf() down to after nfs_upgrade_vnlock() and replace the out of date comment for it.
Submitted by: jhb Tested by: pho Approved by: kib (mentor) MFC after: 1 month
|
193744 |
08-Jun-2009 |
bz |
After r193232 rt_tables in vnet.h are no longer indirectly dependent on the ROUTETABLES kernel option thus there is no need to include opt_route.h anymore in all consumers of vnet.h and no longer depend on it for module builds.
Remove the hidden include in flowtable.h as well and leave the two explicit #includes in ip_input.c and ip_output.c.
|
193272 |
01-Jun-2009 |
jhb |
Rework socket upcalls to close some races with setup/teardown of upcalls.
- Each socket upcall is now invoked with the appropriate socket buffer locked. It is not permissible to call soisconnected() with this lock held, however, so socket upcalls now return an integer value. The two possible values are SU_OK and SU_ISCONNECTED. If an upcall returns SU_ISCONNECTED, then soisconnected() will be invoked on the socket after the socket buffer lock is dropped.
- A new API is provided for setting and clearing socket upcalls. The API consists of soupcall_set() and soupcall_clear().
- To simplify locking, each socket buffer now has a separate upcall.
- When a socket upcall returns SU_ISCONNECTED, the upcall is cleared from the receive socket buffer automatically. Note that an SO_SND upcall should never return SU_ISCONNECTED.
- All this means that accept filters should now return SU_ISCONNECTED instead of calling soisconnected() directly. They also no longer need to explicitly clear the upcall on the new socket.
- The HTTP accept filter still uses soupcall_set() to manage its internal state machine, but other accept filters no longer have any explicit knowledge of socket upcall internals aside from their return value.
- The various RPC client upcalls currently drop the socket buffer lock while invoking soreceive() as a temporary band-aid. The plan for the future is to add a new flag to allow soreceive() to be called with the socket buffer locked.
- The AIO callback for socket I/O is now also invoked with the socket buffer locked. Previously sowakeup() would drop the socket buffer lock only to call aio_swake(), which immediately re-acquired the socket buffer lock for the duration of the function call.
Discussed with: rwatson, rmacklem
|
193232 |
01-Jun-2009 |
bz |
Convert the two dimensional array to be malloced and introduce an accessor function to get the correct rnh pointer back.
Update netstat to get the correct pointer using kvm_read() as well.
This not only fixes the ABI problem depending on the kernel option but also permits the tunable to overwrite the kernel option at boot time up to MAXFIBS, enlarging the number of FIBs without having to recompile. So people could just use GENERIC now.
Reviewed by: julian, rwatson, zec X-MFC: not possible
|
193187 |
31-May-2009 |
alc |
nfs_write() can use the recently introduced vfs_bio_set_valid() instead of vfs_bio_set_validclean(), thereby avoiding the page queues lock.
Garbage collect vfs_bio_set_validclean(). Nothing uses it any longer.
|
193066 |
29-May-2009 |
jamie |
Place hostnames and similar information fully under the prison system. The system hostname is now stored in prison0, and the global variable "hostname" has been removed, as has the hostname_mtx mutex. Jails may have their own host information, or they may inherit it from the parent/system. The proper way to read the hostname is via getcredhostname(), which will copy either the hostname associated with the passed cred, or the system hostname if you pass NULL. The system hostname can still be accessed directly (and without locking) at prison0.pr_host, but that should be avoided where possible.
The "similar information" referred to is domainname, hostid, and hostuuid, which have also become prison parameters and had their associated global variables removed.
Approved by: bz (mentor)
|
192986 |
28-May-2009 |
alc |
Make *getpages()s' assertion on the state of each page's dirty bits stricter.
|
192686 |
24-May-2009 |
dfr |
Make sure we feed 32-bit aligned memory to nfsm_dissect; otherwise we will fault on platforms with strict alignment requirements. In particular, this fixes the problems with the new RPC transport on the arm platform.
Note: this adds yet another copy of nfs_realign(). I will attempt to refactor after NFS_LEGACYRPC is removed.
Submitted by: sam
|
192645 |
23-May-2009 |
bz |
While r192615 fixed the former problems, make this file VIMAGE-compliant now as well by initializing local context variables.
|
192615 |
23-May-2009 |
bz |
It seems this file was ignored by MRT, rnh locking changes and new-arpv2. So let the V_irtualization people finally make the disabled debugging code compile again.
MFC after: 2 weeks X-MFC: MRT and adapt rnh locking
|
192578 |
22-May-2009 |
rwatson |
Remove the unmaintained University of Michigan NFSv4 client from 8.x prior to 8.0-RELEASE. Rick Macklem's new and more feature-rich NFSv234 client and server are replacing it.
Discussed with: rmacklem
|
192134 |
15-May-2009 |
alc |
Eliminate unnecessary clearing of the page's dirty mask from various getpages functions.
Eliminate a stale comment.
|
192010 |
12-May-2009 |
alc |
Eliminate gratuitous clearing of the page's dirty mask.
|
191990 |
11-May-2009 |
attilio |
Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. All the VFS_* functions and related parts no longer take the thread context, since it always refers to curthread.
In some places, in particular when dealing with VOPs and functions living in the same namespace (e.g. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP.
While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option.
The VFS KPI is heavily changed by this commit, so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal this situation.
|
191964 |
10-May-2009 |
alc |
Eliminate stale comments.
Eliminate a case of unnecessary page queues locking.
|
191816 |
05-May-2009 |
zec |
Change the curvnet variable from a global const struct vnet *, previously always pointing to the default vnet context, to a dynamically changing thread-local one. The curvnet context should be set on entry to networking code via CURVNET_SET() macros, and reverted to the previous state via CURVNET_RESTORE(). Recursions on curvnet are permitted, though strongly discouraged.
This change should have no functional impact on nooptions VIMAGE kernel builds, where CURVNET_* macros expand to whitespace.
The curthread->td_vnet (aka curvnet) variable's purpose is to be an indicator of the vnet context in which the current network-related operation takes place, in case we cannot deduce the current vnet context from any other source, such as by looking at an mbuf's m->m_pkthdr.rcvif->if_vnet, a socket's so->so_vnet, etc. Moreover, so far curvnet has turned out to be an invaluable consistency checking aid: it helps to catch cases when sockets, ifnets or any other vnet-aware structures may have leaked from one vnet to another.
The exact placement of the CURVNET_SET() / CURVNET_RESTORE() macros was a result of an empirical iterative process, with an aim to reduce recursions on CURVNET_SET() to a minimum, while still reducing the scope of CURVNET_SET() to networking-only operations; the alternative would be calling CURVNET_SET() on each system call entry. In general, curvnet has to be set in three typical cases: when processing socket-related requests from userspace or from within the kernel, when processing inbound traffic flowing from device drivers to upper layers of the networking stack, and when executing timer-driven networking functions.
This change also introduces a DDB subcommand to show the list of all vnet instances.
Approved by: julian (mentor)
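As an illustration of the save/restore discipline described above, here is a minimal user-space sketch in C. It assumes a C11 _Thread_local variable in place of curthread->td_vnet, and every name ending in _sketch (including the macros) is hypothetical; this is not the kernel's actual CURVNET_SET() implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for struct vnet: only a name, for illustration. */
struct vnet_sketch {
	const char *name;
};

/* Stand-ins for the default vnet and one additional (e.g. jail) vnet. */
static struct vnet_sketch vnet0_sketch = { "vnet0" };
static struct vnet_sketch vnet_jail_sketch = { "jail1" };

/*
 * Per-thread current context. The kernel keeps this in
 * curthread->td_vnet; this sketch assumes C11 _Thread_local instead.
 */
static _Thread_local struct vnet_sketch *curvnet_sketch = &vnet0_sketch;

/*
 * Save the caller's context in a block-local variable and install the
 * new one. Each nesting level keeps its own saved copy, so recursion
 * unwinds correctly as long as every SET is paired with a RESTORE in
 * the same block.
 */
#define CURVNET_SET_SKETCH(arg)					\
	struct vnet_sketch *saved_vnet_sketch = curvnet_sketch;	\
	curvnet_sketch = (arg)

#define CURVNET_RESTORE_SKETCH()				\
	curvnet_sketch = saved_vnet_sketch

static const char *
current_vnet_name(void)
{
	return (curvnet_sketch->name);
}

/* Enter a vnet, report the context seen there, then restore. */
static const char *
name_seen_in(struct vnet_sketch *vnet)
{
	const char *seen;

	CURVNET_SET_SKETCH(vnet);
	seen = current_vnet_name();
	CURVNET_RESTORE_SKETCH();
	return (seen);
}

/* Nested use: the inner SET/RESTORE must not clobber the outer context. */
static const char *
name_after_nested(struct vnet_sketch *outer, struct vnet_sketch *inner)
{
	const char *after;

	CURVNET_SET_SKETCH(outer);
	(void)name_seen_in(inner);
	after = current_vnet_name();
	CURVNET_RESTORE_SKETCH();
	return (after);
}
```

Because each CURVNET_SET_SKETCH() declares its own saved pointer, a nested set/restore inside a callee leaves the caller's context intact, which is what keeps the (discouraged) recursion safe.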
|
191777 |
04-May-2009 |
rwatson |
Remove redundant NFSMNT_NFSV3 check in DTrace hooks for NFS RPC.
MFC after: 1 month
|
191776 |
04-May-2009 |
rwatson |
Fix typo in comment.
MFC after: 1 month
|
191013 |
13-Apr-2009 |
kib |
Remove trailing spaces
|
190888 |
10-Apr-2009 |
rwatson |
Remove VOP_LEASE and supporting functions. This hasn't been used since the removal of NQNFS, but was left in, in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease-related code and interfaces.
Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.
Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
|
190887 |
10-Apr-2009 |
kib |
Cache_lookup() for DOTDOT drops dvp vnode lock, allowing dvp to be reclaimed. Check the condition and return ENOENT then.
In nfs_lookup(), respect ENOENT return from cache_lookup() when it is caused by dvp reclaim.
Reported and tested by: pho
|
190785 |
06-Apr-2009 |
jhb |
When a stale file handle is encountered, purge all cached information about an NFS node including the access and attribute caches. Previously the NFS client only purged any name cache entries associated with the file.
PR: kern/123755 Submitted by: Jaakko Heinonen jh of saunalahti fi Reported by: Timo Sirainen tss of iki fi Reviewed by: rwatson, rmacklem MFC after: 1 month
|
190783 |
06-Apr-2009 |
jhb |
Change the default timeout for caching attributes of a directory in the NFS client from 30 seconds to 3 seconds. After the recent changes to add caching of negative name cache lookups, a negative name cache hit will persist until the client notices the parent directory has changed. The higher the attribute cache timeout on directories, the longer that can take, so lower the default timeout for directories to match that of regular files.
Suggested by: bde, mohans MFC after: 1 month
|
190419 |
25-Mar-2009 |
rwatson |
Move dtnfsclient.c in the cddl tree to nfs_kdtrace.c in the nfsclient directory, since it's under a BSD license, and this keeps the NFS-internals-aware tracing parts close to NFS.
MFC after: 1 month Suggested by: jhb
|
190396 |
24-Mar-2009 |
rwatson |
Fix two bugs in DTrace tracing of accesscache and attrcache load events:
- Trace non-error loads into the access cache once, not zero times or twice.
- Sometimes attr cache loads fail due to a race, in which case they are aborted, leading to an invalidation; in this case, trace only the flush, not a load.
MFC after: 1 month Sponsored by: Google, Inc.
|
190380 |
24-Mar-2009 |
rwatson |
Add DTrace probes to the NFS access and attribute caches. Access cache events are:
nfsclient:accesscache:flush:done
nfsclient:accesscache:get:hit
nfsclient:accesscache:get:miss
nfsclient:accesscache:load:done
They pass the vnode, uid, and requested or loaded access mode (if any); the load event may also report a load error if the RPC fails.
The attribute cache events are:
nfsclient:attrcache:flush:done
nfsclient:attrcache:get:hit
nfsclient:attrcache:get:miss
nfsclient:attrcache:load:done
They pass the vnode, optionally the vattr if one is present (hit or load), and in the case of a load event, also a possible RPC error.
MFC after: 1 month Sponsored by: Google, Inc.
|
190293 |
22-Mar-2009 |
rwatson |
Add dtnfsclient, a first cut at an NFSv2/v3 client request DTrace provider. The NFS client exposes 'start' and 'done' probes for NFSv2 and NFSv3 RPCs when using the new RPC implementation, passing in the vnode, mbuf chain, credential, and NFSv2 or NFSv3 procedure number. For 'done' probes, the error number is also available.
Probes are named in the following way:
...
nfsclient:nfs2:write:start
nfsclient:nfs2:write:done
...
nfsclient:nfs3:access:start
nfsclient:nfs3:access:done
...
Access to the unmarshalled arguments is not easily available at this point in the stack, but the passed probe arguments are sufficient to do a lot of interesting things in practice. Technically, these probes may cover multiple RPC retransmits, and even multiple transactions if the transaction ID changes as a result of an authentication failure or a jukebox error from the server, but they usefully capture the intent of a single NFS request, such as access, getattr, write, etc.
Typical use might involve profiling RPC latency by system call, number of RPCs, how often a getattr leads to a call to access, when failed access control checks occur, etc. More detailed RPC information might best be provided by adding a krpc provider. It would also be useful to add NFS client probes for events such as the access cache or attribute cache satisfying requests without an RPC.
Sponsored by: Google, Inc. MFC after: 1 month
|
190220 |
21-Mar-2009 |
rwatson |
In nfs_request(), always exit using the nfsmout label once we're definitely doing an NFSv2 or NFSv3 RPC, rather than sometimes doing so and sometimes not. This makes it easier to add a DTrace return probe at a single point in the function.
MFC after: 1 week
|
190176 |
20-Mar-2009 |
jhb |
Expand the per-node access cache to cache permissions for multiple users. The number of entries in the cache defaults to 8 but is easily changed in nfsclient/nfs.h. When the cache is filled, the oldest cache entry is evicted when space is needed.
I mirrored the changes to the NFSv[23] client in the NFSv4 client to fix compile breakage. However, the NFSv4 client doesn't actually use the access cache currently.
Submitted by: rmacklem
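A minimal sketch of a fixed-size, per-node access cache keyed by uid, with oldest-entry eviction as described above. The structure and function names are hypothetical, and a logical clock stands in for whatever timestamp the real nfsnode uses; this illustrates the policy, it is not the nfsclient code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* The commit's default of 8 entries; the macro name is hypothetical. */
#define ACCESSCACHE_SIZE_SKETCH	8

/* One cached (uid, access mode) result; stamp == 0 marks an empty slot. */
struct access_entry_sketch {
	uid_t		uid;
	unsigned	mode;
	unsigned long	stamp;	/* logical load time; larger is newer */
};

/* Hypothetical per-node cache; zero-initialize before use. */
struct access_cache_sketch {
	struct access_entry_sketch ent[ACCESSCACHE_SIZE_SKETCH];
	unsigned long clock;
};

/* Return true and the cached mode if this uid has an entry. */
static bool
access_cache_get(const struct access_cache_sketch *ac, uid_t uid,
    unsigned *modep)
{
	for (size_t i = 0; i < ACCESSCACHE_SIZE_SKETCH; i++) {
		if (ac->ent[i].stamp != 0 && ac->ent[i].uid == uid) {
			*modep = ac->ent[i].mode;
			return (true);
		}
	}
	return (false);
}

/*
 * Record an access result. Reuse the uid's own slot if it has one;
 * otherwise take the slot with the smallest stamp, which is an empty
 * slot while any remain and the oldest entry once the cache is full.
 */
static void
access_cache_load(struct access_cache_sketch *ac, uid_t uid, unsigned mode)
{
	size_t victim = 0;
	size_t i;

	for (i = 0; i < ACCESSCACHE_SIZE_SKETCH; i++) {
		if (ac->ent[i].stamp != 0 && ac->ent[i].uid == uid) {
			victim = i;
			goto store;
		}
	}
	for (i = 1; i < ACCESSCACHE_SIZE_SKETCH; i++) {
		if (ac->ent[i].stamp < ac->ent[victim].stamp)
			victim = i;
	}
store:
	ac->ent[victim].uid = uid;
	ac->ent[victim].mode = mode;
	ac->ent[victim].stamp = ++ac->clock;
}
```

Eviction here is by load time, not by last use, matching "the oldest cache entry is evicted"; a hit does not refresh an entry's stamp.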
|
189639 |
10-Mar-2009 |
jhb |
- Remove code to set SAVENAME for CREATE or RENAME requests that get a -ve hit in the name cache. cache_lookup() doesn't actually return ENOENT for such requests to force the filesystem to do an explicit lookup, so this was effectively dead code.
- Grab the nfsnode mutex while writing to n_dmtime. We don't grab the lock when comparing the time against the cached directory mod time (just as we don't when comparing ctimes for +ve name cache hits), since attribute caching is already racy for NFS clients.
Discussed with: bde
|
189106 |
27-Feb-2009 |
bz |
For all files including net/vnet.h directly include opt_route.h and net/route.h.
Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.
We need to make sure that both opt_route.h and net/route.h are included before net/vnet.h because of the way MRT figures out the number of FIBs from the kernel option. If we do not, we end up with the default number of 1 when including net/vnet.h and array sizes are wrong.
This does not change the list of files which depend on opt_route.h but we can identify them now more easily.
|
188994 |
24-Feb-2009 |
jhb |
Bring back the code to prime the ACCESS cache when fetching attributes for an NFS file. The priming is now conditional on a new vfs.nfs.prime_access_cache sysctl. For now I've left priming disabled by default.
Requested by: scottl
|
188833 |
19-Feb-2009 |
jhb |
Enable caching of negative pathname lookups in the NFS client. To avoid stale entries, we save a copy of the directory's modification time when the first negative cache entry was added in the directory's NFS node. When a negative cache entry is hit during a pathname lookup, the parent directory's modification time is checked. If it has changed, all of the negative cache entries for that parent are purged and the lookup falls back to using the RPC. This required adding a new cache_purge_negative() method to the name cache to purge only negative cache entries for a given directory.
Submitted by: mohans, Rick Macklem, Ricardo Labiaga @ NetApp Reviewed by: mohans
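The invalidation rule above can be sketched in a few lines. In this hypothetical model (not the nfsclient or name cache code), dmtime plays the role of the saved directory modification time, nneg counts the directory's negative entries, and resetting nneg stands in for cache_purge_negative().

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical directory node: saved mtime plus count of negative entries. */
struct dir_sketch {
	struct timespec	dmtime;
	int		nneg;
};

static bool
ts_equal(const struct timespec *a, const struct timespec *b)
{
	return (a->tv_sec == b->tv_sec && a->tv_nsec == b->tv_nsec);
}

/* Add a negative entry; remember the mtime only for the first one. */
static void
neg_enter(struct dir_sketch *dp, const struct timespec *mtime)
{
	if (dp->nneg == 0)
		dp->dmtime = *mtime;
	dp->nneg++;
}

/*
 * On a negative hit, compare the directory's current mtime with the
 * saved one. If it changed, purge all negative entries for this
 * directory and return false so the caller falls back to the RPC.
 */
static bool
neg_hit_is_valid(struct dir_sketch *dp, const struct timespec *cur_mtime)
{
	if (ts_equal(&dp->dmtime, cur_mtime))
		return (true);
	dp->nneg = 0;
	return (false);
}
```

Purging every negative entry for the directory on a single mtime mismatch is coarse but cheap: only one timestamp per directory is kept, not one per name.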
|
188832 |
19-Feb-2009 |
jhb |
When fetching attributes for a file for NFSv3 mounts, do not perform an opportunistic ACCESS RPC to populate both the access and attribute caches of the file and instead always use a GETATTR RPC. On many modern NFS servers, an ACCESS RPC is much more expensive to service than a GETATTR RPC.
Submitted by: mohans
|
188831 |
19-Feb-2009 |
jhb |
Don't clear the attribute cache of a file when it is closed. A subsequent open() of the same file will load fresh attributes, so they do not need to be explicitly flushed in close() to guarantee close-to-open consistency. However, other file descriptors may still reference this file, and clearing the attributes in close() forces those other file descriptors to fetch fresh attributes the next time they need them.
Reviewed by: mohans MFC after: 1 week
|
188751 |
18-Feb-2009 |
jhb |
Reindent a small bit of code that was not 8-space indented like the rest of the nfs_lookup() function.
|
187830 |
28-Jan-2009 |
ed |
Last step of splitting up minor and unit numbers: remove minor().
Inside the kernel, the minor() function was responsible for obtaining the device minor number of a character device. Because we made device numbers dynamically allocated and independent of the unit number passed to make_dev() a long time ago, it was actually a misnomer. If you really want to obtain the device number, you should use dev2udev().
We already converted all the drivers to use dev2unit() to obtain the device unit number, which is still used by a lot of drivers. I've noticed not a single driver passes NULL to dev2unit(). Even if they would, its behaviour would make little sense. This is why I've removed the NULL check.
This commit removes minor(), minor2unit() and unit2minor() from the kernel. Because there was a naming collision with uminor(), we can rename umajor() and uminor() back to major() and minor(). This means that the makedev(3) manual page also applies to kernel space code now.
I suspect umajor() and uminor() aren't used that often in external code, but to make it easier for other parties to port their code, I've increased __FreeBSD_version to 800062.
|
187812 |
28-Jan-2009 |
rodrigc |
Fix parsing of acregmin, acregmax, acdirmin and acdirmax NFS mount options when passed as strings via nmount().
Submitted by: Jaakko Heinonen <jh saunalahti fi>
|
187526 |
21-Jan-2009 |
jhb |
Move the VA_MARKATIME flag for VOP_SETATTR() out into its own VOP: VOP_MARKATIME() since unlike the rest of VOP_SETATTR(), VA_MARKATIME can be performed while holding a shared vnode lock (the same functionality is done internally by VOP_READ which can run with a shared vnode lock). Add missing locking of the vnode interlock to the ufs implementation and remove a special note and test from the NFS client about not supporting the feature.
Inspired by: ups Tested by: pho
|
185571 |
02-Dec-2008 |
bz |
Rather than using hidden includes (with circular dependencies), directly include only the header files needed. This reduces the unneeded spamming of various headers into lots of files.
For now, this leaves us with very few modules including vnet.h and thus needing to depend on opt_route.h.
Reviewed by: brooks, gnn, des, zec, imp Sponsored by: The FreeBSD Foundation
|
184969 |
14-Nov-2008 |
dfr |
Switch the default rpc implementation for NFS back to the new code. I believe I have fixed the reported problems - if you still have trouble with it, please contact me with as much detail as possible so that I can track down any other issues as quickly as possible.
|
184920 |
13-Nov-2008 |
dfr |
Temporarily switch NFS back to the old RPC code while I try to diagnose and fix the problems a few people have noticed with the new code. People who want to continue testing the new code or who need RPCSEC_GSS support should use the new option NFS_NEWRPC to select it.
|
184588 |
03-Nov-2008 |
dfr |
Implement support for RPCSEC_GSS authentication to both the NFS client and server. This replaces the RPC implementation of the NFS client and server with the newer RPC implementation originally developed (actually ported from the userland sunrpc code) to support the NFS Lock Manager. I have tested this code extensively and I believe it is stable and that performance is at least equal to the legacy RPC implementation.
The NFS code currently contains support for both the new RPC implementation and the older legacy implementation inherited from the original NFS codebase. The default is to use the new implementation - add the NFS_LEGACYRPC option to fall back to the old code. When I merge this support back to RELENG_7, I will probably change this so that users have to 'opt in' to get the new code.
To use RPCSEC_GSS on either client or server, you must build a kernel which includes the KGSSAPI option and the crypto device. On the userland side, you must build at least a new libc, mountd, mount_nfs and gssd. You must install new versions of /etc/rc.d/gssd and /etc/rc.d/nfsd and add 'gssd_enable=YES' to /etc/rc.conf.
As long as gssd is running, you should be able to mount an NFS filesystem from a server that requires RPCSEC_GSS authentication. The mount itself can happen without any Kerberos credentials, but all access to the filesystem will be denied unless the accessing user has a valid ticket file in the standard place (/tmp/krb5cc_<uid>). There is currently no support for situations where the ticket file is in a different place, such as when the user logged in via SSH and has delegated credentials from that login. This restriction is also present in Solaris and Linux. In theory, we could improve this in the future, possibly using Brooks Davis' implementation of variant symlinks.
Supporting RPCSEC_GSS on a server is nearly as simple. You must create service creds for the server in the form 'nfs/<fqdn>@<REALM>' and install them in /etc/krb5.keytab. The standard heimdal utility ktutil makes this fairly easy. After the service creds have been created, you can add a '-sec=krb5' option to /etc/exports and restart both mountd and nfsd.
The only other difference an administrator should notice is that nfsd doesn't fork to create service threads any more. In normal operation, there will be two nfsd processes, one in userland waiting for TCP connections and one in the kernel handling requests. The latter process will create as many kthreads as required - these should be visible via 'top -H'. The code has some support for varying the number of service threads according to load but initially at least, nfsd uses a fixed number of threads according to the value supplied to its '-n' option.
Sponsored by: Isilon Systems MFC after: 1 month
|
184561 |
02-Nov-2008 |
trhodes |
Document a few sysctls in the NFS client and server code. Minor style(9) where applicable.
Approved by: alfred (slightly older version)
|
184554 |
02-Nov-2008 |
attilio |
Improve VFS locking:
- Implement real draining for vfs consumers by not relying on the mnt_lock and instead using a refcount to keep track of lock requesters.
- Due to the change above, remove the mnt_lock lockmgr because it is now useless.
- Due to the change above, vfs_busy() is no longer linked to a lockmgr. So change its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT, which basically replaces the LK_NOWAIT of the old version (which was already unlinked from the lockmgr), and MBF_MNTLSTLOCK, which provides the ability to drop the mountlist_mtx once the mnt interlock is held (an ability still desired by most consumers).
- The changes above make the stub used in vfs_mount_destroy(), which allows overriding the mnt_ref if running for more than 3 seconds, totally useless. Remove it, as it was designed to work with older versions. If a problem of "refcount held never going away" should appear, we will need to fix it properly rather than rely on such a hackish solution.
- Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even though the waiters were actually woken up. Just one place in vfs_mount_destroy() is left, because it is going to recycle the structure in any case, so it doesn't matter.
- Remove the markercnt refcount as it is useless.
This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly.
Discussed with: kib Tested by: pho
|
184413 |
28-Oct-2008 |
trasz |
Introduce accmode_t. This is required for NFSv4 ACLs: it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which are only 16 bits wide.
Approved by: rwatson (mentor)
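To make the 16-bit concern concrete, a small sketch: any access bit above bit 15 is silently lost when an access mask is stored in a variable the width of mode_t. The type names and the extra V* bit below are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-ins: mode_t is 16 bits wide (per the commit message), while
 * accmode_t is a wider integer type. Names here are hypothetical.
 */
typedef uint16_t	mode16_sketch_t;
typedef int		accmode_sketch_t;

#define VREAD_SKETCH	000400		/* fits in 16 bits */
#define VNEWBIT_SKETCH	(1 << 16)	/* hypothetical future V* bit */

/* Does an access mask survive a round trip through a 16-bit variable? */
static int
survives_mode_t(accmode_sketch_t accmode)
{
	mode16_sketch_t m = (mode16_sketch_t)accmode;	/* high bits dropped */

	return (m == accmode);
}
```

Conversion to an unsigned 16-bit type is well defined (it reduces modulo 65536), so the loss is silent rather than trapping, which is why a distinct, wider type is needed.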
|
184214 |
23-Oct-2008 |
des |
Fix a number of style issues in the MALLOC / FREE commit. I've tried to be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.
|
184205 |
23-Oct-2008 |
des |
Retire the MALLOC and FREE macros. They are an abomination unto style(9).
MFC after: 3 months
|
183754 |
10-Oct-2008 |
attilio |
Remove the unused struct thread argument from the bufobj interface. In particular, the KPI of the following functions is modified:
- bufobj_invalbuf()
- bufsync()
and of the BO_SYNC() "virtual method" of the buffer objects set. The main consumers of bufobj functions are affected by this change too; in particular, the functions whose KPI changed are:
- vinvalbuf()
- g_vfs_close()
Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit.
As a side note, please consider the passing of the 'curthread' argument to VOP_SYNC() (in bufsync()) only temporary, as it will be removed ASAP.
Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
183550 |
02-Oct-2008 |
zec |
Step 1.5 of importing the network stack virtualization infrastructure from the vimage project, as per plan established at devsummit 08/08: http://wiki.freebsd.org/Image/Notes200808DevSummit
Introduce INIT_VNET_*() initializer macros, VNET_FOREACH() iterator macros, and CURVNET_SET() context setting macros, all currently resolving to NOPs.
Prepare for virtualization of selected SYSCTL objects by introducing a family of SYSCTL_V_*() macros, currently resolving to their global counterparts, i.e. SYSCTL_V_INT() == SYSCTL_INT().
Move selected #defines from sys/sys/vimage.h to newly introduced header files specific to virtualized subsystems (sys/net/vnet.h, sys/netinet/vinet.h etc.).
All the changes are verified to have zero functional impact at this point in time by doing an MD5 comparison between pre- and post-change object files(*).
(*) netipsec/keysock.c did not validate depending on compile time options.
Implemented by: julian, bz, brooks, zec Reviewed by: julian, bz, brooks, kris, rwatson, ... Approved by: julian (mentor) Obtained from: //depot/projects/vimage-commit2/... X-MFC after: never Sponsored by: NLnet Foundation, The FreeBSD Foundation
|
183330 |
24-Sep-2008 |
jhb |
Part 1 of making shared lookups more resilient with respect to forced unmounts. When we upgrade a vnode lock from shared to exclusive during a name cache lookup, fail the lookup with EBADF if the vnode is invalidated while we are waiting for the exclusive lock.
Also, for correctness (though I'm not sure it can occur in practice), downgrade an exclusively locked vnode if it should be share locked.
Tested by: pho
|
183215 |
20-Sep-2008 |
kib |
fdescfs, devfs, mqueuefs, nfs, portalfs, pseudofs, tmpfs and xfs initialize the vattr structure in VOP_GETATTR() with VATTR_NULL(), vattr_null() or by zeroing it. Remove these to allow preinitialization of fields work in vn_stat(). This is needed to get birthtime initialized correctly.
Submitted by: Jaakko Heinonen <jh saunalahti fi> Discussed on: freebsd-fs MFC after: 1 month
|
183005 |
13-Sep-2008 |
rodrigc |
Add code to parse NFS mount options passed as individual items of the nmount() iovec. This will allow us to move away from gathering up all the NFS mount options as a single "struct nfs_args" to be passed down through nmount(). This will make adding new NFS mount options much easier. Many, many thanks to Doug Rabson, who took my initial patches and cleaned them up.
Reviewed by: dfr MFC after: 3 months
|
182542 |
31-Aug-2008 |
attilio |
Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.
Manpages are updated accordingly.
Tested by: Diego Sardina <siarodx at gmail dot com>
|
182371 |
28-Aug-2008 |
attilio |
Decontextualize the VOP_GETATTR / VOP_SETATTR couplet, as the thread passed was always curthread and thus useless.
Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
|
181803 |
17-Aug-2008 |
bz |
Commit step 1 of the vimage project, (network stack) virtualization work done by Marko Zec (zec@).
This is the first in a series of commits over the course of the next few weeks.
Mark all uses of global variables to be virtualized with a V_ prefix. Use macros to map them back to their global names for now, so this is a NOP change only.
We hope to have caught at least 85-90% of what is needed so we do not invalidate a lot of outstanding patches again.
Obtained from: //depot/projects/vimage-commit2/... Reviewed by: brooks, des, ed, mav, julian, jamie, kris, rwatson, zec, ... (various people I forgot, different versions) md5 (with a bit of help) Sponsored by: NLnet Foundation, The FreeBSD Foundation X-MFC after: never V_Commit_Message_Reviewed_By: more people than the patch
|
180780 |
24-Jul-2008 |
dfr |
Try again not to use a userspace pointer in the kernel when trying to record the hostname which we need for NLM requests. The previous patch was incomplete.
PR: 125849 Pointy hat: dfr
|
180779 |
24-Jul-2008 |
dfr |
Don't use a userspace pointer in the kernel when trying to record the hostname which we need for NLM requests.
PR: 125849
|
180724 |
22-Jul-2008 |
ed |
Move the NFS/RPC code away from lbolt.
The kernel has a special wchan called `lbolt', which is triggered each second. It doesn't seem to be used a lot and it seems pretty redundant, because we can specify a timeout value to the *sleep() routines. In an attempt to eventually remove lbolt, make the NFS/RPC code use a timeout of `hz' when trying to reconnect.
Only the TTY code (not MPSAFE TTY) and the VFS syncer seem to use lbolt now.
Reviewed by: attilio, jhb Approved by: philip (mentor), alfred, dfr
|
180291 |
05-Jul-2008 |
rwatson |
Introduce a new lock, hostname_mtx, and use it to synchronize access to global hostname and domainname variables. Where necessary, copy to or from a stack-local buffer before performing copyin() or copyout(). A few uses, such as in cd9660 and daemon_saver, remain under-synchronized and will require further updates.
Correct a bug in which a failed copyin() of domainname would leave domainname potentially corrupted.
MFC after: 3 weeks
|
180025 |
26-Jun-2008 |
dfr |
Re-implement the client side of rpc.lockd in the kernel. This implementation provides the correct semantics for flock(2) style locks which are used by the lockf(1) command line tool and the pidfile(3) library. It also implements recovery from server restarts and ensures that dirty cache blocks are written to the server before obtaining locks (allowing multiple clients to use file locking to safely share data).
Sponsored by: Isilon Systems PR: 94256 MFC after: 2 weeks
|
179333 |
27-May-2008 |
attilio |
Once ENOLCK is detected, we expect to retry the acquisition. However, in the edge case where the flush completes and the while loop is not executed again, nfs_flush() (and nfs4_flush()) can return the wrong error value, ENOLCK. Reset it to 0, as expected in that case.
Reported by: kris Reviewed by: kib
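The edge case can be modeled with a small retry loop. In this hypothetical sketch (not the actual nfs_flush() code), the first lock attempt fails with ENOLCK while another thread writes out every dirty buffer, so the loop condition is already false on the retry; clearing error before retrying is what keeps ENOLCK from leaking out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical flush state: ndirty buffers remain; if busy_once is set,
 * the first lock attempt fails while the lock holder flushes everything.
 */
struct flush_state_sketch {
	int	ndirty;
	int	busy_once;
};

static int
lock_buf_sketch(struct flush_state_sketch *fs)
{
	if (fs->busy_once) {
		fs->busy_once = 0;
		fs->ndirty = 0;	/* the other thread wrote out every buffer */
		return (ENOLCK);
	}
	return (0);
}

static int
flush_buffers_sketch(struct flush_state_sketch *fs)
{
	int error = 0;

	while (fs->ndirty > 0) {
		error = lock_buf_sketch(fs);
		if (error == ENOLCK) {
			/*
			 * A retry is expected. Clear the error so that, if
			 * the loop condition is now false, ENOLCK is not
			 * returned to the caller.
			 */
			error = 0;
			continue;
		}
		if (error != 0)
			return (error);
		fs->ndirty--;	/* "write" one buffer */
	}
	return (error);
}
```

Without the error = 0 line, the busy_once case would exit the loop with error still set to ENOLCK even though nothing is left to flush.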
|
179039 |
16-May-2008 |
benno |
Allow the block size used when booting over NFS to be overridden. It defaults to 8192 bytes which is the size currently used.
|
178888 |
09-May-2008 |
julian |
Add code to allow the system to handle multiple routing tables. This particular implementation is designed to be fully backwards compatible and to be MFC-able to 7.x (and 6.x).
Currently the only protocol that can make use of the multiple tables is IPv4. Similar functionality exists in OpenBSD and Linux.
From my notes:
-----
One thing where FreeBSD has been falling behind, and which by chance I have some time to work on is "policy based routing", which allows different packet streams to be routed by more than just the destination address.
Constraints:
------------
I want to make some form of this available in the 6.x tree (and by extension 7.x), but FreeBSD in general needs it, so I might as well do it in -current and back port the portions I need.
One of the ways that this can be done is to have the ability to instantiate multiple kernel routing tables (which I will now refer to as "Forwarding Information Bases" or "FIBs" for political correctness reasons). Which FIB a particular packet uses to make the next hop decision can be decided by a number of mechanisms. The policies these mechanisms implement are the "Policies" referred to in "Policy based routing".
One of the constraints I have if I try to back port this work to 6.x is that it must be implemented as an EXTENSION to the existing ABIs in 6.x, so that third-party applications do not need to be recompiled within the lifespan of the branch.
This first version will not have some of the bells and whistles that will come with later versions. It will, for example, be limited to 16 tables in the first commit.
Implementation method, Compatible version. (part 1)
-------------------------------
For this reason I have implemented a "sufficient subset" of a multiple routing table solution in Perforce, and back-ported it to 6.x (also in Perforce, though not always caught up with what I have done in -current/P4). The subset allows a number of FIBs to be defined at compile time (8 is sufficient for my purposes in 6.x) and implements the changes needed to allow IPv4 to use them. I have not done the changes for IPv6 simply because I do not need it, and I do not have enough knowledge of IPv6 (e.g. neighbor discovery) needed to do it.
Other protocol families are left untouched and should there be users with proprietary protocol families, they should continue to work and be oblivious to the existence of the extra FIBs.
To understand how this is done, one must know that the current FIB code starts everything off with a single dimensional array of pointers to FIB head structures (One per protocol family), each of which in turn points to the trie of routes available to that family.
The basic change in the ABI-compatible version is to extend that array into a 2-dimensional array, so that instead of protocol family X looking at rt_tables[X] for the table it needs, it looks at rt_tables[Y][X], where Y is always 0 for all protocol families except IPv4. Code that is unaware of the change always just sees the first row of the table, which of course looks just like the one-dimensional array that existed before.
The entry points rtrequest(), rtalloc(), rtalloc1(), rtalloc_ign() are all maintained, but refer only to the first row of the array, so that existing callers in proprietary protocols can continue to do the "right thing". Some new entry points are added, for the exclusive use of ipv4 code called in_rtrequest(), in_rtalloc(), in_rtalloc1() and in_rtalloc_ign(), which have an extra argument which refers the code to the correct row.
In addition, there are some new entry points (currently called rtalloc_fib() and friends) that check the Address family being looked up and call either rtalloc() (and friends) if the protocol is not IPv4 forcing the action to row 0 or to the appropriate row if it IS IPv4 (and that info is available). These are for calling from code that is not specific to any particular protocol. The way these are implemented would change in the non ABI preserving code to be added later.
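A minimal sketch of the two-dimensional table and the dispatch rules described above. All names and sizes ending in _sketch are hypothetical (the real table is rt_tables[][], indexed [FIB][family] as stated above); legacy callers only ever index row 0, while the rtalloc_fib()-style path honors the requested FIB only for AF_INET.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical sizes: the commit mentions a 16-table limit at first. */
#define NUMFIBS_SKETCH	16
#define NAF_SKETCH	64	/* comfortably above AF_INET and AF_INET6 */

/* Stand-in for a per-family route tree head. */
struct rt_head_sketch {
	int	fib;
	int	af;
};

static struct rt_head_sketch heads_sketch[NUMFIBS_SKETCH][NAF_SKETCH];
/* rt_tables[Y][X]: Y is the FIB, X the protocol family. */
static struct rt_head_sketch *rt_tables_sketch[NUMFIBS_SKETCH][NAF_SKETCH];

static void
rt_tables_init_sketch(void)
{
	for (int fib = 0; fib < NUMFIBS_SKETCH; fib++) {
		for (int af = 0; af < NAF_SKETCH; af++) {
			heads_sketch[fib][af].fib = fib;
			heads_sketch[fib][af].af = af;
			rt_tables_sketch[fib][af] = &heads_sketch[fib][af];
		}
	}
}

/* Unaware (legacy) callers see only row 0, i.e. the old 1-D array. */
static struct rt_head_sketch *
rt_table_legacy(int af)
{
	assert(af >= 0 && af < NAF_SKETCH);
	return (rt_tables_sketch[0][af]);
}

/* Only IPv4 honors the requested FIB; other families use row 0. */
static struct rt_head_sketch *
rt_table_fib(int af, int fib)
{
	assert(af >= 0 && af < NAF_SKETCH);
	assert(fib >= 0 && fib < NUMFIBS_SKETCH);
	if (af != AF_INET)
		fib = 0;
	return (rt_tables_sketch[fib][af]);
}
```

Keeping row 0 identical to the old one-dimensional array is what lets unmodified callers of rtalloc() and friends keep working without knowing the extra rows exist.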
One feature of the first version of the code is that for ipv4, the interface routes show up automatically on all the FIBs, so that no matter what FIB you select you always have the basic direct attached hosts available to you. (rtinit() does this automatically).
You CAN delete an interface route from one FIB should you want to but by default it's there. ARP information is also available in each FIB. It's assumed that the same machine would have the same MAC address, regardless of which FIB you are using to get to it.
This brings us to how the correct FIB is selected for an outgoing IPv4 packet.
Firstly, all packets have a FIB associated with them. If nothing has been done to change it, it will be FIB 0. The FIB is changed in the following ways.
Packets fall into one of a number of classes.
1/ locally generated packets, coming from a socket/PCB. Such packets select a FIB from a number associated with the socket/PCB. This in turn is inherited from the process, but can be changed by a socket option. The process in turn inherits it on fork. I have written a utility called setfib that acts a bit like nice(1):
setfib -3 ping target.example.com # will use fib 3 for ping.
It is an obvious extension to make it a property of a jail but I have not done so. It can be achieved by combining the setfib and jail commands.
2/ packets received on an interface for forwarding. By default these packets would use table 0 (or possibly a number settable in a sysctl, not yet implemented), but prior to routing, the firewall can inspect them (see below). (In the future you may possibly be able to associate a FIB with packets received on an interface, via an ifconfig argument, but not yet.)
3/ packets inspected by a packet classifier, which can arbitrarily associate a fib with them on a packet-by-packet basis. A fib assigned to a packet by a packet classifier (such as ipfw) would override a fib associated by a more default source (such as cases 1 or 2).
4/ a tcp listen socket associated with a fib will generate accept sockets that are associated with that same fib.
5/ Packets generated in response to some other packet (e.g. reset or icmp packets). These should use the FIB associated with the packet being responded to.
6/ Packets generated during encapsulation. gif, tun and other tunnel interfaces will encapsulate using the FIB that was in effect with the process that set up the tunnel. Thus, setfib 1 ifconfig gif0 [tunnel instructions] will set the fib for the tunnel to use to be fib 1.
Routing messages would be associated with their process, and thus select one FIB or another. Messages from the kernel would be associated with the fib they refer to and would only be received by a routing socket associated with that fib (not yet implemented).
In addition Netstat has been edited to be able to cope with the fact that the array is now 2 dimensional. (It looks in system memory using libkvm (!)). Old versions of netstat see only the first FIB.
In addition two sysctls are added to give: a) the number of FIBs compiled in (active) b) the default FIB of the calling process.
Early testing experience:
-------------------------
Basically our (IronPort's) appliance does this functionality already using ipfw fwd but that method has some drawbacks.
For example, it can't fully simulate a routing table because it can't influence the socket's choice of local address when a connect() is done.
Testing during the generation of these changes has been remarkably smooth so far. Multiple tables have co-existed with no notable side effects, and packets have been routed accordingly.
ipfw has grown 2 new keywords:
setfib N ip from any to any
count ip from any to any fib N
In pf there seems to be a requirement to be able to give symbolic names to the fibs but I do not have that capacity. I am not sure if it is required.
SCTP, interestingly enough, has built-in support for this, called VRFs in Cisco parlance. It will be interesting to see how it handles this when it suddenly actually does something.
Where to next:
--------------------
After committing the ABI-compatible version and MFCing it, I'd like to proceed in a forward direction in -current. This will result in some roto-tilling in the routing code.
Firstly: the current code's idea of having a separate tree per protocol family, all of the same format, and pointed to by the 1 dimensional array is a bit silly. Especially when one considers that there is code that makes assumptions about every protocol having the same internal structures there. Some protocols don't WANT that sort of structure. (for example the whole idea of a netmask is foreign to appletalk). This needs to be made opaque to the external code.
My suggested first change is to add routing method pointers to the 'domain' structure, along with information pointing the data. instead of having an array of pointers to uniform structures, there would be an array pointing to the 'domain' structures for each protocol address domain (protocol family), and the methods this reached would be called. The methods would have an argument that gives FIB number, but the protocol would be free to ignore it.
When the ABI can be changed it raises the possibilty of the addition of a fib entry into the "struct route". Currently, the structure contains the sockaddr of the desination, and the resulting fib entry. To make this work fully, one could add a fib number so that given an address and a fib, one can find the third element, the fib entry.
Interaction with the ARP layer/ LL layer would need to be revisited as well. Qing Li has been working on this already.
This work was sponsored by Ironport Systems/Cisco
Reviewed by: several including rwatson, bz and mlair (parts each) Obtained from: Ironport systems/Cisco
|
178429 |
22-Apr-2008 |
phk |
Now that all platforms use genclock, shuffle things around slightly for better structure.
Much of this is related to <sys/clock.h>, which should really have been called <sys/calendar.h>, but unless and until we need the name, the repocopy can wait.
In general the kernel does not know about minutes, hours, days, timezones, daylight savings time, leap-years and such. All that is theoretically a matter for userland only.
Parts of the kernel code do, however, care: badly designed filesystems store timestamps in local time, and RTC chips almost universally track time in a YY-MM-DD HH:MM:SS format, sometimes in the local timezone instead of UTC. For this we have <sys/clock.h>.
<sys/time.h> on the other hand, deals with time_t, timeval, timespec and so on. These know only seconds and fractions thereof.
Move inittodr() and resettodr() prototypes to <sys/time.h>. Retain the names as it is one of the few surviving PDP/VAX references.
Move startrtclock() to <machine/clock.h> on relevant platforms; it is an MD call between machdep.c and clock.c. Remove references to it elsewhere.
Remove a lot of unnecessary <sys/clock.h> includes.
Move the machdep.disable_rtc_set sysctl to subr_rtc.c where it belongs. XXX: should be kern.disable_rtc_set really, it's not MD.
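To illustrate the calendar side of the split, the sketch below converts an RTC-style YY-MM-DD HH:MM:SS reading into the plain seconds count that time_t carries. It assumes a four-digit year and a UTC reading. The function names are hypothetical; the kernel has its own helpers in this area. The day count uses the well-known days-from-civil algorithm for the proleptic Gregorian calendar.

```c
#include <stdint.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (m 1-12, d 1-31). */
static int64_t
days_from_civil(int64_t y, unsigned m, unsigned d)
{
	y -= m <= 2;			/* Jan/Feb count as the previous year */
	const int64_t era = (y >= 0 ? y : y - 399) / 400;
	const unsigned yoe = (unsigned)(y - era * 400);		/* [0, 399] */
	const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;
	const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
	return (era * 146097 + (int64_t)doe - 719468);
}

/* Hypothetical: turn a UTC calendar reading into seconds since the epoch. */
static int64_t
ymdhms_to_secs(int year, int mon, int day, int hour, int min, int sec)
{
	return (days_from_civil(year, (unsigned)mon, (unsigned)day) * 86400 +
	    hour * 3600 + min * 60 + sec);
}
```

Leap years, month lengths and similar rules live entirely in this conversion. Once a value is in seconds, code that only deals in time_t/timespec never needs them.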
|
178243 |
16-Apr-2008 |
kib |
Move the head of byte-level advisory lock list from the filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock.
Before the change, the vop_advlock and vop_advlockasync took the unlocked vnode and dereferenced the fs-private inode data, racing with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode.
The implementation of the lf_purgelocks() is submitted by dfr.
Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
|
177633 |
26-Mar-2008 |
dfr |
Add the new kernel-mode NFS Lock Manager. To use it instead of the user-mode lock manager, build a kernel with the NFSLOCKD option and add '-k' to 'rpc_lockd_flags' in rc.conf.
Highlights include:
* Thread-safe kernel RPC client - many threads can use the same RPC client handle safely with replies being de-multiplexed at the socket upcall (typically driven directly by the NIC interrupt) and handed off to whichever thread matches the reply. For UDP sockets, many RPC clients can share the same socket. This allows the use of a single privileged UDP port number to talk to an arbitrary number of remote hosts.
* Single-threaded kernel RPC server. Adding support for multi-threaded server would be relatively straightforward and would follow approximately the Solaris KPI. A single thread should be sufficient for the NLM since it should rarely block in normal operation.
* Kernel mode NLM server supporting cancel requests and granted callbacks. I've tested the NLM server reasonably extensively - it passes both my own tests and the NFS Connectathon locking tests running on Solaris, Mac OS X and Ubuntu Linux.
* Userland NLM client supported. While the NLM server doesn't have support for the local NFS client's locking needs, it does have to field async replies and granted callbacks from remote NLMs that the local client has contacted. We relay these replies to the userland rpc.lockd over a local domain RPC socket.
* Robust deadlock detection for the local lock manager. In particular it will detect deadlocks caused by a lock request that covers more than one blocking request. As required by the NLM protocol, all deadlock detection happens synchronously - a user is guaranteed that if a lock request isn't rejected immediately, the lock will eventually be granted. The old system allowed for a 'deferred deadlock' condition where a blocked lock request could wake up and find that some other deadlock-causing lock owner had beaten them to the lock.
* Since both local and remote locks are managed by the same kernel locking code, local and remote processes can safely use file locks for mutual exclusion. Local processes have no fairness advantage compared to remote processes when contending to lock a region that has just been unlocked - the local lock manager enforces a strict first-come first-served model for both local and remote lockers.
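A common way to detect deadlock synchronously is to look for a cycle in a wait-for graph, where an edge A -> B means lock owner A is blocked on a lock held by B. The sketch below is a generic illustration of that technique. It uses hypothetical names and a fixed-size adjacency matrix, and it is not the kernel's lock manager code. A request that would block on several owners adds one edge per blocking owner. The request is rejected immediately with EDEADLK if any of those owners already waits, directly or transitively, on the requester. This gives the "reject now or eventually grant" behaviour described above.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAXOWNERS 16

/* waits_for[a][b] is true when owner a is blocked waiting on owner b. */
static bool waits_for[MAXOWNERS][MAXOWNERS];

/* Depth-first search: can 'from' reach 'target' along wait-for edges? */
static bool
reaches(int from, int target, bool *visited)
{
	if (from == target)
		return (true);
	if (visited[from])
		return (false);
	visited[from] = true;
	for (int next = 0; next < MAXOWNERS; next++)
		if (waits_for[from][next] && reaches(next, target, visited))
			return (true);
	return (false);
}

/*
 * 'requester' would block on every owner in 'blockers'. Reject at once if
 * any blocker already (transitively) waits on the requester; otherwise
 * record the new edges and let the request sleep.
 */
static int
lock_request_would_block(int requester, const int *blockers, int nblockers)
{
	for (int i = 0; i < nblockers; i++) {
		bool visited[MAXOWNERS];

		memset(visited, 0, sizeof(visited));
		if (reaches(blockers[i], requester, visited))
			return (EDEADLK);
	}
	for (int i = 0; i < nblockers; i++)
		waits_for[requester][blockers[i]] = true;
	return (0);
}
```

In this sketch, the second request below blocks on owners 2 and 0. Only the edge to owner 0 closes a cycle. That is the "covers more than one blocking request" case, where a check against a single blocker could miss the deadlock.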
Sponsored by: Isilon Systems PR: 95247 107555 115524 116679 MFC after: 2 weeks
|
177599 |
25-Mar-2008 |
ru |
Replaced the misleading uses of a historical artefact M_TRYWAIT with M_WAIT. Removed dead code that assumed that M_TRYWAIT can return NULL; it's not true since the advent of MBUMA.
Reviewed by: arch
There are ongoing disputes as to whether we want to switch to directly using UMA flags M_WAITOK/M_NOWAIT for mbuf(9) allocation.
|
177493 |
22-Mar-2008 |
jeff |
- Complete part of the unfinished bufobj work by consistently using BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
- Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock.
- Exploit this new lock order to simplify softdep_check_suspend().
- A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find.
Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
|
177253 |
16-Mar-2008 |
rwatson |
In keeping with style(9)'s recommendations on macros, use a ';' after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr.
MFC after: 1 month Discussed with: imp, rink
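A lightweight parser that does not expand macros sees a file-scope `FOO(...)` with no trailing ';' as an incomplete statement running into the next declaration. The sketch below uses a hypothetical `MY_SYSINIT` macro, which is not the real SYSINIT definition (that one also places its entry in a linker set). Its expansion deliberately leaves off the terminating semicolon, so each invocation must supply one, as style(9) recommends.

```c
#include <stddef.h>

struct sysinit_sketch {
	const char	*name;
	void		(*func)(void *);
	void		*arg;
};

/*
 * Hypothetical macro: expands to a static object definition but leaves the
 * final ';' to the caller, so every invocation reads like a declaration.
 */
#define MY_SYSINIT(uniq, fn, a)						\
	static const struct sysinit_sketch uniq##_sys_init = {		\
		#uniq, (fn), (a)					\
	}

static int init_count;
static int one = 1;

static void
count_init(void *arg)
{
	init_count += *(int *)arg;
}

MY_SYSINIT(counter, count_init, &one);	/* the ';' style(9) asks for */

/* Run the registered hook; a real kernel walks a linker set instead. */
static void
run_sysinit(void)
{
	counter_sys_init.func(counter_sys_init.arg);
}
```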
|
176824 |
05-Mar-2008 |
rodrigc |
Expand the nfs_opts array to include all possible string mount options that mount_nfs could pass down, if it passed down string mount options. Right now, mount_nfs just passes down a single mount option named "nfs_args" with a fully initialized 'struct nfs_args'.
In future commits, we will add code to the kernel for parsing stringified NFS mount options, so that we can convert mount_nfs to pass string options from userspace to kernel, instead of an initialized struct nfs_args.
|
176823 |
05-Mar-2008 |
rodrigc |
In nfs_mount(), default initialize struct nfs_args the same way that it is default initialized in revision 1.77 of mount_nfs.c.
Right now, this is a no-op, because currently we initialize struct nfs_args in mount_nfs in userspace, and pass it down into the kernel via nmount(), so we overwrite whatever we initialize here with the value passed in from userspace.
However, this lays the groundwork for moving away from passing struct nfs_args from userspace to kernel via nmount(), so that we can instead pass string mount options via nmount() which can be parsed in the kernel. This will make it easier to add new NFS mount options.
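The pattern these two entries work toward is to default-initialize the arguments in the kernel and then override individual fields from string options. The sketch below shows that pattern. The structure, field names and default values are hypothetical stand-ins chosen for illustration; they are not the real `struct nfs_args` or the defaults in mount_nfs.c. For simplicity the sketch parses "name=value" text, whereas nmount(2) itself passes options as name/value pairs in an iovec array.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical subset of NFS mount arguments. */
struct nfs_args_sketch {
	int	rsize;
	int	wsize;
	int	use_tcp;
};

/* Step 1: fill in defaults before any options are parsed. */
static void
nfs_args_defaults(struct nfs_args_sketch *a)
{
	a->rsize = 8192;	/* illustrative default only */
	a->wsize = 8192;	/* illustrative default only */
	a->use_tcp = 0;
}

/* Step 2: apply one "name" or "name=value" option; returns 0 or -1. */
static int
nfs_args_apply(struct nfs_args_sketch *a, const char *opt)
{
	const char *eq = strchr(opt, '=');
	size_t namelen = eq != NULL ? (size_t)(eq - opt) : strlen(opt);

	if (eq == NULL && namelen == 3 && strncmp(opt, "tcp", 3) == 0) {
		a->use_tcp = 1;
		return (0);
	}
	if (eq == NULL)
		return (-1);
	if (namelen == 5 && strncmp(opt, "rsize", 5) == 0)
		a->rsize = atoi(eq + 1);
	else if (namelen == 5 && strncmp(opt, "wsize", 5) == 0)
		a->wsize = atoi(eq + 1);
	else
		return (-1);		/* unknown option */
	return (0);
}
```

Options the caller never mentions keep their defaults. This is what makes adding a new option cheap: only a new branch in the parser is needed, not a new field in a structure shared with userland.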
|
176559 |
25-Feb-2008 |
attilio |
Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is always curthread.
As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits.
Tested by: Andrea Barberio <insomniac at slackware dot it>
|
176519 |
24-Feb-2008 |
attilio |
Introduce some functions in the vnode locks namespace and in the ffs namespace in order to handle lockmgr fields in a controlled way, rather than spreading bogus stubs all around:
- VN_LOCK_AREC() allows lock recursion for a specified vnode
- VN_LOCK_ASHARE() allows lock sharing for a specified vnode
In FFS land:
- BUF_AREC() allows lock recursion for a specified buffer lock
- BUF_NOREC() disallows recursion for a specified buffer lock
Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
|
176374 |
17-Feb-2008 |
yar |
Prevent the NFS client from losing MNT_ROOTFS on the root file system. In particular, stop overwriting mount point flags in nfs_mountdiskless() because now they are set elsewhere. (They were _initialized_ by that function in the 4.4BSD days, when mount structures were not allocated in a centralized manner -- see rev. 1.1 of this file.)
Fix nfs_mount(), which happened to depend on the loss of MNT_ROOTFS when it came to update handling.
Also note that mountnfs() no longer handles updates. Now they shouldn't reach this function, so printf a diagnostic message if that happens due to a coding error.
|
176249 |
13-Feb-2008 |
attilio |
- Add real assertions to lockmgr locking primitives. A couple of notes for this:
  * WITNESS support, when enabled, is only used for shared locks in order to avoid problems with the "disowned" locks
  * KA_HELD and KA_UNHELD only exist in the lockmgr namespace in order to assert whether a generic thread (not curthread) owns the lock or not. Really, this kind of check is bogus, but it seems very widespread in the consumers' code. So, for the moment, we cater to this untrusted behaviour until the consumers are fixed and the options can be removed (hopefully during the 8.0-CURRENT lifecycle)
  * Implementing KA_HELD and KA_UNHELD (not supported natively by WITNESS) made necessary the introduction of LA_MASKASSERT, which specifies the range for default lock assertion flags
  * In other respects, lockmgr_assert() follows exactly what other locking primitives offer for this operation.
- Build real assertions for buffer cache locks on the top of lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp) paradigm.
- Add checks at lock destruction time and use a cookie for verifying lock integrity at any operation.
- Redefine BUF_LOCKFREE() in order to not use a direct assert but let it rely on the aforementioned destruction time check.
The KPI is evidently broken, so a __FreeBSD_version bump and a manpage update are necessary and will be committed soon.
Side note: lockmgr_assert() will be used soon in order to implement real assertions in the vnode namespace replacing the legacy and still bogus "VOP_ISLOCKED()" way.
Tested by: kris (earlier version) Reviewed by: jhb
|
176224 |
13-Feb-2008 |
jhb |
Consolidate the code to generate a new XID for a NFS request into a nfs_xid_gen() function instead of duplicating the logic in both nfsm_rpchead() and the NFS3ERR_JUKEBOX handling in nfs_request().
MFC after: 1 week Submitted by: mohans (a long while ago)
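Consolidating XID generation means a single function owns the counter and the lock around it. The following is a minimal sketch under stated assumptions. The name `xid_gen_sketch`, the pthread mutex standing in for a kernel lock, the caller-supplied seed, and the choice to skip 0 on wraparound are all illustrative. The sketch does not claim to reproduce nfs_xid_gen() line for line.

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t xid_mtx = PTHREAD_MUTEX_INITIALIZER;
static uint32_t xid_last;	/* 0 means "not yet seeded" */

/*
 * Hypothetical single source of request XIDs, shared by the code that
 * builds a new request header and the code that re-sends after a delay.
 */
static uint32_t
xid_gen_sketch(uint32_t seed)
{
	uint32_t xid;

	pthread_mutex_lock(&xid_mtx);
	if (xid_last == 0)
		xid_last = seed;	/* e.g. a random value in practice */
	if (++xid_last == 0)		/* never hand out 0 */
		xid_last++;
	xid = xid_last;
	pthread_mutex_unlock(&xid_mtx);
	return (xid);
}
```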
|
176198 |
11-Feb-2008 |
kris |
Switch the default NFS mount mode from UDP to TCP. UDP mounts are a historical relic, and are no longer appropriate for either LAN or WAN mounting. At modern (gigabit and 10 gigabit) LAN speeds packet loss from socket buffer fill events is common, and sequence numbers wrap quickly enough that data corruption is possible. TCP solves both of these problems without imposing significant overhead.
MFC after: 1 month
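The message does not say which sequence number it means. As a rough worked example, assume it is the 16-bit IPv4 identification field used to reassemble fragmented UDP datagrams. Assume also that each datagram consumes one identifier, the link is saturated, and header overhead can be ignored. Under those assumptions, the time to wrap is 2^16 divided by the datagram rate.

```c
/*
 * Seconds until a 16-bit identifier space wraps, assuming one identifier
 * per datagram on a saturated link and ignoring header overhead.
 */
static double
id_wrap_seconds(double link_bits_per_sec, double datagram_bytes)
{
	const double ids = 65536.0;	/* 2^16 identifiers */
	double datagrams_per_sec = link_bits_per_sec / (datagram_bytes * 8.0);

	return (ids / datagrams_per_sec);
}
```

With 8 KB datagrams, this gives about 4.3 s at 1 Gbit/s and about 0.43 s at 10 Gbit/s. That window is short enough for a delayed fragment to be matched with the wrong datagram.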
|
176134 |
09-Feb-2008 |
attilio |
namei() can call the underlying nfs_readlink(), passing a struct uio pointer whose owner thread is NULL. This leads the subsequent VOP_ISLOCKED() call in nfs_upgrade_vnlock() to panic, as it now accepts only curthread. Fix nfs_upgrade_vnlock() and nfs_downgrade_vnlock() so that they no longer use the struct thread pointer passed as an argument (it is really no longer required there, as vn_lock() and VOP_UNLOCK() no longer take it). Using curthread in its place introduces no ambiguity, as LK_EXCLOTHER should be handled as a "not locked" request by both functions.
Reported by: kris Tested by: kris Reviewed by: ups
|
176116 |
08-Feb-2008 |
attilio |
Convert all explicit instances of VOP_ISLOCKED(arg, NULL) into VOP_ISLOCKED(arg, curthread). Now, VOP_ISLOCKED() and lockstatus() should only take curthread as an argument; this will lead to axing the additional argument from both functions, making the code cleaner.
Reviewed by: jeff, kib
|
175635 |
24-Jan-2008 |
attilio |
Clean up the lockmgr interface and exported KPI:
- Remove the "thread" argument from the lockmgr() function, as it is always curthread now
- Axe the lockcount() function, as it is no longer used
- Axe LOCKMGR_ASSERT(), as it is really bogus and not currently used. Hopefully it will soon be replaced by something suitable.
- Remove the prototype for dumplockinfo(), as the function is no longer present
Additionally:
- Introduce a KASSERT() in lockstatus() so that it accepts only curthread or NULL, as only those should be passed
- Do a little bit of style(9) cleanup on lockmgr.h
The KPI is heavily broken by this change, so manpages and __FreeBSD_version will be modified accordingly by further commits.
Tested by: matteo
|
175486 |
19-Jan-2008 |
attilio |
- Introduce the function lockmgr_recursed(), which returns true if the lockmgr lkp, when held in exclusive mode, is recursed
- Introduce the function BUF_RECURSED(), which does the same for bufobj locks, built on top of lockmgr_recursed()
- Introduce the function BUF_ISLOCKED(), which works like its counterpart VOP_ISLOCKED(9), showing the state of the lockmgr lock linked with the bufobj
BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of the bogus BUF_REFCNT() in a more explicit and SMP-compliant way. This allows us to axe BUF_REFCNT(), leaving the function lockcount() totally unused in our stock kernel. Further commits will axe lockcount() as well, as part of the lockmgr() cleanup.
The KPI is, obviously, broken, so further commits will update the manpages and __FreeBSD_version.
Tested by: kris (on UFS and NFS)
|
175294 |
13-Jan-2008 |
attilio |
VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used in conjunction with a 'thread' argument that is always curthread. Remove the useless extra argument and explicitly pass curthread to lower-layer functions when necessary.
The KPI is broken by this change, which should affect several ports, so the version bump and manpage update will be committed later.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
|
175237 |
11-Jan-2008 |
jhb |
The previous revision broke the case of reconnecting to a TCP NFS server via a new socket during an NFS operation as that reconnect takes place in the context of an arbitrary thread with an arbitrary credential. Ideally we would like to use the mount point's credential for the entire process of setting up the socket to connect to the NFS server. Since some of the APIs (sobind(), etc.) only take a thread pointer and infer the credential from that instead of a direct credential, work around the problem by temporarily changing the current thread's credential to that of the mount point while connecting the socket and then reverting back to the original credential when we are done.
Reviewed by: rwatson Tested on: UDP, TCP, TCP with forced reconnect
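The workaround described above follows a save, replace, restore pattern around the socket calls. The sketch below illustrates it with hypothetical stand-in structures. A real kernel would also manage credential reference counts (for example with crhold()/crfree()), which this sketch omits.

```c
#include <stddef.h>

/* Hypothetical minimal stand-ins for kernel structures. */
struct ucred_sketch { int uid; };
struct thread_sketch { struct ucred_sketch *td_ucred; };
struct mount_sketch { struct ucred_sketch *mnt_cred; };

/* Stand-in for sobind()/soconnect(): infers the credential from the thread. */
static int
socket_op_sketch(struct thread_sketch *td)
{
	return (td->td_ucred->uid);
}

/*
 * Run socket setup with the mount point's credential, then put the
 * thread's original credential back regardless of the outcome.
 */
static int
nfs_connect_sketch(struct thread_sketch *td, struct mount_sketch *mp)
{
	struct ucred_sketch *origcred = td->td_ucred;
	int result;

	td->td_ucred = mp->mnt_cred;
	result = socket_op_sketch(td);
	td->td_ucred = origcred;
	return (result);
}
```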
|
175221 |
10-Jan-2008 |
jhb |
Pass curthread to various socket routines (socreate(), sobind(), and soconnect()) instead of &thread0 when establishing a connection to the NFS server. Otherwise inconsistent credentials may be used when setting up the NFS socket.
MFC after: 1 week Reviewed by: rwatson
|
175202 |
10-Jan-2008 |
attilio |
vn_lock() is currently only used with 'curthread' passed as an argument. Remove this argument and pass curthread directly to the underlying VOP_LOCK1() VFS method. This change makes the code cleaner and, in particular, removes an annoying dependency, which helps the upcoming lockmgr() cleanup. The KPI has, obviously, changed.
The manpage and __FreeBSD_version will be updated through further commits.
As a side note, it is worth saying that the next commits will address a similar cleanup of VFS methods, in particular vop_lock1 and vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
|
173751 |
19-Nov-2007 |
rwatson |
Remove hacks from the NFSv2/3 client intended to handle a lack of a server-side RPC retransmission cache for non-idempotent operations: these hacks substituted 0 (success) for the expected EEXIST in the event that a target name already existed for LINK, SYMLINK, and MKDIR operations, under the assumption that EEXIST represented a second application of the original RPC rather than a true failure.
Background: certain NFS operations (in this case, LINK, SYMLINK, and MKDIR) are not idempotent, as they leave behind persisting state on the server that prevents them from being replayed without an error; if a UDP RPC reply is lost, leading to a retransmission by the client, the second reply will return EEXIST rather than success, as the new object has already been created. The NFS client previously silently mapped the EEXIST return into success to paper over this problem.
However, in all modern NFS server implementations, a reply cache is kept in order to retransmit the original reply to a retransmitted request, rather than performing the operation a second time, allowing this hack to be avoided. This allows link()-based file locking over NFS to operate correctly, as an application can request the creation of a new link for a file and tell whether it succeeded atomically or not.
Other NFS clients, including Solaris and Linux, generally follow this behavior for the same reasons. Most clients also now default to TCP, which also helps avoid the issue of retransmitted but non-idempotent requests in most cases.
Reported by: Adam McDougall <mcdouga9 at egr dot msu dot edu>, Timo Sirainen <tss at iki dot fi> Reviewed by: mohans MFC after: 1 week
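The server-side reply cache that makes the client hack unnecessary can be sketched as follows. Replies are remembered by XID, and a retransmission is answered from the cache instead of being re-executed. Everything here is hypothetical and simplified: a real duplicate request cache also keys on the client address and procedure, and it expires entries. The "operation" is a MKDIR-like step that fails with EEXIST if repeated.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define DRC_SIZE 8

struct drc_entry {
	bool		valid;
	uint32_t	xid;
	int		reply;	/* status from the first execution */
};

static struct drc_entry drc[DRC_SIZE];
static int names_created;	/* stands in for persistent server state */

/* The non-idempotent operation: fails if the name already exists. */
static int
do_mkdir_once(void)
{
	if (names_created > 0)
		return (EEXIST);
	names_created++;
	return (0);
}

/*
 * Serve a request. A retransmission (same XID) gets the cached reply rather
 * than re-running the operation, so the client sees 0, not EEXIST.
 */
static int
serve_request(uint32_t xid)
{
	struct drc_entry *e = &drc[xid % DRC_SIZE];

	if (e->valid && e->xid == xid)
		return (e->reply);
	e->valid = true;
	e->xid = xid;
	e->reply = do_mkdir_once();
	return (e->reply);
}
```

A genuinely new request for an existing name (a different XID) still gets EEXIST. That is exactly the signal link()-based locking relies on.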
|
173068 |
27-Oct-2007 |
rodrigc |
Add the following mount options to the nfs_opts array: noatime, noexec, suiddir, nosuid, nosymfollow, union, noclusterr, noclusterw, multilabel, acls, force, update, async. These options correspond to MOPT_STDOPTS, MOPT_FORCE, MOPT_UPDATE, and MOPT_ASYNC.
Currently, mount_nfs converts these "-o" options from strings to MNT_ flags via getmntopts(), and passes the flags from userspace to the kernel. This change will allow us in future to pass these mount options as strings directly to the kernel via nmount() when doing NFS mounts.
|
172836 |
20-Oct-2007 |
julian |
Rename the kthread_xxx calls (e.g. kthread_create()) to kproc_xxx, as they actually make whole processes. This makes way for us to add REAL kthread_create() and friends that actually make threads. It turns out that most of these calls actually end up being moved back to the thread version when it's added, but we need to make this cosmetic change first.
I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
|
172759 |
18-Oct-2007 |
jhb |
Add a -z flag to nfsstat which zeros the NFS statistics after displaying them.
MFC after: 1 week Requested by: ps Submitted by: ps (6 years ago)
|
172697 |
16-Oct-2007 |
alfred |
Get rid of qaddr_t.
Requested by: bde
|
172600 |
12-Oct-2007 |
mohans |
NFS MP scaling changes.
- Eliminate the hideous nfs_sndlock that serialized NFS/TCP request senders through the sndlock.
- Institute a new nfs_connectlock that serializes NFS/TCP reconnects. Add logic to wait for pending request senders to finish sending before reconnecting. Dial down the sb_timeo for NFS/TCP sockets to 1 sec.
- Break out the nfs xid manipulation under a new nfs xid lock, rather than overloading the nfs request lock for this purpose.
- Fix some of the locking in nfs_request.
Many thanks to Kris Kennaway for his help with this and for initiating the MP scaling analysis and work. Kris also tested this patch thoroughly.
Approved by: re@ (Ken Smith)
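The reconnect scheme can be sketched with pthreads. Senders proceed concurrently and register as in flight. A reconnecting thread takes the connect lock, which makes new senders wait, and then waits for the in-flight senders to drain before touching the socket. The names are hypothetical, and how closely this matches the committed nfs_connectlock code is an assumption.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical per-mount state guarding a shared TCP socket. */
struct nfs_sock_sketch {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
	int		senders;	/* requests currently being sent */
	bool		reconnecting;	/* someone holds the connect lock */
};

/* Senders wait only for a reconnect, never for each other. */
static void
sender_enter(struct nfs_sock_sketch *s)
{
	pthread_mutex_lock(&s->mtx);
	while (s->reconnecting)
		pthread_cond_wait(&s->cv, &s->mtx);
	s->senders++;
	pthread_mutex_unlock(&s->mtx);
}

static void
sender_exit(struct nfs_sock_sketch *s)
{
	pthread_mutex_lock(&s->mtx);
	if (--s->senders == 0)
		pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mtx);
}

/* Take the connect lock, then wait for in-flight senders to finish. */
static void
reconnect_begin(struct nfs_sock_sketch *s)
{
	pthread_mutex_lock(&s->mtx);
	while (s->reconnecting)
		pthread_cond_wait(&s->cv, &s->mtx);
	s->reconnecting = true;
	while (s->senders > 0)
		pthread_cond_wait(&s->cv, &s->mtx);
	pthread_mutex_unlock(&s->mtx);
}

static void
reconnect_end(struct nfs_sock_sketch *s)
{
	pthread_mutex_lock(&s->mtx);
	s->reconnecting = false;
	pthread_cond_broadcast(&s->cv);
	pthread_mutex_unlock(&s->mtx);
}
```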
|
172324 |
25-Sep-2007 |
mohans |
Fix for a very rare race, caused by the nfsiod wakeup and nfsiod idle timeout occurring at exactly the same time. If this happens, the nfsiod exits although there may be a queued async IO request for it.
Found by: Kris Kennaway Approved by: re
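This race is a form of lost wakeup: a request is queued for an nfsiod that is just about to time out and exit. One common cure is to make the exit decision under the same lock used to hand out work. The worker re-checks the queue and removes itself from the idle pool in one step. The sketch below shows that idea with hypothetical names and a pthread mutex standing in for the kernel lock. Whether the committed fix takes exactly this shape is an assumption.

```c
#include <pthread.h>
#include <stdbool.h>

struct iod_pool_sketch {
	pthread_mutex_t	mtx;
	int		idle;		/* workers eligible for new work */
	int		pending;	/* queued async I/O requests */
};

/* Queue work; reports whether an idle worker exists to pick it up. */
static bool
iod_enqueue(struct iod_pool_sketch *p)
{
	bool have_worker;

	pthread_mutex_lock(&p->mtx);
	p->pending++;
	have_worker = p->idle > 0;
	pthread_mutex_unlock(&p->mtx);
	return (have_worker);
}

/*
 * Idle-timeout path: exit only if nothing was queued, and deregister in
 * the same critical section so no request can be handed to a dying worker.
 */
static bool
iod_timeout_should_exit(struct iod_pool_sketch *p)
{
	bool exit_now = false;

	pthread_mutex_lock(&p->mtx);
	if (p->pending == 0) {
		p->idle--;
		exit_now = true;
	}
	pthread_mutex_unlock(&p->mtx);
	return (exit_now);
}
```

When `iod_enqueue()` reports no idle worker, the caller would have to do the I/O itself or start a new worker. That fallback is outside this sketch.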
|
171744 |
06-Aug-2007 |
rwatson |
Remove the now-unused NET_{LOCK,UNLOCK,ASSERT}_GIANT() macros, which previously conditionally acquired Giant based on debug.mpsafenet. As that has now been removed, they are no longer required. Removing them significantly simplifies error handling in the socket layer, eliminating quite a bit of unwinding of locking in error cases.
While here clean up the now unneeded opt_net.h, which previously was used for the NET_WITH_GIANT kernel option. Clean up some related gotos for consistency.
Reviewed by: bz, csjp Tested by: kris Approved by: re (kensmith)
|
171190 |
03-Jul-2007 |
jhb |
Fix for a race where out-of-order loading of NFS attrs into the nfsnode could lead to attrs being stale. One example (that we ran into) was a READDIR+, WRITE. The responses came back in order, but the attrs from the WRITE were loaded before the attrs from the READDIR+, leading to the wrong size being read on the next stat() call.
MFC after: 1 week Submitted by: mohans Approved by: re (kensmith)
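A general way to stop an older reply's attributes from overwriting newer ones is to stamp each request when it is sent. Attributes from a reply whose stamp is older than the one already loaded are then ignored. The sketch below shows that technique with hypothetical names. It is an assumption about the general approach rather than a transcription of the committed fix, and a real client would protect the counter and the node with locks.

```c
#include <stdbool.h>
#include <stdint.h>

struct nfsnode_sketch {
	uint64_t	attr_stamp;	/* stamp of the request whose attrs are cached */
	int64_t		size;
};

static uint64_t request_counter;	/* unlocked here; locked in real code */

/* Taken when a request is sent, before any reply can arrive. */
static uint64_t
request_stamp(void)
{
	return (++request_counter);
}

/* Load attrs from a reply only if its request is not older than the cache. */
static bool
load_attrs(struct nfsnode_sketch *np, uint64_t stamp, int64_t size)
{
	if (stamp < np->attr_stamp)
		return (false);		/* stale: a newer request already loaded */
	np->attr_stamp = stamp;
	np->size = size;
	return (true);
}
```

In the READDIR+, WRITE example, the WRITE is sent second and carries the larger stamp. If its attributes are loaded first, the later-arriving READDIR+ attributes are discarded and the size stays correct.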
|
171189 |
03-Jul-2007 |
jhb |
Fix up NFS client write error handling. Errors are split into recoverable and unrecoverable. For the former, we redirty the buffer and hang onto it for future retries. For the latter (e.g. ESTALE), we discard the buffer and return the error to the user on the next syscall. This fixes a number of vfs panics and keeps a large number of dirty buffers (that cannot be written out and reclaimed) from hanging around. Thanks to ups@ for discussions on this issue.
Reported by: kris, Kai, others Approved by: re (kensmith)
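The recoverable/unrecoverable split can be sketched as a classification step followed by an action on the buffer. The message names ESTALE as unrecoverable. Treating every other non-zero error as retryable is a simplification made by this hypothetical sketch, not the NFS client's actual error list.

```c
#include <errno.h>

enum write_action { WA_DONE, WA_REDIRTY, WA_DISCARD };

/* Hypothetical classification; only ESTALE comes from the commit message. */
static enum write_action
classify_write_error(int error)
{
	if (error == 0)
		return (WA_DONE);
	if (error == ESTALE)
		return (WA_DISCARD);	/* drop the buffer, report later */
	return (WA_REDIRTY);		/* keep it dirty and retry later */
}

struct nfsnode_err_sketch {
	int	n_error;	/* saved error for the next syscall to return */
};

static enum write_action
handle_write_result(struct nfsnode_err_sketch *np, int error)
{
	enum write_action a = classify_write_error(error);

	if (a == WA_DISCARD)
		np->n_error = error;
	return (a);
}
```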
|
170292 |
04-Jun-2007 |
attilio |
Do proper "locking" for the missing vmmeter parts. Now, we no longer assume sched_lock protection for some of them and use the distributed-loads method for vmmeter (distributed across CPUs).
Reviewed by: alc, bde Approved by: jeff (mentor)
|
170174 |
01-Jun-2007 |
jeff |
- Move rusage from being per-process in struct pstats to per-thread in td_ru. This removes the requirement for per-process synchronization in statclock() and mi_switch(). This was previously supported by sched_lock, which is going away. All modifications to rusage are now done in the context of the owning thread. Reads proceed without locks.
- Aggregate exiting threads' rusage in thread_exit() such that the exiting thread's rusage is not lost.
- Provide a new routine, rufetch(), to fetch an aggregate of all rusage structures from all threads in a process. This routine must be used in any place requiring a rusage from a process prior to its exit. The exited process's rusage is still available via p_ru.
- Aggregate tick statistics only on demand via rufetch() or when a thread exits. Tick statistics are kept in the thread and protected by sched_lock until it exits.
Initial patch by: attilio Reviewed by: attilio, bde (some objections), arch (mostly silent)
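The per-thread accounting scheme can be sketched as follows. Each thread owns its counters, thread exit folds them into a per-process accumulator, and a fetch sums the accumulator with the live threads on demand. The names and the two counters are hypothetical, and the sketch is single-threaded, so it does not model the locking described above.

```c
#include <stdint.h>

#define MAXTHREADS 4

struct rusage_sketch {
	uint64_t	ru_nvcsw;	/* voluntary context switches */
	uint64_t	ticks;
};

struct thread_ru_sketch {
	int			alive;
	struct rusage_sketch	td_ru;	/* written only by the owning thread */
};

struct proc_ru_sketch {
	struct thread_ru_sketch	threads[MAXTHREADS];
	struct rusage_sketch	p_ru;	/* accumulated from exited threads */
};

static void
ru_add(struct rusage_sketch *dst, const struct rusage_sketch *src)
{
	dst->ru_nvcsw += src->ru_nvcsw;
	dst->ticks += src->ticks;
}

/* thread_exit()-style step: keep the exiting thread's usage. */
static void
thread_exit_sketch(struct proc_ru_sketch *p, int tid)
{
	ru_add(&p->p_ru, &p->threads[tid].td_ru);
	p->threads[tid].alive = 0;
}

/* rufetch()-style step: aggregate exited and live threads on demand. */
static struct rusage_sketch
rufetch_sketch(const struct proc_ru_sketch *p)
{
	struct rusage_sketch total = p->p_ru;

	for (int i = 0; i < MAXTHREADS; i++)
		if (p->threads[i].alive)
			ru_add(&total, &p->threads[i].td_ru);
	return (total);
}
```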
|
170170 |
31-May-2007 |
attilio |
Revert the VMCNT_* operations introduction. Probably a general approach is not the best solution here, so we should solve the sched_lock protection problems separately.
Requested by: alc Approved by: jeff (mentor)
|
169681 |
18-May-2007 |
rwatson |
In nfs_down(), if rep can be NULL, which we test for, then we should lock and unlock conditionally, not just set the flag on it conditionally. In practice, this bug couldn't manifest, as in the current revision of the code, no callers pass a NULL rep.
CID: 1416 Found with: Coverity Prevent(tm)
|
169667 |
18-May-2007 |
jeff |
- define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating vmcnts. This can be used to abstract away pcpu details but also changes to use atomics for all counters now. This means sched lock is no longer responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
|
169043 |
25-Apr-2007 |
jhb |
Various fixes to the NFS Directio support.
- Fix for a bug where a close would not wait for all (directio) dirty buffers to drain. The nfsnode was not marked NMODIFIED when there were directio-dirtied buffers pending, causing this.
- No reason to vhold/vrele the vp when enqueueing DirectIO requests for the nfsiods. The vnode can't really go away, since the close has to wait for these requests to drain.
MFC after: 1 week Submitted by: mohans
|
168931 |
21-Apr-2007 |
rwatson |
Attempt to rationalize NFS privileges:
- Replace PRIV_NFSD with PRIV_NFS_DAEMON, add PRIV_NFS_LOCKD.
- Use PRIV_NFS_DAEMON in the NFS server.
- In the NFS client, move the privilege check from nfslockdans(), which occurs every time a write is performed on /dev/nfslock, and instead do it in nfslock_open() just once. This allows us to avoid checking the saved uid for root, and just use the effective uid on open. Use PRIV_NFS_LOCKD.
|
167830 |
23-Mar-2007 |
delphij |
Don't destroy a mutex just before we use it, instead, destroy it after we have used it.
|
167497 |
13-Mar-2007 |
tegge |
Make insmntque() externally visible and allow it to fail (e.g. during late stages of unmount). On failure, the vnode is recycled.
Add insmntque1(), to allow for file system specific cleanup when recycling a vnode on failure.
Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnodes belonging to a file system, which is unsafe for a file system marked MPSAFE.
Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility.
Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup.
Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
|
167353 |
09-Mar-2007 |
mohans |
Back out a change to nfs_timer() that inadvertently crept into the last checkin :(
|
167352 |
09-Mar-2007 |
mohans |
Over NFS, an open() call could result in multiple over-the-wire GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.
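The timestamp idea can be sketched as a small comparison. An attribute load records <pid, tid, per-thread syscall count>. When nfs_open() runs in the same thread during the same system call and finds a matching stamp, it can skip its own GETATTR. The structure and function names below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical attr-load timestamp: <pid, tid, per-thread syscall count>. */
struct attr_stamp {
	pid_t		pid;
	long		tid;
	uint64_t	syscalls;
};

static bool
stamp_equal(const struct attr_stamp *a, const struct attr_stamp *b)
{
	return (a->pid == b->pid && a->tid == b->tid &&
	    a->syscalls == b->syscalls);
}

/*
 * Open-time check: a close-to-open GETATTR is needed unless the cached
 * attributes were fetched over the wire during this same system call.
 */
static bool
need_open_getattr(const struct attr_stamp *loaded, const struct attr_stamp *now)
{
	return (!stamp_equal(loaded, now));
}
```

Because the syscall count increases on every system call a thread makes, a stamp left over from an earlier open() never matches. That earlier open() therefore still triggers the GETATTR that close-to-open consistency requires.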
|
167086 |
27-Feb-2007 |
jhb |
Use pause() rather than tsleep() on stack variables and function pointers.
|
166782 |
16-Feb-2007 |
mohans |
Backing out an earlier change. It seems harmless for NFS to miss the "force unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and nfs_sigintr() unnecessary. Pointed out by tegge@.
|
166636 |
11-Feb-2007 |
mohans |
Add missing MNT_ILOCK around some mnt_kern_flag accesses.
|
166378 |
31-Jan-2007 |
mohans |
Fix for a vnode lock leak in nfs_create() in the event of an error. Spotted by ups@.
|
166338 |
30-Jan-2007 |
kris |
Instead of always hard-coding the socket type for the nfs root mount as SOCK_DGRAM (i.e. UDP), respect the value configured earlier. This allows TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable.
In this case some of the connection parameters like the retry timer were previously set appropriately for TCP but inappropriately for the UDP socket that was actually used, leading to e.g. extremely long recovery times (O(hours)) after a nfs server reboot.
Reviewed by: mohans MFC After: 2 weeks
|
166218 |
25-Jan-2007 |
bde |
Unstaticize nfs_iosize() in nfsclient and use it in nfs4client instead of duplicating it except for larger style bugs in the copy.
Fix some nearby style bugs (including a harmless type mismatch) in and near the remaining copy.
This is part of fixing collisions of the two nfs*client's names. Even static names should have unique prefixes so that they can be debugged easily.
|
166193 |
23-Jan-2007 |
kib |
Cylinder group bitmaps and blocks containing inodes for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on a snapshotted ffs. If, during the flush, COW activity for the snapshot needs to allocate a block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then the kernel would panic due to recursive buffer lock acquisition.
Avoid dealing with buffers in bdwrite() that are on the other side of the snaplock divisor in the lock order from the buffer being written. Add a new BOP, bop_bdwrite(), to do dirty buffer flushing for the same vnode in bdwrite(). The default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, a specialized implementation is used.
Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)
|
165104 |
11-Dec-2006 |
mohans |
NetApp filers return corrupt post op attrs in the wcc on NFS error responses. This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt for other NFS error responses. For now, disabling wcc pre-op attr checks and post-op attr loads on NFS errors (sysctl'ed). Reported by: Kris Kennaway
|
164934 |
06-Dec-2006 |
sam |
consolidate parsing of nfs root mount options in one place and handle all options (some may require fixes elsewhere)
Reviewed by: jhb, mohans MFC after: 1 month
|
164735 |
29-Nov-2006 |
mohans |
In nfs_nget(), we must initialize the fh in the nfsnode before inserting the vnode into the vfs hash. Otherwise, another thread walking the hash can trip on an nfsnode with an uninitialized or partially initialized fh. Thanks to ups@ for spotting this race.
|
164701 |
27-Nov-2006 |
mohans |
bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don't have to explicitly acquire Giant (although they need to be aware of this and not hold any locks at that point). Remove the acquisitions of Giant in the NFS client wrapping tprintf().
|
164684 |
27-Nov-2006 |
mohans |
Fix for a bug caused by a race when two threads look up the same file. Leave the loser's lock(s) initialized, so the reclaim logic can unconditionally destroy them when that race occurs (or if the vfs hash insert happened to fail for some other reason). Thanks to ups@ for a careful review of the code. Reported by: Kris Kennaway
|
164430 |
20-Nov-2006 |
mohans |
1) Fix up locking in nfs_up() and nfs_down().
2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly.
- We don't need to acquire Giant before tsleeping on lbolt anymore, since jhb special-cased lbolt handling in msleep.
- nfs_up() needs to acquire Giant only if printing the "server up" message.
- nfs_timer() held Giant for the duration of the NFS timer processing, just because the printing of the message in nfs_down() needed it (and we acquire other locks in nfs_timer()). The acquisition of Giant is moved down into nfs_down() now, reducing the time Giant is held in that path.
Reported by: Kris Kennaway
|
164346 |
16-Nov-2006 |
mohans |
vfs_hash_insert() vputs() the losing vnode before returning, in the event of a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't be releasing the vnode in that case.
|
164345 |
16-Nov-2006 |
mohans |
Fix to readdir+ reply handling. When inserting an entry into the namecache, initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the just entered namecache entry.
|
164063 |
07-Nov-2006 |
sam |
honor nolockd flag in root mount options
MFC after: 2 weeks
|
163830 |
31-Oct-2006 |
mohans |
Make EWOULDBLOCK a recoverable error so that the request is retransmitted. Without this, NFS/TCP could corrupt data: writes were silently dropped on EWOULDBLOCK (because the socket send buffer was full and the sockbuf timer fired).
Reviewed by: ups@
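A minimal Python sketch of the "recoverable error" idea, assuming a hypothetical `should_retransmit()` helper: an error in the recoverable set makes the client resend the request instead of failing the write. Only EWOULDBLOCK comes from this entry; the client's other recoverable errors are deliberately left out.

```python
import errno

# Hypothetical, reduced set of send errors that trigger a retransmit.
# Only EWOULDBLOCK comes from this log entry; the client's real list
# contains other errors that are omitted here.
RECOVERABLE_SEND_ERRORS = frozenset({errno.EWOULDBLOCK})


def should_retransmit(err: int) -> bool:
    """Return True if a failed send should be retried instead of being
    reported upward (where the write would be silently lost)."""
    return err in RECOVERABLE_SEND_ERRORS
```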
|
163471 |
17-Oct-2006 |
bde |
Fixed some style bugs (especially ones involving long lines and use of __P(())). There are many more.
|
163341 |
14-Oct-2006 |
bde |
Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the VA_MARK_ATIME feature to fix POSIX conformance for execve() and mmap(), we thought that it was optimized well enough for the one file system that supports it (ffs) and harmless for other file systems (except layered ones which already get the layering for VOP_SETATTR() wrong). However, nfs_setattr() doesn't do much parameter checking, so when it gets a combination of parameters that it doesn't understand, it always does a Setattr RPC. This RPC can't do anything good, and for VA_MARK_ATIME it is null except for wasting a lot of time.
This is the smallest and easiest to fix of several bugs that have increased the number of RPCs for kernel builds on nfs by more than 100% since 2004-11-05. The real-time increase depends on network latency and parallelization and can also be very large (approaching the same percentage for unparallelized operations like "make depend" on systems with fast CPUs and high-latency networks).
|
162954 |
02-Oct-2006 |
phk |
First part of a little cleanup in the calendar/timezone/RTC handling.
Move relevant variables to <sys/clock.h> and fix #includes as necessary.
Use libkern's much more time- & space-efficient BCD routines.
|
162649 |
26-Sep-2006 |
tegge |
Add mnt_noasync counter to better handle interleaved calls to nmount(), sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.
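A toy Python model of the counter scheme, with hypothetical class and method names. Callers that need synchronous I/O bump a counter instead of clearing the user's MNT_ASYNC, so interleaved callers can no longer lose it. Here the effective MNTK_ASYNC is derived on demand; the kernel stores it as a flag, which is a difference in form only.

```python
from dataclasses import dataclass


@dataclass
class Mount:
    """Toy model of the mount flags described above (hypothetical)."""
    mnt_async: bool = False   # user-requested MNT_ASYNC, never cleared by sync
    mnt_noasync: int = 0      # number of callers currently requiring sync I/O

    @property
    def mntk_async(self) -> bool:
        # Async I/O is allowed only if MNT_ASYNC is set and no caller
        # currently holds a "no async" reference.
        return self.mnt_async and self.mnt_noasync == 0

    def begin_sync(self) -> None:
        self.mnt_noasync += 1

    def end_sync(self) -> None:
        self.mnt_noasync -= 1
```

Two overlapping `begin_sync()`/`end_sync()` pairs nest correctly, and MNT_ASYNC itself is untouched throughout.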
|
162647 |
26-Sep-2006 |
tegge |
Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag. This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
|
162288 |
13-Sep-2006 |
mohans |
Fixes up the handling of shared vnode lock lookups in the NFS client, adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately.
- amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down.
Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example, isn't ready for shared vnode lock lookups, crashing pretty quickly.
This change is the result of discussions with Stephan Uphoff (ups@).
Reviewed by: ups@
|
161722 |
29-Aug-2006 |
mohans |
Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never retransmit (or return). Thanks to John Baldwin for helping nail this one.
Found by: Kris Kennaway
|
161371 |
16-Aug-2006 |
thomas |
Fix typos in comment.
|
161125 |
09-Aug-2006 |
alc |
Introduce a field to struct vm_page for storing flags that are synchronized by the lock on the object containing the page.
Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively.
Eliminate the assertion that the page queues lock is held in vm_page_io_finish().
Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().
|
161109 |
09-Aug-2006 |
brooks |
Add a new kernel environment variable "boot.netif.mtu" which is used to set the MTU prior to mounting root via NFS. This is required if the server supports a higher than default MTU because the client will not see the responses otherwise.
MFC after: 3 weeks
|
160619 |
24-Jul-2006 |
rwatson |
soreceive_generic(), and sopoll_generic(). Add new functions sosend(), soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit).
This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls.
Architectural head nod: sam, gnn, wollman
|
160182 |
08-Jul-2006 |
kib |
Signals may be delivered to the process as well as to the thread. Check the thread-delivered signals in addition to the process-delivered ones.
Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
|
160181 |
08-Jul-2006 |
kib |
Always supply curthread as argument to nfs_asyncio and nfs_doio in nfs_strategy. Otherwise, for some buffers, signals would be ignored at the intr mounts.
Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
|
160038 |
29-Jun-2006 |
yar |
There is a consensus that ifaddr.ifa_addr should never be NULL, except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and let the system crash honestly instead of masking possible bugs.
Suggested by: glebius, jhb, ru
|
160029 |
29-Jun-2006 |
yar |
Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop.
|
159081 |
30-May-2006 |
mohans |
Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was not being set, which means Giant would be acquired for these mounts.
|
158965 |
26-May-2006 |
mohans |
Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported by Witness (which forces the mbuf allocation flag to M_NOWAIT).
Reported by: "sekes".
|
158915 |
25-May-2006 |
ups |
Call vm_object_page_clean() with the object lock held.
Submitted by: kensmith@ Reviewed by: mohans@ MFC after: 6 days
|
158906 |
25-May-2006 |
ups |
Do not set B_NOCACHE on buffers when releasing them in flushbuflist(). If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flushing dirty pages in the data invalidation function of some network file systems.
This fixes data losses during vnode recycling (and other code paths using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file.
Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days
|
158905 |
24-May-2006 |
mohans |
Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts before doing the read/write.
Reported by: Chuck Lever.
|
158903 |
24-May-2006 |
rwatson |
Adjust minimum iod threads from 4 to 0 -- since we compile the NFS client into the kernel by default, and many users won't use NFS, don't start an extra 4 kernel threads that are unused. Once NFS becomes active, it will start nfsiod's as it needs them.
We might consider mandating a minimum iod's equal to the number of active NFS mounts (truncated to some value), which would force some to remain available without having to create a new one if the file system is mostly inactive.
PR: 70880 MFC after: 2 weeks Prodded by: cel Head nod: peter Pointed out by: Joe <fbsd_user at a1poweruser dot com>
|
158860 |
23-May-2006 |
cel |
NFS over TCP retransmit behavior should default to a 60 second time out, mimicking the NFS reference implementation.
NFS over TCP does not need fast retransmit timeouts, since network loss and congestion are managed by the transport (TCP), unlike with NFS over UDP. A long timeout prevents the unnecessary retransmission of non-idempotent NFS requests.
Reviewed by: mohans, silby, rees? Sponsored by: Network Appliance, Incorporated
|
158859 |
23-May-2006 |
cel |
Refactor the NFS over UDP retransmit timeout estimation logic to allow the estimator to be more easily tuned and maintained.
There should be no functional change except there is now a lower limit on the retransmit timeout to prevent the client from retransmitting faster than the server's disks can fill requests, and an upper limit to prevent the estimator from taking too long to retransmit during a server outage.
Reviewed by: mohan, kris, silby Sponsored by: Network Appliance, Incorporated
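A Python sketch of a clamped retransmit-timeout estimator. It uses Jacobson/Karels-style smoothing (as in TCP's RTO computation) as a stand-in, since this entry does not describe the NFS estimator's internals; the class name, gains, and `min_rto`/`max_rto` defaults are all assumptions. The point it shows is the two limits described above.

```python
class RetransmitEstimator:
    """Smoothed-RTT estimator with a clamped retransmit timeout (RTO).

    Jacobson/Karels-style smoothing is a stand-in; the bounds and gains
    are illustrative assumptions, not the NFS client's actual values.
    """

    def __init__(self, min_rto: float = 0.5, max_rto: float = 30.0):
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None     # smoothed round-trip time, seconds
        self.rttvar = 0.0    # smoothed mean deviation, seconds

    def sample(self, rtt: float) -> None:
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            err = rtt - self.srtt           # computed against the old SRTT
            self.srtt += err / 8            # gain 1/8
            self.rttvar += (abs(err) - self.rttvar) / 4  # gain 1/4

    def rto(self) -> float:
        if self.srtt is None:
            return self.max_rto             # no samples yet: be conservative
        raw = self.srtt + 4 * self.rttvar
        # Lower bound: don't retransmit faster than the server's disks.
        # Upper bound: don't stall too long during a server outage.
        return min(max(raw, self.min_rto), self.max_rto)
```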
|
158855 |
23-May-2006 |
mohans |
Vnode locks are recursive and the NFS client supports shared vnode locks.
Found by: Kris Kennaway.
|
158739 |
19-May-2006 |
mohans |
Changes to make the NFS client MP safe.
Thanks to Kris Kennaway for testing and sending lots of bugs my way.
|
158316 |
05-May-2006 |
mohans |
Fix a snafu caused while patching the previous fix from another branch.
|
158315 |
05-May-2006 |
mohans |
Fix for an NFS/TCP client bug which would cause the NFS/TCP stream to get out of sync under heavy loads, forcing frequent reconnects, causing EBADRPC errors etc.
|
157557 |
06-Apr-2006 |
mohans |
Keep track of the number of in-progress async direct IO writes in the nfsnode. Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and nfs_putpage().
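A Python sketch of the drain pattern, assuming a hypothetical `NfsNodeDirectIO` class standing in for the nfsnode fields: a counter of in-flight async direct writes, with fsync/close blocking on a condition variable until it reaches zero. `Condition.wait_for` re-checks the predicate after every wakeup, so spurious wakeups are handled.

```python
import threading


class NfsNodeDirectIO:
    """Toy model (hypothetical): count in-flight async direct-IO writes
    and let fsync/close wait until they have all drained."""

    def __init__(self) -> None:
        self._cv = threading.Condition()
        self.inprog = 0

    def start_write(self) -> None:
        with self._cv:
            self.inprog += 1

    def finish_write(self) -> None:
        with self._cv:
            self.inprog -= 1
            if self.inprog == 0:
                self._cv.notify_all()   # wake any fsync/close waiters

    def fsync(self, timeout=None) -> bool:
        # Returns True once no async direct writes remain in flight,
        # or False if the optional timeout expires first.
        with self._cv:
            return self._cv.wait_for(lambda: self.inprog == 0, timeout)
```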
|
157349 |
01-Apr-2006 |
jeff |
- Busy the filesystem in nfs_statfs to prevent us from creating a new vnode after vflush() has succeeded. This would cause a dangling vnode panic at unmount time otherwise. Other filesystems may have this problem via their VFS_VGET() routines.
Found by: kris Sponsored by: Isilon Systems, Inc.
|
157058 |
23-Mar-2006 |
kris |
Fix a bug in the NFS/TCP retransmission path.
The bug was that, once a request had been retransmitted, subsequent retransmits were sent every 10 msecs.
This can cause data corruption under moderate loads by reordering operations as seen by the client NFS attribute cache, and on the server side when the retransmission occurs after the original request has left the duplicate cache, since the operation will be committed for a second time.
Further work on retransmission handling is needed (e.g. retransmits are still sent too often since they are scaled by HZ, and the size of the dup cache is too small and easily overwhelmed on busy servers).
Submitted by: mohans
|
156879 |
19-Mar-2006 |
pjd |
Actually I wanted 'nolockd' here instead of 'lockd'.
MFC after: 2 days
|
156825 |
17-Mar-2006 |
cel |
If an NFS server returns more than a few EJUKEBOX errors for a given RPC request, the FreeBSD NFS client will quickly back off to an excessively long wait (days, then weeks) before retrying the request.
Change the behavior of the FreeBSD NFS client to match the behavior of the reference NFS client implementation (Solaris). This provides a fixed delay of 10 seconds between each retry by default. A sysctl, called nfs3_jukebox_delay, is now available to tune the delay. Unlike Solaris, the sysctl value on FreeBSD is in seconds, rather than in HZ.
Sponsored by: Network Appliance, Incorporated Reviewed by: rick Approved by: silby MFC after: 3 days
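A small Python sketch contrasting the two behaviours; both function names are hypothetical. The fixed delay converts the seconds-based `nfs3_jukebox_delay` into a sleep length in clock ticks. The doubling backoff is only an assumed model of "days, then weeks", not the removed code.

```python
def jukebox_sleep_ticks(nfs3_jukebox_delay: int, hz: int) -> int:
    # The FreeBSD tunable is in seconds, so scale by the clock rate to
    # get a sleep length in ticks; every retry waits the same amount.
    return nfs3_jukebox_delay * hz


def doubling_delay_seconds(retry: int, base_seconds: int = 1) -> int:
    # Hypothetical unbounded exponential backoff, shown only for
    # contrast: the wait doubles on every EJUKEBOX reply.
    return base_seconds << retry
```

Under the doubling model, the 20th retry alone would wait 2^20 seconds, about 12 days. The default fixed delay stays at 10 seconds no matter how many EJUKEBOX replies arrive.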
|
156416 |
08-Mar-2006 |
cel |
Fix a bug in NFSv3 READDIRPLUS reply processing
The client's READDIRPLUS logic skips the attributes and filehandle of the ".." entry. If the server doesn't send attributes but does send a filehandle for "..", the client's logic doesn't account for the extra "value follows" field that indicates whether the filehandle is present, causing the remaining entries in the reply to be ignored.
Sponsored by: Network Appliance, Inc. Reviewed by: rick, mohans Approved by: silby MFC after: 2 weeks
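A Python sketch of skipping the optional parts of an NFSv3 READDIRPLUS entry per the XDR in RFC 1813. `name_attributes` (post_op_attr) and `name_handle` (post_op_fh3) each begin with their own "value follows" boolean, so a server may omit the attributes yet still send the handle. The helper names are hypothetical.

```python
import struct

FATTR3_SIZE = 84  # fixed XDR size of an NFSv3 fattr3 (RFC 1813)


def _xdr_bool(buf: bytes, off: int) -> tuple[bool, int]:
    (val,) = struct.unpack_from(">I", buf, off)
    return val != 0, off + 4


def skip_name_attrs_and_handle(buf: bytes, off: int) -> int:
    """Skip an entryplus3's post_op_attr and post_op_fh3; return new offset.

    The two "value follows" flags are independent, so the handle's flag
    must be read even when no attributes were sent.
    """
    attrs_follow, off = _xdr_bool(buf, off)
    if attrs_follow:
        off += FATTR3_SIZE
    handle_follows, off = _xdr_bool(buf, off)
    if handle_follows:
        (fh_len,) = struct.unpack_from(">I", buf, off)
        off += 4 + ((fh_len + 3) & ~3)  # opaque data is padded to 4 bytes
    return off
```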
|
154580 |
20-Jan-2006 |
rees |
Don't log an error on tcp connection reset, even if we don't get ECONNRESET.
Submitted by: cel@citi.umich.edu
|
154487 |
17-Jan-2006 |
alfred |
I ran into an nfs client panic a couple of times in a row over the last few days. I tracked it down to the fact that nfs_reclaim() is setting vp->v_data to NULL _before_ calling vnode_destroy_object(). After silence from the mailing list I checked further and discovered that ufs_reclaim() is unique among FreeBSD filesystems for calling vnode_destroy_object() early, long before tossing v_data or much of anything else, for that matter. The rest, including NFS, appear to be identical, as if they were just clones of one original routine.
The enclosed patch fixes all file systems in essentially the same way, by moving the call to vnode_destroy_object() to early in the routine (before the call to vfs_hash_remove(), if any). I have only tested NFS, but I've now run for over eighteen hours with the patch where I wouldn't get past four or five without it.
Submitted by: Frank Mayhar Requested by: Mohan Srinivasan MFC After: 1 week
|
154316 |
13-Jan-2006 |
rwatson |
In nfs_dolock(), GC now under-used ioflg, rendered obsolete when we moved from using a fifo to talk to rpc.lockd to using a special device node.
Noticed by: Coverity Prevent analysis tool MFC after: 3 days
|
154152 |
09-Jan-2006 |
tegge |
Add marker vnodes to ensure that all vnodes associated with the mount point are iterated over when using MNT_VNODE_FOREACH.
Reviewed by: truckman
|
153786 |
28-Dec-2005 |
delphij |
Correct a typo
|
153365 |
12-Dec-2005 |
ps |
Improve upon rev 1.133 where NFS/TCP would not reconnect.
Submitted by: Mohan Srinivasan
|
152920 |
29-Nov-2005 |
ru |
Unexpand LLADDR().
|
152657 |
21-Nov-2005 |
ps |
Fix for a bug where NFS/TCP would not reconnect (in the case where the server FIN'ed). Seen with Solaris NFS servers.
Reported by: TOMITA Yoshinori <yoshint@flab.fujitsu.co.jp> Submitted by: Mohan Srinivasan
|
152656 |
21-Nov-2005 |
ps |
- Always return success from NFS strategy. nfs_doio(), in the event of an error, does the right thing, in terms of setting the error flags in the buf header. That fixes a crash from bstrategy().
- Treat ETIMEDOUT as a "recoverable" error, causing the buffer to be re-dirtied. ETIMEDOUT can occur on soft mounts when the number of retries is exceeded, and we don't want data loss in that case.
Submitted by: Mohan Srinivasan
|
152652 |
21-Nov-2005 |
rees |
fix a problem with XID re-use when a server returns NFSERR_JUKEBOX.
Submitted by: cel@citi.umich.edu Fixed by: rick@snowhite.cis.uoguelph.ca Approved by: alfred MFC after: 3 weeks
|
152289 |
10-Nov-2005 |
jon |
fix a crash when an nfsv2 mount fails
MFC after: 1 week
|
152019 |
03-Nov-2005 |
ps |
Fix for a crash (from nfs_lookup() in an error case).
Submitted by: Mohan Srinivasan
|
152000 |
03-Nov-2005 |
ps |
In nfs_flush(), clear the NMODIFIED bit only if there are no dirty buffers *and* there are no buffers queued up for writing. The bug was that NMODIFIED was being cleared even while there were buffers scheduled to be written out, which leads to all sorts of interesting bugs - one where the file could shrink (because of a post-op getattr load, say) causing data in buffer(s) queued for write to be tossed, resulting in data corruption.
Submitted by: Mohan Srinivasan
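The corrected condition as a one-line Python predicate; the function and parameter names are hypothetical. NMODIFIED may be cleared only when both counts are zero, because a buffer that is queued but not yet written still holds data a later truncation could discard.

```python
def may_clear_nmodified(dirty_buffers: int, buffers_in_write: int) -> bool:
    # Safe only when nothing is dirty AND nothing is still queued for
    # write; the old check looked at dirty buffers alone.
    return dirty_buffers == 0 and buffers_in_write == 0
```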
|
151998 |
03-Nov-2005 |
ps |
Fix for a race between the thread transmitting the request and the thread processing the reply.
Submitted by: Mohan Srinivasan
|
151897 |
31-Oct-2005 |
rwatson |
Normalize a significant number of kernel malloc type names:
- Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat.
- Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters.
- Disambiguate some collisions by adding subsystem prefixes to some memory types.
- Generally prefer lower case to upper case.
- If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases.
Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
|
151695 |
26-Oct-2005 |
glebius |
- Fix leak of struct nlminfo on process exit.
- Fix a malloc type collision that made the above problem difficult to understand.
Reported by: Vladimir Sharun <sharun ukr.net>
|
151024 |
06-Oct-2005 |
pjd |
- Use strsep() instead of strtok().
- strdup() uses M_WAITOK, so we don't need to check its return value against NULL.
MFC after: 2 weeks
|
150995 |
06-Oct-2005 |
pjd |
Add boot.nfsroot.options loader tunable. It allows specifying options for the NFS root file system. Currently supported options are: soft, intr, conn, lockd.
I'm adding this functionality mostly for the 'lockd' option, which is only honored when performing the initial mount and will be silently ignored if used while updating the mount options.
This will allow using flock(2) without the need of using varmfs or rpc.lockd and friends.
Example of use: boot.nfsroot.options="intr,lockd"
MFC after: 2 weeks
|
150335 |
19-Sep-2005 |
rwatson |
Add GIANT_REQUIRED and WITNESS sleep warnings to uprintf() and tprintf(), as they both interact with the tty code (!MPSAFE) and may sleep if the tty buffer is full (per comment).
Modify all consumers of uprintf() and tprintf() to hold Giant around calls into these functions. In most cases, this means adding an acquisition of Giant immediately around the function. In some cases (nfs_timer()), it means acquiring Giant higher up in the callout.
With these changes, UFS no longer panics on SMP when either blocks are exhausted or inodes are exhausted under load due to races in the tty code when running without Giant.
NB: Some reduction in calls to uprintf() in the svr4 code is probably desirable.
NB: In the case of nfs_timer(), calling uprintf() while holding a mutex, or even in a callout at all, is a bad idea, and will generate warnings and potential upset. This needs to be fixed, but was a problem before this change.
NB: uprintf()/tprintf() sleeping is generally a bad idea, as is having non-MPSAFE tty code.
MFC after: 1 week
|
148448 |
27-Jul-2005 |
ps |
Fix for a bug in the change that made nfs_timer() MPSAFE. We need to grab Giant before calling pru_send() (if running with mpsafenet = 0).
Found by: Jeremie Le Hen. Fixed by: Maxime Henrion
|
148447 |
27-Jul-2005 |
ps |
In nfs_nget() if two threads race on the same filehandle, the loser should cause the nfsnode to get freed. This fixes a potential vnode (and nfsnode) leak in that path.
Submitted by: Mohan Srinivasan Reviewed by: phk
|
148268 |
21-Jul-2005 |
ps |
Remove the NFS client rslock. The rslock was used to serialize writers that want to extend the file. It was also used to serialize readers that might want to read the last block of the file (with a writer extending the file). Now that we support vnode locking for NFS, the rslock is unnecessary. Writers grab the exclusive vnode lock before writing and readers grab the shared (or in some cases the exclusive) lock.
Submitted by: Mohan Srinivasan
|
148162 |
19-Jul-2005 |
ps |
Make nfs_timer() MPSAFE. With this change, the bottom half of the NFS client (the interface with the protocol stack and callouts) is Giant-free.
Submitted by: Mohan Srinivasan.
|
148111 |
18-Jul-2005 |
ps |
Fix for an NFS soft mount bug where, if the number of retries exceeded the maximum number of retransmits, the request was not being bounced back with an ETIMEDOUT error.
Reported by: Oliver Lehmann Submitted by: Mohan Srinivasan
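A toy Python retry loop showing the intended soft-mount behaviour; `send_with_retries` and its `send` callback are hypothetical. A soft mount gives up with ETIMEDOUT once the retry count exceeds the maximum, while a hard mount keeps retrying.

```python
import errno
from typing import Callable


def send_with_retries(send: Callable[[], bool], max_rexmit: int, soft: bool) -> int:
    """Call send() until it succeeds; on soft mounts give up with ETIMEDOUT.

    send() returns True on success. For hard mounts this toy loop only
    returns once send() succeeds, mirroring "retry forever".
    """
    tries = 0
    while not send():
        tries += 1
        if soft and tries > max_rexmit:
            return errno.ETIMEDOUT
    return 0
```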
|
148008 |
14-Jul-2005 |
ps |
Fixes for NFS crashes on architectures that require strict alignment:
- Fix nfsm_disct() so that after pulling up data, the remaining data is aligned if necessary.
- Fix nfs_clnt_tcp_soupcall() to bcopy() the rpc length out of the mbuf (instead of casting m_data to a uint32).
Submitted by: Pyun YongHyeon Reviewed by: Mohan Srinivasan
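A Python sketch of reading the RPC-over-TCP record mark (RFC 5531 record marking) from an arbitrary byte offset; the function name is hypothetical. The big-endian word is assembled from bytes, the moral equivalent of bcopy() into a local uint32 followed by ntohl(), so no aligned 32-bit load is needed.

```python
def rpc_record_mark(data: bytes, off: int) -> tuple[bool, int]:
    """Decode a record mark at any byte offset; return (last_frag, length).

    The high bit flags the last fragment of a record and the low 31
    bits give the fragment length.
    """
    if off + 4 > len(data):
        raise ValueError("record mark truncated")
    word = int.from_bytes(data[off:off + 4], "big")
    return bool(word & 0x80000000), word & 0x7FFFFFFF
```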
|
147420 |
16-Jun-2005 |
green |
Ifdef out the incomplete non-blocking IO implementation for NFS pending discussion of how implementation would proceed. Applications linked with libc_r (-lc_r) expect select(3) to match the EAGAIN-status of IO functions.
Approved by: re
|
147280 |
10-Jun-2005 |
green |
Fix a serious deadlock with the NFS client. Given a large enough atomic write request, it can fill the buffer cache with the entirety of that write in order to handle retries. However, it never drops the vnode lock, or else it wouldn't be atomic, so it ends up waiting indefinitely for more buf memory that cannot be gotten as it has it all, and it waits in an uncancellable state.
To fix this, hibufspace is exported and scaled to a reasonable fraction. This is used as the limit of how much of an atomic write request by the NFS client will be handled asynchronously. If the request is larger than this, it will be turned into a synchronous request which won't deadlock the system. It's possible this value is far off from what is required by some, so it shall be tunable as soon as mount_nfs(8) learns of the new field.
The slowdown between an asynchronous and a synchronous write on NFS appears to be on the order of 2x-4x.
General nod by: gad MFC after: 2 weeks More testing: wes PR: kern/79208
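A Python sketch of the size test described above; the function name and the `divisor` value are hypothetical, since this entry says only that hibufspace is "scaled to a reasonable fraction". Atomic writes up to the limit are handled asynchronously; larger ones go synchronous so they cannot pin all buffer memory while holding the vnode lock.

```python
def use_async_write(request_bytes: int, hibufspace: int, divisor: int = 2) -> bool:
    """Return True if an atomic write may be handled asynchronously.

    divisor is a hypothetical scaling choice, not the committed value;
    requests above hibufspace // divisor are done synchronously.
    """
    return request_bytes <= hibufspace // divisor
```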
|
146333 |
17-May-2005 |
des |
Ugh. Previous commit got the logic exactly backward.
Submitted by: bland Pointy hat to: des
|
146316 |
17-May-2005 |
des |
Revision 1.173 broke updating a mount from ro to rw. Fix that by clearing the MNT_RDONLY flag if MNT_UPDATE is set and "ro" was not specified.
Suggested by: cognet
|
146065 |
10-May-2005 |
rees |
set R_MUSTRESEND flag in mark_for_reconnect so re-connected requests get re-sent instead of timing out.
don't log an error message on reconnection, which is not an error.
remove unused nfs_mrep_before_tsleep.
Reviewed by: Mohan Srinivasan Approved by: alfred
|
145879 |
04-May-2005 |
ps |
Fix a bug in NFS/TCP where retransmissions would not reliably happen if the server rebooted or tore down the connection for any reason.
Found by: Jonathan Noack. Submitted by: Mohan Srinivasan.
|
145806 |
02-May-2005 |
iedowse |
Don't copy the NFSMNT_* flags into struct statfs's f_flags field, as they have no connection with the expected MNT_* flags. This bug was exposed 18 months ago when the assignments to f_flags in vfs_syscalls.c were moved to before the VFS_STATFS() call. It was fixed in the CSRG source 10 years ago, but we never picked up that change.
PR: kern/80390 MFC after: 1 week
|
145598 |
27-Apr-2005 |
des |
When NFS was converted to the new mount syscall, code was written that sets the MNT_RDONLY flag if the "ro" option was passed in from userland, and clears it otherwise. In the diskless case, the MNT_RDONLY flag is already set when this code is reached, but there are no mount options, so it was incorrectly cleared. Change the logic so the MNT_RDONLY flag is set if the "ro" option was specified, and left alone otherwise.
Note that the NFS code will still happily let you mount a filesystem RW even if the server exports it RO. I'm not sure how to fix that.
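A Python sketch contrasting the old and new handling of MNT_RDONLY; the function names are hypothetical and the flag value is illustrative. In the diskless case the flag is preset and no options are passed, so the old rule cleared it while the new rule leaves it alone.

```python
MNT_RDONLY = 0x00000001  # illustrative; mirrors FreeBSD <sys/mount.h>


def rdonly_old(mnt_flag: int, opts: set) -> int:
    # Before: "ro" absent -> flag cleared, which broke diskless root,
    # where MNT_RDONLY was already set but no options were passed.
    return (mnt_flag | MNT_RDONLY) if "ro" in opts else (mnt_flag & ~MNT_RDONLY)


def rdonly_new(mnt_flag: int, opts: set) -> int:
    # After: set MNT_RDONLY if "ro" was given, otherwise leave it alone.
    return (mnt_flag | MNT_RDONLY) if "ro" in opts else mnt_flag
```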
|
145572 |
26-Apr-2005 |
des |
While I'm here, list the new kenv (boot.netif.name) along with the others.
|
145570 |
26-Apr-2005 |
des |
When netbooting, as soon as we've figured out which interface we booted from, store its name in a kenv variable.
|
145235 |
18-Apr-2005 |
rees |
TCP reconnect is not an error. Change the message from LOG_ERR to LOG_INFO.
Approved by: alfred
|
145060 |
14-Apr-2005 |
jeff |
- cache_lookup() relocks the parent in the DOTDOT case for us.
Spotted by: phk Sponsored by: Isilon Systems, Inc.
|
145006 |
13-Apr-2005 |
jeff |
- Change all filesystems and vfs_cache to relock the dvp once the child is locked in the ISDOTDOT case. See vfs_lookup.c r1.79 for details.
Sponsored by: Isilon Systems, Inc.
|
144367 |
31-Mar-2005 |
jeff |
- LK_NOPAUSE is a nop now.
Sponsored by: Isilon Systems, Inc.
|
144299 |
29-Mar-2005 |
jeff |
- Remove wantparent, it is no longer necessary. An assert in vfs_lookup.c prevents any callers from doing a modifying op without LOCKPARENT or WANTPARENT.
|
144297 |
29-Mar-2005 |
jeff |
- cache_lookup() now locks the new vnode for us to prevent some races. Remove redundant code.
Sponsored by: Isilon Systems, Inc.
|
144206 |
28-Mar-2005 |
jeff |
- We no longer have to bother with PDIRUNLOCK, lookup() handles it for us. - Network filesystems are written with a special idiom that checks the cache first, and may even unlock dvp before discovering that a network round-trip is required to resolve the name. I believe dvp is prevented from being recycled even in the forced unmount case by the shared lock on the mount point. If not, this code should grow checks for VI_DOOMED after it relocks dvp or it will access NULL v_data fields.
Sponsored by: Isilon Systems, Inc.
|
144059 |
24-Mar-2005 |
jeff |
- Update vfs_root implementations to match the new prototype. None of these filesystems will support shared locks until they are explicitly modified to do so. Careful review must be done to ensure that this is safe for each individual filesystem.
Sponsored by: Isilon Systems, Inc.
|
144040 |
23-Mar-2005 |
ps |
- The NFS client was incorrectly masking SIGSTOP (which is non-maskable).
- The NFS client needs to guard against spurious wakeups while waiting for the response. ltrace causes the process in question to wake up (possibly from ptrace()), which causes NFS to wake up from tsleep without the response being delivered.
Submitted by: Mohan Srinivasan
|
143822 |
18-Mar-2005 |
das |
Don't brelse(bp) if bp is null. Also, eliminate some redundancy and dead code.
Found by: Coverity Prevent analysis tool
|
143693 |
16-Mar-2005 |
phk |
Use vfs_hash.
|
143687 |
16-Mar-2005 |
jmg |
MFp4: use the function to fix the packet header length instead of rolling our own...
|
143511 |
13-Mar-2005 |
jeff |
- VOP_INACTIVE should no longer drop the vnode lock.
Sponsored by: Isilon Systems, Inc.
|
143510 |
13-Mar-2005 |
jeff |
- The VI_DOOMED flag now signals the end of a vnode's relationship with the filesystem. Check that rather than VI_XLOCK.
Sponsored by: Isilon Systems, Inc.
|
143508 |
13-Mar-2005 |
jeff |
- It is no longer necessary to lock and unlock the vnode in nfs_close() as the top level does this for us now.
Sponsored by: Isilon Systems, Inc.
|
142568 |
26-Feb-2005 |
ps |
Minor cleanup in nfs_request() and removal of a comment that doesn't reflect reality.
Submitted by: Mohan Srinivasan
|
142233 |
22-Feb-2005 |
phk |
vp->v_id is a private field for the vfs namecache and it is a big mistake that NFS ever started using it. Long time ago I added the necessary vhold()/vdrop() calls to replace it, but forgot to remove the v_id code.
Do it now.
|
142079 |
19-Feb-2005 |
phk |
Try to unbreak the vnode locking around vop_reclaim() (based mostly on patch from kan@).
Pull bufobj_invalbuf() out of vinvalbuf() and make g_vfs call it on close. This is not yet a generally safe function, but for this very specific use it is safe. This solves the problem with buffers not being flushed by unmount or after failed mount attempts.
|
142070 |
18-Feb-2005 |
ps |
Fix for a potential NFS client race where shared data is updated from base context as well as the socket callback.
Submitted by: Mohan Srinivasan
|
141465 |
07-Feb-2005 |
jhb |
Drop Giant before calling kthread_exit().
|
140996 |
29-Jan-2005 |
rwatson |
Style cleanup for O_DIRECT sysctl comment introduced in nfs_vnops.c:1.242.
|
140939 |
28-Jan-2005 |
phk |
Make filesystems get rid of their own vnodes' vnode_pager objects in VOP_RECLAIM().
|
140777 |
24-Jan-2005 |
phk |
Create a vnode_pager object when a file is opened.
|
140731 |
24-Jan-2005 |
phk |
Remove unused cred arg from nfs_vinvalbuf() and many bogus arguments passed for it.
|
140460 |
18-Jan-2005 |
peter |
Mostly back out rev 1.33 from quite some time ago, and the followup fixes and tweaks. The code was actually quite broken because it discarded the upper bits of the 64 bit division. We only had a 50% chance of scaling up the blocksize for large NFS client mounts when it was needed. For 5.x and beyond, this was harmless because we could represent the result in either case. For 4.x this was a big problem though. (4.x also has a df(1) bug to compound the problem)
|
140220 |
14-Jan-2005 |
phk |
Eliminate unused and unnecessary "cred" argument from vinvalbuf()
|
140122 |
12-Jan-2005 |
brian |
Include opt_bootp.h for BOOTP_NFSROOT
PR: 73183 Submitted by: Darrin Smith sdar at salseast dot org MFC after: 7 days
|
140056 |
11-Jan-2005 |
phk |
Add BO_SYNC() and add a default which uses the secret vnode pointer and VOP_FSYNC() for now.
|
140048 |
11-Jan-2005 |
phk |
Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC().
I'm not sure why a credential was added to these in the first place, it is not used anywhere and it doesn't make much sense:
The credentials for syncing a file (ability to write to the file) should be checked at the system call level.
Credentials for syncing one or more filesystems ("none") should be checked at the system call level as well.
If the filesystem implementation needs a particular credential to carry out the syncing it would logically have to use the cached mount credential, or a credential cached along with any delayed write data.
Discussed with: rwatson
|
139823 |
07-Jan-2005 |
imp |
/* -> /*- for license, minor formatting changes
|
139744 |
05-Jan-2005 |
ps |
If the NFS/TCP stream is out of sync between the client and server, and if the client (erroneously) reads the RPC length as 0 bytes, the client can loop around in the socket callback. Explicitly check for the zero-length case and tear down and reconnect.
Submitted by: Mohan Srinivasan
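A Python sketch of the zero-length check while splitting RPC-over-TCP fragments; `split_fragments` and `StreamDesync` are hypothetical names. A zero fragment length means client and server disagree about record boundaries, so the sketch stops instead of parsing garbage; a real client would tear down and reconnect at that point. For brevity it assumes the buffer holds only whole fragments and does not join multi-fragment records.

```python
class StreamDesync(Exception):
    """Raised when record boundaries can no longer be trusted (hypothetical)."""


def split_fragments(stream: bytes) -> list:
    """Split a buffer of whole RPC-over-TCP fragments into their payloads."""
    fragments, off = [], 0
    while off + 4 <= len(stream):
        mark = int.from_bytes(stream[off:off + 4], "big")
        length = mark & 0x7FFFFFFF          # drop the last-fragment bit
        if length == 0:
            raise StreamDesync("zero-length RPC record mark")
        off += 4
        fragments.append(stream[off:off + length])
        off += length
    return fragments
```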
|
139247 |
23-Dec-2004 |
ps |
Turn NFS directio off until the stability issues are resolved.
|
138919 |
16-Dec-2004 |
ps |
Change the NFS sillyrename convention so that we won't run out of sillyrenames (which were limited to 58 per pid per directory, for no good reason). The new format of sillyrenames looks like
    .nfs.0000b31a.00d24.4
         ^^^^^^^^ ^^^^^
         ticks    pid
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
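A Python sketch that builds a name in this format; `sillyrename_name` is hypothetical. The sample can be read either as a five-digit pid followed by ".4" or as a four-digit pid followed by a literal "4.4". The sketch assumes the latter, so treat the exact field split as a reconstruction. Putting the tick counter in the name is what removes the old 58-names-per-pid limit.

```python
def sillyrename_name(ticks: int, pid: int) -> str:
    """Build a sillyrename-style name from a tick counter and a pid.

    Assumed layout: 8 hex digits of ticks, then 4 hex digits of pid
    followed by a literal "4.4" suffix (a reconstruction of the sample).
    """
    return ".nfs.%08x.%04x4.4" % (ticks & 0xFFFFFFFF, pid)
```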
|
138899 |
15-Dec-2004 |
ps |
First cut of NFS direct IO support.
- NFS direct IO completely bypasses the buffer and page caches. If a file is open for direct IO, all caching is disabled.
- Direct IO for directories will be addressed later.
- Two new NFS directio related sysctls are added. One is a knob to disable NFS direct IO completely (direct IO is enabled by default). The other is to disallow mmaped IO on a file that has at least one O_DIRECT open (see the comment in nfs_vnops.c for more details). The default is to allow mmaps on a file that has O_DIRECT opens.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
|
138694 |
11-Dec-2004 |
marcel |
Revert rev 1.233. The null-pointer function call (a dereference on ia64) was not the result of a change in the vector operations. It was caused by the NFS locking code using a FIFO and thus bypassing the vnode. This indirectly caused the panic. The NFS locking code has been changed.
Requested by: phk
|
138645 |
10-Dec-2004 |
ps |
In nfs_rename(), skip the otw rename operation if the fsync (to either src or dst) fails. This closes a potential data loss case (where the fsync failed with ENOSPC, for example).
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
|
138644 |
10-Dec-2004 |
ps |
Store a hint in the nfsnode to detect sequential access of the file. Kick off a readahead only when sequential access is detected. This eliminates wasteful readaheads in random file access.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Obtained from: Yahoo!
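This is a minimal sketch of a sequential-access hint. It assumes the hint is simply the offset just past the previous read. The structure and function are hypothetical stand-ins for the field kept in the nfsnode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-file hint. */
struct seq_hint {
	int64_t next_off;	/* offset just past the previous read */
};

/*
 * Record a read of [off, off + len) and report whether it continues the
 * previous read. A readahead would be kicked off only when this returns
 * true, so random access no longer triggers wasted readaheads.
 */
static bool
seq_hint_update(struct seq_hint *h, int64_t off, int64_t len)
{
	bool sequential = (off == h->next_off);

	h->next_off = off + len;
	return (sequential);
}
```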
|
138529 |
07-Dec-2004 |
ps |
Fix for a Lock Order Reversal in the nfs_flush() path, between the vnode interlock and the proc lock.
Reported by: marcel Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
|
138514 |
07-Dec-2004 |
phk |
Don't clobber mnt_stat.f_mntonname
|
138509 |
07-Dec-2004 |
phk |
The remaining part of nmount/omount/rootfs mount changes. I cannot sensibly split the conversion of the remaining three filesystems out from the root mounting changes, so in one go:
cd9660: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom(). Rely on vfs_mount.c calling VFS_STATFS().
nfs(client): Convert to nmount (the simple way, mount_nfs(8) is still necessary). Add omount compat shims. Drop COMPAT_PRELITE2 mount arg compatibility.
ffs: Convert to nmount. Add omount compat shims. Remove dedicated rootfs mounting code. Use vfs_mountedfrom(). Rely on vfs_mount.c calling VFS_STATFS().
Remove vfs_omount() method, all filesystems are now converted.
Remove MNTK_WANTRDWR, handling RO/RW conversions is a filesystem task, and they all do it now.
Change rootmounting to use DEVFS trampoline:
vfs_mount.c: Mount devfs on /. Devfs needs no 'from', so this is clean. Symlink /dev to /. This makes it possible to look up /dev/foo. Mount the "real" root filesystem on /. Surgically move the devfs mountpoint from under the real root filesystem onto /dev in the real root filesystem.
Remove now unnecessary getdiskbyname().
kern_init.c: Don't do devfs mounting and rootvnode assignment here, it was already handled by vfs_mount.c.
Remove now unused bdevvp(), addaliasu() and addalias(). Put the few necessary lines in devfs where they belong. This eliminates the second-last source of bogo vnodes, leaving only the lemming-syncer.
Remove the rootdev variable; it is not meaningful in a global context and was not trustworthy anyway. Correct information is provided by statfs(/).
|
138505 |
07-Dec-2004 |
ps |
Always issue wakeup()s to the NFS requestors while holding the mutex, to close all potential cases of missed wakeups.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
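The kernel uses msleep()/wakeup(). A wakeup() issued before the sleeper arrives is lost, so the state change and the wakeup must happen under the same mutex the sleeper uses to check that state. The following is a user-space analogue with POSIX threads. The request structure and function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request that a reply handler completes. */
struct req {
	pthread_mutex_t mtx;
	pthread_cond_t cv;
	bool done;
};

/* Reply side: update the state and wake the sleeper under the mutex. */
static void
req_complete(struct req *r)
{
	pthread_mutex_lock(&r->mtx);
	r->done = true;
	pthread_cond_broadcast(&r->cv);
	pthread_mutex_unlock(&r->mtx);
}

/* Requestor side: re-check the predicate under the same mutex. */
static void
req_wait(struct req *r)
{
	pthread_mutex_lock(&r->mtx);
	while (!r->done)
		pthread_cond_wait(&r->cv, &r->mtx);
	pthread_mutex_unlock(&r->mtx);
}

/* Thread entry point that plays the role of the reply handler. */
static void *
replier(void *arg)
{
	req_complete(arg);
	return (NULL);
}
```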
|
138496 |
06-Dec-2004 |
ps |
Rewrite of the NFS client's reply handling. We now have NFS socket upcalls which do RPC header parsing and match up the reply with the request. NFS calls now sleep on the nfsreq structure. This enables us to eliminate the NFS recvlock.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
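This sketch shows the matching step. It assumes each outstanding request sits on a list keyed by its RPC transaction ID (XID). The structure and function names are hypothetical, and the real code must also handle locking and retransmits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical outstanding-request record. */
struct req {
	uint32_t xid;		/* RPC transaction ID sent in the call */
	int replied;
	TAILQ_ENTRY(req) link;
};
TAILQ_HEAD(reqlist, req);

/*
 * Called once the RPC reply header has been parsed: find the request
 * whose XID matches, mark it complete and take it off the list. Replies
 * with no matching request (duplicates, late replies) return NULL.
 */
static struct req *
match_reply(struct reqlist *head, uint32_t reply_xid)
{
	struct req *r;

	TAILQ_FOREACH(r, head, link) {
		if (r->xid == reply_xid) {
			r->replied = 1;
			TAILQ_REMOVE(head, r, link);
			return (r);	/* stop iterating after the removal */
		}
	}
	return (NULL);
}
```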
|
138473 |
06-Dec-2004 |
ps |
Two fixes that improve the consistency of the NFS client cache:
- Change the cached mtime from a time_t to a 'struct timespec'. Improving the precision of the cached mtime tightens up NFS' "close-to-open" consistency considerably.
- Always force an over-the-wire consistency check from nfs_open() (unless the file is marked modified). This further improves NFS' "close-to-open" consistency.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
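This sketch illustrates why the extra precision matters. Comparing full struct timespec values detects two modifications within the same second, which a time_t comparison would treat as identical. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Compare a cached mtime against one just fetched from the server. */
static bool
mtime_changed(const struct timespec *cached, const struct timespec *fresh)
{
	return (cached->tv_sec != fresh->tv_sec ||
	    cached->tv_nsec != fresh->tv_nsec);
}
```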
|
138469 |
06-Dec-2004 |
ps |
Serialize NFS vinvalbuf operations by acquiring/upgrading to the vnode EXCLUSIVE lock. This prevents threads from adding pages to the vnode while an invalidation is in progress, closing potential races. In the bioread() path, callers acquire the SHARED vnode lock, so while an invalidation was in progress, it was possible to fault in new pages onto the vnode, causing the invalidation to take a long time or fail. We saw these races at Yahoo! with very large files and heavy concurrent access. Forcing an upgrade to the EXCLUSIVE lock before doing the invalidation closes all these races.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
|
138463 |
06-Dec-2004 |
ps |
Add non-blocking versions of nfsm_dissect() and friends, for use from socket callbacks or similar callers, from both the NFS client and the server. Instituted nfsm_dissect_nonblock(), nfsm_dissect_xx_nonblock(). And nfsm_disct() now takes an extra M_TRYWAIT/M_DONTWAIT argument.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
|
138460 |
06-Dec-2004 |
ps |
- If all data has been committed to stable storage on the server, it is safe to turn off the nfsnode's NMODIFIED flag.
- Move the check for signals to the top of the loop that walks the dirty buffers on the vnode, scheduling writes. This ensures that we break out of the flush operation on reception of a signal.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
|
138458 |
06-Dec-2004 |
rwatson |
Correct a typo in a comment.
|
138430 |
06-Dec-2004 |
phk |
For reasons unknown, the nfs locking code used a fifo to send requests to userland and a dedicated system call to get replies.
The vnode-bypass of fifos broke this into a panic.
Ditch all the magic and create a device, /dev/nfslock, instead, and use it for both directions. Apart from the shorter path, this is also faster because the device driver runs Giant-free using the vnode bypass.
Noticed by: marcel
|
138419 |
05-Dec-2004 |
rwatson |
Convert GIANT_REQUIRED; in nfs_mountroot() to NET_ASSERT_GIANT(), and annotate that nfs_mountroot assumes it is OK to step on the values in the global NFSv3 diskless structure as the mountroot function is called during a serialized part of the boot, before any other NFS client activity occurs.
MFC after: 2 weeks
|
138418 |
05-Dec-2004 |
rwatson |
Convert a GIANT_REQUIRED; into a NET_ASSERT_GIANT();, as sockets are now only conditionally protected by Giant based on debug.mpsafenet.
|
138412 |
05-Dec-2004 |
phk |
VFS_STATFS(mp, ...) is mostly called with &mp->mnt_stat, but a few callers pass something else. Most of the implementations have grown weeds for this, copying some fields from mnt_stat if the passed argument isn't that.
Fix this the cleaner way: Always call the implementation on mnt_stat and copy that in toto to the VFS_STATFS argument if different.
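The following reduced sketch shows the calling convention described above. The structures are hypothetical stand-ins that hold only a few struct statfs fields.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, much-reduced stand-ins for struct statfs and struct mount. */
struct statfs_lite {
	long f_blocks;
	long f_bfree;
	char f_mntonname[16];
};
struct mount_lite {
	struct statfs_lite mnt_stat;
};

/* Filesystem-specific statfs: only ever asked to fill in mnt_stat. */
static int
fs_statfs(struct mount_lite *mp, struct statfs_lite *sbp)
{
	(void)mp;
	sbp->f_blocks = 1000;
	sbp->f_bfree = 250;
	return (0);
}

/*
 * Generic wrapper: call the filesystem on mp->mnt_stat, then copy the
 * whole structure to the caller's buffer if that buffer is different.
 */
static int
vfs_statfs_wrap(struct mount_lite *mp, struct statfs_lite *sbp)
{
	int error;

	error = fs_statfs(mp, &mp->mnt_stat);
	if (error == 0 && sbp != &mp->mnt_stat)
		*sbp = mp->mnt_stat;	/* copied in toto */
	return (error);
}
```

Fields the filesystem does not touch, such as f_mntonname, reach the caller unchanged, so no implementation needs its own copying logic.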
|
138411 |
05-Dec-2004 |
marcel |
Fix null-pointer indirect function calls introduced in the previous commit. In the new world order, the transitive closure on the vector operations is not precomputed. As such, it's unsafe to actually use any of the function pointers in an indirect function call. They can be null, and we need to use the default vector in that case. This is mostly a quick fix for the four function pointers that are ed explicitly. A more generic or scalable solution is likely to see the light of day.
No pathos on: current@
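This sketch shows the fallback. It assumes a vector whose unset slots are NULL and a vop_default pointer that names the next vector to try. The two-method vector and the helper are hypothetical simplifications, not the kernel's generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, two-method vnode operation vector. */
struct vop_vector {
	const struct vop_vector *vop_default;	/* next vector to consult */
	int (*vop_read)(void);
	int (*vop_write)(void);
};

static int default_read(void) { return (1); }
static int default_write(void) { return (2); }
static int myfs_read(void) { return (10); }

static const struct vop_vector default_vnodeops = {
	.vop_default = NULL,
	.vop_read = default_read,
	.vop_write = default_write,
};

/* A filesystem that overrides only vop_read; vop_write stays NULL. */
static const struct vop_vector myfs_vnodeops = {
	.vop_default = &default_vnodeops,
	.vop_read = myfs_read,
};

/*
 * The chain is not flattened ahead of time, so a NULL slot must never be
 * called directly: walk vop_default until some vector supplies it.
 */
static int
call_vop_write(const struct vop_vector *vv)
{
	while (vv != NULL && vv->vop_write == NULL)
		vv = vv->vop_default;
	assert(vv != NULL);	/* the default vector must implement it */
	return (vv->vop_write());
}
```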
|
138290 |
01-Dec-2004 |
phk |
Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals.
Adjust to more contemporary circumstances and gain type checking.
Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we cannot add a new VOP_ method with a loadable module. History has not given us reason to believe this would ever be feasible in the first place.
Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.
Give coda correct prototypes and function definitions for all vop_()s.
Generate a bit more data from the vnode_if.src file: a struct vop_vector and prototype typedefs for all vop methods.
Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector.
Remove a lot of vfs_init since vop_vector is ready to use from the compiler.
Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc.
Implement VCALL() by making vdesc_offset the offsetof() of the relevant function pointer in vop_vector. This is disgusting, but since the code is generated by a script, it is comparatively safe. The alternative for nullfs etc. would be much worse.
Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
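This sketch shows offset-based dispatch. A descriptor stores the offsetof() of a method slot, and a generic caller, such as a bypass layer, locates the function pointer from that offset. All names here are hypothetical reductions of the generated structures.

```c
#include <assert.h>
#include <stddef.h>

typedef int vop_fn_t(int);

/* Hypothetical vector with two typed method slots. */
struct vop_vector {
	vop_fn_t *vop_open;
	vop_fn_t *vop_close;
};

/* Hypothetical descriptor recording where its slot lives in the vector. */
struct vnodeop_desc {
	const char *vdesc_name;
	size_t vdesc_offset;	/* offsetof() the slot in struct vop_vector */
};

static const struct vnodeop_desc vop_close_desc = {
	"vop_close", offsetof(struct vop_vector, vop_close)
};

/* Generic dispatch: fetch the method slot at the descriptor's offset. */
static int
vcall(const struct vop_vector *vv, const struct vnodeop_desc *desc, int arg)
{
	vop_fn_t *const *slot;

	slot = (vop_fn_t *const *)((const char *)vv + desc->vdesc_offset);
	return ((*slot)(arg));
}

static int my_open(int a) { return (a + 1); }
static int my_close(int a) { return (a * 2); }
```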
|
138278 |
01-Dec-2004 |
phk |
Remove redundant functions (repo-copied from nfsclient) for dealing with fifos.
|
138276 |
01-Dec-2004 |
phk |
Scripted modification of vop_* prototypes to use typedefs.
|
138259 |
01-Dec-2004 |
phk |
Add missing #include
|
138256 |
01-Dec-2004 |
ps |
Fix for a race between lookup and readdirplus, that causes a deadlock (with NFS exclusive vnode locks enabled). Lookup grabs the parent's lock and wants to lock child. Readdirplus locks the child and wants to lock parent (for loading the attrs for ".."). The fix is to not load the attrs for ".." in readdirplus.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
|
138255 |
01-Dec-2004 |
ps |
Clean all dirty pages (dirtied by mmap'ed writes) in nfs_close(). This closes a major hole in close-to-open consistency support. Added a new sysctl so that this can be disabled for single NFS client applications with very large amounts of mmap'ed IO (for performance).
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
|
138254 |
01-Dec-2004 |
ps |
Fix a (blocks) underrun bug where negative values were returned to df from a statfs call, causing df to print negative values.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
|
138204 |
29-Nov-2004 |
ps |
Fix for a bug in nfs_mkdir() that called vrele() instead of vput() in the error cases, causing panics.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com Reviewed by: rwatson
|
137846 |
18-Nov-2004 |
jeff |
- Eliminate the acquisition and release of the bqlock in bremfree() by setting the B_REMFREE flag in the buf. This is done to prevent lock order reversals with code that must call bremfree() with a local lock held. This also reduces overhead by removing two lock operations per buf for fsync() and similar.
- Check for the B_REMFREE flag in brelse() and bqrelse() after the bqlock has been acquired, so that the buf may be removed from the free-list.
- Provide a bremfreef() function to immediately remove a buf from a free-list, for use only by NFS. This is done because the nfsclient code overloads the b_freelist queue for its own async I/O queue.
- Simplify the numfreebuffers accounting by removing a switch statement that executed the same code in every possible case.
- getnewbuf() can encounter locked bufs on free-lists once Giant is removed. Remove a panic associated with this condition and delay asserts that inspect the buf until after it is locked.
Reviewed by: phk Sponsored by: Isilon Systems, Inc.
|
137480 |
09-Nov-2004 |
phk |
Detect root mount attempts on the flag, not on the NULL path.
|
137197 |
04-Nov-2004 |
phk |
Retire b_magic now, we have the bufobj containing the same hint.
|
136927 |
24-Oct-2004 |
phk |
Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.
Add bufstrategy(), which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY song and dance.
Rename ibwrite to bufwrite().
Move the two NFS buf_ops to more sensible places, add bufstrategy to them.
Add inlines for bwrite() and bstrategy(), which call through buf->b_bufobj->b_ops->b_{write,strategy}().
Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
|
136767 |
22-Oct-2004 |
phk |
Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj) also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT, use new function declarations, TAILQ_FOREACH_SAFE() etc.
|
136751 |
21-Oct-2004 |
phk |
Move the VI_BWAIT flag into the bo_flag element of bufobj and call it BO_WWAIT.
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write count on a bufobj. Bufobj_wdrop() replaces vwakeup().
Use these functions in all relevant places except in ffs_softdep.c, where the use of interlocked_sleep() makes this impossible.
Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
|
136517 |
14-Oct-2004 |
pjd |
Add a missing newline character.
|
136006 |
01-Oct-2004 |
das |
nfsclient/nfs_bio.c has a PHOLD() without a PRELE(). Neither should be necessary here. Also, use killproc() instead of psignal().
|
135874 |
28-Sep-2004 |
phk |
Remove support for using NFS device nodes.
|
135862 |
27-Sep-2004 |
phk |
Remove the NFS4 vop method vector for devices: we are desupporting device nodes on anything but DEVFS, and in this case it was not even used (see below).
Put the NFS4 vop method vector for FIFOs behind "#if 0" because it is unused. Add an XXX comment to say that I think the unusedness is a bug.
|
135860 |
27-Sep-2004 |
phk |
style consistency.
|
135280 |
15-Sep-2004 |
phk |
Remove unused B_WRITEINPROG flag
|
134898 |
07-Sep-2004 |
phk |
Explicitly pass vnode to nfs_doio() and mountpoint to nfs_asyncio().
|
134281 |
25-Aug-2004 |
rwatson |
In nfs_timer(), pass curthread rather than &thread0 into the protocol send routine. In IPv6 UDP, the thread will be passed to suser(), which asserts that if a thread is used for a super user check, it be curthread. Many of these protocol entry points probably need to accept credentials instead of threads.
MT5 candidate.
Noticed/tested by: kuriyama
|
132902 |
30-Jul-2004 |
phk |
Put a version element in the VFS filesystem configuration structure and refuse to initialize filesystems with a wrong version. This will aid maintenance activities on the 5-stable branch.
s/vfs_mount/vfs_omount/
s/vfs_nmount/vfs_mount/
Name our filesystems mount function consistently.
Eliminate the nameidata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused.
Reorganize the root filesystem selection code.
|
132808 |
28-Jul-2004 |
phk |
Move a relic to its correct location(s): Put nfs diskless initialization calls with the code they call. (Yet another example of mindless copy&paste).
|
132805 |
28-Jul-2004 |
phk |
Remove the global variables rootdevs and rootvp; they are unused as such.
Add local rootvp variables as needed.
Remove checks for miniroots in the swap partition. We never did that, and most of the filesystems could never be used for that, but it had still been copy&pasted all over the place.
|
132640 |
25-Jul-2004 |
phk |
Eliminate unused second argument to reassignbuf() and simplify it accordingly.
|
132084 |
13-Jul-2004 |
alfred |
Turn off SO_REUSEADDR and SO_REUSEPORT, they were causing EADDRINUSE to be returned from the protocol stack.
Pointy hat to me for not grokking what those options _really_ mean.
|
132060 |
12-Jul-2004 |
dwmalone |
Rename Alfred's kern_setsockopt to so_setsockopt, as this seems a better name. I have a kern_[sg]etsockopt which I plan to commit shortly, but the arguments to these functions will be quite different from so_setsockopt.
Approved by: alfred
|
132023 |
12-Jul-2004 |
alfred |
Make VFS_ROOT() and vflush() take a thread argument. This is to allow filesystems to decide based on the passed thread which vnode to return. Several filesystems used curthread, they now use the passed thread.
|
132018 |
12-Jul-2004 |
alfred |
Use SO_REUSEADDR and SO_REUSEPORT when reconnecting NFS mounts. Tune the timeout from 5 seconds to 12 seconds. Provide a sysctl to show how many reconnects the NFS client has done.
Seems to fix IPv6 from: kuriyama
|
131840 |
08-Jul-2004 |
brian |
Change the following environment variables to kernel options:
bootp -> BOOTP
bootp.nfsroot -> BOOTP_NFSROOT
bootp.nfsv3 -> BOOTP_NFSV3
bootp.compat -> BOOTP_COMPAT
bootp.wired_to -> BOOTP_WIRED_TO
- i.e. back out the previous commit. It's already possible to pxeboot(8) with a GENERIC kernel.
Pointed out by: dwmalone
|
131814 |
08-Jul-2004 |
brian |
Change the following kernel options to environment variables:
BOOTP -> bootp
BOOTP_NFSROOT -> bootp.nfsroot
BOOTP_NFSV3 -> bootp.nfsv3
BOOTP_COMPAT -> bootp.compat
BOOTP_WIRED_TO -> bootp.wired_to
This lets you PXE boot with a GENERIC kernel by putting this sort of thing in loader.conf:
bootp="YES"
bootp.nfsroot="YES"
bootp.nfsv3="YES"
bootp.wired_to="bge1"
or even setting the variables manually from the OK prompt.
|
131717 |
06-Jul-2004 |
rwatson |
Acquire socket lock in nfs_connect() connection/sleep loop to protect socket state and avoid missed wakeups.
|
131697 |
06-Jul-2004 |
alfred |
use vfs_suser() to restrict access to the nfs mount's timeout.
|
131694 |
06-Jul-2004 |
alfred |
NFS mobility Phase VI:
Export NFS mount state via sysctl. Export timeout via sysctl.
|
131691 |
06-Jul-2004 |
alfred |
NFS mobility PHASE I, II & III (phase VI, and V pending):
Rebind the client socket when we experience a timeout. This fixes the case where our IP changes for some reason.
Signal a VFS event when NFS transitions from up to down and vice versa.
Add a placeholder vfs_sysctl where we will put status reporting shortly.
Also: Make down NFS mounts return EIO instead of EINTR when there is a soft timeout or force unmount in progress.
|
131551 |
04-Jul-2004 |
phk |
When we traverse the vnodes on a mountpoint we need to look out for our cached 'next vnode' being removed from this mountpoint. If we find that it was recycled, we restart our traversal from the start of the list.
Code to do that is in all local disk filesystems (and a few other places) and looks roughly like this:
    MNT_ILOCK(mp);
loop:
    for (vp = TAILQ_FIRST(&mp...); (vp = nvp) != NULL; nvp = TAILQ_NEXT(vp,...)) {
        if (vp->v_mount != mp)
            goto loop;
        MNT_IUNLOCK(mp);
        ...
        MNT_ILOCK(mp);
    }
    MNT_IUNLOCK(mp);
The code which takes vnodes off a mountpoint looks like this:
    MNT_ILOCK(vp->v_mount);
    ...
    TAILQ_REMOVE(&vp->v_mount->mnt_nvnodelist, vp, v_nmntvnodes);
    ...
    MNT_IUNLOCK(vp->v_mount);
    ...
    vp->v_mount = something;
(Take a moment and try to spot the locking error before you read on.)
On an SMP system, one CPU could have removed nvp from our mountlist but not yet gotten around to assigning a new value to vp->v_mount, while another CPU simultaneously gets to the top of the traversal loop, where it finds that (vp->v_mount != mp) is not true despite the fact that the vnode has indeed been removed from our mountpoint.
Fix:
Introduce the macro MNT_VNODE_FOREACH() to traverse the list of vnodes on a mountpoint while taking into account that vnodes may be removed from the list as we go. This saves approx 65 lines of duplicated code.
Split insmntque(), which potentially moves a vnode from one mount point to another, into delmntque() and insmntque(), which do just what their names say.
Fix delmntque() to set vp->v_mount to NULL while holding the mountpoint lock.
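This is a single-threaded sketch of the ordering the fix relies on. It uses a POSIX mutex in place of MNT_ILOCK and <sys/queue.h> tail queues. The structures are hypothetical reductions, vnodes are assumed never to be freed while a traversal holds a pointer to them, and only the lock ordering is exercised, not a real race.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

struct mount;

/* Hypothetical, reduced vnode and mount structures. */
struct vnode {
	struct mount *v_mount;
	TAILQ_ENTRY(vnode) v_nmntvnodes;
};

struct mount {
	pthread_mutex_t mnt_mtx;	/* plays the role of MNT_ILOCK */
	TAILQ_HEAD(, vnode) mnt_nvnodelist;
};

static void
insmntque(struct vnode *vp, struct mount *mp)
{
	pthread_mutex_lock(&mp->mnt_mtx);
	vp->v_mount = mp;
	TAILQ_INSERT_TAIL(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
	pthread_mutex_unlock(&mp->mnt_mtx);
}

/*
 * The fix: clear v_mount while still holding the mount lock, so a
 * traversal that checks vp->v_mount != mp under the same lock can never
 * see a vnode that is off the list but still points at mp.
 */
static void
delmntque(struct vnode *vp)
{
	struct mount *mp = vp->v_mount;

	pthread_mutex_lock(&mp->mnt_mtx);
	TAILQ_REMOVE(&mp->mnt_nvnodelist, vp, v_nmntvnodes);
	vp->v_mount = NULL;
	pthread_mutex_unlock(&mp->mnt_mtx);
}

/* Count vnodes on mp, restarting if a cached next vnode has left. */
static int
count_vnodes(struct mount *mp)
{
	struct vnode *vp, *nvp;
	int n;

	pthread_mutex_lock(&mp->mnt_mtx);
restart:
	n = 0;
	for (vp = TAILQ_FIRST(&mp->mnt_nvnodelist); vp != NULL; vp = nvp) {
		/* vp may have been removed while the lock was dropped. */
		if (vp->v_mount != mp)
			goto restart;
		nvp = TAILQ_NEXT(vp, v_nmntvnodes);
		pthread_mutex_unlock(&mp->mnt_mtx);
		n++;			/* per-vnode work happens unlocked */
		pthread_mutex_lock(&mp->mnt_mtx);
	}
	pthread_mutex_unlock(&mp->mnt_mtx);
	return (n);
}
```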
|
131021 |
24-Jun-2004 |
rwatson |
When updating sb_flags, acquire the socket buffer lock to prevent races.
|
130640 |
17-Jun-2004 |
phk |
Second half of the dev_t cleanup.
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel space struct cdev etc.
|
130619 |
17-Jun-2004 |
rwatson |
Remove bad cookie vp kernel printf; while it does notify about an interesting event, there's little or nothing the user can do about it.
|
130554 |
16-Jun-2004 |
rwatson |
Convert GIANT_REQUIRED to NET_ASSERT_GIANT where Giant is used to protect socket operations. Leave one "as-is" as it also frobs rootvp.
|
128992 |
06-May-2004 |
alc |
Make vm_page's PG_ZERO flag immutable between the time of the page's allocation and deallocation. This flag's principal use is shortly after allocation. For such cases, clearing the flag is pointless. The only unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed page.
Reviewed by: tegge@
|
128263 |
14-Apr-2004 |
peadar |
Let the NFS client notice a file's size changing as a modification. This avoids presenting invalid data to the client's applications when the file is modified, and then extended within the window of the resolution of the modification timestamp.
Reviewed By: iedowse PR: kern/64091
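This sketch shows the modification check. It assumes the client caches both the mtime and the size. The structure and function are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical subset of the attributes cached per file. */
struct cached_attr {
	struct timespec mtime;
	int64_t size;
};

/*
 * A write followed by an extension inside one mtime tick leaves the
 * mtime unchanged, so a size change also counts as a modification.
 */
static bool
attr_modified(const struct cached_attr *old, const struct cached_attr *cur)
{
	return (old->mtime.tv_sec != cur->mtime.tv_sec ||
	    old->mtime.tv_nsec != cur->mtime.tv_nsec ||
	    old->size != cur->size);
}
```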
|
128126 |
11-Apr-2004 |
marcel |
Unbreak build: s/TAILQ_ISEMPTY/TAILQ_EMPTY/g
|
128111 |
11-Apr-2004 |
peadar |
Clean up properly when unloading NFS client module.
This includes a modified form of some code from Thomas Moestl (tmm@) to properly clean up the UMA zone and the "nfsnodehashtbl" hash table.
Reviewed By: iedowse PR: 16299
|
127977 |
07-Apr-2004 |
imp |
Remove the advertising clause from the University of California Regents' license, per letter dated July 22, 1999 and email from Peter Wemm, Alan Cox and Robert Watson.
Approved by: core, peter, alc, rwatson
|
127857 |
04-Apr-2004 |
rwatson |
Spell 2 as SHUT_RDWR when used as an argument to soshutdown().
|
127797 |
03-Apr-2004 |
peadar |
Flush the cached access mode after modifying a file's attributes for NFSv3. It's likely that modifying the attributes will affect the file's accessibility. This version of the patch is one suggested by Ian Dowse after reviewing my original attempt in the PR.
Reviewed By: iedowse PR: kern/44336 MFC after: 3 days
|
127515 |
28-Mar-2004 |
kan |
Reset the callout in the nfs_timeout and rpcclnt_timeout functions. Timers are supposed to continue firing as long as there is work to do, not stop after the first invocation.
This is damage control after a patch that has been committed prematurely.
Tested by: kris
|
127421 |
25-Mar-2004 |
rees |
only do nfs rpc callouts if there is work to do.
Submitted by: kan Approved by: alfred
|
127144 |
17-Mar-2004 |
pjd |
Add a comment with an explanation why we don't report EPIPE errors on nfs sockets.
Requested by: ru
|
127137 |
17-Mar-2004 |
pjd |
Don't report EPIPE errors on nfs sockets. These can be due to idle tcp mounts which will be closed by netapp, solaris, etc. if left idle too long.
Obtained from: NetBSD
|
126962 |
14-Mar-2004 |
peter |
Calculate NFS timeouts in units of 10ms, not 5ms. This matches the default clock precision on i386. This is a NOP change on i386. But this stops the mount_nfs units from suddenly changing to units of 1/20 of a second (vs the normal 1/10 of a second) if HZ is increased.
|
126888 |
12-Mar-2004 |
brooks |
Allow kernel with the BOOTP option to boot when DHCP/BOOTP sets the root path to an absolute path without a host name. Previously, there was a nasty POLA violation where a system would PXE boot until you added the BOOTP option and then it would panic instead.
Reviewed by: tegge, Dirk-Willem van Gulik <dirkx at webweaving.org> (a previous version) Submitted by: tegge (getip function)
|
126853 |
11-Mar-2004 |
phk |
Properly vector all bwrite() and BUF_WRITE() calls through the same path and s/BUF_WRITE()/bwrite()/ since it now does the same as bwrite().
|
126851 |
11-Mar-2004 |
phk |
Remove unused second arg to vfinddev(). Don't call addaliasu() on VBLK nodes.
|
126425 |
01-Mar-2004 |
rwatson |
Rename dup_sockaddr() to sodupsockaddr() for consistency with other functions in kern_socket.c.
Rename the "canwait" field to "mflags" and pass M_WAITOK and M_NOWAIT in from the caller context rather than "1" or "0".
Correct the mflags passed into mac_init_socket() in the previous commit so that they do not include M_ZERO.
Submitted by: sam
|
126330 |
27-Feb-2004 |
rees |
NFSv4 fixes from Connectathon 2004:
- remove unused pid field of file context struct
- map nfs4 error codes to errnos
- eliminate redundant code from nfs4_request
- use zero stateid on setattr that doesn't set file size
- use same clientid on all mounts until reboot
- invalidate dirty bufs in nfs4_close, to play it safe
- open file for writing if truncating and it's not already open
Approved by: alfred
|
126105 |
22-Feb-2004 |
cperciva |
If mountnfs returns an error, it will have already freed nam; no need to free it again.
Reported by: "Ted Unangst" <tedu@coverity.com> Approved by: rwatson (mentor)
|
125454 |
04-Feb-2004 |
jhb |
Locking for the per-process resource limits structure.
- struct plimit includes a mutex to protect a reference count. The plimit structure is treated similarly to struct ucred in that it is always copy on write, so having a reference to a structure is sufficient to read from it without needing a further lock.
- The proc lock protects the p_limit pointer and must be held while reading limits from a process to keep the limit structure from changing out from under you while reading from it.
- Various global limits that are ints are not protected by a lock since int writes are atomic on all the archs we support and thus a lock wouldn't buy us anything.
- All accesses to individual resource limits from a process are abstracted behind a simple lim_rlimit(), lim_max(), and lim_cur() API that return either an rlimit, or the current or max individual limit of the specified resource from a process.
- dosetrlimit() was renamed to kern_setrlimit() to match the existing style of other similar syscall helper functions.
- The alpha OSF/1 compat layer no longer calls getrlimit() and setrlimit() (it didn't use the stackgap when it should have) but uses lim_rlimit() and kern_setrlimit() instead.
- The svr4 compat no longer uses the stackgap for resource limits calls, but uses lim_rlimit() and kern_setrlimit() instead.
- The ibcs2 compat no longer uses the stackgap for resource limits. It also no longer uses the stackgap for accessing sysctls for the ibcs2_sysconf() syscall but uses kernel_sysctl() instead. As a result, ibcs2_sysconf() no longer needs Giant.
- The p_rlimit macro no longer exists.
Submitted by: mtm (mostly, I only did a few cleanups and catchups) Tested on: i386 Compiled on: alpha, amd64
|
125263 |
31-Jan-2004 |
obrien |
Bump the NFSv3/TCP defaults for rsize and wsize from 8K to 32K to match Solaris and HP-UX. This increases read performance for large files across NFS.
PR: 62024 & 26324 Submitted by: Bjoern Groenvall <bg@sics.se>
|
122953 |
22-Nov-2003 |
alfred |
Use function pointers to remove the cross dependency between nfs4 and the nfs3 client. Also fix some bugs, introduced by the v4 import, that happen to be causing crashes in both v3 and v4.
Submitted by: Jim Rees <rees@umich.edu> Approved by: re
|
122736 |
15-Nov-2003 |
alfred |
Move the declaration for "struct nfs4_fctx" out from under #ifdef KERNEL for fstat(1).
|
122719 |
15-Nov-2003 |
alfred |
unbreak LINT.
|
122698 |
14-Nov-2003 |
alfred |
University of Michigan's Citi NFSv4 kernel client code.
Submitted by: Jim Rees <rees@umich.edu>
|
122523 |
12-Nov-2003 |
kan |
1. Consolidate mount struct allocation/destruction into common code in the vfs_mount_alloc/vfs_mount_destroy functions, and take care to completely destroy the mount point along with its locks. The mount struct has grown in complexity recently, and depending on each failure path to destroy it completely isn't working anymore.
2. Eliminate largely identical vfs_mount and vfs_unmount question by moving the code to handle both cases into a newly introduced vfs_domount function.
3. Simplify nfs_mount_diskless to always expect an allocated mount struct and never attempt an allocation/destruction itself. The vfs_allocroot allocation was there to support 'magic' swap space configuration for diskless clients that was already removed by PHK some time ago.
4. Include a vfs_buildopts cleanups by Peter Edwards to validate the sanity of nmount parameters passed from userland.
Submitted by: (4) Peter Edwards <peter.edwards@openet-telecom.com> Reviewed by: rwatson
|
122450 |
11-Nov-2003 |
alfred |
Stop using shared locks for nfs vop locks.
The reason this was done was to avoid a race to the root when an NFS server went down. However a semi-recent change to the way that the kernel's lookup() routine traverses mount points prevents this.
Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we acquire a shared lock on the mount point being accessed and then drop the directory vnode lock before requesting the target lock.
With that in place we no longer need shared locks for NFS to prevent race to the root lockups.
|
122261 |
07-Nov-2003 |
sam |
Assert GIANT_REQUIRED where sockets are manipulated. This is preparatory for MPSAFE network commits and ongoing socket locking work.
Supported by: FreeBSD Foundation
|
122091 |
05-Nov-2003 |
kan |
Remove mntvnode_mtx and replace it with per-mountpoint mutex. Introduce two new macros MNT_ILOCK(mp)/MNT_IUNLOCK(mp) to operate on this mutex transparently.
Eventually new mutex will be protecting more fields in struct mount, not only vnode list.
Discussed with: jeff
|
121874 |
02-Nov-2003 |
kan |
Take care not to call vput if thread used in corresponding vget wasn't curthread, i.e. when we receive a thread pointer to use as a function argument. Use VOP_UNLOCK/vrele in these cases.
The only case where td != curthread known at the moment is boot() calling sync with the thread0 pointer.
This fixes the panic on shutdown people have reported.
|
121816 |
31-Oct-2003 |
brooks |
Replace the if_name and if_unit members of struct ifnet with new members if_xname, if_dname, and if_dunit. if_xname is the name of the interface and if_dname/unit are the driver name and instance.
This change paves the way for interface renaming and enhanced pseudo device creation and configuration semantics.
Approved By: re (in principle) Reviewed By: njl, imp Tested On: i386, amd64, sparc64 Obtained From: NetBSD (if_xname)
|
121205 |
18-Oct-2003 |
phk |
DuH!
bp->b_iooffset (the spot on the disk), not bp->b_offset (the offset in the file)
|
121201 |
18-Oct-2003 |
phk |
Initialize bp->b_offset before calling VOP_STRATEGY().
Remove KASSERTS and panics with B_PHYS checks which no longer apply.
|
121191 |
18-Oct-2003 |
phk |
We do not get B_PHYS buffers here anymore. /dev/drum is long gone.
|
120812 |
05-Oct-2003 |
iedowse |
Since the addition of the VI_DOINGINACT flag some time ago, VOP_INACTIVE routines need not worry about their vnode getting recycled if they block. Remove the code from nfs_inactive() that used vget() to get an extra vnode reference that was held during the nfs_vinvalbuf() call.
|
120788 |
05-Oct-2003 |
jeff |
- Remove an incorrect XXX comment. This code does respect the XLOCK since it uses vget() which will fail if the identity changes.
|
120787 |
05-Oct-2003 |
jeff |
- Check the XLOCK before we inspect the vnode.
|
120786 |
05-Oct-2003 |
jeff |
- We don't need to cache_purge() in nfs_reclaim(), vclean() does it for us.
|
120755 |
04-Oct-2003 |
jeff |
- Consistently set sopt_dir.
Pointed out by: pete@isilon.com
|
120736 |
04-Oct-2003 |
jeff |
- Acquire the vnode interlock prior to dropping the mntvnode_mtx. - Make a note of the lack of XLOCK protection in this code. We would access a vnode while it is changing identities without Giant.
|
120730 |
04-Oct-2003 |
jeff |
- Remove the backtrace() call from the *_vinvalbuf() functions. Thanks to a stack trace supplied by phk, I now understand what's going on here. The check for VI_XLOCK stops us from calling vinvalbuf once the vnode has been partially torn down in vclean(). It is not clear that this would cause a problem. Document this in nfs_bio.c, which is where the other two filesystems copied this code from.
|
120264 |
19-Sep-2003 |
jeff |
- Remove interlock protection around VI_XLOCK. The interlock is not sufficient to guarantee that this race is not hit. The XLOCK will likely have to be redesigned due to the way reference counting and mutexes work in FreeBSD. We currently can not be guaranteed that xlock was not set and cleared while we were blocked on the interlock while waiting to check for XLOCK. This would lead us to reference a vnode which was not the vnode we requested. - Add a backtrace() call inside of INVARIANTS in the hopes of finding out if this condition is ever hit. It should not, since we should be retaining a reference to the vnode in these cases. The reference would be sufficient to block recycling.
|
120003 |
12-Sep-2003 |
phk |
Name the vnode method vectors consistently with the rest of the filesystems.
This improves the output of src/tools/tools/vop_table
|
119766 |
05-Sep-2003 |
phk |
Remove now unused BOOTP tags related to NFS swap device.
|
119735 |
04-Sep-2003 |
dds |
KNF: parentheses around return values.
Suggested by: bde Approved by: schweikh (mentor - blanket) MFC after: 6 weeks
|
119687 |
02-Sep-2003 |
dds |
Fix errno return values to better represent failure reasons for read and open.
Approved by: schweikh (mentor) Agreed: bde MFC after: 6 weeks
|
118944 |
15-Aug-2003 |
phk |
Remove the magic way of configuring NFS backed swap.
This code dates back to the very first diskless support on FreeBSD, back when swapon(8) couldn't simply be run on an NFS-backed file.
Suggested replacement command sequence on the client:
dd if=/dev/zero of=/swapfile bs=1k count=1 oseek=100000
swapon /swapfile
rm -f /swapfile
For whatever value of 100000 you want.
|
118639 |
07-Aug-2003 |
billf |
0) preallocate per-interface context structures without the ifnet lock held 1) avoid immediately calling bzero() after malloc() by passing M_ZERO 2) do not initialize individual members of the global context to zero 3) remove an unused assignment of ifctx in bootpc_init()
Reviewed by: tegge
|
118135 |
29-Jul-2003 |
tjr |
Fix a problem that occurs when truncating files on NFSv3 mounts: we need to set np->n_size back to the desired size again after calling nfs_meta_setsize(), since that call can result in nfs_loadattrcache() being called, which would change n_size back to the value it had before the truncate request was issued. The result of this bug is that the size info cached in the nfsnode becomes incorrect, lseek(fd, ofs, SEEK_END) seeks past the end of the file, stat() returns the wrong size, etc.
PR: 41792 MFC after: 2 weeks
|
118094 |
27-Jul-2003 |
phk |
Add fdidx argument to vn_open() and vn_open_cred() and pass -1 throughout.
|
117152 |
02-Jul-2003 |
phk |
Change idle sleep identifier to "-" for nfsiod
|
116461 |
17-Jun-2003 |
alc |
Lock the vm object when freeing a page.
|
116412 |
15-Jun-2003 |
phk |
Add the same KASSERT to all VOP_STRATEGY and VOP_SPECSTRATEGY implementations to check that the buffer points to the correct vnode.
|
116271 |
12-Jun-2003 |
phk |
Initialize struct vfsops C99-sparsely.
Submitted by: hmp Reviewed by: phk
|
116260 |
12-Jun-2003 |
iedowse |
When removing a sillyrename file, make sure that the directory vnode has not been cleaned in the meantime, since this can happen during a forced unmount. Also add a comment that nfs_removeit() should really be locking the directory vnode before calling nfs_removerpc().
Reported by: mbr Tested by: mbr MFC after: 1 week
|
116189 |
11-Jun-2003 |
obrien |
Use __FBSDID().
|
116185 |
11-Jun-2003 |
rwatson |
Add the comment I meant to add about not passing in PCATCH to the tsleep(). Note the XXX.
|
116070 |
09-Jun-2003 |
hsu |
On a socket creation error, don't close the socket.
|
115533 |
31-May-2003 |
phk |
Remove unused variables. Add explicit breaks to switch statements.
Found by: FlexeLint
|
115456 |
31-May-2003 |
phk |
The IO_NOWDRAIN and B_NOWDRAIN hacks are no longer needed to prevent deadlocks with vnode backed md(4) devices because md now uses a kthread to run the bio requests instead of doing it directly from the bio down path.
|
115415 |
30-May-2003 |
rwatson |
rpc.lockd stability workaround: remove PCATCH from the tsleep() in nfs_lock.c. Right now, if we permit a signal to interrupt the sleep, we will slip the lock and no process on that client, the server, or any other client will be able to acquire the lock. This can happen, for example, if a user hits Ctrl-C or Ctrl-T while a process is waiting for the lock. By removing PCATCH, we prevent that from happening, at the cost of not permitting a user-requested lock abort: also nasty. However, a user interface bug might be preferable to a serious semantic bug, so we go with that for now.
We need to teach the rpc.lockd/kernel protocol how to abort lock requests, and rpc.lockd how to handle aborted lock requests; patches for the kernel bit are floating around, but no rpc.lockd bit yet.
Approved by: re (scottl)
|
115172 |
19-May-2003 |
peter |
Deal with the possibility of negative available space from the file server to avoid Bad Things(TM) happening (eg: df crashing with a floating point exception).
Submitted by: Harold Gutch <logix@foobar.franken.de> Approved by: re (scottl)
|
115041 |
15-May-2003 |
rwatson |
This change grabs the vnode lock for NFS client vnodes when calling VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS will panic, and (b) it may be possible to corrupt entries in the cached vnode attributes in the nfsnode, since nfsnode attribute cache data is also protected by the vnode lock.
Approved by: re (jhb) Pointed out by: VFS_DEBUG_LOCKS
|
114983 |
13-May-2003 |
jhb |
- Merge struct procsig with struct sigacts. - Move struct sigacts out of the u-area and malloc() it using the M_SUBPROC malloc bucket. - Add a small sigacts_*() API for managing sigacts structures: sigacts_alloc(), sigacts_free(), sigacts_copy(), sigacts_share(), and sigacts_shared(). - Remove the p_sigignore, p_sigacts, and p_sigcatch macros. - Add a mutex to struct sigacts that protects all the members of the struct. - Add sigacts locking. - Remove Giant from nosys(), kill(), killpg(), and kern_sigaction() now that sigacts is locked. - Several in-kernel functions such as psignal(), tdsignal(), trapsignal(), and thread_stopped() are now MP safe.
Reviewed by: arch@ Approved by: re (rwatson)
|
114434 |
01-May-2003 |
des |
Instead of recording the Unix time in a process when it starts, record the uptime. Where necessary, convert it back to Unix time by adding boottime to it. This fixes a potential problem in the accounting code, which would compute the elapsed time incorrectly if the Unix time was stepped during the lifetime of the process.
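A minimal sketch of the idea, assuming a simplified process record; `struct proc_times`, `elapsed_sec()` and `start_unix()` are hypothetical names, not the kernel's. The start time is kept as uptime, elapsed time is computed entirely from uptime values, and a Unix start time is only reconstructed on demand by adding boottime:

```c
#include <time.h>

/* Hypothetical process record: start time stored as uptime, not wall time. */
struct proc_times {
	struct timespec	start_uptime;
};

/*
 * Elapsed time = now_uptime - start_uptime.  Both values come from the
 * uptime clock, so stepping the Unix (wall-clock) time does not affect it.
 */
static double
elapsed_sec(const struct proc_times *p, struct timespec now_uptime)
{
	return ((double)(now_uptime.tv_sec - p->start_uptime.tv_sec) +
	    (now_uptime.tv_nsec - p->start_uptime.tv_nsec) / 1e9);
}

/* Unix start time = boottime + start uptime (whole seconds, for brevity). */
static time_t
start_unix(const struct proc_times *p, time_t boottime)
{
	return (boottime + p->start_uptime.tv_sec);
}
```

A process started at uptime 100.5 s and sampled at uptime 160 s reports 59.5 s elapsed, regardless of any wall-clock step in between.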
|
114216 |
29-Apr-2003 |
kan |
Deprecate machine/limits.h in favor of new sys/limits.h. Change all in-tree consumers to include <sys/limits.h>
Discussed on: standards@ Partially submitted by: Craig Rodrigues <rodrigc@attbi.com>
|
113986 |
24-Apr-2003 |
truckman |
VOP_FSYNC() expects to be called with the vnode locked, so lock fvp in nfs_rename() before calling VOP_FSYNC() and unlock fvp immediately after.
Reviewed by: bde
|
113985 |
24-Apr-2003 |
peter |
Fix a bug with df on large (>1TB) NFSv3 file servers on 32-bit client machines, where the number of blocks won't fit in the 'long' fields of struct statfs. Instead of choosing an artificial 512-byte block size, simply scale it up until we avoid an overflow. NFSv3 reports the sizes in bytes, and the block size is a figment of nfsclient's imagination.
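A minimal sketch of the scaling idea, assuming the server reports its total size as a 64-bit byte count and the client's block-count field is a 32-bit signed long; `scale_bsize()` and `CLIENT_LONG_MAX` are hypothetical names, not the nfsclient code:

```c
#include <stdint.h>

/* Hypothetical stand-in for LONG_MAX on a 32-bit (ILP32) client. */
#define CLIENT_LONG_MAX	INT32_MAX

/*
 * Start from a 512-byte block size and double it until the block count
 * derived from the server's byte total fits in a 32-bit signed long.
 */
static uint64_t
scale_bsize(uint64_t tbytes)
{
	uint64_t bsize = 512;

	while (tbytes / bsize > (uint64_t)CLIENT_LONG_MAX)
		bsize <<= 1;
	return (bsize);
}
```

With a 1 TiB server, a 512-byte block size would yield 2^31 blocks, one more than a 32-bit long can hold, so the sketch settles on 1024.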
|
113885 |
23-Apr-2003 |
truckman |
Release the vnode interlock in nfs_flush() before calling nfs_sigintr(), and grab it again later if necessary. This prevents a lock order reversal because nfs_sigintr() calls PROC_LOCK().
|
112892 |
31-Mar-2003 |
thomas |
Revert change 1.201 (removing mapping of VAPPEND to VWRITE). Instead, use the generic vaccess() operation to determine whether an operation is permitted. This avoids embedding knowledge of vnode permission bits such as VAPPEND in the NFS client.
PR: kern/46515 vaccess() patch submitted by: "Peter Edwards" <pmedwards@eircom.net> Approved by: tjr, roberto (mentor)
|
112888 |
31-Mar-2003 |
jeff |
- Move p->p_sigmask to td->td_sigmask. Signal masks will be per thread with a follow on commit to kern_sig.c - signotify() now operates on a thread since unmasked pending signals are stored in the thread. - PS_NEEDSIGCHK moves to TDF_NEEDSIGCHK.
|
112685 |
26-Mar-2003 |
rwatson |
Add O_NONBLOCK to the vn_open_cred() flags for NFS client locking when opening the POSIX fifo; convert ENXIO error returns to EOPNOTSUPP.
This improves handling of the case where the /var/run/lock fifo exists but there is no listener: we immediately return EOPNOTSUPP rather than blocking until a listener turns up. This could occur during a diskless boot before rpc.lockd is loaded, or if the lock file persists across a reboot following the disabling of rpc.lockd. This may have suddenly started to occur due to fifo blocking fixes--previously it looks like attempts to read on a fifo with no listener would time out due to insufficient resources.
Reviewed by: alfred
|
112657 |
26-Mar-2003 |
alfred |
req cannot be NULL or we'd die.
Sponsored by: RED
|
112455 |
21-Mar-2003 |
tjr |
Map VAPPEND to VWRITE in nfsspec_access() - VAPPEND is never set in the mode returned by VOP_GETATTR. This fixes incorrect "Permission denied" errors when trying to append to a file on an NFSv2 mount.
|
112225 |
14-Mar-2003 |
jeff |
- Add a forgotten BUF_LOCK()
Most sincere apologies to: jake
|
112178 |
13-Mar-2003 |
jeff |
- Lock the buf before inspecting its contents.
|
111856 |
04-Mar-2003 |
jeff |
- Add a new 'flags' parameter to getblk(). - Define one flag GB_LOCK_NOWAIT that tells getblk() to pass the LK_NOWAIT flag to the initial BUF_LOCK(). This will eventually be used in cases where we want to use a buffer only if it is not currently in use. - Convert all consumers of the getblk() API to use this extra parameter.
Reviewed by: arch Not objected to by: mckusick
|
111841 |
03-Mar-2003 |
njl |
Finish cleanup of vprint() which was begun with changing v_tag to a string. Remove extraneous uses of vop_null, instead deferring to the default op. Rename vnode type "vfs" to the more descriptive "syncer". Fix formatting for various filesystems that use vop_print.
|
111748 |
02-Mar-2003 |
des |
More low-hanging fruit: kill caddr_t in calls to wakeup(9) / [mt]sleep(9).
|
111514 |
26-Feb-2003 |
jeff |
- The interlock was not being dropped in nfs_flush() if the first part of an if clause was true. Break the two clauses out into separate statements since they require different actions.
Reported/Tested by: jake Spotted by: jhb
|
111475 |
25-Feb-2003 |
jeff |
- Properly handle the vnode interlock in nfs_fsync.
Reported by: phk
|
111463 |
25-Feb-2003 |
jeff |
- Add an interlock argument to BUF_LOCK and BUF_TIMELOCK. - Remove the buftimelock mutex and acquire the buf's interlock to protect these fields instead. - Hold the vnode interlock while locking bufs on the clean/dirty queues. This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick
|
111119 |
19-Feb-2003 |
imp |
Back out M_* changes, per decision of the TRB.
Approved by: trb
|
111105 |
18-Feb-2003 |
peter |
Get rid of a silly message I added back in Sept 2001 (1.68).
|
110922 |
15-Feb-2003 |
tjr |
Lock proc while accessing p_siglist, p_sigmask and p_sigignore in nfs_sigintr().
|
109704 |
22-Jan-2003 |
dillon |
Provide a sysctl to allow defaulting of the connectionless (-c) feature to mount_nfs. The sysctl defaults to 1 (paranoid mode). Setting it to 0 will, by default, allow an NFS client to receive replies from a different IP than the one they were sent to.
Submitted by: Sean Eric Fagan <sef@kithrup.com>
|
109623 |
21-Jan-2003 |
alfred |
Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0. Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
|
108648 |
04-Jan-2003 |
phk |
Since Jeffr made the std* functions the default in rev 1.63 of kern/vfs_defaults.c it is wrong for the individual filesystems to use the std* functions as that prevents override of the default.
Found by: src/tools/tools/vop_table
|
108589 |
03-Jan-2003 |
phk |
Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op since all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
|
108470 |
30-Dec-2002 |
schweikh |
Fix typos, mostly s/ an / a / where appropriate and a few s/an/and/ Add FreeBSD Id tag where missing.
|
108357 |
28-Dec-2002 |
dillon |
Abstract-out the constants for the sequential heuristic.
No operational changes.
MFC after: 1 day
|
108250 |
24-Dec-2002 |
hsu |
SMP locking for radix nodes.
|
108198 |
23-Dec-2002 |
alc |
Avoid holding the vnode interlock around malloc() or free() to prevent a lock order reversal.
Reviewed by: jeff
|
108172 |
22-Dec-2002 |
hsu |
SMP locking for ifnet list.
|
108161 |
21-Dec-2002 |
dillon |
do not try to free a mountpoint that we did not allocate.
X-MFC after: immediately
|
107104 |
20-Nov-2002 |
alfred |
reapply 1.26 through 1.28.
Approved by: re
|
107101 |
20-Nov-2002 |
alfred |
forgot about 5.x freeze, backout 1.26 through 1.28 pending re@ approval.
|
107100 |
20-Nov-2002 |
alfred |
remove useless casts, unused macros and clean up a line wrap.
|
107099 |
20-Nov-2002 |
alfred |
comment and untwist error return logic
|
107098 |
20-Nov-2002 |
alfred |
Remove an outdated comment complaining about exporting struct ucred to userspace, I fixed it a while ago.
|
105568 |
20-Oct-2002 |
phk |
Don't examine an un-initialized variable.
Spotted by: FlexeLint.
|
105563 |
20-Oct-2002 |
phk |
Remove extern declarations of stuff which is static in nfs_node.c Move related macro to nfs_node.c
Spotted by: FlexeLint
|
105077 |
14-Oct-2002 |
mckusick |
Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock).
In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme.
Sponsored by: DARPA & NAI Labs.
|
104908 |
11-Oct-2002 |
mike |
Change iov_base's type from `char *' to the standard `void *'. All uses of iov_base which assume its type is `char *' (in order to do pointer arithmetic) have been updated to cast iov_base to `char *'.
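A minimal sketch of the cast this change requires, using the standard `struct iovec` from <sys/uio.h>; `iov_advance()` is a hypothetical helper, not a kernel function. Pointer arithmetic on a `void *` is not valid C, so `iov_base` is cast to `char *` before being advanced:

```c
#include <stddef.h>
#include <sys/uio.h>

/*
 * Consume up to n bytes from the front of an iovec.  Because iov_base is
 * void *, it must be cast to char * before doing pointer arithmetic.
 */
static void
iov_advance(struct iovec *iov, size_t n)
{
	if (n > iov->iov_len)
		n = iov->iov_len;	/* never step past the end of the buffer */
	iov->iov_base = (char *)iov->iov_base + n;
	iov->iov_len -= n;
}
```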
|
104354 |
02-Oct-2002 |
scottl |
Some kernel threads try to do significant work, and the default KSTACK_PAGES doesn't give them enough stack to do much before blowing away the pcb. This adds MI and MD code to allow the allocation of an alternate kstack whose size can be specified when calling kthread_create. Passing the value 0 prevents the alternate kstack from being created. Note that the ia64 MD code is missing for now, and PowerPC was only partially written due to the pmap.c being incomplete there. Though this patch does not modify anything to make use of the alternate kstack, acpi and usb are good candidates.
Reviewed by: jake, peter, jhb
|
104306 |
01-Oct-2002 |
jmallett |
Back out kernel support for reliable signal queues.
Requested by: rwatson, phk, and many others
|
104241 |
30-Sep-2002 |
jmallett |
Lock access to the signal queue, and related structures, with PROC_LOCK.
Submitted by: jhb
|
104235 |
30-Sep-2002 |
jmallett |
Convert use of p_siglist and old SIG*() macros to use <sys/ksiginfo.h> prototyped functions to get a sigset_t, and further to check for any queued signals, rather than an empty signal set, to go with the move to signal queues rather than signal sets.
|
104094 |
28-Sep-2002 |
phk |
Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too.
Inspired by: FlexeLint warning #512
|
104024 |
27-Sep-2002 |
rwatson |
Remove an errant debugging printf that got left in during my last commit.
Pointed out by: guido
|
104016 |
26-Sep-2002 |
rwatson |
Apparently pxeboot passes in a mygateway of non-zero sin length from DHCP in the event that no gateway is returned from DHCP, breaking the assumption that we skip the routing insertion of the gateway if the sin length is zero. Check also for s_addr of 0 to avoid the "Oh no, adding my default route failed" panic, making it possible to pxeboot machines on segments without default routes. Arguably this could be a bug in pxeboot, or in the TUNABLE code, but this makes my boxes boot.
|
103939 |
25-Sep-2002 |
jeff |
- Lock access to the buf lists. - Use vrefcnt() where appropriate. - Add some locking asserts.
|
103770 |
22-Sep-2002 |
jake |
Moved nfs_diskless setup code from autoconf.c to nfsclient/nfs_diskless.c so that it is MI. Allow nfs_mountroot to return an error if the nfs_diskless struct is not valid, rather than panicking later on. Call nfs_setup_diskless() from nfs_mountroot if NFS_ROOT is defined, like bootpc_init(). Removed legacy root mount support for sparc64, and enabled NFS_ROOT by default.
|
103554 |
18-Sep-2002 |
phk |
Use m_length() instead of home-rolled versions.
|
103314 |
14-Sep-2002 |
njl |
Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging.
Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP.
Suggested by: phk Reviewed by: bde, rwatson (earlier version)
|
103099 |
08-Sep-2002 |
phk |
Now that we have a cached mount credential in struct mount, use it instead of a private cached copy.
|
102966 |
05-Sep-2002 |
bde |
Use `struct uma_zone *' instead of uma_zone_t, so that <sys/uma.h> isn't a prerequisite.
|
102052 |
18-Aug-2002 |
sobomax |
Increase size of ifnet.if_flags from 16 bits (short) to 32 bits (int). To avoid breaking application ABI use unused ifreq.ifru_flags[1] for upper 16 bits in SIOCSIFFLAGS and SIOCGIFFLAGS ioctl's.
Reviewed by: -hackers, -net
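A minimal sketch of the split described above, assuming a two-element array of 16-bit shorts like ifru_flags[2]; `struct flags_pair`, `flags_split()` and `flags_join()` are hypothetical names. Old binaries that only read element 0 still see the low 16 bits unchanged:

```c
#include <stdint.h>

/* Hypothetical stand-in for the ifreq member: two 16-bit shorts. */
struct flags_pair {
	short	flags[2];	/* [0] = low 16 bits, [1] = high 16 bits */
};

/* Store a 32-bit flag word as two 16-bit halves. */
static void
flags_split(uint32_t f, struct flags_pair *p)
{
	p->flags[0] = (short)(f & 0xffff);
	p->flags[1] = (short)(f >> 16);
}

/* Rebuild the 32-bit flag word; go through uint16_t to avoid sign extension. */
static uint32_t
flags_join(const struct flags_pair *p)
{
	return ((uint32_t)(uint16_t)p->flags[0] |
	    ((uint32_t)(uint16_t)p->flags[1] << 16));
}
```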
|
101947 |
15-Aug-2002 |
alfred |
Remove a case of exposing 'struct ucred' to userspace. Use a struct xucred for LOCKD_MSG instead.
Requested by: rwatson
|
101941 |
15-Aug-2002 |
rwatson |
In order to better support flexible and extensible access control, make a series of modifications to the credential arguments relating to file read and write operations to clarify which credential is used for what:
- Change fo_read() and fo_write() to accept "active_cred" instead of "cred", and change the semantics of consumers of fo_read() and fo_write() to pass the active credential of the thread requesting an operation rather than the cached file cred. The cached file cred is still available in fo_read() and fo_write() consumers via fp->f_cred. These changes largely in sys_generic.c.
For each implementation of fo_read() and fo_write(), update cred usage to reflect this change and maintain current semantics:
- badfo_readwrite() unchanged - kqueue_read/write() unchanged - pipe_read/write() now authorize MAC using active_cred rather than td->td_ucred - soo_read/write() unchanged - vn_read/write() now authorize MAC using active_cred but VOP_READ/WRITE() with fp->f_cred
Modify vn_rdwr() to accept two credential arguments instead of a single credential: active_cred and file_cred. Use active_cred for MAC authorization, and select a credential for use in VOP_READ/WRITE() based on whether file_cred is NULL or not. If file_cred is provided, authorize the VOP using that cred, otherwise the active credential, matching current semantics.
Modify current vn_rdwr() consumers to pass a file_cred if used in the context of a struct file, and to always pass active_cred. When vn_rdwr() is used without a file_cred, pass NOCRED.
These changes should maintain current semantics for read/write, but avoid a redundant passing of fp->f_cred, as well as making it more clear what the origin of each credential is in file descriptor read/write operations.
Follow-up commits will make similar changes to other file descriptor operations, and modify the MAC framework to pass both credentials to MAC policy modules so they can implement either semantic for revocation.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101777 |
13-Aug-2002 |
phk |
Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable.
Sponsored by: DARPA and NAI Labs.
|
101744 |
12-Aug-2002 |
rwatson |
Pass IO_NOMACCHECK to vn_rdwr() in the following checks to prevent enforcement of MAC policy on the read or write operations:
- In ext2fs, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), directory modifications in rename(), directory write operations in mkdir(), symlink write operations in symlink().
- In the NFS client locking code, perform vn_rdwr() on the NFS locking socket without enforcing MAC, since the write is done on behalf of the kernel NFS implementation rather than the user process.
- In UFS, don't enforce MAC on loop-back reads and writes supporting directory read operations in lookup(), and symlink write operations in symlink().
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
101364 |
05-Aug-2002 |
jeff |
- Add a missing VI_UNLOCK to an error case in nfs_flush.
|
101308 |
04-Aug-2002 |
jeff |
- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking.
Idea stolen from: BSD/OS
|
100450 |
21-Jul-2002 |
alc |
o Lock page queue accesses in nfs_getpages().
|
100194 |
16-Jul-2002 |
dillon |
Fix a bug in nfs_write() related to ^C'ing during a file write on an interruptible mount. We were returning from inside the loop without releasing the rslock.
Submitted by: Mike Junk <junk@isilon.com> MFC after: 3 days
|
100175 |
16-Jul-2002 |
jhb |
If we get a receive error in nfs_receive() and then get an error trying to obtain the send lock, we would bogusly try to unlock the send lock before returning resulting in a panic. Instead, only unlock the send lock if nfs_sndlock() succeeds and nfs_reconnect() fails.
MFC after: 3 days Sponsored by: The Weather Channel
|
100134 |
15-Jul-2002 |
alfred |
Add IPv6 support.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr>
|
99797 |
11-Jul-2002 |
dillon |
Convert old style (type foo *)0 casts to NULLs
PR: kern/40360 Requested by: Hiten Pandya via direct email
|
99737 |
10-Jul-2002 |
dillon |
Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing appears to show no increase in overhead.
Disadvantages: Dirties more cache lines during lookups.
Not as fast as a hash table lookup (but still O(log N) amortized, and optimal when there is locality of reference).
Advantages: vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently.
I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted.
The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier.
This commit, along with another that Alan is working on for the VM page global hash table, will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (i.e. the filesystem syncer would detect that someone is trying to get the vnode lock, remember its place, and skip to the next vnode).
Note that the buffer cache splay is somewhat more complex than other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list.
Suggested by: alc
|
98988 |
28-Jun-2002 |
jhb |
In namei(), we use a NULL thread for uio_td when doing a VOP_READLINK(). nfs_readlink() calls nfs_bioread() which passes in uio_td as the thread argument to nfs_getcacheblk(). In nfs_getcacheblk() we dereference the thread pointer to get a process pointer to pass to nfs_sigintr(). This obviously results in a panic. :)
Rather than change nfs_getcacheblk() to check if the thread pointer is NULL when calling nfs_sigintr() like other callers do, change nfs_sigintr() to take a thread as the last argument instead of a process so none of the callers have to care if the thread is NULL or not.
|
97658 |
31-May-2002 |
tanimura |
Back out my last commit of locking down a socket, it conflicts with hsu's work.
Requested by: hsu
|
97333 |
27-May-2002 |
dd |
Don't tsleep() with an sb_mtx held.
|
97211 |
24-May-2002 |
peter |
Fix warning; deprecated use of label at end of compound statement
|
96972 |
20-May-2002 |
tanimura |
Lock down a socket, milestone 1.
o Add a mutex (sb_mtx) to struct sockbuf. This protects the data in a socket buffer. The mutex in the receive buffer also protects the data in struct socket.
o Determine the lock strategy for each member in struct socket.
o Lock down the following members:
- so_count - so_options - so_linger - so_state
o Remove *_locked() socket APIs. Make the following socket APIs touching the members above now require a locked socket:
- sodisconnect() - soisconnected() - soisconnecting() - soisdisconnected() - soisdisconnecting() - sofree() - soref() - sorele() - sorwakeup() - sotryfree() - sowakeup() - sowwakeup()
Reviewed by: alfred
|
96825 |
17-May-2002 |
ambrisko |
Add TAG_VENDOR_INDENTIFIER (option 60) to our DHCP request done by the kernel BOOTP option. The format will be: FreeBSD:<MACHINE>:<osrelease>. This way people can tune their DHCP server to serve up root filesystems based on the OS, machine type and version.
Obtained from: NetBSD MFC after: 3 weeks
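A minimal sketch of how such an option could be laid out in a DHCP request, assuming the standard code/length/value option encoding (option 60 is the vendor class identifier in RFC 2132); `put_vendor_class()` is a hypothetical helper, not the kernel's bootp code:

```c
#include <stdio.h>
#include <string.h>

/*
 * Append DHCP option 60 with the value "FreeBSD:<machine>:<osrelease>" to
 * buf as code, length, value (no trailing NUL).  Returns the number of
 * bytes written, or -1 if the value does not fit.
 */
static int
put_vendor_class(unsigned char *buf, size_t buflen,
    const char *machine, const char *osrelease)
{
	char id[256];
	int n;

	n = snprintf(id, sizeof(id), "FreeBSD:%s:%s", machine, osrelease);
	if (n < 0 || n > 255 || (size_t)n + 2 > buflen)
		return (-1);		/* too long for a 1-byte length field */
	buf[0] = 60;			/* option code: vendor class identifier */
	buf[1] = (unsigned char)n;	/* option length */
	memcpy(buf + 2, id, (size_t)n);
	return (n + 2);
}
```

For example, machine "i386" and release "5.0-CURRENT" produce the 24-byte value "FreeBSD:i386:5.0-CURRENT", which a DHCP server can match on.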
|
96755 |
16-May-2002 |
trhodes |
More s/file system/filesystem/g
|
95662 |
28-Apr-2002 |
phk |
We don't need the arp kludge any more.
|
95590 |
27-Apr-2002 |
iedowse |
Remove the nfs_{lock,unlock,islocked} functions and the associated definitions; they have been unused and #if 0'd out since the Lite/2 merge and we are unlikely to want them in the future.
|
94903 |
17-Apr-2002 |
iedowse |
The recent NFS forced unmount improvements introduced a side-effect where some client operations might be unexpectedly cancelled during an unsuccessful non-forced unmount attempt. This causes problems for amd(8), because it periodically attempts a non-forced unmount to check if the filesystem is still in use.
Fix this by adding a new mountpoint flag MNTK_UNMOUNTF that is set only during the operation of a forced unmount. Use this instead of MNTK_UNMOUNT to trigger the cancellation of hung NFS operations.
Also correct a problem where dounmount() might inadvertently clear the MNTK_UNMOUNT flag.
Reported by: simokawa MFC after: 1 week
|
93593 |
01-Apr-2002 |
jhb |
Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag.
Discussed on: smp@
|
92783 |
20-Mar-2002 |
jeff |
Remove references to vm_zone.h and switch over to the new uma API.
|
92219 |
13-Mar-2002 |
luigi |
Add a readonly sysctl variable of type string, kern.bootp_cookie, which is initialized with whatever string a dhcp/bootp server passes as vendor tag 134. There is no standard tag that I know of with this information, and no vendor-defined tag that applies to FreeBSD that I could find doing the same thing.
The intended use is to pass information to userland for run-time configuration of a diskless client without having to run a bootp/dhcp client for the third time (after the one in pxeboot/etherboot, and the one in the kernel bootp), also because these clients generally screw up the interface configuration, which is not exactly what you want when you have your disks nfs-mounted.
Manpage update to follow soon.
MFC-after: 3 days
|
91888 |
08-Mar-2002 |
phk |
vhold() our vnode while checking the remote side.
This is believed to be the only place where a soft reference to a vnode is held with no sort of hard reference, consequently this change should allow us to free(9) vnodes from the freelist after properly cleaning them up.
Reviewed by: dillon
|
91460 |
28-Feb-2002 |
peter |
Fix warnings.. bootpc_init() and related.
|
91420 |
27-Feb-2002 |
jhb |
Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmetic and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.
|
91406 |
27-Feb-2002 |
jhb |
Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.
|
90373 |
07-Feb-2002 |
peter |
Fix a long line touched in previous commit (but not caused by previous commit)
|
90361 |
07-Feb-2002 |
julian |
Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there, but remove all the code that assumes that, in preparation for the next commit which will actually move it out.
Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, Benno Rice
|
89407 |
15-Jan-2002 |
peter |
Revise the nfsiod auto tuning code. Now both the upper and lower limits are specifiable by sysctl and are respected.
Submitted by: Maxime Henrion <mux@sneakerz.org>
|
89324 |
14-Jan-2002 |
peter |
Implement vfs.nfs.iodmin (minimum number of nfsiod's) and vfs.nfs.iodmaxidle (idle time before nfsiod's exit). Make it adaptive so that we create nfsiod's on demand and they go away after not being used for a while. The upper limit is NFS_MAXASYNCDAEMON (currently 20). More will be done here, but this is a useful checkpoint.
Submitted by: Maxime Henrion <mux@qualys.com>
|
89174 |
10-Jan-2002 |
iedowse |
Terminate requests in nfs_sigintr() if the filesystem is in the process of being unmounted. This allows forced NFS unmounts to complete even if there are processes stuck holding the mnt_lock while the server is down. The mechanism is not ideal in that there is a small chance we might accidentally cancel requests during a failed non-forced unmount attempt on that filesystem, but this is not really a big problem.
Also, move the tsleep() in nfs_nmcancelreqs() so that we do not sleep in the case where there are no requests to be cancelled.
|
88796 |
02-Jan-2002 |
iedowse |
Permit NFS filesystems to be forcibly unmounted when the server is down, even if there are hung processes and the mount is non-interruptible.
This works by having nfs_unmount call a new function nfs_nmcancelreqs() in the FORCECLOSE case. It scans the list of outstanding requests and marks as interrupted any requests belonging to the specified mount. Then it waits up to 30 seconds for all requests to terminate. A few other changes are necessary to support this: - Unconditionally set a socket timeout so that even hard mounts are guaranteed to occasionally check the R_SOFTTERM flag on requests. For hard mounts this flag can only be set by nfs_nmcancelreqs(). - Reject requests on a mount that is currently being unmounted. - Never grant the receive lock to a request that has been cancelled.
This should also avoid an old problem where a forced NFS unmount could cause a crash; it occurred when a VOP on an unlocked vnode (usually VOP_GETATTR) was in progress at the time of the forced unmount.
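A minimal sketch of the scan step described above, assuming a simplified singly linked list of outstanding requests; `struct nfsreq_sketch`, `cancel_mount_reqs()` and the R_SOFTTERM value are hypothetical stand-ins, not the real nfsclient definitions. The up-to-30-second wait for the marked requests to terminate is left out:

```c
#include <stddef.h>

/* Hypothetical flag value; the real nfsclient defines its own R_SOFTTERM. */
#define R_SOFTTERM	0x01

/* Hypothetical, simplified outstanding-request record. */
struct nfsreq_sketch {
	struct nfsreq_sketch	*next;
	const void		*mount;	/* mount the request belongs to */
	int			 flags;
};

/*
 * Mark every outstanding request that belongs to mount mp as interrupted
 * and return how many were marked.  Requests on other mounts are untouched.
 */
static int
cancel_mount_reqs(struct nfsreq_sketch *head, const void *mp)
{
	struct nfsreq_sketch *r;
	int n = 0;

	for (r = head; r != NULL; r = r->next) {
		if (r->mount == mp) {
			r->flags |= R_SOFTTERM;
			n++;
		}
	}
	return (n);
}
```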
|
88777 |
01-Jan-2002 |
alc |
o Remove an errant ';' introduced in the last revision.
o Remove an unused variable.
|
88770 |
01-Jan-2002 |
rwatson |
o Remove premature use of nmp->nm_cred, it hasn't been initialized yet.
|
88746 |
31-Dec-2001 |
rwatson |
o Pass td into nfs_mountroot() to eliminate an XXX'd curthread use. Since it's in the parent function anyway, might as well pass it another layer down.
Obtained from: TrustedBSD Project
|
88745 |
31-Dec-2001 |
rwatson |
o Remove premature leakage of use of td_ucred from base source tree: instead, use td->td_proc->p_ucred.
|
88743 |
31-Dec-2001 |
rwatson |
o Add missing #include's of sys/proc.h, missed in merge, required to dereference td->td_proc->p_ucred.
|
88739 |
31-Dec-2001 |
rwatson |
o Make the credential used by socreate() an explicit argument to socreate(), rather than getting it implicitly from the thread argument.
o Make NFS cache the credential provided at mount-time, and use the cached credential (nfsmount->nm_cred) when making calls to socreate() on initially connecting, or reconnecting the socket.
This fixes bugs involving NFS over TCP and ipfw uid/gid rules, as well as bugs involving NFS and mandatory access control implementations.
Reviewed by: freebsd-arch
|
88712 |
30-Dec-2001 |
iedowse |
Add a #define for the size of the nfs_backoff[] array, and use this instead of magic constants in the code.
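A minimal sketch of the idea: name the table size once and let the compiler check that it matches the table. NFS_NBACKOFF, the multiplier values and backoff_factor() are illustrative assumptions. The message does not give the real table contents or the exact indexing.

```c
#include <assert.h>

#define NFS_NBACKOFF 8   /* hypothetical name for the table size */

/* Illustrative multipliers only. */
static const int nfs_backoff[] = { 2, 4, 8, 16, 32, 64, 128, 256 };

/* Fails to compile if someone edits the table without updating the #define. */
_Static_assert(sizeof(nfs_backoff) / sizeof(nfs_backoff[0]) == NFS_NBACKOFF,
    "NFS_NBACKOFF does not match nfs_backoff[]");

/* Multiplier for the n-th consecutive timeout (n >= 1), clamped to the last entry. */
static int
backoff_factor(int ntimeouts)
{
	assert(ntimeouts >= 1);
	if (ntimeouts > NFS_NBACKOFF)
		ntimeouts = NFS_NBACKOFF;
	return nfs_backoff[ntimeouts - 1];
}
```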
|
88680 |
30-Dec-2001 |
ambrisko |
Increase the buffer size to hold a bootp/DHCP reply from 256 bytes to 1222 bytes (derived as the maximum that isc-dhcpd uses). This solves the problem where, if a bootp/DHCP reply is over 256 bytes, the end of the reply is not found and the reply is ignored. This happens when the swap and root paths are longish or many parameters are set.
Reviewed by: imp Approved by: imp
|
88541 |
27-Dec-2001 |
dillon |
nfs_nget() does no locking whatsoever when looking up a vnode. If the vget() sleeps we have to retry the operation to avoid racing against a deletion.
MFC maybe: submitted to re's
|
88091 |
18-Dec-2001 |
iedowse |
Avoid passing the variable `tl' to functions that just use it for temporary storage. In the old NFS code it wasn't at all clear if the value of `tl' was used across or after macro calls, but I'm fairly confident that the convention was to keep its use local. Each ex-macro function now uses a local version of this variable, so all of the double-indirection goes away.
The only exception to the `local use' rule for `tl' is nfsm_clget(), which is left unchanged by this commit.
Reviewed by: peter
|
87834 |
14-Dec-2001 |
dillon |
This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS.
* An edge case with shared R+W mmap()'s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems)
THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. see vm/vm_page.c line 1641
* The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side)
* Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other cpu's. (applicable to NFS client side but could affect all filesystems). Note: not considered serious since the corruption occurs beyond the file EOF.
* When flushing a dirty buffer due to B_CACHE getting cleared, we were accidentally setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS)
* We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentative, I may be able to remove it due to the second bug fix. (applicable to NFS client side)
* vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side)
* Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk).
Testing program written by: Avadis Tevanian, Jr. Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step') MFC after: 1 week
|
86363 |
14-Nov-2001 |
rwatson |
o Modify nfslockdans() to accept a thread reference instead of a proc reference: with td->td_ucred, it will be desirable to authorize based on td->td_ucred, rather than p->p_ucred.
o Since the same variable 'p' was later used with pfind() on the target process for the wakeup, introduce a new local variable 'targetp' to use instead.
Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs
|
86284 |
12-Nov-2001 |
alfred |
Allow users to use the 'nolockd' or -L options with mount_nfs in order to avoid the need for rpc.lockd to perform client locks. Using this option a user can revert back to using local locks for NFS mounts like we did before we had rpc.lockd.
|
86278 |
11-Nov-2001 |
alfred |
turn vn_open() into a wrapper around vn_open_cred() which allows one to perform a vn_open using temporary/other/fake credentials.
Modify the nfs client side locking code to use vn_open_cred() passing proc0's ucred instead of the old way, which was to temporarily raise privs while running vn_open(). This should hopefully close the race.
|
86089 |
05-Nov-2001 |
dillon |
Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking in wdrain during a write. This flag needs to be used in devices whose strategy routines turn around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code.
Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer or buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct.
MFC after: 1 week
|
85398 |
24-Oct-2001 |
rwatson |
o Note an additional potential problem here: LOCKD_MSG directly exports struct ucred to userland. In 5.0-CURRENT, it is desirable to instead export struct xucred, as ucred contains mutexes, pointers, and other kernel evil. I'll add it to my work queue.
|
85370 |
23-Oct-2001 |
rwatson |
o Add two comments identifying problems with the current nfs_lock.c implementation, so that the information doesn't get lost. (1) /var/run/lock is looked up relative to the current thread's root directory, but it's not clear that's desirable. (2) A race condition associated with live credential modification on a shared credential is present when privilege is granted for the purposes of talking to /var/run/lock.
|
85339 |
23-Oct-2001 |
dillon |
Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes.
MFC after: 3 days
|
84827 |
11-Oct-2001 |
jhb |
Change the kernel's ucred API as follows:
- crhold() returns a reference to the ucred whose refcount it bumps.
- crcopy() now simply copies the credentials from one credential to another and has no return value.
- a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
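The three semantics above can be illustrated with user-space toy versions that reuse the same names. This is not the kernel code. struct cred_sketch, with only a refcount and a uid, is a hypothetical stand-in for struct ucred.

```c
#include <assert.h>

/* Hypothetical toy credential: just a refcount and one id field. */
struct cred_sketch {
	int cr_ref;
	int cr_uid;
};

/* crhold(): bump the refcount and return the same credential. */
static struct cred_sketch *
crhold(struct cred_sketch *cr)
{
	assert(cr->cr_ref > 0);
	cr->cr_ref++;
	return cr;
}

/* crshared(): nonzero if any other holder exists. */
static int
crshared(const struct cred_sketch *cr)
{
	return cr->cr_ref > 1;
}

/* crcopy(): copy the credential contents only; dest keeps its own refcount. */
static void
crcopy(struct cred_sketch *dest, const struct cred_sketch *src)
{
	dest->cr_uid = src->cr_uid;
}
```

Returning the pointer from crhold() lets a caller write `nmp->nm_cred = crhold(cred);` in one step. crshared() lets code that wants to modify a credential decide whether it must copy first.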
|
84726 |
09-Oct-2001 |
jhb |
Use crhold() instead of crdup() since we aren't modifying the cred but just need to ensure it remains immutable.
|
84700 |
09-Oct-2001 |
peter |
Make this compile after last commit. It should be: "td ? td->td_proc : NULL", not "td ? td->td_proc, NULL"
|
84690 |
08-Oct-2001 |
julian |
Don't dereference td if it's NULL.
Submitted by: Alexander N. Kabaev <ak03@gte.com>
|
84079 |
28-Sep-2001 |
peter |
Unwind some more macros. NFSMADV() was kinda silly since it was right next to equivalent m_len adjustments. Move the nfsm_subs.h macros into groups depending on which phase they are used in, since that affects the error recovery requirements. Collect some of the common error checking into a single macro as preparation for unwinding some more. Have nfs_rephead return a value instead of secretly modifying args. Remove some unused function arguments that were being passed around. Clarify nfsm_reply()'s error handling (I hope).
|
84057 |
27-Sep-2001 |
peter |
Make nfsm_dissect() have an obvious return value.
|
84002 |
27-Sep-2001 |
peter |
Tidy up nfsm_build usage. This is only partially finished.
|
83914 |
25-Sep-2001 |
iedowse |
Add a missing dereference level. This caused nfsm_postop_attr_xx() to try and extract node attributes from an RPC reply even if none were present.
Reviewed by: peter
|
83697 |
20-Sep-2001 |
peter |
Add the magic marker so that loader and kldload(2) can find this in module form automagically.
|
83694 |
20-Sep-2001 |
peter |
Oops. Fix a missing indirection level. gcc didn't complain about it on x86, but did complain about it on alpha (since int and pointer are different sizes)
|
83654 |
18-Sep-2001 |
peter |
Sigh, Last minute pre-merge typo. (missing quotes)
|
83651 |
18-Sep-2001 |
peter |
Cleanup and split of nfs client and server code. This builds on the top of several repo-copies.
|
83629 |
18-Sep-2001 |
imp |
nfs_strategy calls nfs_asyncio with td as NULL. So add a bandaid that will pass NULL as the struct proc when td is NULL. This has stopped crashing on my machine.
Note: The passing of NULL may be bogus, but I'll let others fix that problem.
Reviewed by: jhb
|
83493 |
15-Sep-2001 |
peter |
Sync some differences between the copies of the files that were in nfs/nfs.h and nfsserver/nfs.h in the p4 tree.
|
83366 |
12-Sep-2001 |
julian |
KSE Milestone 2
Note: ALL MODULES MUST BE RECOMPILED
Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doozy!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
83291 |
10-Sep-2001 |
kris |
Fix some signed/unsigned integer confusion, and add bounds checking of arguments to some functions.
Obtained from: NetBSD Reviewed by: peter MFC after: 2 weeks
|
82702 |
31-Aug-2001 |
dillon |
Pushdown Giant for nfs syscalls (nfssvc())
|
82213 |
23-Aug-2001 |
ache |
Stupid error from my side in prev. commit: || -> &&
|
82204 |
23-Aug-2001 |
ache |
Implement l_len<0 per POSIX check. Check for valid l_whence too.
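The argument checks from this commit, together with the off_t overflow check from r82174 further down, could be sketched as follows. Several assumptions apply. int64_t stands in for off_t. The start offset is assumed to be already resolved to an absolute value, which leaves the SEEK_END case to the server as r82194 describes. The helper names and the "-1 means to EOF" convention are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Reject unknown l_whence values up front. */
static int
check_whence(int whence)
{
	switch (whence) {
	case SEEK_SET:
	case SEEK_CUR:
	case SEEK_END:
		return 0;
	default:
		return EINVAL;
	}
}

/*
 * Turn an absolute start and a signed l_len into an inclusive [lo, hi] range.
 * POSIX: a negative l_len covers start+l_len .. start-1; l_len == 0 means
 * "to end of file", reported here as *hi == -1 (hypothetical convention).
 */
static int
lock_range(int64_t start, int64_t len, int64_t *lo, int64_t *hi)
{
	assert(lo != NULL && hi != NULL);
	if (start < 0)
		return EINVAL;
	if (len < 0) {
		/* start >= 0 and len < 0, so this sum cannot overflow. */
		if (start + len < 0)
			return EINVAL;
		*lo = start + len;
		*hi = start - 1;
	} else if (len == 0) {
		*lo = start;
		*hi = -1;
	} else {
		/* Detect overflow before computing start + len - 1. */
		if (start > INT64_MAX - (len - 1))
			return EOVERFLOW;
		*lo = start;
		*hi = start + (len - 1);
	}
	return 0;
}
```

The overflow test rearranges `start + len - 1 > MAX` into `start > MAX - (len - 1)`, so no intermediate value ever leaves the representable range.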
|
82194 |
23-Aug-2001 |
ache |
Even better move: suppose that server is able to handle SEEK_END, so check arguments for all but not SEEK_END case, leaving SEEK_END handling for server
|
82193 |
23-Aug-2001 |
ache |
Apparently SEEK_END locking not supported by NFS. Previous variant returns EINVAL in that case, change it to EOPNOTSUPP.
|
82190 |
23-Aug-2001 |
ache |
Move <machine/*> after <sys/*>
Pointed by: bde
|
82174 |
23-Aug-2001 |
ache |
adv. lock: detect off_t overflow _before_ it occurs and return EOVERFLOW instead of EINVAL
|
80891 |
01-Aug-2001 |
iedowse |
Fix a client-side memory leak in nfs_flush(). The code allocates a temporary array to store struct buf pointers if the list doesn't fit in a local array. Usually it frees the array when finished, but if it jumps to the 'again' label and the new list does fit in the local array then it can forget to free a previously malloc'd M_TEMP memory.
Move the free() up a line so that it frees any previously allocated memory whether or not it needs to malloc a new array.
Reviewed by: dillon
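The leak and its fix follow a common pattern, sketched below. Each loop pass models one jump to the 'again' label. NFS_LOCALBUFS, the allocation counter and the helper names are hypothetical, and the actual buffer writing is elided. The key point matches the message: any previously malloc'd array is released before the code decides between the local array and a new allocation.

```c
#include <assert.h>
#include <stdlib.h>

#define NFS_LOCALBUFS 16        /* hypothetical size of the on-stack array */

static int live_arrays;         /* outstanding heap arrays, to make leaks visible */

static void **
array_alloc(int n)
{
	void **p = malloc((size_t)n * sizeof(*p));

	if (p != NULL)
		live_arrays++;
	return p;
}

static void
array_free(void **p)
{
	assert(live_arrays > 0);
	live_arrays--;
	free(p);
}

/* counts[i] = number of dirty buffers found on pass i. */
static int
flush_passes(const int *counts, int npasses)
{
	void *local[NFS_LOCALBUFS];
	void **bvec = NULL;

	for (int pass = 0; pass < npasses; pass++) {
		/* The fix: free the previous heap array whether or not we malloc again. */
		if (bvec != NULL && bvec != local)
			array_free(bvec);
		bvec = (counts[pass] <= NFS_LOCALBUFS) ? local : array_alloc(counts[pass]);
		if (bvec == NULL)
			return -1;          /* allocation failed; nothing left to free */
		/* ... fill bvec[] with buffer pointers and write them ... */
	}
	if (bvec != NULL && bvec != local)
		array_free(bvec);
	return 0;
}
```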
|
80672 |
30-Jul-2001 |
peter |
Check the filehandle size when mounting.
Obtained from: Constantine Sapuntzakis <csapuntz@openbsd.org>
|
79247 |
04-Jul-2001 |
jhb |
- Sort includes.
- Update vmmeter statistics for vnode pagein/pageouts in getpages/putpages.
|
79224 |
04-Jul-2001 |
dillon |
With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
|
78911 |
28-Jun-2001 |
jhb |
- Protect the mnt_vnode list with the mntvnode lock.
- Use queue(9) macros.
|
77563 |
01-Jun-2001 |
jake |
Unlock the process returned from pfind() if it does not return NULL. This fixes a witness lock violation for nfssvc returning with locks held.
Submitted by: Jean-Luc Richier <Jean-Luc.Richier@imag.fr> PR: kern/27776
|
77183 |
25-May-2001 |
rwatson |
o Merge contents of struct pcred into struct ucred. Specifically, add the real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid.
o Remove p_cred from struct proc; add p_ucred to struct proc, replacing the original macro that pointed p->p_ucred to p->p_cred->pc_ucred.
o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc.
o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there.
o Restructure many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this:
    newcred = crdup(oldcred);
    ...
    p->p_ucred = newcred;
    crfree(oldcred);
  It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit.
o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem.
o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives.
o Clean up exit1() so as to do less work in credential cleanup due to pcred removal.
o Clean up fork1() so as to do less work in credential cleanup and allocation.
o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparison.
o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done.
o Update credential management calls, such as crfree(), to take into account new ruidinfo reference.
o Modify or add the following uid and gid helper routines: change_euid(), change_egid(), change_ruid(), change_rgid(), change_svuid(), change_svgid(). In each case, the call now acts on a credential rather than a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements.
o CANSIGIO() is simplified to require only credentials, not processes and pcreds.
o Remove lots of (p_pcred==NULL) checks.
o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully.
o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances.
o Update libkvm to take these changes into account.
Obtained from: TrustedBSD Project Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
|
77086 |
23-May-2001 |
jhb |
Assert Giant is held by the caller rather than getting it and releasing it in getpages/putpages.
|
77031 |
23-May-2001 |
ru |
- FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file systems were repo-copied from sys/miscfs to sys/fs.
- Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs.
- Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.
- Install header files for the above file systems.
- Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
|
76827 |
19-May-2001 |
alfred |
Introduce a global lock for the vm subsystem (vm_mtx).
vm_mtx does not recurse and is required for most low level vm operations.
faults can not be taken without holding Giant.
Memory subsystems can now call the base page allocators safely.
Almost all atomic ops were removed as they are covered under the vm mutex.
Alpha and ia64 now need to catch up to i386's trap handlers.
FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
Reviewed (partially) by: jake, jhb
|
76688 |
16-May-2001 |
iedowse |
Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode.
All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can.
This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount.
Reviewed by: phk, bp
|
76166 |
01-May-2001 |
markm |
Undo part of the tangle of having sys/lock.h and sys/mutex.h included in other "system" header files.
Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files.
Sort sys/*.h includes where possible in affected files.
OK'ed by: bde (with reservations)
|
76131 |
29-Apr-2001 |
phk |
Add a vop_stdbmap(), and make it part of the default vop vector.
Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
|
76118 |
29-Apr-2001 |
alfred |
Remove incorrect comment.
Submitted by: quinot@inf.enst.fr <quinot@inf.enst.fr> PR: kern/26893
|
76117 |
29-Apr-2001 |
grog |
Revert consequences of changes to mount.h, part 2.
Requested by: bde
|
75858 |
23-Apr-2001 |
grog |
Correct #includes to work with fixed sys/mount.h.
|
75692 |
19-Apr-2001 |
alfred |
vnode_pager_freepage() is really vm_page_free() in disguise, nuke vnode_pager_freepage() and replace all calls to it with vm_page_free()
|
75631 |
17-Apr-2001 |
alfred |
Implement client side NFS locks.
Obtained from: BSD/os Import Ok'd by: mckusick, jkh, motd on builder.freebsd.org
|
75580 |
17-Apr-2001 |
phk |
This patch removes the VOP_BWRITE() vector.
VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing.
This patch takes a more general approach and adds a bp->b_op vector where more methods can be added.
The success of this patch depends on bp->b_op being initialized in all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
|
75402 |
11-Apr-2001 |
peter |
Create debug.hashstat.[raw]nchash and debug.hashstat.[raw]nfsnode to enable easy access to the hash chain stats. The raw prefixed versions dump an integer array to userland with the chain lengths. This cheats and calls it an array of 'struct int' rather than 'int' or sysctl -a faithfully dumps out the 128K array on an average machine. The non-raw versions return 4 integers: count, number of chains used, maximum chain length, and percentage utilization (fixed point, multiplied by 100). The raw forms are more useful for analyzing the hash distribution, while the other form can be read easily by humans and stats loggers.
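The four non-raw values could be derived from the raw chain-length array as sketched below. The struct and function names are hypothetical. The reading of "percentage, fixed point, multiplied by 100" as used * 10000 / nchains, so that 91.25% is reported as 9125, and the truncating division are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* The four values reported by the non-raw sysctl. */
struct hashstat {
	int count;      /* total entries across all chains */
	int used;       /* chains with at least one entry */
	int maxlen;     /* longest chain */
	int pct;        /* used/nchains as a percentage, multiplied by 100 */
};

static struct hashstat
hashstat_compute(const int *chainlen, size_t nchains)
{
	struct hashstat hs = { 0, 0, 0, 0 };

	assert(nchains > 0);
	for (size_t i = 0; i < nchains; i++) {
		hs.count += chainlen[i];
		if (chainlen[i] > 0)
			hs.used++;
		if (chainlen[i] > hs.maxlen)
			hs.maxlen = chainlen[i];
	}
	/* Fixed point with two decimal places, truncated. */
	hs.pct = (int)((long long)hs.used * 10000 / (long long)nchains);
	return hs;
}
```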
|
75218 |
05-Apr-2001 |
rwatson |
o Rather than arbitrarily construct a credential in the nfs_statfs() VFS operation, make use of the calling process's credential. This solution may not be ideal (there are a number of other possible proposals, including making use of the proc0 credential, adding a credential argument to the VFSOP, and switching from a hard-coded ucred to a hard-coded nfscred), but it is simple and appears to work. The arguments against simply using crget() are fairly strong: it is the only place in the code (other than a nearly identical invocation in ncp) where crget() is invoked, other than in the process credential creation code; as ucred becomes extensible, this use of crget() without appropriate context results in less and less meaningful credential data. The implementation here will probably be tweaked as a result of experimentation and further exploration of the requirements. In the meantime, it allows progress to be made in ucred expansion for new security models without causing a crash every time df is used on an NFS mounted file system.
This code has been interop tested against FreeBSD and Solaris NFS servers. While using the process credentials should not introduce interop problems, please let me know if any turn out to exist.
Reviewed by: freebsd-arch
|
74501 |
20-Mar-2001 |
peter |
Use the same API as the example code. Allow the initial hash value to be passed in, as the examples do. Incrementally hash in the dvp->v_id (using the official api) rather than add it. This seems to help power-of-two predictable filename trees where the filenames repeat on a power-of-two cycle and the directory trees have power-of-two components in it. The simple add then mask was causing things like 12000+ entry collision chains while most other entries have between 0 and 3 entries each. This way seems to improve things.
|
74384 |
17-Mar-2001 |
peter |
Use a generic implementation of the Fowler/Noll/Vo hash (FNV hash). Make the name cache hash as well as the nfsnode hash use it.
As a special tweak, create an unsigned version of register_t. This allows us to use a special tweak for the 64 bit versions that significantly speeds up the i386 version (ie: int64 XOR int64 is slower than int64 XOR int32).
The code layout is a little strange for the string function, but I was able to get between 5 to 10% improvement over the original version I started with. The layout affects gcc code generation choices and this way was fastest on x86 and alpha.
Note that 'CPUTYPE=p3' etc makes a fair difference to this. It is around 45% faster with -march=pentiumpro on a p6 cpu.
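A minimal 32-bit FNV-1 sketch (multiply by the prime, then XOR in each byte) with the initial hash passed in. Passing the initial hash in is what lets r74501 above hash in dvp->v_id incrementally instead of adding it. The names follow the FNV reference code. Treat the exact kernel prototypes, and the 64-bit register_t tweak, which is not shown, as outside this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FNV1_32_INIT  ((uint32_t)2166136261u)   /* FNV-1 32-bit offset basis */
#define FNV_32_PRIME  ((uint32_t)16777619u)     /* FNV 32-bit prime */

/* FNV-1 over a byte buffer, continuing from hval. */
static uint32_t
fnv_32_buf(const void *buf, size_t len, uint32_t hval)
{
	const unsigned char *s = buf;

	assert(buf != NULL || len == 0);
	while (len-- != 0) {
		hval *= FNV_32_PRIME;
		hval ^= *s++;
	}
	return hval;
}

/* FNV-1 over a NUL-terminated string, continuing from hval. */
static uint32_t
fnv_32_str(const char *str, uint32_t hval)
{
	const unsigned char *s = (const unsigned char *)str;

	while (*s != '\0') {
		hval *= FNV_32_PRIME;
		hval ^= *s++;
	}
	return hval;
}
```

To fold a directory id into a name hash, one would chain the calls, for example `h = fnv_32_buf(&id, sizeof(id), fnv_32_str(name, FNV1_32_INIT));`, and then mask `h` with the table size minus one.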
|
74381 |
17-Mar-2001 |
peter |
Dramatically improve the **lame** nfs_hash(). This is based on the Fowler / Noll / Vo Hash (http://www.isthe.com/chongo/tech/comp/fnv/).
This improves hash coverage a *massive* amount. We were seeing one set of machines that were using 0.84% of their 131072-entry nfsnode hash buckets with maximum chain lengths of up to ~500 entries. The machine was spending nearly 100% of its time in 'system'. A test with this has pushed the coverage from a few percent up to 91% utilization with a max chain length of 11.
Submitted by: David Filo
|
73929 |
07-Mar-2001 |
jhb |
Grab the process lock while calling psignal and before calling psignal.
|
73286 |
01-Mar-2001 |
adrian |
Reviewed by: jlemon
An initial tidyup of the mount() syscall and VFS mount code.
This code replaces the earlier work done by jlemon in an attempt to make linux_mount() work.
* the guts of the mount work has been moved into vfs_mount().
* move `type', `path' and `flags' from being userland variables into being kernel variables in vfs_mount(). `data' remains a pointer into userspace.
* Attempt to verify the `type' and `path' strings passed to vfs_mount() aren't too long.
* rework mount() and linux_mount() to take the userland parameters (besides data, as mentioned) and pass kernel variables to vfs_mount(). (linux_mount() already did this, I've just tidied it up a little more.)
* remove the copyin*() stuff for `path'. `data' still requires copyin*() since it's a pointer into userland.
* set `mount->mnt_stat.f_mntonname' in vfs_mount() rather than in each filesystem. This variable is generally initialised with `path', and each filesystem can override it if they want to.
* NOTE: f_mntonname is initialised with "/" in the case of a root mount.
|
73211 |
28-Feb-2001 |
dillon |
Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be hit on the client side and prevent the server side from retiring writes. Pipeline operations turned off for all READs (no big loss since reads are usually synchronous) and for NFS writes, and left on for the default bwrite(). (MFC expected prior to 4.3 freeze)
Testing by: mjacob, dillon
|
72650 |
18-Feb-2001 |
green |
Switch to using a struct xucred instead of a struct ucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL).
This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout.
Reviewed by: bde
|
71915 |
02-Feb-2001 |
tegge |
Enable use of DHCP extensions. Reviewed by: Per Kristian Hove <Per.Hove@math.ntnu.no>
|
70674 |
04-Jan-2001 |
dillon |
NFS O_EXCL file create semantics temporarily uses file attributes to store the file verifier. The NFS client is supposed to do a SETATTR after a successful O_EXCL open/create to clean up the attributes. FreeBSD's client code was generating a SETATTR rpc but was not generating an access or modification time update within that rpc, leaving the file with a broken access time that solaris chokes on (and it doesn't look very nice when you ls -lua under FreeBSD either!). Fixed.
|
70254 |
21-Dec-2000 |
bmilekic |
* Rename M_WAIT mbuf subsystem flag to M_TRYWAIT. This is because calls with M_WAIT (now M_TRYWAIT) may not wait forever when nothing is available for allocation, and may end up returning NULL. Hopefully we now communicate more of the right thing to developers and make it very clear that it's necessary to check whether calls with M_(TRY)WAIT also resulted in a failed allocation. M_TRYWAIT basically means "try harder, block if necessary, but don't necessarily wait forever." The time spent blocking is tunable with the kern.ipc.mbuf_wait sysctl. M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the value of the M_WAIT flag, this could have become a big problem.
|
69781 |
08-Dec-2000 |
dwmalone |
Convert more malloc+bzero to malloc+M_ZERO.
Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
|
69214 |
26-Nov-2000 |
phk |
Simplify the tprintf() API.
Lose the special <sys/tprintf.h> #include file.
|
68883 |
18-Nov-2000 |
dillon |
This patchset fixes a large number of file descriptor race conditions. Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc...
PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
|
68711 |
14-Nov-2000 |
mckusick |
In preparation for deprecating CIRCLEQ macros in favor of TAILQ macros which provide the same functionality and are a bit more efficient, convert use of CIRCLEQ's in NFS to TAILQ's.
|
68186 |
01-Nov-2000 |
eivind |
Give vop_mmap an untimely death. The opportunity to give it a timely death timed out in 1996.
|
67882 |
29-Oct-2000 |
phk |
Remove unneeded #include <sys/proc.h> lines.
|
67834 |
29-Oct-2000 |
tegge |
Reduce kernel stack usage by not having large packets on the stack. Supply correct size parameter to dhcpd. Replace some magic numbers with macro names. Handle more than one interface.
|
67534 |
24-Oct-2000 |
tegge |
Eliminate some bitrot (nonexisting member variable names). Don't use curproc when a proc pointer is available.
|
67531 |
24-Oct-2000 |
tegge |
Style fixes.
|
67529 |
24-Oct-2000 |
tegge |
Make RPC timeout message more readable. Supply proc pointer to sosend.
|
67486 |
24-Oct-2000 |
dwmalone |
Patch to avoid processes getting stuck in "vmopar". From Ian's mail:
The problem seems to originate with NFS's postop_attr information that is returned with a read or write RPC. Within a vm_fault context, the code cannot deal with vnode_pager_setsize() shrinking a vnode.
The workaround in the patch below stops the nfsm_postop_attr() macro from ever shrinking a vnode. If the new size in the postop_attr information is smaller, then it just sets the nfsnode n_attrstamp to 0 to stop the wrong size getting used in the future. This change only affects postop_attr attributes; the nfsm_loadattr() macro works as normal.
The change is implemented by adding a new argument to nfs_loadattrcache() called 'dontshrink'. When this is non-zero, nfs_loadattrcache() will never reduce the vnode/nfsnode size; instead it zeros n_attrstamp.
There remain other ways processes can get stuck in vmopar.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: dillon Tested by: Vadim Belman <voland@lflat.org>
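The 'dontshrink' rule could be sketched as follows. This is a minimal sketch that tracks only the two fields the message mentions. The struct and function are hypothetical. The real nfs_loadattrcache() also handles the other attributes and calls vnode_pager_setsize(), both of which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy nfsnode holding only the fields the message mentions. */
struct nfsnode_sketch {
	int64_t n_size;        /* cached file size */
	long    n_attrstamp;   /* when attributes were cached; 0 = invalid */
};

/* Apply a size from reply attributes; with dontshrink set, never reduce it. */
static void
loadattr_size(struct nfsnode_sketch *np, int64_t newsize, int dontshrink, long now)
{
	assert(newsize >= 0);
	if (dontshrink && newsize < np->n_size) {
		np->n_attrstamp = 0;   /* invalidate, so a later GETATTR fetches fresh data */
		return;
	}
	np->n_size = newsize;      /* real code would also adjust the VM object size */
	np->n_attrstamp = now;
}
```

Post-op attributes from a READ or WRITE would pass dontshrink = 1, while a plain attribute load passes 0 and may shrink the file as before.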
|
67152 |
15-Oct-2000 |
bp |
Make nfs PDIRUNLOCK aware. Now it is possible to use nullfs mounts on top of nfs mounts, but there can be side effects because nfs uses shared locks for vnodes.
|
67151 |
15-Oct-2000 |
bp |
Add missed vop_stdunlock() for fifo's vnops (this affects only v2 mounts).
Give nfs's node lock its own name.
|
66615 |
04-Oct-2000 |
jasone |
Convert lockmgr locks from using simple locks to using mutexes.
Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
|
66355 |
25-Sep-2000 |
bp |
Add a lock structure to vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or kept as a first element of structure referenced by v_data pointer (ffs). Such organization leads to known problems with stacked filesystems.
From this point vop_no*lock*() functions maintain only interlock lock. vop_std*lock*() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode.
If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes.
Reviewed in general by: mckusick
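The v_lock / v_vnlock arrangement could be illustrated as below. This is a simplified sketch: a counting "lock" replaces lockmgr so that it needs no threading support. The struct and helper names are hypothetical, and only the two field names come from the message. An upper-layer vnode that shares its lower vnode's lock ends up locking the lower vnode's built-in v_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Counting stand-in for a lockmgr lock. */
struct lock_sketch { int holders; };

struct vnode_sketch {
	struct lock_sketch  v_lock;     /* lock built into every vnode */
	struct lock_sketch *v_vnlock;   /* lock actually used; may belong to a lower vnode */
};

static void
vn_init(struct vnode_sketch *vp, struct vnode_sketch *lower)
{
	vp->v_lock.holders = 0;
	/* A stacked (upper) vnode may export the lower vnode's lock instead of its own. */
	vp->v_vnlock = (lower != NULL) ? lower->v_vnlock : &vp->v_lock;
}

static void
vn_lock_sketch(struct vnode_sketch *vp)
{
	vp->v_vnlock->holders++;
}

static void
vn_unlock_sketch(struct vnode_sketch *vp)
{
	assert(vp->v_vnlock->holders > 0);
	vp->v_vnlock->holders--;
}
```

With a single lock shared across the stack, locking through either layer serializes against the other. This avoids the mismatches that separately allocated per-layer locks caused.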
|
65497 |
05-Sep-2000 |
msmith |
Don't scan for the "right" network interface by shooting in the dark. Assume that the nfs_diskless structure is correctly set up; the provider ought to be getting it right.
|
63788 |
24-Jul-2000 |
mckusick |
This patch corrects the first round of panics and hangs reported with the new snapshot code.
Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem.
Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write().
Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations.
Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot.
Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation.
Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress.
Return a zeroed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occasionally read them, which can cause unexpected behavior.
Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation.
Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
|
61619 |
13-Jun-2000 |
ps |
Correctly set the Maximum DHCP Message Size. bootpd now works again as well as ISC dhcpd.
|
60938 |
26-May-2000 |
jake |
Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
Requested by: msmith and others
|
60833 |
23-May-2000 |
jake |
Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
Suggested by: phk Reviewed by: phk Approved by: mdodd
|
60141 |
07-May-2000 |
phk |
Include a RFC 1533 "Maximum DHCP Message Size" option in our request.
ISC DHCP will limit the reply length to 64 bytes for bootp replies unless we explicitly tell it we can do more. We tell it that we can do 1200 bytes.
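As a rough illustration of what the option looks like on the wire: RFC 1533 option 57 is a code byte, a length byte of 2, and then a 16-bit size in network byte order. The helper `put_max_msg_size()` is a hypothetical sketch, not the kernel's bootp code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DHCP_OPT_MAX_MSG_SIZE 57	/* RFC 1533 "Maximum DHCP Message Size" */

/*
 * Append the option: code, length (2), then the size in network byte
 * order. Returns the number of bytes written, or 0 if buf is too small.
 */
static size_t
put_max_msg_size(uint8_t *buf, size_t buflen, uint16_t size)
{
	if (buflen < 4)
		return (0);
	buf[0] = DHCP_OPT_MAX_MSG_SIZE;
	buf[1] = 2;
	buf[2] = (uint8_t)(size >> 8);		/* high byte first */
	buf[3] = (uint8_t)(size & 0xff);
	return (4);
}
```

For the 1200-byte limit mentioned above, the four bytes are 57, 2, 0x04, 0xB0.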
|
60041 |
05-May-2000 |
phk |
Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h>, but it shall not be made a nested include, according to bde's teachings on the subject of nested includes.
Disk drivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
|
59794 |
30-Apr-2000 |
phk |
Remove unneeded #include <vm/vm_zone.h>
Generated by: src/tools/tools/kerninclude
|
59762 |
29-Apr-2000 |
phk |
s/biowait/bufwait/g
Prodded by: several.
|
59391 |
19-Apr-2000 |
phk |
Remove ~25 unneeded #include <sys/conf.h> Remove ~60 unneeded #include <sys/malloc.h>
|
59249 |
15-Apr-2000 |
phk |
Complete the bio/buf divorce for all code below devfs::strategy
Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
|
58934 |
02-Apr-2000 |
phk |
Move B_ERROR flag to b_ioflags and call it BIO_ERROR.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
|
58710 |
27-Mar-2000 |
dillon |
Add a sysctl to specify the amount of UDP receive space NFS should reserve, in maximal NFS packets. Originally only 2 packets worth of space was reserved. The default is now 4, which appears to greatly improve performance for slow to mid-speed machines on gigabit networks.
Add documentation and correct some prior documentation.
Problem Researched by: Andrew Gallatin <gallatin@cs.duke.edu> Approved by: jkh
|
58349 |
20-Mar-2000 |
phk |
Rename the existing BUF_STRATEGY() to DEV_STRATEGY()
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
|
58345 |
20-Mar-2000 |
phk |
Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch was machine generated (thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
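The one-bit rule for b_iocmd, and the replacement of B_CALL by a b_iodone comparison, can be sketched like this. With B_WRITE defined as zero, a test such as `bp->b_flags & B_WRITE` was always false; a dedicated command field with exactly one bit set avoids that trap. The SKETCH_BIO_* values and `struct buf_sketch` are hypothetical stand-ins, not the real definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command bits; the real values may differ. */
#define SKETCH_BIO_READ		0x01
#define SKETCH_BIO_WRITE	0x02
#define SKETCH_BIO_DELETE	0x04

struct buf_sketch {
	unsigned int b_iocmd;			/* exactly one bit set */
	void (*b_iodone)(struct buf_sketch *);	/* completion callback */
};

/* Non-zero when exactly one bit of cmd is set (a power of two). */
static int
iocmd_valid(unsigned int cmd)
{
	return (cmd != 0 && (cmd & (cmd - 1)) == 0);
}

/* Replaces the old B_CALL flag: a callback exists iff b_iodone is set. */
static int
has_iodone(const struct buf_sketch *bp)
{
	return (bp->b_iodone != NULL);
}
```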
|
57178 |
13-Feb-2000 |
peter |
Clean up some loose ends in the network code, including the X.25 and ISO #ifdefs. Clean out unused netisr's and leftover netisr linker set gunk. Tested on x86 and alpha, including world.
Approved by: jkh
|
55934 |
13-Jan-2000 |
dillon |
The alpha build causes the 'nfsuid bloated' warning to occur. Well, there is nothing we can do about it. In fact, after further review there simply are not very many instances of the two structures NFS checks for 'bloat', so I've decided to simply rip the checks out entirely.
Submitted by: Andrew Gallatin <gallatin@cs.duke.edu>
|
55679 |
09-Jan-2000 |
shin |
TCP updates to support IPv6. Also a small patch to sys/nfs/nfs_socket.c, as the max_hdr size changed.
Reviewed by: freebsd-arch, cvs-committers Obtained from: KAME project
|
55431 |
05-Jan-2000 |
dillon |
Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into the vnode dirtyblkhd, we append it to the list instead of prepending it, in order to maintain a 'forward' locality of reference, which is arguably better than 'reverse'. The original algorithm did things this way too, but at a huge time cost.
Enhance the append interlock for NFS writes to handle intr/soft mounts better.
Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches.
Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather than the kernel default for that option.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
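The revised option handling amounts to a clamp. This is a sketch that assumes an unset option is passed as 0 or a negative value. `effective_option()` is a hypothetical helper, not a function in the NFS mount code.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick the effective value of a tunable mount
 * option (for example a read or write size).
 */
static int
effective_option(int user_val, int kern_default, int kern_max)
{
	if (user_val <= 0)
		return (kern_default);	/* not specified: use the default */
	if (user_val > kern_max)
		return (kern_max);	/* too high: clamp to the maximum */
	return (user_val);
}
```

Before the change, the too-high case would have fallen back to `kern_default`.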
|
55423 |
05-Jan-2000 |
dillon |
Fix at least one source of the continued 'NFS append race'. close() was calling nfs_flush() and then clearing the NMODIFIED bit. This is not legal, since there might still be dirty buffers after the nfs_flush() (for example, pending commits). The clearing of this bit in turn prevented a necessary vinvalbuf() from occurring, leaving leftover dirty buffers even after truncating the file in a new operation. The fix is to simply not clear NMODIFIED.
Also added a sysctl vfs.nfs.nfsv3_commit_on_close which, if set to 1, will cause close() to do a stage 1 write AND a stage 2 commit synchronously. By default only the stage 1 write is done synchronously.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
55206 |
29-Dec-1999 |
peter |
Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSDs, which made this change quite some time ago. More commits to come.
|
54970 |
21-Dec-1999 |
alfred |
Make getfh a standard syscall instead of being dependent on having NFSSERVER defined; this is useful for userland fileservers that want to use a filehandle-type interface to the filesystem.
Submitted by: Assar Westerlund assar@stacken.kth.se PR: kern/15452
|
54803 |
19-Dec-1999 |
rwatson |
Second pass commit to introduce new ACL and Extended Attribute system calls, vnops, vfsops, both in /kern, and to individual file systems that require a vfsop_ array entry.
Reviewed by: eivind
|
54799 |
19-Dec-1999 |
green |
M_PREPEND-related cleanups (unregisterifying struct mbuf *s).
|
54655 |
15-Dec-1999 |
eivind |
Introduce NDFREE (and remove VOP_ABORTOP)
|
54605 |
14-Dec-1999 |
dillon |
Fix two problems: First, fix the append seek position race that can occur due to np->n_size potentially changing if nfs_getcacheblk() blocks in nfs_write().
Second, under -current we must supply the proper bufsize when obtaining buffers that straddle the EOF, but due to the fact that np->n_size can change out from under us it is possible that we may specify the wrong buffer size and wind up truncating dirty data written by another process.
Both problems are solved by implementing nfs_rslock(), which allows us to lock around sensitive buffer cache operations such as those that occur when appending to a file.
It is believed that this race is responsible for causing dirtyoff/dirtyend and (in stable) validoff/validend to exceed the buffer size. Therefore we have now added a warning printf for the dirtyoff/end case in current.
However, we have introduced a new problem which we need to fix at some point: soft or intr NFS mounts may become uninterruptible from the point of view of process A, which is stuck waiting on rslock while process B is stuck doing the rpc. To unstick process A, process B would have to be interrupted first.
Reviewed by: Alfred Perlstein <bright@wintelcom.net>
|
54536 |
13-Dec-1999 |
dillon |
Fix a timeout deadlock that can occur when the process holding the receive lock hasn't yet managed to send its own request.
PR: kern/15055 Submitted by: Ian Dowse iedowse@maths.tcd.ie
|
54485 |
12-Dec-1999 |
dillon |
Fix a number of server-side issues related to aborting badly formed NFS packets, mainly initializing structure pointers to NULL which are conditionally freed prior to return.
PR: kern/15249 Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
54480 |
12-Dec-1999 |
dillon |
Synopsis of problem being fixed: Dan Nelson originally reported that blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below.
bug #1:
B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data.
Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable.
bug #2:
B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported.
bug #3:
B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server).
The patch also contains some cleanup, which only applies to the commit into -current.
Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>
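All three bugs concern the two-stage life cycle of an NFS buffer. The simplified model below shows the invariants the fixes restore: clustering and commit only apply after the stage 1 write, and redirtying reverts both. It uses hypothetical flag names SK_CLUSTEROK and SK_NEEDCOMMIT in place of B_CLUSTEROK and B_NEEDCOMMIT, and is not the real buffer code.

```c
#include <assert.h>

/* Hypothetical flag values standing in for the real buffer flags. */
#define SK_CLUSTEROK	0x1
#define SK_NEEDCOMMIT	0x2

enum nfs_stage { STAGE1_NEEDS_WRITE, STAGE2_NEEDS_COMMIT };

struct nfsbuf_sketch {
	enum nfs_stage stage;
	unsigned int flags;
	int full_biosize;	/* non-zero if the buffer is a full biosize */
};

/* Stage 1 (unstable) write RPC completed: a commit is now required. */
static void
write_done(struct nfsbuf_sketch *bp)
{
	bp->stage = STAGE2_NEEDS_COMMIT;
	bp->flags |= SK_NEEDCOMMIT;
	if (bp->full_biosize)		/* bug #2: never for short buffers */
		bp->flags |= SK_CLUSTEROK;
}

/* Redirtied with new data: back to stage 1, which is not clusterable. */
static void
redirty(struct nfsbuf_sketch *bp)
{
	bp->stage = STAGE1_NEEDS_WRITE;
	bp->flags &= ~(SK_NEEDCOMMIT | SK_CLUSTEROK);	/* bug #1 */
}

/* Bug #3: a flush may only issue a commit for stage 2 buffers. */
static int
may_commit(const struct nfsbuf_sketch *bp)
{
	return (bp->stage == STAGE2_NEEDS_COMMIT &&
	    (bp->flags & SK_NEEDCOMMIT) != 0);
}
```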
|
54444 |
11-Dec-1999 |
eivind |
Lock reporting and assertion changes.
* lockstatus() and VOP_ISLOCKED() get a new process argument and a new return value, LK_EXCLOTHER, returned when the lock is held exclusively by another process.
* The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them.
* Extend the vnode_if.src format to allow more exact specification than locked/unlocked.
This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS.
Discussed with: grog, mch, peter, phk Reviewed by: peter
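The three-way distinction between "exclusive by me", "exclusive by someone else" and "shared" can be sketched roughly as follows. The SK_* constants, `struct lock_sketch`, and `lockstatus_sketch()` are hypothetical; the real lockmgr structure and LK_* values differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical values; not the real LK_* constants. */
#define SK_UNLOCKED	0
#define SK_EXCLUSIVE	1
#define SK_SHARED	2
#define SK_EXCLOTHER	3

struct proc_sketch { int p_pid; };

struct lock_sketch {
	int holders;				/* number of holders */
	int exclusive;				/* non-zero if held exclusively */
	const struct proc_sketch *owner;	/* exclusive owner, if any */
};

/* Report the lock state as seen by process p. */
static int
lockstatus_sketch(const struct lock_sketch *lkp, const struct proc_sketch *p)
{
	if (lkp->exclusive)
		return (lkp->owner == p ? SK_EXCLUSIVE : SK_EXCLOTHER);
	if (lkp->holders > 0)
		return (SK_SHARED);
	return (SK_UNLOCKED);
}
```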
|
53937 |
30-Nov-1999 |
dillon |
The symlink implementation could improperly return a NULL vp along with a 0 error code. The problem occurred with NFSv2 mounts and also with any NFSv3 mount returning an EEXIST error (which is translated to 0 prior to return). The reply to the rpc only contains the file handle for the no-error case under NFSv3. The error case under NFSv3 and all cases under NFSv2 do *not* return the file handle. The fix is to do a secondary lookup to obtain the file handle and thus be able to generate a return vnode for the situations where the rpc reply does not contain the required information.
The bug was originally introduced when VOP_SYMLINK semantics were changed for -CURRENT. The NFS symlink implementation was not properly modified to go along with the change despite the fact that three people reviewed the code. It took four attempts to get the current fix correct with five people. Is NFS obfuscated? Ha!
Reviewed by: Alfred Perlstein <bright@wintelcom.net> Testing and Discussion: "Viren R.Shah" <viren@rstcorp.com>, Eivind Eklund <eivind@FreeBSD.ORG>, Ian Dowse <iedowse@maths.tcd.ie>
|
53776 |
27-Nov-1999 |
eivind |
Remap the error EEXIST => 0 *before* using error to determine if we should return a vp.
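The ordering matters because the decision about the vnode must see the remapped error. A minimal sketch with hypothetical names (`finish_symlink()`, `struct vnode_sketch`):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct vnode_sketch { int v_dummy; };

/*
 * Hypothetical tail of a symlink operation: remap EEXIST to 0 first,
 * then use the remapped error to decide whether to hand back a vnode.
 */
static int
finish_symlink(int error, struct vnode_sketch *newvp,
    struct vnode_sketch **vpp)
{
	if (error == EEXIST)
		error = 0;			/* remap before deciding */
	*vpp = (error == 0) ? newvp : NULL;
	return (error);
}
```

Doing the remap after the decision would return 0 with a NULL vnode, which is the failure mode described in the symlink entry above.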
|
53552 |
22-Nov-1999 |
dillon |
nm_srtt and nm_sdrtt are arrays[4]. Remove explicit initialization of element [4] in both, which goes beyond the end of the array, leaving [0], [1], [2], and [3]. This bug did not cause any problems since the overrun fields are initialized after the bogus array init but needs to be fixed anyway.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
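Sizing the initialization loop from the array itself avoids touching a nonexistent element [4]. In this sketch `struct nfsmount_sketch`, `init_timers()`, and the initial values are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define NITEMS(a)	(sizeof(a) / sizeof((a)[0]))

struct nfsmount_sketch {
	int nm_srtt[4];		/* valid indices are 0..3 */
	int nm_sdrtt[4];
};

/* Initialize every slot; the bound comes from the array size. */
static void
init_timers(struct nfsmount_sketch *nmp, int srtt, int sdrtt)
{
	for (size_t i = 0; i < NITEMS(nmp->nm_srtt); i++)
		nmp->nm_srtt[i] = srtt;
	for (size_t i = 0; i < NITEMS(nmp->nm_sdrtt); i++)
		nmp->nm_sdrtt[i] = sdrtt;
}
```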
|
53463 |
20-Nov-1999 |
eivind |
Fix VOP_MKNOD for loss of WILLRELE. I don't know how I could have missed this in the first place :-(
Noticed by: bde
|
53131 |
13-Nov-1999 |
eivind |
Remove WILLRELE from VOP_SYMLINK
Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.
|
53095 |
11-Nov-1999 |
dillon |
Remove special case socket sharing code in order to allow nfsd to bind IP addresses to udp/cltp sockets separately.
PR: kern/13049 Reviewed by: David Malone <dwmalone@maths.tcd.ie>, freebsd-current
|
53022 |
08-Nov-1999 |
dillon |
Fix nfssvc_addsock() to not attempt to free a NULL socket structure when returning an error. Bug fix was extracted from the PR. The PR is not yet entirely resolved by this commit.
PR: kern/13049 Reviewed by: Matt Dillon <dillon@freebsd.org> Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
|
52781 |
01-Nov-1999 |
msmith |
Call bootpc_init before we try to mount an NFS root, if we're configured to use BOOTP for NFS root discovery.
The entire interface setup inside nfs_mountroot is evil, and should die.
|
52635 |
29-Oct-1999 |
phk |
useracc() the prequel:
Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.
This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
|
52492 |
25-Oct-1999 |
dillon |
Move NFS access cache hits/misses into nfsstats structure so /usr/bin/nfsstat can get to it easily.
|
51906 |
03-Oct-1999 |
phk |
Before we start to mess with the VFS name-cache clean things up a little bit: Isolate the namecache in its own file, and give it a dedicated malloc type.
|
51799 |
29-Sep-1999 |
marcel |
Careless use of struct proc *p caused major problems. 'p' is allowed to be NULL in this function (nfs_sigintr). Reorder the statements and guard them all with a single if (p != NULL).
reported, reviewed and tested by: jdp
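The guarded form of the check can be sketched as below. For brevity this assumes plain unsigned bitmasks rather than the compound sigset_t introduced by the sigset_t change. `struct proc_sig_sketch`, `SK_NFSINT_SIGMASK`, and `sigintr_sketch()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the process fields the check reads. */
struct proc_sig_sketch {
	unsigned int p_siglist;		/* pending signals */
	unsigned int p_sigmask;		/* blocked signals */
	unsigned int p_sigignore;	/* ignored signals */
};

#define SK_NFSINT_SIGMASK 0x0fu	/* hypothetical interrupting signals */

/* Non-zero if p has a pending, unblocked, unignored interrupting signal. */
static int
sigintr_sketch(const struct proc_sig_sketch *p)
{
	if (p != NULL) {		/* every dereference sits inside the guard */
		unsigned int pending = p->p_siglist & ~p->p_sigmask &
		    ~p->p_sigignore;
		if (pending & SK_NFSINT_SIGMASK)
			return (1);
	}
	return (0);
}
```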
|
51791 |
29-Sep-1999 |
marcel |
sigset_t change (part 2 of 5) -----------------------------
The core of the signalling code has been rewritten to operate on the new sigset_t. No methodological changes have been made. Most references to a sigset_t object are through macros (see signalvar.h) to create a level of abstraction and to provide a basis for further improvements.
The NSIG constant has not been changed to reflect the maximum number of signals possible. The reason is that it breaks programs (especially shells) which assume that all signals have a non-null name in sys_signame. See src/bin/sh/trap.c for an example. Instead _SIG_MAXSIG has been introduced to hold the maximum signal possible with the new sigset_t.
struct sigprop has been moved from signalvar.h to kern_sig.c because a) it is only used there, and b) access must be done through the function sigprop(). The latter is because the table doesn't hold properties for all signals, but only for the first NSIG signals.
signal.h has been reorganized to make reading easier and to add the new and/or modified structures. The "old" structures are moved to signalvar.h to prevent namespace pollution.
Especially the coda filesystem suffers from the change, because it contained lines like (p->p_sigmask == SIGIO), which is easy to do for integral types, but not for compound types.
NOTE: kdump (and port linux_kdump) must be recompiled.
Thanks to Garrett Wollman and Daniel Eischen for pressing the importance of changing sigreturn as well.
|
51475 |
20-Sep-1999 |
dillon |
Add comment to clarify a commit rpc optimization already being performed.
|
51344 |
17-Sep-1999 |
dillon |
Asynchronized client-side nfs_commit. NFS commit operations were previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it.
Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection than lastr ever did.
Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc.
Note that we do not commit the meta-data yet. This work still needs to be done.
Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic.
Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
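The not-yet-implemented client optimization, merging several potential commit RPCs into one wider range, could look roughly like this. `struct commit_range` and `commit_range_merge()` are hypothetical. The merged range may include bytes that were never written, trading precision for fewer RPCs.

```c
#include <assert.h>
#include <stdint.h>

struct commit_range {
	uint64_t off;
	uint64_t len;	/* 0 means "no range accumulated yet" */
};

/* Grow r so that it covers [off, off + len) as well as its old extent. */
static void
commit_range_merge(struct commit_range *r, uint64_t off, uint64_t len)
{
	if (r->len == 0) {
		r->off = off;
		r->len = len;
		return;
	}
	uint64_t end = r->off + r->len;
	if (off + len > end)
		end = off + len;
	if (off < r->off)
		r->off = off;
	r->len = end - r->off;
}
```

Three 8K-ish buffers at offsets 8192, 0 and 24576 collapse into one commit covering bytes 0 through 28671.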
|
51138 |
11-Sep-1999 |
alfred |
Separate the export check in VFS_FHTOVP; exports are now checked via VFS_CHECKEXP.
Add fh(open|stat|statfs) syscalls to allow userland to query filesystems based on (network) filehandle.
Obtained from: NetBSD
|
51068 |
07-Sep-1999 |
alfred |
All unimplemented VFS ops now have entries in kern/vfs_default.c that return reasonable defaults.
This avoids confusing and ugly casting to eopnotsupp or making dummy functions. Bogus casts of filesystem sysctls to eopnotsupp() have been removed.
This should make *_vfsops.c more readable and reduce bloat.
Reviewed by: msmith, eivind Approved by: phk Tested by: Jeroen Ruigrok/Asmodai <asmodai@wxs.nl>
|
50521 |
28-Aug-1999 |
phk |
remove unused variables.
|
50477 |
28-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
50405 |
26-Aug-1999 |
phk |
Simplify the handling of VCHR and VBLK vnodes using the new dev_t:
Make the alias list a SLIST.
Drop the "fast recycling" optimization of vnodes (including the returning of a preexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hard-limit vnodes anymore.
Rename checkalias2() and checkalias() to addalias() and addaliasu(), which take dev_t and udev_t arguments, respectively.
Make the revoke syscalls use vcount() instead of VALIASED.
Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag.
vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one.
Print the devicename in specfs/vprint().
Remove a couple of stale LFS vnode flags.
Remove unimplemented/unused LK_DRAINED.
|
50053 |
19-Aug-1999 |
peter |
Convert all the nfs macros to do { blah } while (0) to ensure it works correctly in if/else etc. egcs had probably picked up most of the problems here before with "ambiguous braces" etc, but this should increase the robustness a bit. Based on an idea from Eivind Eklund.
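Here is why the do { } while (0) wrapper matters. With a bare { } body, writing `NOTE_TWICE();` before an `else` leaves a stray empty statement that ends the if, so the else no longer compiles. A self-contained illustration with hypothetical names:

```c
#include <assert.h>

static int calls;

static void
note(void)
{
	calls++;
}

/* Two statements wrapped so that the macro behaves like one statement. */
#define NOTE_TWICE()		\
	do {			\
		note();		\
		note();		\
	} while (0)

static int
run(int cond)
{
	calls = 0;
	if (cond)
		NOTE_TWICE();	/* the ';' completes the do-while statement */
	else
		calls = -1;
	return (calls);
}
```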
|
49945 |
17-Aug-1999 |
alc |
Add the (inline) function vm_page_undirty for clearing the dirty bitmask of a vm_page.
Use it.
Submitted by: dillon
|
49659 |
12-Aug-1999 |
dt |
nfs_getcacheblk() can return 0 if the mount is interruptible. It needs to be checked by the caller.
Broken in: rev. 1.70 (1999/05/02)
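Callers therefore need a NULL check before touching the buffer. This is a minimal sketch that assumes the caller reports EINTR. `getcacheblk_sketch()` and `read_block_sketch()` are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct buf_sk { int b_dummy; };

/* Hypothetical stand-in: returns NULL when interrupted on an intr mount. */
static struct buf_sk *
getcacheblk_sketch(int interrupted)
{
	static struct buf_sk b;

	return (interrupted ? NULL : &b);
}

static int
read_block_sketch(int interrupted)
{
	struct buf_sk *bp = getcacheblk_sketch(interrupted);

	if (bp == NULL)
		return (EINTR);		/* never dereference a NULL buffer */
	bp->b_dummy = 1;		/* safe: bp is valid here */
	return (0);
}
```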
|
49535 |
08-Aug-1999 |
phk |
Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>.
Add a few fields to struct specinfo, paving the way for the fun part.
|
49405 |
04-Aug-1999 |
peter |
Don't over-allocate and over-copy shorter NFSv2 filehandles and then correct the pointers afterwards.
It's kinda bogus that we generate a 24 (?) byte filehandle (2 x int32 fsid and 16 byte VFS fhandle) and pad it out to 64 bytes for NFSv3 with garbage. The whole point of NFSv3's variable filehandle length was to allow for shorter handles, both in memory and over the wire. I plan on taking a shot at fixing this shortly.
|
49302 |
31-Jul-1999 |
msmith |
As described by the submitter:
I did some tcpdumping the other day and noticed that GETATTR calls were frequently followed by an ACCESS call to the same file. The attached patch changes nfs_getattr to fill the access cache as a side effect. This is accomplished by calling ACCESS rather than GETATTR. This implies a modest overhead of 4 bytes in the request and 8 bytes in the response compared to doing a vanilla GETATTR. ... [The patch comprises two parts] The first is the "real" patch, the second counts misses and hits rather than fills and hits. The difference is subtle but important because both nfs_getattr and nfs_access now fill the cache. It also changes the default value of nfsaccess_cache_timeout to better match the attribute cache. IMHO, file timestamps change much more frequently than protection bits.
Submitted by: Bjoern Groenvall <bg@sics.se> Reviewed by: dillon (partially)
|
49239 |
30-Jul-1999 |
wpaul |
Close PR #12651: the hash calculation routine has changed in other parts of the kernel but was not updated in nfs_readdirplusrpc().
|
49237 |
30-Jul-1999 |
wpaul |
Fix two bugs in nfs_readdirplus(). The first is that in some cases, vnodes are locked and never unlocked, which leads to processes starting to wedge up after doing a mount -o nfsv3,tcp,rdirplus foo:/fs /fs; ls /fs. The second is that sometimes cnp is accessed without having been properly initialized: cnp->cn_nameptr points to an earlier name while "len" contains the length of a current name of different size. This leads to an attempt to dereference *(cnp->cn_nameptr + len), which will sometimes cause a page fault and a panic.
With these two fixes, client side readdirplus works correctly with FreeBSD, IRIX 6.5.4 and Solaris 2.5.1 and 2.6 servers.
Submitted by: Matthew Dillon <dillon@backplane.com>
|
48859 |
17-Jul-1999 |
phk |
I have not one single time remembered the name of this function correctly so obviously I gave it the wrong name. s/umakedev/makeudev/g
|
48394 |
01-Jul-1999 |
peter |
Fix warning. va_fsid is udev_t, which is int32_t. No need to use %lx.
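In portable userland C, the <inttypes.h> macros give a matching conversion for a 32-bit value. This is a sketch only, since kernel printf has its own conventions. `format_fsid()` is a hypothetical helper:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef int32_t udev_sketch_t;	/* per the log, va_fsid is an int32_t */

/* Format a 32-bit fsid in hex without assuming anything about long. */
static int
format_fsid(char *buf, size_t len, udev_sketch_t fsid)
{
	return (snprintf(buf, len, "%" PRIx32, (uint32_t)fsid));
}
```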
|
48357 |
30-Jun-1999 |
julian |
Submitted by: Conrad Minshall <conrad@apple.com> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
The following ugly hack to the exit path of nfs_readlinkrpc() circumvents an Auspex bug: for symlinks longer than 112 (0x70) they return a 1024 byte xdr string - the correct data with many nulls appended. Without this fix namei returns ENAMETOOLONG, at least it does on our source base and on FreeBSD 3.0. Note we do not (and should not) rely upon their null padding.
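The underlying idea is to measure the target only up to the first NUL, so that trailing padding is ignored. It can be sketched as below. This is not the actual hack in nfs_readlinkrpc(), and `readlink_effective_len()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Given a reply of replylen bytes, return the symlink target length,
 * stopping at the first NUL. If the server sent no NULs, the full
 * length is returned unchanged.
 */
static size_t
readlink_effective_len(const char *reply, size_t replylen)
{
	const char *nul = memchr(reply, '\0', replylen);

	return (nul != NULL ? (size_t)(nul - reply) : replylen);
}
```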
|
48317 |
28-Jun-1999 |
peter |
Fix a KASSERT() that was negated and led to "nfs_strategy: buffer 0xxxxx not locked" when you attempted to write and had INVARIANTS turned on.
|
48274 |
27-Jun-1999 |
peter |
Minor tweaks to make sure (new) prerequisites for <sys/buf.h> (mostly splbio()/splx()) are #included in time.
|
48225 |
26-Jun-1999 |
mckusick |
Convert buffer locking from using the B_BUSY and B_WANTED flags to using lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
|
48125 |
23-Jun-1999 |
julian |
Matt's NFS fixes. Submitted by: Matt Dillon Reviewed by: David Cross, Julian Elischer, Mike Smith, Drew Gallatin 3.2 version to follow when tested
|
48024 |
19-Jun-1999 |
mjacob |
Thanks to Bruce for noticing this.... compare against the *new* nfsnode's mount point for seeing whether or not the new nfsnode is already in the hash queue. We're pretty much guaranteed that the old nfsnode is already in the hash queue. Wank! Infinite Loop! Looks like just a minor typo.... (ah the influence of fortran ... np && np2... why not nfsnode_the_first && nfsnode_the_second???)...
|
47964 |
16-Jun-1999 |
mckusick |
Add a vnode argument to VOP_BWRITE to get rid of the last vnode operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.
|
47938 |
15-Jun-1999 |
mjacob |
If we retry this operation from the top of this routine, we need to make sure we've freed any allocated resources (to avoid a memory leak) and do the right thing with respect to the nfs node hash lock we'd acquired.
|
47751 |
05-Jun-1999 |
peter |
Various changes lifted from the OpenBSD cvs tree:
txdr_hyper and fxdr_hyper tweaks to avoid excessive CPU order knowledge.
nfs_serv.c: don't call nfsm_adj() with negative values, windows clients could crash servers when doing a readdir of a large directory.
nfs_socket.c: Use IP_PORTRANGE to get a privileged port without a spin loop trying to bind(). Don't clobber an mbuf pointer, or we get panics on an NFS3ERR_JUKEBOX error from a server when reusing a freed mbuf.
nfs_subs.c: Don't lose st_blocks on NFSv2 mounts when > 2GB.
Obtained from: OpenBSD
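A byte-order-independent way to encode and decode an XDR hyper (64 bits, most significant byte first) is to use shifts rather than casts or word swaps. This is a sketch, not the real txdr_hyper/fxdr_hyper macros:

```c
#include <assert.h>
#include <stdint.h>

/* Encode v as 8 big-endian bytes; shifts work the same on any host. */
static void
txdr_hyper_sketch(uint64_t v, unsigned char out[8])
{
	for (int i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Decode 8 big-endian bytes back into a host uint64_t. */
static uint64_t
fxdr_hyper_sketch(const unsigned char in[8])
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | in[i];
	return (v);
}
```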
|
47750 |
05-Jun-1999 |
peter |
Fix a malloc race
Obtained from: OpenBSD (csapuntz)
|
47749 |
05-Jun-1999 |
peter |
Don't mistake a non-async block that needs to be committed for an interrupted write.
Obtained from: fvdl@NetBSD.org via OpenBSD.
|
47028 |
11-May-1999 |
phk |
Divorce "dev_t" from the "major|minor" bitmap, which is now called udev_t in the kernel but still called dev_t in userland.
Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev()
For now they're functions, they will become in-line functions after one of the next two steps in this process.
Return major/minor/makedev to macro-hood for userland.
Register a name in cdevsw[] for the "filedescriptor" driver.
In the kernel the udev_t appears in places where we have the major/minor number combination (i.e., a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to an actual device.
In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few housekeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang).
A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that.
Without DEVT_FASCIST I believe this patch is a no-op.
Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result.
Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
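To make the major/minor packing concrete, here is a sketch that assumes the 32-bit udev_t layout of this era: major in bits 8-15, minor in the remaining bits. The *_sketch names are hypothetical, and the layout is an assumption for illustration:

```c
#include <assert.h>

/* Assumed layout: major in bits 8-15, minor in bits 0-7 and 16-31. */
static int
umajor_sketch(unsigned int x)
{
	return ((int)((x >> 8) & 0xff));
}

static int
uminor_sketch(unsigned int x)
{
	return ((int)(x & 0xffff00ffu));
}

/* Assumes mi has bits 8-15 clear, as the layout requires. */
static unsigned int
makeudev_sketch(unsigned int ma, unsigned int mi)
{
	return ((ma << 8) | mi);
}
```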
|
46580 |
06-May-1999 |
phk |
remove b_proc from struct buf, it's (now) unused.
Reviewed by: dillon, bde
|
46568 |
06-May-1999 |
peter |
Add sufficient braces to keep egcs happy about potentially ambiguous if/else nesting.
|
46370 |
03-May-1999 |
alc |
All directory accesses must be made with NFS_DIRBLKSIZE chunks to avoid confusing the directory read cookie cache. The nfs_access implementation for v2 mounts attempts to read from the directory if root is the user so that root can't access cached files when the server remaps root to some other user.
Submitted by: Doug Rabson <dfr@nlsystems.com> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
|
46349 |
02-May-1999 |
alc |
The VFS/BIO subsystem contained a number of hacks in order to optimize piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas.
The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistency issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two: nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly.
There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE.
Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
|
46112 |
27-Apr-1999 |
phk |
Suser() simplification:
1: s/suser/suser_xxx/
2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
The remaining suser_xxx() calls will be scrutinized and dealt with later.
There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
More changes to the suser() API will come along with the "jail" code.
|
45996 |
24-Apr-1999 |
dt |
Fixed printf format errors on alpha.
|
45553 |
10-Apr-1999 |
peter |
Close a potential mbuf and/or mbuf cluster leak in the client-side NFS statfs() code. Free the whole chain, not just the first one.
|
45361 |
06-Apr-1999 |
peter |
Hold nfsd's upages in-core with PHOLD rather than P_NOSWAP.
|
45347 |
05-Apr-1999 |
julian |
Catch a case spotted by Tor where files mmapped could leave garbage in the unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c
Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>
|
44679 |
12-Mar-1999 |
julian |
Reviewed by: Many at different times in different parts, including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@freebsd.org>
This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the write recursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of its existence: figuring out when getnewbuf() needs to sleep.
The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch is delineated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF (sorry for the missing T, Matt).
|
44246 |
25-Feb-1999 |
peter |
Untangle the nfs send and receive queue locking a little. One lock routine was [ab]used for two different things, and you couldn't tell from the wait channel which one had wedged. Catch a few things missing from NFS_NOSERVER.
|
44112 |
18-Feb-1999 |
dfr |
Move the declaration of the vfs.nfs sysctl node outside an ifdef so that it builds if NFS_NOSERVER is defined.
Spotted by: Bruce Evans <bde@zeta.org.au>
|
44101 |
17-Feb-1999 |
bde |
Fixed bitrot in NFS_ACDEBUG option.
|
44078 |
16-Feb-1999 |
dfr |
* Change sysctl from using linker_set to construct its tree using SLISTs. This makes it possible to change the sysctl tree at runtime.
* Change KLD to find and register any sysctl nodes contained in the loaded file and to unregister them when the file is unloaded.
Reviewed by: Archie Cobbs <archie@whistle.com>, Peter Wemm <peter@netplex.com.au> (well they looked at it anyway)
|
43960 |
13-Feb-1999 |
dillon |
General additional cleanup of VOP API for NFS ops - mainly NFS ignoring the API for freeing up cnp's. This cleanup should not affect nominal operation one way or the other since NFS VOPs just happen to be called with flags that match what it actually does to the NAMEI components it gets. Still, if an NFS error occurred, there was probably some memory leakage of NAMEI components with certain NFS VOP ops.
|
43956 |
13-Feb-1999 |
dillon |
PR: kern/9970
Remove incorrect vput() in nfs_link()
|
43705 |
06-Feb-1999 |
dillon |
Flush delayed-write data out prior to issuing a rename rpc. This appears to fix the problem w/ NFSV3 whereby a make installworld would get into high-network-bandwidth situations continuously trying to retry nfs writes that fail with a 'stale file handle' error.
|
43351 |
28-Jan-1999 |
dillon |
Fix warnings related to -Wall -Wcast-qual
|
43311 |
28-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
43309 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile.
This commit includes significant work to properly handle const arguments for the DDB symbol routines.
|
43307 |
27-Jan-1999 |
dillon |
Fix nasty bug in nfs_access(). A conditional was if (a = b) instead of if (a == b).
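To illustrate the class of bug fixed here (the actual nfs_access() condition is not reproduced; the function names below are hypothetical): with `=`, the condition assigns b and then tests the assigned value, so the branch depends only on whether b is nonzero, and the left-hand side is silently clobbered. gcc's -Wparentheses, enabled by -Wall, warns about an unparenthesized assignment used as a truth value, which is one reason the -Wall cleanup of this period is useful.

```c
#include <stdbool.h>

/* Buggy form: assigns b to *a, then tests the new value of *a.
 * The extra parentheses only silence -Wparentheses; the bug remains. */
bool buggy_equal(int *a, int b)
{
	if ((*a = b))
		return true;
	return false;
}

/* Fixed form: a real comparison that leaves *a untouched. */
bool fixed_equal(const int *a, int b)
{
	if (*a == b)
		return true;
	return false;
}
```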
|
43306 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
43305 |
27-Jan-1999 |
dillon |
Fix warnings in preparation for adding -Wall -Wcast-qual to the kernel compile
|
42957 |
21-Jan-1999 |
dillon |
This is a rather large commit that encompasses the new swapper, changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues.
Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
|
42582 |
12-Jan-1999 |
eivind |
Remove two cases of unused variable sp3.
|
42315 |
05-Jan-1999 |
eivind |
Remove the 'waslocked' parameter to vfs_object_create().
|
42155 |
30-Dec-1998 |
hoek |
Silence -Wtrigraph.
Submitted by: Bradley Dunn <bradley@dunn.org> (pr: kern/8817)
|
42060 |
25-Dec-1998 |
dfr |
Fix for creating files on a Solaris 7 server with NFSv3 (the request was slightly garbled but older servers seemed to understand it).
Reviewed by: David O'Brien <obrien@nuxi.ucdavis.edu>
|
41796 |
14-Dec-1998 |
dt |
Added 3 new errno values, required by various standards: EOVERFLOW, ECANCELED, EILSEQ.
Fixed ibcs2 and especially linux EIDRM and ENOMSG errno mapping. Reviewed by: Dan Nelson <dnelson@emsphone.com>
|
41791 |
14-Dec-1998 |
dt |
(Hopefully) fix support for "large" files. Mostly cast block numbers to off_t before they are multiplied by block sizes.
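A minimal sketch of the overflow this kind of cast avoids. It uses unsigned 32-bit operands so the wraparound is well defined (with the kernel's signed block numbers the overflow is undefined behaviour instead); int64_t stands in for a 64-bit off_t, and both function names are hypothetical.

```c
#include <stdint.h>

/* Wrong: both operands are 32 bits wide, so the product wraps
 * modulo 2^32 before it is widened to 64 bits on return. */
int64_t blk_to_off_wrong(uint32_t lbn, uint32_t bsize)
{
	return lbn * bsize;
}

/* Right: widen the block number first, so the multiply itself
 * is done in 64-bit arithmetic. */
int64_t blk_to_off(uint32_t lbn, uint32_t bsize)
{
	return (int64_t)lbn * bsize;
}
```

With lbn = 2^20 and an 8192-byte block size the true offset is 2^33 bytes (8 GB); the uncast version yields 0.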
|
41591 |
07-Dec-1998 |
archie |
The "easy" fixes for compiling the kernel -Wunused: remove unreferenced static and local variables, goto labels, and functions declared but not defined.
|
41514 |
04-Dec-1998 |
archie |
Examine all occurrences of sprintf(), strcat(), and str[n]cpy() for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for other cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc.
These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer.
Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
|
41488 |
03-Dec-1998 |
dillon |
Make bootp error message slightly more verbose
|
41186 |
15-Nov-1998 |
msmith |
Reimplement the NFS ACCESS RPC cache as an "accelerator" rather than a true cache. If the cached result lets us say "yes", then go with that. If we're not sure, or we think the answer might be "no", go to the wire to be certain. This avoids all of the possible false veto cases, and allows us to key the cached value with just the UID for which the cached value holds, reducing the bloat of the nfsnode structure from 104 bytes to just 12 bytes.
Since the "yes" case is by far the most common, this should still provide a substantial performance improvement. Also default the cache to on, with a conservative timeout (2 seconds). This improves performance if NFS is loaded as a KLD module, as there's not (yet) code to parse an option out of the module arguments to set it, and sysctl doesn't work (yet) for OIDs in modules.
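A minimal sketch of the "accelerator" decision described above, assuming a per-nfsnode entry that records only the UID, the granted access bits and a fetch timestamp. The struct and function names are hypothetical. The key property is that the cache can only ever answer "yes"; every other outcome means "go to the wire", so a stale or foreign entry can never produce a false veto.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-nfsnode ACCESS cache entry. */
struct access_cache {
	uint32_t uid;    /* UID the cached result applies to */
	uint32_t mode;   /* access bits the server granted */
	time_t   stamp;  /* when the result was fetched */
};

/*
 * true:  the cache alone proves access is allowed.
 * false: not sure; the caller must issue the ACCESS RPC.
 */
bool access_cache_allows(const struct access_cache *ac, uint32_t uid,
    uint32_t wanted, time_t now, time_t timeout)
{
	if (ac->uid != uid)
		return false;                   /* cached for another user */
	if (now - ac->stamp >= timeout)
		return false;                   /* entry has expired */
	return (ac->mode & wanted) == wanted;   /* every bit was granted */
}
```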
The 'accelerator' mode was suggested by Bjoern Groenvall (bg@sics.se)
Feedback on this would be appreciated as testing has been necessarily limited by Comdex, and it would be valuable to have this in 2.2.8.
|
41138 |
13-Nov-1998 |
msmith |
Avoid a null pointer reference if the target of an NFS rename has been sillyrenamed, or if the source vnode doesn't have an associated nfsnode.
Bug report from Andrew Gallatin <gallatin@cs.duke.edu>
|
41132 |
13-Nov-1998 |
dfr |
Fix a panic in nfsrv_dorec() where a NULL pointer could be passed to free() sometimes.
Reviewed by: Eric Haug <ejh@eas.slu.edu>
|
41127 |
13-Nov-1998 |
msmith |
Implement NFS ACCESS RPC result caching.
This yields startling performance increases for NFS clients for many access profiles, due to the fact that ACCESS results are persistently cached in the namecache in many cases.
Note that the code is somewhat conservative in that it requires an exact credential match for a cache hit. This bloats the nfsnode structure by sizeof(struct ucred) (96 bytes). Any less conservative approach opens the possibility for a false veto in eg. setuid applications. Alternative suggestions would be welcomed.
The cache is normally disabled; to activate it, set the sysctl variable vfs.nfs.access_cache_timeout to a nonzero value. This is the time in seconds that a cached entry will be considered valid; useful values appear to be 2-10 seconds. Performance of the cache can be monitored with the vfs.nfs.access_cache_hits and vfs.nfs.access_cache_hits variables.
|
41026 |
09-Nov-1998 |
peter |
Remove [apparently] bogus casts to u_long for the vnode_pager_setsize() second argument. np_size is a 64 bit int, so is the second arg. This might have caused needless 2G/4G file size problems.
I believe it was Bruce who queried this.
|
40790 |
31-Oct-1998 |
peter |
Use TAILQ macros for clean/dirty block list processing. Set b_xflags rather than abusing the list next pointer with a magic number.
|
39794 |
29-Sep-1998 |
mckusick |
In nfs_link(), check for a cross-device mount *before* looking in the v_data field. Obtained from: Charles Hannum, via Frank van der Linden <frank@wins.uva.nl>
|
39793 |
29-Sep-1998 |
mckusick |
Missing vput when cross-device link error is detected in nfs_link.
|
39792 |
29-Sep-1998 |
mckusick |
During truncation, have to notify the VM about the new size of the NFS file *before* doing the nfs_vinvalbuf operation. Otherwise some invalid data may show up in an mmap.
|
39790 |
29-Sep-1998 |
mckusick |
Frank sez: 'It fixes a problem with servers that return 0 values for some of the fsinfo RPC fields. It is strictly speaking not wrong to do this, as the spec says that "it is expected that a server will make a best effort at supporting all the attributes", but pretty unusual. You guessed it, it's NT servers that do it.' Obtained from: Frank van der Linden <frank@wins.uva.nl>
|
39789 |
29-Sep-1998 |
mckusick |
Do not need (or want) to take a reference on an NFS file that is being deleted due to a forcible unmount. The problem is that vgone calls vclean(), which then calls nfs_inactive() with VXLOCK set on the vnode. Nfs_inactive() was calling vget() to get a reference on the vnode, which in turn hung on VXLOCK. Nfs_inactive() now checks v_usecount to make sure that the vnode is not coming from vclean() before it does a vget().
|
39788 |
29-Sep-1998 |
mckusick |
The code checks each fragment mark to see if it's valid; if the fragment is less than NFS_MINPACKET or greater than NFS_MAXPACKET in size, it barfs and, I think, drops the connection.
However, there's no guarantee that in a multi-fragment RPC, all the fragments will be at least as large as NFS_MINPACKET.
In fact, with the version of "tclnfs" we have here, which supports NFS over TCP, at least when built under SunOS 4.1.3 (i.e., with 4.1.3's user-mode ONC RPC library), I can *repeatably* cause "tclnfs" to send a request with more than one fragment, one of which is only 8 bytes long. I just do a 3877-byte write to a file, at an offset of 0.
The check that "slp->ns_reclen" is greater than or equal to NFS_MINPACKET serves no useful purpose - if the NFS server code can't handle packets < NFS_MINPACKET bytes, it can't handle them over *any* protocol, so the check has to be done above the RPC-over-TCP layer - and should be removed. Obtained from: Fix from Guy Harris, forwarded by Rick Macklem.
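A minimal sketch of ONC RPC record marking over TCP (the scheme from RFC 1831): each fragment is preceded by a 4-byte big-endian mark whose high bit flags the last fragment and whose low 31 bits give the fragment length. Following the reasoning above, the per-fragment check keeps only an upper bound; any minimum size applies to the reassembled request. maxrec is a hypothetical stand-in for NFS_MAXPACKET, and the function names are illustrative, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

#define RM_LASTFRAG 0x80000000u  /* high bit: last fragment of the record */
#define RM_LENMASK  0x7fffffffu  /* low 31 bits: fragment length in bytes */

/* Decode a record mark stored in network (big-endian) byte order. */
void rm_decode(const uint8_t hdr[4], bool *last, uint32_t *len)
{
	uint32_t v = (uint32_t)hdr[0] << 24 | (uint32_t)hdr[1] << 16 |
	    (uint32_t)hdr[2] << 8 | hdr[3];

	*last = (v & RM_LASTFRAG) != 0;
	*len = v & RM_LENMASK;
}

/* Upper bound only: a tiny fragment (e.g. 8 bytes) is legal inside a
 * multi-fragment record. The first test guards the subtraction. */
bool rm_fragment_ok(uint32_t reclen_so_far, uint32_t fraglen, uint32_t maxrec)
{
	return fraglen <= maxrec && reclen_so_far <= maxrec - fraglen;
}
```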
|
39782 |
29-Sep-1998 |
mckusick |
Mark directory buffers that have no valid data with B_INVAL so that they are not put in the cache.
|
39781 |
29-Sep-1998 |
mckusick |
When adding data to a buffer, we need to clear the B_NEEDCOMMIT flag which says that the data is on server but not committed.
|
38909 |
07-Sep-1998 |
bde |
Removed statically configured mount type numbers (MOUNT_*) and all references to them.
The change a couple of days ago to ignore these numbers in statically configured vfsconf structs was slightly premature because the cd9660, cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number in their vfsconf struct.
|
38894 |
07-Sep-1998 |
bde |
Made unloading of the nfs LKM sort of work. This is mainly to test detachment of vfs sysctls. Unloading of vfs LKMs doesn't actually work for any vfs, since it leaves garbage pointers to memory allocation control structures.
|
38869 |
05-Sep-1998 |
bde |
Ignore the statically configured vfs type numbers and assign vfs type numbers in vfs attach order (modulo incomplete reuse of old numbers after vfs LKMs are unloaded). This requires reinitializing the sysctl tree (or at least the vfs subtree) for vfs's that support sysctls (currently only nfs). sysctl_order() already handled reinitialization reasonably except it checked for annulled self references in the wrong place.
Fixed sysctls for vfs LKMs.
|
38866 |
05-Sep-1998 |
bde |
Instantiate `nfs_mount_type' in a standard file so that it is present when nfs is an LKM. Declare it in a header file. Don't forget to use it in non-Lite2 code. Initialize it to -1 instead of to 0, since 0 will soon be the mount type number for the first vfs loaded.
NetBSD uses strcmp() to avoid this ugly global.
|
38799 |
04-Sep-1998 |
dfr |
Cosmetic changes to the PAGE_XXX macros to make them consistent with the other objects in vm.
|
38718 |
01-Sep-1998 |
luoqi |
Check for NULL pointer before freeing a struct sockaddr. m_freem() can handle NULL, but free() can't.
|
38482 |
23-Aug-1998 |
wollman |
Yow! Completely change the way socket options are handled, eliminating another specialized mbuf type in the process. Also clean up some of the cruft surrounding IPFW, multicast routing, RSVP, and other ill-explored corners.
|
38412 |
18-Aug-1998 |
bde |
Fixed printf format errors.
|
38299 |
13-Aug-1998 |
dfr |
Protect all modifications to v_numoutput with splbio().
|
38289 |
12-Aug-1998 |
bde |
Don't configure compatibility code for pre-Lite2 mount() calls by default. This code should go away soon.
|
37997 |
01-Aug-1998 |
peter |
If we get an ENOBUFS from the network, it's normally transient network interface congestion (eg: nfs over a ppp link, etc). Don't log these for UDP mounts, and don't cause syscalls to fail with EINTR. This stops the 'nfs send error 55' warnings.
If the error is because the system is really hosed, this is the least of your problems...
|
37649 |
15-Jul-1998 |
bde |
Cast pointers to uintptr_t/intptr_t instead of to u_long/long, respectively. Most of the longs should probably have been u_longs, but this change is just to prevent warnings about casts between pointers and integers of different sizes, not to fix poorly chosen types.
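A minimal sketch of the portable pointer/integer round trip. uintptr_t is defined to be wide enough to hold an object pointer (it is optional in C99 but present on all common platforms), whereas u_long only happens to be pointer-sized on some ABIs; on LLP64 systems such as 64-bit Windows, long is 32 bits. The helper names are hypothetical.

```c
#include <stdint.h>

/* Pointer to integer: no truncation, no size-mismatch warning. */
uintptr_t ptr_to_int(const void *p)
{
	return (uintptr_t)p;
}

/* Integer back to pointer: round-trips to the original address. */
void *int_to_ptr(uintptr_t v)
{
	return (void *)v;
}
```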
|
37384 |
04-Jul-1998 |
julian |
VOP_STRATEGY grows a (struct vnode *) argument as the value in b_vp is often not really what you want (and needs to be frobbed). More cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>
|
37291 |
30-Jun-1998 |
jmg |
fix buildworld, hopefully before anyone complains...
NFS_*TIMO should possibly be converted to sysctl vars (jkh's suggestion), but in some cases it looks like nfs keeps a copy of the value in a struct.
hash sizes are already ifdef'd KERNEL, so there is no userland impact from them...
|
37272 |
30-Jun-1998 |
jmg |
convert some nfs tunables to options. These are:
NFS_MINATTRTIMO, NFS_MAXATTRTIMO: VREG attrib cache timeout in sec
NFS_MINDIRATTRTIMO, NFS_MAXDIRATTRTIMO: VDIR attrib cache timeout in sec
NFS_GATHERDELAY: default write gather delay (msec)
NFS_UIDHASHSIZ: tune the size of nfssvc_sock with this
NFS_WDELAYHASHSIZ: and with this (also nfssvc_sock)
NFS_MUIDHASHSIZ: tune the size of nfsmount with this
NFS_NOSERVER: (already documented in LINT)
NFS_DEBUG: turn on NFS debugging
also, because NFS_ROOT is used by very different files, it has been renamed to opt_nfsroot.h instead of the old opt_nfs.h....
|
37089 |
21-Jun-1998 |
bde |
Fixed typo in ifdefed code. (NFS_ACDEBUG is not in LINT. Therefore, code controlled by it did not even compile.)
|
36979 |
14-Jun-1998 |
bde |
Avoid an egcs pessimization for 64-bit signed division on i386's. Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit signed divisions, but egcs optimizes them to a shift and fixup when the divisor is a constant power of 2. Unfortunately, it generates a call to __cmpdi2() for the fixup, although all except possibly ancient versions of gcc and egcs do ordinary 64-bit comparisons inline.
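A C-level sketch of the "shift and fixup" transformation for signed division by a constant power of 2 (this is not the code egcs actually emits). C requires signed division to round toward zero, but an arithmetic right shift rounds toward minus infinity, so a negative dividend is first biased by 2^k - 1. The sign test that selects the bias is the kind of 64-bit comparison the commit says egcs turned into a __cmpdi2() call. The sketch assumes >> on a negative int64_t is an arithmetic shift, which is implementation-defined in C but holds for gcc.

```c
#include <assert.h>
#include <stdint.h>

/* x / 2^k with C's round-toward-zero semantics, via a shift. */
int64_t sdiv_pow2(int64_t x, unsigned k)
{
	assert(k < 63);                 /* keep 1 << k representable */
	int64_t bias = (x < 0) ? (((int64_t)1 << k) - 1) : 0;

	return (x + bias) >> k;         /* biasing makes the shift round toward 0 */
}
```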
|
36735 |
07-Jun-1998 |
dfr |
This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us in line with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days time.
|
36563 |
01-Jun-1998 |
peter |
Make sure we do an nfs_fsinfo() in get/putpages before calling readrpc/writerpc, since they assume it's already been done. This could break if the first read/write access to a nfs filesystem was an exec() or mmap() instead of a read(), write() syscall (or statfs()). nfs_getpages() could return an errno (EOPNOTSUPP) instead of a VM_PAGER_* return code. Some layout tweaks for the get/putpages code.
|
36562 |
01-Jun-1998 |
peter |
Fix post-test pre-commit cleanup typo.
|
36561 |
01-Jun-1998 |
peter |
readlink() returns EINVAL rather than EPERM if called on a non-symlink.
|
36560 |
01-Jun-1998 |
peter |
Preset the maximum file size before we get to nfs_fsinfo(), based on an (over?) conservative assumption about what the client can store in its buffer cache using a signed 32-bit 512-byte block number index. Otherwise it's possible for a file to be accessed while maxfilesize is still 0 (eg: /usr is nfs mounted and doing an execve()). Pointed out by: bde
XXX It might make sense to do a preemptive nfs_fsinfo() call at mount time.
|
36541 |
31-May-1998 |
peter |
For the on-the-wire protocol, u_long -> u_int32_t; long -> int32_t; int -> int32_t; u_short -> u_int16_t. Also, use mode_t instead of u_short for storing modes (mode_t is a u_int16_t).
Obtained from: NetBSD
|
36540 |
31-May-1998 |
peter |
Support 'mount -u' remounts. This may require disconnecting and rebinding the socket. Certain mode changes are not allowed.
Obtained from: NetBSD
|
36538 |
31-May-1998 |
peter |
xdr encode -1 properly.
Obtained from: NetBSD
|
36537 |
31-May-1998 |
peter |
Fully fill in nfsv2 write rpc requests rather than leaving garbage.
Obtained from: NetBSD
|
36536 |
31-May-1998 |
peter |
Don't silently fail to set file flags.
Obtained from: NetBSD
|
36535 |
31-May-1998 |
peter |
Don't blindly accept the server's preferences if they are too small.
Obtained from: NetBSD
|
36534 |
31-May-1998 |
peter |
Prototype support for selectively allowing non-reserved ports on a per export basis. Needs userland support yet.
Obtained from: NetBSD
|
36531 |
31-May-1998 |
peter |
Don't pass a second copy of the uid/gid in with the v2/v3 sattr structures, it just makes more work. We pass a copy of the uid/gid with the credentials. (although, this may need to be revisited if a non AUTHUNIX authentication method (such as NFSKERB) ever gets implemented).
Obtained from: NetBSD
|
36530 |
31-May-1998 |
peter |
Use the new SB_UPCALL flag.
Obtained from: NetBSD (but I changed the flag clear order just in case).
|
36526 |
31-May-1998 |
peter |
NFS_SMALLFH is defined in nfsproto.h, not sys/mount.h
Obtained from: NetBSD
|
36525 |
31-May-1998 |
peter |
Don't let the user try "rmdir ."
Obtained from: NetBSD
|
36524 |
31-May-1998 |
peter |
Don't let the user try and unlink() a directory on a NFS server.
Obtained from: NetBSD
|
36523 |
31-May-1998 |
peter |
When a write rpc returns an error, break the loop.
Obtained from: NetBSD
|
36522 |
31-May-1998 |
peter |
Don't leak an mbuf when a write rpc returns zero bytes written.
Obtained from: NetBSD
|
36521 |
31-May-1998 |
peter |
#ifdef a diagnostic printf
Obtained from: NetBSD
|
36520 |
31-May-1998 |
peter |
Don't try and free mrep twice on some error conditions.
Obtained from: NetBSD
|
36519 |
31-May-1998 |
peter |
#ifdef a diagnostic panic, plus another missed cosmetic change.
Obtained from: NetBSD
|
36518 |
31-May-1998 |
peter |
We have gained 2 more errno's, add them to the NFSv2 mapping table.
|
36517 |
31-May-1998 |
peter |
Missed a cosmetic change that the other BSD's have.
|
36516 |
31-May-1998 |
peter |
oops, nfs_msg() is called from client code too.
|
36515 |
31-May-1998 |
peter |
When we can't reconnect a socket, don't forget to unlock before retrying or we can deadlock.
Obtained from: NetBSD
|
36514 |
31-May-1998 |
peter |
Don't log zero length reads, this can happen during normal operation.
Obtained from: NetBSD
|
36513 |
31-May-1998 |
peter |
Take readdir chunk sizes into account when tuning socket buffer reservations.
Obtained from: NetBSD
|
36511 |
31-May-1998 |
peter |
Some const's
Obtained from: NetBSD
|
36503 |
31-May-1998 |
peter |
NFS Jumbo commit part 1. Cosmetic and structural changes only. The aim of this part of commits is to minimize unnecessary differences between the other NFS's of similar origin. Yes, there are gratuitous changes here that the style folks won't like, but it makes the catch-up less difficult.
|
36481 |
31-May-1998 |
peter |
VOP_ABORTUP() appears to be called with the wrong vnode. The other callers that I checked (eg: ufs_link()) do the ABORTOP on the directory rather than the file itself. After Michael Hancock's patches, the abortop doesn't seem all that critical now since something else will free the pathname buffer.
|
36473 |
30-May-1998 |
peter |
When using NFSv3, use the remote server's idea of the maximum file size rather than assuming 2^64. It may not like files that big. :-) On the nfs server, calculate and report the max file size as the point that the block numbers in the cache would turn negative. (ie: 1099511627775 bytes (1TB)).
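The 1099511627775-byte figure can be checked directly: a signed 32-bit block number indexes at most 2^31 blocks of DEV_BSIZE (512 = 2^9) bytes, i.e. 2^40 bytes, so the last usable byte offset is 2^40 - 1. A minimal sketch of that arithmetic (the function name is hypothetical; DEV_BSHIFT is defined locally only if the environment does not already provide it):

```c
#include <stdint.h>

#ifndef DEV_BSHIFT
#define DEV_BSHIFT 9    /* log2(DEV_BSIZE), DEV_BSIZE = 512 */
#endif

/* Largest byte offset whose 512-byte block number still fits in a
 * signed 32-bit index: (2^31 blocks << 9) - 1 = 2^40 - 1. */
uint64_t nfs_maxfilesize_sketch(void)
{
	return (((uint64_t)INT32_MAX + 1) << DEV_BSHIFT) - 1;
}
```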
One of the things I'm worried about however, is that directory offsets are really cookies on a NFSv3 server and can be rather large, especially when/if the server generates the opaque directory cookies by using a local filesystem offset in what comes out as the upper 32 bits of the 64 bit cookie. (a server is free to do this, it could save byte swapping depending on the native 64 bit byte order)
Obtained from: NetBSD
|
36329 |
24-May-1998 |
peter |
Convert a couple of large allocations to use zones rather than malloc for better packing. This means that we can choose better values for the various hash entries without having to try and get it all to fit within an artificial power of two limit for malloc's sake.
|
36249 |
20-May-1998 |
peter |
s/flags/flag/
|
36248 |
20-May-1998 |
peter |
A cleaner fix for PR#5102, clear nonsense flags at mount time rather than in the core of nfs_bio.c at the 11th hour.
PR: 5102
|
36247 |
20-May-1998 |
peter |
Don't change argp->flags after it's been copied.
|
36176 |
19-May-1998 |
peter |
Allow control of the attribute cache timeouts at mount time.
We had run out of bits in the nfs mount flags, so I have moved the internal state flags into a separate variable. These are no longer visible via statfs(), but I don't know of anything that looks at them.
|
36100 |
16-May-1998 |
bde |
Get timespecs directly instead of via timevals.
|
36099 |
16-May-1998 |
bde |
Don't abuse `+' to combine flags.
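A minimal sketch of why `+` is the wrong operator for combining flags (the flag names are hypothetical): `|` is idempotent, but `+` carries into the next bit whenever the flag being added is already set, silently turning one flag into another.

```c
/* Hypothetical single-bit flags. */
#define FLAG_A 0x01
#define FLAG_B 0x02

/* Wrong: if f is already set in cur, the addition carries into
 * the neighbouring bit. */
unsigned add_flags(unsigned cur, unsigned f)
{
	return cur + f;
}

/* Right: setting an already-set bit leaves the word unchanged. */
unsigned or_flags(unsigned cur, unsigned f)
{
	return cur | f;
}
```

The two only agree when the bits being combined are disjoint, which is exactly the assumption `+` quietly depends on.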
|
36098 |
16-May-1998 |
bde |
Backed out rev.1.76. It just added style bugs.
|
36097 |
16-May-1998 |
bde |
Get timespecs directly instead of via timevals.
|
36013 |
13-May-1998 |
peter |
Add missing arg to vget().. Serves me right for committing a 2.2 patch to -current without testing it there.. :-(
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35994 |
13-May-1998 |
peter |
Hold a reference to the vnode during the sillyrename cleanup. If we block in nfs_vinvalbuf() or the nfs_removeit(), we can have the nfsnode reallocated from underneath us (eg: replaced by a ufs 'struct inode') which can cause disk corruption ('freeing free block' when di_db[5] gets trashed). This is not a cheap fix, but it'll do until the nfsnodes get reference counting and/or locking.
Apparently NetBSD have a similar fix (apparently from BSDI).
I wish all PR's had this much useful detail. :-)
PR: 6611 Submitted by: Stephen Clawson <sclawson@marker.cs.utah.edu>
|
35991 |
13-May-1998 |
peter |
Move the *vpp initialization earlier so that it's set in all error cases. This should stop the 'panic: leaf should not be empty' nfs panic.
PR: 1856 Submitted by: msaitoh@spa.is.uec.ac.jp
|
35823 |
07-May-1998 |
msmith |
In the words of the submitter:
--------- Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches. ---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35769 |
06-May-1998 |
msmith |
As described by the submitter:
Reverse the VFS_VRELE patch. Reference counting of vnodes does not need to be done per-fs. I noticed this while fixing vfs layering violations. Doing reference counting in generic code is also the preference cited by John Heidemann in recent discussions with him.
The implementation of alternative vnode management per-fs is still a valid requirement for some filesystems but will be revisited sometime later, most likely using a different framework.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
35066 |
06-Apr-1998 |
phk |
Use random() to find our initial xid.
|
34961 |
30-Mar-1998 |
phk |
Eradicate the variable "time" from the kernel, using various measures. "time" wasn't an atomic variable, so splfoo() protection was needed around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now use the new variable time_second instead.
gettime() changed to getmicrotime().
Remove a couple of unneeded splfoo() protections, the new getmicrotime() is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeared time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the one remaining call that passed &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call hzto() which subtracts time" sequences.
Reviewed by: bde
|
34930 |
28-Mar-1998 |
steve |
Don't allow the readdirplus routine to be used in NFS V2.
PR: 5102 Reviewed by: msmith Submitted by: Dmitry Kohmanyuk <dk@farm.org>
|
34926 |
28-Mar-1998 |
bde |
Don't depend on <sys/mount.h> including <sys/socket.h>.
|
34924 |
28-Mar-1998 |
bde |
Moved some #includes from <sys/param.h> nearer to where they are actually used.
|
34573 |
14-Mar-1998 |
tegge |
Add a BOOTP_WIRED_TO option, for use on machines with multiple network cards where the first detected card should not be used for bootp. Submitted by: Doug Ambrisko <ambrisko@whistle.com>
|
34572 |
14-Mar-1998 |
tegge |
Update workaround for limitations in the arp code. Adjust the RPC timeout message which occurred when the old workaround broke to show the correct IP address.
|
34266 |
08-Mar-1998 |
julian |
Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: Whistle development tree
|
34206 |
07-Mar-1998 |
dyson |
This mega-commit is meant to fix numerous interrelated problems. There has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifested themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistency, and as a side-effect broke the vfs.ioopt code. This code might have been committed separately, but almost everything is interrelated.
1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid.
2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them.
3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state.
4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances.
5) Remove a gratuitous buffer wakeup in vfs_vmio_release.
6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version.
7) The page busy count wakeups associated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation.
8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed.
9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals.
10) Correct and clean-up spec_getpages.
11) Implement a fully functional nfs_getpages, nfs_putpages.
12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.)
13) Properly support MS_INVALIDATE on NFS.
14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean.
15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads.
16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.)
17) Correct the design and usage of vm_page_sleep. It didn't handle consistency problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify its status (always.)
18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20) Fix vm_pager_put_pages and its descendants to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
|
34096 |
06-Mar-1998 |
msmith |
Trivial filesystem getpages/putpages implementations, set the second. These should be considered the first steps in a work-in-progress. Submitted by: Terry Lambert <terry@freebsd.org>
|
33964 |
01-Mar-1998 |
msmith |
The intent is to get rid of WILLRELE in vnode_if.src by making a complement to all ops that return a vpp, VFS_VRELE. This is initially only for file systems that implement the following ops that do a WILLRELE:
vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link, vop_rename, vop_mkdir, vop_rmdir, vop_symlink
This is initial DNA that doesn't do anything yet. VFS_VRELE is implemented but not called.
A default vfs_vrele was created for fs implementations that use the standard vnode management routines.
VFS_VRELE implementations were made for the following file systems:
Standard (vfs_vrele): ffs, mfs, nfs, msdosfs, devfs, ext2fs
Custom: union, umapfs
Just EOPNOTSUPP: fdesc, procfs, kernfs, portal, cd9660
These implementations may change as VOP changes are implemented.
In the next phase, in the vop implementations calls to vrele and the vrele part of vput will be moved to the top layer vfs_vnops and made visible to all layers. vput will be replaced by unlock in these cases. Unlocking will still be done in the per fs layer but the refcount decrement will be triggered at the top because it doesn't hurt to hold a vnode reference a little longer. This will have minimal impact on the structure of the existing code.
This will only be done for vnode arguments that are released by the various fs vop implementations.
Wider use of VFS_VRELE will likely require restructuring of the code.
Reviewed by: phk, dyson, terry et. al. Submitted by: Michael Hancock <michaelh@cet.co.jp>
|
33181 |
09-Feb-1998 |
eivind |
Staticize.
|
33134 |
06-Feb-1998 |
eivind |
Back out DIAGNOSTIC changes.
|
33121 |
05-Feb-1998 |
dyson |
Fix an omission of a line from the previous commit to this file. The problem appeared to be an NFS hang.
|
33108 |
04-Feb-1998 |
eivind |
Turn DIAGNOSTIC into a new-style option.
|
33054 |
03-Feb-1998 |
bde |
Forward declare some structs so that this file is more self-sufficient.
|
32998 |
01-Feb-1998 |
bde |
Moved declaration of `union nethostadr' outside of the KERNEL section, to give pollution compatible with <nfs/nqfs.h>. At least mount_nfs.c previously had to #define KERNEL before including <nfs/nfs.h> to get this pollution, but this gave other pollution.
Moved comment about NFSINT_SIGMASK to immediately before the code that it applies to.
|
32997 |
01-Feb-1998 |
bde |
Forward declare more structs that are used in prototypes here - don't depend on <sys/types.h> forward declaring common ones.
Added an underscore to `sin' in prototypes to avoid warnings for the conflict with the ANSI sin().
|
32912 |
31-Jan-1998 |
tegge |
Release the buffer when an error occurs while reading directory entries.
|
32755 |
25-Jan-1998 |
dyson |
Various NFS fixes: Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances.
(Thanks to Bill Paul for help.)
|
32754 |
25-Jan-1998 |
dyson |
Various NFS fixes: Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances.
(Thanks for help from Bill Paul.)
|
32609 |
18-Jan-1998 |
tegge |
Increase the minimum bootp reply packet size from 16 (bogus) to 300 (correct).
|
32358 |
09-Jan-1998 |
eivind |
Make the BOOTP family new-style options (in opt_bootp.h)
|
32350 |
08-Jan-1998 |
eivind |
Make INET a proper option.
This will not change any of the object files that LINT creates; there might be differences with INET disabled, but hardly anything compiled without INET before anyway. Now the 'obvious' things will give a proper error if compiled without INET: ipx_ip, ipfw, tcp_debug. The only thing that _should_ work (but can't be made to compile reasonably easily) is sppp :-(
This commit moves struct arpcom from <netinet/if_ether.h> to <net/if_arp.h>.
|
32286 |
06-Jan-1998 |
dyson |
Make our v_usecount vnode reference count work identically to the original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_objects are no longer freed gratuitously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
Vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics as the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
Please be careful with your system while running this code, but I would greatly appreciate feedback as soon as reasonably possible.
|
32071 |
29-Dec-1997 |
dyson |
Lots of improvements, including restructuring the caching and management of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include:
1) Cleaning up vref, vget. 2) Removal of the object cache. 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore. 4) Correct some missing LK_RETRY's in vn_lock. 5) Correct the page range in the code for msync.
Be gentle, and please give me feedback asap.
|
32011 |
27-Dec-1997 |
bde |
Unspammed nested include of <vm/vm_zone.h>.
|
31886 |
20-Dec-1997 |
bde |
Added a used include.
Fixed a gratuitous ANSIism and nearby KNF violations.
|
31617 |
08-Dec-1997 |
dyson |
Various of the ISP users have commented that the 1.41 version of the nfs_bio.c code worked better than the 1.44. This commit reverts the important parts of 1.44 to 1.41, and we will fix it when we can get a handle on the problem.
|
31391 |
24-Nov-1997 |
bde |
Don't call malloc(..., M_WAITOK) at splnet(). Doing so is often a mistake (since softnet interrupts may occur if malloc() waits), and doing it harmlessly but unnecessarily here interfered with detection of the mistaken cases.
|
31132 |
12-Nov-1997 |
julian |
Reviewed by: various.
Ever since I first saw the way the mount flags were used I've hated the fact that modes and events, internal and exported, and short-term and long-term flags are all thrown together. Finally it's annoyed me enough. This patch to the entire FreeBSD tree adds a second mount flag word to the mount struct. It is not exported to userspace. I have moved some of the non-exported flags over to this word. This means that we now have 8 free bits in the mount flags. There are another two that might well move over, but which I'm not sure about. The only user-visible change would have been in pstat -v, except that davidg has disabled it anyhow. I'd still like to move the state flags and the 'command' flags apart from each other, e.g. MNT_FORCE really doesn't have the same semantics as MNT_RDONLY, but that's left for another day.
|
31017 |
07-Nov-1997 |
phk |
Rename some local variables to avoid shadowing other local variables.
Found by: -Wshadow
|
31016 |
07-Nov-1997 |
phk |
Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by: -Wunused
|
30994 |
06-Nov-1997 |
phk |
Move the "retval" (3rd) parameter from all syscall functions and put it in struct proc instead.
This fixes a boatload of compiler warnings, and removes a lot of cruft from the sources.
I have not removed the /*ARGSUSED*/, they will require some looking at.
libkvm, ps and other userland programs that frob struct proc will need to be recompiled.
|
30813 |
28-Oct-1997 |
bde |
Removed unused #includes.
|
30808 |
28-Oct-1997 |
bde |
Don't #include <nfs/nfs.h> in <nfs/nfs_node.h> if KERNEL is defined. Fixed everything that depended on the nested include.
|
30780 |
27-Oct-1997 |
bde |
Removed unused #includes. The need for most of them went away with recent changes (docluster* and vfs improvements).
|
30743 |
26-Oct-1997 |
phk |
VFS interior redecoration.
Rename vn_default_error to vop_defaultop all over the place. Move vn_bwrite from vfs_bio.c to vfs_default.c and call it vop_stdbwrite. Use vop_null instead of nullop. Move vop_nopoll from vfs_subr.c to vfs_default.c Move vop_sharedlock from vfs_subr.c to vfs_default.c Move vop_nolock from vfs_subr.c to vfs_default.c Move vop_nounlock from vfs_subr.c to vfs_default.c Move vop_noislocked from vfs_subr.c to vfs_default.c Use vop_ebadf instead of *_ebadf. Add vop_defaultop for getpages on master vnode in MFS.
|
30738 |
26-Oct-1997 |
phk |
Always initialize the syscall vectors for our "private" syscalls (not just in the LKM case). Plug nqnfs_vop_lease_check directly into the default_vnodeop_p table.
|
30496 |
16-Oct-1997 |
phk |
VFS clean up "hekto commit"
1. Add defaults for more VOPs:
    VOP_LOCK        vop_nolock
    VOP_ISLOCKED    vop_noislocked
    VOP_UNLOCK      vop_nounlock
and remove direct references in filesystems.
2. Rename the nfsv2 vnop tables to improve sorting order.
|
30492 |
16-Oct-1997 |
phk |
Another VFS cleanup "kilo commit"
1. Remove VOP_UPDATE; it is (also) a UFS/{FFS,LFS,EXT2FS,MFS} interface function, and now lives in the ufsmount structure.
2. Remove VOP_SEEK, it was unused.
3. Add more default vops:
    VOP_ADVLOCK       vop_einval
    VOP_CLOSE         vop_null
    VOP_FSYNC         vop_null
    VOP_IOCTL         vop_enotty
    VOP_MMAP          vop_einval
    VOP_OPEN          vop_null
    VOP_PATHCONF      vop_einval
    VOP_READLINK      vop_einval
    VOP_REALLOCBLKS   vop_eopnotsupp
And remove identical functionality from filesystems
4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?)
5. Try to make system wide VOP functions have vop_* names.
6. Initialize the um_* vectors in LFS.
(Recompile your LKMS!!!)
|
30474 |
16-Oct-1997 |
phk |
VFS mega cleanup commit (x/N)
1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here.
2. Change VOP_BLKATOFF to a normal function in cd9660.
3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead.
4. Remove a kludge of VOP_ functions in all filesystems that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There is still some cruft left, but the bulk of it is done.
5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
|
30439 |
15-Oct-1997 |
phk |
vnops megacommit
1. Use the default function to access all the specfs operations. 2. Use the default function to access all the fifofs operations. 3. Use the default function to access all the ufs operations. 4. Fix VCALL usage in vfs_cache.c 5. Use VOCALL to access specfs functions in devfs_vnops.c 6. Staticize most of the spec and fifofs vnops functions. 7. Make UFS panic if it lacks bits of the underlying storage handling.
|
30434 |
15-Oct-1997 |
phk |
Hmm, realign the vnops into two columns.
|
30431 |
15-Oct-1997 |
phk |
Stylistic overhaul of vnops tables. 1. Remove comment stating the blatantly obvious. 2. Align in two columns. 3. Sort all but the default element alphabetically. 4. Remove XXX comments pointing out entries not needed.
|
30430 |
15-Oct-1997 |
phk |
When the default vnops function is vn_default_error(), there is no reason to implement small functions that just return EOPNOTSUPP for things we don't do.
The removed functions only apply to UFS based filesystems anyway.
|
30354 |
12-Oct-1997 |
phk |
Last major round (unless Bruce thinks of something :-) of malloc changes.
Distribute all but the most fundamental malloc types. This time I also remembered the trick to making things static: Put "static" in front of them.
A couple of finer points by: bde
|
30309 |
11-Oct-1997 |
phk |
Distribute and staticize a lot of the malloc M_* types.
Substantial input from: bde
|
30118 |
05-Oct-1997 |
phk |
Reverse rev 1.56 and rev 1.59. These made NFS too flakey.
|
29653 |
21-Sep-1997 |
dyson |
Change the M_NAMEI allocations to use the zone allocator. This change plus the previous changes to use the zone allocator decrease the usage of malloc by half. The zone allocator will be upgradeable to use per-CPU pools, and has more intelligent usage of SPLs. Additionally, it has reasonable stats-gathering capabilities, while making most calls inline.
|
29363 |
14-Sep-1997 |
peter |
select -> poll flag missing vnode op table entries
|
29293 |
10-Sep-1997 |
phk |
Don't repeat checks done at general level.
|
29291 |
10-Sep-1997 |
phk |
Remove a couple of stubborn NetBSD #if's.
|
29288 |
10-Sep-1997 |
phk |
unifdef -U__NetBSD__ -D__FreeBSD__
|
29202 |
07-Sep-1997 |
bde |
Removed more vestiges of config-time swap configuration.
|
29024 |
02-Sep-1997 |
bde |
Added used #include - don't depend on <sys/mbuf.h> including <sys/malloc.h> (unless we only use the bogusly shared M*WAIT flags).
|
28787 |
26-Aug-1997 |
phk |
Uncut&paste cache_lookup().
This unifies several copies of what were, in theory, identical 50 lines of code.
The filesystems have a new method: vop_cachedlookup, which is the meat of the lookup, and use vfs_cache_lookup() for their vop_lookup method. vfs_cache_lookup() will check the namecache and pass on to the vop_cachedlookup method in case of a miss.
It's still the task of the individual filesystems to populate the namecache with cache_enter().
Filesystems that do not use the namecache will just provide the vop_lookup method as usual.
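The vop_lookup/vop_cachedlookup split can be illustrated with a small userland sketch. The names vfs_cache_lookup_sketch() and cache_enter_sketch(), the fixed-size array standing in for the namecache, and struct fs_ops are all hypothetical; they only model the control flow described above: the generic routine answers hits from the cache and calls the filesystem's cachedlookup method on a miss, and the filesystem populates the cache itself.

```c
#include <stddef.h>
#include <string.h>

struct vnode;

/* Hypothetical per-filesystem operations table. */
struct fs_ops {
    /* The "meat" of the lookup, called only on a namecache miss. */
    int (*cachedlookup)(struct vnode *dvp, const char *name,
                        struct vnode **vpp);
};

struct vnode {
    const struct fs_ops *ops;
};

/* Toy namecache: a small array of (directory, name) -> vnode entries. */
#define NCACHE 16
static struct ncentry {
    struct vnode *dvp;
    char name[32];
    struct vnode *vp;
} ncache[NCACHE];
static int ncount;

/* Stand-in for cache_enter(): the filesystem populates the cache itself. */
static void
cache_enter_sketch(struct vnode *dvp, const char *name, struct vnode *vp)
{
    if (ncount < NCACHE && strlen(name) < sizeof(ncache[0].name)) {
        ncache[ncount].dvp = dvp;
        strcpy(ncache[ncount].name, name);
        ncache[ncount].vp = vp;
        ncount++;
    }
}

/* Shared lookup: consult the namecache, fall through to the fs on a miss. */
static int
vfs_cache_lookup_sketch(struct vnode *dvp, const char *name,
                        struct vnode **vpp)
{
    for (int i = 0; i < ncount; i++) {
        if (ncache[i].dvp == dvp && strcmp(ncache[i].name, name) == 0) {
            *vpp = ncache[i].vp;
            return 0;
        }
    }
    return dvp->ops->cachedlookup(dvp, name, vpp);
}
```

A filesystem that uses this scheme only needs to supply cachedlookup and call cache_enter_sketch() on success; the hit path is written once.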
|
28270 |
16-Aug-1997 |
wollman |
Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
|
27845 |
02-Aug-1997 |
bde |
Removed unused #includes.
|
27609 |
22-Jul-1997 |
dfr |
Correct some dumb mistakes in the WebNFS stuff.
Submitted by: bde
|
27446 |
16-Jul-1997 |
dfr |
Merge WebNFS changes from NetBSD.
Obtained from: NetBSD
|
26995 |
27-Jun-1997 |
wpaul |
Fix a condition where nfs_statfs() can precipitate a panic. There is code that says this:
    nfsm_request(vp, NFSPROC_FSSTAT, p, cred);
    if (v3)
        nfsm_postop_attr(vp, retattr);
    if (!error)
        nfsm_dissect(sfp, struct nfs_statfs *, NFSX_STATFS(v3));
The problem here is that if error != 0, nfsm_dissect() will not be called, which leaves sfp == NULL. But nfs_statfs() does not bail out at this point: it continues processing until it tries to dereference sfp, which causes a panic. I was able to generate this crash under the following conditions:
1) Set up a machine as an NFS server and NFS client, with amd running (using NIS maps). /usr/local is exported, though any exported fs can be used to trigger the bug. 2) Log in as a normal user, with home directory mounted from a SunOS 4.1.3 NFS server via amd (along with a few other NFS filesystems from the same machine). 3) Su to root and type the following:
    # mount localhost:/usr/local /mnt
    # df
To fix the panic, I changed the code to read:
    if (!error) {
        nfsm_dissect(sfp, struct nfs_statfs *, NFSX_STATFS(v3));
    } else
        goto nfsmout;
This is a bit kludgy in that nfsmout is a label defined by the nfsm_subs.h macros, but these macros are themselves more than a little kludgy. This stops the machine from crashing, but does not fix the overall bug: 'error' somehow becomes 5 (EIO) when a statfs() is performed on the locally mounted NFS filesystem. This seems to only happen the first time the filesystem is accessed: on subsequent accesses, it seems to work fine again.
Now, I know there's no practical use in mounting a local filesystem via NFS, but doing it shouldn't cause the system to melt down.
|
26952 |
25-Jun-1997 |
tegge |
Clear nfs_iodwant[myiod] when the nfsiod process exits due to a signal.
|
26929 |
25-Jun-1997 |
dfr |
Avoid small synchronous writes when an application does lots of random-access short writes within a block (e.g. ld).
|
26928 |
25-Jun-1997 |
dfr |
Make nfs_lookup return a NULLVP on error so that DIAGNOSTIC kernels don't panic.
|
26669 |
16-Jun-1997 |
dyson |
Upgrade NFS to support the new vfs_bio resource/buffer management.
|
26581 |
12-Jun-1997 |
tegge |
Move commonly used code into static functions in order to reduce kernel bloat.
|
26580 |
12-Jun-1997 |
tegge |
Remove unused routines.
|
26469 |
06-Jun-1997 |
dfr |
Fix a problem caused by removing large numbers of files from a directory which could cause a bad size to be given to uiomove, causing a page fault.
|
26420 |
03-Jun-1997 |
dfr |
Various fixes from NetBSD:
Use u_int for rpc procedure numbers. Some fixes to NQNFS. A rare NULL pointer dereference. Ignore NFSMNT_NOCONN for TCP mounts.
Obtained from: NetBSD
|
26418 |
03-Jun-1997 |
dfr |
Implement the async mount option for NFSv3. This makes NFS pretend that all writes sent to the server were synchronous and therefore no commits are needed. This is the same as the vfs.nfs.async variable on the server but allows each client to choose whether to work this way.
Also make the vfs.nfs.async variable do the 'right' thing for NFSv3, i.e. pretend that the write was synchronous.
|
26410 |
03-Jun-1997 |
dfr |
Fix a problem with nfs_flush where, if many B_NEEDCOMMIT buffers are attached to the vnode, some of them could be re-written synchronously (if they overflowed the fixed-size array nfs_flush had for them). The fix involves mallocing an array when there are more buffers than its fixed-size stack array can hold.
Reviewed by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
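The fix described above follows a common C pattern: use a fixed on-stack array when it is large enough and allocate a bigger one otherwise. The sketch below is a userland model, not the nfs_flush code; struct cbuf, COMMIT_STACK_SLOTS, the commit callback and the fallback to the stack array when malloc fails are assumptions made for illustration.

```c
#include <stdlib.h>

#define COMMIT_STACK_SLOTS 16   /* hypothetical on-stack capacity */

struct cbuf {
    int needcommit;             /* stands in for B_NEEDCOMMIT */
};

typedef void (*commit_fn)(struct cbuf **bvec, int n);

/*
 * Gather every buffer that needs a commit into one vector and pass it to a
 * single commit call. Use the on-stack array when it is big enough,
 * otherwise malloc one; if the allocation fails, fall back to the stack
 * array and a smaller batch. Returns the number of buffers passed.
 */
static int
flush_needcommit(struct cbuf *bufs, int nbufs, commit_fn commit)
{
    struct cbuf *stackv[COMMIT_STACK_SLOTS];
    struct cbuf **bvec = stackv;
    int cap = COMMIT_STACK_SLOTS, want = 0, n = 0;

    for (int i = 0; i < nbufs; i++)
        if (bufs[i].needcommit)
            want++;

    if (want > cap) {
        struct cbuf **heapv = malloc((size_t)want * sizeof(*heapv));
        if (heapv != NULL) {
            bvec = heapv;
            cap = want;
        }
    }

    for (int i = 0; i < nbufs && n < cap; i++)
        if (bufs[i].needcommit)
            bvec[n++] = &bufs[i];

    if (n > 0)
        commit(bvec, n);
    if (bvec != stackv)
        free(bvec);
    return n;
}
```

The stack array keeps the common case allocation-free; the heap array is only paid for when the number of pending buffers exceeds it.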
|
26409 |
03-Jun-1997 |
dfr |
Fix some performance problems with the NFS mmap fixes.
|
25952 |
20-May-1997 |
dfr |
Plug a memory leak in nfs_link.
PR: kern/1001
|
25930 |
19-May-1997 |
dfr |
Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully can be tidied up later by a slight redesign.
PR: kern/2573, kern/2754, kern/3046 (possibly) Reviewed by: dyson
|
25877 |
17-May-1997 |
phk |
Remove redundant check for vp == dvp (done in VFS before calling).
|
25804 |
14-May-1997 |
tegge |
Use same syntax as netboot for root and swap mounts. Handle mount options. Ignore T16 (swap server address) and T6 (DNS server).
|
25785 |
13-May-1997 |
dfr |
Check the B_CLUSTER flag when choosing whether to use unstable or filesync writes.
PR: kern/3438 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
25781 |
13-May-1997 |
dfr |
Don't keep addresses in mbuf chains. This should simplify the next round of network changes from Garrett.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
|
25755 |
12-May-1997 |
tegge |
Use the old nfs arguments in the nfs_diskless structure, to be compatible with boot proms made from the 2.2 source. Convert the nfs arguments when copying to the new diskless structure. Copy the gateway field in the diskless structure.
|
25723 |
11-May-1997 |
tegge |
Bring in some kernel bootp support. This removes the need for netboot to fill in the nfs_diskless structure, at the cost of some kernel bloat. The advantage is that this code works on a wider range of network adapters than netboot. Several new kernel options are documented in LINT. Obtained from: parts of the code comes from NetBSD.
|
25664 |
10-May-1997 |
dfr |
Implement a separate control for write gathering on NFSv3. This is turned off for NFSv3 by default since write gathering seems to reduce performance for NFSv3 by up to 60%.
Add sysctl knobs to control both variables.
|
25663 |
10-May-1997 |
dfr |
Fix a nasty hang connected with write gathering. Also add debug print statements to bits of the server which helped me find the hang.
|
25611 |
09-May-1997 |
dfr |
Prevent a mapped root which appears on the server as e.g. nobody from accessing files which it shouldn't be able to. This required a better approximation of VOP_ACCESS for NFSv2 (NFSv3 already has an ACCESS rpc which is a better solution) and adding a call to VOP_ACCESS from VOP_LOOKUP.
PR: kern/876, kern/2635 Submitted by: David Malone <dwmalone@maths.tcd.ie> (for kern/2635)
|
25610 |
09-May-1997 |
dfr |
Fix memory leak caused by the fact that the directory offset cookies and the sillyrename information are stored in the same place.
|
25459 |
04-May-1997 |
phk |
Now I can even execute "df" on my diskless :-)
|
25453 |
04-May-1997 |
phk |
1. Add a {pointer, v_id} pair to the vnode to store the reference to the ".." vnode. This is cheaper storagewise than keeping it in the namecache, and it makes more sense since it's a 1:1 mapping.
2. Also handle the case of "." more intelligently rather than stuff the namecache with pointless entries.
3. Add two lists to the vnode and hang namecache entries which go from or to this vnode. When cleaning a vnode, delete all namecache entries it invalidates.
4. Never reuse namecache entries; malloc new ones when we need them, free old ones when they die. There is no longer a hard limit on how many we can have.
5. Remove the upper limit on namelength of namecache entries.
6. Make a global list for negative namecache entries, limit their number to a sysctl'able (debug.ncnegfactor) fraction of the total namecache. Currently the default fraction is 1/16th. (Suggestions for better default wanted!)
7. Assign v_id correctly in the face of 32bit rollover.
8. Remove the LRU list for namecache entries, not needed. Remove the #ifdef NCH_STATISTICS stuff, it's not needed either.
9. Use the vnode freelist as a true LRU list, also for namecache accesses.
10. Reuse vnodes more aggressively but also more selectively; if we can't reuse, malloc a new one. There is no longer a hard limit on their number; they grow to the point where we don't reuse potentially usable vnodes. A vnode will not get recycled if it still has pages in core or if it is the source of namecache entries (yes, this does indeed work :-) "." and ".." are not namecache entries any longer...)
11. Do not overload the v_id field in namecache entries with whiteout information, use a char sized flags field instead, so we can get rid of the vpid and v_id fields from the namecache struct. Since we're linked to the vnodes and purged when they're cleaned, we don't have to check the v_id any more.
12. NFS knew about the limitation on name length in the namecache, it shouldn't and doesn't now.
Bugs: The namecache statistics no longer include the hits for "." and "..".
Performance impact: Generally in the +/- 0.5% for "normal" workstations, but I hope this will allow the system to be selftuning over a bigger range of "special" applications. The case where RAM is available but unused for cache because we don't have any vnodes should be gone.
Future work: Straighten out the namecache statistics.
"desiredvnodes" is still used to (bogusly ?) size hash tables in the filesystems.
I have still to find a way to safely free unused vnodes back so their number can shrink when not needed.
There are a few uses of the v_id field left in the filesystems, scheduled for demolition at a later time.
Maybe a one slot cache for unused namecache entries should be implemented to decrease the malloc/free frequency.
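Item 6 above caps negative entries at a fraction (1/debug.ncnegfactor, default 1/16th) of the whole namecache. The sketch below models only that accounting with plain counters; the function names are hypothetical, and a real implementation would remove the oldest entry from the global negative list rather than just decrement a count.

```c
/* Hypothetical counters modelling the negative-entry limit. */
static unsigned long numcache;          /* all namecache entries */
static unsigned long numneg;            /* negative entries only */
static unsigned long ncnegfactor = 16;  /* debug.ncnegfactor: at most 1/16th */

static void
enter_positive(void)
{
    numcache++;
}

static void
enter_negative(void)
{
    numcache++;
    numneg++;
    /* Evict (oldest) negative entries while they exceed the allowed share. */
    while (numneg * ncnegfactor > numcache) {
        numneg--;
        numcache--;
    }
}
```

With 100 positive entries and a factor of 16, at most 6 negative entries survive, since 7 * 16 = 112 exceeds the resulting total of 107.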
|
25416 |
03-May-1997 |
phk |
Make nfs roots (diskless) functional again. It may still not be correct, but it is functional.
|
25307 |
30-Apr-1997 |
dfr |
Allow NULL rpcs on non-privileged ports at all times to work around broken clients.
PR: kern/3298 Submitted by: Tor Egge <Tor.Egge@idi.ntnu.no>
|
25201 |
27-Apr-1997 |
wollman |
The long-awaited mega-massive-network-code- cleanup. Part I.
This commit includes the following changes: 1) Old-style (pr_usrreq()) protocols are no longer supported, the compatibility glue for them is deleted, and the kernel will panic on boot if any are compiled in.
2) Certain protocol entry points are modified to take a process structure, so that they can easily tell whether or not it is possible to sleep, and also to access credentials.
3) SS_PRIV is no more, and with it goes the SO_PRIVSTATE setsockopt() call. Protocols should use the process pointer they are now passed.
4) The PF_LOCAL and PF_ROUTE families have been updated to use the new style, as has the `raw' skeleton family.
5) PF_LOCAL sockets now obey the process's umask when creating a socket in the filesystem.
As a result, LINT is now broken. I'm hoping that some enterprising hacker with a bit more time will either make the broken bits work (should be easy for netipx) or dike them out.
|
25089 |
22-Apr-1997 |
dfr |
Fix broken usage of nm_readdirsize and increase the socket buffers for UDP to prevent possible socket overflows.
2.2 candidate.
PR: kern/3304 Reviewed by: Thomas David Rivers <ponds!rivers@dg-rtp.dg.com>
|
25023 |
19-Apr-1997 |
dfr |
Fix a bug where a program which appended many small records to a file could wind up writing zeros instead of real data when the file is on an NFSv2 mounted directory.
While tracking this bug down, I noticed that nfs_asyncio was waking *all* the iods when a block was written instead of just one per block. Fixing this gives a 25% performance improvement for writes on v2 (less for v3).
Both are 2.2 candidates.
PR: kern/2774
|
25003 |
18-Apr-1997 |
dfr |
Don't allow partial buffers to be cluster-committed. Zero the b_dirty{off,end} after cluster-committing a group of buffers.
With these fixes, I was able to complete a 'make world' with remote src and obj directories.
|
24626 |
04-Apr-1997 |
dfr |
Fix various bugs in the locking protocol, allowing proper shared locks to be used. This should fix the lock panics that people are seeing.
|
24577 |
03-Apr-1997 |
dfr |
The code which recovered from a modified directory situation did not check for eof when re-caching the directory. This could cause it to loop forever if a directory was truncated.
|
24381 |
29-Mar-1997 |
bde |
Removed #include of <ufs/ufs/dir.h>. Nfs no longer depends on any ufs features, and the one thing that it depended on (DIRBLKSIZ) now has conflicting spelling.
|
24378 |
29-Mar-1997 |
bde |
Define our own version of DIRBLKSIZ instead of (ab)using ufs's value. Use the same value of 512 (ufs actually uses DEV_BSIZE). There are too many versions of DIRBLKSIZ, one for ufs, one for ext2fs, one for nfs, one for ibcs2, one for linux, one for applications, ... I think nfs's DIRBLKSIZ needs to be a divisor of the directory blocks sizes of all supported file systems. There is also NFS_DIRBLKSIZ, which is different from nfs's DIRBLKSIZ but is sometimes confused with it in comments.
Removed a bogus #ifdef KERNEL that hid the tunable constants for nfs. This came in undocumented with the Lite2 merge although it isn't in Lite2. It required more-bogus #define KERNEL's in fstat and pstat to make the constants visible.
Restored a spelling fix from rev.1.17.
Removed duplicate #defines of all the NFS mount option flags.
|
24330 |
27-Mar-1997 |
guido |
Add code that will reject NFS requests in the kernel from nonprivileged ports. This option will be automatically set/cleared when mount is run without/with the -n option. Reviewed by: Doug Rabson
|
24204 |
24-Mar-1997 |
bde |
Don't include <sys/ioctl.h> in the kernel. Stage 2: include <sys/sockio.h> instead of <sys/ioctl.h> in network files.
|
24101 |
22-Mar-1997 |
bde |
Fixed some invalid (non-atomic) accesses to `time', mostly ones of the form `tv = time'. Use a new function gettime(). The current version just forces atomicity without fixing precision or efficiency bugs. Simplified some related valid accesses by using the central function.
|
23570 |
09-Mar-1997 |
bde |
YAMInTheWrongDirectionF22 (part of rev.1.28.2.3: set B_CLUSTEROK for commits).
|
23218 |
28-Feb-1997 |
bde |
Fixed a panic in nfs_writevp(). Lite2 provided a fix for a silly missing-parentheses bug, but this exposed a misplaced vfs_busy_pages(). This bug cost a factor of 2.5-3 in nfsv3 write performance! It should be fixed in 2.2.
Removed some debugging code that gets triggered often in normal operation. There are still many backwards diagnostics (#define DIAGNOSTIC gives no diagnostics).
Submitted by: vfs_busy_pages() fix by dfr
|
22975 |
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
22871 |
18-Feb-1997 |
bde |
Changed `#ifdef COMPAT_PRELITE2' to `#ifndef NO_COMPAT_PRELITE2' so that old nfs mount calls are supported by default.
|
22521 |
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS. Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
21673 |
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
21124 |
31-Dec-1996 |
wpaul |
Fix (properly, I hope) 'panic: sillyrename dir' crash that can happen if you do:
    % cd /nfsdir
    % mkdir -p foo/foo
    % mv foo/foo .
nfs_sillyrename() self-destructs if you try to sillyrename a directory, however nfs_rename() can be coerced into doing just that by the above sequence of commands. To avoid this, nfs_rename() now checks that v_type of the 'destination' vnode != VDIR before attempting the sillyrename. The server correctly handles this particular situation by returning ENOTEMPTY on the rename() attempt.
I asked if this was the correct fix for this on -hackers but nobody ever answered.
This is a 2.2 candidate.
|
20407 |
13-Dec-1996 |
wollman |
Convert the interface address and IP interface address structures to TAILQs. Fix places which referenced these for no good reason that I can see (the references remain, but were fixed to compile again; they are still questionable).
|
19449 |
06-Nov-1996 |
dfr |
Improve the queuing algorithms used by NFS' asynchronous i/o. The existing mechanism uses a global queue for some buffers and the vp->b_dirtyblkhd queue for others. This turns sequential writes into randomly ordered writes to the server, affecting both read and write performance. The existing mechanism also copes badly with hung servers, tending to block accesses to other servers when all the iods are waiting for a hung server.
The new mechanism uses a queue for each mount point. All asynchronous i/o goes through this queue which preserves the ordering of requests. A simple mechanism ensures that the iods are shared out fairly between active mount points. This removes the sysctl variable vfs.nfs.dwrite since the new queueing mechanism removes the old delayed write code completely.
This should go into the 2.2 branch.
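A per-mount FIFO queue with a fair hand-out of iods can be sketched as follows. This is an illustration only. The message does not say how fairness is achieved, so the round-robin scan in next_mount() is an assumption. The fixed-size ring buffer and all names (mount_queue, mq_push, mq_pop, next_mount, QCAP) are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define QCAP 8  /* hypothetical per-mount queue capacity */

/* One FIFO of pending async I/O requests per mount point. */
struct mount_queue {
	int    req[QCAP];   /* request ids, oldest at head */
	size_t head, len;
};

static bool
mq_push(struct mount_queue *q, int id)
{
	if (q->len == QCAP)
		return false;                   /* queue full */
	q->req[(q->head + q->len) % QCAP] = id;
	q->len++;
	return true;
}

/* Requests leave in the order they arrived, preserving write ordering. */
static bool
mq_pop(struct mount_queue *q, int *id)
{
	if (q->len == 0)
		return false;
	*id = q->req[q->head];
	q->head = (q->head + 1) % QCAP;
	q->len--;
	return true;
}

/*
 * An idle iod picks the next mount with pending work, scanning round-robin
 * from the mount after the one it served last, so a single mount's backlog
 * cannot claim every idle iod. Returns -1 if no mount has work.
 */
static int
next_mount(const struct mount_queue *mq, int nmounts, int last)
{
	for (int i = 1; i <= nmounts; i++) {
		int m = (last + i) % nmounts;
		if (mq[m].len > 0)
			return m;
	}
	return -1;
}
```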
|
19070 |
21-Oct-1996 |
dfr |
If a large (>4096 bytes) directory was modified, the old directory contents are discarded, including the cached seek cookies. Unfortunately, if the directory was larger than NFS_DIRBLKSIZ, then this confused nfs_readdirrpc(), making it appear as if the directory was truncated.
Reviewed by: Karl Denninger <karl@Mcs.Net>
|
19058 |
20-Oct-1996 |
phk |
Add four sysctl variables that joerg wanted.
|
18888 |
12-Oct-1996 |
bde |
Staticized `nfs_dwrite'.
|
18866 |
11-Oct-1996 |
dfr |
This fixes a problem with the nfs socket handling code which happens if a single process is performing a large number of requests (in this case writing a large file). The writing process could monopolise the receive lock and prevent any other processes from receiving their replies.
It also adds a new sysctl variable 'vfs.nfs.dwrite' which controls the behaviour which originally pointed out the problem. When a process writes to a file over NFS, it usually arranges for another process (the 'iod') to perform the request. If no iods are available, then it turns the write into a 'delayed write' which is later picked up by the next iod to do a write request for that file. This can cause that particular iod to do a disproportionate number of requests from a single process which can harm performance on some NFS servers. The alternative is to perform the write synchronously in the context of the original writing process if no iod is available for asynchronous writing.
The 'delayed write' behaviour is selected when vfs.nfs.dwrite=1 and the non-delayed behaviour is selected when vfs.nfs.dwrite=0. The default is vfs.nfs.dwrite=1; if many people tell me that performance is better if vfs.nfs.dwrite=0 then I will change the default.
Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
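The three write paths described above, and how vfs.nfs.dwrite selects between the last two, can be sketched as a small decision function. This is a simplified illustration, not the kernel code. The enum and the function name are hypothetical. Note that the per-mount queueing change (r19449, above) later removed this sysctl.

```c
/* The three ways a write could be issued, per the description above. */
enum write_path {
	WP_ASYNC_IOD,   /* hand the request to an idle iod */
	WP_DELAYED,     /* vfs.nfs.dwrite=1: delayed write, picked up by the next iod */
	WP_SYNC         /* vfs.nfs.dwrite=0: write synchronously in the caller */
};

static enum write_path
choose_write_path(int idle_iods, int dwrite)
{
	if (idle_iods > 0)
		return WP_ASYNC_IOD;
	return dwrite ? WP_DELAYED : WP_SYNC;
}
```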
|
18397 |
19-Sep-1996 |
nate |
In sys/time.h, struct timespec is defined as:
    /*
     * Structure defined by POSIX.4 to be like a timeval.
     */
    struct timespec {
            time_t  ts_sec;         /* seconds */
            long    ts_nsec;        /* and nanoseconds */
    };
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>
|
17761 |
21-Aug-1996 |
dyson |
Even though this looks like it, this is not a complex code change. The interface into the "VMIO" system has changed to be more consistent and robust. Essentially, it is now no longer necessary to call vn_open to get merged VM/Buffer cache operation, and exceptional conditions such as merged operation of VBLK devices is simpler and more correct.
This code corrects a potentially large set of problems including the problems with ktrace output and loaded systems, file create/deletes, etc.
Most of the changes to NFS are cosmetic and name changes, eliminating a layer of subroutine calls. The direct calls to vput/vrele have been re-instituted for better cross platform compatibility.
Reviewed by: davidg
|
17186 |
16-Jul-1996 |
dfr |
Various fixes from frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca:
1. Clear B_NEEDCOMMIT in nfs_write to make sure that dirty data is correctly sent to the server. If a buffer was dirtied while it was in the B_DELWRI+B_NEEDCOMMIT state, the state of the buffer was left unchanged, and when the buffer was later cleaned, only a commit RPC was made to the server to complete the previous write. Clearing B_NEEDCOMMIT ensures that another write is made to the server.
2. If a server (for whatever reason) returned an answer to a write RPC that implied that fewer bytes than requested were written, bad things would happen.
3. The setattr operation passed on the atime instead of the mtime to the server. The fix is trivial.
4. XIDs always started at 0, but this confused some servers with very long-lasting XID caches (older DEC OSF/1 3.0, so I've been told) if, after a reboot of a BSD client, RPCs arrived with an XID that had already been used by that client. The patch uses the current time in seconds as the starting point for XIDs. It is not perfect, because it requires the root fs to be mounted first; this is because of the check BSD systems do, comparing FS time to system time.
Reviewed by: Bruce Evans, Terry Lambert. Obtained from: frank@fwi.uva.nl (Frank van der Linden) via rick@snowhite.cis.uoguelph.ca
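Item 4 above can be sketched as a generator that seeds itself once from a clock value in seconds and then counts up. The struct and function names are hypothetical. Taking the time as a parameter (rather than calling time() inside) is a choice made here for testability. Skipping XID 0 on wraparound is an extra assumption that the message does not mention.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical XID generator state. */
struct xid_gen {
	bool     seeded;
	uint32_t last;
};

/*
 * Return the next XID. On first use, start from now_sec (e.g. the
 * seconds part of the current time) instead of 0, so XIDs issued after
 * a reboot are unlikely to repeat ones still in a server's cache.
 */
static uint32_t
xid_next(struct xid_gen *g, uint32_t now_sec)
{
	if (!g->seeded) {
		g->last = now_sec;
		g->seeded = true;
	}
	if (++g->last == 0)     /* assumption: never hand out XID 0 */
		g->last++;
	return g->last;
}
```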
|
17096 |
11-Jul-1996 |
wollman |
Modify the kernel to use the new pr_usrreqs interface rather than the old pr_usrreq mechanism which was poorly designed and error-prone. This commit renames pr_usrreq to pr_ousrreq so that old code which depended on it would break in an obvious manner. This commit also implements the new interface for TCP, although the old function is left as an example (#ifdef'ed out). This commit ALSO fixes a longstanding bug in the TCP timer processing (introduced by davidg on 1995/04/12) which caused timer processing on a TCB to always stop after a single timer had expired (because it misinterpreted the return value from tcp_usrreq() to indicate that the TCB had been deleted). Finally, some code related to polling has been deleted from if.c because it is not relevant to -current and doesn't look at all like my current code.
|
16634 |
23-Jun-1996 |
bde |
Don't truncate minor or major numbers in the nfsv3 client.
|
16365 |
14-Jun-1996 |
phk |
Fix for NFS_NOSERVER
Poul mentioned that he thought this was some kind of timing problem, and that started me thinking. After a little poking around, I found that nfs_timer() was completely disabled when NFS_NOSERVER was #defined. But after looking at nfs_timer(), it seemed like it was something required by both the client and server code, and disabling it outright just didn't seem to make any sense. Parts of it relate only to the NFS server side code, so I disabled those, but I re-enabled the rest of the function and made sure that it would be called from nfs_init() (in nfs_subs.c).
With nfs_timer() re-enabled, everything seems to work again. The only other changes I made were to #ifdef away some variable declarations in the NFS_NOSERVER case so that gcc would stop complaining about unused variables.
Reviewed by: phk Submitted by: Bill Paul <wpaul@skynet.ctr.columbia.edu>
|
16312 |
12-Jun-1996 |
dg |
Moved the fsnode MALLOC to before the call to getnewvnode() so that the process won't possibly block before filling in the fsnode pointer (v_data) which might be dereferenced during a sync since the vnode is put on the mnt_vnodelist by getnewvnode.
Pointed out by Matt Day <mday@artisoft.com>
|
16192 |
08-Jun-1996 |
pst |
Clear flags before using an inactive buffer. This is a kludge, but matches the code in bread().
Reviewed by: bde
|
15543 |
02-May-1996 |
phk |
removed: CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei() ptei() kvtopte() ptetov() ispt() ptetoav() &c &c new: NPDEPG
Major macro cleanup.
|
15480 |
30-Apr-1996 |
bde |
#include <sys/filedesc.h> explicitly instead of depending on it being bogusly included by <sys/socketvar.h>.
|
15479 |
30-Apr-1996 |
bde |
Fixed nfs sysctls. They missed out on the fs -> vfs name changes from Lite2. This broke nfsstat.
|
14093 |
13-Feb-1996 |
wollman |
Kill XNS. While we're at it, fix socreate() to take a process argument. (This was supposed to get committed days ago...)
|
13765 |
30-Jan-1996 |
mpp |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
13625 |
25-Jan-1996 |
bde |
Fixed spelling of s_namlen so that this compiles again.
|
13619 |
24-Jan-1996 |
phk |
Use new printf features rather than local kludges.
|
13612 |
24-Jan-1996 |
mpp |
Add a check to prevent a computation from underflowing and causing a panic due to an attempt to allocate a buffer for a terabyte or so of data when an attempt is made to create sparse data (e.g. a holey file) more than 1 block past the end of the file.
Note: some other areas of this code need to be looked at, since they might cause problems when the file size exceeds 2GB, due to storing results in ints when the computations are being done with quad sized variables.
Reviewed by: bde
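The kind of underflow described above can be illustrated with a hypothetical helper. It computes how many bytes of a block lie below EOF. If a block starting past EOF is handled naively, the subtraction eof - blk_off goes negative, and converting that to an unsigned size yields an enormous allocation request. This is not the code that was changed. It only shows the guard pattern, done in 64-bit signed arithmetic as the note about quad-sized values suggests.

```c
#include <stdint.h>

/*
 * Hypothetical helper: bytes of the block starting at blk_off that lie
 * before EOF, capped at the block size. Compare first, then subtract, so
 * a block that starts past EOF (a write that leaves a hole) gives 0
 * instead of a negative value that would wrap when made unsigned.
 */
static uint32_t
bytes_before_eof(int64_t eof, int64_t blk_off, uint32_t bsize)
{
	if (blk_off >= eof)
		return 0;                       /* block entirely past EOF */
	int64_t n = eof - blk_off;
	return n < (int64_t)bsize ? (uint32_t)n : bsize;
}
```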
|
13490 |
19-Jan-1996 |
dyson |
Eliminated many redundant vm_map_lookup operations for vm_mmap. Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do a lot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30 seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
|
13416 |
13-Jan-1996 |
phk |
Add an option NFS_NOSERVER which saves 100K in the install kernel (or any other kernel that uses it). Use with option NFS.
|
13084 |
28-Dec-1995 |
phk |
Don't print swap server as root server. Submitted by: Mattias.Gronlund@sa.erisoft.se (Mattias Gronlund)
|
12970 |
22-Dec-1995 |
phk |
Move fs.nfs.nfsstats sysctl var back to its old OID.
|
12911 |
17-Dec-1995 |
phk |
Staticize.
|
12662 |
07-Dec-1995 |
dg |
Untangled the vm.h include file spaghetti.
|
12588 |
03-Dec-1995 |
bde |
Completed function declarations and/or added prototypes and/or moved prototypes to the right place.
|
12457 |
21-Nov-1995 |
bde |
Completed function declarations, added prototypes and removed redundant declarations.
|
12453 |
21-Nov-1995 |
bde |
Completed function declarations and/or added prototypes.
|
12287 |
14-Nov-1995 |
phk |
Get rid of hostnamelen variable.
|
12274 |
14-Nov-1995 |
bde |
Included <sys/sysproto.h> to get central declarations for syscall args structs and prototypes for syscalls.
Ifdefed duplicated decentralized declarations of args structs. It's convenient to have this visible but they are hard to maintain. Some are already different from the central declarations. 4.4lite2 puts them in comments in the function headers but I wanted to avoid the large changes for that.
|
12158 |
09-Nov-1995 |
bde |
Introduced a type `vop_t' for vnode operation functions and used it 1138 times (:-() in casts and a few more times in declarations. This change is null for the i386.
The type has to be `typedef int vop_t(void *)' and not `typedef int vop_t()' because `gcc -Wstrict-prototypes' warns about the latter. Since vnode op functions are called with args of different (struct pointer) types, neither of these function types is any use for type checking of the arg, so it would be preferable not to use the complete function type, especially since using the complete type requires adding 1138 casts to avoid compiler warnings and another 40+ casts to reverse the function pointer conversions before calling the functions.
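The cast problem described above can be shown in a few lines. Every op is stored in the vector under the common type vop_t *, which needs one cast per entry. Before the op is called, the pointer must be converted back to its real type, because calling through an incompatible function pointer type is undefined in C. The argument struct and function names here are hypothetical, and this is not the kernel's actual VOP dispatch code.

```c
typedef int vop_t(void *);

/* Hypothetical argument struct for one vnode op. */
struct vop_read_args {
	int a_count;
};

static int
demo_read(struct vop_read_args *ap)
{
	return ap->a_count;
}

/* Cast #1: store the op under the common function type. */
static vop_t *demo_vector[] = {
	(vop_t *)demo_read,
};

static int
demo_call_read(struct vop_read_args *ap)
{
	/* Cast #2: convert back to the real type before calling. */
	int (*fn)(struct vop_read_args *) =
	    (int (*)(struct vop_read_args *))demo_vector[0];
	return fn(ap);
}
```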
|
12118 |
06-Nov-1995 |
bde |
Replaced bogus macros for dummy devswitch entries by functions. These functions went away:
enosys (hasn't been used for some time), enxio, enodev, enoioctl (was used only once, actually for a vop)
if_tun.c: Continued cleaning up...
conf.h: Probably fixed the type of d_reset_t. It is hard to tell the correct type because there are no non-dummy device reset functions.
Removed last vestige of ambiguous sleep message strings.
|
11982 |
31-Oct-1995 |
joerg |
Include a prerequisite header (so this is consistent again with the NFSv2 state).
|
11921 |
29-Oct-1995 |
phk |
Second batch of cleanup changes. This time mostly making a lot of things static and some unused variables here and there.
|
11645 |
22-Oct-1995 |
dg |
Fix order problem: unbusy pages before releasing the buffer.
Submitted by: John Dyson <dyson>
|
11644 |
22-Oct-1995 |
dg |
Moved the filesystem read-only check out of the syscalls and into the filesystem layer, as was done in lite-2. Merged in some other cosmetic changes while I was at it. Rewrote most of msdosfs_access() to be more like ufs_access() and to include the FS read-only check.
Obtained from: partially from 4.4BSD-lite2
|
10551 |
04-Sep-1995 |
dyson |
Added VOP_GETPAGES/VOP_PUTPAGES and also the "backwards" block count for VOP_BMAP. Updated affected filesystems...
|
10478 |
30-Aug-1995 |
dfr |
Make nfs diskless work again.
Reviewed by: John Hay <jhay@mikom.csir.co.za>
|
10223 |
24-Aug-1995 |
dg |
Killed redundant declarations of nfsm_rpchead().
|
10222 |
24-Aug-1995 |
dfr |
Some fixes found using gcc -Wall:
nfsm_rpchead() has been called with the wrong number of args and misplaced args since someone added new args in the middle for nfsv3.
Here's another one that would be important on 64-bit systems. VOP_READDIR takes a `u_int **cookies' arg.
Submitted by: Bruce Evans <bde@zeta.org.au>
|
10219 |
24-Aug-1995 |
dfr |
Add support for amd direct maps.
Reviewed by: Thomas Graichen <graichen@sirius.physik.fu-berlin.de>
|
10027 |
11-Aug-1995 |
dg |
Converted mountlist to a CIRCLEQ.
Partially obtained from: 4.4BSD-Lite2
|
9842 |
01-Aug-1995 |
dg |
Removed my special-case hack for VOP_LINK and fixed the problem with the wrong vp's ops vector being used by changing the VOP_LINK's argument order. The special-case hack doesn't go far enough and breaks the generic bypass routine used in some non-leaf filesystems. Pointed out by Kirk McKusick.
|
9759 |
29-Jul-1995 |
bde |
Eliminate sloppy common-style declarations. There should be none left for the LINT configuration.
|
9681 |
24-Jul-1995 |
dfr |
Slightly better fix than previous revision.
Submitted by: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
|
9679 |
24-Jul-1995 |
dfr |
Fix a problem which appeared to truncate a file to the nearest block boundary when it is moved to an NFS filesystem from another filesystem and /bin/mv failed to set the file ownership during the move.
I believe that this bug is present in STABLE but I have not tested it. The fix would be the same in STABLE even though the code has changed quite considerably in CURRENT.
|
9627 |
22-Jul-1995 |
dg |
Correct my cut-'n-paste job from ffs_vfsops.c and fix up the formatting to be similar.
Submitted by: Bruce Evans
|
9604 |
21-Jul-1995 |
dg |
Implemented an nfs_node hash list lock, similar to what was implemented in ffs_vget(), and for the same reason: to prevent a race condition that results in duplicate vnodes/NFSnodes being allocated.
|
9588 |
20-Jul-1995 |
dg |
vnode_pager_alloc() never returns NULL, so don't check for it.
|
9522 |
13-Jul-1995 |
dfr |
I believe that the following fix to nfs_vnops.c should do the trick w.r.t. the problem "when a file is truncated on the server after being written on a client under NFSv3, the client doesn't see the size drop to zero". (As you noted, the problem is that NMODIFIED wasn't being cleared by nfs_close when it flushed the buffers. After checking through the code, the only place where NMODIFIED was used to test for the possibility of dirty blocks was in nfs_setattr(). The two cases are safe to do when there aren't dirty blocks, so I just took out the tests. Unfortunately, testing for v_dirtyblkhd.lh_first being non-null is not sufficient, since there are times when the code moves blocks to the clean list and then back to the dirty list.)
Submitted by: rick@snowhite.cis.uoguelph.ca
|
9507 |
13-Jul-1995 |
dg |
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages, haspage, and sync operations are supported. The haspage interface now provides information about clusterability. All pager routines now take struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant confusion caused by pagers being both a data structure ("allocate a pager") and a collection of routines. The idea of a pager structure has essentially been eliminated. Objects now have types, and this type is used to index the appropriate pager. In most cases, items in the pager structure were duplicated in the object data structure and thus were unnecessary. In the few cases that remained, a un_pager structure union was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now be removed. For instance, vm_object_enter(), vm_object_lookup(), vm_object_remove(), and the associated object hash list were some of the things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the SMP locking primitives used in the VM system aren't likely the mechanism that we'll be adopting. Even if it were, the locking that was in the code was very inadequate and would have to be mostly re-done anyway. The locking in a uni-processor kernel was a no-op but went a long way toward making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel thread support have been fixed to reflect the reality that we are really dealing with processes, not threads. The VM system didn't have complete thread support, so the comments and mis-named routines were just wrong. We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the pager_alloc routines. Most of the pager_allocs have been rewritten and are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and now tries harder to output an even number of pages before and after the requested page. This is sort of the reverse of the ideal pagein algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out. The fact that the vm_object data structure essentially had this backwards really confused things. The use of "shadow" and "backing object" throughout the code is now internally consistent and correct in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused 0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition of objects to the "swap" type. The previous checks throughout the code for swp->pg_data != NULL were really ugly. This change also provides the rudiments for future backing of "anonymous" memory by something other than the swap pager (via the vnode pager, for example), and it allows the decision about which of these pagers to use to be made dynamically (although it will need some additional decision code to do this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy object" code has been removed. MAP_COPY was undocumented and nonstandard. It was furthermore broken in several ways which caused its behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will continue to work correctly, but via the slightly different semantics of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. Its marginal usefulness in a threads design can be worked around in other ways. Both #12 and #13 were done to simplify the code and improve readability and maintainability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering information provided by the new haspage pager interface. This will substantially reduce the overhead by eliminating a large number of VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be improved to provide both a "behind" and "ahead" indication of contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage(). It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps via a much more general mechanism that could also be used for disk striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The fact that it makes calls into the swap pager and knows too much about how the swap pager operates really bothers me. It also doesn't allow for collapsing of non-swap pager objects ("unnamed" objects backed by other pagers).
|
9456 |
09-Jul-1995 |
dg |
Moved call to VOP_GETATTR() out of vnode_pager_alloc() and into the places that call vnode_pager_alloc() so that a failure return can be dealt with. This fixes a panic seen on NFS clients when a file being opened is deleted on the server before the open completes.
|
9428 |
07-Jul-1995 |
dfr |
Use a consistent blocksize for sizing bufs to avoid panicking the bio system.
|
9365 |
28-Jun-1995 |
dfr |
Use the correct cred for nfs_commit operations.
|
9356 |
28-Jun-1995 |
dg |
1) Converted v_vmdata to v_object. 2) Removed unnecessary vm_object_lookup()/pager_cache(object, TRUE) pairs after vnode_pager_alloc() calls - the object is already guaranteed to be persistent. 3) Removed some gratuitous casts.
|
9354 |
28-Jun-1995 |
dg |
Fixed VOP_LINK argument order botch.
|
9336 |
27-Jun-1995 |
dfr |
Changes to support version 3 of the NFS protocol. The version 2 support has been tested (client+server) against FreeBSD-2.0, IRIX 5.3 and FreeBSD-current (using a loopback mount). The version 2 support is stable AFAIK. The version 3 support has been tested with a loopback mount and minimally against an IRIX 5.3 server. It needs more testing and may have problems. I have patched amd to support the new variable length filehandles although it will still only use version 2 of the protocol.
Before booting a kernel with these changes, nfs clients will need to at least build and install /usr/sbin/mount_nfs. Servers will need to build and install /usr/sbin/mountd.
NFS diskless support is untested.
Obtained from: Rick Macklem <rick@snowhite.cis.uoguelph.ca>
|
9221 |
14-Jun-1995 |
joerg |
The duplicate information returned in fa_type and fa_mode is an ambiguity in the NFS version 2 protocol.
VREG should be taken literally as a regular file. If a server intends to return some type information differently in the upper bits of the mode field (e.g. for sockets, or FIFOs), NFSv2 mandates fa_type to be VNON. Anyway, we leave the examination of the mode bits even in the VREG case to avoid breakage for bogus servers, but we make sure that there are actually type bits set in the upper part of fa_mode (and failing that, trust the va_type field).
NFSv3 cleared the issue, and requires fa_mode to not contain any type information (while also introducing sockets and FIFOs for fa_type).
The fix has been tested against a variety of NFS servers. It fixes problems with the ``Tropic'' NFS server for Windows, while apparently not breaking anything.
Pointed-out by: scott@zorch.sf-bay.org (Scott Hazen Mueller)
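The NFSv2 type rule described above can be sketched as a small function. Use the mode's type bits when fa_type is VNON, or when it is VREG but the mode actually carries type bits. Otherwise trust fa_type. The mode constants below are the traditional Unix S_IF* values, defined locally so the sketch does not depend on any platform's <sys/stat.h>. The enum and function names are hypothetical, and this is not the kernel's attribute-cache code.

```c
/* Type bits of the mode word: traditional Unix S_IF* values. */
#define NFSV2_IFMT   0170000u
#define NFSV2_IFIFO  0010000u
#define NFSV2_IFCHR  0020000u
#define NFSV2_IFDIR  0040000u
#define NFSV2_IFBLK  0060000u
#define NFSV2_IFREG  0100000u
#define NFSV2_IFLNK  0120000u
#define NFSV2_IFSOCK 0140000u

enum vtype { VNON, VREG, VDIR, VBLK, VCHR, VLNK, VSOCK, VFIFO };

static enum vtype
mode_to_vtype(unsigned mode)
{
	switch (mode & NFSV2_IFMT) {
	case NFSV2_IFREG:  return VREG;
	case NFSV2_IFDIR:  return VDIR;
	case NFSV2_IFBLK:  return VBLK;
	case NFSV2_IFCHR:  return VCHR;
	case NFSV2_IFLNK:  return VLNK;
	case NFSV2_IFSOCK: return VSOCK;
	case NFSV2_IFIFO:  return VFIFO;
	default:           return VNON;   /* no type bits set */
	}
}

/* Resolve the file type from an NFSv2 fa_type/fa_mode pair. */
static enum vtype
nfsv2_resolve_type(enum vtype fa_type, unsigned fa_mode)
{
	if (fa_type == VNON ||
	    (fa_type == VREG && (fa_mode & NFSV2_IFMT) != 0))
		return mode_to_vtype(fa_mode);
	return fa_type;                   /* no type bits: trust fa_type */
}
```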
|
9202 |
11-Jun-1995 |
rgrimes |
Merge RELENG_2_0_5 into HEAD
|
8876 |
30-May-1995 |
rgrimes |
Remove trailing whitespace.
|
8832 |
29-May-1995 |
dg |
Fixed some serious bugs that resulted in object reference counts not being handled correctly. This would manifest itself as "object deallocated too many times" panics and perhaps other strange inconsistencies on NFS servers.
Reviewed by: me, of course Submitted by: John Dyson
|
8692 |
21-May-1995 |
dg |
Changes to fix the following bugs:
1) Files weren't properly synced on filesystems other than UFS. In some cases, this led to lost data. Most likely would be noticed on NFS. The fix is to make the VM page sync/object_clean general rather than in each filesystem. 2) Mixing regular and mmaped file I/O on NFS was very broken. It caused chunks of files to end up as zeroes rather than the intended contents. The fix was to fix several race conditions and to kludge up the "b_dirtyoff" and "b_dirtyend" that NFS relies upon - paying attention to page modifications that occurred via the mmapping.
Reviewed by: David Greenman Submitted by: John Dyson
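The idea in item 2 can be illustrated with a hypothetical sketch. NFS writes back only the byte range [b_dirtyoff, b_dirtyend) of a buffer. Pages modified through an mmap bypass the write path that maintains that range, so the range must be widened to cover them or their data is never sent. The page-dirty bitmap, the 4096-byte page size, and the struct are assumptions for illustration. This is not the actual change.

```c
#define PAGE_SZ 4096   /* assumed page size */

/* Hypothetical, simplified buffer header. */
struct buf_sketch {
	int      b_bcount;     /* bytes in the buffer */
	int      b_dirtyoff;   /* dirty range [off, end); empty if end <= off */
	int      b_dirtyend;
	unsigned page_dirty;   /* bit i set: page i modified via mmap */
};

/* Widen the dirty byte range to cover every page dirtied through mmap. */
static void
merge_mmap_dirty(struct buf_sketch *bp)
{
	for (int i = 0; i * PAGE_SZ < bp->b_bcount; i++) {
		if (!(bp->page_dirty & (1u << i)))
			continue;
		int off = i * PAGE_SZ;
		int end = off + PAGE_SZ < bp->b_bcount ? off + PAGE_SZ : bp->b_bcount;
		if (bp->b_dirtyend <= bp->b_dirtyoff) {   /* range was empty */
			bp->b_dirtyoff = off;
			bp->b_dirtyend = end;
		} else {
			if (off < bp->b_dirtyoff)
				bp->b_dirtyoff = off;
			if (end > bp->b_dirtyend)
				bp->b_dirtyend = end;
		}
	}
}
```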
|
8504 |
14-May-1995 |
dg |
Changed swap partition handling/allocation so that it doesn't require specific partitions be mentioned in the kernel config file ("swap on foo" is now obsolete).
From Poul-Henning:
The visible effect is this:
As default, unless options "NSWAPDEV=23" is in your config, you will have four swap-devices. You can swapon(2) any block device you feel like, it doesn't have to be in the kernel config.
There is a performance/resource win available by getting the NSWAPDEV right (but only if you have just one swap-device ??), but using that as default would be too restrictive.
The invisible effect is that:
Swap-handling disappears from the $arch part of the kernel. It gets a lot simpler (-145 lines) and cleaner.
Reviewed by: John Dyson, David Greenman Submitted by: Poul-Henning Kamp, with minor changes by me.
|
7969 |
21-Apr-1995 |
dyson |
Slight re-ordering of the creation of a vmio object to fix a condition that can cause NFS I/O failures.
|
7871 |
16-Apr-1995 |
dg |
Various fixes from John Dyson:
1) Rewrote screwy code that uses an incore buffer without making it busy. 2) Use B_CACHE instead of B_DONE in cases where it is appropriate. 3) Minor code optimization.
This *might* fix kern/345 submitted by Heikki Suonsivu.
|
7275 |
23-Mar-1995 |
dg |
Deleted bogus DIAGNOSTIC "nfs_fsync: dirty" message. This can and does happen normally when there is heavy write activity to a file since the vnode isn't locked (NFS plays fast and loose with vnode locks). This change "fixes" PR#267.
|
7095 |
16-Mar-1995 |
wollman |
Add four more filesystem flags:
- VFCF_NETWORK (this FS goes over the net)
- VFCF_READONLY (read-write mounts do not make any sense)
- VFCF_SYNTHETIC (data in this FS is not real)
- VFCF_LOOPBACK (this FS aliases something else)
cd9660 is readonly; nullfs, umapfs, and union are loopback; NFS is network; procfs, kernfs, and fdesc are synthetic.
|
7090 |
16-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) and most of the warnings from `gcc -Wnested-externs'. Fix all the bugs found. There were no serious ones.
|
6875 |
04-Mar-1995 |
dg |
Removed obsolete vtrace() remnants.
|
6420 |
15-Feb-1995 |
phk |
YF fix.
|
6417 |
15-Feb-1995 |
dg |
Fixed two more bugs related to the merged cache changes.
Submitted by: John Dyson
|
6361 |
14-Feb-1995 |
phk |
YF fix:
+int nfsrv_vput __P(( struct vnode * ));
+int nfsrv_vrele __P(( struct vnode * ));
+int nfsrv_vmio __P(( struct vnode * ));
|
6210 |
06-Feb-1995 |
dg |
Changed order of release of vnode/object to fix a problem where the vnode is freed with an old object still attached (subsequently causing a panic). Fixes NFS server panic "object/pager mismatch".
Submitted by: John Dyson
|
6151 |
03-Feb-1995 |
dg |
Fixed bmap run-length brokeness. Use bmap run-length extension when doing clustered paging.
Submitted by: John Dyson
|
6148 |
03-Feb-1995 |
dg |
Removed a pile of vfs_unbusy_pages()...both unnecessary and wrong - resulted in serious system instability. Changed a B_INVAL to a B_NOCACHE so that buffer data is properly disposed of.
Submitted by: John Dyson, Rick Macklem, and ohki@gssm.otsuka.tsukuba.ac.jp
|
5471 |
10-Jan-1995 |
dg |
Added two missing brelse() calls.
Submitted by: rick@snowhite.cis.uoguelph.ca
|
5455 |
09-Jan-1995 |
dg |
These changes embody the support of the fully coherent merged VM buffer cache, much higher filesystem I/O performance, and much better paging performance. It represents the culmination of over 6 months of R&D.
The majority of the merged VM/cache work is by John Dyson.
The following highlights the most significant changes. Additionally, there are (mostly minor) changes to the various filesystem modules (nfs, msdosfs, etc) to support the new VM/buffer scheme.
vfs_bio.c: Significant rewrite of most of vfs_bio to support the merged VM buffer cache scheme. The scheme is almost fully compatible with the old filesystem interface. Significant improvement in the number of opportunities for write clustering.
vfs_cluster.c, vfs_subr.c Upgrade and performance enhancements in vfs layer code to support merged VM/buffer cache. Fixup of vfs_cluster to eliminate the bogus pagemove stuff.
vm_object.c: Yet more improvements in the collapse code. Elimination of some windows that can cause list corruption.
vm_pageout.c: Fixed it, it really works better now. Somehow in 2.0, some "enhancements" broke the code. This code has been reworked from the ground-up.
vm_fault.c, vm_page.c, pmap.c, vm_object.c Support for small-block filesystems with merged VM/buffer cache scheme.
pmap.c vm_map.c Dynamic kernel VM size, now we don't have to pre-allocate excessive numbers of kernel PTs.
vm_glue.c Much simpler and more effective swapping code. No more gratuitous swapping.
proc.h Fixed the problem that the p_lock flag was not being cleared on a fork.
swap_pager.c, vnode_pager.c Removal of old vfs_bio cruft to support the past pseudo-coherency. Now the code doesn't need it anymore.
machdep.c Changes to better support the parameter values for the merged VM/buffer cache scheme.
machdep.c, kern_exec.c, vm_glue.c Implemented a separate submap for temporary exec string space and another one to contain process upages. This eliminates all map fragmentation problems that previously existed.
ffs_inode.c, ufs_inode.c, ufs_readwrite.c Changes for merged VM/buffer cache. Add "bypass" support for sneaking in on busy buffers.
Submitted by: John Dyson and David Greenman
|
5353 |
03-Jan-1995 |
jkh |
From Gene Stark: If nd->swap_nblks is zero in nfs_mountroot(), then the system comes up without initializing swapdev_vp to an actual vnode pointer. The swap pager assumes a non-NULL value for swapdev_vp.
The fix is to try initializing local swap if no NFS swap space is specified.
|
5013 |
08-Dec-1994 |
phk |
Would you please correct nfs/nfs_vfsops.c so that the ip address of the root filesystem is printed out correctly? It's line 299 in nfs/nfs_vfsops.c.
Reviewed by: phk Submitted by: Luigi Rizzo luigi@iet.unipi.it
|
4067 |
02-Nov-1994 |
wollman |
Forward-declare a few structures to avoid warning messages.
|
3820 |
23-Oct-1994 |
wollman |
Implement fs.nfs MIB variables.
|
3797 |
22-Oct-1994 |
phk |
This is where the action is. I'm still not sure that swap is 100% OK, but it seems to work.
|
3664 |
17-Oct-1994 |
phk |
This is a bunch of changes from NetBSD. There are a couple of bug-fixes. But mostly it is changes to use the list-maintenance macros instead of doing the pointer-gymnastics by hand.
Obtained from: NetBSD
|
3451 |
09-Oct-1994 |
dg |
Got rid of map.h. It's a leftover from the rmap code, and we use rlists. Changed swapmap into swaplist.
|
3396 |
06-Oct-1994 |
dg |
Use tsleep() rather than sleep so that 'ps' is more informative about the wait.
|
3305 |
02-Oct-1994 |
phk |
Prototyping and general gcc-shutting up. Gcc has one warning now which looks bad, I will get to it eventually, unless somebody beats me to it.
|
2997 |
22-Sep-1994 |
wollman |
Make NFS loadable.
|
2979 |
22-Sep-1994 |
wollman |
More loadable VFS changes:
- Make a number of filesystems work again when they are statically compiled (blush)
- FIFOs are no longer optional; ``options FIFO'' removed from distributed config files.
|
2946 |
21-Sep-1994 |
wollman |
Implemented loadable VFS modules, and made most existing filesystems loadable. (NFS is a notable exception.)
|
2384 |
29-Aug-1994 |
dg |
"bogus" fixes from 1.1.5 to work around some cache coherency problems.
|
2175 |
21-Aug-1994 |
paul |
More idempotency... this is fun :-)
|
2152 |
20-Aug-1994 |
dg |
Implemented filesystem clean bit via:
machdep.c: Changed printf's a little and call vfs_unmountall() if the sync was successful.
cd9660_vfsops.c, ffs_vfsops.c, nfs_vfsops.c, lfs_vfsops.c: Allow dismount of root FS. It is now disallowed at a higher level.
vfs_conf.c: Removed unused rootfs global.
vfs_subr.c: Added new routines vfs_unmountall and vfs_unmountroot. Filesystems are now dismounted if the machine is properly rebooted.
ffs_vfsops.c: Toggle clean bit at the appropriate places. Print warning if an unclean FS is mounted.
ffs_vfsops.c, lfs_vfsops.c: Fix bug in selecting proper flags for VOP_CLOSE().
vfs_syscalls.c: Disallow dismounting root FS via umount syscall.
|
2112 |
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations.
- Add -Wredundant-declarations to Makefile.i386 so they don't come back.
- Delete sloppy COMMON-style declarations of uninitialized data in header files.
- Add a few prototypes.
- Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
2010 |
10-Aug-1994 |
dg |
Initialize lockf pointer. I missed this when I made NFS use the generic advlock mechanism, and not doing so results in random system crashes.
|
1979 |
09-Aug-1994 |
dg |
Removed some padding bytes from the nfsnode struct to make the structure size a power of 2 again. The system complains otherwise, probably because a structure whose size is not a power of 2 wastes space with our malloc scheme.
|
1960 |
08-Aug-1994 |
dg |
Made lockf advisory locking code generic (rather than ufs specific), and use it in NFS. This is required both for diskless support and for POSIX compliance. Note: the support in NFS is only for the local node.
Submitted by: based on work originally done by Yuval Yurom
|
1937 |
08-Aug-1994 |
dg |
Changed B_AGE policy to work correctly in a world with relatively large buffer caches. The old policy generally ended up caching nothing.
|
1858 |
05-Aug-1994 |
dg |
Converted 'vmunix' to 'kernel'.
|
1828 |
04-Aug-1994 |
dg |
Made NFS attribute cache timeouts kernel config file tunable via NFS_MINATTRTIMO and NFS_MAXATTRTIMO.
|
1817 |
02-Aug-1994 |
dg |
Added $Id$
|
1549 |
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
1541 |
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|