#
259065 |
|
07-Dec-2013 |
gjb |
- Copy stable/10 (r259064) to releng/10.0 as part of the 10.0-RELEASE cycle. - Update __FreeBSD_version [1] - Set branch name to -RC1
[1] 10.0-CURRENT __FreeBSD_version value ended at '55', so start releng/10.0 at '100' so the branch is started with a value ending in zero.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation |
#
256281 |
|
10-Oct-2013 |
gjb |
Copy head (r256279) to stable/10 as part of the 10.0-RELEASE cycle.
Approved by: re (implicit) Sponsored by: The FreeBSD Foundation
|
#
243245 |
|
18-Nov-2012 |
trasz |
Add UFS writesuspension mechanism, designed to allow userland processes to modify on-disk metadata for filesystems mounted for write.
Reviewed by: kib, mckusick Sponsored by: FreeBSD Foundation
|
#
234605 |
|
23-Apr-2012 |
trasz |
Remove unused thread argument from vtruncbuf().
Reviewed by: kib
|
#
233438 |
|
24-Mar-2012 |
mckusick |
Add a third flags argument to ffs_syncvnode to avoid a possible conflict with MNT_WAIT flags that passed in its second argument. This will be MFC'ed together with r232351.
Discussed with: kib
|
#
232351 |
|
01-Mar-2012 |
mckusick |
This change avoids a kernel deadlock on "snaplk" when using snapshots on UFS filesystems running with journaled soft updates. This is the first of several bugs that need to be fixed before removing the restriction added in -r230250 to prevent the use of snapshots on filesystems running with journaled soft updates.
The deadlock occurs when holding the snapshot lock (snaplk) and then trying to flush an inode via ffs_update(). We become blocked by another process trying to flush a different inode contained in the same inode block that we need. It holds the inode block for which we are waiting locked. When it tries to write the inode block, it gets blocked waiting for the our snaplk when it calls ffs_copyonwrite() to see if the inode block needs to be copied in our snapshot.
The most obvious place that this deadlock arises is in the ffs_copyonwrite() routine when it updates critical metadata in a snapshot and tries to write it out before proceeding. The fix here is to write the data and indirect block pointer for the snapshot, but to skip the call to ffs_update() to write the snapshot inode. To ensure that we will never have to update a pointer in the inode itself, the ffs_snapshot() routine that creates the snapshot has to ensure that all the direct blocks are allocated as part of the creation of the snapshot.
A less obvious place that this deadlock occurs is when we hold the snaplk because we are deleting a snapshot. In the course of doing the deletion, we need to allocate various soft update dependency structures and allocate some journal space. If we hit a resource limit while doing this we decrease the resources in use by flushing out an existing dirty file to get it to give up the soft dependency resources that it holds. The flush can cause an ffs_update() to be done on the inode for the file that we have selected to flush resulting in the same deadlock as described above when the inode that we have chosen to flush resides in the same inode block as the snapshot inode that we hold. The fix is to defer cleaning up any time that the inode on which we are operating is a snapshot.
Help and review by: Jeff Roberson Tested by: Peter Holm MFC (to 9 only) after: 2 weeks
|
#
225166 |
|
25-Aug-2011 |
mm |
Generalize ffs_pages_remove() into vn_pages_remove().
Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F.
PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
|
#
223325 |
|
20-Jun-2011 |
jeff |
- Fix directory count rollbacks by passing the mode to the journal dep earlier. - Add rollback/forward code for frag and cluster accounting. - Handle the FREEDEP case in softdep_sync_buf(). (submitted by pho)
|
#
223169 |
|
16-Jun-2011 |
mckusick |
Drop the include of <ufs/ffs/ffs_extern.h> from usr.sbin/makefs/ffs/ffs_bswap.c and usr.sbin/makefs/ffs/ffs_subr.c as they have no need of anything in that file. No other programs or libraries include <ufs/ffs/ffs_extern.h> (nor should they as it is totally in-kernel interfaces). For added protection I enclosed the entire contents of <ufs/ffs/ffs_extern.h> in ifdef _KERNEL.
Feedback from: Bruce Evans and Tai-hwa Liang
|
#
223138 |
|
16-Jun-2011 |
avatar |
Fixing compilation bustage by introducing another forward declaration.
|
#
223127 |
|
15-Jun-2011 |
mckusick |
Ensure that filesystem metadata contained within persistent snapshots is always kept consistent.
Suggested by: Jeff Roberson
|
#
223114 |
|
15-Jun-2011 |
mckusick |
With the restructuring of the block reclaimation code, the notification messages for a filesystem being out of space need to be moved so that they do not print out until after a failed cleanup attempt.
Suggested by: Jeff Roberson
|
#
223052 |
|
13-Jun-2011 |
dim |
Use alternative, less messy solution to avoid breakage after r223020: put the snapdata structure between #ifdef _KERNEL guards.
Suggested by: kib
|
#
223020 |
|
12-Jun-2011 |
mckusick |
Update to soft updates journaling to properly track freed blocks that get claimed by snapshots.
Submitted by: Jeff Roberson Tested by: Peter Holm
|
#
222958 |
|
10-Jun-2011 |
jeff |
Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism.
- Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid.
Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
|
#
220374 |
|
05-Apr-2011 |
mckusick |
Be far more persistent in reclaiming blocks and inodes before giving up and declaring a filesystem out of space. Especially necessary when running on a small filesystem. With this improvement, it should be possible to use soft updates on a small root filesystem.
Kudos to: Peter Holm Testing by: Peter Holm MFC: 2 weeks
|
#
219895 |
|
23-Mar-2011 |
mckusick |
Add retry code analogous to the block allocation retry code to avoid running out of inodes.
Reported by: Peter Holm
|
#
207141 |
|
24-Apr-2010 |
jeff |
- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown.
Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
|
#
187790 |
|
27-Jan-2009 |
rwatson |
Following a fair amount of real world experience with ACLs and extended attributes since FreeBSD 5, make the following semantic changes:
- Don't update the inode modification time (mtime) when extended attributes (and hence also ACLs) are added, modified, or removed. - Don't update the inode access tie (atime) when extended attributes (and hence also ACLs) are queried.
This means that rsync (and related tools) won't improperly think that the data in the file has changed when only the ACL has changed.
Note that ffs_reallocblks() has not been changed to not update on an IO_EXT transaction, but currently EAs don't use the cluster write routines so this shouldn't be a problem. If EAs grow support for clustering, then VOP_REALLOCBLKS() will need to grow a flag argument to carry down IO_EXT to UFS.
MFC after: 1 week PR: ports/125739 Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: pluknet <pluknet@gmail.com>, Greg Byshenk <freebsd@byshenk.net> Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>
|
#
183073 |
|
16-Sep-2008 |
kib |
When attempt is made to suspend a filesystem that is already syspended, wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write() resume when not owning the suspension are not well-defined at best.
Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually.
Add the thread flag TDP_IGNSUSP that allows to bypass the suspension point in the vn_start_write. It is intended for use by VFS in the situations where the suspender want to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later.
Reviewed by: tegge In collaboration with: pho MFC after: 1 month
|
#
183070 |
|
16-Sep-2008 |
kib |
When downgrading the read-write mount to read-only, do_unmount() sets MNT_RDONLY flag before the VFS_MOUNT() is called. In ufs_inactive() and ufs_itimes_locked(), UFS verifies whether the fs is read-only by checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag for inode on the fs being remounted rw->ro.
Introduce UFS_RDONLY() struct ufsmount' method that reports the value of the fs_ronly. The later is set to 1 only after the remount is finished.
Reviewed by: tegge In collaboration with: pho MFC after: 1 month
|
#
182365 |
|
28-Aug-2008 |
kib |
Softdep code may need to instantiate vnode when processing dependencies. In particular, it may need this while syncing filesystem being unmounted. Since during unmount MNTK_NOINSMNTQUE flag is set, that could sometimes disallow insertion of the vnode into the vnode mount list, softdep code needs to overwrite the MNTK_NOINSMNTQUE flag.
Create the ffs_vgetf() function that sets the VV_FORCEINSMQ flag for new vnode and use it consistently from the softdep code instead of ffs_vget().
Add the retry logic to the softdep_flushfiles() to flush the vnodes that could be instantiated while flushing softdep dependencies.
Tested by: pho, kris Reviewed by: tegge MFC after: 1 month
|
#
166799 |
|
17-Feb-2007 |
mckusick |
This README file is obsolete. The cited problems were fixed long ago and the code is installed by default so no longer requires action by the administrator to be included.
|
#
166193 |
|
23-Jan-2007 |
kib |
Cylinder group bitmaps and blocks containing inode for a snapshot file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquision.
Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order then the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used.
Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)
|
#
163841 |
|
31-Oct-2006 |
pjd |
Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects.
Sponsored by: home.pl
|
#
156451 |
|
08-Mar-2006 |
tegge |
Use vn_start_secondary_write() and vn_finished_secondary_write() as a replacement for vn_write_suspend_wait() to better account for secondary write processing.
Close race where secondary writes could be started after ffs_sync() returned but before the file system was marked as suspended.
Detect if secondary writes or softdep processing occurred during vnode sync loop in ffs_sync() and retry the loop if needed.
|
#
156203 |
|
02-Mar-2006 |
jeff |
- Move softdep from using a global worklist to per-mount worklists. This has many positive effects including improved smp locking, reducing interdependencies between mounts that can lead to deadlocks, etc. - Add the softdep worklist and various counters to the ufsmnt structure. - Add a mount pointer to the workitem and remove mount pointers from the various structures derived from the workitem as they are now redundant. - Remove the poor-man's semaphore protecting softdep_process_worklist and softdep_flushworklist. Several threads may now process the list simultaneously. - Add softdep_waitidle() to block the thread until all pending dependencies being operated on by other threads have been flushed. - Use softdep_waitidle() in unmount and snapshots to block either operation until the fs is stable. - Remove softdep worklist processing from the syncer and move it into the softdep_flush() thread. This thread processes all softdep mounts once each second and when it is called via the new softdep_speedup() when there is a resource shortage. This removes the softdep hook from the kernel and various hacks in header files to support it.
Reviewed by/Discussed with: tegge, truckman, mckusick Tested by: kris
|
#
144585 |
|
03-Apr-2005 |
jeff |
- Move the contents of softdep_disk_prewrite into ffs_geom_strategy to fix two bugs. - ffs_disk_prewrite was pulling the vp from the buf and checking for COPYONWRITE, when really it wanted the vp from the bufobj that we're writing to, which is the devvp. This lead to us skipping the copy on write to all file data, which significantly broke snapshots for the last few months. - When the SOFTUPDATES option was not included in the kernel config we would also skip the copy on write check, which would effectively disable snapshots. - Remove an invalid mp_fixme().
Debugging tips from: mckusick Reported by: iedowse, others Discussed with: phk
|
#
141631 |
|
10-Feb-2005 |
phk |
Make a some SYSCTL_NODEs and some of FFS's VFS_ methods static.
|
#
141542 |
|
08-Feb-2005 |
phk |
Split the vop_vector for ffs1 and ffs2, this is mostly for the different EXTATTR support.
|
#
141539 |
|
08-Feb-2005 |
phk |
Background writes are entirely an FFS/Softupdates thing.
Give FFS vnodes a specific bufwrite method which contains all the background write stuff and then calls into the default bufwrite() for the rest of the job.
Remove all the background write related stuff from the normal bufwrite.
This drags the softdep_move_dependencies() back into FFS.
Long term, it is worth looking at simply copying the data into allocated memory and issuing the bio directly and not create the "shadow buf" in the first place (just like copy-on-write is done in snapshots for instance). I don't think we really gain anything but complexity from doing this with a buf.
|
#
141525 |
|
08-Feb-2005 |
phk |
(forced commit to record correct commit message)
Split ffs_fsync() into a VOP_FSYNC() component and an internal part called ffs_syncvnode().
Eliminate unnecessary thread argument and XXX'ed curthread passes for same. Reduce softdep_sync_metadata() from a struct vop_fsync_args to just the vnode argument it needs.
Convert internal VOP_FSYNC() calls to use ffs_syncvnode().
|
#
141522 |
|
08-Feb-2005 |
phk |
For snapshots we need all VOP_LOCKs to be exclusive.
The "business class upgrade" was implemented in UFS's VOP_LOCK implementation ufs_lock() which is the wrong layer, so move it to ffs_lock().
Also, as long as we have not abandonned advanced vfs-stacking we should not preclude it from happening: instead of implementing a copy locally, use the VOP_LOCK_APV(&ufs) to correctly arrive at vop_stdlock() at the bottom.
|
#
140701 |
|
24-Jan-2005 |
jeff |
- Change some function parameters so that the ufsmount structure is accessable in places where the ufs lock will be needed.
Sponsored By: Isilon Systems, Inc.
|
#
138290 |
|
01-Dec-2004 |
phk |
Back when VOP_* was introduced, we did not have new-style struct initializations but we did have lofty goals and big ideals.
Adjust to more contemporary circumstances and gain type checking.
Replace the entire vop_t frobbing thing with properly typed structures. The only casualty is that we can not add a new VOP_ method with a loadable module. History has not given us reason to belive this would ever be feasible in the the first place.
Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.
Give coda correct prototypes and function definitions for all vop_()s.
Generate a bit more data from the vnode_if.src file: a struct vop_vector and protype typedefs for all vop methods.
Add a new vop_bypass() and make vop_default be a pointer to another struct vop_vector.
Remove a lot of vfs_init since vop_vector is ready to use from the compiler.
Cast various vop_mumble() to void * with uppercase name, for instance VOP_PANIC, VOP_NULL etc.
Implement VCALL() by making vdesc_offset the offsetof() the relevant function pointer in vop_vector. This is disgusting but since the code is generated by a script comparatively safe. The alternative for nullfs etc. would be much worse.
Fix up all vnode method vectors to remove casts so they become typesafe. (The bulk of this is generated by scripts)
|
#
137035 |
|
29-Oct-2004 |
phk |
Move UFS from DEVFS backing to GEOM backing.
This eliminates a bunch of vnode overhead (approx 1-2 % speed improvement) and gives us more control over the access to the storage device.
Access counts on the underlying device are not correctly tracked and therefore it is possible to read-only mount the same disk device multiple times: syv# mount -p /dev/md0 /var ufs rw 2 2 /dev/ad0 /mnt ufs ro 1 1 /dev/ad0 /mnt2 ufs ro 1 1 /dev/ad0 /mnt3 ufs ro 1 1
Since UFS/FFS is not a synchrousely consistent filesystem (ie: it caches things in RAM) this is not possible with read-write mounts, and the system will correctly reject this.
Details:
Add a geom consumer and a bufobj pointer to ufsmount.
Eliminate the vnode argument from softdep_disk_prewrite(). Pick the vnode out of bp->b_vp for now. Eventually we should find it through bp->b_bufobj->b_private.
In the mountcode, use g_vfs_open() once we have used VOP_ACCESS() to check permissions.
When upgrading and downgrading between r/o and r/w do the right thing with GEOM access counts. Remove all the workarounds for not being able to do this with VOP_OPEN().
If we are the root mount, drop the exclusive access count until we upgrade to r/w. This allows fsck of the root filesystem and the MNT_RELOAD to work correctly.
Set bo_private to the GEOM consumer on the device bufobj.
Change the ffs_ops->strategy function to call g_vfs_strategy()
In ufs_strategy() directly call the strategy on the disk bufobj. Same in rawread.
In ffs_fsync() we will no longer see VCHR device nodes, so remove code which synced the filesystem mounted on it, in case we came there. I'm not sure this code made sense in the first place since we would have taken the specfs route on such a vnode.
Redo the highly bogus readblock() function in the snapshot code to something slightly less bogus: Constructing an uio and using physio was really quite a detour. Instead just fill in a bio and ship it down.
|
#
136969 |
|
26-Oct-2004 |
phk |
The island council met and voted buf_prewrite() home.
Give ffs it's own bufobj->bo_ops vector and create a private strategy routine, (currently misnamed for forwards compatibility), which is just a copy of the generic bufstrategy routine except we call softdep_disk_prewrite() directly instead of through the buf_prewrite() indirection.
Teach UFS about the need for softdep_disk_prewrite() and call the function directly in FFS.
Remove buf_prewrite() from the default bufstrategy() and from the global bio_ops method vector.
|
#
136963 |
|
26-Oct-2004 |
phk |
Degeneralize the per cdev copyonwrite callback. The only possible value is ffs_copyonwrite() and the only place it can be called from is FFS which would never want to call another filesystems copyonwrite method, should one exist, so there is no reason why anything generic should know about this.
|
#
135877 |
|
28-Sep-2004 |
phk |
Remove support for accessing device nodes in UFS/FFS.
Device nodes can still be created and exported with NFS.
|
#
132902 |
|
30-Jul-2004 |
phk |
Put a version element in the VFS filesystem configuration structure and refuse initializing filesystems with a wrong version. This will aid maintenance activites on the 5-stable branch.
s/vfs_mount/vfs_omount/
s/vfs_nmount/vfs_mount/
Name our filesystems mount function consistently.
Eliminate the namiedata argument to both vfs_mount and vfs_omount. It was originally there to save stack space. A few places abused it to get hold of some credentials to pass around. Effectively it is unused.
Reorganize the root filesystem selection code.
|
#
127975 |
|
07-Apr-2004 |
imp |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights.
Approved by: core, rwatson
|
#
111239 |
|
21-Feb-2003 |
mckusick |
This patch fixes a bug on an active filesystem on which a snapshot is being taken from panicing with either "freeing free block" or "freeing free inode". The problem arises when the snapshot code is scanning the filesystem looking for inodes with a reference count of zero (e.g., unlinked but still open) so that it can expunge them from its view. If it encounters a reclaimed vnode and has to restart its scan, then it will panic if it encounters and tries to free an inode that it has already processed. The fix is to check each candidate inode to see if it has already been processed before trying to delete it from the snapshot image.
Sponsored by: DARPA & NAI Labs.
|
#
108315 |
|
27-Dec-2002 |
phk |
Move the allocation of the inode contents into ffs_vfsops.c rather than passing malloc types around.
|
#
108313 |
|
27-Dec-2002 |
phk |
Make ffs_mountfs() static.
Remove the malloctype from the ufs mount structure, instead add a callback to the storage method for freeing inodes: UFS_IFREE().
Add vfs_ifree() method function which frees an inode.
Unvariablelize the malloc type used for allocating inodes.
|
#
101777 |
|
13-Aug-2002 |
phk |
Introduce typedefs for the member functions of struct vfsops and employ these in the main filesystems. This does not change the resulting code but makes the source a little bit more grepable.
Sponsored by: DARPA and NAI Labs.
|
#
100925 |
|
30-Jul-2002 |
phk |
Move ffs_isfreeblock() to ffs_alloc.c and make it static.
Sponsored by: DARPA & NAI Labs.
|
#
100344 |
|
19-Jul-2002 |
mckusick |
Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words).
The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed.
Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form:
if (ioflags & IO_SYNC) flags |= BA_SYNC;
For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE).
I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage.
Sponsored by: DARPA & NAI Labs.
|
#
99206 |
|
01-Jul-2002 |
iedowse |
Add the ffs bits necessary to support unloading of the ufs kernel module. This adds an ffs_uninit() function that calls ufs_uninit() and also calls a new softdep_uninitialize() function. Add a stub for softdep_uninitialize() to cover the non-SOFTUPDATES case.
Reviewed by: mckusick
|
#
98542 |
|
21-Jun-2002 |
mckusick |
This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined.
Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t.
Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used).
Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
|
#
92728 |
|
19-Mar-2002 |
alfred |
Remove __P.
|
#
92462 |
|
16-Mar-2002 |
mckusick |
Add a flags parameter to VFS_VGET to pass through the desired locking flags when acquiring a vnode. The immediate purpose is to allow polling lock requests (LK_NOWAIT) needed by soft updates to avoid deadlock when enlisting other processes to help with the background cleanup. For the future it will allow the use of shared locks for read access to vnodes. This change touches a lot of files as it affects most filesystems within the system. It has been well tested on FFS, loopback, and CD-ROM filesystems. only lightly on the others, so if you find a problem there, please let me (mckusick@mckusick.com) know.
|
#
90098 |
|
01-Feb-2002 |
mckusick |
When taking a snapshot, we must check for active files that have been unlinked (e.g., with a zero link count). We have to expunge all trace of these files from the snapshot so that they are neither reclaimed prematurely by fsck nor saved unnecessarily by dump.
|
#
89637 |
|
22-Jan-2002 |
mckusick |
This patch fixes a long standing complaint with soft updates in which small and/or nearly full filesystems would fail with `file system full' messages when trying to replace a number of existing files (for example during a system installation). When the allocation routines are about to fail with a file system full condition, they make a call to softdep_request_cleanup() which attempts to accelerate the flushing of pending deletion requests in an effort to free up space. In the face of filesystem I/O requests that exceed the available disk transfer capacity, the cleanup request could take an unbounded amount of time. Thus, the softdep_request_cleanup() routine will only try for tickdelay seconds (default 2 seconds) before giving up and returning a filesystem full error. Under typical conditions, the softdep_request_cleanup() routine is able to free up space in under fifty milliseconds.
|
#
83366 |
|
12-Sep-2001 |
julian |
KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process.
Sorry john! (your next MFC will be a doosie!)
Reviewed by: peter@freebsd.org, dillon@freebsd.org
X-MFC after: ha ha ha ha
|
#
77508 |
|
30-May-2001 |
jhb |
Forward declare struct cg to quiet a warning.
Submitted by: bde
|
#
76269 |
|
04-May-2001 |
mckusick |
Refinement to revision 1.16 of ufs/ffs/ffs_snapshot.c to reduce the amount of time that the filesystem must be suspended. The current snapshot is elided as well as the earlier snapshots.
|
#
76132 |
|
29-Apr-2001 |
phk |
VOP_BALLOC was never really a VOP in the first place, so convert it to UFS_BALLOC like the other "between UFS and FFS function interfaces".
|
#
76126 |
|
29-Apr-2001 |
phk |
Remove faint traces of non-existant ffs_bmap().
|
#
74548 |
|
21-Mar-2001 |
mckusick |
Add kernel support for running fsck on active filesystems.
|
#
73942 |
|
07-Mar-2001 |
mckusick |
Fixes to track snapshot copy-on-write checking in the specinfo structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).
|
#
70183 |
|
19-Dec-2000 |
mckusick |
Several small but important fixes for snapshots:
1) Be more tolerant of missing snapshot files by only trying to decrement their reference count if they are registered as active.
2) Fix for snapshots of filesystems with block sizes larger than 8K (from Ollivier Robert <roberto@eurocontrol.fr>).
3) Fix to avoid losing last block in snapshot file when calculating blocks that need to be copied (from Don Coleman <coleman@coleman.org>).
|
#
67106 |
|
14-Oct-2000 |
adrian |
Initial commit of IFS - a inode-namespaced FFS. Here is a short description:
How it works: --
Basically ifs is a copy of ffs, overriding some vfs/vnops. (Yes, hack.) I didn't see the need in duplicating all of sys/ufs/ffs to get this off the ground.
File creation is done through a special file - 'newfile' . When newfile is called, the system allocates and returns an inode. Note that newfile is done in a cloning fashion:
fd = open("newfile", O_CREAT|O_RDWR, 0644); fstat(fd, &st);
printf("new file is %d\n", (int)st.st_ino);
Once you have created a file, you can open() and unlink() it by its returned inode number retrieved from the stat call, ie:
fd = open("5", O_RDWR);
The creation permissions depend entirely if you have write access to the root directory of the filesystem.
To get the list of currently allocated inodes, VOP_READDIR has been added which returns a directory listing of those currently allocated.
--
What this entails:
* patching conf/files and conf/options to include IFS as a new compile option (and since ifs depends upon FFS, include the FFS routines)
* An entry in i386/conf/NOTES indicating IFS exists and where to go for an explanation
* Unstaticize a couple of routines in src/sys/ufs/ffs/ which the IFS routines require (ffs_mount() and ffs_reload())
* a new bunch of routines in src/sys/ufs/ifs/ which implement the IFS routines. IFS replaces some of the vfsops, and a handful of vnops - most notably are VFS_VGET(), VOP_LOOKUP(), VOP_UNLINK() and VOP_READDIR(). Any other directory operation is marked as invalid.
What this results in:
* an IFS partition's create permissions are controlled by the perm/ownership of the root mount point, just like a normal directory
* Each inode has perm and ownership too
* IFS does *NOT* mean an FFS partition can be opened per inode. This is a completely seperate filesystem here
* Softupdates doesn't work with IFS, and really I don't think it needs it. Besides, fsck's are FAST. (Try it :-)
* Inodes 0 and 1 aren't allocatable because they are special (dump/swap IIRC). Inode 2 isn't allocatable since UFS/FFS locks all inodes in the system against this particular inode, and unravelling THAT code isn't trivial. Therefore, useful inodes start at 3.
Enjoy, and feedback is definitely appreciated!
|
#
63788 |
|
24-Jul-2000 |
mckusick |
This patch corrects the first round of panics and hangs reported with the new snapshot code.
Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem.
Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write().
Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations.
Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot.
Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation.
Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress.
Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occationally read them which can cause unexpected behavior.
Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation.
Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
|
#
62976 |
|
11-Jul-2000 |
mckusick |
Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed.
Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
|
#
62968 |
|
11-Jul-2000 |
mckusick |
Clean up warning about undeclared function by declaring softdep_fsync in mount.h instead of ffs_extern.h. The correct solution is to use an indirect function pointer so that the kernel does not have to be built with options FFS, but that will be left for another day.
|
#
61729 |
|
16-Jun-2000 |
phk |
ARGH! I have too many source trees :-(
Fix prototype errors in last commit.
|
#
55690 |
|
09-Jan-2000 |
mckusick |
Make static non-exported functions from soft updates.
|
#
53464 |
|
20-Nov-1999 |
eivind |
We do not have ffs_checkexp, so remove the prototype
|
#
51138 |
|
10-Sep-1999 |
alfred |
Seperate the export check in VFS_FHTOVP, exports are now checked via VFS_CHECKEXP.
Add fh(open|stat|stafs) syscalls to allow userland to query filesystems based on (network) filehandle.
Obtained from: NetBSD
|
#
50477 |
|
27-Aug-1999 |
peter |
$Id$ -> $FreeBSD$
|
#
47131 |
|
13-May-1999 |
mckusick |
Add a hook to ffs_fsync to allow soft updates to get first chance at doing a sync on the block device for the filesystem. That allows it to push the bitmap blocks before the inode blocks which greatly reduces the number of inode rollbacks that need to be done.
|
#
42374 |
|
07-Jan-1999 |
bde |
Don't pass unused unused timestamp args to UFS_UPDATE() or waste time initializing them. This almost finishes centralizing (in-core) timestamp updates in ufs_itimes().
|
#
34826 |
|
23-Mar-1998 |
bde |
Forward declare even more structs to restore some self-sufficiency. Didn't fix new dependence on <ufs/ufs/inode.h> and its prerequisites.
|
#
34266 |
|
08-Mar-1998 |
julian |
Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
|
#
33054 |
|
03-Feb-1998 |
bde |
Forward declare some structs so that this file is more self-sufficient.
|
#
31352 |
|
22-Nov-1997 |
bde |
Staticized.
|
#
30492 |
|
16-Oct-1997 |
phk |
Another VFS cleanup "kilo commit"
1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS} intereface function, and now lives in the ufsmount structure.
2. Remove VOP_SEEK, it was unused.
3. Add mode default vops:
VOP_ADVLOCK vop_einval VOP_CLOSE vop_null VOP_FSYNC vop_null VOP_IOCTL vop_enotty VOP_MMAP vop_einval VOP_OPEN vop_null VOP_PATHCONF vop_einval VOP_READLINK vop_einval VOP_REALLOCBLKS vop_eopnotsupp
And remove identical functionality from filesystems
4. Add vop_stdpathconf, which returns the canonical stuff. Use it in the filesystems. (XXX: It's probably wrong that specfs and fifofs sets this vop, shouldn't it come from the "host" filesystem, for instance ufs or cd9660 ?)
5. Try to make system wide VOP functions have vop_* names.
6. Initialize the um_* vectors in LFS.
(Recompile your LKMS!!!)
|
#
30474 |
|
16-Oct-1997 |
phk |
VFS mega cleanup commit (x/N)
1. Add new file "sys/kern/vfs_default.c" where default actions for VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE, POLL, REVOKE and STRATEGY. Various stuff spread over the entire tree belongs here.
2. Change VOP_BLKATOFF to a normal function in cd9660.
3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These are private interface functions between UFS and the underlying storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now live in struct ufsmount instead.
4. Remove a kludge of VOP_ functions in all filesystems, that did nothing but obscure the simplicity and break the expandability. If a filesystem doesn't implement VOP_FOO, it shouldn't have an entry for it in its vnops table. The system will try to DTRT if it is not implemented. There are still some cruft left, but the bulk of it is done.
5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
|
#
30283 |
|
10-Oct-1997 |
phk |
Add type arg to ffs_mountfs and avoid examining v_tag to find out if MFS is getting a free ride.
Use generic ufs_reclaim().
|
#
29888 |
|
27-Sep-1997 |
kato |
Clustered read and write are switched at mount-option level.
1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread and vfs.foo.doclusterwrite are deleted. Only mount option can control clustered I/O from userland. 2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR and D_CLUSTERW bits of the d_flags member in the block device switch table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR / MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and MNT_NOCLUSTERW cannot be cleared from userland. 3. Vnode driver disables both clustered read and write. 4. Union filesystem disables clutered write.
Reviewed by: bde
|
#
28270 |
|
16-Aug-1997 |
wollman |
Fix all areas of the system (or at least all those in LINT) to avoid storing socket addresses in mbufs. (Socket buffers are the one exception.) A number of kernel APIs needed to get fixed in order to make this happen. Also, fix three protocol families which kept PCBs in mbufs to not malloc them instead. Delete some old compatibility cruft while we're at it, and add some new routines in the in_cksum family.
|
#
22975 |
|
22-Feb-1997 |
peter |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
22579 |
|
12-Feb-1997 |
mpp |
Add function prototypes for most of the new Lite2 functions. Also made a few of the miscfs routines static to be consistent. Some modules simply required some additional #includes to remove -Wall warnings.
|
#
22521 |
|
10-Feb-1997 |
dyson |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed.
Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
21673 |
|
14-Jan-1997 |
jkh |
Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
15493 |
|
01-May-1996 |
bde |
Removed bogus _BEGIN_DECLS/_END_DECLS.
Removed unused struct tag declarations in cloned code.
Added or cleaned up idempotency ifdefs.
|
#
14249 |
|
25-Feb-1996 |
bde |
Removed vestigial support for the obsolete FIFO option. In ext2fs it caused null pointer panics for all fifo operations unless FIFO was defined.
|
#
12911 |
|
17-Dec-1995 |
phk |
Staticize.
|
#
12158 |
|
09-Nov-1995 |
bde |
Introduced a type `vop_t' for vnode operation functions and used it 1138 times (:-() in casts and a few more times in declarations. This change is null for the i386.
The type has to be `typedef int vop_t(void *)' and not `typedef int vop_t()' because `gcc -Wstrict-prototypes' warns about the latter. Since vnode op functions are called with args of different (struct pointer) types, neither of these function types is any use for type checking of the arg, so it would be preferable not to use the complete function type, especially since using the complete type requires adding 1138 casts to avoid compiler warnings and another 40+ casts to reverse the function pointer conversions before calling the functions.
|
#
10578 |
|
06-Sep-1995 |
dyson |
Added indirect pointer for ffs_getpages, and added external declaration.
|
#
7430 |
|
28-Mar-1995 |
bde |
Add and move declarations to fix all of the warnings from `gcc -Wimplicit' (except in netccitt, netiso and netns) that I didn't notice when I fixed "all" such warnings before.
|
#
2979 |
|
22-Sep-1994 |
wollman |
More loadable VFS changes:
- Make a number of filesystems work again when they are statically compiled (blush)
- FIFOs are no longer optional; ``options FIFO'' removed from distributed config files.
|
#
2112 |
|
18-Aug-1994 |
wollman |
Fix up some sloppy coding practices:
- Delete redundant declarations. - Add -Wredundant-declarations to Makefile.i386 so they don't come back. - Delete sloppy COMMON-style declarations of uninitialized data in header files. - Add a few prototypes. - Clean up warnings resulting from the above.
NB: ioconf.c will still generate a redundant-declaration warning, which is unavoidable unless somebody volunteers to make `config' smarter.
|
#
1817 |
|
02-Aug-1994 |
dg |
Added $Id$
|
#
1549 |
|
25-May-1994 |
rgrimes |
The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.
Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
|
#
1542 |
|
24-May-1994 |
rgrimes |
This commit was generated by cvs2svn to compensate for changes in r1541, which included commits to RCS files with non-trunk default branches.
|
#
1541 |
|
24-May-1994 |
rgrimes |
BSD 4.4 Lite Kernel Sources
|