#
29363fb4 |
|
23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags. Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script. Sponsored by: Netflix
|
#
2ff63af9 |
|
16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .h pattern Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
#
831b1ff7 |
|
27-Jul-2023 |
Kirk McKusick <mckusick@FreeBSD.org> |
UFS/FFS: Migrate to modern uintXX_t from u_intXX_t. As per https://lists.freebsd.org/archives/freebsd-scsi/2023-July/000257.html move to the modern uintXX_t. While here also migrate u_char to uint8_t. Where other kernel interfaces allow, migrate u_long to uint64_t. No functional changes intended. MFC-after: 1 week Sponsored-by: The FreeBSD Foundation
|
#
6f0ca273 |
|
26-Jul-2023 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add diagnostics to fsck_ffs(8) for journaled soft-updates debugging. MFC-after: 1 week Sponsored-by: The FreeBSD Foundation
|
#
e4a905d1 |
|
25-May-2023 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add the ability to adjust directory depths to background fsck_ffs(8). Commit fe5e6e2 improved FFS directory placement when creating new directories. It is done by keeping track of the depth of directories in the filesystem and placing those lower in the tree closer together while spreading out those higher in the tree. Fsck_ffs(8) checks these depths and if incorrect adjusts them to their correct value. When running in background fsck_ffs(8) needs to be able to make an adjustment to the depth. This commit adds the sysctl to make such an adjustment and adds the code to fsck_ffs(8) to use the new sysctl. MFC after: 1 week Sponsored by: The FreeBSD Foundation
|
#
0a6e34e9 |
|
15-May-2023 |
Kirk McKusick <mckusick@FreeBSD.org> |
Fix size differences between architectures of the UFS/FFS CGSIZE macro value. The cylinder group header structure ended with `u_int8_t cg_space[1]' representing the beginning of the inode bitmap array. Some architectures like the i386 rounded this up to a 4-byte boundry while other architectures like the amd64 rounded it up to an 8-byte boundry. Thus sizeof(struct cg) was four bytes bigger on an amd64 machine than on an i386 machine. If a filesystem created on an i386 machine was moved to an amd64 machine, the size of the cylinder group calculated by the CGSIZE macro would appear to grow by four bytes. Filesystems whose cylinder groups were exactly equal to the block size on an i386 machine would appear to have a cylinder group that was four bytes too big when moved to an amd64 machine. Note that although the structure appears to be too big, it in fact is fine. It is just the calaculation of its size that is in error. The fix is to remove the cg_space element from the cylinder-group structure so that the calculated size of the structure is the same size on all architectures. Reported by: Tijl Coosemans Tested by: Tijl Coosemans and Peter Holm MFC after: 1 week Sponsored by: The FreeBSD Foundation
|
#
243a0eda |
|
21-Oct-2022 |
Kirk McKusick <mckusick@FreeBSD.org> |
Increase the maximum size of the journaled soft-updates journal. The size of the journaled soft-updates journal should be big enough to hold two minutes of filesystem metadata-update activity. The maximum size of the soft updates journal was set in the 1990s. At the time it was assummed that disk arrays would top out at 16 drives and disk writes per drive would top out at 500 per second. Today's I/O subsystems are considerably bigger and faster than those limits. Thus this delta removes the hard upper limit and lets tunefs(8) and newfs(8) set the upper bound based on the size of the filesystem and its cylinder groups. Sponsored by: The FreeBSD Foundation
|
#
e6886616 |
|
13-Aug-2022 |
Kirk McKusick <mckusick@FreeBSD.org> |
Move the ability to search for alternate UFS superblocks from fsck_ffs(8) into ffs_sbsearch() to allow use by other parts of the system. Historically only fsck_ffs(8), the UFS filesystem checker, had code to track down and use alternate UFS superblocks. Since fsdb(8) used much of the fsck_ffs(8) implementation it had some ability to track down alternate superblocks. This change extracts the code to track down alternate superblocks from fsck_ffs(8) and puts it into a new function ffs_sbsearch() in sys/ufs/ffs/ffs_subr.c. Like ffs_sbget() and ffs_sbput() also found in ffs_subr.c, these functions can be used directly by the kernel subsystems. Additionally they are exported to the UFS library, libufs(8) so that they can be used by user-level programs. The new functions added to libufs(8) are sbfind(3) that is an alternative to sbread(3) and sbsearch(3) that is an alternative to sbget(3). See their manual pages for further details. The utilities that have been changed to search for superblocks are dumpfs(8), fsdb(8), ffsinfo(8), and fsck_ffs(8). Also, the prtblknos(8) tool found in tools/diag/prtblknos searches for superblocks. The UFS specific mount code uses the superblock search interface when mounting the root filesystem and when the administrator doing a mount(8) command specifies the force flag (-f). The standalone UFS boot code (found in stand/libsa/ufs.c) uses the superblock search code in the hope of being able to get the system up and running so that fsck_ffs(8) can be used to get the filesystem cleaned up. The following utilities have not been changed to search for superblocks: clri(8), tunefs(8), snapinfo(8), fstyp(8), quot(8), dump(8), fsirand(8), growfs(8), quotacheck(8), gjournal(8), and glabel(8). When these utilities fail, they do report the cause of the failure. The one exception is the tasting code used to try and figure what a given disk contains. The tasting code will remain silent so as not to put out a slew of messages as it trying to taste every new mass storage device that shows up. Reviewed by: kib Reviewed by: Warner Losh Tested by: Peter Holm Differential Revision: https://reviews.freebsd.org/D36053 Sponsored by: The FreeBSD Foundation
|
#
d22531d5 |
|
31-Jul-2022 |
Kirk McKusick <mckusick@FreeBSD.org> |
Identify each UFS/FFS superblock integrity check as a warning or fatal error. Identify each of the superblock validation checks as either a warning or a fatal error. Any integrity check that can cause a system hang or crash is marked as fatal. Those that may simply lead to poor file layoutor other less good operating conditions are marked as warning. Normally both fatal and warning are treated as errors and prevent the superblock from being loaded. A new flag, UFS_NOWARNFAIL, is added. When passed to ffs_sbget() it will note warnings that it finds, but will still proceed with loading the superblock. Note that when UFS_NOWARNFAIL is used, it also includes UFS_NOHASHFAIL. No legitimate superblocks should fail as a result of these changes.
|
#
b21582ee |
|
30-Jul-2022 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add a flags parameter to the ffs_sbget() function that reads UFS superblocks. Rather than trying to shoehorn flags into the requested superblock address, create a separate flags parameter to the ffs_sbget() function in sys/ufs/ffs/ffs_subr.c. The ffs_sbget() function is used both in the kernel and in user-level utilities through export to the sbget() function in the libufs(3) library (see sbget(3) for details). The kernel uses ffs_sbget() when mounting UFS filesystems, in the glabel(8) and gjournal(8) GEOM utilities, and in the standalone library used when booting the system from a UFS root filesystem. The ffs_sbget() function reads the superblock located at the byte offset specified by its sblockloc parameter. The value UFS_STDSB may be specified for sblockloc to request that the standard location for the superblock be read. The two existing options are now flags: UFS_NOHASHFAIL will note if the check hash is wrong but will still return the superblock. This is used by the bootstrap code to give the system a chance to come up so that fsck can be run to correct the problem. UFS_NOMSG indicates that superblock inconsistency error messages should not be printed. It is used by programs like fsck that want to print their own error message and programs like glabel(8) that just want to know if a UFS filesystem exists on a partition. One additional flag is added: UFS_NOCSUM causes only the superblock itself to be returned, but does not read in any auxiliary data structures like the cylinder group summary information. It is used by clients like glabel(8) that just want to check for possible filesystem types. Using UFS_NOCSUM skips the superblock checks for csum data which allows superblocks that have corrupted csum data to be read and used. The validate_sblock() function checks that the superblock has not been corrupted in a way that can crash or hang the system. Unless the UFS_NOMSG flag is specified, it will print out any errors that it finds. Prior to this commit, validate_sblock() returned as soon as it found an inconsistency so would print at most one message. It now does all its checks so when UFS_NOMSG has not been specified will print out everything that it finds inconsistent. Sponsored by: The FreeBSD Foundation
|
#
f2b39152 |
|
15-Nov-2021 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add ability to suppress UFS/FFS superblock check-hash failure messages. When reading UFS/FFS superblocks that have check hashes, both the kernel and libufs print an error message if the check hash is incorrect. This commit adds the ability to request that the error message not be made. It is intended for use by programs like fsck that wants to print its own error message and by kernel subsystems like glabel that just wants to check for possible filesystem types. This capability will be used in followup commits. Sponsored by: Netflix
|
#
b366ee48 |
|
14-Nov-2021 |
Kirk McKusick <mckusick@FreeBSD.org> |
Consolodate four copies of the STDSB define into a single place. The STDSB macro is passed to the ffs_sbget() routine to fetch a UFS/FFS superblock "from the stadard place". It was identically defined in lib/libufs/libufs.h, stand/libsa/ufs.c, sys/ufs/ffs/ffs_extern.h, and sys/ufs/ffs/ffs_subr.c. Delete it from these four files and define it instead in sys/ufs/ffs/fs.h. All existing uses of this macro already include sys/ufs/ffs/fs.h so no include changes need to be made. No functional change intended. Sponsored by: Netflix
|
#
996d40f9 |
|
24-Oct-2020 |
Kirk McKusick <mckusick@FreeBSD.org> |
Various new check-hash checks have been added to the UFS filesystem over various major releases. Superblock check hashes were added for the 12 release and cylinder-group and inode check hashes will appear in the 13 release. When a disk with a UFS filesystem is writably mounted, the kernel clears the feature flags for anything that it does not support. For example, if a UFS disk from a 12-stable kernel is mounted on an 11-stable system, the 11-stable kernel will clear the flag in the filesystem superblock that indicates that superblock check-hashs are being maintained. Thus if the disk is later moved back to a 12-stable system, the 12-stable system will know to ignore its incorrect check-hash. If the only filesystem modification done on the earlier kernel is to run a utility such as growfs(8) that modifies the superblock but neither updates the check-hash nor clears the feature flag indicating that it does not support the check-hash, the disk will fail to mount if it is moved back to its original newer kernel. This patch moves the code that clears the filesystem feature flags from the mount code (ffs_mountfs()) to the code that reads the superblock (ffs_sbget()). As ffs_sbget() is used by the kernel mount code and is imported into libufs(3), all the filesystem utilities will now also clear these flags when they make modifications to the filesystem. As suggested by John Baldwin, fsck_ffs(8) has been changed to accept and repair bad superblock check-hashes rather than refusing to run. This change allows fsck to recover filesystems that have been impacted by utilities older than those created after this change and is a sensible thing to do in any event. Reported by: John Baldwin (jhb@) MFC after: 2 weeks Sponsored by: Netflix
|
#
34816cb9 |
|
18-Jun-2020 |
Kirk McKusick <mckusick@FreeBSD.org> |
Move the pointers stored in the superblock into a separate fs_summary_info structure. This change was originally done by the CheriBSD project as they need larger pointers that do not fit in the existing superblock. This cleanup of the superblock eases the task of the commit that immediately follows this one. Suggested by: brooks Reviewed by: kib PR: 246983 Sponsored by: Netflix
|
#
f2620e9c |
|
21-Apr-2020 |
John Baldwin <jhb@FreeBSD.org> |
Retire two unused background fsck sysctls. These two sysctls were added to support UFS softupdates journalling with snapshots. However, the changes to fsck to use them were never committed and there have never been any in-tree uses of these sysctls. More details from Kirk: When journalling got added to soft updates, its journal rollback freed blocks that it thought were no longer in use. But it does not take snapshots into account (i.e., if a snapshot is still using it, then it cannot be freed). So I added the needed logic to fsck by having the free go through the kernel's blkfree code so it could grab blocks that were still needed by snapshots. That is done using the setbufoutput hack. I never got that code working reliably, so it is still sitting in my work directory. Which also explains why you still cannot take snapshots on filesystems running with journalling... In looking over my use of this feature, and in particular the troubles I was having with it, I conclude that it may be better to extract the code from the kernel that handles freeing blocks claimed by snapshots and putting it into fsck directly. My original intent was that it is complex and at the time changing, so only having to maintain it in one place was appealing. But at this point it has not changed in years and the hacks like setinode and setbufoutput to be able to use the kernel code is sufficiently ugly, that I am leaning towards just extracting it. Reviewed by: mckusick MFC after: 1 week Sponsored by: DARPA Differential Revision: https://reviews.freebsd.org/D24484
|
#
5a0d467f |
|
13-Aug-2019 |
Kirk McKusick <mckusick@FreeBSD.org> |
Clarify comment that describes how the FS_METACKHASH is managed. MFC after: 3 days
|
#
ac4b20a0 |
|
25-Feb-2019 |
Kirk McKusick <mckusick@FreeBSD.org> |
After a crash, a file that extends into indirect blocks may end up shorter than its size resulting in a hole as its final block (which is a violation of the invarients of the UFS filesystem). Soft updates will always ensure that the file size is correct when writing inodes to disk for files that contain only direct block pointers. However soft updates does not roll back sizes for files with indirect blocks that it has set to unallocated because their contents have not yet been written to disk. Hence, the file can appear to have a hole at its end because the block pointer has been rolled back to zero when its inode was written to disk. Thus, fsck_ffs calculates the last allocated block in the file. For files that extend into indirect blocks, fsck_ffs checks for a size past the last allocated block of the file and if that is found, shortens the file to reference the last allocated block thus avoiding having it reference a hole at its end. Submitted by: Chuck Silvers <chs@netflix.com> Tested by: Chuck Silvers <chs@netflix.com> MFC after: 1 week Sponsored by: Netflix
|
#
ec888383 |
|
23-Oct-2018 |
Kirk McKusick <mckusick@FreeBSD.org> |
Continuing efforts to provide hardening of FFS, this change adds a check hash to the superblock. If a check hash fails when an attempt is made to mount a filesystem, the mount fails with EINVAL (Invalid argument). This avoids a class of filesystem panics related to corrupted superblocks. The hash is done using crc32c. Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Reviewed by: kib Tested by: Peter Holm Sponsored by: Netflix
|
#
068beacf |
|
08-Feb-2018 |
Kirk McKusick <mckusick@FreeBSD.org> |
The goal of this change is to prevent accidental foot shooting by folks running filesystems created on check-hash enabled kernels (which I will call "new") on a non-check-hash enabled kernels (which I will call "old). The idea here is to detect when a filesystem is run on an old kernel and flag the filesystem so that when it gets moved back to a new kernel, it will not start getting a slew of check-hash errors. Back when the UFS version 2 filesystem was created, it added a file flag FS_INDEXDIRS that was to be set on any filesystem that kept some sort of on-disk indexing for directories. The idea was precisely to solve the issue we have today. Specifically that a newer kernel that supported indexing would be able to tell that the filesystem had been run on an older non-indexing kernel and that the indexes should not be used until they had been rebuilt. Since we have never implemented on-disk directory indicies, the FS_INDEXDIRS flag is cleared every time any UFS version 2 filesystem ever created is mounted for writing. This commit repurposes the FS_INDEXDIRS flag as the FS_METACKHASH flag. Thus, the FS_METACKHASH is definitively known to have always been cleared. The FS_INDEXDIRS flag has been moved to a new block of flags that will always be cleared starting with this commit (until they get used to implement some future feature which needs to detect that the filesystem was mounted on a kernel that predates the new feature). If a filesystem with check-hashes enabled is mounted on an old kernel the FS_METACKHASH flag is cleared. When that filesystem is mounted on a new kernel it will see that the FS_METACKHASH has been cleared and clears all of the fs_metackhash flags. To get them re-enabled the user must run fsck (in interactive mode without the -y flag) which will ask for each supported check hash whether it should be rebuilt and enabled. When fsck is run in its default preen mode, it will just ignore the check hashes so they will remain disabled. The kernel has always disabled any check hash functions that it does not support, so as more types of check hashes are added, we will get a non-surprising result. Specifically if filesystems get moved to kernels supporting fewer of the check hashes, those that are not supported will be disabled. If the filesystem is moved back to a kernel with more of the check-hashes available and fsck is run interactively to rebuild them, then their checking will resume. Otherwise just the smaller subset will be checked. A side effect of this commit is that filesystems running with cylinder-group check hashes will stop having them checked until fsck is run to re-enable them (since none of them currently have the FS_METACKHASH flag set). So, if you want check hashes enabled on your filesystems after booting a kernel with these changes, you need to run fsck to enable them. Any newly created filesystems will have check hashes enabled. If in doubt as to whether you have check hashes emabled, run dumpfs and look at the list of enabled flags at the end of the superblock details.
|
#
dffce215 |
|
25-Jan-2018 |
Kirk McKusick <mckusick@FreeBSD.org> |
Refactoring of reading and writing of the UFS/FFS superblock. Specifically reading is done if ffs_sbget() and writing is done in ffs_sbput(). These functions are exported to libufs via the sbget() and sbput() functions which then used in the various filesystem utilities. This work is in preparation for adding subperblock check hashes. No functional change intended. Reviewed by: kib
|
#
51369649 |
|
20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags. Mainly focus on files that use BSD 3-Clause license. The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts. Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
|
#
75e3597a |
|
21-Sep-2017 |
Kirk McKusick <mckusick@FreeBSD.org> |
Continuing efforts to provide hardening of FFS, this change adds a check hash to cylinder groups. If a check hash fails when a cylinder group is read, no further allocations are attempted in that cylinder group until it has been fixed by fsck. This avoids a class of filesystem panics related to corrupted cylinder group maps. The hash is done using crc32c. Check hases are added only to UFS2 and not to UFS1 as UFS1 is primarily used in embedded systems with small memories and low-powered processors which need as light-weight a filesystem as possible. Specifics of the changes: sys/sys/buf.h: Add BX_FSPRIV to reserve a set of eight b_xflags that may be used by individual filesystems for their own purpose. Their specific definitions are found in the header files for each filesystem that uses them. Also add fields to struct buf as noted below. sys/kern/vfs_bio.c: It is only necessary to compute a check hash for a cylinder group when it is actually read from disk. When calling bread, you do not know whether the buffer was found in the cache or read. So a new flag (GB_CKHASH) and a pointer to a function to perform the hash has been added to breadn_flags to say that the function should be called to calculate a hash if the data has been read. The check hash is placed in b_ckhash and the B_CKHASH flag is set to indicate that a read was done and a check hash calculated. Though a rather elaborate mechanism, it should also work for check hashing other metadata in the future. A kernel internal API change was to change breada into a static fucntion and add flags and a function pointer to a check-hash function. sys/ufs/ffs/fs.h: Add flags for types of check hashes; stored in a new word in the superblock. Define corresponding BX_ flags for the different types of check hashes. Add a check hash word in the cylinder group. sys/ufs/ffs/ffs_alloc.c: In ffs_getcg do the dance with breadn_flags to get a check hash and if one is provided, check it. sys/ufs/ffs/ffs_vfsops.c: Copy across the BX_FFSTYPES flags in background writes. Update the check hash when writing out buffers that need them. sys/ufs/ffs/ffs_snapshot.c: Recompute check hash when updating snapshot cylinder groups. sys/libkern/crc32.c: lib/libufs/Makefile: lib/libufs/libufs.h: lib/libufs/cgroup.c: Include libkern/crc32.c in libufs and use it to compute check hashes when updating cylinder groups. Four utilities are affected: sbin/newfs/mkfs.c: Add the check hashes when building the cylinder groups. sbin/fsck_ffs/fsck.h: sbin/fsck_ffs/fsutil.c: Verify and update check hashes when checking and writing cylinder groups. sbin/fsck_ffs/pass5.c: Offer to add check hashes to existing filesystems. Precompute check hashes when rebuilding cylinder group (although this will be done when it is written in fsutil.c it is necessary to do it early before comparing with the old cylinder group) sbin/dumpfs/dumpfs.c Print out the new check hash flag(s) sbin/fsdb/Makefile: Needs to add libufs now used by pass5.c imported from fsck_ffs. Reviewed by: kib Tested by: Peter Holm (pho)
|
#
855662c6 |
|
04-Sep-2017 |
Kirk McKusick <mckusick@FreeBSD.org> |
The new fsck recovery information to enable it to find backup superblocks created in revision 322297 only works on disks with sector sizes up to 4K. This update allows the recovery information to be created by newfs and used by fsck on disks with sector sizes up to 64K. Note that FFS currently limits filesystem to be mounted from disks with up to 8K sectors. Expanding this limitation will be the subject of another commit. Reported by: Peter Holm Reviewed with: kib
|
#
77b63aa0 |
|
08-Aug-2017 |
Kirk McKusick <mckusick@FreeBSD.org> |
Since the switch to GPT disk labels, fsck for UFS/FFS has been unable to automatically find alternate superblocks. This checkin places the information needed to find alternate superblocks to the end of the area reserved for the boot block. Filesystems created with a newfs of this vintage or later will create the recovery information. If you have a filesystem created prior to this change and wish to have a recovery block created for your filesystem, you can do so by running fsck in forground mode (i.e., do not use the -p or -y options). As it starts, fsck will ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should answer yes. Discussed with: kib, imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D11589
|
#
3cf259c3 |
|
05-May-2017 |
Ed Maste <emaste@FreeBSD.org> |
UFS fs.h: clear warning from use in makefs(1) makefs(1) has a number of signedness warnings (when built with higher WARNS), most of which can be addressed by careful application of casts in makefs itself. There is one case where a signedness warning arises from the blksize macro, so must be addressed in the macro itself. Reviewed by: kib, mckusick MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D10589
|
#
1dc349ab |
|
15-Feb-2017 |
Ed Maste <emaste@FreeBSD.org> |
prefix UFS symbols with UFS_ to reduce namespace pollution Specifically: ROOTINO -> UFS_ROOTINO WINO -> UFS_WINO NXADDR -> UFS_NXADDR NDADDR -> UFS_NDADDR NIADDR -> UFS_NIADDR MAXSYMLINKLEN_UFS[12] -> UFS[12]_MAXSYMLINKLEN (for consistency) Also prefix ext2's and nandfs's NDADDR and NIADDR with EXT2_ and NANDFS_ Reviewed by: kib, mckusick Obtained from: NetBSD MFC after: 1 month Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D9536
|
#
949b56ba |
|
06-Sep-2016 |
Warner Losh <imp@FreeBSD.org> |
Renumber the advertising clause.
|
#
08e94183 |
|
29-Apr-2016 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
UFS: spelling fixes on comments. No functional change.
|
#
4896af9f |
|
01-Mar-2014 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
ufs: small formatting fixes. Cleanup some extra space. Use of tabs vs. spaces. No functional change. MFC after: 3 days Reviewed by: mckusick
|
#
baa12a84 |
|
22-Mar-2013 |
Kirk McKusick <mckusick@FreeBSD.org> |
The purpose of this change to the FFS layout policy is to reduce the running time for a full fsck. It also reduces the random access time for large files and speeds the traversal time for directory tree walks. The key idea is to reserve a small area in each cylinder group immediately following the inode blocks for the use of metadata, specifically indirect blocks and directory contents. The new policy is to preferentially place metadata in the metadata area and everything else in the blocks that follow the metadata area. The size of this area can be set when creating a filesystem using newfs(8) or changed in an existing filesystem using tunefs(8). Both utilities use the `-k held-for-metadata-blocks' option to specify the amount of space to be held for metadata blocks in each cylinder group. By default, newfs(8) sets this area to half of minfree (typically 4% of the data area). This work was inspired by a paper presented at Usenix's FAST '13: www.usenix.org/conference/fast13/ffsck-fast-file-system-checker Details of this implementation appears in the April 2013 of ;login: www.usenix.org/publications/login/april-2013-volume-38-number-2. A copy of the April 2013 ;login: paper can also be downloaded from: www.mckusick.com/publications/faster_fsck.pdf. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
|
#
213a71c6 |
|
18-Nov-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Fix build of kdump(1).
|
#
1848286a |
|
18-Nov-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add UFS writesuspension mechanism, designed to allow userland processes to modify on-disk metadata for filesystems mounted for write. Reviewed by: kib, mckusick Sponsored by: FreeBSD Foundation
|
#
549f62fa |
|
30-Oct-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Fix problem with geom_label(4) not recognizing UFS labels on filesystems extended using growfs(8). The problem here is that geom_label checks if the filesystem size recorded in UFS superblock is equal to the provider (i.e. device) size. This check cannot be removed due to backward compatibility. On the other hand, in most cases growfs(8) cannot set fs_size in the superblock to match the provider size, because, differently from newfs(8), it cannot recompute cylinder group sizes. To fix this problem, add another superblock field, fs_providersize, used only for this purpose. The geom_label(4) will attach if either fs_size (filesystem created with newfs(8)) or fs_providersize (filesystem expanded using growfs(8)) matches the device size. PR: kern/165962 Reviewed by: mckusick Sponsored by: FreeBSD Foundation
|
#
58b1333a |
|
09-Nov-2011 |
Gleb Kurtsou <gleb@FreeBSD.org> |
Use implementation independent inoNN_t scalars for on-disk UFS structures Approved by: mdf (mentor)
|
#
927a12ae |
|
15-Jul-2011 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add an FFS specific mount option to allow a filesystem checker (typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
|
#
280e091a |
|
10-Jun-2011 |
Jeff Roberson <jeff@FreeBSD.org> |
Implement fully asynchronous partial truncation with softupdates journaling to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
|
#
455a6e0f |
|
11-Feb-2011 |
Konstantin Belousov <kib@FreeBSD.org> |
Use the native sector size of the device backing the UFS volume for SU+J journal blocks, instead of hard coding 512 byte sector size. Journal need to atomically write the block, that can only be guaranteed at the device sector size, not larger. Attempt to write less then sector size results in driver errors. Note that this is the first structure in UFS that depends on the sector size. Other elements are written in the units of fragments. In collaboration with: pho Reviewed by: jeff Tested by: bz, pho
|
#
8c2a54de |
|
28-Dec-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Add kernel side support for BIO_DELETE/TRIM on UFS. The FS_TRIM fs flag indicates that administrator requested issuing of TRIM commands for the volume. UFS will only send the command to disk if the disk reports GEOM::candelete attribute. Since disk queue is reordered, data block is marked as free in the bitmap only after TRIM command completed. Due to need to sleep waiting for i/o to finish, TRIM bio_done routine schedules taskqueue to set the bitmap bit. Based on the patch by: mckusick Reviewed by: mckusick, pjd Tested by: pho MFC after: 1 month
|
#
fae5c47d |
|
11-Nov-2010 |
Konstantin Belousov <kib@FreeBSD.org> |
Add function lbn_offset to calculate offset of the indirect block of given level. Reviewed by: jeff Tested by: pho
|
#
a7d5f7eb |
|
19-Oct-2010 |
Jamie Gritton <jamie@FreeBSD.org> |
A new jail(8) with a configuration file, to replace the work currently done by /etc/rc.d/jail.
|
#
c0b2efce |
|
14-Sep-2010 |
Kirk McKusick <mckusick@FreeBSD.org> |
Update comments in soft updates code to more fully describe the addition of journalling. Only functional change is to tighten a KASSERT. Reviewed by: jeff Roberson
|
#
113db2dd |
|
24-Apr-2010 |
Jeff Roberson <jeff@FreeBSD.org> |
- Merge soft-updates journaling from projects/suj/head into head. This brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
|
#
0718d64d |
|
18-Apr-2010 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
MFC r200796: Implement NFSv4 ACL support for UFS. Reviewed by: rwatson
|
#
4179ce18 |
|
26-Feb-2010 |
Kirk McKusick <mckusick@FreeBSD.org> |
MFC of 203763, 203764, 203768, 203769, 203770, 203782, and 203784. These fixes correct a problem in the file system that treats large inode numbers as negative rather than unsigned. For a default (16K block) file system, this bug began to show up at a file system size above about 16Tb. These fixes also update newfs to ensure that it will never create a filesystem with more than 2^32 inodes. They also update libufs, tunefs, and growfs so that they properly handle inode numbers as unsigned. Reported by: Scott Burns, John Kilburg, and Bruce Evans Followup by: Jeff Roberson PR: 133980
|
#
81479e68 |
|
11-Feb-2010 |
Kirk McKusick <mckusick@FreeBSD.org> |
One last pass to get all the unsigned comparisons correct.
|
#
e870d1e6 |
|
10-Feb-2010 |
Kirk McKusick <mckusick@FreeBSD.org> |
This fix corrects a problem in the file system that treats large inode numbers as negative rather than unsigned. For a default (16K block) file system, this bug began to show up at a file system size above about 16Tb. To fully handle this problem, newfs must be updated to ensure that it will never create a filesystem with more than 2^32 inodes. That patch will be forthcoming soon. Reported by: Scott Burns, John Kilburg, Bruce Evans Followup by: Jeff Roberson PR: 133980 MFC after: 2 weeks
|
#
e268f54c |
|
11-Jan-2010 |
Kirk McKusick <mckusick@FreeBSD.org> |
Background: When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes. Solution: This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are: setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot. setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue. unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it. As with all other fsck sysctls, these new ones may only be used by processes with appropriate priviledge. Reported by: jeff Security issues: rwatson
|
#
9340fc72 |
|
21-Dec-2009 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Implement NFSv4 ACL support for UFS. Reviewed by: rwatson
|
#
a6b691a8 |
|
17-Dec-2008 |
Sam Leffler <sam@FreeBSD.org> |
Apply the big hammer: o remove all of compat except for pwcache and strstuftoll; these might end up in libutil or similar so keep them in the subdir o mv getid.c up to the top level; this looks like something that'll be makefs-specific o eliminate private versions of .h files in sys; use system files instead o eliminate private ffs_tables.c; use the system version directly (might want to adopt const'ification at some point but that's the only diff I can see) o mv remaining code from sys to ffs and strip out unused bits; this now becomes part of makefs o add compat defs and shims to makefs.h o strip all vestiges of nbtool_config.h, compat_defs.h, etc. o fixup includes after file shuffling o rename system #defines that do implicit byte swapping to have an _swap suffix; e.g. DIRSIZ -> DIRSIZ_SWAP, cg_inosused -> cg_inosused_swap; if we ever add endian-agnostic support to the kernel these can go back to their original names o strip some netbsd'isms that aren't worth shim'ing (e.g. _DIAGASSERT) Code compiles w/o complaints but is untested.
|
#
b6842439 |
|
23-Nov-2008 |
Sam Leffler <sam@FreeBSD.org> |
prepare makefs for import to base
|
#
d7f03759 |
|
19-Oct-2008 |
Ulf Lilleengen <lulf@FreeBSD.org> |
- Import the HEAD csup code which is the basis for the cvsmode work.
|
#
8e7a2353 |
|
24-May-2008 |
Craig Rodrigues <rodrigc@FreeBSD.org> |
Fix comments to replace SBSIZE with SBLOCKSIZE, since SBSIZE was renamed to SBLOCKSIZE in version 1.33 Reviewed by: mckusick
|
#
1a60c7fc |
|
31-Oct-2006 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups looks for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl
|
#
a16baf37 |
|
20-Feb-2005 |
Xin LI <delphij@FreeBSD.org> |
The recomputation of file system summary at mount time can be a very slow process, especially for large file systems that is just recovered from a crash. Since the summary is already re-sync'ed every 30 second, we will not lag behind too much after a crash. With this consideration in mind, it is more reasonable to transfer the responsibility to background fsck, to reduce the delay after a crash. Add a new sysctl variable, vfs.ffs.compute_summary_at_mount, to control this behavior. When set to nonzero, we will get the "old" behavior, that the summary is computed immediately at mount time. Add five new sysctl variables to adjust ndir, nbfree, nifree, nffree and numclusters respectively. Teach fsck_ffs about these API, however, intentionally not to check the existence, since kernels without these sysctls must have recomputed the summary and hence no adjustments are necessary. This change has eliminated the usual tens of minutes of delay of mounting large dirty volumes. Reviewed by: mckusick MFC After: 1 week
|
#
f2aa1113 |
|
24-Jan-2005 |
Jeff Roberson <jeff@FreeBSD.org> |
- Mark the struct fs members that require the ufsmount mutex. - Define some macros for manipulating the fs_active bitmap. Sponsored By: Isilon Systems, Inc.
|
#
60727d8b |
|
06-Jan-2005 |
Warner Losh <imp@FreeBSD.org> |
/* -> /*- for license, minor formatting changes
|
#
894d8d3c |
|
09-Oct-2004 |
Nate Lawson <njl@FreeBSD.org> |
Fix fsbtodb() for UFS1. This fixes an overflow for file sizes >1 TB, allowing for sizes up to 4 TB. This doesn't affect UFS2 since b is already a 64 bit type, coincidental with daddr_t. Submitted by: bde
|
#
b72ea57f |
|
19-Aug-2004 |
John Baldwin <jhb@FreeBSD.org> |
Generalize the UFS bad magic value used to determine when a filesystem has only been partly initialized via newfs(8) so that it applies to both UFS1 and UFS2. Submitted by: "Xin LI" delphij at frontfree dot net MFC: maybe?
|
#
b4a1d929 |
|
31-May-2004 |
Kirill Ponomarev <krion@FreeBSD.org> |
- Fix typo Approved by: tobez
|
#
012d4134 |
|
06-Apr-2004 |
Warner Losh <imp@FreeBSD.org> |
Remove advertising clause from University of California Regent's license, per letter dated July 22, 1999 and irc message from Robert Watson saying that clause 3 can be removed from those files with an NAI copyright that also have only a University of California copyrights. Approved by: core, rwatson
|
#
b1fddb23 |
|
03-Apr-2004 |
Maxime Henrion <mux@FreeBSD.org> |
Fix the remaining warnings of growfs(8) on my sparc64 box with WARNS=6. I don't change the WARNS level in the Makefile because I didn't tested this on other archs. The fs.h fix was suggested by: marcel Reviewed by: md5(1)
|
#
ec52df8e |
|
16-Nov-2003 |
Wes Peters <wes@FreeBSD.org> |
Write the UFS2 superblock with a 'BAD' magic number at the beginning of newfs, to signify the newfs operation has not yet completed. Re- write the superblock with the correct magic number once all of the cylinder groups have been created to show the operation has finished. Sponsored by: St. Bernard Software
|
#
d60682c2 |
|
21-Feb-2003 |
Kirk McKusick <mckusick@FreeBSD.org> |
This patch fixes a bug in the logical block calculation macros so that they convert to 64-bit values before shifting rather than afterwards. Once fixed, they can be used rather than inline expanded. Sponsored by: DARPA & NAI Labs.
|
#
cc4a8583 |
|
09-Jan-2003 |
Marcel Moolenaar <marcel@FreeBSD.org> |
o Improve wording of the comment that accompanies fs_pad. The padding is not specific to non-i386 architectures. It is caused by non-i386 specific alignment requirements of fs_swuid, o Add a CTASSERT to catch a change in the size of struct fs at compile-time rather than run-time. Ok'd: gordon Tested on: i386 ia64
|
#
963cae78 |
|
09-Jan-2003 |
Gordon Tetlow <gordon@FreeBSD.org> |
Fix superblock alignment problems on non-i386 platforms. Also change fs_uuid to fs_swuid, making it more descriptive. Submitted by: marcel Reviewed by: peter Pointy hat to: gordon
|
#
291871da |
|
08-Jan-2003 |
Gordon Tetlow <gordon@FreeBSD.org> |
Steal some space from fs_fsmnt to create fs_volname and fs_uuid. The volname will be used to support volume names with the help of a GEOM module (to be committed). uuid will be used to deal with conflicting volume names (which doesn't work just yet). Approved by: mckusick@
|
#
ada981b2 |
|
26-Nov-2002 |
Kirk McKusick <mckusick@FreeBSD.org> |
Create a new 32-bit fs_flags word in the superblock. Add code to move the old 8-bit fs_old_flags to the new location the first time that the filesystem is mounted by a new kernel. One of the unused flags in fs_old_flags is used to indicate that the flags have been moved. Leave the fs_old_flags word intact so that it will work properly if used on an old kernel. Change the fs_sblockloc superblock location field to be in units of bytes instead of in units of filesystem fragments. The old units did not work properly when the fragment size exceeeded the superblock size (8192). Update old fs_sblockloc values at the same time that the flags are moved. Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk> Sponsored by: DARPA & NAI Labs.
|
#
3ceef565 |
|
14-Oct-2002 |
Robert Watson <rwatson@FreeBSD.org> |
Define two new superblock file system flags: FS_ACLS Administrative enable/disable of extended ACL support FS_MULTILABEL Administrative flag to indicate to the MAC Framework that objects in the file system are individually labeled using extended attributes. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories Reviewed by: (in principal) mckusick, phk
|
#
1c85e6a3 |
|
21-Jun-2002 |
Kirk McKusick <mckusick@FreeBSD.org> |
This commit adds basic support for the UFS2 filesystem. The UFS2 filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
|
#
d394511d |
|
16-May-2002 |
Tom Rhodes <trhodes@FreeBSD.org> |
More s/file system/filesystem/g
|
#
7110af75 |
|
12-May-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
ARGH! SBLOCK is not unused. Try to get this right. BBSIZE belongs in <sys/disklabel.h> (but shouldn't be a constant). Define SBLOCK again, using the right math. Sponsored by: DARPA & NAI Labs.
|
#
7cb71b74 |
|
12-May-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove #define for BBOFF, it is assumed == 0 so many places that we might as well forget about it. In fact the only thing which used it was the SBOFF macro. Sponsored by: DARPA & NAI Labs.
|
#
16910634 |
|
12-May-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Remove unused BBLOCK and SBLOCK #defines. Sponsored by: DARPA & NAI Labs.
|
#
a463023d |
|
03-Apr-2002 |
Poul-Henning Kamp <phk@FreeBSD.org> |
Move the FFS parameter MAXFRAG from <sys/param.h> to <ufs/ffs/fs.h> Sponsored by: DARPA & NAI Labs.
|
#
99bef878 |
|
17-Jan-2002 |
Kirk McKusick <mckusick@FreeBSD.org> |
Fix a bug introduced in ffs_snapshot.c -r1.25 and fs.h -r1.26 which caused incomplete snapshots to be taken. When background fsck would run on these snapshots, the result would be files being incorrectly released which would subsequently panic the kernel with ``handle_workitem_freefile: inodedep survived'', ``handle_written_inodeblock: live inodedep'', and ``handle_workitem_remove: lost inodedep'' errors.
|
#
f305c5d1 |
|
18-Dec-2001 |
Kirk McKusick <mckusick@FreeBSD.org> |
Change the atomic_set_char to atomic_set_int and atomic_clear_char to atomic_clear_int to ease the implementation for the sparc64. Requested by: Jake Burkholder <jake@locore.ca>
|
#
3fa4044e |
|
16-Dec-2001 |
Ian Dowse <iedowse@FreeBSD.org> |
Move the new superblock field `fs_active' into the region of the superblock that is already set up to handle pointer types. This fixes an accidental change in the superblock size on 64-bit platforms caused by revision 1.24.
|
#
cc5a9233 |
|
13-Dec-2001 |
Kirk McKusick <mckusick@FreeBSD.org> |
Minimize the time necessary to suspend operations on a filesystem when taking a snapshot. The two time consuming operations are scanning all the filesystem bitmaps to determine which blocks are in use and scanning all the other snapshots so as to be able to expunge their blocks from the view of the current snapshot. The bitmap scanning is broken into two passes. Before suspending the filesystem all bitmaps are scanned. After the suspension, those bitmaps that changed after being scanned the first time are rescanned. Typically there are few bitmaps that need to be rescanned. The expunging of other snapshots is now done after the suspension is released by observing that we can easily identify any blocks that were allocated to them after the suspension (they will be maked as `not needing to be copied' in the just created snapshot). For all the gory details, see the ``Running fsck in the Background'' paper in the Usenix BSDCon 2002 Conference Proceedings, pages 55-64.
|
#
815d14dd |
|
15-Jul-2001 |
Peter Wemm <peter@FreeBSD.org> |
Use a fixed type for times in on-disk structures for ufs rather than something that could potentially change like time_t.
|
#
9ccb939e |
|
08-May-2001 |
Kirk McKusick <mckusick@FreeBSD.org> |
When running with soft updates, track the number of blocks and files that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.
|
#
1a6a6610 |
|
13-Apr-2001 |
Kirk McKusick <mckusick@FreeBSD.org> |
This checkin adds support in ufs/ffs for the FS_NEEDSFSCK flag. It is described in ufs/ffs/fs.h as follows: /* * Filesystem flags. * * Note that the FS_NEEDSFSCK flag is set and cleared only by the * fsck utility. It is set when background fsck finds an unexpected * inconsistency which requires a traditional foreground fsck to be * run. Such inconsistencies should only be found after an uncorrectable * disk error. A foreground fsck will clear the FS_NEEDSFSCK flag when * it has successfully cleaned up the filesystem. The kernel uses this * flag to enforce that inconsistent filesystems be mounted read-only. */ #define FS_UNCLEAN 0x01 /* filesystem not clean at mount */ #define FS_DOSOFTDEP 0x02 /* filesystem using soft dependencies */ #define FS_NEEDSFSCK 0x04 /* filesystem needs sync fsck before mount */
|
#
a61ab64a |
|
10-Apr-2001 |
Kirk McKusick <mckusick@FreeBSD.org> |
Directory layout preference improvements from Grigoriy Orlov <gluk@ptci.ru>. His description of the problem and solution follow. My own tests show speedups on typical filesystem intensive workloads of 5% to 12% which is very impressive considering the small amount of code change involved. ------ One day I noticed that some file operations run much faster on small file systems then on big ones. I've looked at the ffs algorithms, thought about them, and redesigned the dirpref algorithm. First I want to describe the results of my tests. These results are old and I have improved the algorithm after these tests were done. Nevertheless they show how big the perfomance speedup may be. I have done two file/directory intensive tests on a two OpenBSD systems with old and new dirpref algorithm. The first test is "tar -xzf ports.tar.gz", the second is "rm -rf ports". The ports.tar.gz file is the ports collection from the OpenBSD 2.8 release. It contains 6596 directories and 13868 files. The test systems are: 1. Celeron-450, 128Mb, two IDE drives, the system at wd0, file system for test is at wd1. Size of test file system is 8 Gb, number of cg=991, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=35 2. PIII-600, 128Mb, two IBM DTLA-307045 IDE drives at i815e, the system at wd0, file system for test is at wd1. Size of test file system is 40 Gb, number of cg=5324, size of cg is 8m, block size = 8k, fragment size = 1k OpenBSD-current from Dec 2000 with BUFCACHEPERCENT=50 You can get more info about the test systems and methods at: http://www.ptci.ru/gluk/dirpref/old/dirpref.html Test Results tar -xzf ports.tar.gz rm -rf ports mode old dirpref new dirpref speedup old dirprefnew dirpref speedup First system normal 667 472 1.41 477 331 1.44 async 285 144 1.98 130 14 9.29 sync 768 616 1.25 477 334 1.43 softdep 413 252 1.64 241 38 6.34 Second system normal 329 81 4.06 263.5 93.5 2.81 async 302 25.7 11.75 112 2.26 49.56 sync 281 57.0 4.93 263 90.5 2.9 softdep 341 40.6 8.4 284 4.76 59.66 "old dirpref" and "new dirpref" columns give a test time in seconds. speedup - speed increasement in times, ie. old dirpref / new dirpref. ------ Algorithm description The old dirpref algorithm is described in comments: /* * Find a cylinder to place a directory. * * The policy implemented by this algorithm is to select from * among those cylinder groups with above the average number of * free inodes, the one with the smallest number of directories. */ A new directory is allocated in a different cylinder groups than its parent directory resulting in a directory tree that is spreaded across all the cylinder groups. This spreading out results in a non-optimal access to the directories and files. When we have a small filesystem it is not a problem but when the filesystem is big then perfomance degradation becomes very apparent. What I mean by a big file system ? 1. A big filesystem is a filesystem which occupy 20-30 or more percent of total drive space, i.e. first and last cylinder are physically located relatively far from each other. 2. It has a relatively large number of cylinder groups, for example more cylinder groups than 50% of the buffers in the buffer cache. The first results in long access times, while the second results in many buffers being used by metadata operations. Such operations use cylinder group blocks and on-disk inode blocks. The cylinder group block (fs->fs_cblkno) contains struct cg, inode and block bit maps. It is 2k in size for the default filesystem parameters. If new and parent directories are located in different cylinder groups then the system performs more input/output operations and uses more buffers. On filesystems with many cylinder groups, lots of cache buffers are used for metadata operations. My solution for this problem is very simple. I allocate many directories in one cylinder group. I also do some things, so that the new allocation method does not cause excessive fragmentation and all directory inodes will not be located at a location far from its file's inodes and data. The algorithm is: /* * Find a cylinder group to place a directory. * * The policy implemented by this algorithm is to allocate a * directory inode in the same cylinder group as its parent * directory, but also to reserve space for its files inodes * and data. Restrict the number of directories which may be * allocated one after another in the same cylinder group * without intervening allocation of files. * * If we allocate a first level directory then force allocation * in another cylinder group. */ My early versions of dirpref give me a good results for a wide range of file operations and different filesystem capacities except one case: those applications that create their entire directory structure first and only later fill this structure with files. My solution for such and similar cases is to limit a number of directories which may be created one after another in the same cylinder group without intervening file creations. For this purpose, I allocate an array of counters at mount time. This array is linked to the superblock fs->fs_contigdirs[cg]. Each time a directory is created the counter increases and each time a file is created the counter decreases. A 60Gb filesystem with 8mb/cg requires 10kb of memory for the counters array. The maxcontigdirs is a maximum number of directories which may be created without an intervening file creation. I found in my tests that the best performance occurs when I restrict the number of directories in one cylinder group such that all its files may be located in the same cylinder group. There may be some deterioration in performance if all the file inodes are in the same cylinder group as its containing directory, but their data partially resides in a different cylinder group. The maxcontigdirs value is calculated to try to prevent this condition. Since there is no way to know how many files and directories will be allocated later I added two optimization parameters in superblock/tunefs. They are: int32_t fs_avgfilesize; /* expected average file size */ int32_t fs_avgfpdir; /* expected # of files per directory */ These parameters have reasonable defaults but may be tweeked for special uses of a filesystem. They are only necessary in rare cases like better tuning a filesystem being used to store a squid cache. I have been using this algorithm for about 3 months. I have done a lot of testing on filesystems with different capacities, average filesize, average number of files per directory, and so on. I think this algorithm has no negative impact on filesystem perfomance. It works better than the default one in all cases. The new dirpref will greatly improve untarring/removing/coping of big directories, decrease load on cvs servers and much more. The new dirpref doesn't speedup a compilation process, but also doesn't slow it down. Obtained from: Grigoriy Orlov <gluk@ptci.ru>
|
#
812b1d41 |
|
20-Mar-2001 |
Kirk McKusick <mckusick@FreeBSD.org> |
Add kernel support for running fsck on active filesystems.
|
#
589c7af9 |
|
07-Mar-2001 |
Kirk McKusick <mckusick@FreeBSD.org> |
Fixes to track snapshot copy-on-write checking in the specinfo structure rather than assuming that the device vnode would reside in the FFS filesystem (which is obviously a broken assumption with the device filesystem).
|
#
f55ff3f3 |
|
15-Jan-2001 |
Ian Dowse <iedowse@FreeBSD.org> |
The ffs superblock includes a 128-byte region for use by temporary in-core pointers to summary information. An array in this region (fs_csp) could overflow on filesystems with a very large number of cylinder groups (~16000 on i386 with 8k blocks). When this happens, other fields in the superblock get corrupted, and fsck refuses to check the filesystem. Solve this problem by replacing the fs_csp array in 'struct fs' with a single pointer, and add padding to keep the length of the 128-byte region fixed. Update the kernel and userland utilities to use just this single pointer. With this change, the kernel no longer makes use of the superblock fields 'fs_csshift' and 'fs_csmask'. Add a comment to newfs/mkfs.c to indicate that these fields must be calculated for compatibility with older kernels. Reviewed by: mckusick
|
#
22e5a623 |
|
03-Jul-2000 |
Kirk McKusick <mckusick@FreeBSD.org> |
Get userland visible flags added for snapshots to give a few days advance preparation for them to get migrated into place so that subsequent changes in utilities will not fail to compile for lack of up-to-date header files in /usr/include.
|
#
584508a7 |
|
16-Mar-2000 |
Kirk McKusick <mckusick@FreeBSD.org> |
Use 64-bit math to calculate if we have hit our freespace limit. Necessary for coherent results on filesystems bigger than 0.5Tb.
|
#
c3aac50f |
|
27-Aug-1999 |
Peter Wemm <peter@FreeBSD.org> |
$Id$ -> $FreeBSD$
|
#
b1897c19 |
|
08-Mar-1998 |
Julian Elischer <julian@FreeBSD.org> |
Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman) Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
|
#
90b42d7e |
|
23-Mar-1997 |
Bruce Evans <bde@FreeBSD.org> |
Fixed corrupted newline and corrupted tab in previous commit.
|
#
8f89943e |
|
23-Mar-1997 |
Guido van Rooij <guido@FreeBSD.org> |
Add generation number randomization. Newly created filesystems wil now automatically have random generation numbers. The kenel way of handling those also changed. Further it is advised to run fsirand on all your nfs exported filesystems. the code is mostly copied from OpenBSD, with the randomization chanegd to use /dev/urandom Reviewed by: Garrett Obtained from: OpenBSD
|
#
6875d254 |
|
22-Feb-1997 |
Peter Wemm <peter@FreeBSD.org> |
Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not ready for it yet.
|
#
996c772f |
|
09-Feb-1997 |
John Dyson <dyson@FreeBSD.org> |
This is the kernel Lite/2 commit. There are some requisite userland changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
|
#
1130b656 |
|
14-Jan-1997 |
Jordan K. Hubbard <jkh@FreeBSD.org> |
Make the long-awaited change from $Id$ to $FreeBSD$ This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
|
#
ccb17497 |
|
12-Oct-1996 |
Bruce Evans <bde@FreeBSD.org> |
Fixed lblktosize(). It overflowed at 2G. This bug only affected ufs_read() and ufs_write(). Found by: looking at warnings for comparing the result of lblktosize() (which is usually daddr_t = long) with file sizes (which are u_quad_t for ufs). File sizes should probably be off_t's to avoid warnings when the are compared with file offsets, so the fixed lblktosize() casts to off_t instead of u_quad_t. Added definition of smalllblksize(). It is the same as the old lblksize() and is more efficient for small block numbers on 32-bit machines. Use smalllblktosize() instead of its expansion in blksize() and dblksize(). This keeps the line length short and makes it more obvious that the shift can't overflow.
|
#
e1eec28a |
|
11-Mar-1996 |
Peter Wemm <peter@FreeBSD.org> |
Import 4.4BSD-Lite2 onto the vendor branch, note that in the kernel, all files are off the vendor branch, so this should not change anything. A "U" marker generally means that the file was not changed in between the 4.4Lite and Lite-2 releases, and does not need a merge. "C" generally means that there was a change.
|
#
6c5e9bbd |
|
30-Jan-1996 |
Mike Pritchard <mpp@FreeBSD.org> |
Fix a bunch of spelling errors in the comment fields of a bunch of system include files.
|
#
9b2e5354 |
|
30-May-1995 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
Remove trailing whitespace.
|
#
07be9bd4 |
|
10-Mar-1995 |
David Greenman <dg@FreeBSD.org> |
Increased default minfree to 8%.
|
#
eb9fb78c |
|
21-Aug-1994 |
Paul Richards <paul@FreeBSD.org> |
Made idempotent Reviewed by: Submitted by:
|
#
3c4dd356 |
|
02-Aug-1994 |
David Greenman <dg@FreeBSD.org> |
Added $Id$
|
#
df8bae1d |
|
24-May-1994 |
Rodney W. Grimes <rgrimes@FreeBSD.org> |
BSD 4.4 Lite Kernel Sources
|