Cross Reference: /linux-master/fs/xfs/scrub/bmap.c

History log of /linux-master/fs/xfs/scrub/bmap.c
Revision	Date	Author	Comments
# 5049ff4d	22-Feb-2024	Darrick J. Wong <djwong@kernel.org>	xfs: create a helper to decide if a file mapping targets the rt volume Create a helper so that we can stop open-coding this decision everywhere. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 8f71bede	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: repair inode fork block mapping data structures Use the reverse-mapping btree information to rebuild an inode block map. Update the btree bulk loading code as necessary to support inode rooted btrees and fix some bitrot problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# c3a22c2e	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: skip the rmapbt search on an empty attr fork unless we know it was zapped The attribute fork scrubber can optionally scan the reverse mapping records of the filesystem to determine if the fork is missing mappings that it should have. However, this is a very expensive operation, so we only want to do this if we suspect that the fork is missing records. For attribute forks the criteria for suspicion is that the attr fork is in EXTENTS format and has zero extents. However, there are several ways that a file can end up in this state through regular filesystem usage. For example, an LSM can set a s_security hook but then decide not to set an ACL; or an attr set can create the attr fork but then the actual set operation fails with ENOSPC; or we can delete all the attrs on a file whose data fork is in btree format, in which case we do not delete the attr fork. We don't want to run the expensive check for any case that can be arrived at through regular operations. However. When online inode repair decides to zap an attribute fork, it cannot determine if it is zapping ACL information. As a precaution it removes all the discretionary access control permissions and sets the user and group ids to zero. Check these three additional conditions to decide if we want to scan the rmap records. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# d9041681	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: set inode sick state flags when we zap either ondisk fork In a few patches, we'll add some online repair code that tries to massage the ondisk inode record just enough to get it to pass the inode verifiers so that we can continue with more file repairs. Part of that massaging can include zapping the ondisk forks to clear errors. After that point, the bmap fork repair functions will rebuild the zapped forks. Christoph asked for stronger protections against online repair zapping a fork to get the inode to load vs. other threads trying to access the partially repaired file. Do this by adding a special "[DA]FORK_ZAPPED" inode health flag whenever repair zaps a fork, and sprinkling checks for that flag into the various file operations for things that don't like handling an unexpected zero-extents fork. In practice xfs_scrub will scrub and fix the forks almost immediately after zapping them, so the window is very small. However, if a crash or unmount should occur, we can still detect these zapped inode forks by looking for a zero-extents fork when data was expected. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 259ba1d3	15-Dec-2023	Darrick J. Wong <djwong@kernel.org>	xfs: try to attach dquots to files before repairing them Inode resource usage is tracked in the quota metadata. Repairing a file might change the resources used by that file, which means that we need to attach dquots to the file that we're examining before accessing anything in the file protected by the ILOCK. However, there's a twist: a dquot cache miss requires the dquot to be read in from the quota file, during which we drop the ILOCK on the file being examined. This means that we must try to attach the dquots before taking the ILOCK. Therefore, dquots must be attached to files in the scrub setup function. If doing so yields corruption errors (or unknown dquot errors), we instead clear the quotachecked status, which will cause a quotacheck on next mount. A future series will make this trigger live quotacheck. While we're here, change the xrep_ino_dqattach function to use the unlocked dqattach functions so that we avoid cycling the ILOCK if the inode already has dquots attached. This makes the naming and locking requirements consistent with the rest of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 3d2b6d03	16-Oct-2023	Darrick J. Wong <djwong@kernel.org>	xfs: rename xfs_verify_rtext to xfs_verify_rtbext This helper function validates that a range of blocks in the realtime section is completely contained within the realtime section. It does /not/ validate ranges of rtextents. Rename the function to avoid suggesting that it does, and change the type of the @len parameter since xfs_rtblock_t is a position unit, not a length unit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>
# e27a1369	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: don't check reflink iflag state when checking cow fork Any inode on a reflink filesystem can have a cow fork, even if the inode does not have the reflink iflag set. This happens either because the inode once had the iflag set but does not now, because we don't free the incore cow fork until the icache deletes the inode; or because we're running in alwayscow mode. Either way, we can collapse both of the xfs_is_reflink_inode calls into one, and change it to xfs_has_reflink, now that the bmap checker will return ENOENT if there is no pointer to the incore fork. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 65092ca1	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: simplify returns in xchk_bmap Remove the pointless goto and return code in xchk_bmap, since it only serves to obscure what's going on in the function. Instead, return whichever error code is appropriate there. For nonexistent forks, this should have been ENOENT. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 294012fb	10-Aug-2023	Darrick J. Wong <djwong@kernel.org>	xfs: wrap ilock/iunlock operations on sc->ip Scrub tracks the resources that it's holding onto in the xfs_scrub structure. This includes the inode being checked (if applicable) and the inode lock state of that inode. Replace the open-coded structure manipulation with a trivial helper to eliminate sources of error. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 6be73cec	04-Jun-2023	Darrick J. Wong <djwong@kernel.org>	xfs: fix broken logic when detecting mergeable bmap records Commit 6bc6c99a944c was a well-intentioned effort to initiate consolidation of adjacent bmbt mapping records by setting the PREEN flag. Consolidation can only happen if the length of the combined record doesn't overflow the 21-bit blockcount field of the bmbt recordset. Unfortunately, the length test is inverted, leading to it triggering on data forks like these: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..16777207]: 76110848..92888055 0 (76110848..92888055) 16777208 1: [16777208..20639743]: 92888056..96750591 0 (92888056..96750591) 3862536 Note that record 0 has a length of 16777208 512b blocks. This corresponds to 2097151 4k fsblocks, which is the maximum. Hence the two records cannot be merged. However, the logic is still wrong even if we change the in-loop comparison, because the scope of our examination isn't broad enough inside the loop to detect mappings like this: 0: [0..9]: 76110838..76110847 0 (76110838..76110847) 10 1: [10..16777217]: 76110848..92888055 0 (76110848..92888055) 16777208 2: [16777218..20639753]: 92888056..96750591 0 (92888056..96750591) 3862536 These three records could be merged into two, but one cannot determine this purely from looking at records 0-1 or 1-2 in isolation. Hoist the mergability detection outside the loop, and base its decision making on whether or not a merged mapping could be expressed in fewer bmbt records. While we're at it, fix the incorrect return type of the iter function. Fixes: 336642f79283 ("xfs: alert the user about data/attr fork mappings that could be merged") Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
# 397b2d7e	01-May-2023	Darrick J. Wong <djwong@kernel.org>	xfs: flush dirty data and drain directios before scrubbing cow fork When we're scrubbing the COW fork, we need to take MMAPLOCK_EXCL to prevent page_mkwrite from modifying any inode state. The ILOCK should suffice to avoid confusing online fsck, but let's take the same locks that we do everywhere else. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
# 1e59fdb7	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: don't call xchk_bmap_check_rmaps for btree-format file forks The logic at the end of xchk_bmap_want_check_rmaps tries to detect a file fork that has been zapped by what will become the online inode repair code. Zapped forks are in FMT_EXTENTS with zero extents, and some sort of hint that there's supposed to be data somewhere in the filesystem. Unfortunately, the inverted logic here is confusing and has the effect that we always call xchk_bmap_check_rmaps for FMT_BTREE forks. This is horribly inefficient and unnecessary, so invert the logic to get rid of this performance problem. This has caused 8h delays in generic/333 and generic/334. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# e8882f69	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: split the xchk_bmap_check_rmaps into a predicate This function has two parts: the second part scans every reverse mapping record for this file fork to make sure that there's a corresponding mapping in the fork, and the first part decides if we even want to do that. Split the first part into a separate predicate so that we can make more changes to it in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 336642f7	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: alert the user about data/attr fork mappings that could be merged If the data or attr forks have mappings that could be merged, let the user know that the structure could be optimized. This isn't a filesystem corruption since the regular filesystem does not try to be smart about merging bmbt records. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# c0d5a92f	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: split xchk_bmap_xref_rmap into two functions There's more special-cased functionality than not in this function. Split it into two so that each can be far more cohesive. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 634d4a79	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: accumulate iextent records when checking bmap Currently, the bmap scrubber checks file fork mappings individually. In the case that the file uses multiple mappings to a single contiguous piece of space, the scrubber repeatedly locks the AG to check the existence of a reverse mapping that overlaps this file mapping. If the reverse mapping starts before or ends after the mapping we're checking, it will also crawl around in the bmbt checking correspondence for adjacent extents. This is not very time efficient because it does the crawling while holding the AGF buffer, and checks the middle mappings multiple times. Instead, create a custom iextent record iterator function that combines multiple adjacent allocated mappings into one large incore bmbt record. This is feasible because the incore bmbt record length is 64-bits wide. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 971ee3a6	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: change bmap scrubber to store the previous mapping Convert the inode data/attr/cow fork scrubber to remember the entire previous mapping, not just the next expected offset. No behavior changes here, but this will enable some better checking in subsequent patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 1fc7a059	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: don't take the MMAPLOCK when scrubbing file metadata The MMAPLOCK stabilizes mappings in a file's pagecache. Therefore, we do not need it to check directories, symlinks, extended attributes, or file-based metadata. Reduce its usage to the one case that requires it, which is when we want to scrub the data fork of a regular file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 46e0dd89	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: rename xchk_get_inode -> xchk_iget_for_scrubbing Dave Chinner suggested renaming this function to make more obvious what it does. The function returns an incore inode to callers that want to scrub a metadata structure that hangs off an inode. If the iget fails with EINVAL, it will single-step the loading process to distinguish between actually free inodes or impossible inumbers (ENOENT); discrepancies between the inobt freemask and the free status in the inode record (EFSCORRUPTED). Any other negative errno is returned unchanged. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 30f8ee5e	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: ensure that single-owner file blocks are not owned by others For any file fork mapping that can only have a single owner, make sure that there are no other rmap owners for that mapping. This patch requires the more detailed checking provided by xfs_rmap_count_owners so that we can know how many rmap records for a given range of space had a matching owner, how many had a non-matching owner, and how many conflicted with the records that have a matching owner. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 7ac14fa2	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: ensure that all metadata and data blocks are not cow staging extents Make sure that all filesystem metadata blocks and file data blocks are not also marked as CoW staging extents. The extra checking added here was inspired by an actual VM host filesystem corruption incident due to bugs in the CoW handling of 4.x kernels. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 69010fe3	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: standardize ondisk to incore conversion for bmap btrees Fix all xfs_bmbt_disk_get_all callsites to call xfs_bmap_validate_extent and bubble up corruption reports. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 466c525d	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: minimize overhead of drain wakeups by using jump labels To reduce the runtime overhead even further when online fsck isn't running, use a static branch key to decide if we call wake_up on the drain. For compilers that support jump labels, the call to wake_up is replaced by a nop sled when nobody is waiting for intents to drain. From my initial microbenchmarking, every transition of the static key between the on and off states takes about 22000ns to complete; this is paid entirely by the xfs_scrub process. When the static key is off (which it should be when fsck isn't running), the nop sled adds an overhead of approximately 0.36ns to runtime code. The post-atomic lockless waiter check adds about 0.03ns, which is basically free. For the few compilers that don't support jump labels, runtime code pays the cost of calling wake_up on an empty waitqueue, which was observed to be about 30ns. However, most architectures that have sufficient memory and CPU capacity to run XFS also support jump labels, so this is not much of a worry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# ecc73f8a	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: update copyright years for scrub/ files Update the copyright years in the scrub/ source code files. This isn't required, but it's helpful to remind myself just how long it's taken to develop this feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 739a2fe0	11-Apr-2023	Darrick J. Wong <djwong@kernel.org>	xfs: fix author and spdx headers on scrub/ files Fix the spdx tags to match current practice, and update the author contact information. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# c4d5660a	12-Feb-2023	Dave Chinner <dchinner@redhat.com>	xfs: active perag reference counting We need to be able to dynamically remove instantiated AGs from memory safely, either for shrinking the filesystem or paging AG state in and out of memory (e.g. supporting millions of AGs). This means we need to be able to safely exclude operations from accessing perags while dynamic removal is in progress. To do this, introduce the concept of active and passive references. Active references are required for high level operations that make use of an AG for a given operation (e.g. allocation) and pin the perag in memory for the duration of the operation that is operating on the perag (e.g. transaction scope). This means we can fail to get an active reference to an AG, hence callers of the new active reference API must be able to handle lookup failure gracefully. Passive references are used in low level code, where we might need to access the perag structure for the purposes of completing high level operations. For example, buffers need to use passive references because: - we need to be able to do metadata IO during operations like grow and shrink transactions where high level active references to the AG have already been blocked - buffers need to pin the perag until they are reclaimed from memory, something that high level code has no direct control over. - unused cached buffers should not prevent a shrink from being started. Hence we have active references that will form exclusion barriers for operations to be performed on an AG, and passive references that will prevent reclaim of the perag until all objects with passive references have been reclaimed themselves. This patch introduce xfs_perag_grab()/xfs_perag_rele() as the API for active AG reference functionality. We also need to convert the for_each_perag*() iterators to use active references, which will start the process of converting high level code over to using active references. Conversion of non-iterator based code to active references will be done in followup patches. Note that the implementation using reference counting is really just a development vehicle for the API to ensure we don't have any leaks in the callers. Once we need to remove perag structures from memory dyanmically, we will need a much more robust per-ag state transition mechanism for preventing new references from being taken while we wait for existing references to drain before removal from memory can occur.... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Allison Henderson <allison.henderson@oracle.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
# 5eef4635	06-Nov-2022	Darrick J. Wong <djwong@kernel.org>	xfs: teach scrub to flag non-extents format cow forks CoW forks only exist in memory, which means that they can only ever have an incore extent tree. Hence they must always be FMT_EXTENTS, so check this when we're scrubbing them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 31785537	06-Nov-2022	Darrick J. Wong <djwong@kernel.org>	xfs: check that CoW fork extents are not shared Ensure that extents in an inode's CoW fork are not marked as shared in the refcount btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 830ffa09	06-Nov-2022	Darrick J. Wong <djwong@kernel.org>	xfs: block map scrub should handle incore delalloc reservations Enhance the block map scrubber to check delayed allocation reservations. Though there are no physical space allocations to check, we do need to make sure that the range of file offsets being mapped are correct, and to bump the lastoff cursor so that key order checking works correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 6a577786	06-Nov-2022	Darrick J. Wong <djwong@kernel.org>	xfs: teach scrub to check for adjacent bmaps when rmap larger than bmap When scrub is checking file fork mappings against rmap records and the rmap record starts before or ends after the bmap record, check the adjacent bmap records to make sure that they're adjacent to the one we're checking. This helps us to detect cases where the rmaps cover territory that the bmaps do not. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 033985b6	06-Nov-2022	Darrick J. Wong <djwong@kernel.org>	xfs: fix perag loop in xchk_bmap_check_rmaps sparse complains that we can return an uninitialized error from this function and that pag could be uninitialized. We know that there are no zero-AG filesystems and hence we had to call xchk_bmap_check_ag_rmaps at least once, so this is not actually possible, but I'm too worn out from automated complaints from unsophisticated AIs so let's just fix this and move on to more interesting problems, eh? Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 732436ef	09-Jul-2022	Darrick J. Wong <djwong@kernel.org>	xfs: convert XFS_IFORK_PTR to a static inline helper We're about to make this logic do a bit more, so convert the macro to a static inline function for better typechecking and fewer shouty macros. No functional changes here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 08d3e84f	07-Jul-2022	Dave Chinner <dchinner@redhat.com>	xfs: pass perag to xfs_alloc_read_agf() xfs_alloc_read_agf() initialises the perag if it hasn't been done yet, so it makes sense to pass it the perag rather than pull a reference from the buffer. This allows callers to be per-ag centric rather than passing mount/agno pairs everywhere. Whilst modifying the xfs_reflink_find_shared() function definition, declare it static and remove the extern declaration as it is an internal function only these days. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
# 5b7ca8b3	25-Apr-2022	Darrick J. Wong <djwong@kernel.org>	xfs: simplify xfs_rmap_lookup_le call sites Most callers of xfs_rmap_lookup_le will retrieve the btree record immediately if the lookup succeeds. The overlapped version of this function (xfs_rmap_lookup_le_range) will return the record if the lookup succeeds, so make the regular version do it too. Get rid of the useless len argument, since it's not part of the lookup key. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
# 95f0b95e	08-Aug-2021	Chandan Babu R <chandan.babu@oracle.com>	xfs: Define max extent length based on on-disk format definition The maximum extent length depends on maximum block count that can be stored in a BMBT record. Hence this commit defines MAXEXTLEN based on BMBT_BLOCKCOUNT_BITLEN. While at it, the commit also renames MAXEXTLEN to XFS_MAX_BMBT_EXTLEN. Suggested-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com>
# 6ca444cf	16-Sep-2021	Darrick J. Wong <djwong@kernel.org>	xfs: prepare xfs_btree_cur for dynamic cursor heights Split out the btree level information into a separate struct and put it at the end of the cursor structure as a VLA. Files with huge data forks (and in the future, the realtime rmap btree) will require the ability to support many more levels than a per-AG btree cursor, which means that we're going to create per-btree type cursor caches to conserve memory for the more common case. Note that a subsequent patch actually introduces dynamic cursor heights. This one merely rearranges the structure to prepare for that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com>
# 61e0d0cc	19-Aug-2021	Darrick J. Wong <djwong@kernel.org>	xfs: fix perag structure refcounting error when scrub fails The kernel test robot found the following bug when running xfs/355 to scrub a bmap btree: XFS: Assertion failed: !sa->pag, file: fs/xfs/scrub/common.c, line: 412 ------------[ cut here ]------------ kernel BUG at fs/xfs/xfs_message.c:110! invalid opcode: 0000 [#1] SMP PTI CPU: 2 PID: 1415 Comm: xfs_scrub Not tainted 5.14.0-rc4-00021-g48c6615cc557 #1 Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 RIP: 0010:assfail+0x23/0x28 [xfs] RSP: 0018:ffffc9000aacb890 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffffc9000aacbcc8 RCX: 0000000000000000 RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffc09e7dcd RBP: ffffc9000aacbc80 R08: ffff8881fdf17d50 R09: 0000000000000000 R10: 000000000000000a R11: f000000000000000 R12: 0000000000000000 R13: ffff88820c7ed000 R14: 0000000000000001 R15: ffffc9000aacb980 FS: 00007f185b955700(0000) GS:ffff8881fdf00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f7f6ef43000 CR3: 000000020de38002 CR4: 00000000001706e0 Call Trace: xchk_ag_read_headers+0xda/0x100 [xfs] xchk_ag_init+0x15/0x40 [xfs] xchk_btree_check_block_owner+0x76/0x180 [xfs] xchk_btree_get_block+0xd0/0x140 [xfs] xchk_btree+0x32e/0x440 [xfs] xchk_bmap_btree+0xd4/0x140 [xfs] xchk_bmap+0x1eb/0x3c0 [xfs] xfs_scrub_metadata+0x227/0x4c0 [xfs] xfs_ioc_scrub_metadata+0x50/0xc0 [xfs] xfs_file_ioctl+0x90c/0xc40 [xfs] __x64_sys_ioctl+0x83/0xc0 do_syscall_64+0x3b/0xc0 The unusual handling of errors while initializing struct xchk_ag is the root cause here. Since the beginning of xfs_scrub, the goal of xchk_ag_read_headers has been to read all three AG header buffers and attach them both to the xchk_ag structure and the scrub transaction. Corruption errors on any of the three headers doesn't necessarily trigger an immediate return to userspace, because xfs_scrub can also tell us to /fix/ the problem. In other words, it's possible for the xchk_ag init functions to return an error code and a partially filled out structure so that scrub can use however much information it managed to pull. Before 5.15, it was sufficient to cancel (or commit) the scrub transaction on the way out of the scrub code to release the buffers. Ccommit 48c6615cc557 added a reference to the perag structure to struct xchk_ag. Since perag structures are not attached to transactions like buffers are, this adds the requirement that the perag ref be released explicitly. The scrub teardown function xchk_teardown was amended to do this for the xchk_ag embedded in struct xfs_scrub. Unfortunately, I forgot that certain parts of the scrub code probe multiple AGs and therefore handle the initialization and cleanup on their own. Specifically, the bmbt scrubber will initialize it long enough to cross-reference AG metadata for btree blocks and for the extent mappings in the bmbt. If one of the AG headers is corrupt, the init function returns with a live perag structure reference and some of the AG header buffers. If an error occurs, the cross referencing will be noted as XCORRUPTion and skipped, but the main scrub process will move on to the next record. It is now necessary to release the perag reference before we try to analyze something from a different AG, or else we'll trip over the assertion noted above. Fixes: 48c6615cc557 ("xfs: grab active perag ref when reading AG headers") Reported-by: kernel test robot <oliver.sang@intel.com> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
# ebd9027d	18-Aug-2021	Dave Chinner <dchinner@redhat.com>	xfs: convert xfs_sb_version_has checks to use mount features This is a conversion of the remaining xfs_sb_version_has..(sbp) checks to use xfs_has_..(mp) feature checks. This was largely done with a vim replacement macro that did: :0,$s/xfs_sb_version_has\(.\)&\(.\)->m_sb/xfs_has_\1\2/g<CR> A couple of other variants were also used, and the rest touched up by hand. $ size -t fs/xfs/built-in.a text data bss dec hex filename before 1127533 311352 484 1439369 15f689 (TOTALS) after 1125360 311352 484 1437196 15ee0c (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
# 38c26bfd	18-Aug-2021	Dave Chinner <dchinner@redhat.com>	xfs: replace xfs_sb_version checks with feature flag checks Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/.c`; do sed -i -e 's/xfs_sb_version_has\(.\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is ia lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filenam before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
# 22ece4e8	10-Aug-2021	Darrick J. Wong <djwong@kernel.org>	xfs: mark the record passed into xchk_btree functions as const xchk_btree calls a user-supplied function to validate each btree record that it finds. Those functions are not supposed to change the record data, so mark the parameter const. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de>