History log of /linux-master/fs/f2fs/recovery.c
Revision Date Author Comments
# 968c4f72 06-May-2024 Chao Yu <chao@kernel.org>

f2fs: remove unused GC_FAILURE_PIN

After commit 3db1de0e582c ("f2fs: change the current atomic write way"),
we removed all GC_FAILURE_ATOMIC usage, let's change i_gc_failures[]
array to i_pin_failure for cleanup.

Meanwhile, let's define i_current_depth and i_gc_failures as union
variable due to they won't be valid at the same time.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 31f85ccc 07-Mar-2024 Zhiguo Niu <zhiguo.niu@unisoc.com>

f2fs: unify the error handling of f2fs_is_valid_blkaddr

There are some cases of f2fs_is_valid_blkaddr not handled as
ERROR_INVALID_BLKADDR,so unify the error handling about all of
f2fs_is_valid_blkaddr.
Do f2fs_handle_error in __f2fs_is_valid_blkaddr for cleanup.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 28f66cc6 01-Mar-2024 Zhiguo Niu <zhiguo.niu@unisoc.com>

f2fs: fix to check return value __allocate_new_segment

__allocate_new_segment may return error when get_new_segment
fails, so its caller should check its return value.

Signed-off-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d4c5938 16-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix write pointers all the time

Even if the roll forward recovery stopped due to any error, we have to fix
the write pointers in order to mount the disk from the previous checkpoint.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a60108f7 06-Feb-2024 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use BLKS_PER_SEG, BLKS_PER_SEC, and SEGS_PER_SEC

No functional change.

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 21ec6823 24-Jan-2024 Chao Yu <chao@kernel.org>

f2fs: fix to avoid potential panic during recovery

During recovery, if FAULT_BLOCK is on, it is possible that
f2fs_reserve_new_block() will return -ENOSPC during recovery,
then it may trigger panic.

Also, if fault injection rate is 1 and only FAULT_BLOCK fault
type is on, it may encounter deadloop in loop of block reservation.

Let's change as below to fix these issues:
- remove bug_on() to avoid panic.
- limit the loop count of block reservation to avoid potential
deadloop.

Fixes: 956fa1ddc132 ("f2fs: fix to check return value of f2fs_reserve_new_block()")
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9dad4d96 02-Dec-2023 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix write pointers on zoned device after roll forward

1. do roll forward recovery
2. update current segments pointers
3. fix the entire zones' write pointers
4. do checkpoint

Reviewed-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 956fa1dd 15-Nov-2023 Chao Yu <chao@kernel.org>

f2fs: fix to check return value of f2fs_reserve_new_block()

Let's check return value of f2fs_reserve_new_block() in do_recover_data()
rather than letting it fails silently.

Also refactoring check condition on return value of f2fs_reserve_new_block()
as below:
- trigger f2fs_bug_on() only for ENOSPC case;
- use do-while statement to avoid redundant codes;

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 11cc6426 04-Oct-2023 Jeff Layton <jlayton@kernel.org>

f2fs: convert to new timestamp accessors

Convert to using the new inode timestamp accessor functions.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://lore.kernel.org/r/20231004185347.80880-34-jlayton@kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


# eb61c2cc 07-Aug-2023 Chao Yu <chao@kernel.org>

f2fs: fix to account cp stats correctly

cp_foreground_calls sysfs entry shows total CP call count rather than
foreground CP call count, fix it.

Fixes: fc7100ea2a52 ("f2fs: Add f2fs stats to sysfs")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c62ebd35 05-Jul-2023 Jeff Layton <jlayton@kernel.org>

f2fs: convert to ctime accessor functions

In later patches, we're going to change how the inode's ctime field is
used. Switch to using accessor functions instead of raw accesses of
inode->i_ctime.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Message-Id: <20230705190309.579783-41-jlayton@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>


# 3f8ac7da 16-Jun-2023 Colin Ian King <colin.i.king@gmail.com>

f2fs: remove redundant assignment to variable err

The assignment to variable err is redundant since the code jumps to
label next and err is then re-assigned a new value on the call to
sanity_check_node_chain. Remove the assignment.

Cleans up clang scan build warning:
fs/f2fs/recovery.c:464:6: warning: Value stored to 'err' is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 38a4a330 27-May-2023 Chunhai Guo <guochunhai@vivo.com>

f2fs: Detect looped node chain efficiently

find_fsync_dnodes() detect the looped node chain by comparing the loop
counter with free blocks. While it may take tens of seconds to quit when
the free blocks are large enough. We can use Floyd's cycle detection
algorithm to make the detection more efficient.

Signed-off-by: Chunhai Guo <guochunhai@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e1bb7d3d 02-Apr-2023 Chao Yu <chao@kernel.org>

f2fs: fix to recover quota data correctly

With -O quota mkfs option, xfstests generic/417 fails due to fsck detects
data corruption on quota inodes.

[ASSERT] (fsck_chk_quota_files:2051) --> Quota file is missing or invalid quota file content found.

The root cause is there is a hole f2fs doesn't hold quota inodes,
so all recovered quota data will be dropped due to SBI_POR_DOING
flag was set.
- f2fs_fill_super
- f2fs_recover_orphan_inodes
- f2fs_enable_quota_files
- f2fs_quota_off_umount
<--- quota inodes were dropped --->
- f2fs_recover_fsync_data
- f2fs_enable_quota_files
- f2fs_quota_off_umount

This patch tries to eliminate the hole by holding quota inodes
during entire recovery flow as below:
- f2fs_fill_super
- f2fs_recover_quota_begin
- f2fs_recover_orphan_inodes
- f2fs_recover_fsync_data
- f2fs_recover_quota_end

Then, recovered quota data can be persisted after SBI_POR_DOING
is cleared.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e67fe633 12-Jan-2023 Christian Brauner <brauner@kernel.org>

fs: port i_{g,u}id_into_vfs{g,u}id() to mnt_idmap

Convert to struct mnt_idmap.
Remove legacy file_mnt_user_ns() and mnt_user_ns().

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# f861646a 12-Jan-2023 Christian Brauner <brauner@kernel.org>

quota: port to mnt_idmap

Convert to struct mnt_idmap.

Last cycle we merged the necessary infrastructure in
256c8aed2b42 ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.

Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.

Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.

Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# 870af777 25-Nov-2022 Yangtao Li <frank.li@vivo.com>

f2fs: do some cleanup for f2fs module init

Just for cleanup, no functional changes.

Signed-off-by: Yangtao Li <frank.li@vivo.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 95fa90c9 28-Sep-2022 Chao Yu <chao@kernel.org>

f2fs: support recording errors into superblock

This patch supports to record detail reason of FSCORRUPTED error into
f2fs_super_block.s_errors[].

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c6ad7fd1 14-Sep-2022 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check on summary info

As Wenqing Liu reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=216456

BUG: KASAN: use-after-free in recover_data+0x63ae/0x6ae0 [f2fs]
Read of size 4 at addr ffff8881464dcd80 by task mount/1013

CPU: 3 PID: 1013 Comm: mount Tainted: G W 6.0.0-rc4 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Call Trace:
dump_stack_lvl+0x45/0x5e
print_report.cold+0xf3/0x68d
kasan_report+0xa8/0x130
recover_data+0x63ae/0x6ae0 [f2fs]
f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
f2fs_fill_super+0x4665/0x61e0 [f2fs]
mount_bdev+0x2cf/0x3b0
legacy_get_tree+0xed/0x1d0
vfs_get_tree+0x81/0x2b0
path_mount+0x47e/0x19d0
do_mount+0xce/0xf0
__x64_sys_mount+0x12c/0x1a0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The root cause is: in fuzzed image, SSA table is corrupted: ofs_in_node
is larger than ADDRS_PER_PAGE(), result in out-of-range access on 4k-size
page.

- recover_data
- do_recover_data
- check_index_in_prev_nodes
- f2fs_data_blkaddr

This patch adds sanity check on summary info in recovery and GC flow
in where the flows rely on them.

After patch:
[ 29.310883] F2FS-fs (loop0): Inconsistent ofs_in_node:65286 in summary, ino:0, nid:6, max:1018

Cc: stable@vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0ef4ca04 12-Sep-2022 Chao Yu <chao@kernel.org>

f2fs: fix to do sanity check on destination blkaddr during recovery

As Wenqing Liu reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=216456

loop5: detected capacity change from 0 to 131072
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): recover_inode: ino = 6, name = hln, inline = 1
F2FS-fs (loop5): recover_data: ino = 6 (i_size: recover) err = 0
F2FS-fs (loop5): Bitmap was wrongly set, blk:5634
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1013 at fs/f2fs/segment.c:2198
RIP: 0010:update_sit_entry+0xa55/0x10b0 [f2fs]
Call Trace:
<TASK>
f2fs_do_replace_block+0xa98/0x1890 [f2fs]
f2fs_replace_block+0xeb/0x180 [f2fs]
recover_data+0x1a69/0x6ae0 [f2fs]
f2fs_recover_fsync_data+0x120d/0x1fc0 [f2fs]
f2fs_fill_super+0x4665/0x61e0 [f2fs]
mount_bdev+0x2cf/0x3b0
legacy_get_tree+0xed/0x1d0
vfs_get_tree+0x81/0x2b0
path_mount+0x47e/0x19d0
do_mount+0xce/0xf0
__x64_sys_mount+0x12c/0x1a0
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd

If we enable CONFIG_F2FS_CHECK_FS config, it will trigger a kernel panic
instead of warning.

The root cause is: in fuzzed image, SIT table is inconsistent with inode
mapping table, result in triggering such warning during SIT table update.

This patch introduces a new flag DATA_GENERIC_ENHANCE_UPDATE, w/ this
flag, data block recovery flow can check destination blkaddr's validation
in SIT table, and skip f2fs_replace_block() to avoid inconsistent status.

Cc: stable@vger.kernel.org
Reported-by: Wenqing Liu <wenqingliu0120@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b27c82e1 21-Jun-2022 Christian Brauner <brauner@kernel.org>

attr: port attribute changes to new types

Now that we introduced new infrastructure to increase the type safety
for filesystems supporting idmapped mounts port the first part of the
vfs over to them.

This ports the attribute changes codepaths to rely on the new better
helpers using a dedicated type.

Before this change we used to take a shortcut and place the actual
values that would be written to inode->i_{g,u}id into struct iattr. This
had the advantage that we moved idmappings mostly out of the picture
early on but it made reasoning about changes more difficult than it
should be.

The filesystem was never explicitly told that it dealt with an idmapped
mount. The transition to the value that needed to be stored in
inode->i_{g,u}id appeared way too early and increased the probability of
bugs in various codepaths.

We know place the same value in struct iattr no matter if this is an
idmapped mount or not. The vfs will only deal with type safe
vfs{g,u}id_t. This makes it massively safer to perform permission checks
as the type will tell us what checks we need to perform and what helpers
we need to use.

Fileystems raising FS_ALLOW_IDMAP can't simply write ia_vfs{g,u}id to
inode->i_{g,u}id since they are different types. Instead they need to
use the dedicated vfs{g,u}id_to_k{g,u}id() helpers that map the
vfs{g,u}id into the filesystem.

The other nice effect is that filesystems like overlayfs don't need to
care about idmappings explicitly anymore and can simply set up struct
iattr accordingly directly.

Link: https://lore.kernel.org/lkml/CAHk-=win6+ahs1EwLkcq8apqLi_1wXFWbrPf340zYEhObpz4jA@mail.gmail.com [1]
Link: https://lore.kernel.org/r/20220621141454.2914719-9-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# 71e7b535 21-Jun-2022 Christian Brauner <brauner@kernel.org>

quota: port quota helpers mount ids

Port the is_quota_modification() and dqout_transfer() helper to type
safe vfs{g,u}id_t. Since these helpers are only called by a few
filesystems don't introduce a new helper but simply extend the existing
helpers to pass down the mount's idmapping.

Note, that this is a non-functional change, i.e. nothing will have
happened here or at the end of this series to how quota are done! This
a change necessary because we will at the end of this series make
ownership changes easier to reason about by keeping the original value
in struct iattr for both non-idmapped and idmapped mounts.

For now we always pass the initial idmapping which makes the idmapping
functions these helpers call nops.

This is done because we currently always pass the actual value to be
written to i_{g,u}id via struct iattr. While this allowed us to treat
the {g,u}id values in struct iattr as values that can be directly
written to inode->i_{g,u}id it also increases the potential for
confusion for filesystems.

Now that we are have dedicated types to prevent this confusion we will
ultimately only map the value from the idmapped mount into a filesystem
value that can be written to inode->i_{g,u}id when the filesystem
actually updates the inode. So pass down the initial idmapping until we
finished that conversion at which point we pass down the mount's
idmapping.

Since struct iattr uses an anonymous union with overlapping types as
supported by the C standard, filesystems that haven't converted to
ia_vfs{g,u}id won't see any difference and things will continue to work
as before. In other words, no functional changes intended with this
change.

Link: https://lore.kernel.org/r/20220621141454.2914719-7-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Aleksa Sarai <cyphar@cyphar.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>


# 47c8ebcc 27-Jan-2022 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add a way to limit roll forward recovery time

This adds a sysfs entry to call checkpoint during fsync() in order to avoid
long elapsed time to run roll-forward recovery when booting the device.
Default value doesn't enforce the limitation which is same as before.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 430f163b 03-Feb-2022 Chao Yu <chao@kernel.org>

f2fs: adjust readahead block number during recovery

In a fragmented image, entries in dnode block list may locate in
incontiguous physical block address space, however, in recovery flow,
we will always readahead BIO_MAX_VECS size blocks, so in such case,
current readahead policy is low efficient, let's adjust readahead
window size dynamically based on consecutiveness of dnode blocks.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e4544b63 07-Jan-2022 Tim Murray <timmurray@google.com>

f2fs: move f2fs to use reader-unfair rwsems

f2fs rw_semaphores work better if writers can starve readers,
especially for the checkpoint thread, because writers are strictly
more important than reader threads. This prevents significant priority
inversion between low-priority readers that blocked while trying to
acquire the read lock and a second acquisition of the write lock that
might be blocking high priority work.

Signed-off-by: Tim Murray <timmurray@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5298d4bf 17-Jan-2022 Christoph Hellwig <hch@lst.de>

unicode: clean up the Kconfig symbol confusion

Turn the CONFIG_UNICODE symbol into a tristate that generates some always
built in code and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.

Note that a lot of the IS_ENABLED() checks could be turned from cpp
statements into normal ifs, but this change is intended to be fairly
mechanic, so that should be cleaned up later.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>


# a9419b63 13-Dec-2021 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not bother checkpoint by f2fs_get_node_info

This patch tries to mitigate lock contention between f2fs_write_checkpoint and
f2fs_get_node_info along with nat_tree_lock.

The idea is, if checkpoint is currently running, other threads that try to grab
nat_tree_lock would be better to wait for checkpoint.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4034247a 14-Jan-2022 NeilBrown <neilb@suse.de>

mm: introduce memalloc_retry_wait()

Various places in the kernel - largely in filesystems - respond to a
memory allocation failure by looping around and re-trying. Some of
these cannot conveniently use __GFP_NOFAIL, for reasons such as:

- a GFP_ATOMIC allocation, which __GFP_NOFAIL doesn't work on
- a need to check for the process being signalled between failures
- the possibility that other recovery actions could be performed
- the allocation is quite deep in support code, and passing down an
extra flag to say if __GFP_NOFAIL is wanted would be clumsy.

Many of these currently use congestion_wait() which (in almost all
cases) simply waits the given timeout - congestion isn't tracked for
most devices.

It isn't clear what the best delay is for loops, but it is clear that
the various filesystems shouldn't be responsible for choosing a timeout.

This patch introduces memalloc_retry_wait() with takes on that
responsibility. Code that wants to retry a memory allocation can call
this function passing the GFP flags that were used. It will wait
however is appropriate.

For now, it only considers __GFP_NORETRY and whatever
gfpflags_allow_blocking() tests. If blocking is allowed without
__GFP_NORETRY, then alloc_page either made some reclaim progress, or
waited for a while, before failing. So there is no need for much
further waiting. memalloc_retry_wait() will wait until the current
jiffie ends. If this condition is not met, then alloc_page() won't have
waited much if at all. In that case memalloc_retry_wait() waits about
200ms. This is the delay that most current loops uses.

linux/sched/mm.h needs to be included in some files now,
but linux/backing-dev.h does not.

Link: https://lkml.kernel.org/r/163754371968.13692.1277530886009912421@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: Darrick J. Wong <djwong@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 10a26878 28-Oct-2021 Chao Yu <chao@kernel.org>

f2fs: support fault injection for dquot_initialize()

This patch adds a new function f2fs_dquot_initialize() to wrap
dquot_initialize(), and it supports to inject fault into
f2fs_dquot_initialize() to simulate inner failure occurs in
dquot_initialize().

Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c02599f2 01-Sep-2021 Chao Yu <chao@kernel.org>

f2fs: avoid attaching SB_ACTIVE flag during mount

Quoted from [1]

"I do remember that I've added this code back then because otherwise
orphan cleanup was losing updates to quota files. But you're right
that now I don't see how that could be happening and it would be nice
if we could get rid of this hack"

[1] https://lore.kernel.org/linux-ext4/99cce8ca-e4a0-7301-840f-2ace67c551f3@huawei.com/T/#m04990cfbc4f44592421736b504afcc346b2a7c00

Related fix in ext4 by
commit 72ffb49a7b62 ("ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()").

f2fs has the same hack implementation in
- f2fs_recover_orphan_inodes()
- f2fs_recover_fsync_data()

Let's get rid of this hack as well in f2fs.

Cc: Zhang Yi <yi.zhang@huawei.com>
Cc: Jan Kara <jack@suse.cz>
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 32410577 08-Aug-2021 Chao Yu <chao@kernel.org>

f2fs: support fault injection for f2fs_kmem_cache_alloc()

This patch supports to inject fault into f2fs_kmem_cache_alloc().

Usage:
a) echo 32768 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=32768 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d9a2bb1 10-Jun-2021 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_casefolded_name slab cache

Add a slab cache: "f2fs_casefolded_name" for memory allocation
of casefold name.

Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cad83c96 07-May-2021 Chao Yu <chao@kernel.org>

f2fs: fix to avoid racing on fsync_entry_slab by multi filesystem instances

As syzbot reported, there is an use-after-free issue during f2fs recovery:

Use-after-free write at 0xffff88823bc16040 (in kfence-#10):
kmem_cache_destroy+0x1f/0x120 mm/slab_common.c:486
f2fs_recover_fsync_data+0x75b0/0x8380 fs/f2fs/recovery.c:869
f2fs_fill_super+0x9393/0xa420 fs/f2fs/super.c:3945
mount_bdev+0x26c/0x3a0 fs/super.c:1367
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1497
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3235
do_mount fs/namespace.c:3248 [inline]
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae

The root cause is multi f2fs filesystem instances can race on accessing
global fsync_entry_slab pointer, result in use-after-free issue of slab
cache, fixes to init/destroy this slab cache only once during module
init/destroy procedure to avoid this issue.

Reported-by: syzbot+9d90dad32dd9727ed084@syzkaller.appspotmail.com
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5f029c04 05-Apr-2021 Yi Zhuang <zhuangyi1@huawei.com>

f2fs: clean up build warnings

This patch combined the below three clean-up patches.

- modify open brace '{' following function definitions
- ERROR: spaces required around that ':'
- ERROR: spaces required before the open parenthesis '('
- ERROR: spaces prohibited before that ','
- Made suggested modifications from checkpatch in reference to WARNING:
Missing a blank line after declarations

Signed-off-by: Yi Zhuang <zhuangyi1@huawei.com>
Signed-off-by: Jia Yang <jiayang5@huawei.com>
Signed-off-by: Jack Qiu <jack.qiu@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8769918b 22-Nov-2020 Sahitya Tummala <stummala@codeaurora.org>

f2fs: change to use rwsem for cp_mutex

Use rwsem to ensure serialization of the callers and to avoid
starvation of high priority tasks, when the system is under
heavy IO workload.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7ad08a58 18-Nov-2020 Daniel Rosenberg <drosen@google.com>

f2fs: Handle casefolding with Encryption

Expand f2fs's casefolding support to include encrypted directories. To
index casefolded+encrypted directories, we use the SipHash of the
casefolded name, keyed by a key derived from the directory's fscrypt
master key. This ensures that the dirhash doesn't leak information
about the plaintext filenames.

Encryption keys are unavailable during roll-forward recovery, so we
can't compute the dirhash when recovering a new dentry in an encrypted +
casefolded directory. To avoid having to force a checkpoint when a new
file is fsync'ed, store the dirhash on-disk appended to i_name.

This patch incorporates work by Eric Biggers <ebiggers@google.com>
and Jaegeuk Kim <jaegeuk@kernel.org>.

Co-developed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9627a7b3 06-Jul-2020 Chao Yu <chao@kernel.org>

f2fs: fix error path in do_recover_data()

- don't panic kernel if f2fs_get_node_page() fails in
f2fs_recover_inline_data() or f2fs_recover_inline_xattr();
- return error number of f2fs_truncate_blocks() to
f2fs_recover_inline_data()'s caller;

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 901d745f 22-Jun-2020 Chao Yu <chao@kernel.org>

f2fs: split f2fs_allocate_new_segments()

to two independent functions:
- f2fs_allocate_new_segment() for specified type segment allocation
- f2fs_allocate_new_segments() for all data type segments allocation

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 43c780ba 07-May-2020 Eric Biggers <ebiggers@google.com>

f2fs: rework filename handling

Rework f2fs's handling of filenames to use a new 'struct f2fs_filename'.
Similar to 'struct ext4_filename', this stores the usr_fname, disk_name,
dirhash, crypto_buf, and casefolded name. Some of these names can be
NULL in some cases. 'struct f2fs_filename' differs from
'struct fscrypt_name' mainly in that the casefolded name is included.

For user-initiated directory operations like lookup() and create(),
initialize the f2fs_filename by translating the corresponding
fscrypt_name, then computing the dirhash and casefolded name if needed.

This makes the dirhash and casefolded name be cached for each syscall,
so we don't have to recompute them repeatedly. (Previously, f2fs
computed the dirhash once per directory level, and the casefolded name
once per directory block.) This improves performance.

This rework also makes it much easier to correctly handle all
combinations of normal, encrypted, casefolded, and encrypted+casefolded
directories. (The fourth isn't supported yet but is being worked on.)

The only other cases where an f2fs_filename gets initialized are for two
filesystem-internal operations: (1) when converting an inline directory
to a regular one, we grab the needed disk_name and hash from an existing
f2fs_dir_entry; and (2) when roll-forward recovering a new dentry, we
grab the needed disk_name from f2fs_inode::i_name and compute the hash.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5df7731f 17-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: introduce DEFAULT_IO_TIMEOUT

As Geert Uytterhoeven reported:

for parameter HZ/50 in congestion_wait(BLK_RW_ASYNC, HZ/50);

On some platforms, HZ can be less than 50, then unexpected 0 timeout
jiffies will be set in congestion_wait().

This patch introduces a macro DEFAULT_IO_TIMEOUT to wrap a determinate
value with msecs_to_jiffies(20) to instead HZ/50 to avoid such issue.

Quoted from Geert Uytterhoeven:

"A timeout of HZ means 1 second.
HZ/50 means 20 ms, but has the risk of being zero, if HZ < 50.

If you want to use a timeout of 20 ms, you best use msecs_to_jiffies(20),
as that takes care of the special cases, and never returns 0."

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a2ced1ce 14-Feb-2020 Chao Yu <chao@kernel.org>

f2fs: clean up codes with {f2fs_,}data_blkaddr()

- rename datablock_addr() to data_blkaddr().
- wrap data_blkaddr() with f2fs_data_blkaddr() to clean up
parameters.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c426d991 09-Dec-2019 Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>

f2fs: Check write pointer consistency of open zones

On sudden f2fs shutdown, write pointers of zoned block devices can go
further but f2fs meta data keeps current segments at positions before the
write operations. After remounting the f2fs, this inconsistency causes
write operations not at write pointers and "Unaligned write command"
error is reported.

To avoid the error, compare current segments with write pointers of open
zones the current segments point to, during mount operation. If the write
pointer position is not aligned with the current segment position, assign
a new zone to the current segment. Also check the newly assigned zone has
write pointer at zone start. If not, reset write pointer of the zone.

Perform the consistency check during fsync recovery. Not to lose the
fsync data, do the check after fsync data gets restored and before
checkpoint commit which flushes data at current segment positions. Not to
cause conflict with kworker's dirfy data/node flush, do the fix within
SBI_POR_DOING protection.

Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f5a53edc 18-Oct-2019 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support aligned pinned file

This patch supports 2MB-aligned pinned file, which can guarantee no GC at all
by allocating fully valid 2MB segment.

Check free segments by has_not_enough_free_secs() with large budget.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 10f966bb 19-Jun-2019 Chao Yu <chao@kernel.org>

f2fs: use generic EFSBADCRC/EFSCORRUPTED

f2fs uses EFAULT as error number to indicate filesystem is corrupted
all the time, but generic filesystems use EUCLEAN for such condition,
we need to change to follow others.

This patch adds two new macros as below to wrap more generic error
code macros, and spread them in code.

EFSBADCRC EBADMSG /* Bad CRC detected */
EFSCORRUPTED EUCLEAN /* Filesystem is corrupted */

Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dcbb4c10 18-Jun-2019 Joe Perches <joe@perches.com>

f2fs: introduce f2fs_<level> macros to wrap f2fs_printk()

- Add and use f2fs_<level> macros
- Convert f2fs_msg to f2fs_printk
- Remove level from f2fs_printk and embed the level in the format
- Coalesce formats and align multi-line arguments
- Remove unnecessary duplicate extern f2fs_msg f2fs.h

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 93770ab7 15-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: introduce DATA_GENERIC_ENHANCE

Previously, f2fs_is_valid_blkaddr(, blkaddr, DATA_GENERIC) will check
whether @blkaddr locates in main area or not.

That check is weak, since the block address in range of main area can
point to the address which is not valid in segment info table, and we
can not detect such condition, we may suffer worse corruption as system
continues running.

So this patch introduce DATA_GENERIC_ENHANCE to enhance the sanity check
which trigger SIT bitmap check rather than only range check.

This patch did below changes as wel:
- set SBI_NEED_FSCK in f2fs_is_valid_blkaddr().
- get rid of is_valid_data_blkaddr() to avoid panic if blkaddr is invalid.
- introduce verify_fio_blkaddr() to wrap fio {new,old}_blkaddr validation check.
- spread blkaddr check in:
* f2fs_get_node_info()
* __read_out_blkaddrs()
* f2fs_submit_page_read()
* ra_data_block()
* do_recover_data()

This patch can fix bug reported from bugzilla below:

https://bugzilla.kernel.org/show_bug.cgi?id=203215
https://bugzilla.kernel.org/show_bug.cgi?id=203223
https://bugzilla.kernel.org/show_bug.cgi?id=203231
https://bugzilla.kernel.org/show_bug.cgi?id=203235
https://bugzilla.kernel.org/show_bug.cgi?id=203241

= Update by Jaegeuk Kim =

DATA_GENERIC_ENHANCE enhanced to validate block addresses on read/write paths.
But, xfstest/generic/446 compalins some generated kernel messages saying invalid
bitmap was detected when reading a block. The reaons is, when we get the
block addresses from extent_cache, there is no lock to synchronize it from
truncating the blocks in parallel.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 22d61e28 15-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: fix to avoid panic in do_recover_data()

As Jungyeon reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=203227

- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.

The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y

- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync

- Messages
kernel BUG at fs/f2fs/recovery.c:549!
RIP: 0010:recover_data+0x167a/0x1780
Call Trace:
f2fs_recover_fsync_data+0x613/0x710
f2fs_fill_super+0x1043/0x1aa0
mount_bdev+0x16d/0x1a0
mount_fs+0x4a/0x170
vfs_kern_mount+0x5d/0x100
do_mount+0x200/0xcf0
ksys_mount+0x79/0xc0
__x64_sys_mount+0x1c/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9

During recovery, if ofs_of_node is inconsistent in between recovered
node page and original checkpointed node page, let's just fail recovery
instead of making kernel panic.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 98838579 10-Apr-2019 Chao Yu <chao@kernel.org>

f2fs: fix error path of recovery

There are some places in where we missed to unlock page or unlock page
incorrectly, fix them.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bae0ee7a 25-Dec-2018 Chao Yu <chao@kernel.org>

f2fs: check PageWriteback flag for ordered case

For all ordered cases in f2fs_wait_on_page_writeback(), we need to
check PageWriteback status, so let's clean up to relocate the check
into f2fs_wait_on_page_writeback().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7beb01f7 24-Oct-2018 Chao Yu <chao@kernel.org>

f2fs: clean up f2fs_sb_has_##feature_name

In F2FS_HAS_FEATURE(), we will use F2FS_SB(sb) to get sbi pointer to
access .raw_super field, to avoid unneeded pointer conversion, this
patch changes to F2FS_HAS_FEATURE() accept sbi parameter directly.

Just do cleanup, no logic change.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 78130819 25-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix to keep project quota consistent

This patch does below changes to keep consistence of project quota data
in sudden power-cut case:
- update inode.i_projid and project quota atomically under lock_op() in
f2fs_ioc_setproject()
- recover inode.i_projid and project quota in recover_inode()

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# af033b2a 20-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: guarantee journalled quota data by checkpoint

For journalled quota mode, let checkpoint to flush dquot dirty data
and quota file data to guarntee persistence of all quota sysfile in
last checkpoint, by this way, we can avoid corrupting quota sysfile
when encountering SPO.

The implementation is as below:

1. add a global state SBI_QUOTA_NEED_FLUSH to indicate that there is
cached dquot metadata changes in quota subsystem, and later checkpoint
should:
a) flush dquot metadata into quota file.
b) flush quota file to storage to keep file usage be consistent.

2. add a global state SBI_QUOTA_NEED_REPAIR to indicate that quota
operation failed due to -EIO or -ENOSPC, so later,
a) checkpoint will skip syncing dquot metadata.
b) CP_QUOTA_NEED_FSCK_FLAG will be set in last cp pack to give a
hint for fsck repairing.

3. add a global state SBI_QUOTA_SKIP_FLUSH, in checkpoint, if quota
data updating is very heavy, it may cause hungtask in block_operation().
To avoid this, if our retry time exceed threshold, let's just skip
flushing and retry in next checkpoint().

Signed-off-by: Weichao Guo <guoweichao@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: avoid warnings and set fsck flag]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26b5a079 12-Oct-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: cleanup dirty pages if recover failed

During recover, we will try to create new dentries for inodes with
dentry_mark. But if the parent is missing (e.g. killed by fsck),
recover will break. But those recovered dirty pages are not cleanup.
This will hit f2fs_bug_on:

[ 53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
[ 53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
[ 53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
[ 53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
[ 53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
[ 53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
[ 53.546174] Modules linked in:
[ 53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
[ 53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
[ 53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
[ 53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
[ 53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
[ 53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
[ 53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
[ 53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
[ 53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
[ 53.546226] FS: 00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
[ 53.546229] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
[ 53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 53.546242] Call Trace:
[ 53.546248] f2fs_submit_page_bio+0x95/0x740
[ 53.546253] read_node_page+0x161/0x1e0
[ 53.546271] ? truncate_node+0x650/0x650
[ 53.546283] ? add_to_page_cache_lru+0x12c/0x170
[ 53.546288] ? pagecache_get_page+0x262/0x2d0
[ 53.546292] __get_node_page+0x200/0x660
[ 53.546302] f2fs_update_inode_page+0x4a/0x160
[ 53.546306] f2fs_write_inode+0x86/0xb0
[ 53.546317] __writeback_single_inode+0x49c/0x620
[ 53.546322] writeback_single_inode+0xe4/0x1e0
[ 53.546326] sync_inode_metadata+0x93/0xd0
[ 53.546330] ? sync_inode+0x10/0x10
[ 53.546342] ? do_raw_spin_unlock+0xed/0x100
[ 53.546347] f2fs_sync_inode_meta+0xe0/0x130
[ 53.546351] f2fs_fill_super+0x287d/0x2d10
[ 53.546367] ? vsnprintf+0x742/0x7a0
[ 53.546372] ? f2fs_commit_super+0x180/0x180
[ 53.546379] ? up_write+0x20/0x40
[ 53.546385] ? set_blocksize+0x5f/0x140
[ 53.546391] ? f2fs_commit_super+0x180/0x180
[ 53.546402] mount_bdev+0x181/0x200
[ 53.546406] mount_fs+0x94/0x180
[ 53.546411] vfs_kern_mount+0x6c/0x1e0
[ 53.546415] do_mount+0xe5e/0x1510
[ 53.546420] ? fs_reclaim_release+0x9/0x30
[ 53.546424] ? copy_mount_string+0x20/0x20
[ 53.546428] ? fs_reclaim_acquire+0xd/0x30
[ 53.546435] ? __might_sleep+0x2c/0xc0
[ 53.546440] ? ___might_sleep+0x53/0x170
[ 53.546453] ? __might_fault+0x4c/0x60
[ 53.546468] ? _copy_from_user+0x95/0xa0
[ 53.546474] ? memdup_user+0x39/0x60
[ 53.546478] ksys_mount+0x88/0xb0
[ 53.546482] __x64_sys_mount+0x5d/0x70
[ 53.546495] do_syscall_64+0x65/0x130
[ 53.546503] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 53.547639] ---[ end trace b804d1ea2fec893e ]---

So if recover fails, we need to drop all recovered data.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 0c093b59 06-Oct-2018 Chao Yu <chao@kernel.org>

f2fs: fix to recover inode->i_flags of inode block during POR

Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +a /mnt/f2fs/file
6. xfs_io -a /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. xfs_io /mnt/f2fs/file

There is no error when opening this file w/o O_APPEND, but actually,
we expect the correct result should be:

/mnt/f2fs/file: Operation not permitted

The root cause is, in recover_inode(), we recover inode->i_flags more
than F2FS_I(inode)->i_flags, so fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# edc55aaf 17-Sep-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid f2fs_bug_on if f2fs_get_meta_page_nofail got EIO

This patch avoids BUG_ON when f2fs_get_meta_page_nofail got EIO during
xfstests/generic/475.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4a1728ca 25-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: mark inode dirty explicitly in recover_inode()

Mark inode dirty explicitly in the end of recover_inode() to make sure
that all recoverable fields can be persisted later.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7de36cf3 25-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix to recover inode's i_gc_failures during POR

inode.i_gc_failures is used to indicate that skip count of migrating
on blocks of inode, we should guarantee it can be recovered in sudden
power-off case.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 19c73a69 25-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix to recover inode's i_flags during POR

Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. lsattr /mnt/f2fs/file

-----------------N- /mnt/f2fs/file

But actually, we expect the corrct result is:

-------A---------N- /mnt/f2fs/file

The reason is we didn't recover inode.i_flags field during mount,
fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f4474aa6 25-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix to recover inode's project id during POR

Testcase to reproduce this bug:
1. mkfs.f2fs -O extra_attr -O project_quota /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr -p 1 /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. lsattr -p /mnt/f2fs/file

0 -----------------N- /mnt/f2fs/file

But actually, we expect the correct result is:

1 -----------------N- /mnt/f2fs/file

The reason is we didn't recover inode.i_projid field during mount,
fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dc4cd125 20-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: fix to recover inode's uid/gid during POR

Step to reproduce this bug:
1. logon as root
2. mount -t f2fs /dev/sdd /mnt;
3. touch /mnt/file;
4. chown system /mnt/file; chgrp system /mnt/file;
5. xfs_io -f /mnt/file -c "fsync";
6. godown /mnt;
7. umount /mnt;
8. mount -t f2fs /dev/sdd /mnt;

After step 8) we will expect file's uid/gid are all system, but during
recovery, these two fields were not been recovered, fix it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7c1a000d 11-Sep-2018 Chao Yu <chao@kernel.org>

f2fs: add SPDX license identifiers

Remove the verbose license text from f2fs files and replace them with
SPDX tags. This does not change the license of any of the code.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1378752b 22-Aug-2018 Chao Yu <chao@kernel.org>

f2fs: fix to flush all dirty inodes recovered in readonly fs

generic/417 reported as blow:

------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
? _raw_spin_unlock+0x2c/0x50
evict+0xa8/0x170
dispose_list+0x34/0x40
evict_inodes+0x118/0x120
generic_shutdown_super+0x41/0x100
? rcu_read_lock_sched_held+0x97/0xa0
kill_block_super+0x22/0x50
kill_f2fs_super+0x6f/0x80 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]

It can simply reproduced with scripts:

Enable quota feature during mkfs.

Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs

Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
a) open /mnt/f2fs/file;
b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs

The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.

Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncng dirty inodes due to sb_rdonly check, leaving dirty inodes
there.

To solve this issue, during umount, add remove SB_RDONLY flag in
sb->s_flags, to make sure sync_filesystem() will not be skipped.

Signed-off-by: Chao Yu <yuchao0@huawei.com>

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7fa750a1 13-Aug-2018 Arnd Bergmann <arnd@arndb.de>

f2fs: rework fault injection handling to avoid a warning

When CONFIG_F2FS_FAULT_INJECTION is disabled, we get a warning about an
unused label:

fs/f2fs/segment.c: In function '__submit_discard_cmd':
fs/f2fs/segment.c:1059:1: error: label 'submit' defined but not used [-Werror=unused-label]

This could be fixed by adding another #ifdef around it, but the more
reliable way of doing this seems to be to remove the other #ifdefs
where that is easily possible.

By defining time_to_inject() as a trivial stub, most of the checks for
CONFIG_F2FS_FAULT_INJECTION can go away. This also leads to nicer
formatting of the code.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e6b0b159 19-Jul-2018 Yunlei He <heyunlei@huawei.com>

f2fs: fix wrong kernel message when recover fsync data on ro fs

This patch fix wrong message info for recover fsync data
on readonly fs.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7735730d 16-Jul-2018 Chao Yu <chao@kernel.org>

f2fs: fix to propagate error from __get_meta_page()

If caller of __get_meta_page() can handle error, let's propagate error
from __get_meta_page().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 82902c06 05-Jul-2018 Chao Yu <chao@kernel.org>

f2fs: fix to detect looped node chain correctly

Below dmesg was printed when testing generic/388 of fstest:

F2FS-fs (zram1): find_fsync_dnodes: detect looped node chain, blkaddr:526615, next:526616
F2FS-fs (zram1): Cannot recover all fsync data errno=-22
F2FS-fs (zram1): Mounted with checkpoint version = 22300d0e
F2FS-fs (zram1): find_fsync_dnodes: detect looped node chain, blkaddr:526615, next:526616
F2FS-fs (zram1): Cannot recover all fsync data errno=-22

The reason is that we initialize free_blocks with free blocks of
filesystem, so if filesystem is full, free_blocks can be zero,
below condition will be true, so that, it will fail recovery.

if (++loop_cnt >= free_blocks ||
blkaddr == next_blkaddr_of_node(page))

To fix this issue, initialize free_blocks with correct value which
includes over-privision blocks.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e1da7872 05-Jun-2018 Chao Yu <chao@kernel.org>

f2fs: introduce and spread verify_blkaddr

This patch introduces verify_blkaddr to check meta/data block address
with valid range to detect bug earlier.

In addition, once we encounter an invalid blkaddr, notice user to run
fsck to fix, and let the kernel panic.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4d57b86d 29-May-2018 Chao Yu <chao@kernel.org>

f2fs: clean up symbol namespace

As Ted reported:

"Hi, I was looking at f2fs's sources recently, and I noticed that there
is a very large number of non-static symbols which don't have a f2fs
prefix. There's well over a hundred (see attached below).

As one example, in fs/f2fs/dir.c there is:

unsigned char get_de_type(struct f2fs_dir_entry *de)

This function is clearly only useful for f2fs, but it has a generic
name. This means that if any other file system tries to have the same
symbol name, there will be a symbol conflict and the kernel would not
successfully build. It also means that when someone is looking f2fs
sources, it's not at all obvious whether a function such as
read_data_page(), invalidate_blocks(), is a generic kernel function
found in the fs, mm, or block layers, or a f2fs specific function.

You might want to fix this at some point. Hopefully Kent's bcachefs
isn't similarly using genericly named functions, since that might
cause conflicts with f2fs's functions --- but just as this would be a
problem that we would rightly insist that Kent fix, this is something
that we should have rightly insisted that f2fs should have fixed
before it was integrated into the mainline kernel.

acquire_orphan_inode
add_ino_entry
add_orphan_inode
allocate_data_block
allocate_new_segments
alloc_nid
alloc_nid_done
alloc_nid_failed
available_free_memory
...."

This patch adds "f2fs_" prefix for all non-static symbols in order to:
a) avoid conflict with other kernel generic symbols;
b) to indicate the function is f2fs specific one instead of generic
one;

Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7b525dd0 23-May-2018 Chao Yu <chao@kernel.org>

f2fs: clean up with is_valid_blkaddr()

- rename is_valid_blkaddr() to is_valid_meta_blkaddr() for readability.
- introduce is_valid_blkaddr() for cleanup.

No logic change in this patch.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# eff15c2a 22-Apr-2018 Sheng Yong <shengyong1@huawei.com>

f2fs: do not check F2FS_INLINE_DOTS in recover

Only dir may have F2FS_INLINE_DOTS flag, so there is no need to check
the flag in recover flow.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fb0e72c8 03-Feb-2018 Chao Yu <chao@kernel.org>

f2fs: fix to handle looped node chain during recovery

There is no checksum in node block now, so bit-transition from hardware
can make node_footer.next_blkaddr being corrupted w/o any detection,
result in node chain becoming looped one.

For this condition, during recovery, in order to avoid running into dead
loop, let's detect it and just skip out.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bdbc90fa 28-Feb-2018 Yunlong Song <yunlong.song@huawei.com>

f2fs: don't put dentry page in pagecache into highmem

Previous dentry page uses highmem, which will cause panic in platforms
using highmem (such as arm), since the address space of dentry pages
from highmem directly goes into the decryption path via the function
fscrypt_fname_disk_to_usr. But sg_init_one assumes the address is not
from highmem, and then cause panic since it doesn't call kmap_high but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids to put dentry page in pagecache into highmem.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 37a086f0 19-Jan-2018 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: recover some i_inline flags

This fixes lost i_inline flags during roll-forward.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e17d488b 22-Nov-2017 Sheng Yong <shengyong1@huawei.com>

f2fs: remove unused parameter

Commit d260081ccf37 ("f2fs: change recovery policy of xattr node block")
removes the use of blkaddr, which is no longer used. So remove the
parameter.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1751e8a6 27-Nov-2017 Linus Torvalds <torvalds@linux-foundation.org>

Rename superblock flags (MS_xyz -> SB_xyz)

This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"

SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# ea676733 06-Oct-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: support quota sys files

This patch supports hidden quota files in the system, which will be used for
Android. It requires up-to-date f2fs-tools later than v1.9.0.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 125c9fb1 12-Aug-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check hot_data for roll-forward recovery

We need to check HOT_DATA to truncate any previous data block when doing
roll-forward recovery.

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# afd2b4da 10-Aug-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: let fill_super handle roll-forward errors

If we set CP_ERROR_FLAG in roll-forward error, f2fs is no longer to proceed
any IOs due to f2fs_cp_error(). But, for example, if some stale data is involved
on roll-forward process, we're able to get -ENOENT, getting fs stuck.
If we get any error, let fill_super set SBI_NEED_FSCK and try to recover back
to stable point.

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4b2414d0 07-Aug-2017 Chao Yu <chao@kernel.org>

f2fs: support journalled quota

This patch supports to enable f2fs to accept quota information through
mount option:
- {usr,grp,prj}jquota=<quota file path>
- jqfmt=<quota type>

Then, in ->mount flow, we can recover quota file during log replaying,
by this, journelled quota can be supported.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Fix wrong return values.]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7a2af766 18-Jul-2017 Chao Yu <chao@kernel.org>

f2fs: enhance on-disk inode structure scalability

This patch add new flag F2FS_EXTRA_ATTR storing in inode.i_inline
to indicate that on-disk structure of current inode is extended.

In order to extend, we changed the inode structure a bit:

Original one:

struct f2fs_inode {
...
struct f2fs_extent i_ext;
__le32 i_addr[DEF_ADDRS_PER_INODE];
__le32 i_nid[DEF_NIDS_PER_INODE];
}

Extended one:

struct f2fs_inode {
...
struct f2fs_extent i_ext;
union {
struct {
__le16 i_extra_isize;
__le16 i_padding;
__le32 i_extra_end[0];
};
__le32 i_addr[DEF_ADDRS_PER_INODE];
};
__le32 i_nid[DEF_NIDS_PER_INODE];
}

Once F2FS_EXTRA_ATTR is set, we will steal four bytes in the head of
i_addr field for storing i_extra_isize and i_padding. with i_extra_isize,
we can calculate actual size of reserved space in i_addr, available
attribute fields included in total extra attribute fields for current
inode can be described as below:

+--------------------+
| .i_mode |
| ... |
| .i_ext |
+--------------------+
| .i_extra_isize |-----+
| .i_padding | |
| .i_prjid | |
| .i_atime_extra | |
| .i_ctime_extra | |
| .i_mtime_extra |<----+
| .i_inode_cs |<----- store blkaddr/inline from here
| .i_xattr_cs |
| ... |
+--------------------+
| |
| block address |
| |
+--------------------+
| .i_nid |
+--------------------+
| node_footer |
| (nid, ino, offset) |
+--------------------+

Hence, with this patch, we would enhance scalability of f2fs inode for
storing more newly added attribute.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d40d30c5 14-Apr-2017 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid dirty node pages in check_only recovery

In the check_only mode, we should not make any dirty node pages. Otherwise,
we can get this panic:

F2FS-fs (nvme0n1p1): Need to recover fsync data
------------[ cut here ]------------
kernel BUG at fs/f2fs/node.c:2204!
CPU: 7 PID: 19923 Comm: mount Tainted: G OE 4.9.8 #2
RIP: 0010:[<ffffffffc0979c0b>] [<ffffffffc0979c0b>] flush_nat_entries+0x43b/0x7d0 [f2fs]
Call Trace:
[<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs]
[<ffffffffc096ddaa>] ? __f2fs_submit_merged_bio+0x5a/0xd0 [f2fs]
[<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs]
[<ffffffff860e450f>] ? up_write+0x1f/0x40
[<ffffffffc096dddb>] ? __f2fs_submit_merged_bio+0x8b/0xd0 [f2fs]
[<ffffffffc0969f04>] write_checkpoint+0x2f4/0xf20 [f2fs]
[<ffffffff860e938d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc0960bc9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc0960bd5>] f2fs_sync_fs+0x85/0x190 [f2fs]
[<ffffffffc097b6de>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
[<ffffffffc0977b64>] f2fs_write_node_pages+0x34/0x350 [f2fs]
[<ffffffff860e5f42>] ? __lock_is_held+0x52/0x70
[<ffffffff861d9b31>] do_writepages+0x21/0x30
[<ffffffff86298ce1>] __writeback_single_inode+0x61/0x760
[<ffffffff86909127>] ? _raw_spin_unlock+0x27/0x40
[<ffffffff8629a735>] writeback_single_inode+0xd5/0x190
[<ffffffff8629a889>] write_inode_now+0x99/0xc0
[<ffffffff86283876>] iput+0x1f6/0x2c0
[<ffffffffc0964b52>] f2fs_fill_super+0xc32/0x10c0 [f2fs]
[<ffffffff86266462>] mount_bdev+0x182/0x1b0
[<ffffffffc0963f20>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0960da5>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff86266e08>] mount_fs+0x38/0x170
[<ffffffff86288bab>] vfs_kern_mount+0x6b/0x160
[<ffffffff8628bcfe>] do_mount+0x1be/0xd60

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d260081c 08-Feb-2017 Chao Yu <chao@kernel.org>

f2fs: change recovery policy of xattr node block

Currently, if we call fsync after updating the xattr date belongs to the
file, f2fs needs to trigger checkpoint to keep xattr data consistent. But,
this policy cause low performance as checkpoint will block most foreground
operations and cause unneeded and unrelated IOs around checkpoint.

This patch will reuse regular file recovery policy for xattr node block,
so, we change to write xattr node block tagged with fsync flag to warm
area instead of cold area, and during recovery, we search warm node chain
for fsynced xattr block, and do the recovery.

So, for below application IO pattern, performance can be improved
obviously:
- touch file
- create/update/delete xattr entry in file
- fsync file

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dba79f38 24-Jan-2017 Chao Yu <chao@kernel.org>

f2fs: fix to avoid overflow when left shifting page offset

We use following method to calculate size with current page index:
size = index << PAGE_SHIFT
If type of index has only 32-bits size, left shifting will incur overflow,
which makes result incorrect.

So let's cast index with 64-bits type to avoid such issue.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fed24668 13-Dec-2016 Yunlei He <heyunlei@huawei.com>

f2fs: remove unused values in recover_fsync_data

This patch remove unused values in function recover_fsync_data

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26787236 28-Nov-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not activate auto_recovery for fallocated i_size

If a file needs to keep its i_size by fallocate, we need to turn off auto
recovery during roll-forward recovery.

This will resolve the below scenario.

1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
3. md5sum /mnt/f2fs/file;
4. godown /mnt/f2fs/
5. umount /mnt/f2fs/
6. mount -t f2fs /dev/sdx /mnt/f2fs
7. md5sum /mnt/f2fs/file

Reported-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3a3a5ead 16-Nov-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not recover i_size if it's valid

If i_size is already valid during roll_forward recovery, we should not update
it according to the block alignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# d47b8715 04-Nov-2016 Chao Yu <chao@kernel.org>

Revert "f2fs: do not recover from previous remained wrong dnodes"

i_times of inode will be set with current system time which can be
configured through 'date', so it's not safe to judge dnode block as
garbage data or unchanged inode depend on i_times.

Now, we have used enhanced 'cp_ver + cp' crc method to verify valid
dnode block, so I expect recoverying invalid dnode is almost not
possible.

This reverts commit 807b1e1c8e08452948495b1a9985ab46d329e5c2.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9f0552e0 03-Nov-2016 Chao Yu <chao@kernel.org>

f2fs: fix wrong i_atime recovery

Shouldn't update in-memory i_atime with on-disk i_mtime of inode when
recovering inode.

Shuoran found this bug which is hidden for a long time, honour is belong
to him.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# aaec2b1d 19-Sep-2016 Chao Yu <chao@kernel.org>

f2fs: introduce cp_lock to protect updating of ckpt_flags

This patch introduces spinlock to protect updating process of ckpt_flags
field in struct f2fs_checkpoint, it avoids incorrectly updating in race
condition.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: add __is_set_ckpt_flags likewise __set_ckpt_flags]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9e1e6df4 19-Sep-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: put directory inodes before checkpoint in roll-forward recovery

Before checkpoint, we'd be better drop any inodes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# a468f0ef 19-Sep-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use crc and cp version to determine roll-forward recovery

Previously, we used cp_version only to detect recoverable dnodes.
In order to avoid same garbage cp_version, we needed to truncate the next
dnode during checkpoint, resulting in additional discard or data write.
If we can distinguish this by using crc in addition to cp_version, we can
remove this overhead.

There is backward compatibility concern where it changes node_footer layout.
So, this patch introduces a new checkpoint flag, CP_CRC_RECOVERY_FLAG, to
detect new layout. New layout will be activated only when this flag is set.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e8ea9b3d 09-Sep-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid ENOMEM during roll-forward recovery

This patch gives another chances during roll-forward recovery regarding to
-ENOMEM.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f4702d61 09-Sep-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add common iget in add_fsync_inode

There is no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e7ba108a 28-Aug-2016 Shuoran Liu <liushuoran@huawei.com>

f2fs: add roll-forward recovery process for encrypted dentry

Add roll-forward recovery process for encrypted dentry, so the first fsync
issued to an encrypted file does not need writing checkpoint.

This improves the performance of the following test at thousands of small
files: open -> write -> fsync -> close

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify kernel message to show encrypted names]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 275b66b0 29-Aug-2016 Chao Yu <chao@kernel.org>

f2fs: support async discard

Like most filesystems, f2fs will issue discard command synchronously, so
when user trigger fstrim through ioctl, multiple discard commands will be
issued serially with sync mode, which makes poor performance.

In this patch we try to support async discard, so that all discard
commands can be issued and be waited for endio in batch to improve
performance.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6f3ec995 19-Jul-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: handle error case with f2fs_bug_on

It's enough to show BUG or WARN by f2fs_bug_on for error case.
Then, we don't need to remain corrupted filesystem.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 91246c21 18-Jul-2016 Chao Yu <chao@kernel.org>

f2fs: fix to report error number of f2fs_find_entry

This patch fixes to report the right error number of f2fs_find_entry to
its caller.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 36abef4e 03-Jun-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce mode=lfs mount option

This mount option is to enable original log-structured filesystem forcefully.
So, there should be no random writes for main area.

Especially, this supports host-managed SMR device.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26de9b11 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid unnecessary updating inode during fsync

If roll-forward recovery can recover i_size, we don't need to update inode's
metadata during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ee6d182f 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove syncing inode page in all the cases

This patch reduces to call them across the whole tree.
- sync_inode_page()
- update_inode_page()
- update_inode()
- f2fs_write_inode()

Instead, checkpoint will flush all the dirty inode metadata before syncing
node pages.
Note that, this is doable, since we call mark_inode_dirty_sync() for all
inode's field change which needs to update on-disk inode as well.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fc9581c8 20-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_i_size_write with mark_inode_dirty_sync

This patch introduces f2fs_i_size_write() to call mark_inode_dirty_sync() with
i_size_write().

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 975756c4 19-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid ENOSPC fault in the recovery process

This patch avoids impossible error injection, ENOSPC, during recovery process.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 41382ec4 16-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use percpu_counter for alloc_valid_block_count

This patch uses percpu_count for sbi->alloc_valid_block_count.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3b9b10f9 11-May-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid f2fs_bug_on during recovery

We don't need to use f2fs_bug_on() to treat with any error case when allocating
a block during recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f61cce5b 07-May-2016 Chao Yu <chao@kernel.org>

f2fs: fix inode cache leak

When testing f2fs with inline_dentry option, generic/342 reports:
VFS: Busy inodes after unmount of dm-0. Self-destruct in 5 seconds. Have a nice day...

After rmmod f2fs module, kenrel shows following dmesg:
=============================================================================
BUG f2fs_inode_cache (Tainted: G O ): Objects remaining in f2fs_inode_cache on __kmem_cache_shutdown()
-----------------------------------------------------------------------------

Disabling lock debugging due to kernel taint
INFO: Slab 0xf51ca0e0 objects=22 used=1 fp=0xd1e6fc60 flags=0x40004080
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000086 00000086 d062fe18 c13a83a0 f51ca0e0 d062fe38 d062fea4 c11c7276
c1981040 f51ca0e0 00000016 00000001 d1e6fc60 40004080 656a624f 20737463
616d6572 6e696e69 6e692067 66326620 6e695f73 5f65646f 68636163 6e6f2065
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c11c7276>] slab_err+0x76/0x80
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfc0>] ? __kmem_cache_shutdown+0x100/0x2f0
[<c11cbfe5>] __kmem_cache_shutdown+0x125/0x2f0
[<c1198a38>] kmem_cache_destroy+0x158/0x1f0
[<c176b43d>] ? mutex_unlock+0xd/0x10
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74
INFO: Object 0xd1e6d9e0 @offset=6624
kmem_cache_destroy f2fs_inode_cache: Slab cache still has objects
CPU: 3 PID: 7455 Comm: rmmod Tainted: G B O 4.6.0-rc4+ #16
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
00000286 00000286 d062fef4 c13a83a0 f174b000 d062ff14 d062ff28 c1198ac7
c197fe18 f3c5b980 d062ff20 000d04f2 d062ff0c d062ff0c d062ff14 d062ff14
f8f20dc0 fffffff5 d062e000 d062ff30 f8f15aa3 d062ff7c c10f596c 73663266
Call Trace:
[<c13a83a0>] dump_stack+0x5f/0x8f
[<c1198ac7>] kmem_cache_destroy+0x1e7/0x1f0
[<f8f15aa3>] exit_f2fs_fs+0x4b/0x5a8 [f2fs]
[<c10f596c>] SyS_delete_module+0x16c/0x1d0
[<c1001b10>] ? do_fast_syscall_32+0x30/0x1c0
[<c13c59bf>] ? __this_cpu_preempt_check+0xf/0x20
[<c10afa7d>] ? trace_hardirqs_on_caller+0xdd/0x210
[<c10ad50b>] ? trace_hardirqs_off+0xb/0x10
[<c1001b81>] do_fast_syscall_32+0xa1/0x1c0
[<c176d888>] sysenter_past_esp+0x45/0x74

The reason is: in recovery flow, we use delayed iput mechanism for directory
which has recovered dentry block. It means the reference of inode will be
held until last dirty dentry page being writebacked.

But when we mount f2fs with inline_dentry option, during recovery, dirent
may only be recovered into dir inode page rather than dentry page, so there
are no chance for us to release inode reference in ->writepage when
writebacking last dentry page.

We can call paired iget/iput explicityly for inline_dentry case, but for
non-inline_dentry case, iput will call writeback_single_inode to write all
data pages synchronously, but during recovery, ->writepages of f2fs skips
writing all pages, result in losing dirent.

This patch fixes this issue by obsoleting old mechanism, and introduce a
new dir_list to hold all directory inodes which has recovered datas until
finishing recovery.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# ae8d1db3 04-May-2016 Chao Yu <chao@kernel.org>

f2fs: remove unneeded readahead in find_fsync_dnodes

In find_fsync_dnodes, get_tmp_page will read dnode page synchronously,
previously, ra_meta_page did the same work, which is redundant, remove
it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 3f8ab270 29-Apr-2016 Chao Yu <chao@kernel.org>

f2fs: factor out fsync inode entry operations

Factor out fsync inode entry operations into {add,del}_fsync_inode.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 608514de 15-Apr-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set fsync mark only for the last dnode

In order to give atomic writes, we should consider power failure during
sync_node_pages in fsync.
So, this patch marks fsync flag only in the last dnode block.

Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 6781eabb 23-Mar-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: give -EINVAL for norecovery and rw mount

Once detecting something to recover, f2fs should stop mounting, given norecovery
and rw mount options.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 09cbfeaf 01-Apr-2016 Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>


# 28bc106b 05-Feb-2016 Chao Yu <chao@kernel.org>

f2fs: support revoking atomic written pages

f2fs support atomic write with following semantics:
1. open db file
2. ioctl start atomic write
3. (write db file) * n
4. ioctl commit atomic write
5. close db file

With this flow we can avoid file becoming corrupted when abnormal power
cut, because we hold data of transaction in referenced pages linked in
inmem_pages list of inode, but without setting them dirty, so these data
won't be persisted unless we commit them in step 4.

But we should still hold journal db file in memory by using volatile
write, because our semantics of 'atomic write support' is incomplete, in
step 4, we could fail to submit all dirty data of transaction, once
partial dirty data was committed in storage, then after a checkpoint &
abnormal power-cut, db file will be corrupted forever.

So this patch tries to improve atomic write flow by adding a revoking flow,
once inner error occurs in committing, this gives another chance to try to
revoke these partial submitted data of current transaction, it makes
committing operation more like aotmical one.

If we're not lucky, once revoking operation was failed, EAGAIN will be
reported to user for suggesting doing the recovery with held journal file,
or retrying current transaction again.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 81ca7350 26-Jan-2016 Chao Yu <chao@kernel.org>

f2fs: remove unneeded pointer conversion

There are redundant pointer conversion in following call stack:
- at position a, inode was been converted to f2fs_file_info.
- at position b, f2fs_file_info was been converted to inode again.

- truncate_blocks(inode,..)
- fi = F2FS_I(inode) ---a
- ADDRS_PER_PAGE(node_page, fi)
- addrs_per_inode(fi)
- inode = &fi->vfs_inode ---b
- f2fs_has_inline_xattr(inode)
- fi = F2FS_I(inode)
- is_inode_flag_set(fi,..)

In order to avoid unneeded conversion, alter ADDRS_PER_PAGE and
addrs_per_inode to acept parameter with type of inode pointer.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# fec1d657 20-Jan-2016 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use wait_for_stable_page to avoid contention

In write_begin, if storage supports stable_page, we don't need to wait for
writeback to update its contents.
This patch introduces to use wait_for_stable_page instead of
wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c34f42e2 23-Dec-2015 Chao Yu <chao@kernel.org>

f2fs: report error of do_checkpoint

do_checkpoint and write_checkpoint can fail due to reasons like triggering
in a readonly fs or encountering IO error of storage device.

So it's better to report such error info to user, let user be aware of
failure of doing checkpoint.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b7973f23 30-Nov-2015 Chao Yu <chao@kernel.org>

f2fs: clean up argument of recover_data

In recover_data, value of argument 'type' will be CURSEG_WARM_NODE all
the time, remove it for cleanup.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 807b1e1c 03-Dec-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not recover from previous remained wrong dnodes

If device does not support discard, some obsolete dnodes can be recovered
by roll-forward. This patch enhances the recovery flow.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 26879fb1 12-Oct-2015 Chao Yu <chao@kernel.org>

f2fs: support lower priority asynchronous readahead in ra_meta_pages

Now, we use ra_meta_pages to reads continuous physical blocks as much as
possible to improve performance of following reads. However, ra_meta_pages
uses a synchronous readahead approach by submitting bio with READ, as READ
is with high priority, it can not be used in the case of preloading blocks,
and it's not sure when these RAed pages will be used.

This patch supports asynchronous readahead in ra_meta_pages by tagging bio
with READA flag in order to allow preloading.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 2b947003 12-Oct-2015 Chao Yu <chao@kernel.org>

f2fs: don't tag REQ_META for temporary non-meta pages

In recovery or checkpoint flow, we grab pages temperarily in meta inode's
mapping for caching temperary data, actually, datas in these pages were
not meta data of f2fs, but still we tag them with REQ_META flag. However,
lower device like eMMC may do some optimization for data of such type.
So in order to avoid wrong optimization, we'd better remove such flag
for temperary non-meta pages.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 72235541 25-Sep-2015 Chao Yu <chao@kernel.org>

f2fs: remove unneeded f2fs_{,un}lock_op in do_recover_data()

Protecting recovery flow by using cp_rwsem is not needed, since we have
prevent triggering any checkpoint by locking cp_mutex previously.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9edcdabf 11-Sep-2015 Chao Yu <chao@kernel.org>

f2fs: fix overflow of size calculation

We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:

pgoff_t index = 0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000

So we should cast index with 64-bits type to avoid this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 315df839 11-Aug-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not write any node pages related to orphan inodes

We should not write node pages when deleting orphan inodes.
In order to do that, we can eaisly set POR_DOING flag earlier before entering
orphan inode routine.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 12a8343e 05-Aug-2015 Chao Yu <chao@kernel.org>

f2fs: recover invalid/reserved block address for fsynced file

When testing with generic/101 in xfstests, error message outputed as below:

--- tests/generic/101.out
+++ results//generic/101.out.bad
@@ -10,10 +10,14 @@
File foo content after log replay:
0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
*
-0200000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
+0200000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
*
0372000
...
(Run 'diff -u tests/generic/101.out results/generic/101.out.bad' to see the entire diff)

The test flow is like below:
1. pwrite foo -S 0xaa 0 64K
2. pwrite foo -S 0xbb 64K 61K
3. sync
4. truncate foo 64K
5. truncate foo 125K
6. fsync foo
7. flakey drop writes
8. umount

After this test, we expect the data of recovered file will have the first
64k of data filling with value 0xaa and the next 61k of data filling with
value 0x00 because we have fsynced it before dropping writes in dm.

In f2fs, during recovering, we will only recover the valid block address
in direct node page if it is marked as a fsynced dnode, but block address
which means invalid/reserved (with value NULL_ADDR/NEW_ADDR) will not be
recovered. So, the file recovered shows its incorrect data 0xbb in range of
[61k, 125k].

In this patch, we fix to recover invalid/reserved block during recover flow.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e90c2d28 28-Jul-2015 Chao Yu <chao@kernel.org>

f2fs: invalidate temporary meta page

To avoid meeting garbage data in next free node block at the end of warm
node chain when doing recovery, we will try to zero out that invalid block.

If the device is not support discard, our way for zeroing out block is:
grabbing a temporary zeroed page in meta inode, then, issue write request
with this page.

But, we forget to release that temporary page, so our memory usage will
increase without gaining any hit ratio benefit, so it's better to free it
for saving memory.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 528e3459 28-May-2015 Chao Yu <chao@kernel.org>

f2fs: hide common code in f2fs_replace_block

This patch clean up codes through:
1.rename f2fs_replace_block to __f2fs_replace_block().
2.introduce new f2fs_replace_block() to include __f2fs_replace_block()
and some common related codes around __f2fs_replace_block().

Then, newly introduced function f2fs_replace_block can be used by
following patch.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e7d55452 29-Apr-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs crypto: add filename encryption for roll-forward recovery

This patch adds a bit flag to indicate whether or not i_name in the inode
is encrypted.

If this name is encrypted, we can't do recover_dentry during roll-forward.
So, f2fs_sync_file() needs to do checkpoint, if this will be needed in future.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 19f106bc 05-May-2015 Chao Yu <chao@kernel.org>

f2fs: introduce f2fs_replace_block() for reuse

Introduce a generic function replace_block base on recover_data_page,
and export it. So with it we can operate file's meta data which is in
CP/SSA area when we invoke fallocate with FALLOC_FL_COLLAPSE_RANGE
flag.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# f0c9cada 18-Apr-2015 Chao Yu <chao@kernel.org>

f2fs: use is_valid_blkaddr to verify blkaddr for readability

Export is_valid_blkaddr() and use it to replace some codes for readability.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 10027551 09-Apr-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: pass checkpoint reason on roll-forward recovery

This patch adds CP_RECOVERY to remain recovery information for checkpoint.
And, it makes sure writing checkpoint in this case.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e03b07d9 01-Apr-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not recover wrong data index

During the roll-forward recovery, if we found a new data index written fsync
lastly, we need to recover new block address.
But, if that address was corrupted, we should not recover that.
Otherwise, f2fs gets kernel panic from:

In check_index_in_prev_nodes(),

sentry = get_seg_entry(sbi, segno);
--------------------------> out-of-range segno.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 418f6c27 31-Mar-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: do not increase link count during recovery

If there are multiple fsynced dnodes having a dent flag, roll-forward routine
sets FI_INC_LINK for their inode, and recovery_dentry increases its link count
accordingly.
That results in normal file having a link count as 2, so we can't unlink those
files.

This was added to handle several inode blocks having same inode number with
different directory paths.
But, current f2fs doesn't replay all of path changes and only recover its dentry
for the last fsynced inode block.
So, there is no reason to do this.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 510022a8 30-Mar-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add F2FS_INLINE_DOTS to recover missing dot dentries

If f2fs was corrupted with missing dot dentries, it needs to recover them after
fsck.f2fs detection.

The underlying precedure is:

1. The fsck.f2fs remains F2FS_INLINE_DOTS flag in directory inode, if it detects
missing dot dentries.

2. When f2fs looks up the corrupted directory, it triggers f2fs_add_link with
proper inode numbers and their dot and dotdot names.

3. Once f2fs recovers the directory without errors, it removes F2FS_INLINE_DOTS
finally.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c9ef4810 26-Mar-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix mismatching lock and unlock pages for roll-forward recovery

Previously, inode page is not correctly locked and unlocked in pair during
the roll-forward recovery.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 216a620a 19-Mar-2015 Chao Yu <chao@kernel.org>

f2fs: split set_data_blkaddr from f2fs_update_extent_cache

Split __set_data_blkaddr from f2fs_update_extent_cache for readability.

Additionally rename __set_data_blkaddr to set_data_blkaddr for exporting.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 8fbc418f 24-Feb-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid wrong error during recovery

During the roll-forward recovery, -ENOENT for f2fs_iget can be skipped.
So, this error value should not be propagated.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1614091d 23-Feb-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove obsolete code

This patch removes obsolete code in which summary variable is not needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7e4dde79 05-Feb-2015 Chao Yu <chao@kernel.org>

f2fs: introduce universal lookup/update interface for extent cache

In this patch, we do these jobs:
1. rename {check,update}_extent_cache to {lookup,update}_extent_info;
2. introduce universal lookup/update interface of extent cache:
f2fs_{lookup,update}_extent_cache including above two real functions, then
export them to function callers.

So after above cleanup, we can add new rb-tree based extent cache into exported
interfaces.

v2:
o remove "f2fs_" for inner function {lookup,update}_extent_info suggested by
Jaegeuk Kim.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# caf0047e 28-Jan-2015 Chao Yu <chao@kernel.org>

f2fs: merge flags in struct f2fs_sb_info

Currently, there are several variables with Boolean type as below:

struct f2fs_sb_info {
...
int s_dirty;
bool need_fsck;
bool s_closing;
...
bool por_doing;
...
}

For this there are some issues:
1. there are some space of f2fs_sb_info is wasted due to aligning after Boolean
type variables by compiler.
2. if we continuously add new flag into f2fs_sb_info, structure will be messed
up.

So in this patch, we try to:
1. switch s_dirty to Boolean type variable since it has two status 0/1.
2. merge s_dirty/need_fsck/s_closing/por_doing variables into s_flag.
3. introduce an enum type which can indicate different states of sbi.
4. use new introduced universal interfaces is_sbi_flag_set/{set,clear}_sbi_flag
to operate flags for sbi.

After that, above issues will be fixed.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# bc4a1f87 22-Jan-2015 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: leave comment for code readability

During the recovery, any xattr blocks should not be found, since they are
written into cold log, not the warm node chain.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# e1509cf2 30-Dec-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clean up to remove parameter

This patch uses dn->data_blkaddr as a parameter for the destination block
address.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 635aee1f 08-Dec-2014 Chao Yu <chao@kernel.org>

f2fs: avoid to ra unneeded blocks in recover flow

To improve recovery speed, f2fs try to readahead many contiguous blocks in warm
node segment, but for most time, abnormal power-off do not occur frequently, so
when mount a normal power-off f2fs image, by contrary ra so many blocks and then
invalid them will hurt the performance of mount.
It's better to just ra the first next-block for normal condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9486ba44 21-Nov-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce f2fs_dentry_kunmap to clean up

This patch introduces f2fs_dentry_kunmap to clean up dirty codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 622f28ae 24-Sep-2014 Chao Yu <chao@kernel.org>

f2fs: enable inline dir handling

Add inline dir functions into normal dir ops' function to handle inline ops.
Besides, we enable inline dir mode when a new dir inode is created if
inline_data option is on.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# dbeacf02 24-Sep-2014 Chao Yu <chao@kernel.org>

f2fs: export dir operations for inline dir

This patch exports some dir operations for inline dir, additionally introduces
f2fs_drop_nlink from f2fs_delete_entry for reusing by inline dir function.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 7cd8558b 23-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check the use of macros on block counts and addresses

This patch cleans up the existing and new macros for readability.

Rule is like this.

,-----------------------------------------> MAX_BLKADDR -,
| ,------------- TOTAL_BLKS ----------------------------,
| | |
| ,- seg0_blkaddr ,----- sit/nat/ssa/main blkaddress |
block | | (SEG0_BLKADDR) | | | | (e.g., MAIN_BLKADDR) |
address 0..x................ a b c d .............................
| |
global seg# 0...................... m .............................
| | |
| `------- MAIN_SEGS -----------'
`-------------- TOTAL_SEGS ---------------------------'
| |
seg# 0..........xx..................

= Note =
o GET_SEGNO_FROM_SEG0 : blk address -> global segno
o GET_SEGNO : blk address -> segno
o START_BLOCK : segno -> starting block address

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 75ab4cb8 20-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce cp_control structure

This patch add a new data structure to control checkpoint parameters.
Currently, it presents the reason of checkpoint such as is_umount and normal
sync.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# c52e1b10 11-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove redundant operation during roll-forward recovery

If same data is updated multiple times, we don't need to redo whole the
operations.
Let's just update the lastest one.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 441ac5cb 15-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix roll-forward missing scenarios

We can summarize the roll forward recovery scenarios as follows.

[Term] F: fsync_mark, D: dentry_mark

1. inode(x) | CP | inode(x) | dnode(F)
-> Update the latest inode(x).

2. inode(x) | CP | inode(F) | dnode(F)
-> No problem.

3. inode(x) | CP | dnode(F) | inode(x)
-> Recover to the latest dnode(F), and drop the last inode(x)

4. inode(x) | CP | dnode(F) | inode(F)
-> No problem.

5. CP | inode(x) | dnode(F)
-> The inode(DF) was missing. Should drop this dnode(F).

6. CP | inode(DF) | dnode(F)
-> No problem.

7. CP | dnode(F) | inode(DF)
-> If f2fs_iget fails, then goto next to find inode(DF).

8. CP | dnode(F) | inode(x)
-> If f2fs_iget fails, then goto next to find inode(DF).
But it will fail due to no inode(DF).

So, this patch adds some missing points such as #1, #5, #7, and #8.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4c521f49 11-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: use meta_inode cache to improve roll-forward speed

Previously, all the dnode pages should be read during the roll-forward recovery.
Even worsely, whole the chain was traversed twice.
This patch removes that redundant and costly read operations by using page cache
of meta_inode and readahead function as well.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 60979115 12-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix double lock for inode page during roll-foward recovery

If the inode is same and its data index are needed to truncate, we can fall into
double lock for its inode page via get_dnode_of_data.

Error case is like this.

1. write data 1, 2, 3, 4, 5 in inode #4.
2. write data 100, 102, 103, 104, 105 in dnode #6 of inode #4.
3. sync
4. update data 100->106 in dnode #6.
5. fsync inode #4.
6. power-cut

-> Then,
1. go back to #3's checkpoint
2. in do_recover_data, get_dnode_of_data() gets inode #4.
3. detect 100->106 in dnode #6.
4. check_index_in_prev_nodes tries to truncate 100 in dnode #6.
5. to trigger truncate_hole, get_dnode_of_data should grab inode #4.
6. detect *kernel hang*

This patch should resolve that bug.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 9850cf4a 02-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: need fsck.f2fs when f2fs_bug_on is triggered

If any f2fs_bug_on is triggered, fsck.f2fs is needed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 4081363f 02-Sep-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce F2FS_I_SB, F2FS_M_SB, and F2FS_P_SB

This patch adds three inline functions to clean up dirty casting codes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 202095a7 15-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove rewrite_node_page

I think we need to let the dirty node pages remain in the page cache instead
of rewriting them in their places.
So, after done with successful recovery, write_checkpoint will flush all of them
through the normal write path.
Through this, we can avoid potential error cases in terms of block allocation.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 14f4e690 13-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: prevent checkpoint during roll-forward

Any checkpoint should not be done during the core roll-forward procedure.
Especially, it includes error cases too.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# b307384e 08-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid bug_on when error is occurred

During the recovery, if an error like EIO or ENOMEM, f2fs_bug_on should skip.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 1c35a90e 08-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to recover inline_xattr/data and blocks

This patch fixes not to skip xattr recovery and inline xattr/data recovery
order.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 695facc0 07-Aug-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clear FI_INC_LINK during the recovery

If an inode are fsynced multiple times with fsync & dent marks, this inode will
set FI_INC_LINK at find_fsync_dnodes during the recovery.
But, in recover_inode, recover_dentry doesn't clear that flag when multiple hits
were occurred.

So this patch removes the flag for the further consistency.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 70cfed88 02-Aug-2014 Chao Yu <chao@kernel.org>

f2fs: avoid skipping recover_inline_xattr after recover_inline_data

When we recover data of inode in roll-forward procedure, and the inode has both
inline data and inline xattr. We may skip recovering inline xattr if we recover
inline data form node page first.
This patch will fix the problem that we lost inline xattr data in above
scenario.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# cf2271e7 25-Jul-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid retrying wrong recovery routine when error was occurred

This patch eliminates the propagation of recovery errors to the next mount.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 86928f98 06-Jun-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid not to call remove_dirty_inode

There is an errorneous case during the recovery like below.

In recovery_dentry,
1) dir = f2fs_iget();
2) mark the dir with FI_DELAY_IPUT
3) goto unmap_out

After the end of recovery routine, there is no dirty dentries so the dir cannot
be released by iput in remove_dirty_dir_inode.

This patch fixes such the bug case by handling the iget and iput in the
recovery_dentry procedure.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>


# 5c1f9927 28-Apr-2014 Chao Yu <chao@kernel.org>

f2fs: set errno when f2fs_iget failed in recover_dentry

We should set the error number correctly when we fail in recover_dentry(), so
the recover flow could stop for the reason as error number shows instead of
continuing.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6403eb1f 26-Apr-2014 Chao Yu <chao@kernel.org>

f2fs: introduce help macro ADDRS_PER_PAGE()

Introduce help macro ADDRS_PER_PAGE() to get the number of address pointers in
direct node or inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# ed57c27f 14-Apr-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove costly dirty_dir_inode operations

This patch removes list opeations in handling dirty dir inodes.
Previously, F2FS traverses whole the list of dirty dir inodes to check whether
there is an existing inode or not, resulting in heavy CPU overheads.

So this patch removes such the traverse operations by adding FI_DIRTY_DIR to
indicate the inode lies on the list or not.
Through this simple flag, we can remove redundant operations gracefully.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 2d7b822a 28-Mar-2014 Chao Yu <chao@kernel.org>

f2fs: use list_for_each_entry{_safe} for simplyfying code

This patch use list_for_each_entry{_safe} instead of list_for_each{_safe} for
simplfying code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 3cb5ad15 17-Mar-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: call f2fs_wait_on_page_writeback instead of native function

If a page is on writeback, f2fs can face with deadlock due to under writepages.
This is caused by merging IOs inside f2fs, so if it comes to detect, let's throw
merged IOs, which is implemented by f2fs_wait_on_page_writeback.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# e8512d2e 07-Mar-2014 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: remove the unused ctor argument of f2fs_kmem_cache_create()

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 695fd1ed 27-Feb-2014 Chao Yu <chao@kernel.org>

f2fs: use existing macro to clean up some codes

This patch use existing macro F2FS_INODE/NEXT_FREE_BLKADDR to clean up some
codes.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# f6517cfc 27-Jan-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix a build warning

This patch modifies flow a little bit to avoid the following build warnings.

src/fs/f2fs/recovery.c: In function ‘check_index_in_prev_nodes’:
src/fs/f2fs/recovery.c:288:51: warning: ‘sum.<U5390>.<U52f8>.ofs_in_node’ may
be used uninitialized in this function [-Wmaybe-uninitialized]
src/fs/f2fs/recovery.c:260:23: warning: ‘sum.nid’ may be used uninitialized
in this function [-Wmaybe-uninitialized]

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 491c0854 03-Feb-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clean up with a macro

This patch adds GET_BLKOFF_FROM_SEG0 to clean up some codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# abb2366c 27-Jan-2014 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to recover xattr node block

If a new xattr node page was allocated and its inode is fsynced, we should
recover the xattr node page during the roll-forward process after power-cut.
But, previously, f2fs didn't handle that case, resulting in kernel panic as
follows reported by Tom Li.

BUG: unable to handle kernel paging request at ffffc9001c861a98
IP: [<ffffffffa0295236>] check_index_in_prev_nodes+0x86/0x2d0 [f2fs]
Call Trace:
[<ffffffff815ece9b>] ? printk+0x48/0x4a
[<ffffffffa029626a>] recover_fsync_data+0xdca/0xf50 [f2fs]
[<ffffffffa02873ae>] f2fs_fill_super+0x92e/0x970 [f2fs]
[<ffffffff8112c9f8>] mount_bdev+0x1b8/0x200
[<ffffffffa0286a80>] ? f2fs_remount+0x130/0x130 [f2fs]
[<ffffffffa0285e40>] f2fs_mount+0x10/0x20 [f2fs]
[<ffffffff8112d4de>] mount_fs+0x3e/0x1b0
[<ffffffff810ef4eb>] ? __alloc_percpu+0xb/0x10
[<ffffffff8114761f>] vfs_kern_mount+0x6f/0x120
[<ffffffff811497b9>] do_mount+0x259/0xa90
[<ffffffff810ead1d>] ? memdup_user+0x3d/0x80
[<ffffffff810eadb3>] ? strndup_user+0x53/0x70
[<ffffffff8114a2c9>] SyS_mount+0x89/0xd0
[<ffffffff815feae2>] system_call_fastpath+0x16/0x1b

This patch adds a recovery function of xattr node pages.

Reported-by: Tom Li <biergaizi@members.fsf.org>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6c311ec6 17-Jan-2014 Chris Fries <cfries@motorola.com>

f2fs: clean checkpatch warnings

Fixed a variety of trivial checkpatch warnings. The only delta should
be some minor formatting on log strings that were split / too long.

Signed-off-by: Chris Fries <cfries@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 1e1bb4ba 25-Dec-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add inline_data recovery routine

This patch adds a inline_data recovery routine with the following policy.

[prev.] [next] of inline_data flag
o o -> recover inline_data
o x -> remove inline_data, and then recover data blocks
x o -> remove inline_data, and then recover inline_data
x x -> recover data blocks

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 58bfaf44 26-Dec-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce F2FS_INODE macro to get f2fs_inode

This patch introduces F2FS_INODE that returns struct f2fs_inode * from the inode
page.
By using this macro, we can remove unnecessary casting codes like below.

struct f2fs_inode *ri = &F2FS_NODE(inode_page)->i;
-> struct f2fs_inode *ri = F2FS_INODE(inode_page);

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d96b1431 22-Dec-2013 Chao Yu <chao@kernel.org>

f2fs: check filename length in recover_dentry

In current flow, we will get Null return value of f2fs_find_entry in
recover_dentry when name.len is bigger than F2FS_NAME_LEN, and then we
still add this inode into its dir entry.
To avoid this situation, we must check filename length before we use it.

Another point is that we could remove the code of checking filename length
In f2fs_find_entry, because f2fs_lookup will be called previously to ensure of
validity of filename length.

V2:
o add WARN_ON() as Jaegeuk Kim suggested.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6bacf52f 05-Dec-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add unlikely() macro for compiler more aggressively

This patch adds unlikely() macro into the most of codes.
The basic rule is to add that when:
- checking unusual errors,
- checking page mappings,
- and the other unlikely conditions.

Change log from v1:
- Don't add unlikely for the NULL test and error test: advised by Andi Kleen.

Cc: Chao Yu <chao2.yu@samsung.com>
Cc: Andi Kleen <andi@firstfloor.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b9987a27 04-Dec-2013 Chao Yu <chao@kernel.org>

f2fs: avoid unneeded page release for correct _count of page

In find_fsync_dnodes() and recover_data(), our flow is like this:

->f2fs_submit_page_bio()
-> f2fs_put_page()
-> page_cache_release() ---- page->_count declined to zero.
->__free_pages()
-> put_page_testzero() ---- page->_count will be declined again.

We will get a segment fault in put_page_testzero when CONFIG_DEBUG_VM
is on, or return MM with a bad page with wrong _count num.

So let's just release this page.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# a0acdfe0 04-Dec-2013 Chao Yu <chao@kernel.org>

f2fs: use inner macro GFP_F2FS_ZERO for simplification

Use inner macro GFP_F2FS_ZERO to instead of GFP_NOFS | __GFP_ZERO for
simplification of code.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 93dfe2ac 29-Nov-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: refactor bio-related operations

This patch integrates redundant bio operations on read and write IOs.

1. Move bio-related codes to the top of data.c.
2. Replace f2fs_submit_bio with f2fs_submit_merged_bio, which handles read
bios additionally.
3. Introduce __submit_merged_bio to submit the merged bio.
4. Change f2fs_readpage to f2fs_submit_page_bio.
5. Introduce f2fs_submit_page_mbio to integrate previous submit_read_page and
submit_write_page.

Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com >
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5d56b671 29-Oct-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add an option to avoid unnecessary BUG_ONs

If you want to remove unnecessary BUG_ONs, you can just turn off F2FS_CHECK_FS
in your kernel config.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# aabe5136 22-Oct-2013 Haicheng Li <haicheng.li@linux.intel.com>

f2fs: use bool for booleans

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# e479556b 27-Sep-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: use rw_sem instead of fs_lock(locks mutex)

The fs_locks is used to block other ops(ex, recovery) when doing checkpoint.
And each other operate routine(besides checkpoint) needs to acquire a fs_lock,
there is a terrible problem here, if these are too many concurrency threads acquiring
fs_lock, so that they will block each other and may lead to some performance problem,
but this is not the phenomenon we want to see.
Though there are some optimization patches introduced to enhance the usage of fs_lock,
but the thorough solution is using a *rw_sem* to replace the fs_lock.
Checkpoint routine takes write_sem, and other ops take read_sem, so that we can block
other ops(ex, recovery) when doing checkpoint, and other ops will not disturb each other,
this can avoid the problem described above completely.
Because of the weakness of rw_sem, the above change may introduce a potential problem
that the checkpoint thread might get starved if other threads are intensively locking
the read semaphore for I/O.(Pointed out by Xu Jin)
In order to avoid this, a wait_list is introduced, the appending read semaphore ops
will be dropped into the wait_list if checkpoint thread is waiting for write semaphore,
and will be waked up when checkpoint thread gives up write semaphore.
Thanks to Kim's previous review and test, and will be very glad to see other guys'
performance tests about this patch.

V2:
-fix the potential starvation problem.
-use more suitable func name suggested by Xu Jin.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
[Jaegeuk Kim: adjust minor coding standard]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 2e5558f4 24-Sep-2013 Russ W. Knize <rknize@gmail.com>

f2fs: account for orphan inodes during recovery

During recovery, orphan inodes are deleted via truncate_hole().
These orphans are added by recover_dentry() via f2fs_delete_entry().
However, f2fs_delete_entry() adds them via add_orphan_inode()
without calling acquire_orphan_inode() first. This prevents the
counters from being incremented properly, which causes them to
underflow when remove_orphan_inode() is called later on.

Signed-off-by: Russ Knize <rknize@motorola.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 691c6fd2 23-Sep-2013 Chao Yu <chao@kernel.org>

f2fs: remove unneeded write checkpoint in recover_fsync_data

Previously, recover_fsync_data still to write checkpoint when there is
nothing to recover with normal umount image.
It may reduce mount performance and flash memory lifetime, so let's remove
it.

Signed-off-by: Tan Shu <shu.tan@samsung.com>
Signed-off-by: Yu Chao <chao2.yu@samsung.com>
Reviewed-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# de93653f 12-Aug-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reserve the xattr space dynamically

This patch enables the number of direct pointers inside on-disk inode block to
be changed dynamically according to the size of inline xattr space.

The number of direct pointers, ADDRS_PER_INODE, can be changed only if the file
has inline xattr flag.

The number of direct pointers that will be used by inline xattrs is defined as
F2FS_INLINE_XATTR_ADDRS.
Current patch assigns F2FS_INLINE_XATTR_ADDRS to 0 temporarily.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# e27dae4d 14-Aug-2013 Dan Carpenter <dan.carpenter@oracle.com>

f2fs: alloc_page() doesn't return an ERR_PTR

alloc_page() returns a NULL on failure, it never returns an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d71b5564 09-Aug-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce cur_cp_version function to reduce code size

This patch introduces a new inline function, cur_cp_version, to reduce redundant
codes.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 45590710 15-Jul-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: introduce help function F2FS_NODE()

Introduce help function F2FS_NODE() to simplify the conversion of node_page to
f2fs_node.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5ebefc5b 26-Jun-2013 Gu Zheng <guz.fnst@cn.fujitsu.com>

f2fs: remove the unused argument "sbi" of func destroy_fsync_dnodes()

As destroy_fsync_dnodes() is a simple list-cleanup func, so delete the unused
and unrelated f2fs_sb_info argument of it.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 060dd67b 23-Jun-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix an endian conversion bug detected by sparse

This patch should fix the following bug reported by kbuild test robot.

fs/f2fs/recovery.c:233:33: sparse: incorrect type in assignment
(different base types)

parse warnings: (new ones prefixed by >>)

>> recovery.c:233: sparse: incorrect type in assignment (different base types)
recovery.c:233: expected unsigned int [unsigned] [assigned] ofs_in_node
recovery.c:233: got restricted __le16 [assigned] [usertype] ofs_in_node
>> recovery.c:238: sparse: incorrect type in assignment (different base types)
recovery.c:238: expected unsigned int [unsigned] ofs_in_node
recovery.c:238: got restricted __le16 [assigned] [usertype] ofs_in_node

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 5deb8267 05-Jun-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix iget/iput of dir during recovery

It is possible that iput is skipped after iget during the recovery.

In recover_dentry(),
dir = f2fs_iget();
...
if (de && inode->i_ino == le32_to_cpu(de->ino))
goto out;

In this case, this dir is not able to be added in dirty_dir_inode_list.
The actual linking is done only when set_page_dirty() is called.

So let's add this newly got inode into the list explicitly, and put it at the
end of the recovery routine.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6b8213d9 27-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix dentry recovery routine

The error scenario is:
1. create /a
(1.a link /a /b)
2. sync
3. unlinke /a
4. create /a
5. fsync /a
6. Sudden power-off

When the f2fs recovers the fsynced dentry, /a, we discover an exsiting dentry at
f2fs_find_entry() in recover_dentry().

In such the case, we should unlink the existing dentry and its inode
and then recover newly created dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# f28c06fa 23-May-2013 Dan Carpenter <dan.carpenter@oracle.com>

f2fs: dereferencing an ERR_PTR

There is an error path where "dir" is an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 39cf72cf 21-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to handle do_recover_data errors

This patch adds error handling codes of check_index_in_prev_nodes and its
caller, do_recover_data.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b292dcab 21-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reuse the locked dnode page and its inode

This patch fixes the following deadlock bug during the recovery.

INFO: task mount:1322 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mount D ffffffff81125870 0 1322 1266 0x00000000
ffff8801207e39d8 0000000000000046 ffff88012ab1dee0 0000000000000046
ffff8801207e3a08 ffff880115903f40 ffff8801207e3fd8 ffff8801207e3fd8
ffff8801207e3fd8 ffff880115903f40 ffff8801207e39d8 ffff88012fc94520
Call Trace:
[<ffffffff81125870>] ? __lock_page+0x70/0x70
[<ffffffff816a92d9>] schedule+0x29/0x70
[<ffffffff816a93af>] io_schedule+0x8f/0xd0
[<ffffffff8112587e>] sleep_on_page+0xe/0x20
[<ffffffff816a649a>] __wait_on_bit_lock+0x5a/0xc0
[<ffffffff81125867>] __lock_page+0x67/0x70
[<ffffffff8106c7b0>] ? autoremove_wake_function+0x40/0x40
[<ffffffff81126857>] find_lock_page+0x67/0x80
[<ffffffff8112698f>] find_or_create_page+0x3f/0xb0
[<ffffffffa03901a8>] ? sync_inode_page+0xa8/0xd0 [f2fs]
[<ffffffffa038fdf7>] get_node_page+0x67/0x180 [f2fs]
[<ffffffffa039818b>] recover_fsync_data+0xacb/0xff0 [f2fs]
[<ffffffff816aaa1e>] ? _raw_spin_unlock+0x3e/0x40
[<ffffffffa0389634>] f2fs_fill_super+0x7d4/0x850 [f2fs]
[<ffffffff81184cf9>] mount_bdev+0x1c9/0x210
[<ffffffffa0388e60>] ? validate_superblock+0x180/0x180 [f2fs]
[<ffffffffa0387635>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffff81185a13>] mount_fs+0x43/0x1b0
[<ffffffff81145ba0>] ? __alloc_percpu+0x10/0x20
[<ffffffff811a0796>] vfs_kern_mount+0x76/0x120
[<ffffffff811a2cb7>] do_mount+0x237/0xa10
[<ffffffff81140b9b>] ? strndup_user+0x5b/0x80
[<ffffffff811a3520>] SyS_mount+0x90/0xe0
[<ffffffff816b3502>] system_call_fastpath+0x16/0x1b

The bug is triggered when check_index_in_prev_nodes tries to get the direct
node page by calling get_node_page.
At this point, if the direct node page is already locked by get_dnode_of_data,
its caller, we got a deadlock condition.

This patch adds additional condition check for the reuse of locked direct node
pages prior to the get_node_page call.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 2c2c149f 19-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: don't do checkpoint if error is occurred

If we met an error during the dentry recovery, we should not conduct checkpoint.
Otherwise, some errorneous dentry blocks overwrites the existing blocks that
contain the remaining recovery information.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 45856aff 19-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix to unlock page before exit

If we got an error after lock_page, we should unlock it before exit.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 9a55ed65 19-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove unnecessary kmap/kunmap operations

The allocated page used by the recovery is not on HIGHMEM, so that we don't
need to use kmap/kunmap.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# f356fe0c 16-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add debug msgs in the recovery routine

This patch adds some trivial debugging messages in the recovery process.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 74d0b917 15-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix BUG_ON during f2fs_evict_inode(dir)

During the dentry recovery routine, recover_inode() triggers __f2fs_add_link
with its directory inode.

In the following scenario, a bug is captured.
1. dir = f2fs_iget(pino)
2. __f2fs_add_link(dir, name)
3. iput(dir)
-> f2fs_evict_inode() faces with BUG_ON(atomic_read(fi->dirty_dents))

Kernel BUG at ffffffffa01c0676 [verbose debug info unavailable]
[<ffffffffa01c0676>] f2fs_evict_inode+0x276/0x300 [f2fs]
Call Trace:
[<ffffffff8118ea00>] evict+0xb0/0x1b0
[<ffffffff8118f1c5>] iput+0x105/0x190
[<ffffffffa01d2dac>] recover_fsync_data+0x3bc/0x1070 [f2fs]
[<ffffffff81692e8a>] ? io_schedule+0xaa/0xd0
[<ffffffff81690acb>] ? __wait_on_bit_lock+0x7b/0xc0
[<ffffffff8111a0e7>] ? __lock_page+0x67/0x70
[<ffffffff81165e21>] ? kmem_cache_alloc+0x31/0x140
[<ffffffff8118a502>] ? __d_instantiate+0x92/0xf0
[<ffffffff812a949b>] ? security_d_instantiate+0x1b/0x30
[<ffffffff8118a5b4>] ? d_instantiate+0x54/0x70

This means that we should flush all the dentry pages between iget and iput().
But, during the recovery routine, it is unallowed due to consistency, so we
have to wait the whole recovery process.
And then, write_checkpoint flushes all the dirty dentry blocks, and nicely we
can put the stale dir inodes from the dirty_dir_inode_list.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 8c26d7d5 15-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix por_doing variable coverage

The reason of using sbi->por_doing is to alleviate data writes during the
recovery.
The find_fsync_dnodes() produces some dirty dentry pages, so we should
cover it too with sbi->por_doing.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# addbe45b 14-May-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove redundant assignment

We don't need to assign a value redundantly.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 047184b4 02-May-2013 Chris Fries <C.Fries@motorola.com>

f2fs: recover when journal contains deleted files

When recovering a journal file with fsync data for files that have
been deleted, don't bail out on recovery.

Signed-off-by: Chris Fries <C.Fries@motorola.com>
Reviewed-by: Russell Knize <rknize2@motorola.com>
Reviewed-by: Jason Hrycay <jason.hrycay@motorola.com>
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 39936837 22-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce a new global lock scheme

In the previous version, f2fs uses global locks according to the usage types,
such as directory operations, block allocation, block write, and so on.

Reference the following lock types in f2fs.h.
enum lock_type {
RENAME, /* for renaming operations */
DENTRY_OPS, /* for directory operations */
DATA_WRITE, /* for data write */
DATA_NEW, /* for data allocation */
DATA_TRUNC, /* for data truncate */
NODE_NEW, /* for node allocation */
NODE_TRUNC, /* for node truncate */
NODE_WRITE, /* for node write */
NR_LOCK_TYPE,
};

In that case, we lose the performance under the multi-threading environment,
since every types of operations must be conducted one at a time.

In order to address the problem, let's share the locks globally with a mutex
array regardless of any types.
So, let users grab a mutex and perform their jobs in parallel as much as
possbile.

For this, I propose a new global lock scheme as follows.

0. Data structure
- f2fs_sb_info -> mutex_lock[NR_GLOBAL_LOCKS]
- f2fs_sb_info -> node_write

1. mutex_lock_op(sbi)
- try to get an avaiable lock from the array.
- returns the index of the gottern lock variable.

2. mutex_unlock_op(sbi, index of the lock)
- unlock the given index of the lock.

3. mutex_lock_all(sbi)
- grab all the locks in the array before the checkpoint.

4. mutex_unlock_all(sbi)
- release all the locks in the array after checkpoint.

5. block_operations()
- call mutex_lock_all()
- sync_dirty_dir_inodes()
- grab node_write
- sync_node_pages()

Note that,
the pairs of mutex_lock_op()/mutex_unlock_op() and
mutex_lock_all()/mutex_unlock_all() should be used together.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 6ead1142 20-Mar-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix the recovery flow to handle errors correctly

We should handle errors during the recovery flow correctly.
For example, if we get -ENOMEM, we should report a mount failure instead of
conducting the remained mount procedure.

Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 393ff91f 08-Mar-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reduce unncessary locking pages during read

This patch reduces redundant locking and unlocking pages during read operations.
In f2fs_readpage, let's use wait_on_page_locked() instead of lock_page.
And then, when we need to modify any data finally, let's lock the page so that
we can avoid lock contention.

[readpage rule]
- The f2fs_readpage returns unlocked page, or released page too in error cases.
- Its caller should handle read error, -EIO, after locking the page, which
indicates read completion.
- Its caller should check PageUptodate after grab_cache_page.

Signed-off-by: Changman Lee <cm224.lee@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 266e97a8 25-Feb-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: introduce readahead mode of node pages

Previously, f2fs reads several node pages ahead when get_dnode_of_data is called
with RDONLY_NODE flag.
And, this flag is set by the following functions.
- get_data_block_ro
- get_lock_data_page
- do_write_data_page
- truncate_blocks
- truncate_hole

However, this readahead mechanism is initially introduced for the use of
get_data_block_ro to enhance the sequential read performance.

So, let's clarify all the cases with the additional modes as follows.

enum {
ALLOC_NODE, /* allocate a new node page if needed */
LOOKUP_NODE, /* look up a node without readahead */
LOOKUP_NODE_RA, /*
* look up a node with readahead called
* by get_datablock_ro.
*/
}

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Reviewed-by: Namjae Jeon <namjae.jeon@samsung.com>


# 43727527 03-Feb-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: clarify and enhance the f2fs_gc flow

This patch makes clearer the ambiguous f2fs_gc flow as follows.

1. Remove intermediate checkpoint condition during f2fs_gc
(i.e., should_do_checkpoint() and GC_BLOCKED)

2. Remove unnecessary return values of f2fs_gc because of #1.
(i.e., GC_NODE, GC_OK, etc)

3. Simplify write_checkpoint() because of #2.

4. Clarify the main f2fs_gc flow.
o monitor how many freed sections during one iteration of do_garbage_collect().
o do GC more without checkpoints if we can't get enough free sections.
o do checkpoint once we've got enough free sections through forground GCs.

5. Adopt thread-logging (Slack-Space-Recycle) scheme more aggressively on data
log types. See. get_ssr_segement()

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d4686d56 30-Jan-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid balanc_fs during evict_inode

1. Background

Previously, if f2fs tries to move data blocks of an *evicting* inode during the
cleaning process, it stops the process incompletely and then restarts the whole
process, since it needs a locked inode to grab victim data pages in its address
space. In order to get a locked inode, iget_locked() by f2fs_iget() is normally
used, but, it waits if the inode is on freeing.

So, here is a deadlock scenario.
1. f2fs_evict_inode() <- inode "A"
2. f2fs_balance_fs()
3. f2fs_gc()
4. gc_data_segment()
5. f2fs_iget() <- inode "A" too!

If step #1 and #5 treat a same inode "A", step #5 would fall into deadlock since
the inode "A" is on freeing. In order to resolve this, f2fs_iget_nowait() which
skips __wait_on_freeing_inode() was introduced in step #5, and stops f2fs_gc()
to complete f2fs_evict_inode().

1. f2fs_evict_inode() <- inode "A"
2. f2fs_balance_fs()
3. f2fs_gc()
4. gc_data_segment()
5. f2fs_iget_nowait() <- inode "A", then stop f2fs_gc() w/ -ENOENT

2. Problem and Solution

In the above scenario, however, f2fs cannot finish f2fs_evict_inode() only if:
o there are not enough free sections, and
o f2fs_gc() tries to move data blocks of the *evicting* inode repeatedly.

So, the final solution is to use f2fs_iget() and remove f2fs_balance_fs() in
f2fs_evict_inode().
The f2fs_evict_inode() actually truncates all the data and node blocks, which
means that it doesn't produce any dirty node pages accordingly.
So, we don't need to do f2fs_balance_fs() in practical.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# b7f7a5e0 25-Jan-2013 Al Viro <viro@zeniv.linux.org.uk>

f2fs: get rid of fake on-stack dentries

those should never be used for a lot of reasons...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>


# d8b79b2f 20-Jan-2013 Dan Carpenter <dan.carpenter@oracle.com>

f2fs: use _safe() version of list_for_each

This is calling list_del() inside a loop which is a problem when we try
move to the next item on the list. I've converted it to use the _safe
version. And also, as a cleanup, I've converted it to use
list_for_each_entry instead of list_for_each.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# c335a869 02-Jan-2013 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check return value during recovery

This patch resolves Coverity #753102:

>>> No check of the return value of "f2fs_add_link(&dent, inode)".

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 24c366a9 29-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: remove unneeded INIT_LIST_HEAD at few places

While creating a new entry for addition to the list(orphan inode list
and fsync inode entry list), there is no need to call HEAD initialization
for these entries. So, remove that init part.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# fd8bb65f 21-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: fix fsync_inode list addition logic and avoid invalid access to memory

In function find_fsync_dnodes() - the fsync inodes gets added to the list, but
in one path suppose f2fs_iget results in error, in such case - error gets added
to the fsync inode list.
In next call to recover_data()->get_fsync_inode()
entry = list_entry(this, struct fsync_inode_entry, list);
if (entry->inode->i_ino == ino)
This can result in "invalid access to memory" when it encounters 'error' as
entry in the fsync inode list.
So, add the fsync inode entry to the list only in case of no errors.
And, free the object at that point itself in case of issue.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 06025f4d 21-Dec-2012 Namjae Jeon <namjae.jeon@samsung.com>

f2fs: handle error from f2fs_iget_nowait

In case f2fs_iget_nowait returns error, it results in truncate_hole being
called with 'error' value as inode pointer. There is no check in truncate_hole
for valid inode, so it could result in crash due "invalid access to memory".
Avoid this by handling error condition properly.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Amit Sahrawat <a.sahrawat@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 0a8165d7 28-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: adjust kernel coding style

As pointed out by Randy Dunlap, this patch removes all usage of "/**" for comment
blocks. Instead, just use "/*".

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# 25ca923b 28-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: fix endian conversion bugs reported by sparse

This patch should resolve the bugs reported by the sparse tool.
Initial reports were written by "kbuild test robot" managed by fengguang.wu.

In my local machines, I've tested also by running:
> make C=2 CF="-D__CHECK_ENDIAN__"

Accordingly, I've found lots of warnings and bugs related to the endian
conversion. And I've fixed all at this moment.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>


# d624c96f 02-Nov-2012 Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add recovery routines for roll-forward

This adds roll-forward routines to recover fsynced data.

- F2FS uses basically roll-back model with checkpointing.

- In order to implement fsync(), there are two approaches as follows.

1. A roll-back model with checkpointing at every fsync()
: This is a naive method, but suffers from very low performance.

2. A roll-forward model
: F2FS adopts this model where all the fsynced data should be recovered, which
were written after checkpointing was done. In order to figure out the data,
F2FS keeps a "fsync" mark in direct node blocks. In addition, F2FS remains
the location of next node block in each direct node block for reconstructing
the chain of node blocks during the recovery.

- In order to enhance the performance, F2FS keeps a "dentry" mark also in direct
node blocks. If this is set during the recovery, F2FS replays adding a dentry.

Signed-off-by: Jaegeuk Kim <jaegeuk.kim@samsung.com>