This patch fixes to set FI_UPDATE_WRITE only if in-place IO was issued.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203239
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
cc poc_15.c
./run.sh f2fs
sync
- Kernel messages
------------[ cut here ]------------
kernel BUG at fs/f2fs/segment.c:3162!
RIP: 0010:f2fs_inplace_write_data+0x12d/0x160
Call Trace:
f2fs_do_write_data_page+0x3c1/0x820
__write_data_page+0x156/0x720
f2fs_write_cache_pages+0x20d/0x460
f2fs_write_data_pages+0x1b4/0x300
do_writepages+0x15/0x60
__filemap_fdatawrite_range+0x7c/0xb0
file_write_and_wait_range+0x2c/0x80
f2fs_do_sync_file+0x102/0x810
do_fsync+0x33/0x60
__x64_sys_fsync+0xb/0x10
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is f2fs_inplace_write_data() will trigger kernel panic due
to data block locates in node type segment.
To avoid panic, let's just return error code and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203233
- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
cc poc_13.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
- Kernel messages
F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
kernel BUG at fs/f2fs/segment.c:2102!
RIP: 0010:update_sit_entry+0x394/0x410
Call Trace:
f2fs_allocate_data_block+0x16f/0x660
do_write_page+0x62/0x170
f2fs_do_write_node_page+0x33/0xa0
__write_node_page+0x270/0x4e0
f2fs_sync_node_pages+0x5df/0x670
f2fs_write_checkpoint+0x372/0x1400
f2fs_sync_fs+0xa3/0x130
f2fs_do_sync_file+0x1a6/0x810
do_fsync+0x33/0x60
__x64_sys_fsync+0xb/0x10
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
sit.vblocks and sum valid block count in sit.valid_map may be
inconsistent, segment w/ zero vblocks will be treated as free
segment, while allocating in free segment, we may allocate a
free block, if its bitmap is valid previously, it can cause
kernel crash due to bitmap verification failure.
Anyway, to avoid further serious metadata inconsistence and
corruption, it is necessary and worth to detect SIT
inconsistence. So let's enable check_block_count() to verify
vblocks and valid_map all the time rather than do it only
CONFIG_F2FS_CHECK_FS is enabled.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203229
- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync
- Kernel message
kernel BUG at fs/f2fs/recovery.c:591!
RIP: 0010:recover_data+0x12d8/0x1780
Call Trace:
f2fs_recover_fsync_data+0x613/0x710
f2fs_fill_super+0x1043/0x1aa0
mount_bdev+0x16d/0x1a0
mount_fs+0x4a/0x170
vfs_kern_mount+0x5d/0x100
do_mount+0x200/0xcf0
ksys_mount+0x79/0xc0
__x64_sys_mount+0x1c/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
With corrupted image wihch has out-of-range valid node/block count, during
recovery, once we failed due to no free space, it will trigger kernel
panic.
Adding sanity check on valid node/block count in f2fs_sanity_check_ckpt()
to detect such condition, so that potential panic can be avoided.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203227
- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync
- Messages
kernel BUG at fs/f2fs/recovery.c:549!
RIP: 0010:recover_data+0x167a/0x1780
Call Trace:
f2fs_recover_fsync_data+0x613/0x710
f2fs_fill_super+0x1043/0x1aa0
mount_bdev+0x16d/0x1a0
mount_fs+0x4a/0x170
vfs_kern_mount+0x5d/0x100
do_mount+0x200/0xcf0
ksys_mount+0x79/0xc0
__x64_sys_mount+0x1c/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
During recovery, if ofs_of_node is inconsistent in between recovered
node page and original checkpointed node page, let's just fail recovery
instead of making kernel panic.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203225
- Overview
When mounting the attached crafted image and unmounting it, following errors are reported.
Additionally, it hangs on sync after unmounting.
The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y
- Reproduces
mkdir test
mount -t f2fs tmp.img test
touch test/t
umount test
sync
- Messages
kernel BUG at fs/f2fs/node.c:3073!
RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
Call Trace:
f2fs_put_super+0xf4/0x270
generic_shutdown_super+0x62/0x110
kill_block_super+0x1c/0x50
kill_f2fs_super+0xad/0xd0
deactivate_locked_super+0x35/0x60
cleanup_mnt+0x36/0x70
task_work_run+0x75/0x90
exit_to_usermode_loop+0x93/0xa0
do_syscall_64+0xba/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0010:f2fs_destroy_node_manager+0x2f0/0x300
NAT table is corrupted, so reserved meta/node inode ids were added into
free list incorrectly, during file creation, since reserved id has cached
in inode hash, so it fails the creation and preallocated nid can not be
released later, result in kernel panic.
To fix this issue, let's do nid boundary check during free nid loading.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203221
- Overview
When mounting the attached crafted image and running program, this error is reported.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_07.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
- Messages
kernel BUG at fs/f2fs/node.c:1279!
RIP: 0010:read_node_page+0xcf/0xf0
Call Trace:
__get_node_page+0x6b/0x2f0
f2fs_iget+0x8f/0xdf0
f2fs_lookup+0x136/0x320
__lookup_slow+0x92/0x140
lookup_slow+0x30/0x50
walk_component+0x1c1/0x350
path_lookupat+0x62/0x200
filename_lookup+0xb3/0x1a0
do_fchmodat+0x3e/0xa0
__x64_sys_chmod+0x12/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
On below paths, we can have opportunity to readahead inode page
- gc_node_segment -> f2fs_ra_node_page
- gc_data_segment -> f2fs_ra_node_page
- f2fs_fill_dentries -> f2fs_ra_node_page
Unlike synchronized read, on readahead path, we can set page uptodate
before verifying page's checksum, then read_node_page() will trigger
kernel panic once it encounters a uptodated page w/ incorrect checksum.
So considering readahead scenario, we have to do checksum each time
when loading inode page even if it is uptodated.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203219
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the program.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_06.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
- Messages
kernel BUG at fs/f2fs/node.c:1183!
RIP: 0010:f2fs_remove_inode_page+0x294/0x2d0
Call Trace:
f2fs_evict_inode+0x2a3/0x3a0
evict+0xba/0x180
__dentry_kill+0xbe/0x160
dentry_kill+0x46/0x180
dput+0xbb/0x100
do_renameat2+0x3c9/0x550
__x64_sys_rename+0x17/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is f2fs_remove_inode_page() will trigger kernel panic due to
inconsistent i_blocks value of inode.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing of potential image corruption.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203217
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the program.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_test_05.c
mkdir test
mount -t f2fs tmp.img test
sudo ./a.out
sync
- Messages
kernel BUG at fs/f2fs/inode.c:707!
RIP: 0010:f2fs_evict_inode+0x33f/0x3a0
Call Trace:
evict+0xba/0x180
f2fs_iget+0x598/0xdf0
f2fs_lookup+0x136/0x320
__lookup_slow+0x92/0x140
lookup_slow+0x30/0x50
walk_component+0x1c1/0x350
path_lookupat+0x62/0x200
filename_lookup+0xb3/0x1a0
do_readlinkat+0x56/0x110
__x64_sys_readlink+0x16/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
During inode loading, __recover_inline_status() can recovery inode status
and set inode dirty, once we failed in following process, it will fail
the check in f2fs_evict_inode, result in trigger BUG_ON().
Let's clear dirty inode in error path of f2fs_iget() to avoid panic.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Just removing '\n' in f2fs_msg(, "\n") to avoid redundant new blank line.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch changes codes as below:
- don't use is_read_io() as a condition to judge the meta IO.
- use .is_por to replace .is_meta to indicate IO is from recovery explicitly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203213
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after running the this script.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync
kernel BUG at fs/f2fs/f2fs.h:2012!
RIP: 0010:truncate_node+0x2c9/0x2e0
Call Trace:
f2fs_truncate_xattr_node+0xa1/0x130
f2fs_remove_inode_page+0x82/0x2d0
f2fs_evict_inode+0x2a3/0x3a0
evict+0xba/0x180
__dentry_kill+0xbe/0x160
dentry_kill+0x46/0x180
dput+0xbb/0x100
do_renameat2+0x3c9/0x550
__x64_sys_rename+0x17/0x20
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is dec_valid_node_count() will trigger kernel panic due to
inconsistent count in between inode.i_blocks and actual block.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203209
- Overview
When mounting the attached crafted image and running program, I got this error.
Additionally, it hangs on sync after the this script.
The image is intentionally fuzzed from a normal f2fs image for testing and I enabled option CONFIG_F2FS_CHECK_FS on.
- Reproduces
cc poc_01.c
./run.sh f2fs
sync
kernel BUG at fs/f2fs/f2fs.h:1788!
RIP: 0010:f2fs_truncate_data_blocks_range+0x342/0x350
Call Trace:
f2fs_truncate_blocks+0x36d/0x3c0
f2fs_truncate+0x88/0x110
f2fs_setattr+0x3e1/0x460
notify_change+0x2da/0x400
do_truncate+0x6d/0xb0
do_sys_ftruncate+0xf1/0x160
do_syscall_64+0x43/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The reason is dec_valid_block_count() will trigger kernel panic due to
inconsistent count in between inode.i_blocks and actual block.
To avoid panic, let's just print debug message and set SBI_NEED_FSCK to
give a hint to fsck for latter repairing.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix build warning and add unlikely]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With below mkfs and mount option:
MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f
MOUNT_OPTIONS -- -o noinline_xattr
We may miss xattr data with below testcase:
- mkdir dir
- setfattr -n "user.name" -v 0 dir
- for ((i = 0; i < 190; i++)) do touch dir/$i; done
- umount
- mount
- getfattr -n "user.name" dir
user.name: No such attribute
The root cause is that we persist xattr data into reserved inline xattr
space, even if inline_xattr is not enable in inline directory inode, after
inline dentry conversion, reserved space no longer exists, so that xattr
data missed.
Let's use inline xattr space only if inline_xattr flag is set on inode
to fix this iusse.
Fixes: 6afc662e68 ("f2fs: support flexible inline xattr size")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With below mkfs and mount option, generic/339 of fstest will report that
scratch image becomes corrupted.
MKFS_OPTIONS -- -O extra_attr -O project_quota -O inode_checksum -O flexible_inline_xattr -O inode_crtime -f /dev/zram1
MOUNT_OPTIONS -- -o acl,user_xattr -o discard,noinline_xattr /dev/zram1 /mnt/scratch_f2fs
[ASSERT] (f2fs_check_dirent_position:1315) --> Wrong position of dirent pino:1970, name: (...)
level:8, dir_level:0, pgofs:951, correct range:[900, 901]
In old kernel, inline data and directory always reserved 200 bytes in
inode layout, even if inline_xattr is disabled, then new kernel tries
to retrieve that space for non-inline xattr inode, but for inline dentry,
its layout size should be fixed, so we just keep that reserved space.
But the problem here is that, after inline dentry conversion, inline
dentry layout no longer exists, if we still reserve inline xattr space,
after dents updates, there will be a hole in inline xattr space, which
can break hierarchy hash directory structure.
This patch fixes this issue by retrieving inline xattr space after
inline dentry conversion.
Fixes: 6afc662e68 ("f2fs: support flexible inline xattr size")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are some places in where we missed to unlock page or unlock page
incorrectly, fix them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jungyeon reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=203211
- Overview
When mounting the attached crafted image and making a new file, I got this error and the error messages keep repeating.
The image is intentionally fuzzed from a normal f2fs image for testing and I run with option CONFIG_F2FS_CHECK_FS on.
- Reproduces
mkdir test
mount -t f2fs tmp.img test
cd test
touch t
- Messages
[ 58.820451] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.821485] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.822530] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.823571] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.824616] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.825640] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.826663] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.827698] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.828719] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.829759] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.830783] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.831828] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.832869] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.833888] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.834945] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.835996] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.837028] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.838051] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.839072] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.840100] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.841147] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.842186] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.843214] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.844267] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.845282] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.846305] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
[ 58.847341] F2FS-fs (sdb): Inconsistent segment (1) type [1, 0] in SSA and SIT
... (repeating)
During GC, if segment type stored in SSA and SIT is inconsistent, we just
skip migrating current segment directly, since we need to know the exact
type to decide the migration function we use.
So in foreground GC, we will easily run into a infinite loop as we may
select the same victim segment which has inconsistent type due to greedy
policy. In order to end up this, we choose to shutdown filesystem. For
backgrond GC, we need to do that as well, so that we can avoid latter
potential infinite looped foreground GC.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit af033b2aa8 ("f2fs: guarantee journalled quota data by checkpoint")
added function is_journalled_quota() in f2fs.h, but it located outside of
_LINUX_F2FS_H macro coverage, it has been fixed with commit 0af725fcb7
("f2fs: fix wrong #endif").
But anyway, in order to avoid making same mistake latter, let's add single
line comment to notice which #if the last #endif is corresponding to.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: Remove unnecessary empty EOL]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Hagbard Celine reported:
Hi, this is a long standing bug that I've hit before on older kernels,
but I was not able to get the syslog saved because of the nature of
the bug. This time I had booted form a pen-drive, and was able to save
the log to it's efi-partition.
What i did to trigger it was to create a partition and format it f2fs,
then mount it with options:
"rw,relatime,lazytime,background_gc=on,disable_ext_identify,discard,heap,user_xattr,inline_xattr,acl,inline_data,inline_dentry,flush_merge,data_flush,extent_cache,mode=adaptive,active_logs=6,whint_mode=fs-based,alloc_mode=default,fsync_mode=strict".
Then I unpacked a big .tar.xz to the partition (I used a
gentoo-stage3-tarball as I was in process of installing Gentoo).
Same options just without data_flush gives no problems.
Mar 20 20:54:01 usbgentoo kernel: FAT-fs (nvme0n1p4): Volume was not
properly unmounted. Some data may be corrupt. Please run fsck.
Mar 20 21:05:23 usbgentoo kernel: kworker/dying (1588) used greatest
stack depth: 12064 bytes left
Mar 20 21:06:40 usbgentoo kernel: BUG: stack guard page was hit at
00000000a4b0733c (stack is 0000000056016422..0000000096e7463f)
Mar 20 21:06:40 usbgentoo kernel: kernel stack overflow
......
Mar 20 21:06:40 usbgentoo kernel: Call Trace:
Mar 20 21:06:40 usbgentoo kernel: read_node_page+0x71/0xf0
Mar 20 21:06:40 usbgentoo kernel: ? xas_load+0x8/0x50
Mar 20 21:06:40 usbgentoo kernel: __get_node_page+0x73/0x2a0
Mar 20 21:06:40 usbgentoo kernel: f2fs_get_dnode_of_data+0x34e/0x580
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_inline_data+0x5e/0x2a0
Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x421/0x690
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_cache_pages+0x1cf/0x460
Mar 20 21:06:40 usbgentoo kernel: f2fs_write_data_pages+0x2b3/0x2e0
Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_chksum_verify+0x1d/0xc0
Mar 20 21:06:40 usbgentoo kernel: ? read_node_page+0x71/0xf0
Mar 20 21:06:40 usbgentoo kernel: do_writepages+0x3c/0xd0
Mar 20 21:06:40 usbgentoo kernel: __filemap_fdatawrite_range+0x7c/0xb0
Mar 20 21:06:40 usbgentoo kernel: f2fs_sync_dirty_inodes+0xf2/0x200
Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs_bg+0x2a3/0x2c0
Mar 20 21:06:40 usbgentoo kernel: ? f2fs_inode_dirtied+0x21/0xc0
Mar 20 21:06:40 usbgentoo kernel: f2fs_balance_fs+0xd6/0x2b0
Mar 20 21:06:40 usbgentoo kernel: __write_data_page+0x4fb/0x690
......
Mar 20 21:06:40 usbgentoo kernel: __writeback_single_inode+0x2a1/0x340
Mar 20 21:06:40 usbgentoo kernel: ? soft_cursor+0x1b4/0x220
Mar 20 21:06:40 usbgentoo kernel: writeback_sb_inodes+0x1d5/0x3e0
Mar 20 21:06:40 usbgentoo kernel: __writeback_inodes_wb+0x58/0xa0
Mar 20 21:06:40 usbgentoo kernel: wb_writeback+0x250/0x2e0
Mar 20 21:06:40 usbgentoo kernel: ? 0xffffffff8c000000
Mar 20 21:06:40 usbgentoo kernel: ? cpumask_next+0x16/0x20
Mar 20 21:06:40 usbgentoo kernel: wb_workfn+0x2f6/0x3b0
Mar 20 21:06:40 usbgentoo kernel: ? __switch_to_asm+0x40/0x70
Mar 20 21:06:40 usbgentoo kernel: process_one_work+0x1f5/0x3f0
Mar 20 21:06:40 usbgentoo kernel: worker_thread+0x28/0x3c0
Mar 20 21:06:40 usbgentoo kernel: ? rescuer_thread+0x330/0x330
Mar 20 21:06:40 usbgentoo kernel: kthread+0x10e/0x130
Mar 20 21:06:40 usbgentoo kernel: ? kthread_create_on_node+0x60/0x60
Mar 20 21:06:40 usbgentoo kernel: ret_from_fork+0x35/0x40
The root cause is that we run into an infinite recursive calling in
between f2fs_balance_fs_bg and writepage() as described below:
- f2fs_write_data_pages --- A
- __write_data_page
- f2fs_balance_fs
- f2fs_balance_fs_bg --- B
- f2fs_sync_dirty_inodes
- filemap_fdatawrite
- f2fs_write_data_pages --- A
...
- f2fs_balance_fs_bg --- B
...
In order to fix this issue, let's detect such condition in __write_data_page()
and just skip calling f2fs_balance_fs() recursively.
Reported-by: Hagbard Celine <hagbardcelin@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs_hw_support_discard() only tests if the super block device supports
discard. However, for a multi-device volume, not all disks used may
support discard. Improve the check performed to test all devices of
the volume and report discard as supported if at least one device of
the volume supports discard. To implement this, introduce the helper
function f2fs_bdev_support_discard(), which returns true for zoned block
devices (where discard is processed as a zone reset) and for regular
disks supporting the discard command.
f2fs_bdev_support_discard() is also used in __queue_discard_cmd() to
handle discard command issuing for a particular device of the volume.
That is, prevent issuing a discard command for block devices that do
not support it.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For zoned block devices, an array of zone types for each device is
allocated and initialized in order to determine if a section is stored
on a sequential zone (zone reset needed) or a conventional zone (no
zone reset needed and regular discard applies). Considering this usage,
the zone types stored in memory can be replaced with a bitmap to
indicate an equivalent information, that is, if a zone is sequential or
not. This reduces the memory usage for each zoned device by roughly 8:
on a 14TB disk with zones of 256 MB, the zone type array consumes
13x4KB pages while the bitmap uses only 2x4KB pages.
This patch changes the f2fs_dev_info structure blkz_type field to the
bitmap blkz_seq. Access to this bitmap is done using the helper
function f2fs_blkz_is_seq(), which is a rewrite of the function
get_blkz_type().
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For a single device mount using a zoned block device, the zone
information for the device is stored in the sbi->devs single entry
array and sbi->s_ndevs is set to 1. This differs from a single device
mount using a regular block device which does not allocate sbi->devs
and sets sbi->s_ndevs to 0.
However, sbi->s_devs == 0 condition is used throughout the code to
differentiate a single device mount from a multi-device mount where
sbi->s_ndevs is always larger than 1. This results in problems with
single zoned block device volumes as these are treated as multi-device
mounts but do not have the start_blk and end_blk information set. One
of the problem observed is skipping of zone discard issuing resulting in
write commands being issued to full zones or unaligned to a zone write
pointer.
Fix this problem by simply treating the cases sbi->s_ndevs == 0 (single
regular block device mount) and sbi->s_ndevs == 1 (single zoned block
device mount) in the same manner. This is done by introducing the
helper function f2fs_is_multi_device() and using this helper in place
of direct tests of sbi->s_ndevs value, improving code readability.
Fixes: 7bb3a371d1 ("f2fs: Fix zoned block device support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We've continued mainly to fix bugs in this round, as f2fs has been shipped
in more devices. Especially, we've focused on stabilizing checkpoint=disable
feature, and provided some interfaces for QA.
Enhancement:
- expose FS_NOCOW_FL for pin_file
- run discard jobs at unmount time with timeout
- tune discarding thread to avoid idling which consumes power
- some checking codes to address vulnerabilities
- give random value to i_generation
- shutdown with more flags for QA
Bug fix:
- clean up stale objects when mount is failed along with checkpoint=disable
- fix system being stuck due to wrong count by atomic writes
- handle some corrupted disk cases
- fix a deadlock in f2fs_read_inline_dir
We've also added some minor build errors and clean-up patches.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE00UqedjCtOrGVvQiQBSofoJIUNIFAlyKk4YACgkQQBSofoJI
UNIMVw//Rb3nmbQkMW/86DxtHDxuS8GEJmle0DiHeFMHgwy0ET0uZs9/AEfmuejC
95cXnF44QfVaFwkOXCK6aKXJXwN0+ZS0YvV/gPE8lgU6sdQhJBox5DC+rx+OwFq5
rZiF8qvE8iyM9Xt+RfMBGufzUb+LKBz0ozQFZpKJiNTBBf5vpeqMYASEEfxiEmZz
GvvUNSBRw39OB5zTl5l2hnoNqkoFu6XHnf4f9+DnraVi8SuQzj6hdqsx0nYTHfLi
Rax8kA4HUwoVgjhaLLXFbbhWIQ83bcZ0cj6wq7Lr7NbbIi7bKYP6sxtKjbe2Fuql
m9Chm2LIvD1BfJnjdTk2krqY7Z4bX/4gmXukno/8X/cjWkpBV6HFWS73iTgrJjU2
d8kBFXwlIn+JlATSjsTtdfvKkTwxUhaGw1bBA96Am4c5tLQyOqyYWcfQA/tam/v4
dM9EQX5ZeRb6NXDeIxkXNfTSpDRnqlhJsTV5aK8qporyF1RkKVbyCpSt1P4q3KO5
UwsGZLFAVMzFaUVfyIS7dR5QVczQUTCH4g0yFNpBMvF8epOA4+jbYxQeGZfqFK3H
mTC/Ba+VWWdYW2pZRNc9TnBsHg/xadMJq7EQb/ykGBe6JZJfB0wREj4LSr1lGK9a
cU8JFGyqg1Rt/uRP0bb5IIec1YVton3Lq8ND9VZPNcV/mS5Gehg=
=9BoH
-----END PGP SIGNATURE-----
Merge tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
"We've continued mainly to fix bugs in this round, as f2fs has been
shipped in more devices. Especially, we've focused on stabilizing
checkpoint=disable feature, and provided some interfaces for QA.
Enhancements:
- expose FS_NOCOW_FL for pin_file
- run discard jobs at unmount time with timeout
- tune discarding thread to avoid idling which consumes power
- some checking codes to address vulnerabilities
- give random value to i_generation
- shutdown with more flags for QA
Bug fixes:
- clean up stale objects when mount is failed along with
checkpoint=disable
- fix system being stuck due to wrong count by atomic writes
- handle some corrupted disk cases
- fix a deadlock in f2fs_read_inline_dir
We've also added some minor build error fixes and clean-up patches"
* tag 'f2fs-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (53 commits)
f2fs: set pin_file under CAP_SYS_ADMIN
f2fs: fix to avoid deadlock in f2fs_read_inline_dir()
f2fs: fix to adapt small inline xattr space in __find_inline_xattr()
f2fs: fix to do sanity check with inode.i_inline_xattr_size
f2fs: give some messages for inline_xattr_size
f2fs: don't trigger read IO for beyond EOF page
f2fs: fix to add refcount once page is tagged PG_private
f2fs: remove wrong comment in f2fs_invalidate_page()
f2fs: fix to use kvfree instead of kzfree
f2fs: print more parameters in trace_f2fs_map_blocks
f2fs: trace f2fs_ioc_shutdown
f2fs: fix to avoid deadlock of atomic file operations
f2fs: fix to dirty inode for i_mode recovery
f2fs: give random value to i_generation
f2fs: no need to take page lock in readdir
f2fs: fix to update iostat correctly in IPU path
f2fs: fix encrypted page memory leak
f2fs: make fault injection covering __submit_flush_wait()
f2fs: fix to retry fill_super only if recovery failed
f2fs: silence VM_WARN_ON_ONCE in mempool_alloc
...
Android uses pin_file for uncrypt during OTA, and that should be managed by
CAP_SYS_ADMIN only.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With below testcase, we will fail to find existed xattr entry:
1. mkfs.f2fs -O extra_attr -O flexible_inline_xattr /dev/zram0
2. mount -t f2fs -o inline_xattr_size=1 /dev/zram0 /mnt/f2fs/
3. touch /mnt/f2fs/file
4. setfattr -n "user.name" -v 0 /mnt/f2fs/file
5. getfattr -n "user.name" /mnt/f2fs/file
/mnt/f2fs/file: user.name: No such attribute
The reason is for inode which has very small inline xattr size,
__find_inline_xattr() will fail to traverse any entry due to first
entry may not be loaded from xattr node yet, later, we may skip to
check entire xattr datas in __find_xattr(), result in such wrong
condition.
This patch adds condition to check such case to avoid this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Paul Bandha reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202709
When I run the poc on the mounted f2fs img I get a buffer overflow in
read_inline_xattr due to there being no sanity check on the value of
i_inline_xattr_size.
I created the img by just modifying the value of i_inline_xattr_size
in the inode:
i_name [test1.txt]
i_ext: fofs:0 blkaddr:0 len:0
i_extra_isize [0x 18 : 24]
i_inline_xattr_size [0x ffff : 65535]
i_addr[ofs] [0x 0 : 0]
mkdir /mnt/f2fs
mount ./f2fs1.img /mnt/f2fs
gcc poc.c -o poc
./poc
int main() {
int y = syscall(SYS_listxattr, "/mnt/f2fs/test1.txt", NULL, 0);
printf("ret %d", y);
printf("errno: %d\n", errno);
}
BUG: KASAN: slab-out-of-bounds in read_inline_xattr+0x18f/0x260
Read of size 262140 at addr ffff88011035efd8 by task f2fs1poc/3263
CPU: 0 PID: 3263 Comm: f2fs1poc Not tainted 4.18.0-custom #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
Call Trace:
dump_stack+0x71/0xab
print_address_description+0x83/0x250
kasan_report+0x213/0x350
memcpy+0x1f/0x50
read_inline_xattr+0x18f/0x260
read_all_xattrs+0xba/0x190
f2fs_listxattr+0x9d/0x3f0
listxattr+0xb2/0xd0
path_listxattr+0x93/0xe0
do_syscall_64+0x9d/0x220
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Let's add sanity check for inode.i_inline_xattr_size during f2fs_iget()
to avoid this issue.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds some kernel messages when user sets wrong inline_xattr_size.
Fixes: 500e0b28ec ("f2fs: fix to check inline_xattr_size boundary correctly")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In f2fs_mpage_readpages(), if page is beyond EOF, we should just
zero out it, but previously, before checking previous mapping
info, we missed to check filesize boundary, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Gao Xiang reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202749
f2fs may skip pageout() due to incorrect page reference count.
The problem here is that MM defined the rule [1] very clearly that
once page was set with PG_private flag, we should increment the
refcount in that page, also main flows like pageout(), migrate_page()
will assume there is one additional page reference count if
page_has_private() returns true.
But currently, f2fs won't add/del refcount when changing PG_private
flag. Anyway, f2fs should follow MM's rule to make MM's related flows
running as expected.
[1] https://lore.kernel.org/lkml/2b19b3c4-2bc4-15fa-15cc-27a13e5c7af1@aol.com/
Reported-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Since 8c242db9b8 ("f2fs: fix stale ATOMIC_WRITTEN_PAGE private pointer"),
we've started to not skip clear private flag for atomic_write page
truncation, so removing old wrong comment in f2fs_invalidate_page().
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Jiqun Li reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202747
System can panic due to using wrong allocate/free function pair
in xattr interface:
- use kvmalloc to allocate memory
- use kzfree to free memory
Let's fix to use kvfree instead of kzfree, BTW, we are safe to
get rid of kzfree, since there is no such confidential data stored
as xattr, we don't need to zero it before free memory.
Fixes: 5222595d09 ("f2fs: use kvmalloc, if kmalloc is failed")
Reported-by: Jiqun Li <jiqun.li@unisoc.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Seulbae Kim reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=202637
We didn't recover permission field correctly after sudden power-cut,
the reason is in setattr we didn't add inode into global dirty list
once i_mode is changed, so latter checkpoint triggered by fsync will
not flush last i_mode into disk, result in this problem, fix it.
Reported-by: Seulbae Kim <seulbae@gatech.edu>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This follows to give random number to i_generation along with commit
2325306802 ("ext4: improve smp scalability for inode generation")
This can be used for DUN for UFS HW encryption.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
VFS will take inode_lock for readdir, therefore no need to
take page lock in readdir at all just as the majority of
other generic filesystems.
This patch improves concurrency since .iterate_shared
was introduced to VFS years ago.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In error path of IPU, we didn't account iostat correctly, fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
For IPU path of f2fs_do_write_data_page(), in its error path, we
need to release encrypted page and fscrypt context, otherwise it
will cause memory leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch changes to allow failure of f2fs_bio_alloc() in
__submit_flush_wait(), which can simulate flush error in checkpoint()
for covering more error paths.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
With current retry mechanism in f2fs_fill_super, first fill_super
fails due to no memory, then second fill_super runs w/o recovery,
if we succeed, we may lose fsynced data, it doesn't make sense.
Let's retry fill_super only if it occurs non-ENOMEM error during
recovery.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Note that __GFP_ZERO is not supported for mempool_alloc,
which also documented in the mempool_alloc comments.
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
First: Ted, Jaegeuk, and I have decided to add me as a co-maintainer for
fscrypt, and we're now using a shared git tree. So we've updated
MAINTAINERS accordingly, and I'm doing the pull request this time.
The actual changes for v5.1 are:
- Remove the fs-specific kconfig options like CONFIG_EXT4_ENCRYPTION and
make fscrypt support for all fscrypt-capable filesystems be controlled
by CONFIG_FS_ENCRYPTION, similar to how CONFIG_QUOTA works.
- Improve error code for rename() and link() into encrypted directories.
- Various cleanups.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCXH2YDRQcZWJpZ2dlcnNA
Z29vZ2xlLmNvbQAKCRDzXCl4vpKOK+SAAQCWYOTwYko8uE8Ze8i2fiUm0vr91NOg
zj5DGmK7Izxy/gEAsNDOVA7zWrDg/f5600/7aLpDQQTGHA38YVsgiyd7DgY=
=S3tT
-----END PGP SIGNATURE-----
Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt
Pull fscrypt updates from Eric Biggers:
"First: Ted, Jaegeuk, and I have decided to add me as a co-maintainer
for fscrypt, and we're now using a shared git tree. So we've updated
MAINTAINERS accordingly, and I'm doing the pull request this time.
The actual changes for v5.1 are:
- Remove the fs-specific kconfig options like CONFIG_EXT4_ENCRYPTION
and make fscrypt support for all fscrypt-capable filesystems be
controlled by CONFIG_FS_ENCRYPTION, similar to how CONFIG_QUOTA
works.
- Improve error code for rename() and link() into encrypted
directories.
- Various cleanups"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
MAINTAINERS: add Eric Biggers as an fscrypt maintainer
fscrypt: return -EXDEV for incompatible rename or link into encrypted dir
fscrypt: remove filesystem specific build config option
f2fs: use IS_ENCRYPTED() to check encryption status
ext4: use IS_ENCRYPTED() to check encryption status
fscrypt: remove CRYPTO_CTR dependency
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlx63XIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpp2vEACfrrQsap7R+Av28mmXpmXi2FPa3g5Tev1t
yYjK2qHvhlMZjPTYw3hCmbYdDDczlF7PEgSE2x2DjdcsYapb8Fy1lZ2X16c7ztBR
HD/t9b5AVSQsczZzKgv3RqsNtTnjzS5V0A8XH8FAP2QRgiwDMwSN6G0FP0JBLbE/
ZgxQrH1Iy1F33Wz4hI3Z7dEghKPZrH1IlegkZCEu47q9SlWS76qUetSy2GEtchOl
3Lgu54mQZyVdI5/QZf9DyMDLF6dIz3tYU2qhuo01AHjGRCC72v86p8sIiXcUr94Q
8pbegJhJ/g8KBol9Qhv3+pWG/QUAZwi/ZwasTkK+MJ4klRXfOrznxPubW1z6t9Vn
QRo39Po5SqqP0QWAscDxCFjESIQlWlKa+LZurJL7DJDCUGrSgzTpnVwFqKwc5zTP
HJa5MT2tEeL2TfUYRYCfh0ZV0elINdHA1y1klDBh38drh4EWr2gW8xdseGYXqRjh
fLgEpoF7VQ8kTvxKN+E4jZXkcZmoLmefp0ZyAbblS6IawpPVC7kXM9Fdn2OU8f2c
fjVjvSiqxfeN6dnpfeLDRbbN9894HwgP/LPropJOQ7KmjCorQq5zMDkAvoh3tElq
qwluRqdBJpWT/F05KweY+XVW8OawIycmUWqt6JrVNoIDAK31auHQv47kR0VA4OvE
DRVVhYpocw==
=VBaU
-----END PGP SIGNATURE-----
Merge tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we
finally have Mings multi-page bvec support merged. Apart from that,
this pull request contains:
- Small series that avoids quiescing the queue for sysfs changes that
match what we currently have (Aleksei)
- Series of bcache fixes (via Coly)
- Series of lightnvm fixes (via Mathias)
- NVMe pull request from Christoph. Nothing major, just SPDX/license
cleanups, RR mp policy (Hannes), and little fixes (Bart,
Chaitanya).
- BFQ series (Paolo)
- Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
for the fast path (Jianchao)
- fops->iopoll() added for async IO polling, this is a feature that
the upcoming io_uring interface will use (Christoph, me)
- Partition scan loop fixes (Dongli)
- mtip32xx conversion from managed resource API (Christoph)
- cdrom registration race fix (Guenter)
- MD pull from Song, two minor fixes.
- Various documentation fixes (Marcos)
- Multi-page bvec feature. This brings a lot of nice improvements
with it, like more efficient splitting, larger IOs can be supported
without growing the bvec table size, and so on. (Ming)
- Various little fixes to core and drivers"
* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
block: fix updating bio's front segment size
block: Replace function name in string with __func__
nbd: propagate genlmsg_reply return code
floppy: remove set but not used variable 'q'
null_blk: fix checking for REQ_FUA
block: fix NULL pointer dereference in register_disk
fs: fix guard_bio_eod to check for real EOD errors
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
block: optimize bvec iteration in bvec_iter_advance
block: introduce mp_bvec_for_each_page() for iterating over page
block: optimize blk_bio_segment_split for single-page bvec
block: optimize __blk_segment_map_sg() for single-page bvec
block: introduce bvec_nth_page()
iomap: wire up the iopoll method
block: add bio_set_polled() helper
block: wire up block device iopoll method
fs: add an iopoll method to struct file_operations
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
loop: do not print warn message if partition scan is successful
block: bounce: make sure that bvec table is updated
...
If we met this once, let fsck.f2fs clear this only.
Note that, this addresses all the subtle fault injection test.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
As Dan reported:
"We put an upper bound on ->write_io_size_bits but we don't have a lower
bound."
So let's add lower bound check for ->write_io_size_bits in parse_options().
[We don't allow configuring ->write_io_size_bits to zero, since at least
we need to fill one dummy page for aligned IO.]
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We use below condition to check inline_xattr_size boundary:
if (!F2FS_OPTION(sbi).inline_xattr_size ||
F2FS_OPTION(sbi).inline_xattr_size >=
DEF_ADDRS_PER_INODE -
F2FS_TOTAL_EXTRA_ATTR_SIZE -
DEF_INLINE_RESERVED_SIZE -
DEF_MIN_INLINE_SIZE)
There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.
.bitmap 1 * 1 = 1
.reserved 1 * 1 = 1
.dentry 11 * 2 = 22
.filename 8 * 2 = 16
total 40
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>