Commit Graph

74668 Commits

Author SHA1 Message Date
Sungjong Seo
0c336d6e33 exfat: fix incorrect loading of i_blocks for large files
When calculating i_blocks, there was a mistake that was masked with a
32-bit variable. So i_blocks for files larger than 4 GiB had incorrect
values. Mask with a 64-bit variable instead of 32-bit one.

Fixes: 5f2aa07507 ("exfat: add inode operations")
Cc: stable@vger.kernel.org # v5.7+
Reported-by: Ganapathi Kamath <hgkamath@hotmail.com>
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
2021-11-01 07:49:21 +09:00
Gao Xiang
a0961f351d erofs: don't trigger WARN() when decompression fails
syzbot reported a WARNING [1] due to corrupted compressed data.

As Dmitry said, "If this is not a kernel bug, then the code should
not use WARN. WARN if for kernel bugs and is recognized as such by
all testing systems and humans."

[1] https://lore.kernel.org/r/000000000000b3586105cf0ff45e@google.com

Link: https://lore.kernel.org/r/20211025074311.130395-1-hsiangkao@linux.alibaba.com
Cc: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Reported-by: syzbot+d8aaffc3719597e8cfb4@syzkaller.appspotmail.com
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-10-31 21:00:28 +08:00
Changcheng Deng
2a09b57507 xfs: use swap() to make code cleaner
Use swap() in order to make code cleaner. Issue found by coccinelle.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-30 09:28:55 -07:00
Wan Jiabing
0b9007ec7b xfs: Remove duplicated include in xfs_super
Fix following checkincludes.pl warning:
./fs/xfs/xfs_super.c: xfs_btree.h is included more than once.

The include is in line 15. Remove the duplicated here.

Signed-off-by: Wan Jiabing <wanjiabing@vivo.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-10-30 09:28:49 -07:00
Eric W. Biederman
e21294a7aa signal: Replace force_sigsegv(SIGSEGV) with force_fatal_sig(SIGSEGV)
Now that force_fatal_sig exists it is unnecessary and a bit confusing
to use force_sigsegv in cases where the simpler force_fatal_sig is
wanted.  So change every instance we can to make the code clearer.

Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Link: https://lkml.kernel.org/r/877de7jrev.fsf@disp2133
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-10-29 14:31:34 -05:00
Eric W. Biederman
111e70490d exit/kthread: Have kernel threads return instead of calling do_exit
In 2009 Oleg reworked[1] the kernel threads so that it is not
necessary to call do_exit if you are not using kthread_stop().  Remove
the explicit calls of do_exit and complete_and_exit (with a NULL
completion) that were previously necessary.

[1] 63706172f3 ("kthreads: rework kthread_stop()")
Link: https://lkml.kernel.org/r/20211020174406.17889-12-ebiederm@xmission.com
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2021-10-29 14:31:33 -05:00
Linus Torvalds
fd919bbd33 Merge tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
 "Last minute fixes for crash on 32bit architectures when compression is
  in use. It's a regression introduced in 5.15-rc and I'd really like
  not let this into the final release, fixes via stable trees would add
  unnecessary delay.

  The problem is on 32bit architectures with highmem enabled, the pages
  for compression may need to be kmapped, while the patches removed that
  as we don't use GFP_HIGHMEM allocations anymore. The pages that don't
  come from local allocation still may be from highmem. Despite being on
  32bit there's enough such ARM machines in use so it's not a marginal
  issue.

  I did full reverts of the patches one by one instead of a huge one.
  There's one exception for the "lzo" revert as there was an
  intermediate patch touching the same code to make it compatible with
  subpage. I can't revert that one too, so the revert in lzo.c is
  manual. Qu Wenruo has worked on that with me and verified the changes"

* tag 'for-5.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Revert "btrfs: compression: drop kmap/kunmap from lzo"
  Revert "btrfs: compression: drop kmap/kunmap from zlib"
  Revert "btrfs: compression: drop kmap/kunmap from zstd"
  Revert "btrfs: compression: drop kmap/kunmap from generic helpers"
2021-10-29 10:46:59 -07:00
Chao Yu
10a2687856 f2fs: support fault injection for dquot_initialize()
This patch adds a new function f2fs_dquot_initialize() to wrap
dquot_initialize(), and it supports to inject fault into
f2fs_dquot_initialize() to simulate inner failure occurs in
dquot_initialize().

Usage:
a) echo 65536 > /sys/fs/f2fs/<dev>/inject_type or
b) mount -o fault_type=65536 <dev> <mountpoint>

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-10-29 10:38:53 -07:00
Chao Yu
ca98d72141 f2fs: fix incorrect return value in f2fs_sanity_check_ckpt()
As Pavel Machek reported in [1]

This code looks quite confused: part of function returns 1 on
corruption, part returns -errno. The problem is not stable-specific.

[1] https://lkml.org/lkml/2021/9/19/207

Let's fix to make 'insane cp_payload case' to return 1 rater than
EFSCORRUPTED, so that return value can be kept consistent for all
error cases, it can avoid confusion of code logic.

Fixes: 65ddf65648 ("f2fs: fix to do sanity check for sb/cp fields correctly")
Reported-by: Pavel Machek <pavel@denx.de>
Reviewed-by: Pavel Machek <pavel@denx.de>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-10-29 10:37:35 -07:00
Pavel Begunkov
1d5f5ea7cb io-wq: remove worker to owner tw dependency
INFO: task iou-wrk-6609:6612 blocked for more than 143 seconds.
      Not tainted 5.15.0-rc5-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:iou-wrk-6609    state:D stack:27944 pid: 6612 ppid:  6526 flags:0x00004006
Call Trace:
 context_switch kernel/sched/core.c:4940 [inline]
 __schedule+0xb44/0x5960 kernel/sched/core.c:6287
 schedule+0xd3/0x270 kernel/sched/core.c:6366
 schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
 do_wait_for_common kernel/sched/completion.c:85 [inline]
 __wait_for_common kernel/sched/completion.c:106 [inline]
 wait_for_common kernel/sched/completion.c:117 [inline]
 wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
 io_worker_exit fs/io-wq.c:183 [inline]
 io_wqe_worker+0x66d/0xc40 fs/io-wq.c:597
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

io-wq worker may submit a task_work to the master task and upon
io_worker_exit() wait for the tw to get executed. The problem appears
when the master task is waiting in coredump.c:

468                     freezer_do_not_count();
469                     wait_for_completion(&core_state->startup);
470                     freezer_count();

Apparently having some dependency on children threads getting everything
stuck. Workaround it by cancelling the taks_work callback that causes it
before going into io_worker_exit() waiting.

p.s. probably a better option is to not submit tw elevating the refcount
in the first place, but let's leave this excercise for the future.

Cc: stable@vger.kernel.org
Reported-and-tested-by: syzbot+27d62ee6f256b186883e@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/142a716f4ed936feae868959059154362bfa8c19.1635509451.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29 09:49:33 -06:00
Jens Axboe
f75d118349 io_uring: harder fdinfo sq/cq ring iterating
The ring iteration is racy, which isn't necessarily a problem except it
can cause us to iterate the whole thing. That isn't desired or ideal,
and it can lead to excessive runtimes of reading fdinfo.

Cap the iteration at tail - head OR the ring size. While in there, clean
up the ring masking and just dump the raw values along with the masks.
That provides more useful debug info.

Fixes: 83f84356bc ("io_uring: add more uring info to fdinfo for debug")
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-29 09:49:33 -06:00
yangerkun
9a25440376 ovl: fix use after free in struct ovl_aio_req
Example for triggering use after free in a overlay on ext4 setup:

aio_read
  ovl_read_iter
    vfs_iter_read
      ext4_file_read_iter
        ext4_dio_read_iter
          iomap_dio_rw -> -EIOCBQUEUED
          /*
	   * Here IO is completed in a separate thread,
	   * ovl_aio_cleanup_handler() frees aio_req which has iocb embedded
	   */
          file_accessed(iocb->ki_filp); /**BOOM**/

Fix by introducing a refcount in ovl_aio_req similarly to aio_kiocb.  This
guarantees that iocb is only freed after vfs_read/write_iter() returns on
underlying fs.

Fixes: 2406a307ac ("ovl: implement async IO routines")
Signed-off-by: yangerkun <yangerkun@huawei.com>
Link: https://lore.kernel.org/r/20210930032228.3199690-3-yangerkun@huawei.com/
Cc: <stable@vger.kernel.org> # v5.6
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-29 13:48:19 +02:00
David Sterba
ccaa66c8dd Revert "btrfs: compression: drop kmap/kunmap from lzo"
This reverts commit 8c945d32e6.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

The revert does not apply cleanly due to changes in a6e66e6f8c
("btrfs: rework lzo_decompress_bio() to make it subpage compatible")
that reworked the page iteration so the revert is done to be equivalent
to the original code.

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Tested-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 13:25:43 +02:00
David Sterba
55276e14df Revert "btrfs: compression: drop kmap/kunmap from zlib"
This reverts commit 696ab562e6.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 13:03:05 +02:00
David Sterba
56ee254d23 Revert "btrfs: compression: drop kmap/kunmap from zstd"
This reverts commit bbaf9715f3.

The kmaps in compression code are still needed and cause crashes on
32bit machines (ARM, x86). Reproducible eg. by running fstest btrfs/004
with enabled LZO or ZSTD compression.

Example stacktrace with ZSTD on a 32bit ARM machine:

  Unable to handle kernel NULL pointer dereference at virtual address 00000000
  pgd = c4159ed3
  [00000000] *pgd=00000000
  Internal error: Oops: 5 [#1] PREEMPT SMP ARM
  Modules linked in:
  CPU: 0 PID: 210 Comm: kworker/u2:3 Not tainted 5.14.0-rc79+ #12
  Hardware name: Allwinner sun4i/sun5i Families
  Workqueue: btrfs-delalloc btrfs_work_helper
  PC is at mmiocpy+0x48/0x330
  LR is at ZSTD_compressStream_generic+0x15c/0x28c

  (mmiocpy) from [<c0629648>] (ZSTD_compressStream_generic+0x15c/0x28c)
  (ZSTD_compressStream_generic) from [<c06297dc>] (ZSTD_compressStream+0x64/0xa0)
  (ZSTD_compressStream) from [<c049444c>] (zstd_compress_pages+0x170/0x488)
  (zstd_compress_pages) from [<c0496798>] (btrfs_compress_pages+0x124/0x12c)
  (btrfs_compress_pages) from [<c043c068>] (compress_file_range+0x3c0/0x834)
  (compress_file_range) from [<c043c4ec>] (async_cow_start+0x10/0x28)
  (async_cow_start) from [<c0475c3c>] (btrfs_work_helper+0x100/0x230)
  (btrfs_work_helper) from [<c014ef68>] (process_one_work+0x1b4/0x418)
  (process_one_work) from [<c014f210>] (worker_thread+0x44/0x524)
  (worker_thread) from [<c0156aa4>] (kthread+0x180/0x1b0)
  (kthread) from [<c0100150>]

Link: https://lore.kernel.org/all/CAJCQCtT+OuemovPO7GZk8Y8=qtOObr0XTDp8jh4OHD6y84AFxw@mail.gmail.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=214839
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 13:02:50 +02:00
Filipe Manana
d1ed82f355 btrfs: remove root argument from check_item_in_log()
The root argument passed to check_item_in_log() always matches the root
of the given directory, so it can be eliminated.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:39:13 +02:00
Filipe Manana
6d9cc07215 btrfs: remove root argument from add_link()
The root argument for tree-log.c:add_link() always matches the root of the
given directory and the given inode, so it can eliminated.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:39:13 +02:00
Filipe Manana
4467af8809 btrfs: remove root argument from btrfs_unlink_inode()
The root argument passed to btrfs_unlink_inode() and its callee,
__btrfs_unlink_inode(), always matches the root of the given directory and
the given inode. So remove the argument and make __btrfs_unlink_inode()
use the root of the directory.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:39:13 +02:00
Filipe Manana
9798ba24cb btrfs: remove root argument from drop_one_dir_item()
The root argument for drop_one_dir_item() always matches the root of the
given directory inode, since each log tree is associated to one and only
one subvolume/root, so remove the argument.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:39:13 +02:00
Li Zhang
5d03dbebba btrfs: clear MISSING device status bit in btrfs_close_one_device
Reported bug: https://github.com/kdave/btrfs-progs/issues/389

There's a problem with scrub reporting aborted status but returning
error code 0, on a filesystem with missing and readded device.

Roughly these steps:

- mkfs -d raid1 dev1 dev2
- fill with data
- unmount
- make dev1 disappear
- mount -o degraded
- copy more data
- make dev1 appear again

Running scrub afterwards reports that the command was aborted, but the
system log message says the exit code was 0.

It seems that the cause of the error is decrementing
fs_devices->missing_devices but not clearing device->dev_state.  Every
time we umount filesystem, it would call close_ctree, And it would
eventually involve btrfs_close_one_device to close the device, but it
only decrements fs_devices->missing_devices but does not clear the
device BTRFS_DEV_STATE_MISSING bit. Worse, this bug will cause Integer
Overflow, because every time umount, fs_devices->missing_devices will
decrease. If fs_devices->missing_devices value hit 0, it would overflow.

With added debugging:

   loop1: detected capacity change from 0 to 20971520
   BTRFS: device fsid 56ad51f1-5523-463b-8547-c19486c51ebb devid 1 transid 21 /dev/loop1 scanned by systemd-udevd (2311)
   loop2: detected capacity change from 0 to 20971520
   BTRFS: device fsid 56ad51f1-5523-463b-8547-c19486c51ebb devid 2 transid 17 /dev/loop2 scanned by systemd-udevd (2313)
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): using free space tree
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000f706684d /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 6635ac31-56dd-4852-873b-c60f5e2d53d2 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): using free space tree
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000f706684d /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 6635ac31-56dd-4852-873b-c60f5e2d53d2 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 0
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): using free space tree
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000f706684d /dev/loop1 18446744073709551615
   BTRFS warning (device loop1): devid 2 uuid 6635ac31-56dd-4852-873b-c60f5e2d53d2 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 18446744073709551615

If fs_devices->missing_devices is 0, next time it would be 18446744073709551615

After apply this patch, the fs_devices->missing_devices seems to be
right:

  $ truncate -s 10g test1
  $ truncate -s 10g test2
  $ losetup /dev/loop1 test1
  $ losetup /dev/loop2 test2
  $ mkfs.btrfs -draid1 -mraid1 /dev/loop1 /dev/loop2 -f
  $ losetup -d /dev/loop2
  $ mount -o degraded /dev/loop1 /mnt/1
  $ umount /mnt/1
  $ mount -o degraded /dev/loop1 /mnt/1
  $ umount /mnt/1
  $ mount -o degraded /dev/loop1 /mnt/1
  $ umount /mnt/1
  $ dmesg

   loop1: detected capacity change from 0 to 20971520
   loop2: detected capacity change from 0 to 20971520
   BTRFS: device fsid 15aa1203-98d3-4a66-bcae-ca82f629c2cd devid 1 transid 5 /dev/loop1 scanned by mkfs.btrfs (1863)
   BTRFS: device fsid 15aa1203-98d3-4a66-bcae-ca82f629c2cd devid 2 transid 5 /dev/loop2 scanned by mkfs.btrfs (1863)
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): disk space caching is enabled
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000975bd577 /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 8b333791-0b3f-4f57-b449-1c1ab6b51f38 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1
   BTRFS info (device loop1): checking UUID tree
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): disk space caching is enabled
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000975bd577 /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 8b333791-0b3f-4f57-b449-1c1ab6b51f38 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1
   BTRFS info (device loop1): flagging fs with big metadata feature
   BTRFS info (device loop1): allowing degraded mounts
   BTRFS info (device loop1): disk space caching is enabled
   BTRFS info (device loop1): has skinny extents
   BTRFS info (device loop1):  before clear_missing.00000000975bd577 /dev/loop1 0
   BTRFS warning (device loop1): devid 2 uuid 8b333791-0b3f-4f57-b449-1c1ab6b51f38 is missing
   BTRFS info (device loop1):  before clear_missing.0000000000000000 /dev/loop2 1

CC: stable@vger.kernel.org # 4.19+
Signed-off-by: Li Zhang <zhanglikernel@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:39:13 +02:00
Anand Jain
5c78a5e7aa btrfs: call btrfs_check_rw_degradable only if there is a missing device
In open_ctree() in btrfs_check_rw_degradable() [1], we check each block
group individually if at least the minimum number of devices is available
for that profile. If all the devices are available, then we don't have to
check degradable.

[1]
open_ctree()
::
3559 if (!sb_rdonly(sb) && !btrfs_check_rw_degradable(fs_info, NULL)) {

Also before calling btrfs_check_rw_degradable() in open_ctee() at the
line number shown below [2] we call btrfs_read_chunk_tree() and down to
add_missing_dev() to record number of missing devices.

[2]
open_ctree()
::
3454         ret = btrfs_read_chunk_tree(fs_info);

btrfs_read_chunk_tree()
  read_one_chunk() / read_one_dev()
    add_missing_dev()

So, check if there is any missing device before btrfs_check_rw_degradable()
in open_ctree().

Also, with this the mount command could save ~16ms.[3] in the most
common case, that is no device is missing.

[3]
 1) * 16934.96 us | btrfs_check_rw_degradable [btrfs]();

CC: stable@vger.kernel.org # 4.19+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Anand Jain <anand.jain@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:39:13 +02:00
David Sterba
e77fbf9903 btrfs: send: prepare for v2 protocol
This is preparatory work for send protocol update to version 2 and
higher.

We have many pending protocol update requests but still don't have the
basic protocol rev in place, the first thing that must happen is to do
the actual versioning support.

The protocol version is u32 and is a new member in the send ioctl
struct. Validity of the version field is backed by a new flag bit. Old
kernels would fail when a higher version is requested. Version protocol
0 will pick the highest supported version, BTRFS_SEND_STREAM_VERSION,
  that's also exported in sysfs.

The version is still unchanged and will be increased once we have new
incompatible commands or stream updates.

Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-10-29 12:38:43 +02:00
Gautham Ananthakrishna
6f1b228529 ocfs2: fix race between searching chunks and release journal_head from buffer_head
Encountered a race between ocfs2_test_bg_bit_allocatable() and
jbd2_journal_put_journal_head() resulting in the below vmcore.

  PID: 106879  TASK: ffff880244ba9c00  CPU: 2   COMMAND: "loop3"
  Call trace:
    panic
    oops_end
    no_context
    __bad_area_nosemaphore
    bad_area_nosemaphore
    __do_page_fault
    do_page_fault
    page_fault
      [exception RIP: ocfs2_block_group_find_clear_bits+316]
    ocfs2_block_group_find_clear_bits [ocfs2]
    ocfs2_cluster_group_search [ocfs2]
    ocfs2_search_chain [ocfs2]
    ocfs2_claim_suballoc_bits [ocfs2]
    __ocfs2_claim_clusters [ocfs2]
    ocfs2_claim_clusters [ocfs2]
    ocfs2_local_alloc_slide_window [ocfs2]
    ocfs2_reserve_local_alloc_bits [ocfs2]
    ocfs2_reserve_clusters_with_limit [ocfs2]
    ocfs2_reserve_clusters [ocfs2]
    ocfs2_lock_refcount_allocators [ocfs2]
    ocfs2_make_clusters_writable [ocfs2]
    ocfs2_replace_cow [ocfs2]
    ocfs2_refcount_cow [ocfs2]
    ocfs2_file_write_iter [ocfs2]
    lo_rw_aio
    loop_queue_work
    kthread_worker_fn
    kthread
    ret_from_fork

When ocfs2_test_bg_bit_allocatable() called bh2jh(bg_bh), the
bg_bh->b_private NULL as jbd2_journal_put_journal_head() raced and
released the jounal head from the buffer head.  Needed to take bit lock
for the bit 'BH_JournalHead' to fix this race.

Link: https://lkml.kernel.org/r/1634820718-6043-1-git-send-email-gautham.ananthakrishna@oracle.com
Signed-off-by: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: <rajesh.sivaramasubramaniom@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-10-28 17:18:55 -07:00
Amir Goldstein
a390ccb316 fuse: add FOPEN_NOFLUSH
Add flag returned by FUSE_OPEN and FUSE_CREATE requests to avoid flushing
data cache on close.

Different filesystems implement ->flush() is different ways:
 - Most disk filesystems do not implement ->flush() at all
 - Some network filesystem (e.g. nfs) flush local write cache of
   FMODE_WRITE file and send a "flush" command to server
 - Some network filesystem (e.g. cifs) flush local write cache of
   FMODE_WRITE file without sending an additional command to server

FUSE flushes local write cache of ANY file, even non FMODE_WRITE
and sends a "flush" command to server (if server implements it).

The FUSE implementation of ->flush() seems over agressive and
arbitrary and does not make a lot of sense when writeback caching is
disabled.

Instead of deciding on another arbitrary implementation that makes
sense, leave the choice of per-file flush behavior in the hands of
the server.

Link: https://lore.kernel.org/linux-fsdevel/CAJfpegspE8e6aKd47uZtSYX8Y-1e1FWS0VL0DH2Skb9gQP5RJQ@mail.gmail.com/
Suggested-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 10:20:31 +02:00
Miklos Szeredi
c6c745b810 fuse: only update necessary attributes
fuse_update_attributes() refreshes metadata for internal use.

Each use needs a particular set of attributes to be refreshed, but
currently that cannot be expressed and all but atime are refreshed.

Add a mask argument, which lets fuse_update_get_attr() to decide based on
the cache_mask and the inval_mask whether a GETATTR call is needed or not.

Reported-by: Yongji Xie <xieyongji@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:33 +02:00
Miklos Szeredi
ec85537519 fuse: take cache_mask into account in getattr
When deciding to send a GETATTR request take into account the cache mask
(which attributes are always valid).  The cache mask takes precedence over
the invalid mask.

This results in the GETATTR request not being sent unnecessarily.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:33 +02:00
Miklos Szeredi
4b52f059b5 fuse: add cache_mask
If writeback_cache is enabled, then the size, mtime and ctime attributes of
regular files are always valid in the kernel's cache.  They are retrieved
from userspace only when the inode is freshly looked up.

Add a more generic "cache_mask", that indicates which attributes are
currently valid in cache.

This patch doesn't change behavior.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:33 +02:00
Miklos Szeredi
04d82db0c5 fuse: move reverting attributes to fuse_change_attributes()
In case of writeback_cache fuse_fillattr() would revert the queried
attributes to the cached version.

Move this to fuse_change_attributes() in order to manage the writeback
logic in a central helper.  This will be necessary for patches that follow.

Only fuse_do_getattr() -> fuse_fillattr() uses the attributes after calling
fuse_change_attributes(), so this should not change behavior.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
c15016b7ae fuse: simplify local variables holding writeback cache state
There are two instances of "bool is_wb = fc->writeback_cache" where the
actual use mostly involves checking "is_wb && S_ISREG(inode->i_mode)".

Clean up these cases by storing the second condition in the local variable.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
20235b435a fuse: cleanup code conditional on fc->writeback_cache
It's safe to call file_update_time() if writeback cache is not enabled,
since S_NOCMTIME is set in this case.  This part is purely a cleanup.

__fuse_copy_file_range() also calls fuse_write_update_attr() only in the
writeback cache case.  This is inconsistent with other callers, where it's
called unconditionally.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
484ce65715 fuse: fix attr version comparison in fuse_read_update_size()
A READ request returning a short count is taken as indication of EOF, and
the cached file size is modified accordingly.

Fix the attribute version checking to allow for changes to fc->attr_version
on other inodes.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
d347739a0e fuse: always invalidate attributes after writes
Extend the fuse_write_update_attr() helper to invalidate cached attributes
after a write.

This has already been done in all cases except in fuse_notify_store(), so
this is mostly a cleanup.

fuse_direct_write_iter() calls fuse_direct_IO() which already calls
fuse_write_update_attr(), so don't repeat that again in the former.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
27ae449ba2 fuse: rename fuse_write_update_size()
This function already updates the attr_version in fuse_inode, regardless of
whether the size was changed or not.

Rename the helper to fuse_write_update_attr() to reflect the more generic
nature.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
8c56e03d2e fuse: don't bump attr_version in cached write
The attribute version in fuse_inode should be updated whenever the
attributes might have changed on the server.  In case of cached writes this
is not the case, so updating the attr_version is unnecessary and could
possibly affect performance.

Open code the remaining part of fuse_write_update_size().

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
fa5eee57e3 fuse: selective attribute invalidation
Only invalidate attributes that the operation might have changed.

Introduce two constants for common combinations of changed attributes:

  FUSE_STATX_MODIFY: file contents are modified but not size

  FUSE_STATX_MODSIZE: size and/or file contents modified

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:32 +02:00
Miklos Szeredi
97f044f690 fuse: don't increment nlink in link()
The fuse_iget() call in create_new_entry() already updated the inode with
all the new attributes and incremented the attribute version.

Incrementing the nlink will result in the wrong count.  This wasn't noticed
because the attributes were invalidated right after this.

Updating ctime is still needed for the writeback case when the ctime is not
refreshed.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-10-28 09:45:26 +02:00
Hyeong-Jun Kim
02d58cd253 f2fs: compress: disallow disabling compress on non-empty compressed file
Compresse file and normal file has differ in i_addr addressing,
specifically addrs per inode/block. So, we will face data loss, if we
disable the compression flag on non-empty files. Therefore we should
disallow not only enabling but disabling the compression flag on
non-empty files.

Fixes: 4c8ff7095b ("f2fs: support data compression")
Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com>
Signed-off-by: Hyeong-Jun Kim <hj514.kim@samsung.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2021-10-27 15:32:29 -07:00
Trond Myklebust
01d29f87fc NFSv4: Fix a regression in nfs_set_open_stateid_locked()
If we already hold open state on the client, yet the server gives us a
completely different stateid to the one we already hold, then we
currently treat it as if it were an out-of-sequence update, and wait for
5 seconds for other updates to come in.
This commit fixes the behaviour so that we immediately start processing
of the new stateid, and then leave it to the call to
nfs4_test_and_free_stateid() to decide what to do with the old stateid.

Fixes: b4868b44c5 ("NFSv4: Wait for stateid updates after CLOSE/OPEN_DOWNGRADE")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-27 15:24:46 -04:00
Dongliang Mu
81dedaf10c fs: reiserfs: remove useless new_opts in reiserfs_remount
Since the commit c3d98ea082 ("VFS: Don't use save/replace_mount_options
if not using generic_show_options") eliminates replace_mount_options
in reiserfs_remount, but does not handle the allocated new_opts,
it will cause memory leak in the reiserfs_remount.

Because new_opts is useless in reiserfs_mount, so we fix this bug by
removing the useless new_opts in reiserfs_remount.

Fixes: c3d98ea082 ("VFS: Don't use save/replace_mount_options if not using generic_show_options")
Link: https://lore.kernel.org/r/20211027143445.4156459-1-mudongliangabcd@gmail.com
Signed-off-by: Dongliang Mu <mudongliangabcd@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 16:45:20 +02:00
Gabriel Krisman Bertazi
9a089b21f7 ext4: Send notifications on error
Send a FS_ERROR message via fsnotify to a userspace monitoring tool
whenever a ext4 error condition is triggered.  This follows the existing
error conditions in ext4, so it is hooked to the ext4_error* functions.

Link: https://lore.kernel.org/r/20211025192746.66445-30-krisman@collabora.com
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Acked-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:53:46 +02:00
Gabriel Krisman Bertazi
9709bd548f fanotify: Allow users to request FAN_FS_ERROR events
Wire up the FAN_FS_ERROR event in the fanotify_mark syscall, allowing
user space to request the monitoring of FAN_FS_ERROR events.

These events are limited to filesystem marks, so check it is the
case in the syscall handler.

Link: https://lore.kernel.org/r/20211025192746.66445-29-krisman@collabora.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:53:45 +02:00
Gabriel Krisman Bertazi
130a3c7421 fanotify: Emit generic error info for error event
The error info is a record sent to users on FAN_FS_ERROR events
documenting the type of error.  It also carries an error count,
documenting how many errors were observed since the last reporting.

Link: https://lore.kernel.org/r/20211025192746.66445-28-krisman@collabora.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:53:42 +02:00
Gabriel Krisman Bertazi
936d6a38be fanotify: Report fid info for file related file system errors
Plumb the pieces to add a FID report to error records.  Since all error
event memory must be pre-allocated, we pre-allocate the maximum file
handle size possible, such that it should always fit.

For errors that don't expose a file handle, report it with an invalid
FID. Internally we use zero-length FILEID_ROOT file handle for passing
the information (which we report as zero-length FILEID_INVALID file
handle to userspace) so we update the handle reporting code to deal with
this case correctly.

Link: https://lore.kernel.org/r/20211025192746.66445-27-krisman@collabora.com
Link: https://lore.kernel.org/r/20211025192746.66445-25-krisman@collabora.com
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
[Folded two patches into 2 to make series bisectable]
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:37:20 +02:00
Gabriel Krisman Bertazi
572c28f27a fanotify: WARN_ON against too large file handles
struct fanotify_error_event, at least, is preallocated and isn't able to
to handle arbitrarily large file handles.  Future-proof the code by
complaining loudly if a handle larger than MAX_HANDLE_SZ is ever found.

Link: https://lore.kernel.org/r/20211025192746.66445-26-krisman@collabora.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:37:20 +02:00
Gabriel Krisman Bertazi
4bd5a5c8e6 fanotify: Add helpers to decide whether to report FID/DFID
Now that there is an event that reports FID records even for a zeroed
file handle, wrap the logic that deides whether to issue the records
into helper functions.  This shouldn't have any impact on the code, but
simplifies further patches.

Link: https://lore.kernel.org/r/20211025192746.66445-24-krisman@collabora.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:37:20 +02:00
Gabriel Krisman Bertazi
2c5069433a fanotify: Wrap object_fh inline space in a creator macro
fanotify_error_event would duplicate this sequence of declarations that
already exist elsewhere with a slight different size.  Create a helper
macro to avoid code duplication.

Link: https://lore.kernel.org/r/20211025192746.66445-23-krisman@collabora.com
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:37:20 +02:00
Gabriel Krisman Bertazi
8a6ae64132 fanotify: Support merging of error events
Error events (FAN_FS_ERROR) against the same file system can be merged
by simply iterating the error count.  The hash is taken from the fsid,
without considering the FH.  This means that only the first error object
is reported.

Link: https://lore.kernel.org/r/20211025192746.66445-22-krisman@collabora.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:37:13 +02:00
Gabriel Krisman Bertazi
83e9acbe13 fanotify: Support enqueueing of error events
Once an error event is triggered, enqueue it in the notification group,
similarly to what is done for other events.  FAN_FS_ERROR is not
handled specially, since the memory is now handled by a preallocated
mempool.

For now, make the event unhashed.  A future patch implements merging of
this kind of event.

Link: https://lore.kernel.org/r/20211025192746.66445-21-krisman@collabora.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:35:10 +02:00
Gabriel Krisman Bertazi
734a1a5ecc fanotify: Pre-allocate pool of error events
Pre-allocate slots for file system errors to have greater chances of
succeeding, since error events can happen in GFP_NOFS context.  This
patch introduces a group-wide mempool of error events, shared by all
FAN_FS_ERROR marks in this group.

Link: https://lore.kernel.org/r/20211025192746.66445-20-krisman@collabora.com
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:35:05 +02:00
Gabriel Krisman Bertazi
8d11a4f43e fanotify: Reserve UAPI bits for FAN_FS_ERROR
FAN_FS_ERROR allows reporting of event type FS_ERROR to userspace, which
is a mechanism to report file system wide problems via fanotify.  This
commit preallocate userspace visible bits to match the FS_ERROR event.

Link: https://lore.kernel.org/r/20211025192746.66445-19-krisman@collabora.com
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:34:59 +02:00