Commit Graph

74668 Commits

Author SHA1 Message Date
Huang Jianan
7e508f2ca8 erofs: rename lz4_0pading to zero_padding
Renaming lz4_0padding to zero_padding globally since LZMA and later
algorithms also need that.

Link: https://lore.kernel.org/r/20211112160935.19394-1-jnhuang95@gmail.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2021-12-01 21:28:00 +08:00
David Howells
ddca5b0eba netfs: Adjust docs after foliation
Adjust the netfslib docs in light of the foliation changes.

Also un-kdoc-mark netfs_skip_folio_read() since it's internal and isn't
part of the API.

Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: linux-cachefs@redhat.com
cc: linux-mm@kvack.org
Link: https://lore.kernel.org/r/163706992597.3179783.18360472879717076435.stgit@warthog.procyon.org.uk/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-29 10:10:26 -08:00
Ye Bin
2087009c74 io_uring: validate timespec for timeout removals
Like commit f6223ff799, timeout removal should also validate the
timespec that is being passed in.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211129041537.1936270-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-29 06:46:39 -07:00
Linus Torvalds
3498e7f2bb Merge tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd
Pull ksmbd fixes from Steve French:
 "Five ksmbd server fixes, four of them for stable:

   - memleak fix

   - fix for default data stream on filesystems that don't support xattr

   - error logging fix

   - session setup fix

   - minor doc cleanup"

* tag '5.16-rc2-ksmbd-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix memleak in get_file_stream_info()
  ksmbd: contain default data stream even if xattr is empty
  ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
  docs: filesystem: cifs: ksmbd: Fix small layout issues
  ksmbd: Fix an error handling path in 'smb2_sess_setup()'
2021-11-27 14:49:35 -08:00
Guenter Roeck
4eec7faf67 fs: ntfs: Limit NTFS_RW to page sizes smaller than 64k
NTFS_RW code allocates page size dependent arrays on the stack. This
results in build failures if the page size is 64k or larger.

  fs/ntfs/aops.c: In function 'ntfs_write_mst_block':
  fs/ntfs/aops.c:1311:1: error:
	the frame size of 2240 bytes is larger than 2048 bytes

Since commit f22969a660 ("powerpc/64s: Default to 64K pages for 64 bit
book3s") this affects ppc:allmodconfig builds, but other architectures
supporting page sizes of 64k or larger are also affected.

Increasing the maximum frame size for affected architectures just to
silence this error does not really help.  The frame size would have to
be set to a really large value for 256k pages.  Also, a large frame size
could potentially result in stack overruns in this code and elsewhere
and is therefore not desirable.  Make NTFS_RW dependent on page sizes
smaller than 64k instead.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Cc: Anton Altaparmakov <anton@tuxera.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-27 14:34:41 -08:00
Linus Torvalds
4f0dda359c Merge tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull xfs fixes from Darrick Wong:
 "Fixes for a resource leak and a build robot complaint about totally
  dead code:

   - Fix buffer resource leak that could lead to livelock on corrupt fs.

   - Remove unused function xfs_inew_wait to shut up the build robots"

* tag 'xfs-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: remove xfs_inew_wait
  xfs: Fix the free logic of state in xfs_attr_node_hasname
2021-11-27 12:59:54 -08:00
Linus Torvalds
adfb743ac0 Merge tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux
Pull iomap fixes from Darrick Wong:
 "A single iomap bug fix and a cleanup for 5.16-rc2.

  The bug fix changes how iomap deals with reading from an inline data
  region -- whereas the current code (incorrectly) lets the iomap read
  iter try for more bytes after reading the inline region (which zeroes
  the rest of the page!) and hopes the next iteration terminates, we
  surveyed the inlinedata implementations and realized that all
  inlinedata implementations also require that the inlinedata region end
  at EOF, so we can simply terminate the read.

  The second patch documents these assumptions in the code so that
  they're not subtle implications anymore, and cleans up some of the
  grosser parts of that function.

  Summary:

   - Fix an accounting problem where unaligned inline data reads can run
     off the end of the read iomap iterator. iomap has historically
     required that inline data mappings only exist at the end of a file,
     though this wasn't documented anywhere.

   - Document iomap_read_inline_data and change its return type to be
     appropriate for the information that it's actually returning"

* tag 'iomap-5.16-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: iomap_read_inline_data cleanup
  iomap: Fix inline extent handling in iomap_readpage
2021-11-27 12:50:03 -08:00
Linus Torvalds
86799cdfbc Merge tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block
Pull more io_uring fixes from Jens Axboe:
 "The locking fixup that was applied earlier this rc has both a deadlock
  and IRQ safety issue, let's get that ironed out before -rc3. This
  contains:

   - Link traversal locking fix (Pavel)

   - Cancelation fix (Pavel)

   - Relocate cond_resched() for huge buffer chain freeing, avoiding a
     softlockup warning (Ye)

   - Fix timespec validation (Ye)"

* tag 'io_uring-5.16-2021-11-27' of git://git.kernel.dk/linux-block:
  io_uring: Fix undefined-behaviour in io_issue_sqe
  io_uring: fix soft lockup when call __io_remove_buffers
  io_uring: fix link traversal locking
  io_uring: fail cancellation for EXITING tasks
2021-11-27 11:28:37 -08:00
Linus Torvalds
7413927713 Merge tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client fixes from Trond Myklebust:
 "Highlights include:

  Stable fixes:

   - NFSv42: Fix pagecache invalidation after COPY/CLONE

  Bugfixes:

   - NFSv42: Don't fail clone() just because the server failed to return
     post-op attributes

   - SUNRPC: use different lockdep keys for INET6 and LOCAL

   - NFSv4.1: handle NFS4ERR_NOSPC from CREATE_SESSION

   - SUNRPC: fix header include guard in trace header"

* tag 'nfs-for-5.16-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: use different lock keys for INET6 and LOCAL
  sunrpc: fix header include guard in trace header
  NFSv4.1: handle NFS4ERR_NOSPC by CREATE_SESSION
  NFSv42: Fix pagecache invalidation after COPY/CLONE
  NFS: Add a tracepoint to show the results of nfs_set_cache_invalid()
  NFSv42: Don't fail clone() unless the OP_CLONE operation failed
2021-11-27 10:33:55 -08:00
Linus Torvalds
52dc4c640a Merge tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs fix from Gao Xiang:
 "Fix an ABBA deadlock introduced by XArray conversion"

* tag 'erofs-for-5.16-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  erofs: fix deadlock when shrink erofs slab
2021-11-27 10:27:35 -08:00
Ye Bin
f6223ff799 io_uring: Fix undefined-behaviour in io_issue_sqe
We got issue as follows:
================================================================================
UBSAN: Undefined behaviour in ./include/linux/ktime.h:42:14
signed integer overflow:
-4966321760114568020 * 1000000000 cannot be represented in type 'long long int'
CPU: 1 PID: 2186 Comm: syz-executor.2 Not tainted 4.19.90+ #12
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace+0x0/0x3f0 arch/arm64/kernel/time.c:78
 show_stack+0x28/0x38 arch/arm64/kernel/traps.c:158
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x170/0x1dc lib/dump_stack.c:118
 ubsan_epilogue+0x18/0xb4 lib/ubsan.c:161
 handle_overflow+0x188/0x1dc lib/ubsan.c:192
 __ubsan_handle_mul_overflow+0x34/0x44 lib/ubsan.c:213
 ktime_set include/linux/ktime.h:42 [inline]
 timespec64_to_ktime include/linux/ktime.h:78 [inline]
 io_timeout fs/io_uring.c:5153 [inline]
 io_issue_sqe+0x42c8/0x4550 fs/io_uring.c:5599
 __io_queue_sqe+0x1b0/0xbc0 fs/io_uring.c:5988
 io_queue_sqe+0x1ac/0x248 fs/io_uring.c:6067
 io_submit_sqe fs/io_uring.c:6137 [inline]
 io_submit_sqes+0xed8/0x1c88 fs/io_uring.c:6331
 __do_sys_io_uring_enter fs/io_uring.c:8170 [inline]
 __se_sys_io_uring_enter fs/io_uring.c:8129 [inline]
 __arm64_sys_io_uring_enter+0x490/0x980 fs/io_uring.c:8129
 invoke_syscall arch/arm64/kernel/syscall.c:53 [inline]
 el0_svc_common+0x374/0x570 arch/arm64/kernel/syscall.c:121
 el0_svc_handler+0x190/0x260 arch/arm64/kernel/syscall.c:190
 el0_svc+0x10/0x218 arch/arm64/kernel/entry.S:1017
================================================================================

As ktime_set only judge 'secs' if big than KTIME_SEC_MAX, but if we pass
negative value maybe lead to overflow.
To address this issue, we must check if 'sec' is negative.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211118015907.844807-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-27 06:41:38 -07:00
Ye Bin
1d0254e6b4 io_uring: fix soft lockup when call __io_remove_buffers
I got issue as follows:
[ 567.094140] __io_remove_buffers: [1]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680
[  594.360799] watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[  594.364987] Modules linked in:
[  594.365405] irq event stamp: 604180238
[  594.365906] hardirqs last  enabled at (604180237): [<ffffffff93fec9bd>] _raw_spin_unlock_irqrestore+0x2d/0x50
[  594.367181] hardirqs last disabled at (604180238): [<ffffffff93fbbadb>] sysvec_apic_timer_interrupt+0xb/0xc0
[  594.368420] softirqs last  enabled at (569080666): [<ffffffff94200654>] __do_softirq+0x654/0xa9e
[  594.369551] softirqs last disabled at (569080575): [<ffffffff913e1d6a>] irq_exit_rcu+0x1ca/0x250
[  594.370692] CPU: 2 PID: 108 Comm: kworker/u32:5 Tainted: G            L    5.15.0-next-20211112+ #88
[  594.371891] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[  594.373604] Workqueue: events_unbound io_ring_exit_work
[  594.374303] RIP: 0010:_raw_spin_unlock_irqrestore+0x33/0x50
[  594.375037] Code: 48 83 c7 18 53 48 89 f3 48 8b 74 24 10 e8 55 f5 55 fd 48 89 ef e8 ed a7 56 fd 80 e7 02 74 06 e8 43 13 7b fd fb bf 01 00 00 00 <e8> f8 78 474
[  594.377433] RSP: 0018:ffff888101587a70 EFLAGS: 00000202
[  594.378120] RAX: 0000000024030f0d RBX: 0000000000000246 RCX: 1ffffffff2f09106
[  594.379053] RDX: 0000000000000000 RSI: ffffffff9449f0e0 RDI: 0000000000000001
[  594.379991] RBP: ffffffff9586cdc0 R08: 0000000000000001 R09: fffffbfff2effcab
[  594.380923] R10: ffffffff977fe557 R11: fffffbfff2effcaa R12: ffff8881b8f3def0
[  594.381858] R13: 0000000000000246 R14: ffff888153a8b070 R15: 0000000000000000
[  594.382787] FS:  0000000000000000(0000) GS:ffff888399c00000(0000) knlGS:0000000000000000
[  594.383851] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  594.384602] CR2: 00007fcbe71d2000 CR3: 00000000b4216000 CR4: 00000000000006e0
[  594.385540] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  594.386474] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  594.387403] Call Trace:
[  594.387738]  <TASK>
[  594.388042]  find_and_remove_object+0x118/0x160
[  594.389321]  delete_object_full+0xc/0x20
[  594.389852]  kfree+0x193/0x470
[  594.390275]  __io_remove_buffers.part.0+0xed/0x147
[  594.390931]  io_ring_ctx_free+0x342/0x6a2
[  594.392159]  io_ring_exit_work+0x41e/0x486
[  594.396419]  process_one_work+0x906/0x15a0
[  594.399185]  worker_thread+0x8b/0xd80
[  594.400259]  kthread+0x3bf/0x4a0
[  594.401847]  ret_from_fork+0x22/0x30
[  594.402343]  </TASK>

Message from syslogd@localhost at Nov 13 09:09:54 ...
kernel:watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [kworker/u32:5:108]
[  596.793660] __io_remove_buffers: [2099199]start ctx=0xffff8881067bf000 bgid=65533 buf=0xffff8881fefe1680

We can reproduce this issue by follow syzkaller log:
r0 = syz_io_uring_setup(0x401, &(0x7f0000000300), &(0x7f0000003000/0x2000)=nil, &(0x7f0000ff8000/0x4000)=nil, &(0x7f0000000280)=<r1=>0x0, &(0x7f0000000380)=<r2=>0x0)
sendmsg$ETHTOOL_MSG_FEATURES_SET(0xffffffffffffffff, &(0x7f0000003080)={0x0, 0x0, &(0x7f0000003040)={&(0x7f0000000040)=ANY=[], 0x18}}, 0x0)
syz_io_uring_submit(r1, r2, &(0x7f0000000240)=@IORING_OP_PROVIDE_BUFFERS={0x1f, 0x5, 0x0, 0x401, 0x1, 0x0, 0x100, 0x0, 0x1, {0xfffd}}, 0x0)
io_uring_enter(r0, 0x3a2d, 0x0, 0x0, 0x0, 0x0)

The reason above issue  is 'buf->list' has 2,100,000 nodes, occupied cpu lead
to soft lockup.
To solve this issue, we need add schedule point when do while loop in
'__io_remove_buffers'.
After add  schedule point we do regression, get follow data.
[  240.141864] __io_remove_buffers: [1]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[  268.408260] __io_remove_buffers: [1]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[  275.899234] __io_remove_buffers: [2099199]start ctx=0xffff888170603000 bgid=65533 buf=0xffff8881116fcb00
[  296.741404] __io_remove_buffers: [1]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
[  305.090059] __io_remove_buffers: [2099199]start ctx=0xffff8881b92d2000 bgid=65533 buf=0xffff888130c83180
[  325.415746] __io_remove_buffers: [1]start ctx=0xffff8881b92d1000 bgid=65533 buf=0xffff8881a17d8f00
[  333.160318] __io_remove_buffers: [2099199]start ctx=0xffff8881b659c000 bgid=65533 buf=0xffff8881010fe380
...

Fixes:8bab4c09f24e("io_uring: allow conditional reschedule for intensive iterators")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20211122024737.2198530-1-yebin10@huawei.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-27 06:41:32 -07:00
Linus Torvalds
925c94371c Merge tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fix from Miklos Szeredi:
 "Fix a regression caused by a bugfix in the previous release. The
  symptom is a VM_BUG_ON triggered from splice to the fuse device.

  Unfortunately the original bugfix was already backported to a number
  of stable releases, so this fix-fix will need to be backported as
  well"

* tag 'fuse-fixes-5.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
  fuse: release pipe buf after last use
2021-11-26 12:01:31 -08:00
Linus Torvalds
7e63545264 Merge tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fix from David Sterba:
 "One more fix to the lzo code, a missing put_page causing memory leaks
  when some error branches are taken"

* tag 'for-5.16-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix the memory leak caused in lzo_compress_pages()
2021-11-26 11:24:32 -08:00
Pavel Begunkov
6af3f48bf6 io_uring: fix link traversal locking
WARNING: inconsistent lock state
5.16.0-rc2-syzkaller #0 Not tainted
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
ffff888078e11418 (&ctx->timeout_lock
){?.+.}-{2:2}
, at: io_timeout_fn+0x6f/0x360 fs/io_uring.c:5943
{HARDIRQ-ON-W} state was registered at:
  [...]
  spin_unlock_irq include/linux/spinlock.h:399 [inline]
  __io_poll_remove_one fs/io_uring.c:5669 [inline]
  __io_poll_remove_one fs/io_uring.c:5654 [inline]
  io_poll_remove_one+0x236/0x870 fs/io_uring.c:5680
  io_poll_remove_all+0x1af/0x235 fs/io_uring.c:5709
  io_ring_ctx_wait_and_kill+0x1cc/0x322 fs/io_uring.c:9534
  io_uring_release+0x42/0x46 fs/io_uring.c:9554
  __fput+0x286/0x9f0 fs/file_table.c:280
  task_work_run+0xdd/0x1a0 kernel/task_work.c:164
  exit_task_work include/linux/task_work.h:32 [inline]
  do_exit+0xc14/0x2b40 kernel/exit.c:832

674ee8e1b4 ("io_uring: correct link-list traversal locking") fixed a
data race but introduced a possible deadlock and inconsistentcy in irq
states. E.g.

io_poll_remove_all()
    spin_lock_irq(timeout_lock)
    io_poll_remove_one()
        spin_lock/unlock_irq(poll_lock);
    spin_unlock_irq(timeout_lock)

Another type of problem is freeing a request while holding
->timeout_lock, which may leads to a deadlock in
io_commit_cqring() -> io_flush_timeouts() and other places.

Having 3 nested locks is also too ugly. Add io_match_task_safe(), which
would briefly take and release timeout_lock for race prevention inside,
so the actuall request cancellation / free / etc. code doesn't have it
taken.

Reported-by: syzbot+ff49a3059d49b0ca0eec@syzkaller.appspotmail.com
Reported-by: syzbot+847f02ec20a6609a328b@syzkaller.appspotmail.com
Reported-by: syzbot+3368aadcd30425ceb53b@syzkaller.appspotmail.com
Reported-by: syzbot+51ce8887cdef77c9ac83@syzkaller.appspotmail.com
Reported-by: syzbot+3cb756a49d2f394a9ee3@syzkaller.appspotmail.com
Fixes: 674ee8e1b4 ("io_uring: correct link-list traversal locking")
Cc: stable@kernel.org # 5.15+
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/397f7ebf3f4171f1abe41f708ac1ecb5766f0b68.1637937097.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-26 08:35:57 -07:00
Pavel Begunkov
617a89484d io_uring: fail cancellation for EXITING tasks
WARNING: CPU: 1 PID: 20 at fs/io_uring.c:6269 io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
CPU: 1 PID: 20 Comm: kworker/1:0 Not tainted 5.16.0-rc1-syzkaller #0
Workqueue: events io_fallback_req_func
RIP: 0010:io_try_cancel_userdata+0x3c5/0x640 fs/io_uring.c:6269
Call Trace:
 <TASK>
 io_req_task_link_timeout+0x6b/0x1e0 fs/io_uring.c:6886
 io_fallback_req_func+0xf9/0x1ae fs/io_uring.c:1334
 process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
 worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
 kthread+0x405/0x4f0 kernel/kthread.c:327
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>

We need original task's context to do cancellations, so if it's dying
and the callback is executed in a fallback mode, fail the cancellation
attempt.

Fixes: 89b263f6d5 ("io_uring: run linked timeouts from task_work")
Cc: stable@kernel.org # 5.15+
Reported-by: syzbot+ab0cfe96c2b3cd1c1153@syzkaller.appspotmail.com
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/4c41c5f379c6941ad5a07cd48cb66ed62199cf7e.1637937097.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-26 08:35:43 -07:00
Qu Wenruo
daf87e9535 btrfs: fix the memory leak caused in lzo_compress_pages()
[BUG]
Fstests generic/027 is pretty easy to trigger a slow but steady memory
leak if run with "-o compress=lzo" mount option.

Normally one single run of generic/027 is enough to eat up at least 4G ram.

[CAUSE]
In commit d4088803f5 ("btrfs: subpage: make lzo_compress_pages()
compatible") we changed how @page_in is released.

But that refactoring makes @page_in only released after all pages being
compressed.

This leaves error path not releasing @page_in. And by "error path"
things like incompressible data will also be treated as an error
(-E2BIG).

Thus it can cause a memory leak if even nothing wrong happened.

[FIX]
Add check under @out label to release @page_in when needed, so when we
hit any error, the input page is properly released.

Reported-by: Josef Bacik <josef@toxicpanda.com>
Fixes: d4088803f5 ("btrfs: subpage: make lzo_compress_pages() compatible")
Reviewed-and-tested-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2021-11-26 16:10:05 +01:00
Linus Torvalds
de4444f596 Merge tag 'io_uring-5.16-2021-11-25' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
 "A locking fix for link traversal, and fixing up an outdated function
  name in a comment"

* tag 'io_uring-5.16-2021-11-25' of git://git.kernel.dk/linux-block:
  io_uring: correct link-list traversal locking
  io_uring: fix missed comment from *task_file rename
2021-11-25 10:57:45 -08:00
Linus Torvalds
8ef4678f2f Merge tag '5.16-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
 "Four small cifs/smb3 fixes:

   - two multichannel fixes

   - fix problem noted by kernel test robot

   - update internal version number"

* tag '5.16-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: update internal version number
  smb2: clarify rc initialization in smb2_reconnect
  cifs: populate server_hostname for extra channels
  cifs: nosharesock should be set on new server
2021-11-25 10:48:14 -08:00
Linus Torvalds
79941493ff Merge tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache
Pull folio fixes from Matthew Wilcox:
 "In the course of preparing the folio changes for iomap for next merge
  window, we discovered some problems that would be nice to address now:

   - Renaming multi-page folios to large folios.

     mapping_multi_page_folio_support() is just a little too long, so we
     settled on mapping_large_folio_support(). That meant renaming, eg
     folio_test_multi() to folio_test_large().

     Rename AS_THP_SUPPORT to match

   - I hadn't included folio wrappers for zero_user_segments(), etc.
     Also, multi-page^W^W large folio support is now independent of
     CONFIG_TRANSPARENT_HUGEPAGE, so machines with HIGHMEM always need
     to fall back to the out-of-line zero_user_segments().

     Remove FS_THP_SUPPORT to match

   - The build bots finally got round to telling me that I missed a
     couple of architectures when adding flush_dcache_folio(). Christoph
     suggested that we just add linux/cacheflush.h and not rely on
     asm-generic/cacheflush.h"

* tag 'folio-5.16b' of git://git.infradead.org/users/willy/pagecache:
  mm: Add functions to zero portions of a folio
  fs: Rename AS_THP_SUPPORT and mapping_thp_support
  fs: Remove FS_THP_SUPPORT
  mm: Remove folio_test_single
  mm: Rename folio_test_multi to folio_test_large
  Add linux/cacheflush.h
2021-11-25 10:13:56 -08:00
Hao Xu
b6c7db3218 io_uring: better to use REQ_F_IO_DRAIN for req->flags
It's better to use REQ_F_IO_DRAIN for req->flags rather than
IOSQE_IO_DRAIN though they have same value.

Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211125092103.224502-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-25 09:00:42 -07:00
Hao Xu
e302f1046f io_uring: fix no lock protection for ctx->cq_extra
ctx->cq_extra should be protected by completion lock so that the
req_need_defer() does the right check.

Cc: stable@vger.kernel.org
Signed-off-by: Hao Xu <haoxu@linux.alibaba.com>
Link: https://lore.kernel.org/r/20211125092103.224502-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-25 09:00:42 -07:00
Vivek Goyal
3e2b6fdbdc fuse: send security context of inode on file
When a new inode is created, send its security context to server along with
creation request (FUSE_CREAT, FUSE_MKNOD, FUSE_MKDIR and FUSE_SYMLINK).
This gives server an opportunity to create new file and set security
context (possibly atomically).  In all the configurations it might not be
possible to set context atomically.

Like nfs and ceph, use security_dentry_init_security() to dermine security
context of inode and send it with create, mkdir, mknod, and symlink
requests.

Following is the information sent to server.

fuse_sectx_header, fuse_secctx, xattr_name, security_context

 - struct fuse_secctx_header
   This contains total number of security contexts being sent and total
   size of all the security contexts (including size of
   fuse_secctx_header).

 - struct fuse_secctx
   This contains size of security context which follows this structure.
   There is one fuse_secctx instance per security context.

 - xattr name string
   This string represents name of xattr which should be used while setting
   security context.

 - security context
   This is the actual security context whose size is specified in
   fuse_secctx struct.

Also add the FUSE_SECURITY_CTX flag for the `flags` field of the
fuse_init_out struct.  When this flag is set the kernel will append the
security context for a newly created inode to the request (create, mkdir,
mknod, and symlink).  The server is responsible for ensuring that the inode
appears atomically (preferrably) with the requested security context.

For example, If the server is using SELinux and backed by a "real" linux
file system that supports extended attributes it can write the security
context value to /proc/thread-self/attr/fscreate before making the syscall
to create the inode.

This patch is based on patch from Chirantan Ekbote <chirantan@chromium.org>

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-11-25 14:05:18 +01:00
Miklos Szeredi
473441720c fuse: release pipe buf after last use
Checking buf->flags should be done before the pipe_buf_release() is called
on the pipe buffer, since releasing the buffer might modify the flags.

This is exactly what page_cache_pipe_buf_release() does, and which results
in the same VM_BUG_ON_PAGE(PageLRU(page)) that the original patch was
trying to fix.

Reported-by: Justin Forbes <jmforbes@linuxtx.org>
Fixes: 712a951025 ("fuse: fix page stealing")
Cc: <stable@vger.kernel.org> # v2.6.35
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-11-25 14:05:18 +01:00
Miklos Szeredi
53db28933e fuse: extend init flags
FUSE_INIT flags are close to running out, so add another 32bits worth of
space.

Add FUSE_INIT_EXT flag to the old flags field in fuse_init_in.  If this
flag is set, then fuse_init_in is extended by 48bytes, in which a flags_hi
field is allocated to contain the high 32bits of the flags.

A flags_hi field is also added to fuse_init_out, allocated out of the
remaining unused fields.

Known userspace implementations of the fuse protocol have been checked to
accept the extended FUSE_INIT request, but this might cause problems with
other implementations.  If that happens to be the case, the protocol
negotiation will have to be extended with an extra initialization request
roundtrip.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2021-11-25 14:05:18 +01:00
Namjae Jeon
178ca6f85a ksmbd: fix memleak in get_file_stream_info()
Fix memleak in get_file_stream_info()

Fixes: 34061d6b76 ("ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests")
Cc: stable@vger.kernel.org # v5.15
Reported-by: Coverity Scan <scan-admin@coverity.com>
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-25 00:09:26 -06:00
Namjae Jeon
1ec72153ff ksmbd: contain default data stream even if xattr is empty
If xattr is not supported like exfat or fat, ksmbd server doesn't
contain default data stream in FILE_STREAM_INFORMATION response. It will
cause ppt or doc file update issue if local filesystem is such as ones.
This patch move goto statement to contain it.

Fixes: 9f6323311c ("ksmbd: add default data stream name in FILE_STREAM_INFORMATION")
Cc: stable@vger.kernel.org # v5.15
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-25 00:09:26 -06:00
Namjae Jeon
8e537d1465 ksmbd: downgrade addition info error msg to debug in smb2_get_info_sec()
While file transfer through windows client, This error flood message
happen. This flood message will cause performance degradation and
misunderstand server has problem.

Fixes: e294f78d34 ("ksmbd: allow PROTECTED_DACL_SECINFO and UNPROTECTED_DACL_SECINFO addition information in smb2 set info security")
Cc: stable@vger.kernel.org # v5.15
Acked-by: Hyunchul Lee <hyc.lee@gmail.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-25 00:09:26 -06:00
Christophe JAILLET
f8fbfd85f5 ksmbd: Fix an error handling path in 'smb2_sess_setup()'
All the error handling paths of 'smb2_sess_setup()' end to 'out_err'.

All but the new error handling path added by the commit given in the Fixes
tag below.

Fix this error handling path and branch to 'out_err' as well.

Fixes: 0d994cd482 ("ksmbd: add buffer validation in session setup")
Cc: stable@vger.kernel.org # v5.15
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-25 00:09:26 -06:00
Pavel Begunkov
5562a8d71a io_uring: disable drain with cqe skip
Current IOSQE_IO_DRAIN implementation doesn't work well with CQE
skipping and it's not allowed, otherwise some requests might be not
executed until the ring is destroyed and the userspace would hang.

Let's fail all drain requests after seeing IOSQE_CQE_SKIP_SUCCESS at
least once. All drained requests prior to that will get run normally,
so there should be no stalls. However, even though such mixing wouldn't
lead to issues at the moment, it's still not allowed as the behaviour
may change.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bcf7164f8bf3eb54b7bb7b4fd119907fa4d4d43b.1636559119.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-24 11:17:53 -07:00
Pavel Begunkov
3d4aeb9f98 io_uring: don't spinlock when not posting CQEs
When no of queued for the batch completion requests need to post an CQE,
see IOSQE_CQE_SKIP_SUCCESS, avoid grabbing ->completion_lock and other
commit/post.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/8d4b4a08bca022cbe19af00266407116775b3e4d.1636559119.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-24 11:17:53 -07:00
Pavel Begunkov
04c76b41ca io_uring: add option to skip CQE posting
Emitting a CQE is expensive from the kernel perspective. Often, it's
also not convenient for the userspace, spends some cycles on processing
and just complicates the logic. A similar problems goes for linked
requests, where we post an CQE for each request in the link.

Introduce a new flags, IOSQE_CQE_SKIP_SUCCESS, trying to help with it.
When set and a request completed successfully, it won't generate a CQE.
When fails, it produces an CQE, but all following linked requests will
be CQE-less, regardless whether they have IOSQE_CQE_SKIP_SUCCESS or not.
The notion of "fail" is the same as for link failing-cancellation, where
it's opcode dependent, and _usually_ result >= 0 is a success, but not
always.

Linked timeouts are a bit special. When the requests it's linked to was
not attempted to be executed, e.g. failing linked requests, it follows
the description above. Otherwise, whether a linked timeout will post a
completion or not solely depends on IOSQE_CQE_SKIP_SUCCESS of that
linked timeout request. Linked timeout never "fail" during execution, so
for them it's unconditional. It's expected for users to not really care
about the result of it but rely solely on the result of the master
request. Another reason for such a treatment is that it's racy, and the
timeout callback may be running awhile the master request posts its
completion.

use case 1:
If one doesn't care about results of some requests, e.g. normal
timeouts, just set IOSQE_CQE_SKIP_SUCCESS. Error result will still be
posted and need to be handled.

use case 2:
Set IOSQE_CQE_SKIP_SUCCESS for all requests of a link but the last,
and it'll post a completion only for the last one if everything goes
right, otherwise there will be one only one CQE for the first failed
request.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0220fbe06f7cf99e6fc71b4297bb1cb6c0e89c2c.1636559119.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-24 11:17:53 -07:00
Pavel Begunkov
913a571aff io_uring: clean cqe filling functions
Split io_cqring_fill_event() into a couple of more targeted functions.
The first on is io_fill_cqe_aux() for completions that are not
associated with request completions and doing the ->cq_extra accounting.
Examples are additional CQEs from multishot poll and rsrc notifications.

The second is io_fill_cqe_req(), should be called when it's a normal
request completion. Nothing more to it at the moment, will be used in
later patches.

The last one is inlined __io_fill_cqe() for a finer grained control,
should be used with caution and in hottest places.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/59a9117a4a44fc9efcf04b3afa51e0d080f5943c.1636559119.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-24 11:17:53 -07:00
Andreas Gruenbacher
5ad448ce29 iomap: iomap_read_inline_data cleanup
Change iomap_read_inline_data to return 0 or an error code; this
simplifies the callers.  Add a description.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[djwong: document the return value of iomap_read_inline_data explicitly]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-24 10:15:47 -08:00
Christoph Hellwig
1090427bf1 xfs: remove xfs_inew_wait
With the remove of xfs_dqrele_all_inodes, xfs_inew_wait and all the
infrastructure used to wake the XFS_INEW bit waitqueue is unused.

Reported-by: kernel test robot <lkp@intel.com>
Fixes: 777eb1fa85 ("xfs: remove xfs_dqrele_all_inodes")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-24 10:06:02 -08:00
Yang Xu
a1de97fe29 xfs: Fix the free logic of state in xfs_attr_node_hasname
When testing xfstests xfs/126 on lastest upstream kernel, it will hang on some machine.
Adding a getxattr operation after xattr corrupted, I can reproduce it 100%.

The deadlock as below:
[983.923403] task:setfattr        state:D stack:    0 pid:17639 ppid: 14687 flags:0x00000080
[  983.923405] Call Trace:
[  983.923410]  __schedule+0x2c4/0x700
[  983.923412]  schedule+0x37/0xa0
[  983.923414]  schedule_timeout+0x274/0x300
[  983.923416]  __down+0x9b/0xf0
[  983.923451]  ? xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923453]  down+0x3b/0x50
[  983.923471]  xfs_buf_lock+0x33/0xf0 [xfs]
[  983.923490]  xfs_buf_find.isra.29+0x3c8/0x5f0 [xfs]
[  983.923508]  xfs_buf_get_map+0x4c/0x320 [xfs]
[  983.923525]  xfs_buf_read_map+0x53/0x310 [xfs]
[  983.923541]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923560]  xfs_trans_read_buf_map+0x1cf/0x360 [xfs]
[  983.923575]  ? xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923590]  xfs_da_read_buf+0xcf/0x120 [xfs]
[  983.923606]  xfs_da3_node_read+0x1f/0x40 [xfs]
[  983.923621]  xfs_da3_node_lookup_int+0x69/0x4a0 [xfs]
[  983.923624]  ? kmem_cache_alloc+0x12e/0x270
[  983.923637]  xfs_attr_node_hasname+0x6e/0xa0 [xfs]
[  983.923651]  xfs_has_attr+0x6e/0xd0 [xfs]
[  983.923664]  xfs_attr_set+0x273/0x320 [xfs]
[  983.923683]  xfs_xattr_set+0x87/0xd0 [xfs]
[  983.923686]  __vfs_removexattr+0x4d/0x60
[  983.923688]  __vfs_removexattr_locked+0xac/0x130
[  983.923689]  vfs_removexattr+0x4e/0xf0
[  983.923690]  removexattr+0x4d/0x80
[  983.923693]  ? __check_object_size+0xa8/0x16b
[  983.923695]  ? strncpy_from_user+0x47/0x1a0
[  983.923696]  ? getname_flags+0x6a/0x1e0
[  983.923697]  ? _cond_resched+0x15/0x30
[  983.923699]  ? __sb_start_write+0x1e/0x70
[  983.923700]  ? mnt_want_write+0x28/0x50
[  983.923701]  path_removexattr+0x9b/0xb0
[  983.923702]  __x64_sys_removexattr+0x17/0x20
[  983.923704]  do_syscall_64+0x5b/0x1a0
[  983.923705]  entry_SYSCALL_64_after_hwframe+0x65/0xca
[  983.923707] RIP: 0033:0x7f080f10ee1b

When getxattr calls xfs_attr_node_get function, xfs_da3_node_lookup_int fails with EFSCORRUPTED in
xfs_attr_node_hasname because we have use blocktrash to random it in xfs/126. So it
free state in internal and xfs_attr_node_get doesn't do xfs_buf_trans release job.

Then subsequent removexattr will hang because of it.

This bug was introduced by kernel commit 07120f1abd ("xfs: Add xfs_has_attr and subroutines").
It adds xfs_attr_node_hasname helper and said caller will be responsible for freeing the state
in this case. But xfs_attr_node_hasname will free state itself instead of caller if
xfs_da3_node_lookup_int fails.

Fix this bug by moving the step of free state into caller.

Also, use "goto error/out" instead of returning error directly in xfs_attr_node_addname_find_attr and
xfs_attr_node_removename_setup function because we should free state ourselves.

Fixes: 07120f1abd ("xfs: Add xfs_has_attr and subroutines")
Signed-off-by: Yang Xu <xuyang2018.jy@fujitsu.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-24 10:06:02 -08:00
Minchan Kim
393c371408 kernfs: switch global kernfs_rwsem lock to per-fs lock
The kernfs implementation has big lock granularity(kernfs_rwsem) so
every kernfs-based(e.g., sysfs, cgroup) fs are able to compete the
lock. It makes trouble for some cases to wait the global lock
for a long time even though they are totally independent contexts
each other.

A general example is process A goes under direct reclaim with holding
the lock when it accessed the file in sysfs and process B is waiting
the lock with exclusive mode and then process C is waiting the lock
until process B could finish the job after it gets the lock from
process A.

This patch switches the global kernfs_rwsem to per-fs lock, which
put the rwsem into kernfs_root.

Suggested-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Link: https://lore.kernel.org/r/20211118230008.2679780-1-minchan@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-11-24 13:55:16 +01:00
Pavel Begunkov
2ea537ca02 io_uring: improve argument types of kiocb_done()
kiocb_done() accepts a pointer to struct kiocb, pass struct io_kiocb
(i.e. io_uring's request) instead so we can get rid of useless
container_of().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/252016eed77806f58b48251a85cd8c645f900433.1637524285.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-23 12:24:20 -07:00
Pavel Begunkov
f3251183b2 io_uring: clean __io_import_iovec()
Apparently, implicit 0 to NULL conversion with ERR_PTR is not
recommended and makes some tooling like Smatch to complain. Handle it
explicitly, compilers are perfectly capable to optimise it out.

Link: https://lore.kernel.org/all/20211108134937.GA2863@kili/
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5c6ed369ad95075dab345df679f8677b8fe66656.1637524285.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-23 12:24:20 -07:00
Pavel Begunkov
7297ce3d59 io_uring: improve send/recv error handling
Hide all error handling under common if block, removes two extra ifs on
the success path and keeps the handling more condensed.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5761545158a12968f3caf30f747eea65ed75dfc1.1637524285.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-23 12:24:20 -07:00
Pavel Begunkov
06bdea20c1 io_uring: simplify reissue in kiocb_done
Simplify failed resubmission prep in kiocb_done(), it's a bit ugly with
conditional logic and hand handling cflags / select buffers. Instead,
punt to tw and use io_req_task_complete() already handling all the
cases.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/667c33484b05b612e9420e1b1d5f4dc46d0ee9ce.1637524285.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-23 12:24:20 -07:00
Steve French
0b03fe6d3a cifs: update internal version number
To 2.34

Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-23 10:07:15 -06:00
Steve French
350f4a562e smb2: clarify rc initialization in smb2_reconnect
It is clearer to initialize rc at the beginning of the function.

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-23 10:07:00 -06:00
Shyam Prasad N
5112d80c16 cifs: populate server_hostname for extra channels
Recently, a new field got added to the smb3_fs_context struct
named server_hostname. While creating extra channels, pick up
this field from primary channel.

Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-23 10:05:39 -06:00
Shyam Prasad N
b9ad6b5b68 cifs: nosharesock should be set on new server
Recent fix to maintain a nosharesock state on the
server struct caused a regression. It updated this
field in the old tcp session, and not the new one.

This caused the multichannel scenario to misbehave.

Fixes: c9f1c19cf7 (cifs: nosharesock should not share socket with future sessions)
Signed-off-by: Shyam Prasad N <sprasad@microsoft.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
Signed-off-by: Steve French <stfrench@microsoft.com>
2021-11-23 10:04:49 -06:00
Huang Jianan
57bbeacdbe erofs: fix deadlock when shrink erofs slab
We observed the following deadlock in the stress test under low
memory scenario:

Thread A                               Thread B
- erofs_shrink_scan
 - erofs_try_to_release_workgroup
  - erofs_workgroup_try_to_freeze -- A
                                       - z_erofs_do_read_page
                                        - z_erofs_collection_begin
                                         - z_erofs_register_collection
                                          - erofs_insert_workgroup
                                           - xa_lock(&sbi->managed_pslots) -- B
                                           - erofs_workgroup_get
                                            - erofs_wait_on_workgroup_freezed -- A
  - xa_erase
   - xa_lock(&sbi->managed_pslots) -- B

To fix this, it needs to hold xa_lock before freezing the workgroup
since xarray will be touched then. So let's hold the lock before
accessing each workgroup, just like what we did with the radix tree
before.

[ Gao Xiang: Jianhua Hao also reports this issue at
  https://lore.kernel.org/r/b10b85df30694bac8aadfe43537c897a@xiaomi.com ]

Link: https://lore.kernel.org/r/20211118135844.3559-1-huangjianan@oppo.com
Fixes: 64094a0441 ("erofs: convert workstn to XArray")
Reviewed-by: Chao Yu <chao@kernel.org>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Huang Jianan <huangjianan@oppo.com>
Reported-by: Jianhua Hao <haojianhua1@xiaomi.com>
Signed-off-by: Gao Xiang <xiang@kernel.org>
2021-11-23 14:58:16 +08:00
Pavel Begunkov
674ee8e1b4 io_uring: correct link-list traversal locking
As io_remove_next_linked() is now under ->timeout_lock (see
io_link_timeout_fn), we should update locking around io_for_each_link()
and io_match_task() to use the new lock.

Cc: stable@kernel.org # 5.15+
Fixes: 89850fce16 ("io_uring: run timeouts from task_work")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b54541cedf7de59cb5ae36109e58529ca16e66aa.1637631883.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-11-22 19:31:54 -07:00
Andreas Gruenbacher
d8af404ffc iomap: Fix inline extent handling in iomap_readpage
Before commit 740499c784 ("iomap: fix the iomap_readpage_actor return
value for inline data"), when hitting an IOMAP_INLINE extent,
iomap_readpage_actor would report having read the entire page.  Since
then, it only reports having read the inline data (iomap->length).

This will force iomap_readpage into another iteration, and the
filesystem will report an unaligned hole after the IOMAP_INLINE extent.
But iomap_readpage_actor (now iomap_readpage_iter) isn't prepared to
deal with unaligned extents, it will get things wrong on filesystems
with a block size smaller than the page size, and we'll eventually run
into the following warning in iomap_iter_advance:

  WARN_ON_ONCE(iter->processed > iomap_length(iter));

Fix that by changing iomap_readpage_iter to return 0 when hitting an
inline extent; this will cause iomap_iter to stop immediately.

To fix readahead as well, change iomap_readahead_iter to pass on
iomap_readpage_iter return values less than or equal to zero.

Fixes: 740499c784 ("iomap: fix the iomap_readpage_actor return value for inline data")
Cc: stable@vger.kernel.org # v5.15+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
2021-11-21 16:28:07 -08:00
Geert Uytterhoeven
61eb495c83 pstore/blk: Use "%lu" to format unsigned long
On 32-bit:

    fs/pstore/blk.c: In function ‘__best_effort_init’:
    include/linux/kern_levels.h:5:18: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 3 has type ‘long unsigned int’ [-Wformat=]
	5 | #define KERN_SOH "\001"  /* ASCII Start Of Header */
	  |                  ^~~~~~
    include/linux/kern_levels.h:14:19: note: in expansion of macro ‘KERN_SOH’
       14 | #define KERN_INFO KERN_SOH "6" /* informational */
	  |                   ^~~~~~~~
    include/linux/printk.h:373:9: note: in expansion of macro ‘KERN_INFO’
      373 |  printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
	  |         ^~~~~~~~~
    fs/pstore/blk.c:314:3: note: in expansion of macro ‘pr_info’
      314 |   pr_info("attached %s (%zu) (no dedicated panic_write!)\n",
	  |   ^~~~~~~

Cc: stable@vger.kernel.org
Fixes: 7bb9557b48 ("pstore/blk: Use the normal block device I/O path")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20210629103700.1935012-1-geert@linux-m68k.org
Cc: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-21 09:44:19 -08:00
Linus Torvalds
923dcc5eb0 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "15 patches.

  Subsystems affected by this patch series: ipc, hexagon, mm (swap,
  slab-generic, kmemleak, hugetlb, kasan, damon, and highmem), and proc"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  proc/vmcore: fix clearing user buffer by properly using clear_user()
  kmap_local: don't assume kmap PTEs are linear arrays in memory
  mm/damon/dbgfs: fix missed use of damon_dbgfs_lock
  mm/damon/dbgfs: use '__GFP_NOWARN' for user-specified size buffer allocation
  kasan: test: silence intentional read overflow warnings
  hugetlb, userfaultfd: fix reservation restore on userfaultfd error
  hugetlb: fix hugetlb cgroup refcounting during mremap
  mm: kmemleak: slob: respect SLAB_NOLEAKTRACE flag
  hexagon: ignore vmlinux.lds
  hexagon: clean up timer-regs.h
  hexagon: export raw I/O routines for modules
  mm: emit the "free" trace report before freeing memory in kmem_cache_free()
  shm: extend forced shm destroy to support objects from several IPC nses
  ipc: WARN if trying to remove ipc object which is absent
  mm/swap.c:put_pages_list(): reinitialise the page list
2021-11-20 13:17:24 -08:00