Commit Graph

1199782 Commits

Author SHA1 Message Date
Chao Yu
3a2c0e55f9 f2fs: allow f2fs_ioc_{,de}compress_file to be interrupted
This patch allows f2fs_ioc_{,de}compress_file() to be interrupted, so that,
userspace won't be blocked when manual {,de}compression on large file is
interrupted by signal.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:08 -07:00
Christoph Hellwig
51bf8d3c81 f2fs: don't reopen the main block device in f2fs_scan_devices
f2fs_scan_devices reopens the main device since the very beginning, which
has always been useless, and also means that we don't pass the right
holder for the reopen, which now leads to a warning as the core super.c
holder ops aren't passed in for the reopen.

Fixes: 3c62be17d4 ("f2fs: support multiple devices")
Fixes: 0718afd47f ("block: introduce holder ops")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:08 -07:00
Chao Yu
b5ab3276eb f2fs: fix to avoid mmap vs set_compress_option case
Compression option in inode should not be changed after they have
been used, however, it may happen in below race case:

Thread A				Thread B
- f2fs_ioc_set_compress_option
 - check f2fs_is_mmap_file()
 - check get_dirty_pages()
 - check F2FS_HAS_BLOCKS()
					- f2fs_file_mmap
					 - set_inode_flag(FI_MMAP_FILE)
					- fault
					 - do_page_mkwrite
					  - f2fs_vm_page_mkwrite
					  - f2fs_get_block_locked
					 - fault_dirty_shared_page
					  - set_page_dirty
 - update i_compress_algorithm
 - update i_log_cluster_size
 - update i_cluster_size

Avoid such race condition by covering f2fs_file_mmap() w/ i_sem lock,
meanwhile add mmap file check condition in f2fs_may_compress() as well.

Fixes: e1e8debec6 ("f2fs: add F2FS_IOC_SET_COMPRESS_OPTION ioctl")
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:07 -07:00
Randy Dunlap
c709d099a0 f2fs: fix spelling in ABI documentation
Correct spelling problems as identified by codespell.

Fixes: 9e615dbba4 ("f2fs: add missing description for ipu_policy node")
Fixes: b2e4a2b300 ("f2fs: expose discard related parameters in sysfs")
Fixes: 846ae671ad ("f2fs: expose extension_list sysfs entry")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <chao@kernel.org>
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: Yangtao Li <frank.li@vivo.com>
Cc: Konstantin Vyshetsky <vkon@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:07 -07:00
Jaegeuk Kim
d2d9bb3b6d f2fs: get out of a repeat loop when getting a locked data page
https://bugzilla.kernel.org/show_bug.cgi?id=216050

Somehow we're getting a page which has a different mapping.
Let's avoid the infinite loop.

Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:07 -07:00
Jaegeuk Kim
a3ab557466 f2fs: flush inode if atomic file is aborted
Let's flush the inode being aborted atomic operation to avoid stale dirty
inode during eviction in this call stack:

  f2fs_mark_inode_dirty_sync+0x22/0x40 [f2fs]
  f2fs_abort_atomic_write+0xc4/0xf0 [f2fs]
  f2fs_evict_inode+0x3f/0x690 [f2fs]
  ? sugov_start+0x140/0x140
  evict+0xc3/0x1c0
  evict_inodes+0x17b/0x210
  generic_shutdown_super+0x32/0x120
  kill_block_super+0x21/0x50
  deactivate_locked_super+0x31/0x90
  cleanup_mnt+0x100/0x160
  task_work_run+0x59/0x90
  do_exit+0x33b/0xa50
  do_group_exit+0x2d/0x80
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x3b/0x90
  entry_SYSCALL_64_after_hwframe+0x63/0xcd

This triggers f2fs_bug_on() in f2fs_evict_inode:
 f2fs_bug_on(sbi, is_inode_flag_set(inode, FI_DIRTY_INODE));

This fixes the syzbot report:

loop0: detected capacity change from 0 to 131072
F2FS-fs (loop0): invalid crc value
F2FS-fs (loop0): Found nat_bits in checkpoint
F2FS-fs (loop0): Mounted with checkpoint version = 48b305e4
------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:869!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 5014 Comm: syz-executor220 Not tainted 6.4.0-syzkaller-11479-g6cd06ab12d1a #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/27/2023
RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869
Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007
RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000
R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 evict+0x2ed/0x6b0 fs/inode.c:665
 dispose_list+0x117/0x1e0 fs/inode.c:698
 evict_inodes+0x345/0x440 fs/inode.c:748
 generic_shutdown_super+0xaf/0x480 fs/super.c:478
 kill_block_super+0x64/0xb0 fs/super.c:1417
 kill_f2fs_super+0x2af/0x3c0 fs/f2fs/super.c:4704
 deactivate_locked_super+0x98/0x160 fs/super.c:330
 deactivate_super+0xb1/0xd0 fs/super.c:361
 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1254
 task_work_run+0x16f/0x270 kernel/task_work.c:179
 exit_task_work include/linux/task_work.h:38 [inline]
 do_exit+0xa9a/0x29a0 kernel/exit.c:874
 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024
 __do_sys_exit_group kernel/exit.c:1035 [inline]
 __se_sys_exit_group kernel/exit.c:1033 [inline]
 __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f309be71a09
Code: Unable to access opcode bytes at 0x7f309be719df.
RSP: 002b:00007fff171df518 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f309bef7330 RCX: 00007f309be71a09
RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000001
RBP: 0000000000000001 R08: ffffffffffffffc0 R09: 00007f309bef1e40
R10: 0000000000010600 R11: 0000000000000246 R12: 00007f309bef7330
R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000001
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:f2fs_evict_inode+0x172d/0x1e00 fs/f2fs/inode.c:869
Code: ff df 48 c1 ea 03 80 3c 02 00 0f 85 6a 06 00 00 8b 75 40 ba 01 00 00 00 4c 89 e7 e8 6d ce 06 00 e9 aa fc ff ff e8 63 22 e2 fd <0f> 0b e8 5c 22 e2 fd 48 c7 c0 a8 3a 18 8d 48 ba 00 00 00 00 00 fc
RSP: 0018:ffffc90003a6fa00 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
RDX: ffff8880273b8000 RSI: ffffffff83a2bd0d RDI: 0000000000000007
RBP: ffff888077db91b0 R08: 0000000000000007 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: ffff888029a3c000
R13: ffff888077db9660 R14: ffff888029a3c0b8 R15: ffff888077db9c50
FS:  0000000000000000(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1909bb9000 CR3: 00000000276a9000 CR4: 0000000000350ef0

Cc: <stable@vger.kernel.org>
Reported-and-tested-by: syzbot+e1246909d526a9d470fa@syzkaller.appspotmail.com
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:07 -07:00
Chao Yu
863907a4f5 f2fs: don't handle error case of f2fs_compress_alloc_page()
f2fs_compress_alloc_page() uses mempool to allocate memory, it never
fail, don't handle error case in its callers.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:41:06 -07:00
Jaegeuk Kim
579c7e4150 Revert "f2fs: clean up w/ sbi->log_sectors_per_block"
This reverts commit bfd4766239.

Shinichiro Kawasaki reported:

When I ran workloads on f2fs using v6.5-rcX with fixes [1][2] and a zoned block
devices with 4kb logical block size, I observe mount failure as follows. When
I revert this commit, the failure goes away.

[  167.781975][ T1555] F2FS-fs (dm-0): IO Block Size:        4 KB
[  167.890728][ T1555] F2FS-fs (dm-0): Found nat_bits in checkpoint
[  171.482588][ T1555] F2FS-fs (dm-0): Zone without valid block has non-zero write pointer. Reset the write pointer: wp[0x1300,0x8]
[  171.496000][ T1555] F2FS-fs (dm-0): (0) : Unaligned zone reset attempted (block 280000 + 80000)
[  171.505037][ T1555] F2FS-fs (dm-0): Discard zone failed:  (errno=-5)

The patch replaced "sbi->log_blocksize - SECTOR_SHIFT" with
"sbi->log_sectors_per_block". However, I think these two are not equal when the
device has 4k logical block size. The former uses Linux kernel sector size 512
byte. The latter use 512b sector size or 4kb sector size depending on the
device. mkfs.f2fs obtains logical block size via BLKSSZGET ioctl from the device
and reflects it to the value sbi->log_sector_size_per_block. This causes
unexpected write pointer calculations in check_zone_write_pointer(). This
resulted in unexpected zone reset and the mount failure.

[1] https://lkml.kernel.org/linux-f2fs-devel/20230711050101.GA19128@lst.de/
[2] https://lore.kernel.org/linux-f2fs-devel/20230804091556.2372567-1-shinichiro.kawasaki@wdc.com/

Cc: stable@vger.kernel.org
Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: bfd4766239 ("f2fs: clean up w/ sbi->log_sectors_per_block")
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2023-08-14 13:40:27 -07:00
Linus Torvalds
06c2afb862 Linux 6.5-rc1 2023-07-09 13:53:13 -07:00
Linus Torvalds
c192ac7357 MAINTAINERS 2: Electric Boogaloo
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.

The answer is "No. No we cannot".

I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street.  And at the yearly
maintainer summit, we will all sing the alphabet song together.

I doubt I will keep doing this.  At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.

Repeats: 80e62bc848 ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-09 10:29:53 -07:00
Linus Torvalds
f71f64210d dma-mapping fixes for Linux 6.5
- swiotlb area sizing fixes (Petr Tesarik)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmSq57sLHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYOh0xAAkwklIxxzXvNlgjvy2hdgPWImPS8tGPDSIsqA9TDD
 WDZq89nz/ndnchdPObDvyJXmfBgqa0qCHqopBVPqMKv5a1pKZhrRXYlbajGQQwji
 MIqefTLZ/VGw7bDpEivOt+yadwQ3KxuxWWs7/JPKLLReSJ22H8P+jkrK7P7kBL5X
 YaMtZG9d86fvFHnQHKdAOlF1iCvnoZDHPcvaZbI6m5mMSZ+HIYqK5pP1MTUEAbIU
 MX4ZSI7/mL0q67+kZuM/NCSeq1pH0Cd0D2DGm+8k/y87G81GS6E5Wgg+xO7vYiXf
 5ChuwlAO9K9KhH7NIRkKhkad/Ii89ENXSyv5gmPRoKYK5FXajnHSlJTUrZojV6XC
 Pbsd9ATVzV0rY61EPyh6G1a+Ttp/pwMp+W0I2fi032GVAePQ/vhB9x9O+2O/3QiC
 v80nUSatkciZncWqkguhp3NONsRmLKep3CCQnEAA/gLs27B0avdQeslnqbOOUQKd
 Si+djIoel8ONjQ+mW8eFCsVYMH1xFSo0aGsgGe0y2cyBE3DN1AW9eRnOXWT4C1PR
 UyDlx8ACs87ojec+YRQFYk2/PbsU7CQiH1pteXvBHcbhiVUAvrtXtg6ANQ+7066P
 IIduAZmlHcHb1BhyrSQbAtRllVLIp/l9IAkCSY9SvL0tjh/B5CaRBD5m2Taow5I/
 KUI=
 =4Lfc
 -----END PGP SIGNATURE-----

Merge tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping

Pull dma-mapping fixes from Christoph Hellwig:

 - swiotlb area sizing fixes (Petr Tesarik)

* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
  swiotlb: reduce the number of areas to match actual memory pool size
  swiotlb: always set the number of areas before allocating the pool
2023-07-09 10:24:22 -07:00
Linus Torvalds
a9943ad3dd - Optimize IRQ domain's name assignment
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSqa70ACgkQEsHwGGHe
 VUrTlhAAtfvA+mCeWVwn42827YlJeMvFV6RedERSk8x5CO/IJgHx2YFB0UXlznuM
 xQSIC7pkY2He62ggb+s6m+Pb1M3bxxJDaRSRnQGHOqXuSNuzokqjwvVMaSBE9s5W
 fgvdbZ4CDEynW7awiDq1PaMcJFzbc6RMbBDspNDke2ikZgGvqqWl3Gx2DHhGyNfQ
 UFJHETejxhfheU7Y9c06P2ToTyzc0RcaPFQfFmYIse+c6iTcjnqr9NRHCIklSygN
 F3VjSEXC1MkAZIAuV9pjkl3z01xfvDe1WqxH3jID/vM97AY08DqmBQC31VZOBmmV
 shndEUpc2yi7ued0WU1XLf1sMyEHm5rgF/19gRhFR9sxu4Zm9+JxOofo0URwqtsV
 Z3d0CxpDrF0ATJs2zTstUmOG0AD6SBfC+BQZi3M6rohXgR0z04dBnyTgCnxAs60u
 d5byFza5nZ/sX7KZXgVlKfH5ej0AcZuyFM/qMKqiTqzz5C4dv0Hq7y718aaecxmz
 W9A3olDGhC48pr9250TthEMhBzRdXvKgKQBUhC2TY/0hUBHyvmHeU//ql54XYhgh
 XVx3qI7yieZC3d27hZN0WTCG6qSCJoq8LROmGDIbx16Uj/cm1AHzJTJ+5CbCOafZ
 41gPBoNCNYNXlgYNBXpVcPRtfcK98fGl3jpPOk0llaSa6IW+Lcc=
 =ATsc
 -----END PGP SIGNATURE-----

Merge tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq update from Borislav Petkov:

 - Optimize IRQ domain's name assignment

* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqdomain: Use return value of strreplace()
2023-07-09 10:16:04 -07:00
Linus Torvalds
51e3d7c274 - Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmSqal4ACgkQEsHwGGHe
 VUr+pg/+JyqfzyymWYAaPUfwaFH7V8425p8thrZL+OSnDDoAZt5UnPpLB4lYZKWW
 u2SlphNSLhuclZ7Wly1zkkPO1J8O88FRCFFBxONtnrQ4WqH2P7f2E6cHgzD4dQRF
 RX/pNuLQ1TNYiOHvNvJ3xJvVdAGrcXFBbqupfSig+dQMBKIyuzGu/Jn7Cm0Q+HJK
 j9WJWiGNJ+f8WOEbHiTdI89OFcPmUMe2nhtK/I/QIUoCBIiyp3jQ2RilZwY2V7Wu
 U5kSQChqp7N+e275TLlOCFGvNW2htCZ5GPc2/nCOkfmnTDTwjVGX8jQr+EqC1pj1
 WcueoTjBMw2Drs4/V9ItkGXYqmUE4CK03nGp6uZ2hA5Qo8mSAdzr59A3+I7BbHur
 ulbm1i6ZZ0ip9Co080E0JS0F1CIL7ROIQ6HDQz4BUGQ1BbmIhNBmdj7yBJ20nTrr
 L7EmwgDsOF2NhKpg5USGrPxJWBvc9ma72CAlHAiPVUgzFIR6Z5DN9TM8aWgZZPDt
 RULC1/L/SI2FQmrMnCYhjO7Om0qJFk422cWCVjOA3D/lRo3toFEJ/XopxxXz9FZs
 guAIJuFLjDun13hxS9PCGvRCkg2cdVsCykkg1ydAbg2ux99rPDAmmnwYPG7pvxiP
 2W0gq43dbQAZlYjRx3gV5sHpUtPCsF+1Lz5jXkldRZJNXD1v1Fk=
 =RZFV
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fpu fix from Borislav Petkov:

 - Do FPU AP initialization on Xen PV too which got missed by the recent
   boot reordering work

* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/xen: Fix secondary processors' FPU initialization
2023-07-09 10:13:32 -07:00
Linus Torvalds
e3da8db055 A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with an
 INIT IPI. But the same code path is also used by the crash utility. If the
 CPU which panics is not the boot CPU then it sends an INIT IPI to the boot
 CPU which resets the machine. Prevent this by validating that the CPU which
 runs the stop mechanism is the boot CPU. If not, leave the other CPUs in
 HLT.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmSqoEcTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYNSEACwo5zgibek27qeMvJGfNztm0qRa4mw
 wN0qV31yaNcEfhqL8bMU8n3wvEA+pZBqhaU5fyalY+yxc29jI/j9eda5zR+Fi9e5
 kVyFT2M0rVSDLFraoQeD+T/tSSK2MJtswF12ytY5mHzHMCb6Uy9fNCpUiQlB+i81
 AcnlKQk9ifZXFdMJPj5E+E6l776T8NZPoYEdFgJloxaYOGTdFJDWDlryx4LD7Urz
 Fx/ec8Ug/FYSPl2XzXHugvHjNefxKoomcZ3v3CSZonBcav7Gz6F06HAR5vVRWSHx
 4Dlh6zdy+60YKBmkvpb+RJIBMo8aXclwT+tntaoJvGHZ+PNASO6JVz9PvmoNgfWK
 Oy2n1K687qIOY6d+yxUZgbZpwXX5bG6kc0xbicUNigGagrYTfd83G5RAfwxNkqsY
 23Qw4Ue8uxve4M8iM/FfxKIShuDBiLCIDDIrWDjEkvIAnr1pd+NPUv7kqOTI7Kz/
 srNgcOwalypzuS93lgaN1yjRv1mmaPXhdhjy0DwGbC54bKgNzfq+7z75Ibn0dSFF
 JUPFVjztB+ymnM6PJ1dR77SvPi+xOi60nw7L+Qu9US4yKkW0NeGiIWVsggNorbU6
 UPFSE5gxwFD0w1EZ9W+IDeOZUNhjJUINZsn8txm+tb+oEqTIGRPHPOo0C1dBmLW9
 AmDIeHljj0iWIw==
 =DOCF
 -----END PGP SIGNATURE-----

Merge tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
 "A single fix for the mechanism to park CPUs with an INIT IPI.

  On shutdown or kexec, the kernel tries to park the non-boot CPUs with
  an INIT IPI. But the same code path is also used by the crash utility.
  If the CPU which panics is not the boot CPU then it sends an INIT IPI
  to the boot CPU which resets the machine.

  Prevent this by validating that the CPU which runs the stop mechanism
  is the boot CPU. If not, leave the other CPUs in HLT"

* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/smp: Don't send INIT to boot CPU
2023-07-09 10:08:38 -07:00
Linus Torvalds
74099e2034 - fixes for KVM
- fix for loongson build and cpu probing
 - DT fixes
 -----BEGIN PGP SIGNATURE-----
 
 iQJOBAABCAA4FiEEbt46xwy6kEcDOXoUeZbBVTGwZHAFAmSqY18aHHRzYm9nZW5k
 QGFscGhhLmZyYW5rZW4uZGUACgkQeZbBVTGwZHBYrw//VbewcC/3lYDaS2SDGIs1
 2f6DGshl7WeY/Y3IBA5xJXZHoK7AgwmC4OtPw+Ge4ZatD4mdW2B2aulIzRcm9yJV
 8EXNBDs7LYiKb7vG7ImYS38eykm/9dQBeK0gDsKA0eVWIKXgCCuDS78INnYFUwUk
 ZOJ8y2UkDxCOjLzhND3cDVWZEL2ZNq3XzPhFRaCrdnJTV3GgQs6s1f1A5t3iszNx
 4JQ4OM3z+wFNZj1CcWxs/lvMfftaXX5Osyyt6Pgy0gSou7ZrdWGlgIu1QDu9agq5
 o1+hXGYDUlnWsmQoUa8OA4+jhQs/XQYwMLs/sbSH9/1iGVPYDiWuw/oDPlesx4/u
 LtIk85SIL2todeD94b4w2UJvQjuaUgl/WzTF1CfAziX2yjs1zaHQaWXblnNzINLS
 eUpqFDbfPD0yzwjyfC52lNGbSKqOtBLr83GGzPrpp0OZDEI3rjwigARYcLXfh8SR
 KAgWRfYsrTi1efREdyAUlgsyys0yyCVTW5IqpKb6paSrIUIMpF3QE1+fl0Y3XuNK
 YByUyxJVYwWhljBXrNlaVcqdxtz5eZqeVQf0kus7AeHOOzNx0mBRGd99oKIXMMU8
 6fg/rhxSjtEnUdEWQC7hhrnnbgW/CT6uP1EndshYZ/NntyEygJVNNzHnKTbJYQWk
 LDnAtQTlmMB7snCc1+ECjO8=
 =5EjT
 -----END PGP SIGNATURE-----

Merge tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

Pull MIPS fixes from Thomas Bogendoerfer:

 - fixes for KVM

 - fix for loongson build and cpu probing

 - DT fixes

* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
  MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
  MIPS: dts: add missing space before {
  MIPS: Loongson: Fix build error when make modules_install
  MIPS: KVM: Fix NULL pointer dereference
  MIPS: Loongson: Fix cpu_probe_loongson() again
2023-07-09 10:02:49 -07:00
Linus Torvalds
76487845fd Minor cleanups for 6.5:
* Fix an uninitialized variable warning.
 
 Signed-off-by: Darrick J. Wong <djwong@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQ2qTKExjcn+O1o2YRKO3ySh0YRpgUCZKjUjwAKCRBKO3ySh0YR
 pn92AQC4gY9GOyKcc/aiAd/t1u8gGxnFtcN06xh4TdVArMM4/AD/UtEKx9LYuaSF
 pyhw5SfzxI555HfXkA8ci/D+BxguVQs=
 =/vX1
 -----END PGP SIGNATURE-----

Merge tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Darrick Wong:
 "Nothing exciting here, just getting rid of a gcc warning that I got
  tired of seeing when I turn on gcov"

* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix uninit warning in xfs_growfs_data
2023-07-09 09:50:42 -07:00
Linus Torvalds
4770353b66 3 smb3 client fixes
-----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmSqNkIACgkQiiy9cAdy
 T1GXsAwAhYUyjlXZLDsmO+9PjKhM9WRM1IO5myy3P396R0Tzq741f8LM7Lx08qc+
 D1701gsnhIrvprem1HjtW6DZzCVnLdpBIYUEnwUr8eDqMpk1VFKug3xSVhIRMih3
 Y30dHTgQ0aCLrrh5XHOWhBHJbpq7Wdlh3q0oi8I36Of8e6tGFNo2wI4ud7no4aIj
 N222dWOs56FXtVAmgEAuc7U2A40ztMOp7FXrbzhK4FwD5kO+pFkqJcLjG6Bk10ph
 Tyg3Wh2TnX+MviOY0xUaN0X50dSoSJPkSUYGkccrIcfVPwEoH7l6j0LNgAVyhG7K
 f5EUbM7Td51a1Znj9wX6U9N0UfO/IOZRDFZ7ACckLBBBEzfKYCgYY5dWJ6aVxZHb
 bB336f1ObvDiocEabS1SMa//sXUjpOy3Tg8etLCYJpqjWYE8nO7lERoBWGWXkUqy
 xO86pGQjYLzkw16R11tzbplv+1HxoGwIuQnOubivv2prn++NZ4Zr2ohBeDlyJc1/
 WwF42UfM
 =F8D0
 -----END PGP SIGNATURE-----

Merge tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

Pull more smb client updates from Steve French:

 - fix potential use after free in unmount

 - minor cleanup

 - add worker to cleanup stale directory leases

* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: Add a laundromat thread for cached directories
  smb: client: remove redundant pointer 'server'
  cifs: fix session state transition to avoid use-after-free issue
2023-07-09 09:45:32 -07:00
Linus Torvalds
cff0687396 Fixes for pci_clean_master, error handling in driver inits, and various
other issues/bugs.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEoE9b9c3U2JxX98mqbmZLrHqL0iMFAmSpuR8ACgkQbmZLrHqL
 0iOnrg/9FT/mEyLZXfI/sAZWyCKvydvbFL78Id9JVKxqQ00tlLlMtRFlLG7gtagn
 GrOf0HSM6RJxeWo8gqGjb1tM0L1qO/Bt9MlqZf7jIJTyL8DBUMz5sFdzqQ5TZo6j
 krKRjzjiNIyPlSUWGBBt2MIVUbJwfFu6f1dnpAVTWjyc2CNEYa5r3FvAqOTVYTMS
 IiNJyCaGwdPOXQVxOWwJnT4aPcWZzOMf4GYWX2hi8HczcEh1bHcBtHL6UGk5CiG9
 igj1e4Q3865lDp9eFIrQEtW2BSf1igzILpT3WmSxMXvufzILLAl6pOq/s/ziTEM2
 Hrg/JnU1W1Iozpyzd5A1lVIy5+NFN3AEHM3iBI0aZ5Qj1yBhNv4CJ/WLqPABBs+D
 cmuea+gOJjKKVGqdLNZ9+rDGtIV+SRQA+0GGoBhBXvVqH5C7qUxx5ZJMD26mC9lS
 L0dQTbgVyJc3Zv8/FHNGuKvp+xq+vgeZmt8zDViNPlmB9bjz3XikCE7D0lr74coD
 Vvmn4dUo65EqvWfKqLR8fXDZ9pdaTqF19xddNPiZS4p3/CDtKWqXIuBdTKiqq1s4
 Lx78R8wU/uifWSf/9+cPyJ7UCCyPpUdj/SahzZLE4ACfmdfYWIHNf0REt+kGn7Vv
 uCoe7rXBccrz03Fv5mHHuNGS24dfdMim2LXYb3GN2A5atIT0Nko=
 =xPW1
 -----END PGP SIGNATURE-----

Merge tag 'ntb-6.5' of https://github.com/jonmason/ntb

Pull NTB updates from Jon Mason:
 "Fixes for pci_clean_master, error handling in driver inits, and
  various other issues/bugs"

* tag 'ntb-6.5' of https://github.com/jonmason/ntb:
  ntb: hw: amd: Fix debugfs_create_dir error checking
  ntb.rst: Fix copy and paste error
  ntb_netdev: Fix module_init problem
  ntb: intel: Remove redundant pci_clear_master
  ntb: epf: Remove redundant pci_clear_master
  ntb_hw_amd: Remove redundant pci_clear_master
  ntb: idt: drop redundant pci_enable_pcie_error_reporting()
  MAINTAINERS: git://github -> https://github.com for jonmason
  NTB: EPF: fix possible memory leak in pci_vntb_probe()
  NTB: ntb_tool: Add check for devm_kcalloc
  NTB: ntb_transport: fix possible memory leak while device_register() fails
  ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
  NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
  ntb: idt: Fix error handling in idt_pci_driver_init()
2023-07-09 09:35:51 -07:00
Hugh Dickins
1c7873e336 mm: lock newly mapped VMA with corrected ordering
Lockdep is certainly right to complain about

  (&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
                 but task is already holding lock:
  (&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db

Invert those to the usual ordering.

Fixes: 33313a747e ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08 16:44:11 -07:00
Linus Torvalds
946c6b59c5 16 hotfixes. Six are cc:stable and the remainder address post-6.4 issues.
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZKmgXAAKCRDdBJ7gKXxA
 joqDAP0V520Jy0cyJrRMvaQRFMqtVeDOdTpAue7ZOQHSi/LZnAD9EEAxDpYF/V4x
 PO27ixXQ4Glm2iYgH7bDX7J73WiA3wg=
 =JsYW
 -----END PGP SIGNATURE-----

Merge tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull hotfixes from Andrew Morton:
 "16 hotfixes. Six are cc:stable and the remainder address post-6.4
  issues"

The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.

* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
  lib: dhry: fix sleeping allocations inside non-preemptable section
  kasan, slub: fix HW_TAGS zeroing with slub_debug
  kasan: fix type cast in memory_is_poisoned_n
  mailmap: add entries for Heiko Stuebner
  mailmap: update manpage link
  bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
  MAINTAINERS: add linux-next info
  mailmap: add Markus Schneider-Pargmann
  writeback: account the number of pages written back
  mm: call arch_swap_restore() from do_swap_page()
  squashfs: fix cache race with migration
  mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
  docs: update ocfs2-devel mailing list address
  MAINTAINERS: update ocfs2-devel mailing list address
  mm: disable CONFIG_PER_VMA_LOCK until its fixed
  fork: lock VMAs of the parent process when forking
2023-07-08 14:30:25 -07:00
Suren Baghdasaryan
fb49c45532 fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().

We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable.  For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.

A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.

Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults.  But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.

This fix can potentially regress some fork-heavy workloads.  Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression.  If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
optimizations are possible if this regression proves to be problematic.

Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08 14:08:02 -07:00
Suren Baghdasaryan
33313a747e mm: lock newly mapped VMA which can be modified after it becomes visible
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock.  This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified.  Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.

Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.

Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08 14:08:02 -07:00
Suren Baghdasaryan
c137381f71 mm: lock a vma before stack expansion
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.

Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-08 14:08:02 -07:00
Linus Torvalds
7fcd473a64 SCSI misc on 20230708
A few late arriving patches that missed the initial pull request.
 It's mostly bug fixes (the dt-bindings is a fix for the initial pull).
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZKmvwSYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishResAQCPDbBh
 omMRBE+W+Vx2TgOJGjo/F+T1D2JjBhLIGpNVggEApJtgrQutAToiCU/qIP9GOTl7
 evetzh5boMMuyD2s7ak=
 =pi4v
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull more SCSI updates from James Bottomley:
 "A few late arriving patches that missed the initial pull request. It's
  mostly bug fixes (the dt-bindings is a fix for the initial pull)"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: ufs: core: Remove unused function declaration
  scsi: target: docs: Remove tcm_mod_builder.py
  scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
  scsi: dt-bindings: ufs: qcom: Fix ICE phandle
  scsi: core: Simplify scsi_cdl_check_cmd()
  scsi: isci: Fix comment typo
  scsi: smartpqi: Replace one-element arrays with flexible-array members
  scsi: target: tcmu: Replace strlcpy() with strscpy()
  scsi: ncr53c8xx: Replace strlcpy() with strscpy()
  scsi: lpfc: Fix lpfc_name struct packing
2023-07-08 12:35:18 -07:00
Linus Torvalds
84dc5aa3f0 Part 2 of I2C patches for 6.5
* xiic patch should have been in part 1 but slipped through
 * mpc patch fixes a build regression from part 1
 * nomadik is a fix which needed a rebase after part 1
 -----BEGIN PGP SIGNATURE-----
 
 iQJDBAABCgAtFiEEOZGx6rniZ1Gk92RdFA3kzBSgKbYFAmSoblYPHHdzYUBrZXJu
 ZWwub3JnAAoJEBQN5MwUoCm2iG0P/RG3yGeV0PzQ2XbKPwIMAQURQ7NiqVVrAnUP
 kkduOkPJ9QU+9Qjb/4SRCTxWFVFbqruyOZWB+5hmwHtv4MS3H1ulKaH9bat5bQsF
 F28o5pjMVQEhjl+Y1jAtnSOlhprbTds6B1NdgbMFopnQ7ExlB1/RAt0rvZdgsfkt
 WeKRQMlne0mpQ7mybp6qsN8ldYBSDu8AmZ0S/RSnCwRwl0AxYYxk5roQAfg0tbaU
 N1zWUerNO7lAVoJnFsdeo0CMxK7t3QGHPSZGayargEF0KJHmxfzU14oZ6l28j6wq
 mSazjyK5wTeqqa4mU835RO9Ko6zE42u5nM+ok8Ui6a4xWjvafNrjnZkIClMg2eC9
 7ESZoz+Rtd3kCu3LiWXqPhs9cplHrRtA4M8E4gbTbsdMB8kFRaaiocjHfPksO8SG
 QtFXHGV5AgtoBkCC3womZWvY17XOZf40S6NUFafZ8kZFg5WGCWP3rwUZhz/+frSM
 QiIdxM+cER6WC6ADnZHdQyFl1fjLc4EdbwoN54E7DRBqB4s0wy+1vU+/qf9JdIuR
 zIN0I2OFxnwFEiTfAlTX1RmYy7L/3dtP2Gk8og+W9CxHSwJsURhqIfz/MjNuoh7U
 c11jS9QfETBj2GDjRoRxE0zj4Q2rhjdTwFhQfuYp0eI6UsDfYHHB7ql2FwO1+EW1
 VkmkX38T
 =xwp6
 -----END PGP SIGNATURE-----

Merge tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:

 - xiic patch should have been in the original pull but slipped through

 - mpc patch fixes a build regression

 - nomadik cleanup

* tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: mpc: Drop unused variable
  i2c: nomadik: Remove a useless call in the remove function
  i2c: xiic: Don't try to handle more interrupt events after error
2023-07-08 12:28:00 -07:00
Linus Torvalds
8fc3b8f082 hardening fixes for v6.5-rc1
- Check for NULL bdev in LoadPin (Matthias Kaehlcke)
 
 - Revert unwanted KUnit FORTIFY build default
 
 - Fix 1-element array causing boot warnings with xhci-hub
 -----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAmSoVSsWHGtlZXNjb29r
 QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJjyuD/9Sgr+T3VJyROJdKouYO8tLUqaO
 g0A6+WE0L7XyO4ZYk4FOadeihsVEPuhB0fpDTwriKCKdPB35+Nhq8YfWPPQcGdjQ
 0IAT5AjsjYDDFGABRtsNRcL+KXyR+QRVUnSllEsZuwb3lyq6HRbdTF2QBjToAbyO
 QOgEnFJNqPp2w9y2KSzpMuYL4I9o1WbyM+huVSfoKe/3d2WnVKiARMpV+0EJgUAy
 BvORp55+c1w77IRbQduACWszdCLXfkQyI+p5ii3M7cZmePDe4q8LHN01WtIMEnHy
 cln7AnwU4daxzfdeAWIQMLFjOXTLHlkRhC18KSobeBc5Zkudtcg5LxtFGiDsDgOU
 mUWB/Ow8rgr6KlYkMFmFrW/GAVX12KbPXDATECa/4Yhl55Ydl/1bChJWWnX2pppU
 mRRnwIcY7MfhRLeB284Gst81wOHy408arJsm/vck5kdya0Ys1y38rgNQm7iKfXVu
 FYMrDU9qqGmeIVk2namjQYoWH5ei670PXndtrcvSffeZOhpzk2FnFphtraPe0mrl
 l1lcUonZwEoTJ4wDiOR9cjSphoDVom9LgwygQVb4KGHBjuCfRABDV2DGy9duBMtv
 Akcet48VkCX6wF91+30fFmTs5haRiF/5kkx5fGuxhFlQO8QHYVjIOH55VqhAt3mw
 d0OWiZaNRvbNfjPSkQ==
 =R3uK
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening fixes from Kees Cook:

 - Check for NULL bdev in LoadPin (Matthias Kaehlcke)

 - Revert unwanted KUnit FORTIFY build default

 - Fix 1-element array causing boot warnings with xhci-hub

* tag 'hardening-v6.5-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  usb: ch9: Replace bmSublinkSpeedAttr 1-element array with flexible array
  Revert "fortify: Allow KUnit test to build without FORTIFY"
  dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
2023-07-08 12:08:39 -07:00
Anup Sharma
bff6efc54b ntb: hw: amd: Fix debugfs_create_dir error checking
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.

Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2023-07-08 13:55:44 -04:00
Linus Torvalds
c206353dfd perf tools changes and fixes for v6.5: 2nd batch
Build:
 
  - Allow to generate vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
    and skip if the vmlinux has no BTF.
 
  - Replace deprecated clang -target xxx option by --target=xxx.
 
 perf record:
 
  - Print event attributes with well known type and config symbols in the
    debug output like below:
 
     # perf record -e cycles,cpu-clock -C0 -vv true
     <SNIP>
     ------------------------------------------------------------
     perf_event_attr:
       type                             0 (PERF_TYPE_HARDWARE)
       size                             136
       config                           0 (PERF_COUNT_HW_CPU_CYCLES)
       { sample_period, sample_freq }   4000
       sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
       read_format                      ID
       disabled                         1
       inherit                          1
       freq                             1
       sample_id_all                    1
       exclude_guest                    1
     ------------------------------------------------------------
     sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
     ------------------------------------------------------------
     perf_event_attr:
       type                             1 (PERF_TYPE_SOFTWARE)
       size                             136
       config                           0 (PERF_COUNT_SW_CPU_CLOCK)
       { sample_period, sample_freq }   4000
       sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
       read_format                      ID
       disabled                         1
       inherit                          1
       freq                             1
       sample_id_all                    1
       exclude_guest                    1
 
  - Update AMD IBS event error message since it now support per-process
    profiling but no priviledge filters.
 
     $ sudo perf record -e ibs_op//k -C 0
     Error:
     AMD IBS doesn't support privilege filtering. Try again without
     the privilege modifiers (like 'k') at the end.
 
 perf lock contention:
 
  - Support CSV style output using -x option
 
     $ sudo perf lock con -ab -x, sleep 1
     # output: contended, total wait, max wait, avg wait, type, caller
     19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
     15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
     4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
     1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
     8, 67608, 27404, 8451, spinlock, __queue_work+0x174
     3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
     3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
     2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
     1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
 
  - Add --output option to save the data to a file not to be interfered
    by other debug messages.
 
 Test:
 
  - Fix event parsing test on ARM where there's no raw PMU nor supports
    PERF_PMU_CAP_EXTENDED_HW_TYPE.
 
  - Update the lock contention test case for CSV output.
 
  - Fix a segfault in the daemon command test.
 
 Vendor events (JSON):
 
  - Add has_event() to check if the given event is available on system
    at runtime.  On Intel machines, some transaction events may not be
    present when TSC extensions are disabled.
 
  - Update Intel event metrics.
 
 Misc:
 
  - Sort symbols by name using an external array of pointers instead of
    a rbtree node in the symbol.  This will save 16-bytes or 24-bytes
    per symbol whether the sorting is actually requested or not.
 
  - Fix unwinding DWARF callstacks using libdw when --symfs option is
    used.
 
 Signed-off-by: Namhyung Kim <namhyung@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQSo2x5BnqMqsoHtzsmMstVUGiXMgwUCZKb4mwAKCRCMstVUGiXM
 g1QqAPwKZow/DhAzyN7KvzdNd+SojRGpUMl6RkVphY/9ntDqPAD+L3V5aXLTiC1L
 8kUzdpRX5VMjqdR9U7TycUOi4QU40QA=
 =dEF1
 -----END PGP SIGNATURE-----

Merge tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next

Pull more perf tools updates from Namhyung Kim:
 "These are remaining changes and fixes for this cycle.

  Build:

   - Allow generating vmlinux.h from BTF using `make GEN_VMLINUX_H=1`
     and skip if the vmlinux has no BTF.

   - Replace deprecated clang -target xxx option by --target=xxx.

  perf record:

   - Print event attributes with well known type and config symbols in
     the debug output like below:

       # perf record -e cycles,cpu-clock -C0 -vv true
       <SNIP>
       ------------------------------------------------------------
       perf_event_attr:
         type                             0 (PERF_TYPE_HARDWARE)
         size                             136
         config                           0 (PERF_COUNT_HW_CPU_CYCLES)
         { sample_period, sample_freq }   4000
         sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
         read_format                      ID
         disabled                         1
         inherit                          1
         freq                             1
         sample_id_all                    1
         exclude_guest                    1
       ------------------------------------------------------------
       sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8 = 5
       ------------------------------------------------------------
       perf_event_attr:
         type                             1 (PERF_TYPE_SOFTWARE)
         size                             136
         config                           0 (PERF_COUNT_SW_CPU_CLOCK)
         { sample_period, sample_freq }   4000
         sample_type                      IP|TID|TIME|CPU|PERIOD|IDENTIFIER
         read_format                      ID
         disabled                         1
         inherit                          1
         freq                             1
         sample_id_all                    1
         exclude_guest                    1

   - Update AMD IBS event error message since it now support per-process
     profiling but no priviledge filters.

       $ sudo perf record -e ibs_op//k -C 0
       Error:
       AMD IBS doesn't support privilege filtering. Try again without
       the privilege modifiers (like 'k') at the end.

  perf lock contention:

   - Support CSV style output using -x option

       $ sudo perf lock con -ab -x, sleep 1
       # output: contended, total wait, max wait, avg wait, type, caller
       19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
       15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
       4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
       1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
       8, 67608, 27404, 8451, spinlock, __queue_work+0x174
       3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
       3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
       2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
       1, 24619, 24619, 24619, spinlock, rcu_core+0xd4

   - Add --output option to save the data to a file not to be interfered
     by other debug messages.

  Test:

   - Fix event parsing test on ARM where there's no raw PMU nor supports
     PERF_PMU_CAP_EXTENDED_HW_TYPE.

   - Update the lock contention test case for CSV output.

   - Fix a segfault in the daemon command test.

  Vendor events (JSON):

   - Add has_event() to check if the given event is available on system
     at runtime. On Intel machines, some transaction events may not be
     present when TSC extensions are disabled.

   - Update Intel event metrics.

  Misc:

   - Sort symbols by name using an external array of pointers instead of
     a rbtree node in the symbol. This will save 16-bytes or 24-bytes
     per symbol whether the sorting is actually requested or not.

   - Fix unwinding DWARF callstacks using libdw when --symfs option is
     used"

* tag 'perf-tools-for-v6.5-2-2023-07-06' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next: (38 commits)
  perf test: Fix event parsing test when PERF_PMU_CAP_EXTENDED_HW_TYPE isn't supported.
  perf test: Fix event parsing test on Arm
  perf evsel amd: Fix IBS error message
  perf: unwind: Fix symfs with libdw
  perf symbol: Fix uninitialized return value in symbols__find_by_name()
  perf test: Test perf lock contention CSV output
  perf lock contention: Add --output option
  perf lock contention: Add -x option for CSV style output
  perf lock: Remove stale comments
  perf vendor events intel: Update tigerlake to 1.13
  perf vendor events intel: Update skylakex to 1.31
  perf vendor events intel: Update skylake to 57
  perf vendor events intel: Update sapphirerapids to 1.14
  perf vendor events intel: Update icelakex to 1.21
  perf vendor events intel: Update icelake to 1.19
  perf vendor events intel: Update cascadelakex to 1.19
  perf vendor events intel: Update meteorlake to 1.03
  perf vendor events intel: Add rocketlake events/metrics
  perf vendor metrics intel: Make transaction metrics conditional
  perf jevents: Support for has_event function
  ...
2023-07-08 10:21:51 -07:00
Linus Torvalds
ad8258e877 bitmap patches for v6.5
Fixes for different bitmap pieces:
 
  - lib/test_bitmap: increment failure counter properly
 
    The tests that don't use expect_eq() macro to determine that a test is
    failured must increment failed_tests explicitly.
 
  - lib/bitmap: drop optimization of bitmap_{from,to}_arr64
 
    bitmap_{from,to}_arr64() optimization is overly optimistic on 32-bit LE
    architectures when it's wired to bitmap_copy_clear_tail().
 
  - nodemask: Drop duplicate check in for_each_node_mask()
 
    As the return value type of first_node() became unsigned, the node >= 0
    became unnecessary.
 
  - cpumask: fix function description kernel-doc notation
 
  - MAINTAINERS: Add bits.h to the BITMAP API record
  - MAINTAINERS: Add bitfield.h to the BITMAP API record
 
    Add linux/bits.h and linux/bitfield.h for visibility
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmShrd0ACgkQsUSA/Tof
 vsg5RAv/YyedOccCkLtd4D+l+jF/a1qd64u6HfONkYzdoSeKIBiAlKukpHOjJjky
 ii5vSwEMzW34MA6wBylmY4078OXEwEu9SUsF/+5jGvqJ1ABCZkwFoMuAavRHkHhc
 xE/es6XvpSbZ4U4GHnyaUbQCuEkpW21U644QdSWqjVPDHmo32EgtNG7hTaZfCRNh
 VNa8H+tbgi5rQUMBeyxppJjyynGKwut6pX4OcOi2lq1onz8zzHx8H4AJC2v8Yg0r
 2ZRS1FfAuk2Pvp7JFQOExcxf0x/Ph3zp8dqIgfo2pSMAi/0ZjnTkAPxCYIGsWG5u
 TbfO3rHviHtKAN0q5/6xp939G69flW5xSica+IvtuXSIS/JbllwDN4mrexYxppWH
 Euj/ixJe79Dn+jtOgG17p4cp70yAcM1IQfyWH25Q8Jej+C/gN8WB0UXD4K7jTXrn
 9rGID28WbuclAWzJDmFQTpZutG1GHjM7rytit+if08gz065vUhTllhJC+vCHQojY
 clS4E2DW
 =nZF4
 -----END PGP SIGNATURE-----

Merge tag 'bitmap-6.5-rc1' of https://github.com/norov/linux

Pull bitmap updates from Yury Norov:
 "Fixes for different bitmap pieces:

   - lib/test_bitmap: increment failure counter properly

     The tests that don't use expect_eq() macro to determine that a test
     is failured must increment failed_tests explicitly.

   - lib/bitmap: drop optimization of bitmap_{from,to}_arr64

     bitmap_{from,to}_arr64() optimization is overly optimistic
     on 32-bit LE architectures when it's wired to
     bitmap_copy_clear_tail().

   - nodemask: Drop duplicate check in for_each_node_mask()

     As the return value type of first_node() became unsigned, the node
     >= 0 became unnecessary.

   - cpumask: fix function description kernel-doc notation

   - MAINTAINERS: Add bits.h and bitfield.h to the BITMAP API record

     Add linux/bits.h and linux/bitfield.h for visibility"

* tag 'bitmap-6.5-rc1' of https://github.com/norov/linux:
  MAINTAINERS: Add bitfield.h to the BITMAP API record
  MAINTAINERS: Add bits.h to the BITMAP API record
  cpumask: fix function description kernel-doc notation
  nodemask: Drop duplicate check in for_each_node_mask()
  lib/bitmap: drop optimization of bitmap_{from,to}_arr64
  lib/test_bitmap: increment failure counter properly
2023-07-08 10:02:24 -07:00
Geert Uytterhoeven
8ba388c06b lib: dhry: fix sleeping allocations inside non-preemptable section
The Smatch static checker reports the following warnings:

    lib/dhry_run.c:38 dhry_benchmark() warn: sleeping in atomic context
    lib/dhry_run.c:43 dhry_benchmark() warn: sleeping in atomic context

Indeed, dhry() does sleeping allocations inside the non-preemptable
section delimited by get_cpu()/put_cpu().

Fix this by using atomic allocations instead.
Add error handling, as atomic these allocations may fail.

Link: https://lkml.kernel.org/r/bac6d517818a7cd8efe217c1ad649fffab9cc371.1688568764.git.geert+renesas@glider.be
Fixes: 13684e966d ("lib: dhry: fix unstable smp_processor_id(_) usage")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/0469eb3a-02eb-4b41-b189-de20b931fa56@moroto.mountain
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:32 -07:00
Andrey Konovalov
fdb54d9660 kasan, slub: fix HW_TAGS zeroing with slub_debug
Commit 946fa0dbf2 ("mm/slub: extend redzone check to extra allocated
kmalloc space than requested") added precise kmalloc redzone poisoning to
the slub_debug functionality.

However, this commit didn't account for HW_TAGS KASAN fully initializing
the object via its built-in memory initialization feature.  Even though
HW_TAGS KASAN memory initialization contains special memory initialization
handling for when slub_debug is enabled, it does not account for in-object
slub_debug redzones.  As a result, HW_TAGS KASAN can overwrite these
redzones and cause false-positive slub_debug reports.

To fix the issue, avoid HW_TAGS KASAN memory initialization when
slub_debug is enabled altogether.  Implement this by moving the
__slub_debug_enabled check to slab_post_alloc_hook.  Common slab code
seems like a more appropriate place for a slub_debug check anyway.

Link: https://lkml.kernel.org/r/678ac92ab790dba9198f9ca14f405651b97c8502.1688561016.git.andreyknvl@google.com
Fixes: 946fa0dbf2 ("mm/slub: extend redzone check to extra allocated kmalloc space than requested")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Reported-by: Will Deacon <will@kernel.org>
Acked-by: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: kasan-dev@googlegroups.com
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Collingbourne <pcc@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:32 -07:00
Andrey Konovalov
05c56e7b43 kasan: fix type cast in memory_is_poisoned_n
Commit bb6e04a173 ("kasan: use internal prototypes matching gcc-13
builtins") introduced a bug into the memory_is_poisoned_n implementation:
it effectively removed the cast to a signed integer type after applying
KASAN_GRANULE_MASK.

As a result, KASAN started failing to properly check memset, memcpy, and
other similar functions.

Fix the bug by adding the cast back (through an additional signed integer
variable to make the code more readable).

Link: https://lkml.kernel.org/r/8c9e0251c2b8b81016255709d4ec42942dcaf018.1688431866.git.andreyknvl@google.com
Fixes: bb6e04a173 ("kasan: use internal prototypes matching gcc-13 builtins")
Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:32 -07:00
Heiko Stuebner
d3a808ec78 mailmap: add entries for Heiko Stuebner
I am going to lose my vrull.eu address at the end of july, and while
adding it to mailmap I also realised that there are more old addresses
from me dangling, so update .mailmap for all of them.

Link: https://lkml.kernel.org/r/20230704163919.1136784-3-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:31 -07:00
Heiko Stuebner
ddcd91f4cb mailmap: update manpage link
Patch series "Update .mailmap for my work address and fix manpage".

While updating mailmap for the going-away address, I also found that on
current systems the manpage linked from the header comment changed.

And in fact it looks like the git mailmap feature got its own manpage.


This patch (of 2):

On recent systems the git-shortlog manpage only tells people to
    See gitmailmap(5)

So instead of sending people on a scavenger hunt, put that info into the
header directly.  Though keep the old reference around for older systems.

Link: https://lkml.kernel.org/r/20230704163919.1136784-1-heiko@sntech.de
Link: https://lkml.kernel.org/r/20230704163919.1136784-2-heiko@sntech.de
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:31 -07:00
Liu Shixin
028725e733 bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
commit dd0ff4d12d ("bootmem: remove the vmemmap pages from kmemleak in
put_page_bootmem") fix an overlaps existing problem of kmemleak.  But the
problem still existed when HAVE_BOOTMEM_INFO_NODE is disabled, because in
this case, free_bootmem_page() will call free_reserved_page() directly.

Fix the problem by adding kmemleak_free_part() in free_bootmem_page() when
HAVE_BOOTMEM_INFO_NODE is disabled.

Link: https://lkml.kernel.org/r/20230704101942.2819426-1-liushixin2@huawei.com
Fixes: f41f2ed43c ("mm: hugetlb: free the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:31 -07:00
Randy Dunlap
0d707cdefb MAINTAINERS: add linux-next info
Add linux-next info to MAINTAINERS for ease of finding this data.

Link: https://lkml.kernel.org/r/20230704054410.12527-1-rdunlap@infradead.org
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:31 -07:00
Markus Schneider-Pargmann
6dedd768f3 mailmap: add Markus Schneider-Pargmann
Add my old mail address and update my name.

Link: https://lkml.kernel.org/r/20230628081341.3470229-1-msp@baylibre.com
Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:30 -07:00
Matthew Wilcox (Oracle)
8344a3d44b writeback: account the number of pages written back
nr_to_write is a count of pages, so we need to decrease it by the number
of pages in the folio we just wrote, not by 1.  Most callers specify
either LONG_MAX or 1, so are unaffected, but writeback_sb_inodes() might
end up writing 512x as many pages as it asked for.

Dave added:

: XFS is the only filesystem this would affect, right?  AFAIA, nothing
: else enables large folios and uses writeback through
: write_cache_pages() at this point...
: 
: In which case, I'd be surprised if much difference, if any, gets
: noticed by anyone.

Link: https://lkml.kernel.org/r/20230628185548.981888-1-willy@infradead.org
Fixes: 793917d997 ("mm/readahead: Add large folio readahead")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:30 -07:00
Peter Collingbourne
6dca4ac6fc mm: call arch_swap_restore() from do_swap_page()
Commit c145e0b47c ("mm: streamline COW logic in do_swap_page()") moved
the call to swap_free() before the call to set_pte_at(), which meant that
the MTE tags could end up being freed before set_pte_at() had a chance to
restore them.  Fix it by adding a call to the arch_swap_restore() hook
before the call to swap_free().

Link: https://lkml.kernel.org/r/20230523004312.1807357-2-pcc@google.com
Link: https://linux-review.googlesource.com/id/I6470efa669e8bd2f841049b8c61020c510678965
Fixes: c145e0b47c ("mm: streamline COW logic in do_swap_page()")
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reported-by: Qun-wei Lin <Qun-wei.Lin@mediatek.com>
Closes: https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.camel@mediatek.com/
Acked-by: David Hildenbrand <david@redhat.com>
Acked-by: "Huang, Ying" <ying.huang@intel.com>
Reviewed-by: Steven Price <steven.price@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: <stable@vger.kernel.org>	[6.1+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:30 -07:00
Vincent Whitchurch
08bab74ae6 squashfs: fix cache race with migration
Migration replaces the page in the mapping before copying the contents and
the flags over from the old page, so check that the page in the page cache
is really up to date before using it.  Without this, stressing squashfs
reads with parallel compaction sometimes results in squashfs reporting
data corruption.

Link: https://lkml.kernel.org/r/20230629-squashfs-cache-migration-v1-1-d50ebe55099d@axis.com
Fixes: e994f5b677 ("squashfs: cache partial compressed blocks")
Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Phillip Lougher <phillip@squashfs.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:30 -07:00
John Hubbard
191fcdb6c9 mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
The following crash happens for me when running the -mm selftests (below).
Specifically, it happens while running the uffd-stress subtests:

kernel BUG at mm/hugetlb.c:7249!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 3238 Comm: uffd-stress Not tainted 6.4.0-hubbard-github+ #109
Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1503 08/03/2018
RIP: 0010:huge_pte_alloc+0x12c/0x1a0
...
Call Trace:
 <TASK>
 ? __die_body+0x63/0xb0
 ? die+0x9f/0xc0
 ? do_trap+0xab/0x180
 ? huge_pte_alloc+0x12c/0x1a0
 ? do_error_trap+0xc6/0x110
 ? huge_pte_alloc+0x12c/0x1a0
 ? handle_invalid_op+0x2c/0x40
 ? huge_pte_alloc+0x12c/0x1a0
 ? exc_invalid_op+0x33/0x50
 ? asm_exc_invalid_op+0x16/0x20
 ? __pfx_put_prev_task_idle+0x10/0x10
 ? huge_pte_alloc+0x12c/0x1a0
 hugetlb_fault+0x1a3/0x1120
 ? finish_task_switch+0xb3/0x2a0
 ? lock_is_held_type+0xdb/0x150
 handle_mm_fault+0xb8a/0xd40
 ? find_vma+0x5d/0xa0
 do_user_addr_fault+0x257/0x5d0
 exc_page_fault+0x7b/0x1f0
 asm_exc_page_fault+0x22/0x30

That happens because a BUG() statement in huge_pte_alloc() attempts to
check that a pte, if present, is a hugetlb pte, but it does so in a
non-lockless-safe manner that leads to a false BUG() report.

We got here due to a couple of bugs, each of which by itself was not quite
enough to cause a problem:

First of all, before commit c33c794828f2("mm: ptep_get() conversion"), the
BUG() statement in huge_pte_alloc() was itself fragile: it relied upon
compiler behavior to only read the pte once, despite using it twice in the
same conditional.

Next, commit c33c794828 ("mm: ptep_get() conversion") broke that
delicate situation, by causing all direct pte reads to be done via
READ_ONCE().  And so READ_ONCE() got called twice within the same BUG()
conditional, leading to comparing (potentially, occasionally) different
versions of the pte, and thus to false BUG() reports.

Fix this by taking a single snapshot of the pte before using it in the
BUG conditional.

Now, that commit is only partially to blame here but, people doing
bisections will invariably land there, so this will help them find a fix
for a real crash.  And also, the previous behavior was unlikely to ever
expose this bug--it was fragile, yet not actually broken.

So that's why I chose this commit for the Fixes tag, rather than the
commit that created the original BUG() statement.

Link: https://lkml.kernel.org/r/20230701010442.2041858-1-jhubbard@nvidia.com
Fixes: c33c794828 ("mm: ptep_get() conversion")
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
Acked-by: James Houghton <jthoughton@google.com>
Acked-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Lorenzo Stoakes <lstoakes@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Rapoport (IBM) <rppt@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:29 -07:00
Anthony Iliopoulos
5a569db68c docs: update ocfs2-devel mailing list address
The ocfs2-devel mailing list has been migrated to the kernel.org
infrastructure, update all related documentation pointers to reflect the
change.

Link: https://lkml.kernel.org/r/20230628013437.47030-3-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mark@fasheh.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:29 -07:00
Anthony Iliopoulos
a57b4b7f05 MAINTAINERS: update ocfs2-devel mailing list address
The ocfs2-devel mailing list has been migrated to the kernel.org
infrastructure, update the related entry to reflect the change.

Link: https://lkml.kernel.org/r/20230628013437.47030-2-ailiop@suse.com
Signed-off-by: Anthony Iliopoulos <ailiop@suse.com>
Acked-by: Joseph Qi <jiangqi903@gmail.com>
Acked-by: Joel Becker <jlbec@evilplan.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:29 -07:00
Suren Baghdasaryan
f96c486703 mm: disable CONFIG_PER_VMA_LOCK until its fixed
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.  Disable per-VMA locks config to
prevent this issue until the fix is confirmed.  This is expected to be a
temporary measure.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@google.com

Link: https://lkml.kernel.org/r/20230706011400.2949242-3-surenb@google.com
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Holger Hoffstätte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:29 -07:00
Suren Baghdasaryan
2b4f3b4987 fork: lock VMAs of the parent process when forking
Patch series "Avoid memory corruption caused by per-VMA locks", v4.

A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.  Based on the reproducer
provided in [1] we suspect this is caused by the lack of VMA locking while
forking a child process.

Patch 1/2 in the series implements proper VMA locking during fork.  I
tested the fix locally using the reproducer and was unable to reproduce
the memory corruption problem.

This fix can potentially regress some fork-heavy workloads.  Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~7% regression.  If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
optimizations are possible if this regression proves to be problematic.

Patch 2/2 disables per-VMA locks until the fix is tested and verified.


This patch (of 2):

When forking a child process, parent write-protects an anonymous page and
COW-shares it with the child being forked using copy_present_pte(). 
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap().  If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to some
stale writable TLB entries targeting the wrong (old) page.  Similar issue
happened in the past with userfaultfd (see flush_tlb_page() call inside
do_wp_page()).

Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue.  This
fix can potentially regress some fork-heavy workloads.  Kernel build time
did not show noticeable regression on a 56-core machine while a stress
test mapping 10000 VMAs and forking 5000 times in a tight loop shows ~7%
regression.  If such fork time regression is unacceptable, disabling
CONFIG_PER_VMA_LOCK should restore its performance.  Further optimizations
are possible if this regression proves to be problematic.

Link: https://lkml.kernel.org/r/20230706011400.2949242-1-surenb@google.com
Link: https://lkml.kernel.org/r/20230706011400.2949242-2-surenb@google.com
Fixes: 0bff0aaea0 ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=3D217624
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Tested-by: Holger Hoffsttte <holger@applied-asynchrony.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-07-08 09:29:29 -07:00
Geoff Levand
48063dfa4f ntb.rst: Fix copy and paste error
It seems the text for the NTB MSI Test Client section was copied from the
NTB Tool Test Client, but was not updated for the new section.  Corrects
the NTB MSI Test Client section text.

Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Geoff Levand <geoff@infradead.org>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2023-07-08 11:05:05 -04:00
Geoff Levand
ce946519f9 ntb_netdev: Fix module_init problem
With both the ntb_transport_init and the ntb_netdev_init_module routines in the
module_init init group, the ntb_netdev_init_module routine can be called before
the ntb_transport_init routine that it depends on is called.  To assure the
proper initialization order put ntb_netdev_init_module in the late_initcall
group.

Fixes runtime errors where the ntb_netdev_init_module call fails with ENODEV.

Signed-off-by: Geoff Levand <geoff@infradead.org>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2023-07-08 11:03:53 -04:00
Cai Huoqing
d353fb4b70 ntb: intel: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
	u16 pci_command;

	pci_read_config_word(dev, PCI_COMMAND, &pci_command);
	if (pci_command & PCI_COMMAND_MASTER) {
		pci_command &= ~PCI_COMMAND_MASTER;
		pci_write_config_word(dev, PCI_COMMAND, pci_command);
	}

	pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Acked-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2023-07-08 11:02:37 -04:00
Cai Huoqing
f2748c6d76 ntb: epf: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
	u16 pci_command;

	pci_read_config_word(dev, PCI_COMMAND, &pci_command);
	if (pci_command & PCI_COMMAND_MASTER) {
		pci_command &= ~PCI_COMMAND_MASTER;
		pci_write_config_word(dev, PCI_COMMAND, pci_command);
	}

	pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2023-07-08 11:02:37 -04:00
Cai Huoqing
da6b4dc49e ntb_hw_amd: Remove redundant pci_clear_master
Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
	u16 pci_command;

	pci_read_config_word(dev, PCI_COMMAND, &pci_command);
	if (pci_command & PCI_COMMAND_MASTER) {
		pci_command &= ~PCI_COMMAND_MASTER;
		pci_write_config_word(dev, PCI_COMMAND, pci_command);
	}

	pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
2023-07-08 11:02:37 -04:00