linux

Author	SHA1	Message	Date
Damien Le Moal	9277a6d4fb	zonefs: Export open zone resource information through sysfs To allow applications to easily check the current usage status of the open zone resources of the mounted device, export through sysfs the counter of write open sequential files s_wro_seq_files field of struct zonefs_sb_info. The attribute is named nr_wro_seq_files and is read only. The maximum number of write open sequential files (zones) indicated by the s_max_wro_seq_files field of struct zonefs_sb_info is also exported as the read only attribute max_wro_seq_files. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>	2022-04-21 08:37:51 +09:00
Damien Le Moal	7d6dfbe03b	zonefs: Always do seq file write open accounting The explicit_open mount option forces an explicit open of the zone of sequential files that are open for writing to ensure that the open file can be written without the device failing write operations due to the open zone resources limit being exceeded. To implement this, zonefs accounts all write open seq files when this mount option is used. This accounting however can be easily performed even when the explicit_open mount option is not used, thus allowing applications to control zone resources on their own, without relying on open() system call failures from zonefs. To implement this, the helper zonefs_file_use_exp_open() is removed and replaced with the helper zonefs_seq_file_need_wro(), which tests if a file is a sequential file being opened with write access. zonefs_open_zone() and zonefs_close_zone() are renamed respectively to zonefs_seq_file_write_open() and zonefs_seq_file_write_close() and modified to update the s_wro_seq_files counter regardless of the explicit_open mount option use. If the explicit_open mount option is used, zonefs_seq_file_write_open() executes an explicit zone open operation for a sequential file opened for writing for the first time, as before. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>	2022-04-21 08:37:46 +09:00
Damien Le Moal	2b95a23c4f	zonefs: Rename super block information fields The s_open_zones field of struct zonefs_sb_info is used to count the number of files that are open for writing and may not necessarily correspond to the number of open zones on the device. For instance, an application may open for writing a sequential zone file, fully write it and keep the file open. In such case, the zone of the file is not open anymore (it is in the full state). Avoid confusion about this counter's meaning by renaming it to s_wro_seq_files. To keep things consistent, the field s_max_open_zones is renamed to s_max_wro_seq_files. Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>	2022-04-21 08:37:41 +09:00
Damien Le Moal	1913953920	zonefs: Fix management of open zones The mount option "explicit_open" manages the device open zone resources to ensure that if an application opens a sequential file for writing, the file zone can always be written by explicitly opening the zone and accounting for that state with the s_open_zones counter. However, if some zones are already open when mounting, the device open zone resource usage status will be larger than the initial s_open_zones value of 0. Ensure that this inconsistency does not happen by closing any sequential zone that is open when mounting. Furthermore, with ZNS drives, closing an explicitly open zone that has not been written will change the zone state to "closed", that is, the zone will remain in an active state. Since this can then cause failures of explicit open operations on other zones if the drive active zone resources are exceeded, we need to make sure that the zone is not active anymore by resetting it instead of closing it. To address this, zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request into a REQ_OP_ZONE_RESET for sequential zones that have not been written. Fixes: `b5c00e9757` ("zonefs: open/close zone on file open/close") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>	2022-04-21 08:37:35 +09:00
Damien Le Moal	b954ebba29	zonefs: Clear inode information flags on inode creation Ensure that the i_flags field of struct zonefs_inode_info is cleared to 0 when initializing a zone file inode, avoiding seeing the flag ZONEFS_ZONE_OPEN being incorrectly set. Fixes: `b5c00e9757` ("zonefs: open/close zone on file open/close") Cc: <stable@vger.kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Hans Holmberg <hans.holmberg@wdc.com>	2022-04-21 08:37:23 +09:00
Kaixu Xia	2d9ac4319b	xfs: simplify local variable assignment in file write code Get the struct inode pointer from iocb->ki_filp->f_mapping->host directly and the other variables are unnecessary, so simplify the local variables assignment. Signed-off-by: Kaixu Xia <kaixuxia@tencent.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-04-21 08:47:54 +10:00
Dave Chinner	9a5280b312	xfs: reorder iunlink remove operation in xfs_ifree The O_TMPFILE creation implementation creates a specific order of operations for inode allocation/freeing and unlinked list modification. Currently both are serialised by the AGI, so the order doesn't strictly matter as long as they are both in the same transaction. However, if we want to move the unlinked list insertions largely out from under the AGI lock, then we have to be concerned about the order in which we do unlinked list modification operations. O_TMPFILE creation tells us this order is inode allocation/free, then unlinked list modification. Change xfs_ifree() to use this same ordering on unlinked list removal. This way we always guarantee that when we enter the iunlinked list removal code from this path, we already have the AGI locked and we don't have to worry about lock nesting AGI reads inside unlink list locks because it's already locked and attached to the transaction. We can do this safely as the inode freeing and unlinked list removal are done in the same transaction and hence are atomic operations with respect to log recovery. Reported-by: Frank Hofmann <fhofmann@cloudflare.com> Fixes: `298f7bec50` ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-04-21 08:45:16 +10:00
Dave Chinner	b9b3fe152e	xfs: convert buffer flags to unsigned. 5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned fields to be unsigned. This manifests as a compiler error such as: /kisskb/src/fs/xfs/./xfs_trace.h:432:2: note: in expansion of macro 'TP_printk' TP_printk("dev %d:%d daddr 0x%llx bbcount 0x%x hold %d pincount %d " ^ /kisskb/src/fs/xfs/./xfs_trace.h:440:5: note: in expansion of macro '__print_flags' __print_flags(__entry->flags, "|", XFS_BUF_FLAGS), ^ /kisskb/src/fs/xfs/xfs_buf.h:67:4: note: in expansion of macro 'XBF_UNMAPPED' { XBF_UNMAPPED, "UNMAPPED" } ^ /kisskb/src/fs/xfs/./xfs_trace.h:440:40: note: in expansion of macro 'XFS_BUF_FLAGS' __print_flags(__entry->flags, "|", XFS_BUF_FLAGS), ^ /kisskb/src/fs/xfs/./xfs_trace.h: In function 'trace_raw_output_xfs_buf_flags_class': /kisskb/src/fs/xfs/xfs_buf.h:46:23: error: initializer element is not constant #define XBF_UNMAPPED (1 << 31)/* do not map the buffer */ as __print_flags assigns XFS_BUF_FLAGS to a structure that uses an unsigned long for the flag. Since this results in the value of XBF_UNMAPPED causing a signed integer overflow, the result is technically undefined behavior, which gcc-5 does not accept as an integer constant. This is based on a patch from Arnd Bergmann <arnd@arndb.de>. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Chandan Babu R <chandan.babu@oracle.com> Signed-off-by: Dave Chinner <david@fromorbit.com>	2022-04-21 08:44:59 +10:00
Linus Torvalds	10c5f102e2	Merge tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs fixes from Gao Xiang: "One patch to fix a use-after-free race related to the on-stack z_erofs_decompressqueue, which happens very rarely but needs to be fixed properly soon. The other patch fixes some sysfs Sphinx warnings" * tag 'erofs-for-5.18-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: Documentation/ABI: sysfs-fs-erofs: Fix Sphinx errors erofs: fix use-after-free of on-stack io[]	2022-04-20 12:35:20 -07:00
Linus Torvalds	906f904097	Revert "fs/pipe: use kvcalloc to allocate a pipe_buffer array" This reverts commit `5a519c8fe4`. It turns out that making the pipe almost arbitrarily large has some rather unexpected downsides. The kernel test robot reports a kernel warning that is due to pipe->max_usage now growing to the point where the iter_file_splice_write() buffer allocation can no longer be satisfied as a slab allocation, and the int nbufs = pipe->max_usage; struct bio_vec *array = kcalloc(nbufs, sizeof(struct bio_vec), GFP_KERNEL); code sequence there will now always fail as a result. That code could be modified to use kvcalloc() too, but I feel very uncomfortable making those kinds of changes for a very niche use case that really should have other options than make these kinds of fundamental changes to pipe behavior. Maybe the CRIU process dumping should be multi-threaded, and use multiple pipes and multiple cores, rather than try to use one larger pipe to minimize splice() calls. Reported-by: kernel test robot <oliver.sang@intel.com> Link: https://lore.kernel.org/all/20220420073717.GD16310@xsang-OptiPlex-9020/ Cc: Andrei Vagin <avagin@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-20 12:07:53 -07:00
Jaegeuk Kim	27275f181c	f2fs: fix wrong condition check when failing metapage read This patch fixes wrong initialization. Fixes: `50c63009f6` ("f2fs: avoid an infinite loop in f2fs_sync_dirty_inodes") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-04-20 11:16:43 -07:00
Jaegeuk Kim	0adc2ab0e8	f2fs: keep io_flags to avoid IO split due to different op_flags in two fio holders Let's attach io_flags to bio only, so that we can merge IOs given original io_flags only. Fixes: `64bf0eef01` ("f2fs: pass the bio operation to bio_alloc_bioset") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-04-20 11:16:43 -07:00
Jaegeuk Kim	930e260763	f2fs: remove obsolete whint_mode This patch removes obsolete whint_mode. Fixes: `41d36a9f3e` ("fs: remove kiocb.ki_hint") Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>	2022-04-20 11:16:43 -07:00
Eric W. Biederman	8d005269c5	binfmt_flat: Drop vestiges of coredump support There is the briefest start of coredump support in binfmt_flat. It is actually a pain to maintain as binfmt_flat is not built on most architectures so it is easy to overlook. Since the support does not do anything remove it. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com> Acked-by: Greg Ungerer <gerg@linux-m68k.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/87mtgh17li.fsf_-_@email.froward.int.ebiederm.org	2022-04-19 19:31:43 -07:00
Christian Brauner	705191b03d	fs: fix acl translation Last cycle we extended the idmapped mounts infrastructure to support idmapped mounts of idmapped filesystems (no such filesystem exists yet). Since then, the meaning of an idmapped mount is a mount whose idmapping is different from the filesystem's idmapping. While doing that work we failed to adapt the acl translation helpers. They still assume that checking for the identity mapping is enough. But they need to use the no_idmapping() helper instead. Note, POSIX ACLs are always translated right at the userspace-kernel boundary using the caller's current idmapping and the initial idmapping. The order depends on whether we're coming from or going to userspace. The filesystem's idmapping doesn't matter at the border. Consequently, if a non-idmapped mount is passed we need to make sure to always pass the initial idmapping as the mount's idmapping and not the filesystem idmapping. Since it's irrelevant here it would yield invalid ids and prevent setting acls for filesystems that are mountable in a userns and support posix acls (tmpfs and fuse). I verified the regression reported in [1] and verified that this patch fixes it. A regression test will be added to xfstests in parallel. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215849 [1] Fixes: `bd303368b7` ("fs: support mapped mounts of mapped filesystems") Cc: Seth Forshee <sforshee@digitalocean.com> Cc: Christoph Hellwig <hch@lst.de> Cc: <stable@vger.kernel.org> # 5.17 Cc: <regressions@lists.linux.dev> Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-19 10:19:02 -07:00
Zixuan Fu	0d4837fdb7	fs: jfs: fix possible NULL pointer dereference in dbFree() In our fault-injection testing, the variable "nblocks" in dbFree() can be zero when kmalloc_array() fails in dtSearch(). In this case, the variable "mp" in dbFree() would be NULL and then it is dereferenced in "write_metapage(mp)". The failure log is listed as follows: [ 13.824137] BUG: kernel NULL pointer dereference, address: 0000000000000020 ... [ 13.827416] RIP: 0010:dbFree+0x5f7/0x910 [jfs] [ 13.834341] Call Trace: [ 13.834540] <TASK> [ 13.834713] txFreeMap+0x7b4/0xb10 [jfs] [ 13.835038] txUpdateMap+0x311/0x650 [jfs] [ 13.835375] jfs_lazycommit+0x5f2/0xc70 [jfs] [ 13.835726] ? sched_dynamic_update+0x1b0/0x1b0 [ 13.836092] kthread+0x3c2/0x4a0 [ 13.836355] ? txLockFree+0x160/0x160 [jfs] [ 13.836763] ? kthread_unuse_mm+0x160/0x160 [ 13.837106] ret_from_fork+0x1f/0x30 [ 13.837402] </TASK> ... This patch adds a NULL check of "mp" before "write_metapage(mp)" is called. Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Zixuan Fu <r33s3n6@gmail.com> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>	2022-04-19 10:20:51 -05:00
Christoph Hellwig	0fdf977d45	btrfs: fix direct I/O writes for split bios on zoned devices When a bio is split in btrfs_submit_direct, dip->file_offset contains the file offset for the first bio. But this means the start value used in btrfs_end_dio_bio to record the write location for zoned devices is incorrect for subsequent bios. CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2022-04-19 15:45:04 +02:00
Christoph Hellwig	00d825258b	btrfs: fix direct I/O read repair for split bios When a bio is split in btrfs_submit_direct, dip->file_offset contains the file offset for the first bio. But this means the start value used in btrfs_check_read_dio_bio is incorrect for subsequent bios. Add a file_offset field to struct btrfs_bio to pass along the correct offset. Given that check_data_csum only uses start for an error message, this means problems with this miscalculation will only show up when I/O fails or checksums mismatch. The logic was removed in `f4f39fc5dc` ("btrfs: remove btrfs_bio::logical member") but we need it due to the bio splitting. CC: stable@vger.kernel.org # 5.16+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2022-04-19 15:44:56 +02:00
Christoph Hellwig	50f1cff3d8	btrfs: fix and document the zoned device choice in alloc_new_bio Zone Append bios only need a valid block device in struct bio, but not the device in the btrfs_bio. Use the information from btrfs_zoned_get_device to set up bi_bdev and fix zoned writes on multi-device file system with non-homogeneous capabilities and remove the pointless btrfs_bio.device assignment. Add big fat comments explaining what is going on here. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David Sterba <dsterba@suse.com>	2022-04-19 15:44:49 +02:00
Filipe Manana	50ff57888d	btrfs: fix leaked plug after failure syncing log on zoned filesystems On a zoned filesystem, if we fail to allocate the root node for the log root tree while syncing the log, we end up returning without finishing the IO plug we started before, resulting in leaking resources as we have started writeback for extent buffers of a log tree before. That allocation failure, which typically is either -ENOMEM or -ENOSPC, is not fatal and the fsync can safely fallback to a full transaction commit. So release the IO plug if we fail to allocate the extent buffer for the root of the log root tree when syncing the log on a zoned filesystem. Fixes: `3ddebf27fc` ("btrfs: zoned: reorder log node allocation on zoned filesystem") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2022-04-19 15:44:17 +02:00
Niklas Cassel	6045ab5fea	binfmt_flat: do not stop relocating GOT entries prematurely on riscv bFLT binaries are usually created using elf2flt. The linker script used by elf2flt has defined the .data section like the following for the last 19 years: .data : { _sdata = . ; __data_start = . ; data_start = . ; (.got.plt) (.got) FILL(0) ; . = ALIGN(0x20) ; LONG(-1) . = ALIGN(0x20) ; ... } It places the .got.plt input section before the .got input section. The same is true for the default linker script (ld --verbose) on most architectures except x86/x86-64. The binfmt_flat loader should relocate all GOT entries until it encounters a -1 (the LONG(-1) in the linker script). The problem is that the .got.plt input section starts with a GOTPLT header (which has size 16 bytes on elf64-riscv and 8 bytes on elf32-riscv), where the first word is set to -1. See the binutils implementation for riscv [1]. This causes the binfmt_flat loader to stop relocating GOT entries prematurely and thus causes the application to crash when running. Fix this by skipping the whole GOTPLT header, since the whole GOTPLT header is reserved for the dynamic linker. The GOTPLT header will only be skipped for bFLT binaries with flag FLAT_FLAG_GOTPIC set. This flag is unconditionally set by elf2flt if the supplied ELF binary has the symbol _GLOBAL_OFFSET_TABLE_ defined. ELF binaries without a .got input section should thus remain unaffected. Tested on RISC-V Canaan Kendryte K210 and RISC-V QEMU nommu_virt_defconfig. [1] https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elfnn-riscv.c;hb=binutils-2_38#l3275 Cc: <stable@vger.kernel.org> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220414091018.896737-1-niklas.cassel@wdc.com Fixed-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202204182333.OIUOotK8-lkp@intel.com Signed-off-by: Kees Cook <keescook@chromium.org>	2022-04-18 15:02:50 -07:00
Haowen Bai	9339faac6d	cifs: Use kzalloc instead of kmalloc/memset Use kzalloc rather than duplicating its implementation, which makes code simple and easy to understand. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-04-18 10:22:57 -05:00
Christoph Hellwig	c22198e78d	direct-io: remove random prefetches Randomly poking into block device internals for manual prefetches isn't exactly a very maintainable thing to do. And none of the performance critical direct I/O implementations still use this library function anyway, so just drop it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Link: https://lore.kernel.org/r/20220415045258.199825-28-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:50:02 -06:00
Christoph Hellwig	44abff2c0b	block: decouple REQ_OP_SECURE_ERASE from REQ_OP_DISCARD Secure erase is a very different operation from discard in that it is a data integrity operation vs hint. Fully split the limits and helper infrastructure to make the separation more clear. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> [nilfs2] Acked-by: Jaegeuk Kim <jaegeuk@kernel.org> [f2fs] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Acked-by: Chao Yu <chao@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-27-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	7b47ef52d0	block: add a bdev_discard_granularity helper Abstract away implementation details from file systems by providing a block_device based helper to retrieve the discard granularity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	70200574cc	block: remove QUEUE_FLAG_DISCARD Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only place that needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	cf0fbf894b	block: add a bdev_max_discard_sectors helper Add a helper to query the number of sectors supported per discard bio based on the block device and use this helper to stop various places from poking into the request_queue to see if discard is supported and if so how much. This mirrors what is done e.g. for write zeroes as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-24-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	2aba0d19f4	block: add a bdev_max_zone_append_sectors helper Add a helper to check the max supported sectors for zone append based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	36d254893a	block: add a bdev_stable_writes helper Add a helper to check the stable writes flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-15-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	a557e82e5a	block: add a bdev_fua helper Add a helper to check the FUA flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	08e688fdb8	block: add a bdev_write_cache helper Add a helper to check the write cache flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	10f0d2a517	block: add a bdev_nonrot helper Add a helper to check the nonrot flag based on the block_device instead of having to poke into the block layer internal request_queue. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	f09dac9afb	ntfs3: use bdev_logical_block_size instead of open coding it Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20220415045258.199825-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:58 -06:00
Christoph Hellwig	c1e7b24416	btrfs: use bdev_max_active_zones instead of open coding it Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Acked-by: David Sterba <dsterba@suse.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:58 -06:00
Christoph Hellwig	066ff57101	block: turn bio_kmalloc into a simple kmalloc wrapper Remove the magic autofree semantics and require the callers to explicitly call bio_init to initialize the bio. This allows bio_free to catch accidental bio_put calls on bio_init()ed bios as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Coly Li <colyli@suse.de> Acked-by: Mike Snitzer <snitzer@kernel.org> Link: https://lore.kernel.org/r/20220406061228.410163-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:30:41 -06:00
Christoph Hellwig	46a2d4ccc4	squashfs: always use bio_kmalloc in squashfs_bio_read If a plain kmalloc that is not backed by a mempool is safe here for a large read (and the actual page allocations), it must also be for a small one, so simplify the code a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Phillip Lougher <phillip@squashfs.org.uk> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220406061228.410163-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:29:41 -06:00
Christoph Hellwig	f9e69aa9cc	btrfs: simplify ->flush_bio handling Use an embedded bio that is initialized when used instead of bio_kmalloc plus bio_reset. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: David Sterba <dsterba@suse.com> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220406061228.410163-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:29:41 -06:00
Pavel Begunkov	c0713540f6	io_uring: fix leaks on IOPOLL and CQE_SKIP If all completed requests in io_do_iopoll() were marked with REQ_F_CQE_SKIP, we'll not only skip CQE posting but also io_free_batch_list(), leaking memory and resources. Move @nr_events increment before REQ_F_CQE_SKIP check. We'll potentially return a value greater than the real one, but iopolling will deal with it and the userspace will re-iopoll if needed. In any case, I don't think there are many use cases for REQ_F_CQE_SKIP + IOPOLL. Fixes: `83a13a4181` ("io_uring: tweak iopoll CQE_SKIP event counting") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/5072fc8693fbfd595f89e5d4305bfcfd5d2f0a64.1650186611.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 06:54:11 -06:00
Jens Axboe	323b190ba2	io_uring: free iovec if file assignment fails We just return failure in this case, but we need to release the iovec first. If we're doing IO with more than FAST_IOV segments, then the iovec is allocated and must be freed. Reported-by: syzbot+96b43810dfe9c3bb95ed@syzkaller.appspotmail.com Fixes: `584b0180f0` ("io_uring: move read/write file prep state into actual opcode handler") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-16 21:14:00 -06:00
Linus Torvalds	59250f8a7f	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: MAINTAINERS, binfmt, and mm (tmpfs, secretmem, kasan, kfence, pagealloc, zram, compaction, hugetlb, vmalloc, and kmemleak)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: mm: kmemleak: take a full lowmem check in kmemleak_*_phys() mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders" hugetlb: do not demote poisoned hugetlb pages mm: compaction: fix compiler warning when CONFIG_COMPACTION=n mm: fix unexpected zeroed page mapping with zram swap mm, page_alloc: fix build_zonerefs_node() mm, kfence: support kmem_dump_obj() for KFENCE objects kasan: fix hw tags enablement when KUNIT tests are disabled irq_work: use kasan_record_aux_stack_noalloc() record callstack mm/secretmem: fix panic when growing a memfd_secret tmpfs: fix regressions from wider use of ZERO_PAGE MAINTAINERS: Broadcom internal lists aren't maintainers	2022-04-15 15:57:18 -07:00
Andrew Morton	aeb7923733	revert "fs/binfmt_elf: use PT_LOAD p_align values for static PIE" Despite Mike's attempted fix (`925346c129`), regressions reports continue: https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/ https://bugzilla.kernel.org/show_bug.cgi?id=215720 https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info So revert this patch. Fixes: `9630f0d60f` ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE") Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Chris Kennelly <ckennelly@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Fangrui Song <maskray@google.com> Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-15 14:49:56 -07:00
Andrew Morton	354e923df0	revert "fs/binfmt_elf: fix PT_LOAD p_align values for loaders" Commit `925346c129` ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") was an attempt to fix regressions due to `9630f0d60f` ("fs/binfmt_elf: use PT_LOAD p_align values for static PIE"). But regressions continue to be reported: https://lore.kernel.org/lkml/cb5b81bd-9882-e5dc-cd22-54bdbaaefbbc@leemhuis.info/ https://bugzilla.kernel.org/show_bug.cgi?id=215720 https://lkml.kernel.org/r/b685f3d0-da34-531d-1aa9-479accd3e21b@leemhuis.info This patch reverts the fix, so the original can also be reverted. Fixes: `925346c129` ("fs/binfmt_elf: fix PT_LOAD p_align values for loaders") Cc: H.J. Lu <hjl.tools@gmail.com> Cc: Chris Kennelly <ckennelly@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: David Rientjes <rientjes@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: Fangrui Song <maskray@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Thorsten Leemhuis <regressions@leemhuis.info> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-04-15 14:49:56 -07:00
Linus Torvalds	0647b9cc7f	Merge tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: - Ensure we check and -EINVAL any use of reserved or struct padding. Although we generally always do that, it's missed in two spots for resource updates, one for the ring fd registration from this merge window, and one for the extended arg. Make sure we have all of them handled. (Dylan) - A few fixes for the deferred file assignment (me, Pavel) - Add a feature flag for the deferred file assignment so apps can tell we handle it correctly (me) - Fix a small perf regression with the current file position fix in this merge window (me) * tag 'io_uring-5.18-2022-04-14' of git://git.kernel.dk/linux-block: io_uring: abort file assignment prior to assigning creds io_uring: fix poll error reporting io_uring: fix poll file assign deadlock io_uring: use right issue_flags for splice/tee io_uring: verify pad field is 0 in io_get_ext_arg io_uring: verify resv is 0 in ringfd register/unregister io_uring: verify that resv2 is 0 in io_uring_rsrc_update2 io_uring: move io_uring_rsrc_update2 validation io_uring: fix assign file locking issue io_uring: stop using io_wq_work as an fd placeholder io_uring: move apoll->events cache io_uring: io_kiocb_update_pos() should not touch file for non -1 offset io_uring: flag the fact that linked file assignment is sane	2022-04-15 11:33:20 -07:00
Hongyu Jin	60b3005011	erofs: fix use-after-free of on-stack io[] The root cause is the race as follows: Thread #1 Thread #2(irq ctx) z_erofs_runqueue() struct z_erofs_decompressqueue io_A[]; submit bio A z_erofs_decompress_kickoff(,,1) z_erofs_decompressqueue_endio(bio A) z_erofs_decompress_kickoff(,,-1) spin_lock_irqsave() atomic_add_return() io_wait_event() -> pending_bios is already 0 [end of function] wake_up_locked(io_A[]) // crash Referenced backtrace in kernel 5.4: [ 10.129422] Unable to handle kernel paging request at virtual address eb0454a4 [ 10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G WC O 5.4.147-ab09225 #1 [ 11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48) [ 11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0) [ 11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c) [ 11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0) [ 11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc) [ 11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c) [ 11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc) [ 11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c) [ 11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138) [ 11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0) [ 11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4) [ 11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0) Signed-off-by: Hongyu Jin <hongyu.jin@unisoc.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20220401115527.4935-1-hongyu.jin.cn@gmail.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2022-04-15 23:51:43 +08:00
Theodore Ts'o	eb7054212e	ext4: update the cached overhead value in the superblock If we (re-)calculate the file system overhead amount and it's different from the on-disk s_overhead_clusters value, update the on-disk version since this can take potentially quite a while on bigalloc file systems. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2022-04-14 22:39:00 -04:00
Jens Axboe	701521403c	io_uring: abort file assignment prior to assigning creds We need to either restore creds properly if we fail on the file assignment, or just do the file assignment first instead. Let's do the latter as it's simpler, should make no difference here for file assignment. Link: https://lore.kernel.org/lkml/000000000000a7edb305dca75a50@google.com/ Reported-by: syzbot+60c52ca98513a8760a91@syzkaller.appspotmail.com Fixes: `6bf9c47a39` ("io_uring: defer file assignment") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-14 20:23:40 -06:00
Theodore Ts'o	85d825dbf4	ext4: force overhead calculation if the s_overhead_cluster makes no sense If the file system does not use bigalloc, calculating the overhead is cheap, so force the recalculation of the overhead so we don't have to trust the precalculated overhead in the superblock. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2022-04-14 22:05:47 -04:00
Namjae Jeon	02655a70b7	ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION Currently ksmbd is using ->f_bsize from vfs_statfs() as sector size. If fat/exfat is a local share, ->f_bsize is a cluster size that is too large to be used as a sector size. Sector sizes larger than 4K cause problems when mounting an ISO file through a Windows client. The error message can be obtained using the Mount-DiskImage command; the error is: "Mount-DiskImage : The sector size of the physical disk on which the virtual disk resides is not supported." This patch reports a fixed 4KB sector size if ->s_blocksize is bigger than 4KB. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-04-14 20:56:13 -05:00
Namjae Jeon	8510a043d3	ksmbd: increment reference count of parent fp Add the missing reference count increment of the parent fp in ksmbd_lookup_fd_inode(). Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-04-14 20:56:13 -05:00
Namjae Jeon	50f500b7f6	ksmbd: remove filename in ksmbd_file If the file is renamed underneath the server, fp->filename and the real filename can differ. This patch removes the uses of fp->filename in ksmbd and replaces them with d_path(). Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2022-04-14 20:56:13 -05:00


76795 Commits