linux

Author	SHA1	Message	Date
Takashi Iwai	512b74d17a	exfat: Drop superfluous new line for error messages exfat_err() adds the new line at the end of the message by itself, hence the passed string shouldn't contain a new line. Drop the superfluous newline letters in the error messages in a few places that have been put mistakenly. Reported-by: Joe Perches <joe@perches.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	64fca6e621	exfat: Downgrade ENAMETOOLONG error message to debug messages The ENAMETOOLONG error message is printed at each time when user tries to operate with a too long name, and this can flood the kernel logs easily, as every user can trigger this. Let's downgrade this error message level to a debug message for suppressing the superfluous logs. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1201725 Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	6425baabda	exfat: Expand exfat_err() and co directly to pr_() macro Currently the error and info messages handled by exfat_err() and co are tossed to exfat_msg() function that does nothing but passes the strings with printk() invocation. Not only that this is more overhead by the indirect calls, but also this makes harder to extend for the debug print usage; because of the direct printk() call, you cannot make it for dynamic debug or without debug like the standard helpers such as pr_debug() or dev_dbg(). For addressing the problem, this patch replaces exfat_() macro to expand to pr_*() directly. Along with it, add the new exfat_debug() macro that is expanded to pr_debug() (which output can be gracefully suppressed via dyndbg). Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:07 +09:00
Takashi Iwai	1b1a9195ae	exfat: Define NLS_NAME_* as bit flags explicitly NLS_NAME_* are bit flags although they are currently defined as enum; it's casually working so far (from 0 to 2), but it's error-prone and may bring a problem when we want to add more flag. This patch changes the definitions of NLS_NAME_* explicitly being bit flags. Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Takashi Iwai	86da53e8ff	exfat: Return ENAMETOOLONG consistently for oversized paths LTP has a test for oversized file path renames and it expects the return value to be ENAMETOOLONG. However, exfat returns EINVAL unexpectedly in some cases, hence LTP test fails. The further investigation indicated that the problem happens only when iocharset isn't set to utf8. The difference comes from that, in the case of utf8, exfat_utf8_to_utf16() returns the error -ENAMETOOLONG directly and it's treated as the final error code. Meanwhile, on other iocharsets, exfat_nls_to_ucs2() returns the max path size but it sets NLS_NAME_OVERLEN to lossy flag instead; the caller side checks only whether lossy flag is set or not, resulting in always -EINVAL unconditionally. This patch aligns the return code for both cases by checking the lossy flag bit and returning ENAMETOOLONG when NLS_NAME_OVERLEN bit is set. BugLink: https://bugzilla.suse.com/show_bug.cgi?id=1201725 Reviewed-by: Petr Vorel <pvorel@suse.cz> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	be17b1ccd4	exfat: remove duplicate write inode for extending dir/file Since the timestamps need to be updated, the directory entries will be updated by mark_inode_dirty() whether or not a new cluster is allocated for the file or directory, so there is no need to use __exfat_write_inode() to update the directory entries when allocating a new cluster for a file or directory. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	4493895b2b	exfat: remove duplicate write inode for truncating file This commit moves updating file attributes and timestamps before calling __exfat_write_inode(), so that all updates of the inode had been written by __exfat_write_inode(), mark_inode_dirty() is unneeded. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:06 +09:00
Yuezhang Mo	23e6e1c9b3	exfat: reuse __exfat_write_inode() to update directory entry __exfat_write_inode() is used to update file and stream directory entries, except for file->start_clu and stream->flags. This commit moves update file->start_clu and stream->flags to __exfat_write_inode() and reuse __exfat_write_inode() to update directory entries. Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-08-01 10:14:05 +09:00
Sungjong Seo	204e6ceaa1	exfat: use updated exfat_chain directly during renaming In order for a file to access its own directory entry set, exfat_inode_info(ei) has two copied values. One is ei->dir, which is a snapshot of exfat_chain of the parent directory, and the other is ei->entry, which is the offset of the start of the directory entry set in the parent directory. Since the parent directory can be updated after the snapshot point, it should be used only for accessing one's own directory entry set. However, as of now, during renaming, it could try to traverse or to allocate clusters via snapshot values, it does not make sense. This potential problem has been revealed when exfat_update_parent_info() was removed by commit `d8dad2588a` ("exfat: fix referencing wrong parent directory information after renaming"). However, I don't think it's good idea to bring exfat_update_parent_info() back. Instead, let's use the updated exfat_chain of parent directory diectly. Fixes: `d8dad2588a` ("exfat: fix referencing wrong parent directory information after renaming") Reported-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Tested-by: Wang Yugui <wangyugui@e16-tech.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-06-09 21:26:32 +09:00
Linus Torvalds	fdaf9a5840	Page cache changes for 5.19 - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmKNMDUACgkQDpNsjXcp gj4/mwf/bpHhXH4ZoNIvtUpTF6rZbqeffmc0VrbxCZDZ6igRnRPglxZ9H9v6L53O 7B0FBQIfxgNKHZpdqGdOkv8cjg/GMe/HJUbEy5wOakYPo4L9fZpHbDZ9HM2Eankj xBqLIBgBJ7doKr+Y62DAN19TVD8jfRfVtli5mqXJoNKf65J7BkxljoTH1L3EXD9d nhLAgyQjR67JQrT/39KMW+17GqLhGefLQ4YnAMONtB6TVwX/lZmigKpzVaCi4r26 bnk5vaR/3PdjtNxIoYvxdc71y2Eg05n2jEq9Wcy1AaDv/5vbyZUlZ2aBSaIVbtKX WfrhN9O3L0bU5qS7p9PoyfLc9wpq8A== =djLv -----END PGP SIGNATURE----- Merge tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache Pull page cache updates from Matthew Wilcox: - Appoint myself page cache maintainer - Fix how scsicam uses the page cache - Use the memalloc_nofs_save() API to replace AOP_FLAG_NOFS - Remove the AOP flags entirely - Remove pagecache_write_begin() and pagecache_write_end() - Documentation updates - Convert several address_space operations to use folios: - is_dirty_writeback - readpage becomes read_folio - releasepage becomes release_folio - freepage becomes free_folio - Change filler_t to require a struct file pointer be the first argument like ->read_folio * tag 'folio-5.19' of git://git.infradead.org/users/willy/pagecache: (107 commits) nilfs2: Fix some kernel-doc comments Appoint myself page cache maintainer fs: Remove aops->freepage secretmem: Convert to free_folio nfs: Convert to free_folio orangefs: Convert to free_folio fs: Add free_folio address space operation fs: Convert drop_buffers() to use a folio fs: Change try_to_free_buffers() to take a folio jbd2: Convert release_buffer_page() to use a folio jbd2: Convert jbd2_journal_try_to_free_buffers to take a folio reiserfs: Convert release_buffer_page() to use a folio fs: Remove last vestiges of releasepage ubifs: Convert to release_folio reiserfs: Convert to release_folio orangefs: Convert to release_folio ocfs2: Convert to release_folio nilfs2: Remove comment about releasepage nfs: Convert to release_folio jfs: Convert to release_folio ...	2022-05-24 19:55:07 -07:00
Linus Torvalds	850f6033cd	Description for this pull request: - fix referencing wrong parent directory information during rename. - introduce a sys_tz mount option to use system timezone. - improve performance while zeroing a cluster with dirsync mount option. - fix slab-out-bounds in exat_clear_bitmap() reported from syzbot. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmKLhPkWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCOeXD/9nJofEx/n9KK0pA1WP2zKoLBnP YgLfnWTBlXfErInklW4kg057S6q8M0pDm0iASLw9P6GZNe0VFV5PigrTyAjy5ghW hm3JiFAHIZgaOlOk2NQd/1Qv/IdlnkbRkngXqHcizxEX/LcKZpP+pb1mdW6NzWrt /HagLClFTUhb0Su3DT7TCqiam5lI+lkarRI0Jo4Scstgsn4aT+25jk9N0bfUih7f hGKJpii+5UWCLlBJnyyghrBRQiiPdsETadJdRnHgeDdzKg/UNWxMP0C+G6PmSko/ mScrR+FeH2toURSUESi1Q558z1+3Fhb8rMbl3aWV70FJDmzMwn9YyhPgBrX3x6Gb AF7UBHFvORStYRUmmSMbX9XkY2gNoI9qZMXghDRlgF8t/WWY8VeVnyslaWwqDQhw qXyOIThiCuhLfKTD+r+MM08oUPcyFBtuGvdzDOH7/b56zEDwzab+hSHZz94xPWEz ESk0hNhaJCEvcgEr7IlSSF4k5Ff+hWVKN4R/DD78yxmjHveOuNMibTeE2YgDIFEX SiKFaAiXUuWGVsAPuAeJ+np/7rW3OEBG8yKhtri5vsoXk+Mqd56rp0EkEHzqbVHQ Ki5gua549KNRuxnbXRtWLjCKucwN86mE45WD0P0ORnBOjlfmgg8adp6BBSW5yVD2 SZkfgI0FL8rWdBwRcQ== =fr78 -----END PGP SIGNATURE----- Merge tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - fix referencing wrong parent directory information during rename - introduce a sys_tz mount option to use system timezone - improve performance while zeroing a cluster with dirsync mount option - fix slab-out-bounds in exat_clear_bitmap() reported from syzbot * tag 'exfat-for-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: check if cluster num is valid exfat: reduce block requests when zeroing a cluster block: add sync_blockdev_range() exfat: introduce mount option 'sys_tz' exfat: fix referencing wrong parent directory information after renaming	2022-05-24 18:30:27 -07:00
Tadeusz Struk	64ba4b15e5	exfat: check if cluster num is valid Syzbot reported slab-out-of-bounds read in exfat_clear_bitmap. This was triggered by reproducer calling truncute with size 0, which causes the following trace: BUG: KASAN: slab-out-of-bounds in exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 Read of size 8 at addr ffff888115aa9508 by task syz-executor251/365 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack_lvl+0x1e2/0x24b lib/dump_stack.c:118 print_address_description+0x81/0x3c0 mm/kasan/report.c:233 __kasan_report mm/kasan/report.c:419 [inline] kasan_report+0x1a4/0x1f0 mm/kasan/report.c:436 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:309 exfat_clear_bitmap+0x147/0x490 fs/exfat/balloc.c:174 exfat_free_cluster+0x25a/0x4a0 fs/exfat/fatent.c:181 __exfat_truncate+0x99e/0xe00 fs/exfat/file.c:217 exfat_truncate+0x11b/0x4f0 fs/exfat/file.c:243 exfat_setattr+0xa03/0xd40 fs/exfat/file.c:339 notify_change+0xb76/0xe10 fs/attr.c:336 do_truncate+0x1ea/0x2d0 fs/open.c:65 Move the is_valid_cluster() helper from fatent.c to a common header to make it reusable in other *.c files. And add is_valid_cluster() to validate if cluster number is within valid range in exfat_clear_bitmap() and exfat_set_bitmap(). Link: https://syzkaller.appspot.com/bug?id=50381fc73821ecae743b8cf24b4c9a04776f767c Reported-by: syzbot+a4087e40b9c13aad7892@syzkaller.appspotmail.com Fixes: `1e49a94cf7` ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-05-23 11:17:30 +09:00
Yuezhang Mo	1b61383854	exfat: reduce block requests when zeroing a cluster If 'dirsync' is enabled, when zeroing a cluster, submitting sector by sector will generate many block requests, will cause the block device to not fully perform its performance. This commit makes the sectors in a cluster to be submitted in once, it will reduce the number of block requests. This will make the block device to give full play to its performance. Test create 1000 directories on SD card with: $ time (for ((i=0;i<1000;i++)); do mkdir dir${i}; done) Performance has been improved by more than 73% on imx6q-sabrelite. Cluster size Before After Improvement 64 KBytes 3m34.036s 0m56.052s 73.8% 128 KBytes 6m2.644s 1m13.354s 79.8% 256 KBytes 11m22.202s 1m39.451s 85.4% imx6q-sabrelite: - CPU: 792 MHz x4 - Memory: 1GB DDR3 - SD Card: SanDisk 8GB Class 4 Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-05-23 11:17:30 +09:00
Chung-Chiang Cheng	9b002894b4	exfat: introduce mount option 'sys_tz' EXFAT_TZ_VALID bit in {create,modify,access}_tz is corresponding to OffsetValid field in exfat specification [1]. When this bit isn't set, timestamps should be treated as having the same UTC offset as the current local time. Currently, there is an option 'time_offset' for users to specify the UTC offset for this issue. This patch introduces a new mount option 'sys_tz' to use system timezone as time offset. Link: [1] https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification#74102-offsetvalid-field Signed-off-by: Chung-Chiang Cheng <cccheng@synology.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-05-23 11:17:29 +09:00
Yuezhang Mo	d8dad2588a	exfat: fix referencing wrong parent directory information after renaming During renaming, the parent directory information maybe updated. But the file/directory still references to the old parent directory information. This bug will cause 2 problems. (1) The renamed file can not be written. [10768.175172] exFAT-fs (sda1): error, failed to bmap (inode : 7afd50e4 iblock : 0, err : -5) [10768.184285] exFAT-fs (sda1): Filesystem has been set read-only ash: write error: Input/output error (2) Some dentries of the renamed file/directory are not set to deleted after removing the file/directory. exfat_update_parent_info() is a workaround for the wrong parent directory information being used after renaming. Now that bug is fixed, this is no longer needed, so remove it. Fixes: `5f2aa07507` ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Reviewed-by: Daniel Palmer <daniel.palmer@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-05-23 11:17:29 +09:00
Matthew Wilcox (Oracle)	f132ab7d3a	fs: Convert mpage_readpage to mpage_read_folio mpage_readpage still works in terms of pages, and has not been audited for correctness with large folios, so include an assertion that the filesystem is not passing it large folios. Convert all the filesystems to call mpage_read_folio() instead of mpage_readpage(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>	2022-05-09 16:21:44 -04:00
Matthew Wilcox (Oracle)	9d6b0cd757	fs: Remove flags parameter from aops->write_begin There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-05-08 14:28:19 -04:00
Matthew Wilcox (Oracle)	be3bbbc588	fs: Remove aop flags parameter from cont_write_begin() There are no more aop flags left, so remove the parameter. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de>	2022-05-08 14:28:19 -04:00
Christoph Hellwig	7b47ef52d0	block: add a bdev_discard_granularity helper Abstract away implementation details from file systems by providing a block_device based helper to retrieve the discard granularity. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Acked-by: David Sterba <dsterba@suse.com> [btrfs] Link: https://lore.kernel.org/r/20220415045258.199825-26-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Christoph Hellwig	70200574cc	block: remove QUEUE_FLAG_DISCARD Just use a non-zero max_discard_sectors as an indicator for discard support, similar to what is done for write zeroes. The only places where needs special attention is the RAID5 driver, which must clear discard support for security reasons by default, even if the default stacking rules would allow for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> [drbd] Acked-by: Jan Höppner <hoeppner@linux.ibm.com> [s390] Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: David Sterba <dsterba@suse.com> [btrfs] Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220415045258.199825-25-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-04-17 19:49:59 -06:00
Linus Torvalds	ec251f3e18	Description for this pull request: - Add keep_last_dots mount option to allow access to paths with trailing dots. - Avoid repetitive volume dirty bit set/clear to improve storage life time. -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmJGW7IWHGxpbmtpbmpl b25Aa2VybmVsLm9yZwAKCRBnC/sDUUQhCDp3D/49MeePwrzH8Aon+0MMWH2C23aZ UWkHQhedSmnscyb3xbaqT6UJHk3CbIdgMqJpJd1zwRs7U9o37zPQy8ma1DcOPWA9 Bn1NUTdj/RWye+2t/6YT44xODsG5sRrzKeG2AcenivVxCW1+Xl9YBGk9J8XXIxm0 IQTsswGuLfDdtGoHWXiSRPTatTxPsJospZER94nhGEiPxyOgVuLHUWxWxAUMXY8w RL9B/t7AlRFSk8HXW5aL5tNsd6OBeLybBN51lToPGPRq1qd9Aarlp0mlwMnaie2z 1AoTXzExd1IBglfcKhppTIUqAioeIdTOkfSudaLPAoTc6PDjJAxEOQJx2NfBNimC JM209+MtJh5nTiNmpfGxT64vHkR70Kf1+Vdasv/iFfHQlhIbGJ+necIPHXUgA20Y 0iZ9bksiYGNoHFc5wKQ1wBmMxsKNLVz0GYJJez/pw+JfwCGztiSlxQWDkFxj5qNW 2L/9YRJLVpI4vmNTAFM3sHaDaYkXWfAtgWhlS0c95SVVUsvR4vu7UYTOKYlsc1mC T0ABV8TJIXQwlD6HzNGpU0OKD6/ddA81GrIdkoRuzT9KGJDGA1oz8ehHO+Hs796K ivRkARLIVwoQZAoZJR17tYnMd4GnSC/qr70HqCBFnFeNTOkl3fpzjEzxoIXL+A9t KbPtJlbxQao7DKLo6g== =PoqS -----END PGP SIGNATURE----- Merge tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Add keep_last_dots mount option to allow access to paths with trailing dots - Avoid repetitive volume dirty bit set/clear to improve storage life time * tag 'exfat-for-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: do not clear VolumeDirty in writeback exfat: allow access to paths with trailing dots	2022-04-01 14:20:24 -07:00
Yuezhang Mo	a4a3d8c52d	exfat: do not clear VolumeDirty in writeback Before this commit, VolumeDirty will be cleared first in writeback if 'dirsync' or 'sync' is not enabled. If the power is suddenly cut off after cleaning VolumeDirty but other updates are not written, the exFAT filesystem will not be able to detect the power failure in the next mount. And VolumeDirty will be set again but not cleared when updating the parent directory. It means that BootSector will be written at least once in each write-back, which will shorten the life of the device. Reviewed-by: Andy Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama Wataru <wataru.aoyama@sony.com> Signed-off-by: Yuezhang Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-04-01 10:51:03 +09:00
Vasant Karasulli	9ec784bf77	exfat: allow access to paths with trailing dots The Linux kernel exfat driver currently unconditionally strips trailing periods '.' from path components. This isdone intentionally, loosely following Windows behaviour and specifications which state: #exFAT The concatenated file name has the same set of illegal characters as other FAT-based file systems (see Table 31). #FAT ... Leading and trailing spaces in a long name are ignored. Leading and embedded periods are allowed in a name and are stored in the long name. Trailing periods are ignored. Note: Leading and trailing space ' ' characters are currently retained by Linux kernel exfat, in conflict with the above specification. On Windows 10, trailing and leading space ' ' characters are stripped from the filenames. Some implementations, such as fuse-exfat, don't perform path trailer removal. When mounting images which contain trailing-dot paths, these paths are unreachable, e.g.: + mount.exfat-fuse /dev/zram0 /mnt/test/ FUSE exfat 1.3.0 + cd /mnt/test/ + touch fuse_created_dots... ' fuse_created_spaces ' + ls -l total 0 -rwxrwxrwx 1 root 0 0 Aug 18 09:45 ' fuse_created_spaces ' -rwxrwxrwx 1 root 0 0 Aug 18 09:45 fuse_created_dots... + cd / + umount /mnt/test/ + mount -t exfat /dev/zram0 /mnt/test + cd /mnt/test + ls -l ls: cannot access 'fuse_created_dots...': No such file or directory total 0 -rwxr-xr-x 1 root 0 0 Aug 18 09:45 ' fuse_created_spaces ' -????????? ? ? ? ? ? fuse_created_dots... + touch kexfat_created_dots... ' kexfat_created_spaces ' + ls -l ls: cannot access 'fuse_created_dots...': No such file or directory total 0 -rwxr-xr-x 1 root 0 0 Aug 18 09:45 ' fuse_created_spaces ' -rwxr-xr-x 1 root 0 0 Aug 18 09:45 ' kexfat_created_spaces ' -????????? ? ? ? ? ? fuse_created_dots... -rwxr-xr-x 1 root 0 0 Aug 18 09:45 kexfat_created_dots + cd / + umount /mnt/test/ This commit adds "keep_last_dots" mount option that controls whether or not trailing periods '.' are stripped from path components during file lookup or file creation. This mount option can be used to access paths with trailing periods and disallow creating files with names with trailing periods. E.g. continuing from the previous example: + mount -t exfat -o keep_last_dots /dev/zram0 /mnt/test + cd /mnt/test + ls -l total 0 -rwxr-xr-x 1 root 0 0 Aug 18 10:32 ' fuse_created_spaces ' -rwxr-xr-x 1 root 0 0 Aug 18 10:32 ' kexfat_created_spaces ' -rwxr-xr-x 1 root 0 0 Aug 18 10:32 fuse_created_dots... -rwxr-xr-x 1 root 0 0 Aug 18 10:32 kexfat_created_dots + echo > kexfat_created_dots_again... sh: kexfat_created_dots_again...: Invalid argument Link: https://bugzilla.suse.com/show_bug.cgi?id=1188964 Link: https://lore.kernel.org/linux-fsdevel/003b01d755e4$31fb0d80$95f12880$ @samsung.com/ Link: https://docs.microsoft.com/en-us/windows/win32/fileio/exfat-specification Suggested-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Vasant Karasulli <vkarasulli@suse.de> Co-developed-by: David Disseldorp <ddiss@suse.de> Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-04-01 10:51:02 +09:00
Linus Torvalds	6b1f86f8e9	Filesystem folio changes for 5.18 Primarily this series converts some of the address_space operations to take a folio instead of a page. ->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. ->invalidatepage() becomes ->invalidate_folio() and has a similar type change. ->launder_page() becomes ->launder_folio() ->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request. -----BEGIN PGP SIGNATURE----- iQEzBAABCgAdFiEEejHryeLBw/spnjHrDpNsjXcpgj4FAmI4hqMACgkQDpNsjXcp gj7r7Af/fVJ7m8kKqjP/IayX3HiJRuIDQw+vM++BlRNXdjz+IyED6whdmFGxJeOY BMyT+8ApOAz7ErS4G+7fAv4ScJK/aEgFUsnSeAiCp0PliiEJ5NNJzElp6sVmQ7H5 SX7+Ek444FZUGsQuy0qL7/ELpR3ditnD7x+5U2g0p5TeaHGUQn84crRyfR4xuhNG EBD9D71BOb7OxUcOHe93pTkK51QsQ0aCrcIsB1tkK5KR0BAthn1HqF7ehL90Rvrr omx5M7aDWGY4oj7IKrhlAs+55Ah2WaOzrZBp0FXNbr4UENDBKWKyUxErwa4xPkf6 Gm1iQG/CspOHnxN3YWsd5WjtlL3A+A== =cOiq -----END PGP SIGNATURE----- Merge tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache Pull filesystem folio updates from Matthew Wilcox: "Primarily this series converts some of the address_space operations to take a folio instead of a page. Notably: - a_ops->is_partially_uptodate() takes a folio instead of a page and changes the type of the 'from' and 'count' arguments to make it obvious they're bytes. - a_ops->invalidatepage() becomes ->invalidate_folio() and has a similar type change. - a_ops->launder_page() becomes ->launder_folio() - a_ops->set_page_dirty() becomes ->dirty_folio() and adds the address_space as an argument. There are a couple of other misc changes up front that weren't worth separating into their own pull request" * tag 'folio-5.18b' of git://git.infradead.org/users/willy/pagecache: (53 commits) fs: Remove aops ->set_page_dirty fb_defio: Use noop_dirty_folio() fs: Convert __set_page_dirty_no_writeback to noop_dirty_folio fs: Convert __set_page_dirty_buffers to block_dirty_folio nilfs: Convert nilfs_set_page_dirty() to nilfs_dirty_folio() mm: Convert swap_set_page_dirty() to swap_dirty_folio() ubifs: Convert ubifs_set_page_dirty to ubifs_dirty_folio f2fs: Convert f2fs_set_node_page_dirty to f2fs_dirty_node_folio f2fs: Convert f2fs_set_data_page_dirty to f2fs_dirty_data_folio f2fs: Convert f2fs_set_meta_page_dirty to f2fs_dirty_meta_folio afs: Convert afs_dir_set_page_dirty() to afs_dir_dirty_folio() btrfs: Convert extent_range_redirty_for_io() to use folios fs: Convert trivial uses of __set_page_dirty_nobuffers to filemap_dirty_folio btrfs: Convert from set_page_dirty to dirty_folio fscache: Convert fscache_set_page_dirty() to fscache_dirty_folio() fs: Add aops->dirty_folio fs: Remove aops->launder_page orangefs: Convert launder_page to launder_folio nfs: Convert from launder_page to launder_folio fuse: Convert from launder_page to launder_folio ...	2022-03-22 18:26:56 -07:00
Muchun Song	fd60b28842	fs: allocate inode by using alloc_inode_sb() The inode allocation is supposed to use alloc_inode_sb(), so convert kmem_cache_alloc() of all filesystems to alloc_inode_sb(). Link: https://lkml.kernel.org/r/20220228122126.37293-5-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Theodore Ts'o <tytso@mit.edu> [ext4] Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Cc: Alex Shi <alexs@kernel.org> Cc: Anna Schumaker <Anna.Schumaker@Netapp.com> Cc: Chao Yu <chao@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kari Argillander <kari.argillander@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Qi Zheng <zhengqi.arch@bytedance.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Yang <richard.weiyang@gmail.com> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2022-03-22 15:57:03 -07:00
Matthew Wilcox (Oracle)	e621900ad2	fs: Convert __set_page_dirty_buffers to block_dirty_folio Convert all callers; mostly this is just changing the aops to point at it, but a few implementations need a little more work. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs	2022-03-16 13:37:04 -04:00
Matthew Wilcox (Oracle)	7ba13abbd3	fs: Turn block_invalidatepage into block_invalidate_folio Remove special-casing of a NULL invalidatepage, since there is no more block_invalidatepage. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs Tested-by: David Howells <dhowells@redhat.com> # afs	2022-03-15 08:23:29 -04:00
Yuezhang.Mo	3d966521a8	exfat: fix missing REQ_SYNC in exfat_update_bhs() If 'dirsync' is enabled, all directory updates within the filesystem should be done synchronously. exfat_update_bh() does as this, but exfat_update_bhs() does not. Reviewed-by: Andy.Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama, Wataru <wataru.aoyama@sony.com> Reviewed-by: Kobayashi, Kento <Kento.A.Kobayashi@sony.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Yuezhang.Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:04 +09:00
Yuezhang.Mo	c71510b3fa	exfat: remove argument 'sector' from exfat_get_dentry() No any function uses argument 'sector', remove it. Reviewed-by: Andy.Wu <Andy.Wu@sony.com> Reviewed-by: Aoyama, Wataru <wataru.aoyama@sony.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Yuezhang.Mo <Yuezhang.Mo@sony.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:03 +09:00
Namjae Jeon	1ed147e29e	exfat: move super block magic number to magic.h Move exfat superblock magic number from local definition to magic.h. It is also needed by userspace programs that call fstatfs(). Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:03 +09:00
Christophe Vu-Brugier	92fba084b7	exfat: fix i_blocks for files truncated over 4 GiB In exfat_truncate(), the computation of inode->i_blocks is wrong if the file is larger than 4 GiB because a 32-bit variable is used as a mask. This is fixed and simplified by using round_up(). Also fix the same buggy computation in exfat_read_root() and another (correct) one in exfat_fill_inode(). The latter was fixed another way last month but can be simplified by using round_up() as well. See: commit `0c336d6e33` ("exfat: fix incorrect loading of i_blocks for large files") Fixes: `98d917047e` ("exfat: add file operations") Cc: stable@vger.kernel.org # v5.7+ Suggested-by: Matthew Wilcox <willy@infradead.org> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:02 +09:00
Christophe Vu-Brugier	7dee6f57d7	exfat: reuse exfat_inode_info variable instead of calling EXFAT_I() Also add a local "struct exfat_inode_info *ei" variable to exfat_truncate() to simplify the code. Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:02 +09:00
Christophe Vu-Brugier	8cf058834b	exfat: make exfat_find_location() static Make exfat_find_location() static. Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:01 +09:00
Christophe Vu-Brugier	6fa96cd5ad	exfat: fix typos in comments Fix typos in comments. Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:01 +09:00
Christophe Vu-Brugier	e21a28bbcc	exfat: simplify is_valid_cluster() Simplify is_valid_cluster(). Signed-off-by: Christophe Vu-Brugier <christophe.vu-brugier@seagate.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2022-01-10 11:00:00 +09:00
Sungjong Seo	0c336d6e33	exfat: fix incorrect loading of i_blocks for large files When calculating i_blocks, there was a mistake that was masked with a 32-bit variable. So i_blocks for files larger than 4 GiB had incorrect values. Mask with a 64-bit variable instead of 32-bit one. Fixes: `5f2aa07507` ("exfat: add inode operations") Cc: stable@vger.kernel.org # v5.7+ Reported-by: Ganapathi Kamath <hgkamath@hotmail.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>	2021-11-01 07:49:21 +09:00
Linus Torvalds	7a5e9a17b2	Description for this pull request: - Improved compatibility issue with exfat from some camera vendors. - Do not need to release root inode on error path. -----BEGIN PGP SIGNATURE----- iQJMBAABCgA2FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmDhB1QYHG5hbWphZS5q ZW9uQHNhbXN1bmcuY29tAAoJEGcL+wNRRCEIf/sQAMtOahaIoZzdF/5Iprk4dP5b y8l0C1dM+Ijkc/F0VvfXfdAbRzbXYfd7hh5JyJGXQYdvo6NxjEQ9xSEWPbYxEBJg zUc2jKG1Qgmcy7XKAqcFDZjogBFtAqYsDseN0RDiRpUFwrFkT2wE4pC3TLVTK2+b GKDxoQdzoXp/BfyHJ03YXxxzwDyBmYK7/7yulLA1nVOiNrRcN7foImylZ+6j24u0 3IwxqWfbubl5kH9ABn77ZWI1yhwTRHczhIBh4GBDO/6/+HOsvFOoednQ3i/7lx9q d2JZqRYyV0Nrxnfy1LAkJ4G4UiZURaLM7UEQUDMFxwqYPAmfW2FiAFx8OvoPoTz6 ZxYOUSFg/GZOuYSXa5/NF/Gf2qO1sBdFvqnqXn81sJK0L/5BW98YYayCMkdWtvCn KFWTi7FCtcotJwsirrYom4rsXx5YfSDCCiLhhwo4j43MepA8krDdLcuBxRRp6LRC QZEd7k/gdHvcqxFYYeYM4U5c6m3ASvoT6vJ9kA+P39NwWjrE2wVH1QP3HPVTm1w7 7P4K/mTrE62VcWEzimtbTtNLfmknPA9eEuzxg7mC9QJdN0VczJ24BXxKRqJg95fd yjVNwbsxkwUHXt6GITVGFJkE3E7GIVS9/Ket6VjMZAcSACJjznnL1nnlklu1b6MS k40sJJ1o0VLB5JmKQxY/ =U4e0 -----END PGP SIGNATURE----- Merge tag 'exfat-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Improved compatibility issue with exfat from some camera vendors. - Do not need to release root inode on error path. * tag 'exfat-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: handle wrong stream entry size in exfat_readdir() exfat: avoid incorrectly releasing for root inode	2021-07-06 11:06:04 -07:00
Namjae Jeon	1e5654de0f	exfat: handle wrong stream entry size in exfat_readdir() The compatibility issue between linux exfat and exfat of some camera company was reported from Florian. In their exfat, if the number of files exceeds any limit, the DataLength in stream entry of the directory is no longer updated. So some files created from camera does not show in linux exfat. because linux exfat doesn't allow that cpos becomes larger than DataLength of stream entry. This patch check DataLength in stream entry only if the type is ALLOC_NO_FAT_CHAIN and add the check ensure that dentry offset does not exceed max dentries size(256 MB) to avoid the circular FAT chain issue. Fixes: `ca06197382` ("exfat: add directory operations") Cc: stable@vger.kernel.org # v5.9 Reported-by: Florian Cramer <flrncrmr@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Tested-by: Chris Down <chris@chrisdown.name> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-07-04 09:33:00 +09:00
Christoph Hellwig	0af573780b	mm: require ->set_page_dirty to be explicitly wired up Remove the CONFIG_BLOCK default to __set_page_dirty_buffers and just wire that method up for the missing instances. [hch@lst.de: ecryptfs: add a ->set_page_dirty cludge] Link: https://lkml.kernel.org/r/20210624125250.536369-1-hch@lst.de Link: https://lkml.kernel.org/r/20210614061512.3966143-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Tyler Hicks <code@tyhicks.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-06-29 10:53:48 -07:00
Chen Li	839a534f1e	exfat: avoid incorrectly releasing for root inode In d_make_root, when we fail to allocate dentry for root inode, we will iput root inode and returned value is NULL in this function. So we do not need to release this inode again at d_make_root's caller. Signed-off-by: Chen Li <chenli@uniontech.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-06-28 21:49:47 +09:00
Hyeongseok Kim	c6e2f52e30	exfat: speed up iterate/lookup by fixing start point of traversing cluster chain When directory iterate and lookup is called, there's a buggy rewinding of start point for traversing cluster chain to the parent directory entry's first cluster. This caused repeated cluster chain traversing from the first entry of the parent directory that would show worse performance if huge amounts of files exist under the parent directory. Fix not to rewind, make continue from currently referenced cluster and dir entry. Tested with 50,000 files under single directory / 256GB sdcard, with command "time ls -l > /dev/null", Before : 0m08.69s real 0m00.27s user 0m05.91s system After : 0m07.01s real 0m00.25s user 0m04.34s system Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-04-27 20:45:07 +09:00
Hyeongseok Kim	23befe490b	exfat: improve write performance when dirsync enabled Degradation of write speed caused by frequent disk access for cluster bitmap update on every cluster allocation could be improved by selective syncing bitmap buffer. Change to flush bitmap buffer only for the directory related operations. Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-04-27 20:45:06 +09:00
Hyeongseok Kim	654762df2e	exfat: add support ioctl and FITRIM function Add FITRIM ioctl to enable discarding unused blocks while mounted. As current exFAT doesn't have generic ioctl handler, add empty ioctl function first, and add FITRIM handler. Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-04-27 20:45:06 +09:00
Hyeongseok Kim	5c2d728507	exfat: introduce bitmap_lock for cluster bitmap access s_lock which is for protecting concurrent access of file operations is too huge for cluster bitmap protection, so introduce a new bitmap_lock to narrow the lock range if only need to access cluster bitmap. Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-04-27 20:45:06 +09:00
Hyeongseok Kim	77edfc6e51	exfat: fix erroneous discard when clear cluster bit If mounted with discard option, exFAT issues discard command when clear cluster bit to remove file. But the input parameter of cluster-to-sector calculation is abnormally added by reserved cluster size which is 2, leading to discard unrelated sectors included in target+2 cluster. With fixing this, remove the wrong comments in set/clear/find bitmap functions. Fixes: `1e49a94cf7` ("exfat: add bitmap operations") Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-04-27 20:45:06 +09:00
Linus Torvalds	7d6beb71da	idmapped-mounts-v5.12 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYCegywAKCRCRxhvAZXjc ouJ6AQDlf+7jCQlQdeKKoN9QDFfMzG1ooemat36EpRRTONaGuAD8D9A4sUsG4+5f 4IU5Lj9oY4DEmF8HenbWK2ZHsesL2Qg= =yPaw -----END PGP SIGNATURE----- Merge tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull idmapped mounts from Christian Brauner: "This introduces idmapped mounts which has been in the making for some time. Simply put, different mounts can expose the same file or directory with different ownership. This initial implementation comes with ports for fat, ext4 and with Christoph's port for xfs with more filesystems being actively worked on by independent people and maintainers. Idmapping mounts handle a wide range of long standing use-cases. Here are just a few: - Idmapped mounts make it possible to easily share files between multiple users or multiple machines especially in complex scenarios. For example, idmapped mounts will be used in the implementation of portable home directories in systemd-homed.service(8) where they allow users to move their home directory to an external storage device and use it on multiple computers where they are assigned different uids and gids. This effectively makes it possible to assign random uids and gids at login time. - It is possible to share files from the host with unprivileged containers without having to change ownership permanently through chown(2). - It is possible to idmap a container's rootfs and without having to mangle every file. For example, Chromebooks use it to share the user's Download folder with their unprivileged containers in their Linux subsystem. - It is possible to share files between containers with non-overlapping idmappings. - Filesystem that lack a proper concept of ownership such as fat can use idmapped mounts to implement discretionary access (DAC) permission checking. - They allow users to efficiently changing ownership on a per-mount basis without having to (recursively) chown(2) all files. In contrast to chown (2) changing ownership of large sets of files is instantenous with idmapped mounts. This is especially useful when ownership of a whole root filesystem of a virtual machine or container is changed. With idmapped mounts a single syscall mount_setattr syscall will be sufficient to change the ownership of all files. - Idmapped mounts always take the current ownership into account as idmappings specify what a given uid or gid is supposed to be mapped to. This contrasts with the chown(2) syscall which cannot by itself take the current ownership of the files it changes into account. It simply changes the ownership to the specified uid and gid. This is especially problematic when recursively chown(2)ing a large set of files which is commong with the aforementioned portable home directory and container and vm scenario. - Idmapped mounts allow to change ownership locally, restricting it to specific mounts, and temporarily as the ownership changes only apply as long as the mount exists. Several userspace projects have either already put up patches and pull-requests for this feature or will do so should you decide to pull this: - systemd: In a wide variety of scenarios but especially right away in their implementation of portable home directories. https://systemd.io/HOME_DIRECTORY/ - container runtimes: containerd, runC, LXD:To share data between host and unprivileged containers, unprivileged and privileged containers, etc. The pull request for idmapped mounts support in containerd, the default Kubernetes runtime is already up for quite a while now: https://github.com/containerd/containerd/pull/4734 - The virtio-fs developers and several users have expressed interest in using this feature with virtual machines once virtio-fs is ported. - ChromeOS: Sharing host-directories with unprivileged containers. I've tightly synced with all those projects and all of those listed here have also expressed their need/desire for this feature on the mailing list. For more info on how people use this there's a bunch of talks about this too. Here's just two recent ones: https://www.cncf.io/wp-content/uploads/2020/12/Rootless-Containers-in-Gitpod.pdf https://fosdem.org/2021/schedule/event/containers_idmap/ This comes with an extensive xfstests suite covering both ext4 and xfs: https://git.kernel.org/brauner/xfstests-dev/h/idmapped_mounts It covers truncation, creation, opening, xattrs, vfscaps, setid execution, setgid inheritance and more both with idmapped and non-idmapped mounts. It already helped to discover an unrelated xfs setgid inheritance bug which has since been fixed in mainline. It will be sent for inclusion with the xfstests project should you decide to merge this. In order to support per-mount idmappings vfsmounts are marked with user namespaces. The idmapping of the user namespace will be used to map the ids of vfs objects when they are accessed through that mount. By default all vfsmounts are marked with the initial user namespace. The initial user namespace is used to indicate that a mount is not idmapped. All operations behave as before and this is verified in the testsuite. Based on prior discussions we want to attach the whole user namespace and not just a dedicated idmapping struct. This allows us to reuse all the helpers that already exist for dealing with idmappings instead of introducing a whole new range of helpers. In addition, if we decide in the future that we are confident enough to enable unprivileged users to setup idmapped mounts the permission checking can take into account whether the caller is privileged in the user namespace the mount is currently marked with. The user namespace the mount will be marked with can be specified by passing a file descriptor refering to the user namespace as an argument to the new mount_setattr() syscall together with the new MOUNT_ATTR_IDMAP flag. The system call follows the openat2() pattern of extensibility. The following conditions must be met in order to create an idmapped mount: - The caller must currently have the CAP_SYS_ADMIN capability in the user namespace the underlying filesystem has been mounted in. - The underlying filesystem must support idmapped mounts. - The mount must not already be idmapped. This also implies that the idmapping of a mount cannot be altered once it has been idmapped. - The mount must be a detached/anonymous mount, i.e. it must have been created by calling open_tree() with the OPEN_TREE_CLONE flag and it must not already have been visible in the filesystem. The last two points guarantee easier semantics for userspace and the kernel and make the implementation significantly simpler. By default vfsmounts are marked with the initial user namespace and no behavioral or performance changes are observed. The manpage with a detailed description can be found here: `1d7b902e28` In order to support idmapped mounts, filesystems need to be changed and mark themselves with the FS_ALLOW_IDMAP flag in fs_flags. The patches to convert individual filesystem are not very large or complicated overall as can be seen from the included fat, ext4, and xfs ports. Patches for other filesystems are actively worked on and will be sent out separately. The xfstestsuite can be used to verify that port has been done correctly. The mount_setattr() syscall is motivated independent of the idmapped mounts patches and it's been around since July 2019. One of the most valuable features of the new mount api is the ability to perform mounts based on file descriptors only. Together with the lookup restrictions available in the openat2() RESOLVE_* flag namespace which we added in v5.6 this is the first time we are close to hardened and race-free (e.g. symlinks) mounting and path resolution. While userspace has started porting to the new mount api to mount proper filesystems and create new bind-mounts it is currently not possible to change mount options of an already existing bind mount in the new mount api since the mount_setattr() syscall is missing. With the addition of the mount_setattr() syscall we remove this last restriction and userspace can now fully port to the new mount api, covering every use-case the old mount api could. We also add the crucial ability to recursively change mount options for a whole mount tree, both removing and adding mount options at the same time. This syscall has been requested multiple times by various people and projects. There is a simple tool available at https://github.com/brauner/mount-idmapped that allows to create idmapped mounts so people can play with this patch series. I'll add support for the regular mount binary should you decide to pull this in the following weeks: Here's an example to a simple idmapped mount of another user's home directory: u1001@f2-vm:/$ sudo ./mount --idmap both:1000:1001:1 /home/ubuntu/ /mnt u1001@f2-vm:/$ ls -al /home/ubuntu/ total 28 drwxr-xr-x 2 ubuntu ubuntu 4096 Oct 28 22:07 . drwxr-xr-x 4 root root 4096 Oct 28 04:00 .. -rw------- 1 ubuntu ubuntu 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 ubuntu ubuntu 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 ubuntu ubuntu 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 ubuntu ubuntu 807 Feb 25 2020 .profile -rw-r--r-- 1 ubuntu ubuntu 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 ubuntu ubuntu 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ ls -al /mnt/ total 28 drwxr-xr-x 2 u1001 u1001 4096 Oct 28 22:07 . drwxr-xr-x 29 root root 4096 Oct 28 22:01 .. -rw------- 1 u1001 u1001 3154 Oct 28 22:12 .bash_history -rw-r--r-- 1 u1001 u1001 220 Feb 25 2020 .bash_logout -rw-r--r-- 1 u1001 u1001 3771 Feb 25 2020 .bashrc -rw-r--r-- 1 u1001 u1001 807 Feb 25 2020 .profile -rw-r--r-- 1 u1001 u1001 0 Oct 16 16:11 .sudo_as_admin_successful -rw------- 1 u1001 u1001 1144 Oct 28 00:43 .viminfo u1001@f2-vm:/$ touch /mnt/my-file u1001@f2-vm:/$ setfacl -m u:1001:rwx /mnt/my-file u1001@f2-vm:/$ sudo setcap -n 1001 cap_net_raw+ep /mnt/my-file u1001@f2-vm:/$ ls -al /mnt/my-file -rw-rwxr--+ 1 u1001 u1001 0 Oct 28 22:14 /mnt/my-file u1001@f2-vm:/$ ls -al /home/ubuntu/my-file -rw-rwxr--+ 1 ubuntu ubuntu 0 Oct 28 22:14 /home/ubuntu/my-file u1001@f2-vm:/$ getfacl /mnt/my-file getfacl: Removing leading '/' from absolute path names # file: mnt/my-file # owner: u1001 # group: u1001 user::rw- user:u1001:rwx group::rw- mask::rwx other::r-- u1001@f2-vm:/$ getfacl /home/ubuntu/my-file getfacl: Removing leading '/' from absolute path names # file: home/ubuntu/my-file # owner: ubuntu # group: ubuntu user::rw- user:ubuntu:rwx group::rw- mask::rwx other::r--" * tag 'idmapped-mounts-v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: (41 commits) xfs: remove the possibly unused mp variable in xfs_file_compat_ioctl xfs: support idmapped mounts ext4: support idmapped mounts fat: handle idmapped mounts tests: add mount_setattr() selftests fs: introduce MOUNT_ATTR_IDMAP fs: add mount_setattr() fs: add attr_flags_to_mnt_flags helper fs: split out functions to hold writers namespace: only take read lock in do_reconfigure_mnt() mount: make {lock,unlock}_mount_hash() static namespace: take lock_mount_hash() directly when changing flags nfs: do not export idmapped mounts overlayfs: do not mount on top of idmapped mounts ecryptfs: do not mount on top of idmapped mounts ima: handle idmapped mounts apparmor: handle idmapped mounts fs: make helpers idmap mount aware exec: handle idmapped mounts would_dump: handle idmapped mounts ...	2021-02-23 13:39:45 -08:00
Linus Torvalds	c63dca9e23	Description for this pull request: - Improve file deletion performance with dirsync mount option. - fix shift-out-of-bounds in exfat_fill_super() generated by syzkaller. -----BEGIN PGP SIGNATURE----- iQJMBAABCgA2FiEE6NzKS6Uv/XAAGHgyZwv7A1FEIQgFAmAzA7MYHG5hbWphZS5q ZW9uQHNhbXN1bmcuY29tAAoJEGcL+wNRRCEIYKMP/1i4uV1KebeQTkIiNlPMae+H hq2TgkrOLp7gKY+G8VoSrYIG9U5fEGh+bo6UwIXQpGDrnFb9AWAE0YoyCUV0sTGS 6CQmmjl1Cq4gwmBs2yjkbmGa54d2NRExZc2zMewDxnWRQjIvqJftxh1qowjwN13j arfjvWGJ61Je/S1jQ0vS0+nC5f4Mcy+60JyAQB4gxX5327o1ZVohs16jYTTZuS4+ PsARwM+q3wYQ/THeTf01E/a77IfsS49EtR1aTo8/9fCfQCxKdYT0wkJQQ/J0yhTL l6LMp1PNHV3iePRPZWAF15n8eB2dXGfcaathgtecY+9arKJVDnAoALt91gxN1bxE ZZZWkOlviN4FmdQvKNtVaBhdAxNqybe1P0yLvH1Vm2nY0oyY+qgQ79A/xNWcBvhg Q6Y1fV08KLWRPsafoZVntb3pd/DT1ekeedrFuqPe3c6kTziFn1GmUhb+6DGCEoFs 2sbuxATOJCdNL4GHLt0CRNQYU2Dm6g0yuQ3qodu4wcco9yUgiNXe2sM6NpMXaTH7 XUlgsaxl8zO5TKrVfre93gIP48HRf2/gZu8ejXs2rUEIzhSy3rsIMyt2DhTUe9Db CW2SMGQADG9Aiod4gRaunvMrTE3S7NMXLsOG5K6eThar8fOnqQ/F0NHi5DUWuFdw 17G5gwimXbZxW0GhTvWH =kzJV -----END PGP SIGNATURE----- Merge tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - improve file deletion performance with dirsync mount option - fix shift-out-of-bounds in exfat_fill_super() reported by syzkaller * tag 'exfat-for-5.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: improve performance of exfat_free_cluster when using dirsync mount option exfat: fix shift-out-of-bounds in exfat_fill_super()	2021-02-22 13:15:50 -08:00
Hyeongseok Kim	f728760aa9	exfat: improve performance of exfat_free_cluster when using dirsync mount option There are stressful update of cluster allocation bitmap when using dirsync mount option which is doing sync buffer on every cluster bit clearing. This could result in performance degradation when deleting big size file. Fix to update only when the bitmap buffer index is changed would make less disk access, improving performance especially for truncate operation. Testing with Samsung 256GB sdcard, mounted with dirsync option (mount -t exfat /dev/block/mmcblk0p1 /temp/mount -o dirsync) Remove 4GB file, blktrace result. [Before] : 39 secs. Total (blktrace): Reads Queued: 0, 0KiB Writes Queued: 32775, 16387KiB Read Dispatches: 0, 0KiB Write Dispatches: 32775, 16387KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 0, 0KiB Writes Completed: 32775, 16387KiB Read Merges: 0, 0KiB Write Merges: 0, 0KiB IO unplugs: 2 Timer unplugs: 0 [After] : 1 sec. Total (blktrace): Reads Queued: 0, 0KiB Writes Queued: 13, 6KiB Read Dispatches: 0, 0KiB Write Dispatches: 13, 6KiB Reads Requeued: 0 Writes Requeued: 0 Reads Completed: 0, 0KiB Writes Completed: 13, 6KiB Read Merges: 0, 0KiB Write Merges: 0, 0KiB IO unplugs: 1 Timer unplugs: 0 Signed-off-by: Hyeongseok Kim <hyeongseok@gmail.com> Acked-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-02-22 09:55:14 +09:00
Namjae Jeon	78c276f549	exfat: fix shift-out-of-bounds in exfat_fill_super() syzbot reported a warning which could cause shift-out-of-bounds issue. Call Trace: __dump_stack lib/dump_stack.c:79 [inline] dump_stack+0x183/0x22e lib/dump_stack.c:120 ubsan_epilogue lib/ubsan.c:148 [inline] __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395 exfat_read_boot_sector fs/exfat/super.c:471 [inline] __exfat_fill_super fs/exfat/super.c:556 [inline] exfat_fill_super+0x2acb/0x2d00 fs/exfat/super.c:624 get_tree_bdev+0x406/0x630 fs/super.c:1291 vfs_get_tree+0x86/0x270 fs/super.c:1496 do_new_mount fs/namespace.c:2881 [inline] path_mount+0x1937/0x2c50 fs/namespace.c:3211 do_mount fs/namespace.c:3224 [inline] __do_sys_mount fs/namespace.c:3432 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3409 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 exfat specification describe sect_per_clus_bits field of boot sector could be at most 25 - sect_size_bits and at least 0. And sect_size_bits can also affect this calculation, It also needs validation. This patch add validation for sect_per_clus_bits and sect_size_bits field of boot sector. Fixes: `719c1e1829` ("exfat: add super block operations") Cc: stable@vger.kernel.org # v5.9+ Reported-by: syzbot+da4fe66aaadd3c2e2d1c@syzkaller.appspotmail.com Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com> Tested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>	2021-02-22 09:55:13 +09:00
Christoph Hellwig	c6bf3f0e25	block: use an on-stack bio in blkdev_issue_flush There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00

1 2 3

119 Commits