-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmScT18ACgkQnJ2qBz9k
QNnlqAf/bIU+I3Qd3EUpzWrOXEyRjaUggRnb4ibIH2I6DjSAP4wtm5wiG/+wjDFe
v+gdRd8PlAlHbZJvW3WUxeSzWendqd78i2lgwFN+s2QCVtQSUsNy7mtUvOL2b1zy
Kf35vTNbkKE0TevoqHZmoT/mehSBj6Zt4k5POMalfxwnJHoVF25OqHEQQc8vnOjv
as/uMaHVwK/Q0pMafTz8vt9Fogkdqe6A+qLLxTvG6iQKd2Z0NdYK2GxR0oTVhDOK
Ly+h1evRldgOcrishrje00LZT8SznUQkWBjIpPN/HbXR1qc5Jk+BYJUqT2jg7zVd
EW61U79nsaugpTUicpTUIluUZ7/QKA==
=toKL
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull misc filesystem updates from Jan Kara:
- Rewrite kmap_local() handling in ext2
- Convert ext2 direct IO path to iomap (with some infrastructure tweaks
associated with that)
- Convert two boilerplate licenses in udf to SPDX identifiers
- Other small udf, ext2, and quota fixes and cleanups
* tag 'fs_for_v6.5-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix uninitialized array access for some pathnames
ext2: Drop fragment support
quota: fix warning in dqgrab()
quota: Properly disable quotas when add_dquot_ref() fails
fs: udf: udftime: Replace LGPL boilerplate with SPDX identifier
fs: udf: Replace GPL 2.0 boilerplate license notice with SPDX identifier
fs: Drop wait_unfrozen wait queue
ext2_find_entry()/ext2_dotdot(): callers don't need page_addr anymore
ext2_{set_link,delete_entry}(): don't bother with page_addr
ext2_put_page(): accept any pointer within the page
ext2_get_page(): saner type
ext2: use offset_in_page() instead of open-coding it as subtraction
ext2_rename(): set_link and delete_entry may fail
ext2: Add direct-io trace points
ext2: Move direct-io to use iomap
ext2: Use generic_buffers_fsync() implementation
ext4: Use generic_buffers_fsync_noflush() implementation
fs/buffer.c: Add generic_buffers_fsync*() implementation
ext2/dax: Fix ext2_setsize when len is page aligned
Ext2 has fields in superblock reserved for subblock allocation support.
However that never landed. Drop the many years dead code.
Reported-by: syzbot+af5e10f73dbff48f70af@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
... and that's how it should've been done in the first place
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
ext2_set_link() simply doesn't use it anymore and ext2_delete_entry()
can easily obtain it from the directory entry pointer...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
eliminates the need to keep the pointer to the first byte within
the page if we are guaranteed to have pointers to some byte
in the same page at hand.
Don't backport without commit 88d7b12068 ("highmem: round down the
address passed to kunmap_flush_on_unmap()").
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
We need to pass to caller both the page reference and pointer to the
first byte in the now-mapped page. The former always has the same type,
the latter varies from caller to caller. So make it
void *ext2_get_page(...., struct page **page)
rather than
struct page *ext2_get_page(..., void **page_addr)
and avoid the casts...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Tested-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
This patch adds the trace point to ext2 direct-io apis
in fs/ext2/file.c
Here is how the output looks like
a.out-467865 [006] 6758.170968: ext2_dio_write_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT|WRITE aio 1 ret 0
a.out-467865 [006] 6758.171061: ext2_dio_write_end: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 0 flags DIRECT|WRITE aio 1 ret -529
kworker/3:153-444162 [003] 6758.171252: ext2_dio_write_endio: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT|WRITE aio 1 ret 0
a.out-468222 [001] 6761.628924: ext2_dio_read_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 1 ret 0
a.out-468222 [001] 6761.629063: ext2_dio_read_end: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 0 flags DIRECT aio 1 ret -529
a.out-468428 [005] 6763.937454: ext2_dio_write_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0
a.out-468428 [005] 6763.937829: ext2_dio_write_endio: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0
a.out-468428 [005] 6763.937847: ext2_dio_write_end: dev 7:12 ino 0xe isize 0x1000 pos 0x1000 len 0 flags DIRECT aio 0 ret 4096
a.out-468609 [000] 6765.702878: ext2_dio_read_begin: dev 7:12 ino 0xe isize 0x1000 pos 0x0 len 4096 flags DIRECT aio 0 ret 0
a.out-468609 [000] 6765.703243: ext2_dio_read_end: dev 7:12 ino 0xe isize 0x1000 pos 0x1000 len 0 flags DIRECT aio 0 ret 4096
Reported-and-tested-by: Disha Goel <disgoel@linux.ibm.com>
[Need to add CFLAGS_trace for fixing unable to find trace file problem]
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <b8b0897fa2b273a448d7b4ba7317357ac73c08bc.1682069716.git.ritesh.list@gmail.com>
This patch converts ext2 direct-io path to iomap interface.
- This also takes care of DIO_SKIP_HOLES part in which we return -ENOTBLK
from ext2_iomap_begin(), in case if the write is done on a hole.
- This fallbacks to buffered-io in case of DIO_SKIP_HOLES or in case of
a partial write or if any error is detected in ext2_iomap_end().
We try to return -ENOTBLK in such cases.
- For any unaligned or extending DIO writes, we pass
IOMAP_DIO_FORCE_WAIT flag to ensure synchronous writes.
- For extending writes we set IOMAP_F_DIRTY in ext2_iomap_begin because
otherwise with dsync writes on devices that support FUA, generic_write_sync
won't be called and we might miss inode metadata updates.
- Since ext2 already now uses _nolock vartiant of sync write. Hence
there is no inode lock problem with iomap in this patch.
- ext2_iomap_ops are now being shared by DIO, DAX & fiemap path
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <610b672a52f2a7ff6dc550fd14d0f995806232a5.1682069716.git.ritesh.list@gmail.com>
Next patch converts ext2 to use iomap interface for DIO.
iomap layer can call generic_write_sync() -> ext2_fsync() from
iomap_dio_complete while still holding the inode_lock().
Now writeback from other paths doesn't need inode_lock().
It seems there is also no need of an inode_lock() for
sync_mapping_buffers(). It uses it's own mapping->private_lock
for it's buffer list handling.
Hence this patch is in preparation to move ext2 to iomap.
This uses generic_buffers_fsync() which does not take any inode_lock()
in ext2_fsync().
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <76d206a464574ff91db25bc9e43479b51ca7e307.1682069716.git.ritesh.list@gmail.com>
PAGE_ALIGN(x) macro gives the next highest value which is multiple of
pagesize. But if x is already page aligned then it simply returns x.
So, if x passed is 0 in dax_zero_range() function, that means the
length gets passed as 0 to ->iomap_begin().
In ext2 it then calls ext2_get_blocks -> max_blocks as 0 and hits bug_on
here in ext2_get_blocks().
BUG_ON(maxblocks == 0);
Instead we should be calling dax_truncate_page() here which takes
care of it. i.e. it only calls dax_zero_range if the offset is not
page/block aligned.
This can be easily triggered with following on fsdax mounted pmem
device.
dd if=/dev/zero of=file count=1 bs=512
truncate -s 0 file
[79.525838] EXT2-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk
[79.529376] ext2 filesystem being mounted at /mnt1/test supports timestamps until 2038 (0x7fffffff)
[93.793207] ------------[ cut here ]------------
[93.795102] kernel BUG at fs/ext2/inode.c:637!
[93.796904] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[93.798659] CPU: 0 PID: 1192 Comm: truncate Not tainted 6.3.0-rc2-xfstests-00056-g131086faa369 #139
[93.806459] RIP: 0010:ext2_get_blocks.constprop.0+0x524/0x610
<...>
[93.835298] Call Trace:
[93.836253] <TASK>
[93.837103] ? lock_acquire+0xf8/0x110
[93.838479] ? d_lookup+0x69/0xd0
[93.839779] ext2_iomap_begin+0xa7/0x1c0
[93.841154] iomap_iter+0xc7/0x150
[93.842425] dax_zero_range+0x6e/0xa0
[93.843813] ext2_setsize+0x176/0x1b0
[93.845164] ext2_setattr+0x151/0x200
[93.846467] notify_change+0x341/0x4e0
[93.847805] ? lock_acquire+0xf8/0x110
[93.849143] ? do_truncate+0x74/0xe0
[93.850452] ? do_truncate+0x84/0xe0
[93.851739] do_truncate+0x84/0xe0
[93.852974] do_sys_ftruncate+0x2b4/0x2f0
[93.854404] do_syscall_64+0x3f/0x90
[93.855789] entry_SYSCALL_64_after_hwframe+0x72/0xdc
CC: stable@vger.kernel.org
Fixes: 2aa3048e03 ("iomap: switch iomap_zero_range to use iomap_iter")
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <046a58317f29d9603d1068b2bbae47c2332c17ae.1682069716.git.ritesh.list@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmRHq24ACgkQnJ2qBz9k
QNkowggAuyZ313GrAUNDH2Cvu3PE804LpW6k27VxtIxmgtMSZvuOED0fC99Xt+3y
SnQrJKfw4HVKIDT2sx3ECaiKa+v1UBMG+KmaZ5u5qLjxfjY8YHy6TbHMwY/NV2Aj
jGgJYN+5E9AJqStzuu4C9fUzgOgvi4IzgNX4hp0WbESVk6jbDqW2KGNAv6IRM9ku
qDg54aL8W50uDMgUJysqe180SlDgeYGPBg1OfA7xrzMaBpEK3z5+v52ZlpGNguDP
MZMGwZgFLCP2R5Z9+zqhyK9UuIHUDNE05y20+Hsm1AxlI0iVVt+CahQ9T/6P9OBT
QQ+vnepQR6ORtFBlPCVgyiTQgsO3Tw==
=+MkS
-----END PGP SIGNATURE-----
Merge tag 'fs_for_v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2, reiserfs, udf, and quota updates from Jan Kara:
"A couple of small fixes and cleanups for ext2, udf, reiserfs, and
quota.
The biggest change is making CONFIG_PRINT_QUOTA_WARNING depend on
BROKEN with an outlook for removing it completely in an year or so"
* tag 'fs_for_v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
quota: mark PRINT_QUOTA_WARNING as BROKEN
quota: update Kconfig comment
reiserfs: remove unused iter variable
quota: Use register_sysctl_init() for registering fs_dqstats_table
reiserfs: remove unused sched_count variable
ext2: remove redundant assignment to pointer end
quota: make dquot_set_dqinfo return errors from ->write_info
quota: fixup *_write_file_info() to return proper error code
quota: simplify two-level sysctl registration for fs_dqstats_table
udf: use wrapper i_blocksize() in udf_discard_prealloc()
udf: Use folios in udf_adinicb_writepage()
ext2: Check block size validity during mount
ext2: Correct maximum ext2 filesystem block size
Pointer is assigned a value that is never read, the assignment is
redundant and can be removed.
Cleans up clang-scan warning:
fs/ext2/xattr.c:555:3: warning: Value stored to 'end' is never read [deadcode.DeadStores]
end = (char *)header + sb->s_blocksize;
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230317143420.419005-1-colin.i.king@gmail.com>
Check that log of block size stored in the superblock has sensible
value. Otherwise the shift computing the block size can overflow leading
to undefined behavior.
Reported-by: syzbot+4fec412f59eba8c01b77@syzkaller.appspotmail.com
Signed-off-by: Jan Kara <jack@suse.cz>
Ext2 has traditionally supported filesystem block sizes upto page size
or upto 65536. Macro EXT2_MAX_BLOCK_SIZE is set to 4096, however that is
never used in ext2 so practically we always allowed whatever
sb_set_blocksize() accepted. Fix value of EXT2_MAX_BLOCK_SIZE because it
will be used in the next patch.
Signed-off-by: Jan Kara <jack@suse.cz>
Reflect in their naming and document that they are kept around for
legacy reasons and shouldn't be used anymore by new code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Remove struct posix_acl_{access,default}_handler for all filesystems
that don't depend on the xattr handler in their inode->i_op->listxattr()
method in any way. There's nothing more to do than to simply remove the
handler. It's been effectively unused ever since we introduced the new
posix acl api.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmPueAQQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgplopEACo17a4Z2p2xCedA0NCqX2ggtsSIdYiluPm
pgdBzIEsgwKo1XVLGRgGiC8VdMRuzO4Zh/NGn4iRF1a68wjgjnwGWY7r052TUoSr
q1yya739BpffnkXjj15x86cwl+5rHv2RQkm15+2HqBgcruA63/ZgdKBtjj+EtVKs
zYOlmgyfFbkn8AdULMGiDKP4lixV8gUelv6vWneBwNrj4iSLnuN1+8nJNsl4wxwg
ImSpx63AzhUoeL6byc+fmiA8fZhDhSvwON2tCyyCmOjlFM/TLrsm5t1juWiDid1O
UROkQwQtsmjSUq3ow5fRJfjbZ3HLa1uGQr95DYHy0OBRAteAhDY5Upv0DXNL0ZBh
uNNg8AXtJbyc+pLHWnncyiTzi+3eWs7WiMn04/a5eDhFvcJ0PZjLIgRi5j1ezUS1
bWqoPaAIxoMD83WoMxjnKvBpGeMzPHvNTijeZjkGOu0vOk8JhXqNmLTjNG9aLtzf
1Nvp0o8AqtQAW7cgFazZSWtw4bPk/wZ7mW0zHtqLDHIzXkc7A/Uo0ftdv84G08aW
pvakNz4aNLwSPf7hxgPP9SgS9CeHhxK8PS6uk3V788SI8qGiew11+EcTNGkQNmvw
/ItCo93UaWD/7SZLObTLslmet7rFHzz6PXaXrMxrSvaeZMkgr7DWEy9XS+ueOtXO
fS8QhJX11w==
=IU45
-----END PGP SIGNATURE-----
Merge tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux
Pull legacy dio update from Jens Axboe:
"We only have a few file systems that use the old dio code, make them
select it rather than build it unconditionally"
* tag 'for-6.3/dio-2023-02-16' of git://git.kernel.dk/linux:
fs: build the legacy direct I/O code conditionally
fs: move sb_init_dio_done_wq out of direct-io.c
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmPvZWkACgkQnJ2qBz9k
QNkVtQf/V515KIb7lEkwOjlF7AQg5MS/c8zuodHISzWYhZyd0KSTb+qnF/QzLvVm
cJ3cIPXrIyVw4Eeqh0qQvukOYCcBvUa1IBW5kePiy3mHiHRD2PRgaxSGBXTXqqPG
xXagwllrn3/mG4ZKXlNYFrzgoshSFFeBdkLGEi+/L6DAe0B+mG+FIHON1eWylgOT
j1D+/k9RNvRhRU8WtStcI4u9mnVPqUI2RSWUpjxuNzUPtyFflPVNCz+bkXXovPQ0
ZQY2HeZcs7jsorCmeSHUzTt5bbj3BfhO3uWL4/wnHgp+88OBRUUyvMrNbOF97xd6
KFqbJVQbSevasUSEvCS+3+EChzjGoA==
=DaJi
-----END PGP SIGNATURE-----
Merge tag 'fixes_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull UDF and ext2 fixes from Jan Kara:
- Rewrite of udf directory iteration code to address multiple syzbot
reports
- Fixes to udf extent handling and block mapping code to address
several syzbot reports and filesystem corruption issues uncovered by
fsx & fsstress
- Convert udf to kmap_local()
- Add sanity checks when loading udf bitmaps
- Drop old VARCONV support which I've never seen used and which was
broken for quite some years without anybody noticing
- Finish conversion of ext2 to kmap_local()
- One fix to mpage_writepages() on which other udf fixes depend
* tag 'fixes_for_v6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (78 commits)
udf: Avoid directory type conversion failure due to ENOMEM
udf: Use unsigned variables for size calculations
udf: remove reporting loc in debug output
udf: Check consistency of Space Bitmap Descriptor
udf: Fix file counting in LVID
udf: Limit file size to 4TB
udf: Don't return bh from udf_expand_dir_adinicb()
udf: Convert udf_expand_file_adinicb() to avoid kmap_atomic()
udf: Convert udf_adinicb_writepage() to memcpy_to_page()
udf: Switch udf_adinicb_readpage() to kmap_local_page()
udf: Move udf_adinicb_readpage() to inode.c
udf: Mark aops implementation static
udf: Switch to single address_space_operations
udf: Add handling of in-ICB files to udf_bmap()
udf: Convert all file types to use udf_write_end()
udf: Convert in-ICB files to use udf_write_begin()
udf: Convert in-ICB files to use udf_direct_IO()
udf: Convert in-ICB files to use udf_writepages()
udf: Unify .read_folio for normal and in-ICB files
udf: Fix off-by-one error when discarding preallocation
...
Add a new LEGACY_DIRECT_IO config symbol that is only selected by the
file systems that still use the legacy blockdev_direct_IO code, so that
kernels without support for those file systems don't need to build the
code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Link: https://lore.kernel.org/r/20230125065839.191256-3-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Convert to struct mnt_idmap.
Last cycle we merged the necessary infrastructure in
256c8aed2b ("fs: introduce dedicated idmap type for mounts").
This is just the conversion to struct mnt_idmap.
Currently we still pass around the plain namespace that was attached to a
mount. This is in general pretty convenient but it makes it easy to
conflate namespaces that are relevant on the filesystem with namespaces
that are relevent on the mount level. Especially for non-vfs developers
without detailed knowledge in this area this can be a potential source for
bugs.
Once the conversion to struct mnt_idmap is done all helpers down to the
really low-level helpers will take a struct mnt_idmap argument instead of
two namespace arguments. This way it becomes impossible to conflate the two
eliminating the possibility of any bugs. All of the vfs and all filesystems
only operate on struct mnt_idmap.
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Propagate errors from ext2_prepare_chunk to the callers and handle them
there. While touching the prototype also turn update_times into a bool
from the current int used as bool.
[JK: fixed up error recovery path in ext2_rename()]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20230116085205.2342975-1-hch@lst.de>
kmap_atomic() is deprecated in favor of kmap_local_page(). Therefore,
replace kmap_atomic() with kmap_local_page().
kmap_atomic() is implemented like a kmap_local_page() which also disables
page-faults and preemption (the latter only for !PREEMPT_RT kernels).
However, the code within the mapping and un-mapping in ext2_make_empty()
does not depend on the above-mentioned side effects.
Therefore, a mere replacement of the old API with the new one is all it
is required (i.e., there is no need to explicitly add any calls to
pagefault_disable() and/or preempt_disable()).
Suggested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Fabio M. De Francesco <fmdefrancesco@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20221231174205.8492-1-fmdefrancesco@gmail.com>
-----BEGIN PGP SIGNATURE-----
iQEzBAABCAAdFiEEq1nRK9aeMoq1VSgcnJ2qBz9kQNkFAmOXWlMACgkQnJ2qBz9k
QNltfwf9GIEZASzX/v2LXYInbgvJnuRxeNoNFDZjatz+Nk17fJ9P0Fzcu7OztMTk
+4HX3pxXYn/eFTyxVf5c3C2gU4KDV2InrYk+IyA7unZ92ROO5uaxrrknSPouYXoO
fd0zwlMQ8bxk7wgjSnG+0Q38dbWr9XgYRqcURjXvRG9e68o49SXTc333lXc+l25X
WphjDK6d1gXWiHKdYVYiROF7HjAjaeRk8clXtFhHmyGvhi+wvfP6mqOhzMCRuR7U
M1dYR/B2+AJieOmVK1gqsLFc2f/TN3AEYMsRi256vYEuQhY7WRkxQw6afmYsLc8J
sjj4mR15SwZewLtIlNbX3phvi1OBWA==
=Q4qw
-----END PGP SIGNATURE-----
Merge tag 'fixes_for_v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf and ext2 fixes from Jan Kara:
- a couple of smaller cleanups and fixes for ext2
- fixes of a data corruption issues in udf when handling holes and
preallocation extents
- fixes and cleanups of several smaller issues in udf
- add maintainer entry for isofs
* tag 'fixes_for_v6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix extending file within last block
udf: Discard preallocation before extending file with a hole
udf: Do not bother looking for prealloc extents if i_lenExtents matches i_size
udf: Fix preallocation discarding at indirect extent boundary
udf: Increase UDF_MAX_READ_VERSION to 0x0260
fs/ext2: Fix code indentation
ext2: unbugger ext2_empty_dir()
udf: remove ->writepage
ext2: remove ->writepage
ext2: Don't flush page immediately for DIRSYNC directories
ext2: Fix some kernel-doc warnings
maintainers: Add ISOFS entry
udf: Avoid double brelse() in udf_rename()
fs: udf: Optimize udf_free_in_core_inode and udf_find_fileset function
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCY5bwTgAKCRCRxhvAZXjc
ovd2AQCK00NAtGjQCjQPQGyTa4GAPqvWgq1ef0lnhv+TL5US5gD9FncQ8UofeMXt
pBfjtAD6ettTPCTxUQfnTwWEU4rc7Qg=
=27Wm
-----END PGP SIGNATURE-----
Merge tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull VFS acl updates from Christian Brauner:
"This contains the work that builds a dedicated vfs posix acl api.
The origins of this work trace back to v5.19 but it took quite a while
to understand the various filesystem specific implementations in
sufficient detail and also come up with an acceptable solution.
As we discussed and seen multiple times the current state of how posix
acls are handled isn't nice and comes with a lot of problems: The
current way of handling posix acls via the generic xattr api is error
prone, hard to maintain, and type unsafe for the vfs until we call
into the filesystem's dedicated get and set inode operations.
It is already the case that posix acls are special-cased to death all
the way through the vfs. There are an uncounted number of hacks that
operate on the uapi posix acl struct instead of the dedicated vfs
struct posix_acl. And the vfs must be involved in order to interpret
and fixup posix acls before storing them to the backing store, caching
them, reporting them to userspace, or for permission checking.
Currently a range of hacks and duct tape exist to make this work. As
with most things this is really no ones fault it's just something that
happened over time. But the code is hard to understand and difficult
to maintain and one is constantly at risk of introducing bugs and
regressions when having to touch it.
Instead of continuing to hack posix acls through the xattr handlers
this series builds a dedicated posix acl api solely around the get and
set inode operations.
Going forward, the vfs_get_acl(), vfs_remove_acl(), and vfs_set_acl()
helpers must be used in order to interact with posix acls. They
operate directly on the vfs internal struct posix_acl instead of
abusing the uapi posix acl struct as we currently do. In the end this
removes all of the hackiness, makes the codepaths easier to maintain,
and gets us type safety.
This series passes the LTP and xfstests suites without any
regressions. For xfstests the following combinations were tested:
- xfs
- ext4
- btrfs
- overlayfs
- overlayfs on top of idmapped mounts
- orangefs
- (limited) cifs
There's more simplifications for posix acls that we can make in the
future if the basic api has made it.
A few implementation details:
- The series makes sure to retain exactly the same security and
integrity module permission checks. Especially for the integrity
modules this api is a win because right now they convert the uapi
posix acl struct passed to them via a void pointer into the vfs
struct posix_acl format to perform permission checking on the mode.
There's a new dedicated security hook for setting posix acls which
passes the vfs struct posix_acl not a void pointer. Basing checking
on the posix acl stored in the uapi format is really unreliable.
The vfs currently hacks around directly in the uapi struct storing
values that frankly the security and integrity modules can't
correctly interpret as evidenced by bugs we reported and fixed in
this area. It's not necessarily even their fault it's just that the
format we provide to them is sub optimal.
- Some filesystems like 9p and cifs need access to the dentry in
order to get and set posix acls which is why they either only
partially or not even at all implement get and set inode
operations. For example, cifs allows setxattr() and getxattr()
operations but doesn't allow permission checking based on posix
acls because it can't implement a get acl inode operation.
Thus, this patch series updates the set acl inode operation to take
a dentry instead of an inode argument. However, for the get acl
inode operation we can't do this as the old get acl method is
called in e.g., generic_permission() and inode_permission(). These
helpers in turn are called in various filesystem's permission inode
operation. So passing a dentry argument to the old get acl inode
operation would amount to passing a dentry to the permission inode
operation which we shouldn't and probably can't do.
So instead of extending the existing inode operation Christoph
suggested to add a new one. He also requested to ensure that the
get and set acl inode operation taking a dentry are consistently
named. So for this version the old get acl operation is renamed to
->get_inode_acl() and a new ->get_acl() inode operation taking a
dentry is added. With this we can give both 9p and cifs get and set
acl inode operations and in turn remove their complex custom posix
xattr handlers.
In the future I hope to get rid of the inode method duplication but
it isn't like we have never had this situation. Readdir is just one
example. And frankly, the overall gain in type safety and the more
pleasant api wise are simply too big of a benefit to not accept
this duplication for a while.
- We've done a full audit of every codepaths using variant of the
current generic xattr api to get and set posix acls and
surprisingly it isn't that many places. There's of course always a
chance that we might have missed some and if so I'm sure we'll find
them soon enough.
The crucial codepaths to be converted are obviously stacking
filesystems such as ecryptfs and overlayfs.
For a list of all callers currently using generic xattr api helpers
see [2] including comments whether they support posix acls or not.
- The old vfs generic posix acl infrastructure doesn't obey the
create and replace semantics promised on the setxattr(2) manpage.
This patch series doesn't address this. It really is something we
should revisit later though.
The patches are roughly organized as follows:
(1) Change existing set acl inode operation to take a dentry
argument (Intended to be a non-functional change)
(2) Rename existing get acl method (Intended to be a non-functional
change)
(3) Implement get and set acl inode operations for filesystems that
couldn't implement one before because of the missing dentry.
That's mostly 9p and cifs (Intended to be a non-functional
change)
(4) Build posix acl api, i.e., add vfs_get_acl(), vfs_remove_acl(),
and vfs_set_acl() including security and integrity hooks
(Intended to be a non-functional change)
(5) Implement get and set acl inode operations for stacking
filesystems (Intended to be a non-functional change)
(6) Switch posix acl handling in stacking filesystems to new posix
acl api now that all filesystems it can stack upon support it.
(7) Switch vfs to new posix acl api (semantical change)
(8) Remove all now unused helpers
(9) Additional regression fixes reported after we merged this into
linux-next
Thanks to Seth for a lot of good discussion around this and
encouragement and input from Christoph"
* tag 'fs.acl.rework.v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping: (36 commits)
posix_acl: Fix the type of sentinel in get_acl
orangefs: fix mode handling
ovl: call posix_acl_release() after error checking
evm: remove dead code in evm_inode_set_acl()
cifs: check whether acl is valid early
acl: make vfs_posix_acl_to_xattr() static
acl: remove a slew of now unused helpers
9p: use stub posix acl handlers
cifs: use stub posix acl handlers
ovl: use stub posix acl handlers
ecryptfs: use stub posix acl handlers
evm: remove evm_xattr_acl_change()
xattr: use posix acl api
ovl: use posix acl api
ovl: implement set acl method
ovl: implement get acl method
ecryptfs: implement set acl method
ecryptfs: implement get acl method
ksmbd: use vfs_remove_acl()
acl: add vfs_remove_acl()
...
ts=4 can cause misunderstanding in code reading. It is better to replace
8 spaces with one tab.
Signed-off-by: Rong Tao <rongtao@cestc.cn>
Signed-off-by: Jan Kara <jack@suse.cz>
In 27cfa25895 "ext2: fix fs corruption when trying to remove
a non-empty directory with IO error" a funny thing has happened:
- page = ext2_get_page(inode, i, dir_has_error, &page_addr);
+ page = ext2_get_page(inode, i, 0, &page_addr);
- if (IS_ERR(page)) {
- dir_has_error = 1;
- continue;
- }
+ if (IS_ERR(page))
+ goto not_empty;
And at not_empty: we hit ext2_put_page(page, page_addr), which does
put_page(page). Which, unless I'm very mistaken, should oops
immediately when given ERR_PTR(-E...) as page.
OK, shit happens, insufficiently tested patches included. But when
commit in question describes the fault-injection test that exercised
that particular failure exit...
Ow.
CC: stable@vger.kernel.org
Fixes: 27cfa25895 ("ext2: fix fs corruption when trying to remove a non-empty directory with IO error")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jan Kara <jack@suse.cz>
->writepage is a very inefficient method to write back data, and only
used through write_cache_pages or as a fallback when no ->migrate_folio
method is present.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jan Kara <jack@suse.cz>
We do not need to writeout modified directory blocks immediately when
modifying them while the page is locked. It is enough to do the flush
somewhat later which has the added benefit that inode times can be
flushed as well. It also allows us to stop depending on
write_one_page() function.
Signed-off-by: Jan Kara <jack@suse.cz>
This is a simple mechanical transformation done by:
@@
expression E;
@@
- prandom_u32_max
+ get_random_u32_below
(E)
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Reviewed-by: SeongJae Park <sj@kernel.org> # for damon
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The current code provokes some kernel-doc warnings:
fs/ext2/dir.c:417: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
Signed-off-by: Bo Liu <liubo03@inspur.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jan Kara <jack@suse.cz>
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
The current inode operation for getting posix acls takes an inode
argument but various filesystems (e.g., 9p, cifs, overlayfs) need access
to the dentry. In contrast to the ->set_acl() inode operation we cannot
simply extend ->get_acl() to take a dentry argument. The ->get_acl()
inode operation is called from:
acl_permission_check()
-> check_acl()
-> get_acl()
which is part of generic_permission() which in turn is part of
inode_permission(). Both generic_permission() and inode_permission() are
called in the ->permission() handler of various filesystems (e.g.,
overlayfs). So simply passing a dentry argument to ->get_acl() would
amount to also having to pass a dentry argument to ->permission(). We
should avoid this unnecessary change.
So instead of extending the existing inode operation rename it from
->get_acl() to ->get_inode_acl() and add a ->get_acl() method later that
passes a dentry argument and which filesystems that need access to the
dentry can implement instead of ->get_inode_acl(). Filesystems like cifs
which allow setting and getting posix acls but not using them for
permission checking during lookup can simply not implement
->get_inode_acl().
This is intended to be a non-functional change.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Suggested-by/Inspired-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>
The current way of setting and getting posix acls through the generic
xattr interface is error prone and type unsafe. The vfs needs to
interpret and fixup posix acls before storing or reporting it to
userspace. Various hacks exist to make this work. The code is hard to
understand and difficult to maintain in it's current form. Instead of
making this work by hacking posix acls through xattr handlers we are
building a dedicated posix acl api around the get and set inode
operations. This removes a lot of hackiness and makes the codepaths
easier to maintain. A lot of background can be found in [1].
Since some filesystem rely on the dentry being available to them when
setting posix acls (e.g., 9p and cifs) they cannot rely on set acl inode
operation. But since ->set_acl() is required in order to use the generic
posix acl xattr handlers filesystems that do not implement this inode
operation cannot use the handler and need to implement their own
dedicated posix acl handlers.
Update the ->set_acl() inode method to take a dentry argument. This
allows all filesystems to rely on ->set_acl().
As far as I can tell all codepaths can be switched to rely on the dentry
instead of just the inode. Note that the original motivation for passing
the dentry separate from the inode instead of just the dentry in the
xattr handlers was because of security modules that call
security_d_instantiate(). This hook is called during
d_instantiate_new(), d_add(), __d_instantiate_anon(), and
d_splice_alias() to initialize the inode's security context and possibly
to set security.* xattrs. Since this only affects security.* xattrs this
is completely irrelevant for posix acls.
Link: https://lore.kernel.org/all/20220801145520.1532837-1-brauner@kernel.org [1]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner@kernel.org>