The ioctl encoding for this parameter is a long but the documentation says
it should be an int and the kernel drivers expect it to be an int. If the
fuse driver treats this as a long it might end up scribbling over the stack
of a userspace process that only allocated enough space for an int.
This was previously discussed in [1] and a patch for fuse was proposed in
[2]. From what I can tell the patch in [2] was nacked in favor of adding
new, "fixed" ioctls and using those from userspace. However there is still
no "fixed" version of these ioctls and the fact is that it's sometimes
infeasible to change all userspace to use the new one.
Handling the ioctls specially in the fuse driver seems like the most
pragmatic way for fuse servers to support them without causing crashes in
userspace applications that call them.
[1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/
[2]: https://sourceforge.net/p/fuse/mailman/message/31771759/
Signed-off-by: Chirantan Ekbote <chirantan@chromium.org>
Fixes: 59efec7b90 ("fuse: implement ioctl support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_writepages() ignores some errors taken from fuse_writepages_fill() I
believe it is a bug: if .writepages is called with WB_SYNC_ALL it should
either guarantee that all data was successfully saved or return error.
Fixes: 26d614df1d ("fuse: Implement writepages callback")
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_writepages_fill uses following construction:
if (wpa && ap->num_pages &&
(A || B || C)) {
action;
} else if (wpa && D) {
if (E) {
the same action;
}
}
- ap->num_pages check is always true and can be removed
- "if" and "else if" calls the same action and can be merged.
Move checking A, B, C, D, E conditions to a helper, add comments.
Original-patch-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which
triggers the following warning:
WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse]
RIP: 0010:tree_insert+0xab/0xc0 [fuse]
Call Trace:
fuse_writepages_fill+0x5da/0x6a0 [fuse]
write_cache_pages+0x171/0x470
fuse_writepages+0x8a/0x100 [fuse]
do_writepages+0x43/0xe0
Fix up the warning and clean up the code around rb-tree insertion:
- Rename tree_insert() to fuse_insert_writeback() and make it return the
conflicting entry in case of failure
- Re-add tree_insert() as a wrapper around fuse_insert_writeback()
- Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse
the meaning of the return value to mean
+ "true" in case the writepage entry was successfully added
+ "false" in case it was in-fligt queued on an existing writepage
entry's auxiliary list or the existing writepage entry's temporary
page updated
Switch from fuse_find_writeback() + tree_insert() to
fuse_insert_writeback()
- Move setting orig_pages to before inserting/updating the entry; this may
result in the orig_pages value being discarded later in case of an
in-flight request
- In case of a new writepage entry use fuse_writepage_add()
unconditionally, only set data->wpa if the entry was added.
Fixes: 6b2fb79963 ("fuse: optimize writepages search")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Original-path-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In fuse_writepage_end() the old writepages entry needs to be removed from
the rbtree before inserting the new one, otherwise tree_insert() would
fail. This is a very rare codepath and no reproducer exists.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSQHSd0lITzzeNWNm3h3BK/laaZPAUCXt/0GAAKCRDh3BK/laaZ
PIJjAP48TurDqomsQMBLiOsSUy0YIhd5QC/G5MYLKSBojXoR+gD+KfqXhVIDz0En
OI+K4674cNhf4CXNzUedU3qSOaJLfAU=
=PqbB
-----END PGP SIGNATURE-----
Merge tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Fix a rare deadlock in virtiofs
- Fix st_blocks in writeback cache mode
- Fix wrong checks in splice move causing spurious warnings
- Fix a race between a GETATTR request and a FUSE_NOTIFY_INVAL_INODE
notification
- Use rb-tree instead of linear search for pages currently under
writeout by userspace
- Fix copy_file_range() inconsistencies
* tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: copy_file_range should truncate cache
fuse: fix copy_file_range cache issues
fuse: optimize writepages search
fuse: update attr_version counter on fuse_notify_inval_inode()
fuse: don't check refcount after stealing page
fuse: fix weird page warning
fuse: use dump_page
virtiofs: do not use fuse_fill_super_common() for device installation
fuse: always allow query of st_dev
fuse: always flush dirty data on close(2)
fuse: invalidate inode attr in writeback cache mode
fuse: Update stale comment in queue_interrupt()
fuse: BUG_ON correction in fuse_dev_splice_write()
virtiofs: Add mount option and atime behavior to the doc
virtiofs: schedule blocking async replies in separate worker
Implement the new readahead operation in fuse by using __readahead_batch()
to fill the array of pages in fuse_args_pages directly. This lets us
inline fuse_readpages_fill() into fuse_readahead().
[willy@infradead.org: build fix]
Link: http://lkml.kernel.org/r/20200415025938.GB5820@bombadil.infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Gao Xiang <gaoxiang25@huawei.com>
Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: http://lkml.kernel.org/r/20200414150233.24495-25-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the copy operation completes the cache is not up-to-date. Truncate
all pages in the interval that has successfully been copied.
Truncating completely copied dirty pages is okay, since the data has been
overwritten anyway. Truncating partially copied dirty pages is not okay;
add a comment for now.
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
a) Dirty cache needs to be written back not just in the writeback_cache
case, since the dirty pages may come from memory maps.
b) The fuse_writeback_range() helper takes an inclusive interval, so the
end position needs to be pos+len-1 instead of pos+len.
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We want cached data to synced with the userspace filesystem on close(), for
example to allow getting correct st_blocks value. Do this regardless of
whether the userspace filesystem implements a FLUSH method or not.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Under writeback mode, inode->i_blocks is not updated, making utils du
read st.blocks as 0.
For example, when using virtiofs (cache=always & nondax mode) with
writeback_cache enabled, writing a new file and check its disk usage
with du, du reports 0 usage.
# uname -r
5.6.0-rc6+
# mount -t virtiofs virtiofs /mnt/virtiofs
# rm -f /mnt/virtiofs/testfile
# create new file and do extend write
# xfs_io -fc "pwrite 0 4k" /mnt/virtiofs/testfile
wrote 4096/4096 bytes at offset 0
4 KiB, 1 ops; 0.0001 sec (28.103 MiB/sec and 7194.2446 ops/sec)
# du -k /mnt/virtiofs/testfile
0 <==== disk usage is 0
# stat -c %s,%b /mnt/virtiofs/testfile
4096,0 <==== i_size is correct, but st_blocks is 0
Fix it by invalidating attr in fuse_flush(), so we get up-to-date attr
from server on next getattr.
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In virtiofs (unlike in regular fuse) processing of async replies is
serialized. This can result in a deadlock in rare corner cases when
there's a circular dependency between the completion of two or more async
replies.
Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR ==
SCRATCH_MNT (which is a misconfiguration):
- Process A is waiting for page lock in worker thread context and blocked
(virtio_fs_requests_done_work()).
- Process B is holding page lock and waiting for pending writes to
finish (fuse_wait_on_page_writeback()).
- Write requests are waiting in virtqueue and can't complete because
worker thread is blocked on page lock (process A).
Fix this by creating a unique work_struct for each async reply that can
block (O_DIRECT read).
Fixes: a62a8ef9d9 ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes coccicheck warning:
fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable
fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable
fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Handle the special case of fuse_readpages() wanting to read the last page
of a hugest file possible and overflowing the end offset in the process.
This is basically to unbreak xfstests:generic/525 and prevent filesystems
from doing bad things with an overflowing offset.
Reported-by: Xiao Yang <ice_yangxiao@163.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_direct_io() can end up advancing the iterator by more than the amount
of data read or written. This case is handled by the generic code if going
through ->direct_IO(), but not in the FOPEN_DIRECT_IO case.
Fix by reverting the extra bytes from the iterator in case of error or a
short count.
To test: install lxcfs, then the following testcase
int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY);
sendfile(1, fd, NULL, 16777216);
sendfile(1, fd, NULL, 16777216);
will spew WARN_ON() in iov_iter_pipe().
Reported-by: Peter Geis <pgwipeout@gmail.com>
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: 3c3db095b6 ("fuse: use iov_iter based generic splice helpers")
Cc: <stable@vger.kernel.org> # v5.1
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Buffered read in fuse normally goes via:
-> generic_file_buffered_read()
-> fuse_readpages()
-> fuse_send_readpages()
->fuse_simple_request() [called since v5.4]
In the case of a read request, fuse_simple_request() will return a
non-negative bytecount on success or a negative error value. A positive
bytecount was taken to be an error and the PG_error flag set on the page.
This resulted in generic_file_buffered_read() falling back to ->readpage(),
which would repeat the read request and succeed. Because of the repeated
read succeeding the bug was not detected with regression tests or other use
cases.
The FTP module in GVFS however fails the second read due to the
non-seekable nature of FTP downloads.
Fix by checking and ignoring positive return value from
fuse_simple_request().
Reported-by: Ondrej Holy <oholy@redhat.com>
Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441
Fixes: 134831e36b ("fuse: convert readpages to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
exit_aio() is sometimes stuck in wait_for_completion() after aio is issued
with direct IO and the task receives a signal.
The reason is failure to call ->ki_complete() due to a leaked reference to
fuse_io_priv. This happens in fuse_async_req_send() if
fuse_simple_background() returns an error (e.g. -EINTR).
In this case the error value is propagated via io->err, so return success
to not confuse callers.
This issue is tracked as a virtio-fs issue:
https://gitlab.com/virtio-fs/qemu/issues/14
Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Fixes: 45ac96ed7c ("fuse: convert direct_io to simple api")
Cc: <stable@vger.kernel.org> # v5.4
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Make sure filesystem is not returning a bogus number of bytes written.
Fixes: ea9b9907b8 ("fuse: implement perform_write")
Cc: <stable@vger.kernel.org> # v2.6.26
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Currently fuse_writepages_fill() calls get_fuse_inode() few times with
the same argument.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Make sure cached writes are not reordered around open(..., O_TRUNC), with
the obvious wrong results.
Fixes: 4d99ff8f12 ("fuse: Turn writeback cache on")
Cc: <stable@vger.kernel.org> # v3.15+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
account per-file, dentry, and inode data
blockdev/superblock and temporary per-request data was left alone, as
this usually isn't accounted
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Khazhismel Kumykov <khazhy@google.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
unlock_page() was missing in case of an already in-flight write against the
same page.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Fixes: ff17be0864 ("fuse: writepage: skip already in flight")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Since we cannot reserve the request structure up-front, make sure that the
request allocation doesn't fail using __GFP_NOFAIL.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Derive fuse_writepage_args from fuse_io_args.
Sending the request is tricky since it was done with fi->lock held, hence
we must either use atomic allocation or release the lock. Both are
possible so try atomic first and if it fails, release the lock and do the
regular allocation with GFP_NOFS and __GFP_NOFAIL. Both flags are
necessary for correct operation.
Move the page realloc function from dev.c to file.c and convert to using
fuse_writepage_args.
The last caller of fuse_write_fill() is gone, so get rid of it.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The old fuse_read_fill() helper can be deleted, now that the last user is
gone.
The fuse_io_args struct is moved to fuse_i.h so it can be shared between
readdir/read code.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the
functionality of the same named members in fuse_req.
fuse_short_read() can now take struct fuse_args_pages.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Change of semantics in fuse_async_req_send/fuse_send_(read|write): these
can now return error, in which case the 'end' callback isn't called, so the
fuse_io_args object needs to be freed.
Added verification that the return value is sane (less than or equal to the
requested read/write size).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Extract a fuse_write_flags() helper that converts ki_flags relevant write
to open flags.
The other parts of fuse_send_write() aren't used in the
fuse_perform_write() case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Derive fuse_io_args from struct fuse_args_pages. This will be used for
both synchronous and asynchronous read/write requests.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
This will allow the use of this function when converting to the simple api
(which doesn't use fuse_req).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_simple_request() is converted to return length of last (instead of
single) out arg, since FUSE_IOCTL_OUT has two out args, the second of which
is variable length.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fuse_req_pages_alloc() is moved to file.c, since its internal use by the
device code will eventually be removed.
Rename to fuse_pages_alloc() to signify that it's not only usable for
fuse_req page array.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
We can use the "force" flag to make sure the DESTROY request is always sent
to userspace. So no need to keep it allocated during the lifetime of the
filesystem.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Add 'force' to fuse_args and use fuse_get_req_nofail_nopages() to allocate
the request in that case.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Instead of complex games with a reserved request, just use __GFP_NOFAIL.
Both calers (flush, readdir) guarantee that connection was already
initialized, so no need to wait for fc->initialized.
Also remove unneeded clearing of FR_BACKGROUND flag.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
...to make future expansion simpler. The hiearachical structure is a
historical thing that does not serve any practical purpose.
The generated code is excatly the same before and after the patch.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Like ->write_iter(), we update mtime and strip setuid of dst file before
copy and like ->read_iter(), we update atime of src file after copy.
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
We want to enable cross-filesystem copy_file_range functionality
where possible, so push the "same superblock only" checks down to
the individual filesystem callouts so they can make their own
decisions about cross-superblock copy offload and fallack to
generic_copy_file_range() for cross-superblock copy.
[Amir] We do not call ->remap_file_range() in case the files are not
on the same sb and do not call ->copy_file_range() in case the files
do not belong to the same filesystem driver.
This changes behavior of the copy_file_range(2) syscall, which will
now allow cross filesystem in-kernel copy. CIFS already supports
cross-superblock copy, between two shares to the same server. This
functionality will now be available via the copy_file_range(2) syscall.
Cc: Steve French <stfrench@microsoft.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Now that we have generic_copy_file_range(), remove it as a fallback
case when offloads fail. This puts the responsibility for executing
fallbacks on the filesystems that implement ->copy_file_range and
allows us to add operational validity checks to
generic_copy_file_range().
Rework vfs_copy_file_range() to call a new do_copy_file_range()
helper to execute the copying callout, and move calls to
generic_file_copy_range() into filesystem methods where they
currently return failures.
[Amir] overlayfs is not responsible of executing the fallback.
It is the responsibility of the underlying filesystem.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
The fuse_writeback_range() helper flushes dirty data to the userspace
filesystem.
When the function returns, the WRITE requests for the data in the given
range have all been completed. This is not equivalent to fsync() on the
given range, since the userspace filesystem may not yet have the data on
stable storage.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Prior to sending COPY_FILE_RANGE to userspace filesystem, we must flush all
dirty pages in both the source and destination files.
This patch adds the missing flush of the source file.
Tested on libfuse-3.5.0 with:
libfuse/example/passthrough_ll /mnt/fuse/ -o writeback
libfuse/test/test_syscalls /mnt/fuse/tmp/test
Fixes: 88bc7d5097 ("fuse: add support for copy_file_range()")
Cc: <stable@vger.kernel.org> # v4.20
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
In the FOPEN_DIRECT_IO case the write path doesn't call file_remove_privs()
and that means setuid bit is not cleared if unpriviliged user writes to a
file with setuid bit set.
pjdfstest chmod test 12.t tests this and fails.
Fix this by adding a flag to the FUSE_WRITE message that requests clearing
privileges on the given file. This needs
This better than just calling fuse_remove_privs(), because the attributes
may not be up to date, so in that case a write may miss clearing the
privileges.
Test case:
$ passthrough_ll /mnt/pasthrough-mnt -o default_permissions,allow_other,cache=never
$ mkdir /mnt/pasthrough-mnt/testdir
$ cd /mnt/pasthrough-mnt/testdir
$ prove -rv pjdfstests/tests/chmod/12.t
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Vivek Goyal <vgoyal@redhat.com>
Do the proper cleanup in case the size check fails.
Tested with xfstests:generic/228
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 0cbade024b ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate")
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl
request comes from a process running a 32-bit ABI, but cannot tell whether
the requesting process is using legacy IA32 emulation or x32 ABI. In
particular, the server does not know the size of the client process's
`time_t` type.
For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags
are currently set in the ioctl input request (`struct fuse_ioctl_in` member
`flags`) for a 32-bit requesting process. This patch defines a new flag
`FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is
using the x32 ABI. This allows the server process to distinguish between
requests coming from client processes using IA32 emulation or the x32 ABI
and so infer the size of the client process's `time_t` type and any other
IA32/x32 differences.
Signed-off-by: Ian Abbott <abbotti@mev.co.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
The FUSE_FSYNC_DATASYNC flag was introduced by commit b6aeadeda2
("[PATCH] FUSE - file operations") as a magic number. No new values have
been added to fsync_flags since.
Signed-off-by: Alan Somers <asomers@FreeBSD.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Starting from commit 9c225f2655 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client. See commit 10dce8af34 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d0 ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.
To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers
https://codesearch.debian.net/search?q=-%3Enonseekable+%3Dhttps://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481
so if we would do such a change it will break a real user.
Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.
This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.
Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
fstests generic/228 reported this failure that fuse fallocate does not
honor what 'ulimit -f' has set.
This adds the necessary inode_newsize_ok() check.
Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Fixes: 05ba1f0823 ("fuse: add FALLOCATE operation")
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>