linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-14 16:12:02 +00:00

Author	SHA1	Message	Date
Chirantan Ekbote	31070f6cce	fuse: Fix parameter for FS_IOC_{GET,SET}FLAGS The ioctl encoding for this parameter is a long but the documentation says it should be an int and the kernel drivers expect it to be an int. If the fuse driver treats this as a long it might end up scribbling over the stack of a userspace process that only allocated enough space for an int. This was previously discussed in [1] and a patch for fuse was proposed in [2]. From what I can tell the patch in [2] was nacked in favor of adding new, "fixed" ioctls and using those from userspace. However there is still no "fixed" version of these ioctls and the fact is that it's sometimes infeasible to change all userspace to use the new one. Handling the ioctls specially in the fuse driver seems like the most pragmatic way for fuse servers to support them without causing crashes in userspace applications that call them. [1]: https://lore.kernel.org/linux-fsdevel/20131126200559.GH20559@hall.aurel32.net/T/ [2]: https://sourceforge.net/p/fuse/mailman/message/31771759/ Signed-off-by: Chirantan Ekbote <chirantan@chromium.org> Fixes: `59efec7b90` ("fuse: implement ioctl support") Cc: <stable@vger.kernel.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-07-15 14:18:20 +02:00
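The size mismatch described in this commit can be illustrated from userspace. The following is a minimal sketch, not the kernel patch itself: `ioctl_transfer_size()` is a hypothetical helper that special-cases the two flag ioctls so only `sizeof(int)` bytes are transferred, and falls back to the size encoded in the command number for everything else. It assumes a Linux system with the UAPI headers installed.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <linux/fs.h>

/*
 * Hypothetical helper: number of argument bytes to copy for an ioctl.
 * FS_IOC_GETFLAGS/FS_IOC_SETFLAGS encode a long, but callers pass an int,
 * so those two commands are handled specially.
 */
static size_t ioctl_transfer_size(unsigned long cmd)
{
	switch (cmd) {
	case FS_IOC_GETFLAGS:
	case FS_IOC_SETFLAGS:
		return sizeof(int);
	default:
		/* Size field embedded in the ioctl number. */
		return _IOC_SIZE(cmd);
	}
}

/* Usage: the encoding says long, the helper says int. */
static void example(void)
{
	assert(_IOC_SIZE(FS_IOC_GETFLAGS) == sizeof(long));
	assert(ioctl_transfer_size(FS_IOC_GETFLAGS) == sizeof(int));
}
```

On an LP64 system the encoded size is 8 bytes while the caller's buffer is 4, which is exactly the stack overwrite the commit message warns about.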
Vasily Averin	7779b047a5	fuse: don't ignore errors from fuse_writepages_fill() fuse_writepages() ignores some errors taken from fuse_writepages_fill() I believe it is a bug: if .writepages is called with WB_SYNC_ALL it should either guarantee that all data was successfully saved or return error. Fixes: `26d614df1d` ("fuse: Implement writepages callback") Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-07-14 14:45:42 +02:00
Miklos Szeredi	6ddf3af93e	fuse: clean up condition for writepage sending fuse_writepages_fill uses the following construction: if (wpa && ap->num_pages && (A || B || C)) { action; } else if (wpa && D) { if (E) { the same action; } } - ap->num_pages check is always true and can be removed - "if" and "else if" call the same action and can be merged. Move checking A, B, C, D, E conditions to a helper, add comments. Original-patch-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-07-14 14:45:41 +02:00
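The simplification described in this commit can be checked mechanically. The sketch below is an illustration, not the kernel helper: the function names are hypothetical, A through E are modelled as plain booleans, and the `ap->num_pages` check is assumed to always be true, as the message states.

```c
#include <assert.h>
#include <stdbool.h>

/* Original shape: two branches that perform the same action. */
static bool send_original(bool wpa, bool has_pages,
			  bool a, bool b, bool c, bool d, bool e)
{
	if (wpa && has_pages && (a || b || c))
		return true;
	else if (wpa && d)
		return e;
	return false;
}

/* Merged shape: has_pages dropped, both branches folded into one test. */
static bool send_merged(bool wpa, bool a, bool b, bool c, bool d, bool e)
{
	return wpa && (a || b || c || (d && e));
}

/* Compare both forms over every input combination; returns cases checked. */
static int check_forms_agree(void)
{
	int checked = 0;

	for (unsigned int bits = 0; bits < 64; bits++) {
		bool wpa = bits & 1, a = bits & 2, b = bits & 4;
		bool c = bits & 8, d = bits & 16, e = bits & 32;

		assert(send_original(wpa, true, a, b, c, d, e) ==
		       send_merged(wpa, a, b, c, d, e));
		checked++;
	}
	return checked;
}
```

Calling `check_forms_agree()` walks all 64 combinations of the six remaining conditions and asserts that the two forms give the same answer.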
Miklos Szeredi	c146024ec4	fuse: fix warning in tree_insert() and clean up writepage insertion fuse_writepages_fill() calls tree_insert() with ap->num_pages = 0 which triggers the following warning: WARNING: CPU: 1 PID: 17211 at fs/fuse/file.c:1728 tree_insert+0xab/0xc0 [fuse] RIP: 0010:tree_insert+0xab/0xc0 [fuse] Call Trace: fuse_writepages_fill+0x5da/0x6a0 [fuse] write_cache_pages+0x171/0x470 fuse_writepages+0x8a/0x100 [fuse] do_writepages+0x43/0xe0 Fix up the warning and clean up the code around rb-tree insertion: - Rename tree_insert() to fuse_insert_writeback() and make it return the conflicting entry in case of failure - Re-add tree_insert() as a wrapper around fuse_insert_writeback() - Rename fuse_writepage_in_flight() to fuse_writepage_add() and reverse the meaning of the return value to mean + "true" in case the writepage entry was successfully added + "false" in case it was in-flight queued on an existing writepage entry's auxiliary list or the existing writepage entry's temporary page updated Switch from fuse_find_writeback() + tree_insert() to fuse_insert_writeback() - Move setting orig_pages to before inserting/updating the entry; this may result in the orig_pages value being discarded later in case of an in-flight request - In case of a new writepage entry use fuse_writepage_add() unconditionally, only set data->wpa if the entry was added. Fixes: `6b2fb79963` ("fuse: optimize writepages search") Reported-by: kernel test robot <rong.a.chen@intel.com> Original-patch-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-07-14 14:45:41 +02:00
Miklos Szeredi	69a6487ac0	fuse: move rb_erase() before tree_insert() In fuse_writepage_end() the old writepages entry needs to be removed from the rbtree before inserting the new one, otherwise tree_insert() would fail. This is a very rare codepath and no reproducer exists. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-07-14 14:45:41 +02:00
Linus Torvalds	5b14671be5	fuse update for 5.8 Merge tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse updates from Miklos Szeredi: - Fix a rare deadlock in virtiofs - Fix st_blocks in writeback cache mode - Fix wrong checks in splice move causing spurious warnings - Fix a race between a GETATTR request and a FUSE_NOTIFY_INVAL_INODE notification - Use rb-tree instead of linear search for pages currently under writeout by userspace - Fix copy_file_range() inconsistencies * tag 'fuse-update-5.8' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: copy_file_range should truncate cache fuse: fix copy_file_range cache issues fuse: optimize writepages search fuse: update attr_version counter on fuse_notify_inval_inode() fuse: don't check refcount after stealing page fuse: fix weird page warning fuse: use dump_page virtiofs: do not use fuse_fill_super_common() for device installation fuse: always allow query of st_dev fuse: always flush dirty data on close(2) fuse: invalidate inode attr in writeback cache mode fuse: Update stale comment in queue_interrupt() fuse: BUG_ON correction in fuse_dev_splice_write() virtiofs: Add mount option and atime behavior to the doc virtiofs: schedule blocking async replies in separate worker	2020-06-09 15:48:24 -07:00
Matthew Wilcox (Oracle)	76a0294eb1	fuse: convert from readpages to readahead Implement the new readahead operation in fuse by using __readahead_batch() to fill the array of pages in fuse_args_pages directly. This lets us inline fuse_readpages_fill() into fuse_readahead(). [willy@infradead.org: build fix] Link: http://lkml.kernel.org/r/20200415025938.GB5820@bombadil.infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Miklos Szeredi <mszeredi@redhat.com> Cc: Chao Yu <yuchao0@huawei.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Gao Xiang <gaoxiang25@huawei.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: http://lkml.kernel.org/r/20200414150233.24495-25-willy@infradead.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2020-06-02 10:59:07 -07:00
Miklos Szeredi	9b46418c40	fuse: copy_file_range should truncate cache After the copy operation completes the cache is not up-to-date. Truncate all pages in the interval that has successfully been copied. Truncating completely copied dirty pages is okay, since the data has been overwritten anyway. Truncating partially copied dirty pages is not okay; add a comment for now. Fixes: `88bc7d5097` ("fuse: add support for copy_file_range()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-05-20 11:39:35 +02:00
Miklos Szeredi	2c4656dfd9	fuse: fix copy_file_range cache issues a) Dirty cache needs to be written back not just in the writeback_cache case, since the dirty pages may come from memory maps. b) The fuse_writeback_range() helper takes an inclusive interval, so the end position needs to be pos+len-1 instead of pos+len. Fixes: `88bc7d5097` ("fuse: add support for copy_file_range()") Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-05-20 11:39:35 +02:00
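Point b) of this commit is an off-by-one between exclusive and inclusive range ends. A minimal sketch of the arithmetic follows; `struct byte_range` and `inclusive_range()` are hypothetical names used only for illustration and are not kernel interfaces.

```c
#include <assert.h>
#include <stdint.h>

/* Closed interval [start, end]: both offsets are part of the range. */
struct byte_range {
	int64_t start;
	int64_t end;
};

/* Convert (pos, len) into an inclusive range: the last byte is pos+len-1. */
static struct byte_range inclusive_range(int64_t pos, int64_t len)
{
	struct byte_range r;

	assert(len > 0);	/* an empty range has no last byte */
	r.start = pos;
	r.end = pos + len - 1;
	return r;
}
```

For a 4096-byte copy starting at offset 4096, the inclusive end is 8191. Passing `pos + len` (8192) instead would extend the range by one byte into the next page.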
Maxim Patlasov	6b2fb79963	fuse: optimize writepages search Re-work fi->writepages, replacing list with rb-tree. This improves performance because kernel fuse iterates through fi->writepages for each writeback page and typical number of entries is about 800 (for 100MB of fuse writeback). Before patch: 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB) copied, 41.3473 s, 260 MB/s 2 1 0 57445400 40416 6323676 0 0 33 374743 8633 19210 1 8 88 3 0 29.86% [kernel] [k] _raw_spin_lock 26.62% [fuse] [k] fuse_page_is_writeback After patch: 10240+0 records in 10240+0 records out 10737418240 bytes (11 GB) copied, 21.4954 s, 500 MB/s 2 9 0 53676040 31744 10265984 0 0 64 854790 10956 48387 1 6 88 6 0 23.55% [kernel] [k] copy_user_enhanced_fast_string 9.87% [kernel] [k] __memcpy 3.10% [kernel] [k] _raw_spin_lock Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-05-19 14:50:38 +02:00
Miklos Szeredi	614c026e8a	fuse: always flush dirty data on close(2) We want cached data to be synced with the userspace filesystem on close(), for example to allow getting the correct st_blocks value. Do this regardless of whether the userspace filesystem implements a FLUSH method or not. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-05-19 14:50:37 +02:00
Eryu Guan	cf576c58b3	fuse: invalidate inode attr in writeback cache mode Under writeback mode, inode->i_blocks is not updated, making utils du read st.blocks as 0. For example, when using virtiofs (cache=always & nondax mode) with writeback_cache enabled, writing a new file and check its disk usage with du, du reports 0 usage. # uname -r 5.6.0-rc6+ # mount -t virtiofs virtiofs /mnt/virtiofs # rm -f /mnt/virtiofs/testfile # create new file and do extend write # xfs_io -fc "pwrite 0 4k" /mnt/virtiofs/testfile wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0001 sec (28.103 MiB/sec and 7194.2446 ops/sec) # du -k /mnt/virtiofs/testfile 0 <==== disk usage is 0 # stat -c %s,%b /mnt/virtiofs/testfile 4096,0 <==== i_size is correct, but st_blocks is 0 Fix it by invalidating attr in fuse_flush(), so we get up-to-date attr from server on next getattr. Signed-off-by: Eryu Guan <eguan@linux.alibaba.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-05-19 14:50:37 +02:00
Vivek Goyal	bb737bbe48	virtiofs: schedule blocking async replies in separate worker In virtiofs (unlike in regular fuse) processing of async replies is serialized. This can result in a deadlock in rare corner cases when there's a circular dependency between the completion of two or more async replies. Such a deadlock can be reproduced with xfstests:generic/503 if TEST_DIR == SCRATCH_MNT (which is a misconfiguration): - Process A is waiting for page lock in worker thread context and blocked (virtio_fs_requests_done_work()). - Process B is holding page lock and waiting for pending writes to finish (fuse_wait_on_page_writeback()). - Write requests are waiting in virtqueue and can't complete because worker thread is blocked on page lock (process A). Fix this by creating a unique work_struct for each async reply that can block (O_DIRECT read). Fixes: `a62a8ef9d9` ("virtio-fs: add virtiofs filesystem") Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-04-20 17:01:34 +02:00
zhengbin	cabdb4fa2f	fuse: use true,false for bool variable Fixes coccicheck warning: fs/fuse/readdir.c:335:1-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/file.c:1398:2-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/file.c:1400:2-20: WARNING: Assignment of 0/1 to bool variable fs/fuse/cuse.c:454:1-20: WARNING: Assignment of 0/1 to bool variable fs/fuse/cuse.c:455:1-19: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:497:2-17: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:504:2-23: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:511:2-22: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:518:2-23: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:522:2-26: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:526:2-18: WARNING: Assignment of 0/1 to bool variable fs/fuse/inode.c:1000:1-20: WARNING: Assignment of 0/1 to bool variable Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-02-06 16:39:28 +01:00
Miklos Szeredi	2f1398291b	fuse: don't overflow LLONG_MAX with end offset Handle the special case of fuse_readpages() wanting to read the last page of the largest possible file and overflowing the end offset in the process. This is basically to unbreak xfstests:generic/525 and prevent filesystems from doing bad things with an overflowing offset. Reported-by: Xiao Yang <ice_yangxiao@163.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-02-06 16:39:28 +01:00
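The overflow in question arises when `pos + len` would exceed `LLONG_MAX`. The sketch below shows one generic way to guard against it by shortening the length. It is an illustration under stated assumptions, not the code from the patch: `clamp_read_len()` is a hypothetical helper, and the end offset is treated as exclusive.

```c
#include <assert.h>
#include <limits.h>

/*
 * Shorten len so that the exclusive end offset pos + len never exceeds
 * LLONG_MAX. The comparison is arranged to avoid computing pos + len,
 * which would itself be signed overflow (undefined behaviour in C).
 */
static long long clamp_read_len(long long pos, long long len)
{
	assert(pos >= 0 && len > 0);
	if (len > LLONG_MAX - pos)
		return LLONG_MAX - pos;
	return len;
}

/* Usage: a page-sized read that would end one byte past LLONG_MAX. */
static void example(void)
{
	assert(clamp_read_len(LLONG_MAX - 4095, 4096) == 4095);
	assert(clamp_read_len(0, 4096) == 4096);
}
```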
Miklos Szeredi	f658adeea4	fix up iter on short count in fuse_direct_io() fuse_direct_io() can end up advancing the iterator by more than the amount of data read or written. This case is handled by the generic code if going through ->direct_IO(), but not in the FOPEN_DIRECT_IO case. Fix by reverting the extra bytes from the iterator in case of error or a short count. To test: install lxcfs, then the following testcase int fd = open("/var/lib/lxcfs/proc/uptime", O_RDONLY); sendfile(1, fd, NULL, 16777216); sendfile(1, fd, NULL, 16777216); will spew WARN_ON() in iov_iter_pipe(). Reported-by: Peter Geis <pgwipeout@gmail.com> Reported-by: Al Viro <viro@zeniv.linux.org.uk> Fixes: `3c3db095b6` ("fuse: use iov_iter based generic splice helpers") Cc: <stable@vger.kernel.org> # v5.1 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-02-06 16:39:28 +01:00
Miklos Szeredi	7df1e988c7	fuse: fix fuse_send_readpages() in the syncronous read case Buffered read in fuse normally goes via: -> generic_file_buffered_read() -> fuse_readpages() -> fuse_send_readpages() ->fuse_simple_request() [called since v5.4] In the case of a read request, fuse_simple_request() will return a non-negative bytecount on success or a negative error value. A positive bytecount was taken to be an error and the PG_error flag set on the page. This resulted in generic_file_buffered_read() falling back to ->readpage(), which would repeat the read request and succeed. Because of the repeated read succeeding the bug was not detected with regression tests or other use cases. The FTP module in GVFS however fails the second read due to the non-seekable nature of FTP downloads. Fix by checking and ignoring positive return value from fuse_simple_request(). Reported-by: Ondrej Holy <oholy@redhat.com> Link: https://gitlab.gnome.org/GNOME/gvfs/issues/441 Fixes: `134831e36b` ("fuse: convert readpages to simple api") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2020-01-16 11:09:36 +01:00
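The bug described here comes from treating any nonzero return as failure when the callee returns a byte count on success. A minimal sketch of the corrected interpretation follows; `read_status()` is a hypothetical helper, and the convention (negative errno on failure, non-negative byte count on success) is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>

/*
 * Map a "byte count or negative errno" return value to a plain status:
 * 0 for any successful read (including a short or empty one), or the
 * negative error code unchanged.
 */
static int read_status(long ret)
{
	return ret < 0 ? (int)ret : 0;
}

/* Usage: a positive byte count must not be reported as an error. */
static void example(void)
{
	assert(read_status(4096) == 0);
	assert(read_status(-EIO) == -EIO);
}
```

With the faulty check (`if (ret) error`), the 4096-byte success above would have been flagged as a failed read and retried.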
Miklos Szeredi	f1ebdeffc6	fuse: fix leak of fuse_io_priv exit_aio() is sometimes stuck in wait_for_completion() after aio is issued with direct IO and the task receives a signal. The reason is failure to call ->ki_complete() due to a leaked reference to fuse_io_priv. This happens in fuse_async_req_send() if fuse_simple_background() returns an error (e.g. -EINTR). In this case the error value is propagated via io->err, so return success to not confuse callers. This issue is tracked as a virtio-fs issue: https://gitlab.com/virtio-fs/qemu/issues/14 Reported-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Fixes: `45ac96ed7c` ("fuse: convert direct_io to simple api") Cc: <stable@vger.kernel.org> # v5.4 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-11-27 09:33:49 +01:00
Miklos Szeredi	8aab336b14	fuse: verify write return Make sure filesystem is not returning a bogus number of bytes written. Fixes: `ea9b9907b8` ("fuse: implement perform_write") Cc: <stable@vger.kernel.org> # v2.6.26 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-11-12 11:49:04 +01:00
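The check this commit describes amounts to rejecting a reply that claims more bytes were written than were requested. A hedged sketch, assuming `-EIO` as the error reported for a bogus reply; `check_write_reply()` is a hypothetical name, not a kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Validate the byte count reported by the filesystem server.
 * A count larger than the request is impossible and is treated as an
 * I/O error; otherwise the reported count is passed through.
 */
static long check_write_reply(size_t requested, size_t reported)
{
	if (reported > requested)
		return -EIO;
	return (long)reported;
}

/* Usage: short writes are legitimate, oversized ones are not. */
static void example(void)
{
	assert(check_write_reply(4096, 1024) == 1024);
	assert(check_write_reply(4096, 8192) == -EIO);
}
```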
Vasily Averin	091d1a7267	fuse: redundant get_fuse_inode() calls in fuse_writepages_fill() Currently fuse_writepages_fill() calls get_fuse_inode() a few times with the same argument. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-10-23 14:26:37 +02:00
Miklos Szeredi	e4648309b8	fuse: truncate pending writes on O_TRUNC Make sure cached writes are not reordered around open(..., O_TRUNC), with the obvious wrong results. Fixes: `4d99ff8f12` ("fuse: Turn writeback cache on") Cc: <stable@vger.kernel.org> # v3.15+ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-10-23 14:26:37 +02:00
Khazhismel Kumykov	dc69e98c24	fuse: kmemcg account fs data account per-file, dentry, and inode data blockdev/superblock and temporary per-request data was left alone, as this usually isn't accounted Reviewed-by: Shakeel Butt <shakeelb@google.com> Signed-off-by: Khazhismel Kumykov <khazhy@google.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-24 15:28:01 +02:00
Vasily Averin	d5880c7a86	fuse: fix missing unlock_page in fuse_writepage() unlock_page() was missing in case of an already in-flight write against the same page. Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Fixes: `ff17be0864` ("fuse: writepage: skip already in flight") Cc: <stable@vger.kernel.org> # v3.13 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-24 15:28:01 +02:00
Miklos Szeredi	7213394c4e	fuse: simplify request allocation Page arrays are not allocated together with the request anymore. Get rid of the dead code Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:50 +02:00
Miklos Szeredi	4cb548666e	fuse: convert release to simple api Since we cannot reserve the request structure up-front, make sure that the request allocation doesn't fail using __GFP_NOFAIL. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:50 +02:00
Miklos Szeredi	33826ebbbe	fuse: convert writepages to simple api Derive fuse_writepage_args from fuse_io_args. Sending the request is tricky since it was done with fi->lock held, hence we must either use atomic allocation or release the lock. Both are possible so try atomic first and if it fails, release the lock and do the regular allocation with GFP_NOFS and __GFP_NOFAIL. Both flags are necessary for correct operation. Move the page realloc function from dev.c to file.c and convert to using fuse_writepage_args. The last caller of fuse_write_fill() is gone, so get rid of it. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	43f5098eb8	fuse: convert readdir to simple api The old fuse_read_fill() helper can be deleted, now that the last user is gone. The fuse_io_args struct is moved to fuse_i.h so it can be shared between readdir/read code. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	134831e36b	fuse: convert readpages to simple api Need to extend fuse_io_args with 'attr_ver' and 'ff' members, that take the functionality of the same named members in fuse_req. fuse_short_read() can now take struct fuse_args_pages. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	45ac96ed7c	fuse: convert direct_io to simple api Change of semantics in fuse_async_req_send/fuse_send_(read|write): these can now return error, in which case the 'end' callback isn't called, so the fuse_io_args object needs to be freed. Added verification that the return value is sane (less than or equal to the requested read/write size). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	338f2e3f33	fuse: convert sync write to simple api Extract a fuse_write_flags() helper that converts ki_flags relevant write to open flags. The other parts of fuse_send_write() aren't used in the fuse_perform_write() case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	00793ca5d4	fuse: covert readpage to simple api Derive fuse_io_args from struct fuse_args_pages. This will be used for both synchronous and asynchronous read/write requests. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	a0d45d84f4	fuse: fuse_short_read(): don't take fuse_req as argument This will allow the use of this function when converting to the simple api (which doesn't use fuse_req). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	093f38a2c1	fuse: convert ioctl to simple api fuse_simple_request() is converted to return length of last (instead of single) out arg, since FUSE_IOCTL_OUT has two out args, the second of which is variable length. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	4c4f03f78c	fuse: move page alloc fuse_req_pages_alloc() is moved to file.c, since its internal use by the device code will eventually be removed. Rename to fuse_pages_alloc() to signify that it's not only usable for fuse_req page array. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	1ccd1ea249	fuse: convert destroy to simple api We can use the "force" flag to make sure the DESTROY request is always sent to userspace. So no need to keep it allocated during the lifetime of the filesystem. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:49 +02:00
Miklos Szeredi	c500ebaa90	fuse: convert flush to simple api Add 'force' to fuse_args and use fuse_get_req_nofail_nopages() to allocate the request in that case. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Miklos Szeredi	40ac7ab2d0	fuse: simplify 'nofail' request Instead of complex games with a reserved request, just use __GFP_NOFAIL. Both callers (flush, readdir) guarantee that the connection was already initialized, so no need to wait for fc->initialized. Also remove unneeded clearing of FR_BACKGROUND flag. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Miklos Szeredi	d5b4854357	fuse: flatten 'struct fuse_args' ...to make future expansion simpler. The hierarchical structure is a historical thing that does not serve any practical purpose. The generated code is exactly the same before and after the patch. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-10 16:29:48 +02:00
Maxim Patlasov	17b2cbe294	fuse: cleanup fuse_wait_on_page_writeback fuse_wait_on_page_writeback() always returns zero and nobody cares. Let's make it void. Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Vasily Averin <vvs@virtuozzo.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-09-02 11:07:30 +02:00
Amir Goldstein	fe0da9c09b	fuse: copy_file_range needs to strip setuid bits and update timestamps Like ->write_iter(), we update mtime and strip setuid of dst file before copy and like ->read_iter(), we update atime of src file after copy. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Miklos Szeredi <miklos@szeredi.hu> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-09 10:07:07 -07:00
Amir Goldstein	5dae222a5f	vfs: allow copy_file_range to copy across devices We want to enable cross-filesystem copy_file_range functionality where possible, so push the "same superblock only" checks down to the individual filesystem callouts so they can make their own decisions about cross-superblock copy offload and fall back to generic_copy_file_range() for cross-superblock copy. [Amir] We do not call ->remap_file_range() in case the files are not on the same sb and do not call ->copy_file_range() in case the files do not belong to the same filesystem driver. This changes behavior of the copy_file_range(2) syscall, which will now allow cross filesystem in-kernel copy. CIFS already supports cross-superblock copy, between two shares to the same server. This functionality will now be available via the copy_file_range(2) syscall. Cc: Steve French <stfrench@microsoft.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-09 10:06:20 -07:00
Dave Chinner	64bf5ff58d	vfs: no fallback for ->copy_file_range Now that we have generic_copy_file_range(), remove it as a fallback case when offloads fail. This puts the responsibility for executing fallbacks on the filesystems that implement ->copy_file_range and allows us to add operational validity checks to generic_copy_file_range(). Rework vfs_copy_file_range() to call a new do_copy_file_range() helper to execute the copying callout, and move calls to generic_copy_file_range() into filesystem methods where they currently return failures. [Amir] overlayfs is not responsible for executing the fallback. It is the responsibility of the underlying filesystem. Signed-off-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>	2019-06-09 10:06:19 -07:00
Miklos Szeredi	26eb3bae50	fuse: extract helper for range writeback The fuse_writeback_range() helper flushes dirty data to the userspace filesystem. When the function returns, the WRITE requests for the data in the given range have all been completed. This is not equivalent to fsync() on the given range, since the userspace filesystem may not yet have the data on stable storage. Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-05-28 13:22:50 +02:00
Miklos Szeredi	a2bc923629	fuse: fix copy_file_range() in the writeback case Prior to sending COPY_FILE_RANGE to userspace filesystem, we must flush all dirty pages in both the source and destination files. This patch adds the missing flush of the source file. Tested on libfuse-3.5.0 with: libfuse/example/passthrough_ll /mnt/fuse/ -o writeback libfuse/test/test_syscalls /mnt/fuse/tmp/test Fixes: `88bc7d5097` ("fuse: add support for copy_file_range()") Cc: <stable@vger.kernel.org> # v4.20 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-05-28 13:22:50 +02:00
Miklos Szeredi	4a2abf99f9	fuse: add FUSE_WRITE_KILL_PRIV In the FOPEN_DIRECT_IO case the write path doesn't call file_remove_privs() and that means the setuid bit is not cleared if an unprivileged user writes to a file with the setuid bit set. pjdfstest chmod test 12.t tests this and fails. Fix this by adding a flag to the FUSE_WRITE message that requests clearing privileges on the given file. This is better than just calling fuse_remove_privs(), because the attributes may not be up to date, so in that case a write may miss clearing the privileges. Test case: $ passthrough_ll /mnt/pasthrough-mnt -o default_permissions,allow_other,cache=never $ mkdir /mnt/pasthrough-mnt/testdir $ cd /mnt/pasthrough-mnt/testdir $ prove -rv pjdfstests/tests/chmod/12.t Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Vivek Goyal <vgoyal@redhat.com>	2019-05-27 11:42:36 +02:00
Miklos Szeredi	35d6fcbb7c	fuse: fallocate: fix return with locked inode Do the proper cleanup in case the size check fails. Tested with xfstests:generic/228 Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: `0cbade024b` ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate") Cc: Liu Bo <bo.liu@linux.alibaba.com> Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-05-27 11:42:35 +02:00
Ian Abbott	6407f44aaf	fuse: Add ioctl flag for x32 compat ioctl Currently, a CUSE server running on a 64-bit kernel can tell when an ioctl request comes from a process running a 32-bit ABI, but cannot tell whether the requesting process is using legacy IA32 emulation or x32 ABI. In particular, the server does not know the size of the client process's `time_t` type. For 64-bit kernels, the `FUSE_IOCTL_COMPAT` and `FUSE_IOCTL_32BIT` flags are currently set in the ioctl input request (`struct fuse_ioctl_in` member `flags`) for a 32-bit requesting process. This patch defines a new flag `FUSE_IOCTL_COMPAT_X32` and sets it if the 32-bit requesting process is using the x32 ABI. This allows the server process to distinguish between requests coming from client processes using IA32 emulation or the x32 ABI and so infer the size of the client process's `time_t` type and any other IA32/x32 differences. Signed-off-by: Ian Abbott <abbotti@mev.co.uk> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Alan Somers	154603fe3e	fuse: document fuse_fsync_in.fsync_flags The FUSE_FSYNC_DATASYNC flag was introduced by commit `b6aeadeda2` ("[PATCH] FUSE - file operations") as a magic number. No new values have been added to fsync_flags since. Signed-off-by: Alan Somers <asomers@FreeBSD.org> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Kirill Smelkov	bbd84f3365	fuse: Add FOPEN_STREAM to use stream_open() Starting from commit `9c225f2655` ("vfs: atomic f_pos accesses as per POSIX") files opened even via nonseekable_open gate read and write via lock and do not allow them to be run simultaneously. This can create read vs write deadlock if a filesystem is trying to implement a socket-like file which is intended to be simultaneously used for both read and write from filesystem client. See commit `10dce8af34` ("fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock") for details and e.g. commit `581d21a2d0` ("xenbus: fix deadlock on writes to /proc/xen/xenbus") for a similar deadlock example on /proc/xen/xenbus. To avoid such deadlock it was tempting to adjust fuse_finish_open to use stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE, and in particular GVFS which actually uses offset in its read and write handlers https://codesearch.debian.net/search?q=-%3Enonseekable+%3D https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346 https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481 so if we would do such a change it will break a real user. Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the opened handler is having stream-like semantics; does not use file position and thus the kernel is free to issue simultaneous read and write request on opened file handle. This patch together with stream_open() should be added to stable kernels starting from v3.14+. This will allow to patch OSSPD and other FUSE filesystems that provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all kernel versions. This should work because fuse_finish_open ignores unknown open flags returned from a filesystem and so passing FOPEN_STREAM to a kernel that is not aware of this flag cannot hurt. In turn the kernel that is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE is sufficient to implement streams without read vs write deadlock. Cc: stable@vger.kernel.org # v3.14+ Signed-off-by: Kirill Smelkov <kirr@nexedi.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:07 +02:00
Liu Bo	0cbade024b	fuse: honor RLIMIT_FSIZE in fuse_file_fallocate fstests generic/228 reported this failure that fuse fallocate does not honor what 'ulimit -f' has set. This adds the necessary inode_newsize_ok() check. Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Fixes: `05ba1f0823` ("fuse: add FALLOCATE operation") Cc: <stable@vger.kernel.org> # v3.5 Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>	2019-04-24 17:05:06 +02:00


408 Commits