Commit Graph

3388 Commits

Author SHA1 Message Date
Chuck Lever
2fdd6bd293 NFSD: Update the NFSv2 SETATTR argument decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:26 -05:00
Chuck Lever
77edcdf91f NFSD: Update the NFSv2 LINK argument decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:26 -05:00
Chuck Lever
62aa557efb NFSD: Update the NFSv2 RENAME argument decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:26 -05:00
Chuck Lever
6d742c1864 NFSD: Update NFSv2 diropargs decoding to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:26 -05:00
Chuck Lever
8688361ae2 NFSD: Update the NFSv2 READDIR argument decoder to use struct xdr_stream
As an additional clean up, move code not related to XDR decoding
into readdir's .pc_func call out.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:26 -05:00
Chuck Lever
788cd46ecf NFSD: Add helper to set up the pages where the dirlist is encoded
Add a helper similar to nfsd3_init_dirlist_pages().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
1fcbd1c945 NFSD: Update the NFSv2 READLINK argument decoder to use struct xdr_stream
If the code that sets up the sink buffer for nfsd_readlink() is
moved adjacent to the nfsd_readlink() call site that uses it, then
the only argument is a file handle, and the fhandle decoder can be
used instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
a51b5b737a NFSD: Update the NFSv2 WRITE argument decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
8c293ef993 NFSD: Update the NFSv2 READ argument decoder to use struct xdr_stream
The code that sets up rq_vec is refactored so that it is now
adjacent to the nfsd_read() call site where it is used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
ebcd8e8b28 NFSD: Update the NFSv2 GETATTR argument decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
f8a38e2d6c NFSD: Update the MKNOD3args decoder to use struct xdr_stream
This commit removes the last usage of the original decode_sattr3(),
so it is removed as a clean-up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
da39201637 NFSD: Update the SYMLINK3args decoder to use struct xdr_stream
Similar to the WRITE decoder, code that checks the sanity of the
payload size is re-wired to work with xdr_stream infrastructure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
83374c278d NFSD: Update the MKDIR3args decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
6b3a11960d NFSD: Update the CREATE3args decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
9cde9360d1 NFSD: Update the SETATTR3args decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:25 -05:00
Chuck Lever
efaa1e7c2c NFSD: Update the LINK3args decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
d181e0a4be NFSD: Update the RENAME3args decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
54d1d43dc7 NFSD: Update the NFSv3 DIROPargs decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
c8d26a0acf NFSD: Update COMMIT3arg decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
9cedc2e64c NFSD: Update READDIR3args decoders to use struct xdr_stream
As an additional clean up, neither nfsd3_proc_readdir() nor
nfsd3_proc_readdirplus() make use of the dircount argument, so
remove it from struct nfsd3_readdirargs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
40116ebd09 NFSD: Add helper to set up the pages where the dirlist is encoded
De-duplicate some code that is used by both READDIR and READDIRPLUS
to build the dirlist in the Reply. Because this code is not related
to decoding READ arguments, it is moved to a more appropriate spot.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
0a8f37fb34 NFSD: Fix returned READDIR offset cookie
Code inspection shows that the server's NFSv3 READDIR implementation
handles offset cookies slightly differently than the NFSv2 READDIR,
NFSv3 READDIRPLUS, and NFSv4 READDIR implementations,
and there doesn't seem to be any need for this difference.

As a clean up, I copied the logic from nfsd3_proc_readdirplus().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
224c1c894e NFSD: Update READLINK3arg decoder to use struct xdr_stream
The NFSv3 READLINK request takes a single filehandle, so it can
re-use GETATTR's decoder.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
c43b2f229a NFSD: Update WRITE3arg decoder to use struct xdr_stream
As part of the update, open code that sanity-checks the size of the
data payload against the length of the RPC Call message has to be
re-implemented to use xdr_stream infrastructure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
be63bd2ac6 NFSD: Update READ3arg decoder to use struct xdr_stream
The code that sets up rq_vec is refactored so that it is now
adjacent to the nfsd_read() call site where it is used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:24 -05:00
Chuck Lever
3b921a2b14 NFSD: Update ACCESS3arg decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:23 -05:00
Chuck Lever
9575363a9e NFSD: Update GETATTR3args decoder to use struct xdr_stream
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:23 -05:00
Chuck Lever
2289e87b59 SUNRPC: Make trace_svc_process() display the RPC procedure symbolically
The next few patches will employ these strings to help make server-
side trace logs more human-readable. A similar technique is already
in use in kernel RPC client code.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-25 09:36:23 -05:00
Guoqing Jiang
684da7628d block: remove unnecessary argument from blk_execute_rq
We can remove 'q' from blk_execute_rq as well after the previous change
in blk_execute_rq_nowait.

And more importantly it never really was needed to start with given
that we can trivial derive it from struct request.

Cc: linux-scsi@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: linux-ide@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-nvme@lists.infradead.org
Cc: linux-nfs@vger.kernel.org
Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-01-24 21:52:39 -07:00
Christian Brauner
899bf2ceb3 nfs: do not export idmapped mounts
Prevent nfs from exporting idmapped mounts until we have ported it to
support exporting idmapped mounts.

Link: https://lore.kernel.org/linux-api/20210123130958.3t6kvgkl634njpsm@wittgenstein
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: "J. Bruce Fields" <bfields@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:29:33 +01:00
Christian Brauner
6521f89170 namei: prepare for idmapped mounts
The various vfs_*() helpers are called by filesystems or by the vfs
itself to perform core operations such as create, link, mkdir, mknod, rename,
rmdir, tmpfile and unlink. Enable them to handle idmapped mounts. If the
inode is accessed through an idmapped mount map it into the
mount's user namespace and pass it down. Afterwards the checks and
operations are identical to non-idmapped mounts. If the initial user
namespace is passed nothing changes so non-idmapped mounts will see
identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-15-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:18 +01:00
Christian Brauner
9fe6145097 namei: introduce struct renamedata
In order to handle idmapped mounts we will extend the vfs rename helper
to take two new arguments in follow up patches. Since this operations
already takes a bunch of arguments add a simple struct renamedata and
make the current helper use it before we extend it.

Link: https://lore.kernel.org/r/20210121131959.646623-14-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:18 +01:00
Tycho Andersen
c7c7a1a18a xattr: handle idmapped mounts
When interacting with extended attributes the vfs verifies that the
caller is privileged over the inode with which the extended attribute is
associated. For posix access and posix default extended attributes a uid
or gid can be stored on-disk. Let the functions handle posix extended
attributes on idmapped mounts. If the inode is accessed through an
idmapped mount we need to map it according to the mount's user
namespace. Afterwards the checks are identical to non-idmapped mounts.
This has no effect for e.g. security xattrs since they don't store uids
or gids and don't perform permission checks on them like posix acls do.

Link: https://lore.kernel.org/r/20210121131959.646623-10-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Signed-off-by: Tycho Andersen <tycho@tycho.pizza>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Christian Brauner
e65ce2a50c acl: handle idmapped mounts
The posix acl permission checking helpers determine whether a caller is
privileged over an inode according to the acls associated with the
inode. Add helpers that make it possible to handle acls on idmapped
mounts.

The vfs and the filesystems targeted by this first iteration make use of
posix_acl_fix_xattr_from_user() and posix_acl_fix_xattr_to_user() to
translate basic posix access and default permissions such as the
ACL_USER and ACL_GROUP type according to the initial user namespace (or
the superblock's user namespace) to and from the caller's current user
namespace. Adapt these two helpers to handle idmapped mounts whereby we
either map from or into the mount's user namespace depending on in which
direction we're translating.
Similarly, cap_convert_nscap() is used by the vfs to translate user
namespace and non-user namespace aware filesystem capabilities from the
superblock's user namespace to the caller's user namespace. Enable it to
handle idmapped mounts by accounting for the mount's user namespace.

In addition the fileystems targeted in the first iteration of this patch
series make use of the posix_acl_chmod() and, posix_acl_update_mode()
helpers. Both helpers perform permission checks on the target inode. Let
them handle idmapped mounts. These two helpers are called when posix
acls are set by the respective filesystems to handle this case we extend
the ->set() method to take an additional user namespace argument to pass
the mount's user namespace down.

Link: https://lore.kernel.org/r/20210121131959.646623-9-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:17 +01:00
Christian Brauner
2f221d6f7b attr: handle idmapped mounts
When file attributes are changed most filesystems rely on the
setattr_prepare(), setattr_copy(), and notify_change() helpers for
initialization and permission checking. Let them handle idmapped mounts.
If the inode is accessed through an idmapped mount map it into the
mount's user namespace. Afterwards the checks are identical to
non-idmapped mounts. If the initial user namespace is passed nothing
changes so non-idmapped mounts will see identical behavior as before.

Helpers that perform checks on the ia_uid and ia_gid fields in struct
iattr assume that ia_uid and ia_gid are intended values and have already
been mapped correctly at the userspace-kernelspace boundary as we
already do today. If the initial user namespace is passed nothing
changes so non-idmapped mounts will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-8-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
Christian Brauner
47291baa8d namei: make permission helpers idmapped mount aware
The two helpers inode_permission() and generic_permission() are used by
the vfs to perform basic permission checking by verifying that the
caller is privileged over an inode. In order to handle idmapped mounts
we extend the two helpers with an additional user namespace argument.
On idmapped mounts the two helpers will make sure to map the inode
according to the mount's user namespace and then peform identical
permission checks to inode_permission() and generic_permission(). If the
initial user namespace is passed nothing changes so non-idmapped mounts
will see identical behavior as before.

Link: https://lore.kernel.org/r/20210121131959.646623-6-christian.brauner@ubuntu.com
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: James Morris <jamorris@linux.microsoft.com>
Acked-by: Serge Hallyn <serge@hallyn.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-01-24 14:27:16 +01:00
J. Bruce Fields
51b2ee7d00 nfsd4: readdirplus shouldn't return parent of export
If you export a subdirectory of a filesystem, a READDIRPLUS on the root
of that export will return the filehandle of the parent with the ".."
entry.

The filehandle is optional, so let's just not return the filehandle for
".." if we're at the root of an export.

Note that once the client learns one filehandle outside of the export,
they can trivially access the rest of the export using further lookups.

However, it is also not very difficult to guess filehandles outside of
the export.  So exporting a subdirectory of a filesystem should
considered equivalent to providing access to the entire filesystem.  To
avoid confusion, we recommend only exporting entire filesystems.

Reported-by: Youjipeng <wangzhibei1999@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-01-12 08:54:14 -05:00
Linus Torvalds
c912fd05fa Merge tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6
Pull nfsd fixes from Chuck Lever:

 - Fix major TCP performance regression

 - Get NFSv4.2 READ_PLUS regression tests to pass

 - Improve NFSv4 COMPOUND memory allocation

 - Fix sparse warning

* tag 'nfsd-5.11-1' of git://git.linux-nfs.org/projects/cel/cel-2.6:
  NFSD: Restore NFSv4 decoding's SAVEMEM functionality
  SUNRPC: Handle TCP socket sends with kernel_sendpage() again
  NFSD: Fix sparse warning in nfssvc.c
  nfsd: Don't set eof on a truncated READ_PLUS
  nfsd: Fixes for nfsd4_encode_read_plus_data()
2021-01-11 11:35:46 -08:00
Chuck Lever
7b723008f9 NFSD: Restore NFSv4 decoding's SAVEMEM functionality
While converting the NFSv4 decoder to use xdr_stream-based XDR
processing, I removed the old SAVEMEM() macro. This macro wrapped
a bit of logic that avoided a memory allocation by recognizing when
the decoded item resides in a linear section of the Receive buffer.
In that case, it returned a pointer into that buffer instead of
allocating a bounce buffer.

The bounce buffer is necessary only when xdr_inline_decode() has
placed the decoded item in the xdr_stream's scratch buffer, which
disappears the next time xdr_inline_decode() is called with that
xdr_stream. That happens only if the data item crosses a page
boundary in the receive buffer, an exceedingly rare occurrence.

Allocating a bounce buffer every time results in a minor performance
regression that was introduced by the recent NFSv4 decoder overhaul.
Let's restore the previous behavior. On average, it saves about 1.5
kmalloc() calls per COMPOUND.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-18 12:28:58 -05:00
Chuck Lever
d6c9e4368c NFSD: Fix sparse warning in nfssvc.c
fs/nfsd/nfssvc.c:36:6: warning: symbol 'inter_copy_offload_enable' was not declared. Should it be static?

The parameter was added by commit ce0887ac96 ("NFSD add nfs4 inter
ssc to nfsd4_copy"). Relocate it into the source file that uses it,
and make it static. This approach is similar to the
nfs4_disable_idmapping, cltrack_prog, and cltrack_legacy_disable
module parameters.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-18 12:28:23 -05:00
Trond Myklebust
b68f0cbd3f nfsd: Don't set eof on a truncated READ_PLUS
If the READ_PLUS operation was truncated due to an error, then ensure we
clear the 'eof' flag.

Fixes: 9f0b5792f0 ("NFSD: Encode a full READ_PLUS reply")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-18 12:28:00 -05:00
Trond Myklebust
72d78717c6 nfsd: Fixes for nfsd4_encode_read_plus_data()
Ensure that we encode the data payload + padding, and that we truncate
the preallocated buffer to the actual read size.

Fixes: 528b84934e ("NFSD: Add READ_PLUS data support")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-18 12:27:55 -05:00
Linus Torvalds
14bd41e418 Merge tag 'fsnotify_for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
 "A few fsnotify fixes from Amir fixing fallout from big fsnotify
  overhaul a few months back and an improvement of defaults limiting
  maximum number of inotify watches from Waiman"

* tag 'fsnotify_for_v5.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  fsnotify: fix events reported to watching parent and child
  inotify: convert to handle_inode_event() interface
  fsnotify: generalize handle_inode_event()
  inotify: Increase default inotify.max_user_watches limit to 1048576
2020-12-17 10:56:27 -08:00
Trond Myklebust
716a8bc7f7 nfsd: Record NFSv4 pre/post-op attributes as non-atomic
For the case of NFSv4, specify to the client that the pre/post-op
attributes were not recorded atomically with the main operation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:39:38 -05:00
Trond Myklebust
01cbf38539 nfsd: Set PF_LOCAL_THROTTLE on local filesystems only
Don't set PF_LOCAL_THROTTLE on remote filesystems like NFS, since they
aren't expected to ever be subject to double buffering.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:39:38 -05:00
Trond Myklebust
2e19d10c14 nfsd: Fix up nfsd to ensure that timeout errors don't result in ESTALE
If the underlying filesystem times out, then we want knfsd to return
NFSERR_JUKEBOX/DELAY rather than NFSERR_STALE.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:39:38 -05:00
Jeff Layton
7f84b488f9 nfsd: close cached files prior to a REMOVE or RENAME that would replace target
It's not uncommon for some workloads to do a bunch of I/O to a file and
delete it just afterward. If knfsd has a cached open file however, then
the file may still be open when the dentry is unlinked. If the
underlying filesystem is nfs, then that could trigger it to do a
sillyrename.

On a REMOVE or RENAME scan the nfsd_file cache for open files that
correspond to the inode, and proactively unhash and put their
references. This should prevent any delete-on-last-close activity from
occurring, solely due to knfsd's open file cache.

This must be done synchronously though so we use the variants that call
flush_delayed_fput. There are deadlock possibilities if you call
flush_delayed_fput while holding locks, however. In the case of
nfsd_rename, we don't even do the lookups of the dentries to be renamed
until we've locked for rename.

Once we've figured out what the target dentry is for a rename, check to
see whether there are cached open files associated with it. If there
are, then unwind all of the locking, close them all, and then reattempt
the rename.

None of this is really necessary for "typical" filesystems though. It's
mostly of use for NFS, so declare a new export op flag and use that to
determine whether to close the files beforehand.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:39:38 -05:00
Jeff Layton
ba5e8187c5 nfsd: allow filesystems to opt out of subtree checking
When we start allowing NFS to be reexported, then we have some problems
when it comes to subtree checking. In principle, we could allow it, but
it would mean encoding parent info in the filehandles and there may not
be enough space for that in a NFSv3 filehandle.

To enforce this at export upcall time, we add a new export_ops flag
that declares the filesystem ineligible for subtree checking.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:39:38 -05:00
Jeff Layton
daab110e47 nfsd: add a new EXPORT_OP_NOWCC flag to struct export_operations
With NFSv3 nfsd will always attempt to send along WCC data to the
client. This generally involves saving off the in-core inode information
prior to doing the operation on the given filehandle, and then issuing a
vfs_getattr to it after the op.

Some filesystems (particularly clustered or networked ones) have an
expensive ->getattr inode operation. Atomicity is also often difficult
or impossible to guarantee on such filesystems. For those, we're best
off not trying to provide WCC information to the client at all, and to
simply allow it to poll for that information as needed with a GETATTR
RPC.

This patch adds a new flags field to struct export_operations, and
defines a new EXPORT_OP_NOWCC flag that filesystems can use to indicate
that nfsd should not attempt to provide WCC info in NFSv3 replies. It
also adds a blurb about the new flags field and flag to the exporting
documentation.

The server will also now skip collecting this information for NFSv2 as
well, since that info is never used there anyway.

Note that this patch does not add this flag to any filesystem
export_operations structures. This was originally developed to allow
reexporting nfs via nfsd.

Other filesystems may want to consider enabling this flag too. It's hard
to tell however which ones have export operations to enable export via
knfsd and which ones mostly rely on them for open-by-filehandle support,
so I'm leaving that up to the individual maintainers to decide. I am
cc'ing the relevant lists for those filesystems that I think may want to
consider adding this though.

Cc: HPDD-discuss@lists.01.org
Cc: ceph-devel@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: fuse-devel@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:39:38 -05:00
J. Bruce Fields
1631087ba8 Revert "nfsd4: support change_attr_type attribute"
This reverts commit a85857633b.

We're still factoring ctime into our change attribute even in the
IS_I_VERSION case.  If someone sets the system time backwards, a client
could see the change attribute go backwards.  Maybe we can just say
"well, don't do that", but there's some question whether that's good
enough, or whether we need a better guarantee.

Also, the client still isn't actually using the attribute.

While we're still figuring this out, let's just stop returning this
attribute.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2020-12-09 09:39:38 -05:00