Commit Graph

3531 Commits

Author SHA1 Message Date
NeilBrown
89b24336f0 NFSD: handle errors better in write_ports_addfd()
If write_ports_add() fails, we shouldn't destroy the serv, unless we had
only just created it.  So if there are any permanent sockets already
attached, leave the serv in place.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:49 -05:00
Chuck Lever
c2f1c4bd20 NFSD: Fix sparse warning
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24: warning: incorrect type in assignment (different base types)
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24:    expected restricted __be32 [usertype] status
/home/cel/src/linux/linux/fs/nfsd/nfs4proc.c:1539:24:    got int

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-12-13 13:42:49 -05:00
Eric W. Biederman
ca3574bd65 exit: Rename module_put_and_exit to module_put_and_kthread_exit
Update module_put_and_exit to call kthread_exit instead of do_exit.

Change the name to reflect this change in functionality.  All of the
users of module_put_and_exit are causing the current kthread to exit
so this change makes it clear what is happening.  There is no
functional change.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
J. Bruce Fields
548ec0805c nfsd: fix use-after-free due to delegation race
A delegation break could arrive as soon as we've called vfs_setlease.  A
delegation break runs a callback which immediately (in
nfsd4_cb_recall_prepare) adds the delegation to del_recall_lru.  If we
then exit nfs4_set_delegation without hashing the delegation, it will be
freed as soon as the callback is done with it, without ever being
removed from del_recall_lru.

Symptoms show up later as use-after-free or list corruption warnings,
usually in the laundromat thread.

I suspect aba2072f45 "nfsd: grant read delegations to clients holding
writes" made this bug easier to hit, but I looked as far back as v3.0
and it looks to me it already had the same problem.  So I'm not sure
where the bug was introduced; it may have been there from the beginning.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-12-10 11:55:15 -05:00
Alexander Sverdlin
b10252c7ae nfsd: Fix nsfd startup race (again)
Commit bd5ae9288d ("nfsd: register pernet ops last, unregister first")
has re-opened rpc_pipefs_event() race against nfsd_net_id registration
(register_pernet_subsys()) which has been fixed by commit bb7ffbf29e
("nfsd: fix nsfd startup race triggering BUG_ON").

Restore the order of register_pernet_subsys() vs register_cld_notifier().
Add WARN_ON() to prevent a future regression.

Crash info:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000012
CPU: 8 PID: 345 Comm: mount Not tainted 5.4.144-... #1
pc : rpc_pipefs_event+0x54/0x120 [nfsd]
lr : rpc_pipefs_event+0x48/0x120 [nfsd]
Call trace:
 rpc_pipefs_event+0x54/0x120 [nfsd]
 blocking_notifier_call_chain
 rpc_fill_super
 get_tree_keyed
 rpc_fs_get_tree
 vfs_get_tree
 do_mount
 ksys_mount
 __arm64_sys_mount
 el0_svc_handler
 el0_svc

Fixes: bd5ae9288d ("nfsd: register pernet ops last, unregister first")
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nokia.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-12-10 11:54:59 -05:00
Christian Brauner
bb49e9e730 fs: add is_idmapped_mnt() helper
Multiple places open-code the same check to determine whether a given
mount is idmapped. Introduce a simple helper function that can be used
instead. This allows us to get rid of the fragile open-coding. We will
later change the check that is used to determine whether a given mount
is idmapped. Introducing a helper allows us to do this in a single
place instead of doing it for multiple places.

Link: https://lore.kernel.org/r/20211123114227.3124056-2-brauner@kernel.org (v1)
Link: https://lore.kernel.org/r/20211130121032.3753852-2-brauner@kernel.org (v2)
Link: https://lore.kernel.org/r/20211203111707.3901969-2-brauner@kernel.org
Cc: Seth Forshee <sforshee@digitalocean.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Seth Forshee <sforshee@digitalocean.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
2021-12-03 18:44:06 +01:00
Linus Torvalds
ef1d8dda23 Merge tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfix from Bruce Fields:
 "This is just one bugfix for a buffer overflow in knfsd's xdr decoding"

* tag 'nfsd-5.16-1' of git://linux-nfs.org/~bfields/linux:
  NFSD: Fix exposure in nfsd4_decode_bitmap()
2021-11-17 08:38:00 -08:00
Chuck Lever
c0019b7db1 NFSD: Fix exposure in nfsd4_decode_bitmap()
rtm@csail.mit.edu reports:
> nfsd4_decode_bitmap4() will write beyond bmval[bmlen-1] if the RPC
> directs it to do so. This can cause nfsd4_decode_state_protect4_a()
> to write client-supplied data beyond the end of
> nfsd4_exchange_id.spo_must_allow[] when called by
> nfsd4_decode_exchange_id().

Rewrite the loops so nfsd4_decode_bitmap() cannot iterate beyond
@bmlen.

Reported by: rtm@csail.mit.edu
Fixes: d1c263a031 ("NFSD: Replace READ* macros in nfsd4_decode_fattr()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-11-15 15:33:10 -05:00
Linus Torvalds
38764c7340 Merge tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
 "A slow cycle for nfsd: mainly cleanup, including Neil's patch dropping
  support for a filehandle format deprecated 20 years ago, and further
  xdr-related cleanup from Chuck"

* tag 'nfsd-5.16' of git://linux-nfs.org/~bfields/linux: (26 commits)
  nfsd4: remove obselete comment
  nfsd: document server-to-server-copy parameters
  NFSD:fix boolreturn.cocci warning
  nfsd: update create verifier comment
  SUNRPC: Change return value type of .pc_encode
  SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
  NFSD: Save location of NFSv4 COMPOUND status
  SUNRPC: Change return value type of .pc_decode
  SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
  SUNRPC: De-duplicate .pc_release() call sites
  SUNRPC: Simplify the SVC dispatch code path
  SUNRPC: Capture value of xdr_buf::page_base
  SUNRPC: Add trace event when alloc_pages_bulk() makes no progress
  svcrdma: Split svcrmda_wc_{read,write} tracepoints
  svcrdma: Split the svcrdma_wc_send() tracepoint
  svcrdma: Split the svcrdma_wc_receive() tracepoint
  NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
  SUNRPC: xdr_stream_subsegment() must handle non-zero page_bases
  NFSD: Initialize pointer ni with NULL and not plain integer 0
  NFSD: simplify struct nfsfh
  ...
2021-11-10 16:45:54 -08:00
Linus Torvalds
2ec20f4895 Merge tag 'nfs-for-5.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:
   - NFSv4.1 can always retrieve and cache the ACCESS mode on OPEN
   - Optimisations for READDIR and the 'ls -l' style workload
   - Further replacements of dprintk() with tracepoints and other
     tracing improvements
   - Ensure we re-probe NFSv4 server capabilities when the user does a
     "mount -o remount"

  Bugfixes:
   - Fix an Oops in pnfs_mark_request_commit()
   - Fix up deadlocks in the commit code
   - Fix regressions in NFSv2/v3 attribute revalidation due to the
     change_attr_type optimisations
   - Fix some dentry verifier races
   - Fix some missing dentry verifier settings
   - Fix a performance regression in nfs_set_open_stateid_locked()
   - SUNRPC was sending multiple SYN calls when re-establishing a TCP
     connection.
   - Fix multiple NFSv4 issues due to missing sanity checking of server
     return values
   - Fix a potential Oops when FREE_STATEID races with an unmount

  Cleanups:
   - Clean up the labelled NFS code
   - Remove unused header <linux/pnfs_osd_xdr.h>"

* tag 'nfs-for-5.16-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (84 commits)
  NFSv4: Sanity check the parameters in nfs41_update_target_slotid()
  NFS: Remove the nfs4_label argument from decode_getattr_*() functions
  NFS: Remove the nfs4_label argument from nfs_setsecurity
  NFS: Remove the nfs4_label argument from nfs_fhget()
  NFS: Remove the nfs4_label argument from nfs_add_or_obtain()
  NFS: Remove the nfs4_label argument from nfs_instantiate()
  NFS: Remove the nfs4_label from the nfs_setattrres
  NFS: Remove the nfs4_label from the nfs4_getattr_res
  NFS: Remove the f_label from the nfs4_opendata and nfs_openres
  NFS: Remove the nfs4_label from the nfs4_lookupp_res struct
  NFS: Remove the label from the nfs4_lookup_res struct
  NFS: Remove the nfs4_label from the nfs4_link_res struct
  NFS: Remove the nfs4_label from the nfs4_create_res struct
  NFS: Remove the nfs4_label from the nfs_entry struct
  NFS: Create a new nfs_alloc_fattr_with_label() function
  NFS: Always initialise fattr->label in nfs_fattr_alloc()
  NFSv4.2: alloc_file_pseudo() takes an open flag, not an f_mode
  NFS: Don't allocate nfs_fattr on the stack in __nfs42_ssc_open()
  NFSv4: Remove unnecessary 'minor version' check
  NFSv4: Fix potential Oops in decode_op_map()
  ...
2021-11-10 16:32:46 -08:00
Linus Torvalds
2acda7549e Merge tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
 "Support for reporting filesystem errors through fanotify so that
  system health monitoring daemons can watch for these and act instead
  of scraping system logs"

* tag 'fsnotify_for_v5.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: (34 commits)
  samples: remove duplicate include in fs-monitor.c
  samples: Fix warning in fsnotify sample
  docs: Fix formatting of literal sections in fanotify docs
  samples: Make fs-monitor depend on libc and headers
  docs: Document the FAN_FS_ERROR event
  samples: Add fs error monitoring example
  ext4: Send notifications on error
  fanotify: Allow users to request FAN_FS_ERROR events
  fanotify: Emit generic error info for error event
  fanotify: Report fid info for file related file system errors
  fanotify: WARN_ON against too large file handles
  fanotify: Add helpers to decide whether to report FID/DFID
  fanotify: Wrap object_fh inline space in a creator macro
  fanotify: Support merging of error events
  fanotify: Support enqueueing of error events
  fanotify: Pre-allocate pool of error events
  fanotify: Reserve UAPI bits for FAN_FS_ERROR
  fsnotify: Support FS_ERROR event type
  fanotify: Require fid_mode for any non-fd event
  fanotify: Encode empty file handle when no inode is provided
  ...
2021-11-06 16:43:20 -07:00
Chuck Lever
8791545eda NFS: Move NFS protocol display macros to global header
Refactor: surface useful show_ macros so they can be shared between
the client and server trace code.

Additional clean up:
- Housekeeping: ensure the correct #include files are pulled in
  and add proper TRACE_DEFINE_ENUM where they are missing
- Use a consistent naming scheme for the helpers
- Store values to be displayed symbolically as unsigned long, as
  that is the type that the __print_yada() functions take

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-11-02 12:31:23 -04:00
J. Bruce Fields
80479eb862 nfsd4: remove obselete comment
Mandatory locking has been removed.  And the rest of this comment is
redundant with the code.

Reported-by: Jeff layton <jlayton@kernel.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-11-01 17:17:14 -04:00
Gabriel Krisman Bertazi
24dca90590 fsnotify: Protect fsnotify_handle_inode_event from no-inode events
FAN_FS_ERROR allows events without inodes - i.e. for file system-wide
errors.  Even though fsnotify_handle_inode_event is not currently used
by fanotify, this patch protects other backends from cases where neither
inode or dir are provided.  Also document the constraints of the
interface (inode and dir cannot be both NULL).

Link: https://lore.kernel.org/r/20211025192746.66445-12-krisman@collabora.com
Suggested-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jan Kara <jack@suse.cz>
2021-10-27 12:34:12 +02:00
Christoph Hellwig
8c6aabd1c7 nfsd/blocklayout: use ->get_unique_id instead of sending SCSI commands
Call the ->get_unique_id method to query the SCSI identifiers.  This can
use the cached VPD page in the sd driver instead of sending a command
on every LAYOUTGET.  It will also allow to support NVMe based volumes
if the draft for that ever takes off.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20211021060607.264371-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-10-22 08:33:57 -06:00
Changcheng Deng
291cd656da NFSD:fix boolreturn.cocci warning
./fs/nfsd/nfssvc.c: 1072: 8-9: :WARNING return of 0/1 in function
'nfssvc_decode_voidarg' with return type bool

Return statements in functions returning bool should use true/false
instead of 1/0.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Changcheng Deng <deng.changcheng@zte.com.cn>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-19 10:36:48 -04:00
J. Bruce Fields
2336d69686 nfsd: update create verifier comment
I don't know if that Solaris behavior matters any more or if it's still
possible to look up that bug ID any more.  The XFS behavior's definitely
still relevant, though; any but the most recent XFS filesystems will
lose the top bits.

Reported-by: Frank S. Filz <ffilzlnx@mindspring.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-15 14:42:11 -04:00
Chuck Lever
130e2054d4 SUNRPC: Change return value type of .pc_encode
Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_encode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13 11:34:49 -04:00
Chuck Lever
fda4944114 SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR encoder, and can be removed.

Note also that there is a line in each encoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per encoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13 11:34:49 -04:00
Chuck Lever
3b0ebb255f NFSD: Save location of NFSv4 COMPOUND status
Refactor: Currently nfs4svc_encode_compoundres() relies on the NFS
dispatcher to pass in the buffer location of the COMPOUND status.
Instead, save that buffer location in struct nfsd4_compoundres.

The compound tag follows immediately after.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13 11:34:49 -04:00
Chuck Lever
c44b31c263 SUNRPC: Change return value type of .pc_decode
Returning an undecorated integer is an age-old trope, but it's
not clear (even to previous experts in this code) that the only
valid return values are 1 and 0. These functions do not return
a negative errno, rpc_stat value, or a positive length.

Document there are only two valid return values by having
.pc_decode return only true or false.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13 10:29:41 -04:00
Chuck Lever
16c663642c SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
The passed-in value of the "__be32 *p" parameter is now unused in
every server-side XDR decoder, and can be removed.

Note also that there is a line in each decoder that sets up a local
pointer to a struct xdr_stream. Passing that pointer from the
dispatcher instead saves one line per decoder function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-13 10:29:41 -04:00
Linus Torvalds
1da38549dd Merge tag 'nfsd-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
 "Bug fixes for NFSD error handling paths"

* tag 'nfsd-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  NFSD: Keep existing listeners on portlist error
  SUNRPC: fix sign error causing rpcsec_gss drops
  nfsd: Fix a warning for nfsd_file_close_inode
  nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero
  nfsd: fix error handling of register_pernet_subsys() in init_nfsd()
2021-10-07 14:11:40 -07:00
Benjamin Coddington
c20106944e NFSD: Keep existing listeners on portlist error
If nfsd has existing listening sockets without any processes, then an error
returned from svc_create_xprt() for an additional transport will remove
those existing listeners.  We're seeing this in practice when userspace
attempts to create rpcrdma transports without having the rpcrdma modules
present before creating nfsd kernel processes.  Fix this by checking for
existing sockets before calling nfsd_destroy().

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-10-06 13:24:25 -04:00
Chuck Lever
dae9a6cab8 NFSD: Have legacy NFSD WRITE decoders use xdr_stream_subsegment()
Refactor.

Now that the NFSv2 and NFSv3 XDR decoders have been converted to
use xdr_streams, the WRITE decoder functions can use
xdr_stream_subsegment() to extract the WRITE payload into its own
xdr_buf, just as the NFSv4 WRITE XDR decoder currently does.

That makes it possible to pass the first kvec, pages array + length,
page_base, and total payload length via a single function parameter.

The payload's page_base is not yet assigned or used, but will be in
subsequent patches.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-02 16:10:01 -04:00
Colin Ian King
8e70bf27fd NFSD: Initialize pointer ni with NULL and not plain integer 0
Pointer ni is being initialized with plain integer zero. Fix
this by initializing with NULL.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-02 15:51:10 -04:00
NeilBrown
d8b26071e6 NFSD: simplify struct nfsfh
Most of the fields in 'struct knfsd_fh' are 2 levels deep (a union and a
struct) and are accessed using macros like:

 #define fh_FOO fh_base.fh_new.fb_FOO

This patch makes the union and struct anonymous, so that "fh_FOO" can be
a name directly within 'struct knfsd_fh' and the #defines aren't needed.

The file handle as a whole is sometimes accessed as "fh_base" or
"fh_base.fh_pad", neither of which are particularly helpful names.
As the struct holding the filehandle is now anonymous, we
cannot use the name of that, so we union it with 'fh_raw' and use that
where the raw filehandle is needed.  fh_raw also ensure the structure is
large enough for the largest possible filehandle.

fh_raw is a 'char' array, removing any need to cast it for memcpy etc.

SVCFH_fmt() is simplified using the "%ph" printk format.  This
changes the appearance of filehandles in dprintk() debugging, making
them a little more precise.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-02 15:51:10 -04:00
NeilBrown
c645a883df NFSD: drop support for ancient filehandles
Filehandles not in the "new" or "version 1" format have not been handed
out for new mounts since Linux 2.4 which was released 20 years ago.
I think it is safe to say that no such file handles are still in use,
and that we can drop support for them.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-02 15:51:10 -04:00
NeilBrown
ef5825e3cf NFSD: move filehandle format declarations out of "uapi".
A small part of the declaration concerning filehandle format are
currently in the "uapi" include directory:
   include/uapi/linux/nfsd/nfsfh.h

There is a lot more to the filehandle format, including "enum fid_type"
and "enum nfsd_fsid" which are not exported via "uapi".

This small part of the filehandle definition is of minimal use outside
of the kernel, and I can find no evidence that an other code is using
it. Certainly nfs-utils and wireshark (The most likely candidates) do not
use these declarations.

So move it out of "uapi" by copying the content from
  include/uapi/linux/nfsd/nfsfh.h
into
  fs/nfsd/nfsfh.h

A few unnecessary "#include" directives are not copied, and neither is
the #define of fh_auth, which is annotated as being for userspace only.

The copyright claims in the uapi file are identical to those in the nfsd
file, so there is no need to copy those.

The "__u32" style integer types are only needed in "uapi".  In
kernel-only code we can use the more familiar "u32" style.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-10-02 15:50:45 -04:00
Trond Myklebust
19598141f4 nfsd: Fix a warning for nfsd_file_close_inode
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-10-01 11:17:40 -04:00
Trond Myklebust
f2e717d655 nfsd4: Handle the NFSv4 READDIR 'dircount' hint being zero
RFC3530 notes that the 'dircount' field may be zero, in which case the
recommendation is to ignore it, and only enforce the 'maxcount' field.
In RFC5661, this recommendation to ignore a zero valued field becomes a
requirement.

Fixes: aee3776441 ("nfsd4: fix rd_dircount enforcement")
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-09-30 16:53:17 -04:00
Patrick Ho
1d625050c7 nfsd: fix error handling of register_pernet_subsys() in init_nfsd()
init_nfsd() should not unregister pernet subsys if the register fails
but should instead unwind from the last successful operation which is
register_filesystem().

Unregistering a failed register_pernet_subsys() call can result in
a kernel GPF as revealed by programmatically injecting an error in
register_pernet_subsys().

Verified the fix handled failure gracefully with no lingering nfsd
entry in /proc/filesystems.  This change was introduced by the commit
bd5ae9288d ("nfsd: register pernet ops last, unregister first"),
the original error handling logic was correct.

Fixes: bd5ae9288d ("nfsd: register pernet ops last, unregister first")
Cc: stable@vger.kernel.org
Signed-off-by: Patrick Ho <Patrick.Ho@netapp.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-09-30 10:58:52 -04:00
Linus Torvalds
cf1d2c3e7e Merge tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
 "Critical bug fixes:

   - Fix crash in NLM TEST procedure

   - NFSv4.1+ backchannel not restored after PATH_DOWN"

* tag 'nfsd-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
  nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
  NLM: Fix svcxdr_encode_owner()
2021-09-22 09:21:02 -07:00
Chuck Lever
8847ecc927 NFSD: Optimize DRC bucket pruning
DRC bucket pruning is done by nfsd_cache_lookup(), which is part of
every NFSv2 and NFSv3 dispatch (ie, it's done while the client is
waiting).

I added a trace_printk() in prune_bucket() to see just how long
it takes to prune. Here are two ends of the spectrum:

 prune_bucket: Scanned 1 and freed 0 in 90 ns, 62 entries remaining
 prune_bucket: Scanned 2 and freed 1 in 716 ns, 63 entries remaining
...
 prune_bucket: Scanned 75 and freed 74 in 34149 ns, 1 entries remaining

Pruning latency is noticeable on fast transports with fast storage.
By noticeable, I mean that the latency measured here in the worst
case is the same order of magnitude as the round trip time for
cached server operations.

We could do something like moving expired entries to an expired list
and then free them later instead of freeing them right in
prune_bucket(). But simply limiting the number of entries that can
be pruned by a lookup is simple and retains more entries in the
cache, making the DRC somewhat more effective.

Comparison with a 70/30 fio 8KB 12 thread direct I/O test:

Before:

  write: IOPS=61.6k, BW=481MiB/s (505MB/s)(14.1GiB/30001msec); 0 zone resets

WRITE:
	1848726 ops (30%)
	avg bytes sent per op: 8340 avg bytes received per op: 136
	backlog wait: 0.635158 	RTT: 0.128525 	total execute time: 0.827242 (milliseconds)

After:

  write: IOPS=63.0k, BW=492MiB/s (516MB/s)(14.4GiB/30001msec); 0 zone resets

WRITE:
	1891144 ops (30%)
	avg bytes sent per op: 8340 avg bytes received per op: 136
	backlog wait: 0.616114 	RTT: 0.126842 	total execute time: 0.805348 (milliseconds)

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2021-09-21 18:21:34 -04:00
Dai Ngo
02579b2ff8 nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN
When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
recovers by sending BIND_CONN_TO_SESSION but the server fails to recover
the back channel and leaves it as NFSD4_CB_DOWN.

Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
by calling nfsd4_probe_callback.

Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-09-17 10:35:12 -04:00
Linus Torvalds
a9c9a6f741 Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (ufs, qla2xxx,
  target, smartpqi, lpfc, mpt3sas).

  The core change causing the most churn was replacing the command
  request field request with a macro, allowing us to offset map to it
  and remove the redundant field; the same was also done for the tag
  field.

  The most impactful change is the final removal of scsi_ioctl, which
  has been deprecated for over a decade"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (293 commits)
  scsi: ufs: Fix ufshcd_request_sense_async() for Samsung KLUFG8RHDA-B2D1
  scsi: ufs: ufs-exynos: Fix static checker warning
  scsi: mpt3sas: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Use the proper SCSI midlayer interfaces for PI
  scsi: lpfc: Copyright updates for 14.0.0.1 patches
  scsi: lpfc: Update lpfc version to 14.0.0.1
  scsi: lpfc: Add bsg support for retrieving adapter cmf data
  scsi: lpfc: Add cmf_info sysfs entry
  scsi: lpfc: Add debugfs support for cm framework buffers
  scsi: lpfc: Add support for maintaining the cm statistics buffer
  scsi: lpfc: Add rx monitoring statistics
  scsi: lpfc: Add support for the CM framework
  scsi: lpfc: Add cmfsync WQE support
  scsi: lpfc: Add support for cm enablement buffer
  scsi: lpfc: Add cm statistics buffer support
  scsi: lpfc: Add EDC ELS support
  scsi: lpfc: Expand FPIN and RDF receive logging
  scsi: lpfc: Add MIB feature enablement support
  scsi: lpfc: Add SET_HOST_DATA mbox cmd to pass date/time info to firmware
  scsi: fc: Add EDC ELS definition
  ...
2021-09-02 15:09:46 -07:00
Linus Torvalds
8bda955776 Merge tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
 "New features:

   - Support for server-side disconnect injection via debugfs

   - Protocol definitions for new RPC_AUTH_TLS authentication flavor

  Performance improvements:

   - Reduce page allocator traffic in the NFSD splice read actor

   - Reduce CPU utilization in svcrdma's Send completion handler

  Notable bug fixes:

   - Stabilize lockd operation when re-exporting NFS mounts

   - Fix the use of %.*s in NFSD tracepoints

   - Fix /proc/sys/fs/nfs/nsm_use_hostnames"

* tag 'nfsd-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits)
  nfsd: fix crash on LOCKT on reexported NFSv3
  nfs: don't allow reexport reclaims
  lockd: don't attempt blocking locks on nfs reexports
  nfs: don't atempt blocking locks on nfs reexports
  Keep read and write fds with each nlm_file
  lockd: update nlm_lookup_file reexport comment
  nlm: minor refactoring
  nlm: minor nlm_lookup_file argument change
  lockd: lockd server-side shouldn't set fl_ops
  SUNRPC: Add documentation for the fail_sunrpc/ directory
  SUNRPC: Server-side disconnect injection
  SUNRPC: Move client-side disconnect injection
  SUNRPC: Add a /sys/kernel/debug/fail_sunrpc/ directory
  svcrdma: xpt_bc_xprt is already clear in __svc_rdma_free()
  nfsd4: Fix forced-expiry locking
  rpc: fix gss_svc_init cleanup on failure
  SUNRPC: Add RPC_AUTH_TLS protocol numbers
  lockd: change the proc_handler for nsm_use_hostnames
  sysctl: introduce new proc handler proc_dobool
  SUNRPC: Fix a NULL pointer deref in trace_svc_stats_latency()
  ...
2021-08-31 10:57:06 -07:00
J. Bruce Fields
0bcc7ca40b nfsd: fix crash on LOCKT on reexported NFSv3
Unlike other filesystems, NFSv3 tries to use fl_file in the GETLK case.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-26 15:32:29 -04:00
J. Bruce Fields
bb0a55bb71 nfs: don't allow reexport reclaims
In the reexport case, nfsd is currently passing along locks with the
reclaim bit set.  The client sends a new lock request, which is granted
if there's currently no conflict--even if it's possible a conflicting
lock could have been briefly held in the interim.

We don't currently have any way to safely grant reclaim, so for now
let's just deny them all.

I'm doing this by passing the reclaim bit to nfs and letting it fail the
call, with the idea that eventually the client might be able to do
something more forgiving here.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-26 15:32:28 -04:00
J. Bruce Fields
f657f8eef3 nfs: don't atempt blocking locks on nfs reexports
NFS implements blocking locks by blocking inside its lock method.  In
the reexport case, this blocks the nfs server thread, which could lead
to deadlocks since an nfs server thread might be required to unlock the
conflicting lock.  It also causes a crash, since the nfs server thread
assumes it can free the lock when its lm_notify lock callback is called.

Ideal would be to make the nfs lock method return without blocking in
this case, but for now it works just not to attempt blocking locks.  The
difference is just that the original client will have to poll (as it
does in the v4.0 case) instead of getting a callback when the lock's
available.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-26 15:32:10 -04:00
J. Bruce Fields
7f024fcd5c Keep read and write fds with each nlm_file
We shouldn't really be using a read-only file descriptor to take a write
lock.

Most filesystems will put up with it.  But NFS, for example, won't.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-23 18:05:31 -04:00
Jeff Layton
f7e33bdbd6 fs: remove mandatory file locking support
We added CONFIG_MANDATORY_FILE_LOCKING in 2015, and soon after turned it
off in Fedora and RHEL8. Several other distros have followed suit.

I've heard of one problem in all that time: Someone migrated from an
older distro that supported "-o mand" to one that didn't, and the host
had a fstab entry with "mand" in it which broke on reboot. They didn't
actually _use_ mandatory locking so they just removed the mount option
and moved on.

This patch rips out mandatory locking support wholesale from the kernel,
along with the Kconfig option and the Documentation file. It also
changes the mount code to ignore the "mand" mount option instead of
erroring out, and to throw a big, ugly warning.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
2021-08-23 06:15:36 -04:00
J. Bruce Fields
f7104cc1a9 nfsd4: Fix forced-expiry locking
This should use the network-namespace-wide client_lock, not the
per-client cl_lock.

You shouldn't see any bugs unless you're actually using the
forced-expiry interface introduced by 89c905becc.

Fixes: 89c905becc "nfsd: allow forced expiration of NFSv4 clients"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17 11:47:54 -04:00
NeilBrown
ea49dc7900 NFSD: remove vanity comments
Including one's name in copyright claims is appropriate.  Including it
in random comments is just vanity.  After 2 decades, it is time for
these to be gone.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17 11:47:53 -04:00
Chuck Lever
d27b74a867 NFSD: Use new __string_len C macros for nfsd_clid_class
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17 11:47:52 -04:00
Chuck Lever
408c0de706 NFSD: Use new __string_len C macros for the nfs_dirent tracepoint
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2021-08-17 11:47:52 -04:00
Chuck Lever
496d83cf0f NFSD: Batch release pages during splice read
Large splice reads call put_page() repeatedly. put_page() is
relatively expensive to call, so replace it with the new
svc_rqst_replace_page() helper to help amortize that cost.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
2021-08-17 11:47:52 -04:00
Chuck Lever
c7e0b781b7 NFSD: Clean up splice actor
A few useful observations:

 - The value in @size is never modified.

 - splice_desc.len is an unsigned int, and so is xdr_buf.page_len.
   An implicit cast to size_t is unnecessary.

 - The computation of .page_len is the same in all three arms
   of the "if" statement, so hoist it out to make it clear that
   the operation is an unconditional invariant.

The resulting function is 18 bytes shorter on my system (-Os).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: NeilBrown <neilb@suse.de>
2021-08-17 11:47:52 -04:00
Christoph Hellwig
33ff4ce45b scsi: core: Rename CONFIG_BLK_SCSI_REQUEST to CONFIG_SCSI_COMMON
CONFIG_BLK_SCSI_REQUEST is rather misnamed as it enables building a small
amount of code shared by the SCSI initiator, target, and consumers of the
scsi_request passthrough API.  Rename it and also allow building it as a
module.

[mkp: add module license]

Link: https://lore.kernel.org/r/20210724072033.1284840-20-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-28 22:24:27 -04:00
Linus Torvalds
a022f7d575 Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
 "A combination of changes that ended up depending on both the driver
  and core branch (and/or the IDE removal), and a few late arriving
  fixes. In detail:

   - Fix io ticks wrap-around issue (Chunguang)

   - nvme-tcp sock locking fix (Maurizio)

   - s390-dasd fixes (Kees, Christoph)

   - blk_execute_rq polling support (Keith)

   - blk-cgroup RCU iteration fix (Yu)

   - nbd backend ID addition (Prasanna)

   - Partition deletion fix (Yufen)

   - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph)

   - Removal of now dead block request types due to IDE removal
     (Christoph)

   - Loop probing and control device cleanups (Christoph)

   - Device uevent fix (Christoph)

   - Misc cleanups/fixes (Tetsuo, Christoph)"

* tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits)
  blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs
  block: fix the problem of io_ticks becoming smaller
  nvme-tcp: can't set sk_user_data without write_lock
  loop: remove unused variable in loop_set_status()
  block: remove the bdgrab in blk_drop_partitions
  block: grab a device refcount in disk_uevent
  s390/dasd: Avoid field over-reading memcpy()
  dasd: unexport dasd_set_target_state
  block: check disk exist before trying to add partition
  ubd: remove dead code in ubd_setup_common
  nvme: use return value from blk_execute_rq()
  block: return errors from blk_execute_rq()
  nvme: use blk_execute_rq() for passthrough commands
  block: support polling through blk_execute_rq
  block: remove REQ_OP_SCSI_{IN,OUT}
  block: mark blk_mq_init_queue_data static
  loop: rewrite loop_exit using idr_for_each_entry
  loop: split loop_lookup
  loop: don't allow deleting an unspecified loop device
  loop: move loop_ctl_mutex locking into loop_add
  ...
2021-07-09 12:05:33 -07:00