Commit Graph

335 Commits

Author SHA1 Message Date
Mike Snitzer
bc29408695 nfs_common: fix localio to cope with racing nfs_local_probe()
Fix the possibility of racing nfs_local_probe() resulting in:
  list_add double add: new=ffff8b99707f9f58, prev=ffff8b99707f9f58, next=ffffffffc0f30000.
  ------------[ cut here ]------------
  kernel BUG at lib/list_debug.c:35!

Add nfs_uuid_init() to properly initialize all nfs_uuid_t members
(particularly its list_head).

Switch to returning bool from nfs_uuid_begin(), returns false if
nfs_uuid_t is already in-use (its list_head is on a list). Update
nfs_local_probe() to return early if the nfs_client's cl_uuid
(nfs_uuid_t) is in-use.

Also, switch nfs_uuid_begin() from using list_add_tail_rcu() to
list_add_tail() -- rculist was used in an earlier version of the
localio code that had a lockless nfs_uuid_lookup interface.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-11-04 10:24:19 -05:00
Yanjun Zhang
a848c29e34 NFSv4: Prevent NULL-pointer dereference in nfs42_complete_copies()
On the node of an NFS client, some files saved in the mountpoint of the
NFS server were copied to another location of the same NFS server.
Accidentally, the nfs42_complete_copies() got a NULL-pointer dereference
crash with the following syslog:

[232064.838881] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232064.839360] NFSv4: state recovery failed for open file nfs/pvc-12b5200d-cd0f-46a3-b9f0-af8f4fe0ef64.qcow2, error = -116
[232066.588183] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058
[232066.588586] Mem abort info:
[232066.588701]   ESR = 0x0000000096000007
[232066.588862]   EC = 0x25: DABT (current EL), IL = 32 bits
[232066.589084]   SET = 0, FnV = 0
[232066.589216]   EA = 0, S1PTW = 0
[232066.589340]   FSC = 0x07: level 3 translation fault
[232066.589559] Data abort info:
[232066.589683]   ISV = 0, ISS = 0x00000007
[232066.589842]   CM = 0, WnR = 0
[232066.589967] user pgtable: 64k pages, 48-bit VAs, pgdp=00002000956ff400
[232066.590231] [0000000000000058] pgd=08001100ae100003, p4d=08001100ae100003, pud=08001100ae100003, pmd=08001100b3c00003, pte=0000000000000000
[232066.590757] Internal error: Oops: 96000007 [#1] SMP
[232066.590958] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm vhost_net vhost vhost_iotlb tap tun ipt_rpfilter xt_multiport ip_set_hash_ip ip_set_hash_net xfrm_interface xfrm6_tunnel tunnel4 tunnel6 esp4 ah4 wireguard libcurve25519_generic veth xt_addrtype xt_set nf_conntrack_netlink ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_bitmap_port ip_set_hash_ipport dummy ip_set ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs iptable_filter sch_ingress nfnetlink_cttimeout vport_gre ip_gre ip_tunnel gre vport_geneve geneve vport_vxlan vxlan ip6_udp_tunnel udp_tunnel openvswitch nf_conncount dm_round_robin dm_service_time dm_multipath xt_nat xt_MASQUERADE nft_chain_nat nf_nat xt_mark xt_conntrack xt_comment nft_compat nft_counter nf_tables nfnetlink ocfs2 ocfs2_nodemanager ocfs2_stackglue iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ipmi_ssif nbd overlay 8021q garp mrp bonding tls rfkill sunrpc ext4 mbcache jbd2
[232066.591052]  vfat fat cas_cache cas_disk ses enclosure scsi_transport_sas sg acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler ip_tables vfio_pci vfio_pci_core vfio_virqfd vfio_iommu_type1 vfio dm_mirror dm_region_hash dm_log dm_mod nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc fuse xfs libcrc32c ast drm_vram_helper qla2xxx drm_kms_helper syscopyarea crct10dif_ce sysfillrect ghash_ce sysimgblt sha2_ce fb_sys_fops cec sha256_arm64 sha1_ce drm_ttm_helper ttm nvme_fc igb sbsa_gwdt nvme_fabrics drm nvme_core i2c_algo_bit i40e scsi_transport_fc megaraid_sas aes_neon_bs
[232066.596953] CPU: 6 PID: 4124696 Comm: 10.253.166.125- Kdump: loaded Not tainted 5.15.131-9.cl9_ocfs2.aarch64 #1
[232066.597356] Hardware name: Great Wall .\x93\x8e...RF6260 V5/GWMSSE2GL1T, BIOS T656FBE_V3.0.18 2024-01-06
[232066.597721] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[232066.598034] pc : nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.598327] lr : nfs4_reclaim_open_state+0x12c/0x800 [nfsv4]
[232066.598595] sp : ffff8000f568fc70
[232066.598731] x29: ffff8000f568fc70 x28: 0000000000001000 x27: ffff21003db33000
[232066.599030] x26: ffff800005521ae0 x25: ffff0100f98fa3f0 x24: 0000000000000001
[232066.599319] x23: ffff800009920008 x22: ffff21003db33040 x21: ffff21003db33050
[232066.599628] x20: ffff410172fe9e40 x19: ffff410172fe9e00 x18: 0000000000000000
[232066.599914] x17: 0000000000000000 x16: 0000000000000004 x15: 0000000000000000
[232066.600195] x14: 0000000000000000 x13: ffff800008e685a8 x12: 00000000eac0c6e6
[232066.600498] x11: 0000000000000000 x10: 0000000000000008 x9 : ffff8000054e5828
[232066.600784] x8 : 00000000ffffffbf x7 : 0000000000000001 x6 : 000000000a9eb14a
[232066.601062] x5 : 0000000000000000 x4 : ffff70ff8a14a800 x3 : 0000000000000058
[232066.601348] x2 : 0000000000000001 x1 : 54dce46366daa6c6 x0 : 0000000000000000
[232066.601636] Call trace:
[232066.601749]  nfs4_reclaim_open_state+0x220/0x800 [nfsv4]
[232066.601998]  nfs4_do_reclaim+0x1b8/0x28c [nfsv4]
[232066.602218]  nfs4_state_manager+0x928/0x10f0 [nfsv4]
[232066.602455]  nfs4_run_state_manager+0x78/0x1b0 [nfsv4]
[232066.602690]  kthread+0x110/0x114
[232066.602830]  ret_from_fork+0x10/0x20
[232066.602985] Code: 1400000d f9403f20 f9402e61 91016003 (f9402c00)
[232066.603284] SMP: stopping secondary CPUs
[232066.606936] Starting crashdump kernel...
[232066.607146] Bye!

Analysing the vmcore, we know that nfs4_copy_state listed by destination
nfs_server->ss_copies was added by the field copies in handle_async_copy(),
and we found a waiting copy process with the stack as:
PID: 3511963  TASK: ffff710028b47e00  CPU: 0   COMMAND: "cp"
 #0 [ffff8001116ef740] __switch_to at ffff8000081b92f4
 #1 [ffff8001116ef760] __schedule at ffff800008dd0650
 #2 [ffff8001116ef7c0] schedule at ffff800008dd0a00
 #3 [ffff8001116ef7e0] schedule_timeout at ffff800008dd6aa0
 #4 [ffff8001116ef860] __wait_for_common at ffff800008dd166c
 #5 [ffff8001116ef8e0] wait_for_completion_interruptible at ffff800008dd1898
 #6 [ffff8001116ef8f0] handle_async_copy at ffff8000055142f4 [nfsv4]
 #7 [ffff8001116ef970] _nfs42_proc_copy at ffff8000055147c8 [nfsv4]
 #8 [ffff8001116efa80] nfs42_proc_copy at ffff800005514cf0 [nfsv4]
 #9 [ffff8001116efc50] __nfs4_copy_file_range.constprop.0 at ffff8000054ed694 [nfsv4]

The NULL-pointer dereference was due to nfs42_complete_copies() listed
the nfs_server->ss_copies by the field ss_copies of nfs4_copy_state.
So the nfs4_copy_state address ffff0100f98fa3f0 was offset by 0x10 and
the data accessed through this pointer was also incorrect. Generally,
the ordered list nfs4_state_owner->so_states indicate open(O_RDWR) or
open(O_WRITE) states are reclaimed firstly by nfs4_reclaim_open_state().
When destination state reclaim is failed with NFS_STATE_RECOVERY_FAILED
and copies are not deleted in nfs_server->ss_copies, the source state
may be passed to the nfs42_complete_copies() process earlier, resulting
in this crash scene finally. To solve this issue, we add a list_head
nfs_server->ss_src_copies for a server-to-server copy specially.

Fixes: 0e65a32c8a ("NFS: handle source server reboot")
Signed-off-by: Yanjun Zhang <zhangyanjun@cestc.cn>
Reviewed-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-10-03 16:19:13 -04:00
Mike Snitzer
56bcd0f07f nfs: implement client support for NFS_LOCALIO_PROGRAM
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common for subsequent lookup and verification
by the NFS server.  If matched, the NFS server populates members in the
nfs_uuid_t struct.  The NFS client then transfers these nfs_uuid_t
struct member pointers to the nfs_client struct and cleans up the
nfs_uuid_t struct.  See: fs/nfs/localio.c:nfs_local_probe()

This protocol isn't part of an IETF standard, nor does it need to be
considering it is Linux-to-Linux auxiliary RPC protocol that amounts
to an implementation detail.

Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used (enforced by fs/nfs/localio.c:nfs_local_probe()).

The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes).  The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.

Having a nonce (single-use uuid) is better than using the same uuid
for the life of the server, and sending it proactively by client
rather than reactively by the server is also safer.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
Weston Andros Adamson
70ba381e1a nfs: add LOCALIO support
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.

nfs_local_probe() is stubbed out, later commits will enable client and
server handshake via a Linux-only LOCALIO auxiliary RPC protocol.

This has dynamic binding with the nfsd module (via nfs_localio module
which is part of nfs_common). LOCALIO will only work if nfsd is
already loaded.

The "localio_enabled" nfs kernel module parameter can be used to
disable and enable the ability to use LOCALIO support.

CONFIG_NFS_LOCALIO enables NFS client support for LOCALIO.

Lastly, LOCALIO uses an nfsd_file to initiate all IO. To make proper
use of nfsd_file (and nfsd's filecache) its lifetime (duration before
nfsd_file_put is called) must extend until after commit, read and
write operations. So rather than immediately drop the nfsd_file
reference in nfs_local_open_fh(), that doesn't happen until
nfs_local_pgio_release() for read/write and not until
nfs_local_release_commit_data() for commit. The same applies to the
reference held on nfsd's nn->nfsd_serv. Both objects' lifetimes and
associated references are managed through calls to
nfs_to->nfsd_file_put_local().

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de> # nfs_open_local_fh
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:30 -04:00
NeilBrown
d98f722725 nfs: simplify and guarantee owner uniqueness.
I have evidence of an Linux NFS client getting NFS4ERR_BAD_SEQID to a
v4.0 LOCK request to a Linux server (which had fixed the problem with
RELEASE_LOCKOWNER bug fixed).
The LOCK request presented a "new" lock owner so there are two seq ids
in the request: that for the open file, and that for the new lock.
Given the context I am confident that the new lock owner was reported to
have the wrong seqid.  As lock owner identifiers are reused, the server
must still have a lock owner active which the client thinks is no longer
active.

I wasn't able to determine a root-cause but the simplest fix seems to be
to ensure lock owners are always unique much as open owners are (thanks
to a time stamp).  The easiest way to ensure uniqueness is with a 64bit
counter for each server.  That will never cycle (if updated once a
nanosecond the last 584 years.  A single NFS server would not handle
open/lock requests nearly that fast, and a Linux node is unlikely to
have an uptime approaching that).

This patch removes the 2 ida and instead uses a per-server
atomic64_t to provide uniqueness.

Note that the lock owner already encodes the id as 64 bits even though
it is a 32bit value.  So changing to a 64bit value does not change the
encoding of the lock owner.  The open owner encoding is now 4 bytes
larger.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2024-09-23 15:03:12 -04:00
Jan Kara
2f1f31042e nfs: Block on write congestion
Commit 6df25e5853 ("nfs: remove reliance on bdi congestion")
introduced NFS-private solution for limiting number of writes
outstanding against a particular server. Unlike previous bdi congestion
this algorithm actually works and limits number of outstanding writeback
pages to nfs_congestion_kb which scales with amount of client's memory
and is capped at 256 MB. As a result some workloads such as random
buffered writes over NFS got slower (from ~170 MB/s to ~126 MB/s). The
fio command to reproduce is:

fio --direct=0 --ioengine=sync --thread --invalidate=1 --group_reporting=1
  --runtime=300 --fallocate=posix --ramp_time=10 --new_group --rw=randwrite
  --size=64256m --numjobs=4 --bs=4k --fsync_on_close=1 --end_fsync=1

This happens because the client sends ~256 MB worth of dirty pages to
the server and any further background writeback request is ignored until
the number of writeback pages gets below the threshold of 192 MB. By the
time this happens and clients decides to trigger another round of
writeback, the server often has no pages to write and the disk is idle.

To fix this problem and make the client react faster to eased congestion
of the server by blocking waiting for congestion to resolve instead of
aborting writeback. This improves the random 4k buffered write
throughput to 184 MB/s.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08 13:47:27 -04:00
Jan Kara
f8a3955083 nfs: Properly initialize server->writeback
Atomic types should better be initialized with atomic_long_set() instead
of relying on zeroing done by kzalloc(). Clean this up.

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2024-07-08 13:47:27 -04:00
Trond Myklebust
cdbd321ac1 NFS: remove unused variable nfs_rpcstat
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-03-09 09:14:51 -05:00
Josef Bacik
1548036ef1 nfs: make the rpc_stat per net namespace
Now that we're exposing the rpc stats on a per-network namespace basis,
move this struct into struct nfs_net and use that to make sure only the
per-network namespace stats are exposed.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2024-03-09 09:14:51 -05:00
Al Viro
c1b967d03c nfs: fix UAF on pathwalk running into umount
NFS ->d_revalidate(), ->permission() and ->get_link() need to access
some parts of nfs_server when called in RCU mode:
	server->flags
	server->caps
	*(server->io_stats)
and, worst of all, call
	server->nfs_client->rpc_ops->have_delegation
(the last one - as NFS_PROTO(inode)->have_delegation()).  We really
don't want to RCU-delay the entire nfs_free_server() (it would have
to be done with schedule_work() from RCU callback, since it can't
be made to run from interrupt context), but actual freeing of
nfs_server and ->io_stats can be done via call_rcu() just fine.
nfs_client part is handled simply by making nfs_free_client() use
kfree_rcu().

Acked-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2024-02-25 02:10:32 -05:00
Trond Myklebust
537935f72e NFS/pNFS: Set the connect timeout for the pNFS flexfiles driver
Ensure that the connect timeout for the pNFS flexfiles driver is of the
same order as the I/O timeout, so that we can fail over quickly when
trying to read from a data server that is down.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2023-08-24 13:24:15 -04:00
Benjamin Coddington
e901f17b07 NFS: Don't cleanup sysfs superblock entry if uninitialized
Its possible to end up in nfs_free_server() before the server's superblock
sysfs entry has been initialized, in which case calling kobject_put() will
emit a WARNING.  Check if the kobject has been initialized before cleaning
it up.

Fixes: 1c7251187d ("NFS: add superblock sysfs entries")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-29 14:25:35 -04:00
Benjamin Coddington
d97c058977 NFS: add a sysfs link to the lockd rpc_client
After lockd is started, add a symlink for lockd's rpc_client under
NFS' superblock sysfs.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19 15:06:07 -04:00
Benjamin Coddington
e13b549319 NFS: Add sysfs links to sunrpc clients for nfs_clients
For the general and state management nfs_client under each mount, create
symlinks to their respective rpc_client sysfs entries.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19 15:04:13 -04:00
Benjamin Coddington
1c7251187d NFS: add superblock sysfs entries
Create a sysfs directory for each mount that corresponds to the mount's
nfs_server struct.  As the mount is being constructed, use the name
"server-n", but rename it to the "MAJOR:MINOR" of the mount after assigning
a device_id. The rename approach allows us to populate the mount's directory
with links to the various rpc_client objects during the mount's
construction.  The naming convention (MAJOR:MINOR) can be used to reference
a particular NFS mount's sysfs tree.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19 15:03:48 -04:00
Chuck Lever
c8407f2e56 NFS: Add an "xprtsec=" NFS mount option
After some discussion, we decided that controlling transport layer
security policy should be separate from the setting for the user
authentication flavor. To accomplish this, add a new NFS mount
option to select a transport layer security policy for RPC
operations associated with the mount point.

  xprtsec=none     - Transport layer security is forced off.

  xprtsec=tls      - Establish an encryption-only TLS session. If
                     the initial handshake fails, the mount fails.
                     If TLS is not available on a reconnect, drop
                     the connection and try again.

  xprtsec=mtls     - Both sides authenticate and an encrypted
                     session is created. If the initial handshake
                     fails, the mount fails. If TLS is not available
                     on a reconnect, drop the connection and try
                     again.

To support client peer authentication (mtls), the handshake daemon
will have configurable default authentication material (certificate
or pre-shared key). In the future, mount options can be added that
can provide this material on a per-mount basis.

Updates to mount.nfs (to support xprtsec=auto) and nfs(5) will be
sent under separate cover.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19 12:30:17 -04:00
Chuck Lever
6c0a8c5fcf NFS: Have struct nfs_client carry a TLS policy field
The new field is used to match struct nfs_clients that have the same
TLS policy setting.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2023-06-19 12:29:23 -04:00
Kees Cook
cf0d7e7f45 NFS: Avoid memcpy() run-time warning for struct sockaddr overflows
The 'nfs_server' and 'mount_server' structures include a union of
'struct sockaddr' (with the older 16 bytes max address size) and
'struct sockaddr_storage' which is large enough to hold all the
supported sa_family types (128 bytes max size). The runtime memcpy()
buffer overflow checker is seeing attempts to write beyond the 16
bytes as an overflow, but the actual expected size is that of 'struct
sockaddr_storage'. Plumb the use of 'struct sockaddr_storage' more
completely through-out NFS, which results in adjusting the memcpy()
buffers to the correct union members. Avoids this false positive run-time
warning under CONFIG_FORTIFY_SOURCE:

  memcpy: detected field-spanning write (size 28) of single field "&ctx->nfs_server.address" at fs/nfs/namespace.c:178 (size 16)

Reported-by: kernel test robot <yujie.liu@intel.com>
Link: https://lore.kernel.org/all/202210110948.26b43120-yujie.liu@intel.com
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-10-27 15:52:10 -04:00
Anna Schumaker
940261a195 NFS: Allow setting rsize / wsize to a multiple of PAGE_SIZE
Previously, we required this to value to be a power of 2 for UDP related
reasons. This patch keeps the power of 2 rule for UDP but allows more
flexibility for TCP and RDMA.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-07-12 10:53:10 -04:00
Olga Kornievskaia
a43bf60444 NFSv4.1 provide mount option to toggle trunking discovery
Introduce a new mount option -- trunkdiscovery,notrunkdiscovery -- to
toggle whether or not the client will engage in actively discovery
of trunking locations.

v2 make notrunkdiscovery default

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Fixes: 1976b2b314 ("NFSv4.1 query for fs_location attr on a new file system")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2022-03-21 10:36:49 -04:00
Trond Myklebust
468d126dab NFS: Fix initialisation of nfs_client cl_flags field
For some long forgotten reason, the nfs_client cl_flags field is
initialised in nfs_get_client() instead of being initialised at
allocation time. This quirk was harmless until we moved the call to
nfs_create_rpc_client().

Fixes: dd99e9f98f ("NFSv4: Initialise connection to the server in nfs4_alloc_client()")
Cc: stable@vger.kernel.org # 4.8.x
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-02-08 09:13:49 -05:00
Linus Torvalds
0280e3c58f NFS Client Updates for Linux 5.17
- New Features:
   - Basic handling for case insensitive filesystems
   - Initial support for fs_locations and server trunking
 
 - Bugfixes and Cleanups:
   - Cleanups to how the "struct cred *" is handled for the nfs_access_entry
   - Ensure the server has an up to date ctimes before hardlinking or renaming
   - Update 'blocks used' after writeback, fallocate, and clone
   - nfs_atomic_open() fixes
   - Improvements to sunrpc tracing
   - Various null check & indenting related cleanups
   - Some improvements to the sunrpc sysfs code
     - Use default_groups in kobj_type
     - Fix some potential races and reference leaks
   - A few tracepoint cleanups in xprtrdma
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAmHodsIACgkQ18tUv7Cl
 QOt3xQ//c9JPmMJZZoZtaD5UrHg28iyxaJpOUUpwC/jxQhLOETCf+nU1cELYgLq5
 4W06NBYEmjDJ/tihUvcGMKLvbCtQR9Zl9HepFKDTLTQpGmRFD4enwSmMNvW/AV+h
 I7PoN6J1DX/TZ5InOHH9asyoC2MjwrNHMn3bbQVT0qy+i3T76zJiBF79eWTnPR48
 kKPnF1I0p4LKGJy+y+y/z2mdCsz7tzFkhssxVhot0nafxXzbUOp1H9aiwxroRiUC
 ljbBA0TX8FWkGpGFt3y2QK2fMD7ovDpRhLFYiJClmeERXJVH5mXL9O5XfN5AL0xe
 W/QqT5lbWfeHLkpm2j87yTyaHASC7hGKsAyPD0zWLDcNZws61l1Sy4BHymSE5ZVh
 zt7sJjBnOWAtntyUGBg78G2vhBsd63GzrtcqAOlrngwA5ohJ8f32qvBQGyw4MQu9
 75CjRcO8K8mnf9BJ6I1vYPycjkUh9RSFfNdnUEAI9ZwiTEC/hfEvH/omvEtZsNol
 jBgv2SItTkdMZlEppEL4gxuaYT2wiZf2C6Gco215iPAqLC6dudoroN6yoLk/LRd0
 OWZLl5XTr3j6m5QDm22k5CG080vl6XiAxmAFaFSLza6Q34Jmuluc0gLAZZxvqXk9
 Ay7dQt9PQQk6mXD5Hreb0E5N9zcm2LkfvWpyGJ7mTV7sSHjA2DU=
 =wcVT
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "New Features:

   - Basic handling for case insensitive filesystems

   - Initial support for fs_locations and server trunking

  Bugfixes and Cleanups:

   - Cleanups to how the "struct cred *" is handled for the
     nfs_access_entry

   - Ensure the server has an up to date ctimes before hardlinking or
     renaming

   - Update 'blocks used' after writeback, fallocate, and clone

   - nfs_atomic_open() fixes

   - Improvements to sunrpc tracing

   - Various null check & indenting related cleanups

   - Some improvements to the sunrpc sysfs code:
      - Use default_groups in kobj_type
      - Fix some potential races and reference leaks

   - A few tracepoint cleanups in xprtrdma"

[ This should have gone in during the merge window, but didn't. The
  original pull request - sent during the merge window - had gotten
  marked as spam and discarded due missing DKIM headers in the email
  from Anna.   - Linus ]

* tag 'nfs-for-5.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (35 commits)
  SUNRPC: Don't dereference xprt->snd_task if it's a cookie
  xprtrdma: Remove definitions of RPCDBG_FACILITY
  xprtrdma: Remove final dprintk call sites from xprtrdma
  sunrpc: Fix potential race conditions in rpc_sysfs_xprt_state_change()
  net/sunrpc: fix reference count leaks in rpc_sysfs_xprt_state_change
  NFSv4.1 test and add 4.1 trunking transport
  SUNRPC allow for unspecified transport time in rpc_clnt_add_xprt
  NFSv4 handle port presence in fs_location server string
  NFSv4 expose nfs_parse_server_name function
  NFSv4.1 query for fs_location attr on a new file system
  NFSv4 store server support for fs_location attribute
  NFSv4 remove zero number of fs_locations entries error check
  NFSv4: nfs_atomic_open() can race when looking up a non-regular file
  NFSv4: Handle case where the lookup of a directory fails
  NFSv42: Fallocate and clone should also request 'blocks used'
  NFSv4: Allow writebacks to request 'blocks used'
  SUNRPC: use default_groups in kobj_type
  NFS: use default_groups in kobj_type
  NFS: Fix the verifier for case sensitive filesystem in nfs_atomic_open()
  NFS: Add a helper to remove case-insensitive aliases
  ...
2022-01-25 20:16:03 +02:00
Olga Kornievskaia
1976b2b314 NFSv4.1 query for fs_location attr on a new file system
Query the server for other possible trunkable locations for a given
file system on a 4.1+ mount.

v2:
-- added missing static to nfs4_discover_trunking,
reported by the kernel test robot

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2022-01-13 09:30:48 -05:00
Dave Wysochanski
a6b5a28eb5 nfs: Convert to new fscache volume/cookie API
Change the nfs filesystem to support fscache's indexing rewrite and
reenable caching in nfs.

The following changes have been made:

 (1) The fscache_netfs struct is no more, and there's no need to register
     the filesystem as a whole.

 (2) The session cookie is now an fscache_volume cookie, allocated with
     fscache_acquire_volume().  That takes three parameters: a string
     representing the "volume" in the index, a string naming the cache to
     use (or NULL) and a u64 that conveys coherency metadata for the
     volume.

     For nfs, I've made it render the volume name string as:

        "nfs,<ver>,<family>,<address>,<port>,<fsidH>,<fsidL>*<,param>[,<uniq>]"

 (3) The fscache_cookie_def is no more and needed information is passed
     directly to fscache_acquire_cookie().  The cache no longer calls back
     into the filesystem, but rather metadata changes are indicated at
     other times.

     fscache_acquire_cookie() is passed the same keying and coherency
     information as before.

 (4) fscache_enable/disable_cookie() have been removed.

     Call fscache_use_cookie() and fscache_unuse_cookie() when a file is
     opened or closed to prevent a cache file from being culled and to keep
     resources to hand that are needed to do I/O.

     If a file is opened for writing, we invalidate it with
     FSCACHE_INVAL_DIO_WRITE in lieu of doing writeback to the cache,
     thereby making it cease caching until all currently open files are
     closed.  This should give the same behaviour as the uptream code.
     Making the cache store local modifications isn't straightforward for
     NFS, so that's left for future patches.

 (5) fscache_invalidate() now needs to be given uptodate auxiliary data and
     a file size.  It also takes a flag to indicate if this was due to a
     DIO write.

 (6) Call nfs_fscache_invalidate() with FSCACHE_INVAL_DIO_WRITE on a file
     to which a DIO write is made.

 (7) Call fscache_note_page_release() from nfs_release_page().

 (8) Use a killable wait in nfs_vm_page_mkwrite() when waiting for
     PG_fscache to be cleared.

 (9) The functions to read and write data to/from the cache are stubbed out
     pending a conversion to use netfslib.

Changes
=======
ver #3:
 - Added missing =n fallback for nfs_fscache_release_file()[1][2].

ver #2:
 - Use gfpflags_allow_blocking() rather than using flag directly.
 - fscache_acquire_volume() now returns errors.
 - Remove NFS_INO_FSCACHE as it's no longer used.
 - Need to unuse a cookie on file-release, not inode-clear.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Co-developed-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
Acked-by: Jeff Layton <jlayton@kernel.org>
cc: Trond Myklebust <trond.myklebust@hammerspace.com>
cc: Anna Schumaker <anna.schumaker@netapp.com>
cc: linux-nfs@vger.kernel.org
cc: linux-cachefs@redhat.com
Link: https://lore.kernel.org/r/202112100804.nksO8K4u-lkp@intel.com/ [1]
Link: https://lore.kernel.org/r/202112100957.2oEDT20W-lkp@intel.com/ [2]
Link: https://lore.kernel.org/r/163819668938.215744.14448852181937731615.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/163906979003.143852.2601189243864854724.stgit@warthog.procyon.org.uk/ # v2
Link: https://lore.kernel.org/r/163967182112.1823006.7791504655391213379.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/164021575950.640689.12069642327533368467.stgit@warthog.procyon.org.uk/ # v4
2022-01-10 11:53:25 +00:00
Anna Schumaker
2ef61e0eaa NFS: Remove the nfs4_label from the nfs4_getattr_res
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-11-05 14:54:39 -04:00
Anna Schumaker
5fe1210d25 NFS: Unexport nfs_probe_fsinfo()
All the callers are now in client.c so we can remove the
EXPORT_SYMBOL_GPL() and make it static.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:55 -04:00
Anna Schumaker
4d4cf8d2d6 NFS: Replace calls to nfs_probe_fsinfo() with nfs_probe_server()
Clean up. There are a few places where we want to probe the server, but
don't actually care about the fsinfo result. Change these to use
nfs_probe_server(), which handles the fattr allocation for us.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Anna Schumaker
e5731131fb NFS: Move nfs_probe_destination() into the generic client
And rename it to nfs_probe_server(). I also change it to take the nfs_fh
as an argument so callers can choose what filehandle to probe.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-10-20 18:09:54 -04:00
Olga Kornievskaia
dc48e0abee SUNRPC enforce creation of no more than max_connect xprts
If we are adding new transports via rpc_clnt_test_and_add_xprt()
then check if we've reached the limit. Currently only pnfs path
adds transports via that function but this is done in
preparation when the client would add new transports when
session trunking is detected. A warning is logged if the
limit is reached.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:29 -04:00
Olga Kornievskaia
7e134205f6 NFSv4 introduce max_connect mount options
This option will control up to how many xprts can the client
establish to the server with a distinct address (that means
nconnect connections are not counted towards this new limit).
This patch is setting up nfs structures to keeep track of the
max_connect limit (does not enforce it).

The default value is kept at 1 so that no current mounts that
don't want any additional connections would be effected. The
maximum value is set at 16.

Mounts to DS are not limited to default value of 1 but instead
set to the maximum default value of 16 (NFS_MAX_TRANSPORTS).

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2021-08-27 16:37:17 -04:00
Dan Carpenter
09226e8303 NFS: Fix a potential NULL dereference in nfs_get_client()
None of the callers are expecting NULL returns from nfs_get_client() so
this code will lead to an Oops.  It's better to return an error
pointer.  I expect that this is dead code so hopefully no one is
affected.

Fixes: 31434f496a ("nfs: check hostname in nfs_get_client")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-06-03 10:14:42 -04:00
Trond Myklebust
ce62b114bb NFS: Split attribute support out from the server capabilities
There are lots of attributes, and they are crowding out the bit space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-14 09:36:29 -04:00
Trond Myklebust
7f08a3359a NFSv4: Add support for the NFSv4.2 "change_attr_type" attribute
The change_attr_type allows the server to provide a description of how
the change attribute will behave. This again will allow the client to
optimise its behaviour w.r.t. attribute revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-13 10:04:05 -04:00
Trond Myklebust
90ff57bf4d NFS: Fix up the support for CONFIG_NFS_DISABLE_UDP_SUPPORT
Rather than removing the support in nfs_init_timeout_values(), we should
just fix up the validation checks in the mount option parsers.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2021-04-05 09:04:20 -04:00
Sargun Dhillon
d18a9d3fa0 NFS: NFSv2/NFSv3: Use cred from fs_context during mount
There was refactoring done to use the fs_context for mounting done in:
62a55d088c: NFS: Additional refactoring for fs_context conversion

This made it so that the net_ns is fetched from the fs_context (the netns
that fsopen is called in). This change also makes it so that the credential
fetched during fsopen is used as well as the net_ns.

NFS has already had a number of changes to prepare it for user namespaces:
1a58e8a0e5: NFS: Store the credential of the mount process in the nfs_server
264d948ce7: NFS: Convert NFSv3 to use the container user namespace
c207db2f5d: NFS: Convert NFSv2 to use the container user namespace

Previously, different credentials could be used for creation of the
fs_context versus creation of the nfs_server, as FSCONFIG_CMD_CREATE did
the actual credential check, and that's where current_creds() were fetched.
This meant that the user namespace which fsopen was called in could be a
non-init user namespace. This still requires that the user that calls
FSCONFIG_CMD_CREATE has CAP_SYS_ADMIN in the init user ns.

This roughly allows a privileged user to mount on behalf of an unprivileged
usernamespace, by forking off and calling fsopen in the unprivileged user
namespace. It can then pass back that fsfd to the privileged process which
can configure the NFS mount, and then it can call FSCONFIG_CMD_CREATE
before switching back into the mount namespace of the container, and finish
up the mounting process and call fsmount and move_mount.

Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Tested-by: Alban Crequy <alban.crequy@gmail.com>
Fixes: 62a55d088c ("NFS: Additional refactoring for fs_context conversion")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-12-02 14:05:54 -05:00
Trond Myklebust
1a34c8c9a4 NFS: Support larger readdir buffers
Support readdir buffers of up to 1MB in size so that we can read
large directories using few RPC calls.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Dave Wysochanski <dwysocha@redhat.com>
2020-12-02 14:05:52 -05:00
Frank van der Linden
b78ef845c3 NFSv4.2: query the server for extended attribute support
Query the server for extended attribute support, and record it
as the NFS_CAP_XATTR flag in the server capabilities.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 17:52:45 -04:00
Frank van der Linden
04a5da690e NFSv4.2: define limits and sizes for user xattr handling
Set limits for extended attributes (attribute value size and listxattr
buffer size), based on the fs-independent limits (XATTR_*_MAX).

Define the maximum XDR sizes for the RFC 8276 XATTR operations.
In the case of operations that carry a larger payload (SETXATTR,
GETXATTR, LISTXATTR), these exclude that payload, which is added
as separate pages, like other operations do.

Define, much like for read and write operations, the maximum overhead
sizes for get/set/listxattr, and use them to limit the maximum payload
size for those operations, in combination with the channel attributes.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2020-07-13 17:52:45 -04:00
Scott Mayhew
55dee1bc0d nfs: add minor version to nfs_server_key for fscache
An NFS client that mounts multiple exports from the same NFS
server with higher NFSv4 versions disabled (i.e. 4.2) and without
forcing a specific NFS version results in fscache index cookie
collisions and the following messages:
[  570.004348] FS-Cache: Duplicate cookie detected

Each nfs_client structure should have its own fscache index cookie,
so add the minorversion to nfs_server_key.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200145
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-02-25 13:53:24 -05:00
Olga Kornievskaia
b24ee6c64c NFS: allow deprecation of NFS UDP protocol
Add a kernel config CONFIG_NFS_DISABLE_UDP_SUPPORT to disallow NFS
UDP mounts and enable it by default.

Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-24 16:51:13 -05:00
Scott Mayhew
62a55d088c NFS: Additional refactoring for fs_context conversion
Split out from commit "NFS: Add fs_context support."

This patch adds additional refactoring for the conversion of NFS to use
fs_context, namely:

 (*) Merge nfs_mount_info and nfs_clone_mount into nfs_fs_context.
     nfs_clone_mount has had several fields removed, and nfs_mount_info
     has been removed altogether.
 (*) Various functions now take an fs_context as an argument instead
     of nfs_mount_info, nfs_fs_context, etc.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Scott Mayhew
38465f5d1a NFS: rename nfs_fs_context pointer arg in a few functions
Split out from commit "NFS: Add fs_context support."

Rename cfg to ctx in nfs_init_server(), nfs_verify_authflavors(),
and nfs_request_mount().  No functional changes.

Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
David Howells
5eb005caf5 NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context
Rename struct nfs_parsed_mount_data to struct nfs_fs_context and rename
pointers to it to "ctx".  At some point this will be pointed to by an
fs_context struct's fs_private pointer.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:17 -05:00
Al Viro
0c38f2131d nfs: don't pass nfs_subversion to ->create_server()
pick it from mount_info

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2020-01-15 10:15:16 -05:00
Trond Myklebust
e6237b6feb NFSv4.1: Don't rebind to the same source port when reconnecting to the server
NFSv2, v3 and NFSv4 servers often have duplicate replay caches that look
at the source port when deciding whether or not an RPC call is a replay
of a previous call. This requires clients to perform strange TCP gymnastics
in order to ensure that when they reconnect to the server, they bind
to the same source port.

NFSv4.1 and NFSv4.2 have sessions that provide proper replay semantics,
that do not look at the source port of the connection. This patch therefore
ensures they can ignore the rebind requirement.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
52f98f1a2d NFS/pnfs: Separate NFSv3 DS and MDS traffic
If a NFSv3 server is being used as both a DS and as a regular NFSv3 server,
we may want to keep the IO traffic on a separate TCP connection, since
it will typically have very different timeout characteristics.

This patch therefore sets up a flag to separate the two modes of operation
for the nfs_client.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:45 -05:00
Trond Myklebust
4b1b69cedf NFS: Add a flag to tell nfs_client to set RPC_CLNT_CREATE_NOPING
Add a flag to tell the nfs_client it should set RPC_CLNT_CREATE_NOPING when
creating the rpc client.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-11-03 21:28:44 -05:00
Benjamin Coddington
9f7761cf04 NFS: Cleanup if nfs_match_client is interrupted
Don't bail out before cleaning up a new allocation if the wait for
searching for a matching nfs client is interrupted.  Memory leaks.

Reported-by: syzbot+7fe11b49c1cc30e3fce2@syzkaller.appspotmail.com
Fixes: 950a578c61 ("NFS: make nfs_match_client killable")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06 14:54:53 -04:00
Trond Myklebust
c98ebe2937 Merge branch 'multipath_tcp' 2019-07-06 14:54:52 -04:00
Trond Myklebust
53c3263071 NFS: Allow multiple connections to a NFSv2 or NFSv3 server
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2019-07-06 14:54:50 -04:00