New features and improvements:
- Sunrpc receive buffer sizes only change when establishing a GSS credentials
- Add more sunrpc tracepoints
- Improve on tracepoints to capture internal NFS I/O errors
Other bugfixes and cleanups:
- Move a dprintk() to after a call to nfs_alloc_fattr()
- Fix off-by-one issues in rpc_ntop6
- Fix a few coccicheck warnings
- Use the correct SPDX license identifiers
- Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
- Replace zero-length array with flexible array
- Remove duplicate headers
- Set invalid blocks after NFSv4 writes to update space_used attribute
- Fix direct WRITE throughput regression
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEnZ5MQTpR7cLU7KEp18tUv7ClQOsFAl7ibyIACgkQ18tUv7Cl
QOsOHBAA1A1stYld0gOhKZtMqxRJi3fnJ5mgroLGtyVQe8uAjpD8Ib1oRleC4MJq
ifpYPozIhMZQCvDiGTAKJ8629OYiXGrN8D5nV6Y2tEGpu5wYv98MyZlU9Y8rVzCP
5vsIMUp5XH8y2wYO8k7fDPPxWNH9Ax89wz5OI16mZxgY/LDm4ojZq+pGbYnWZa4w
oK6Efa66z7yQkPV8oIWuvLe1zZYWGAPibBEwJbrvUWyfygB3owI36sc6nuiEQM+4
hD3h5UtVn8BnudUqvLLa21rnQROMFpgYf4Q/2A1UaNfyRAPoPXMztECBSEYXO0L4
saiMc5o/yTTBCC0ZjV1F+xuGQzMgSQ83KOdbr+a+upvBeFpBynJxccdvMTDEam+q
rl7Ypdc42CsTZ1aVWG/AoIk6GENzR0tXqNR6BcDjYG/yRWvnt/RIZlp6G67IbtRH
b9we+3MbI/lTBoCFGahkkBYO3elTNwilxH3pWcRi8ehNn0GPjlLqHePR17Tmq1tL
QycDlm7QB1m5xNsOOLaBoB4SyguPV0SBprZJ4yYU1B3KC3bGurZVK3+TSLXQrO9V
12RLDt4AOGr0TlctBIhNbkGp8xHY6Dg7HgbdjdrVq8Y9YCfg0C37789BnZA5nVxF
4L101lsTI0puymh+MwmhiyOvCldn30f+MjuWJSm17Id+eRIxYj4=
=a84h
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
"New features and improvements:
- Sunrpc receive buffer sizes only change when establishing a GSS credentials
- Add more sunrpc tracepoints
- Improve on tracepoints to capture internal NFS I/O errors
Other bugfixes and cleanups:
- Move a dprintk() to after a call to nfs_alloc_fattr()
- Fix off-by-one issues in rpc_ntop6
- Fix a few coccicheck warnings
- Use the correct SPDX license identifiers
- Fix rpc_call_done assignment for BIND_CONN_TO_SESSION
- Replace zero-length array with flexible array
- Remove duplicate headers
- Set invalid blocks after NFSv4 writes to update space_used attribute
- Fix direct WRITE throughput regression"
* tag 'nfs-for-5.8-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (27 commits)
NFS: Fix direct WRITE throughput regression
SUNRPC: rpc_xprt lifetime events should record xprt->state
xprtrdma: Make xprt_rdma_slot_table_entries static
nfs: set invalid blocks after NFSv4 writes
NFS: remove redundant initialization of variable result
sunrpc: add missing newline when printing parameter 'auth_hashtable_size' by sysfs
NFS: Add a tracepoint in nfs_set_pgio_error()
NFS: Trace short NFS READs
NFS: nfs_xdr_status should record the procedure name
SUNRPC: Set SOFTCONN when destroying GSS contexts
SUNRPC: rpc_call_null_helper() should set RPC_TASK_SOFT
SUNRPC: rpc_call_null_helper() already sets RPC_TASK_NULLCREDS
SUNRPC: trace RPC client lifetime events
SUNRPC: Trace transport lifetime events
SUNRPC: Split the xdr_buf event class
SUNRPC: Add tracepoint to rpc_call_rpcerror()
SUNRPC: Update the RPC_SHOW_SOCKET() macro
SUNRPC: Update the rpc_show_task_flags() macro
SUNRPC: Trace GSS context lifetimes
SUNRPC: receive buffer size estimation values almost never change
...
I measured a 50% throughput regression for large direct writes.
The observed on-the-wire behavior is that the client sends every
NFS WRITE twice: once as an UNSTABLE WRITE plus a COMMIT, and once
as a FILE_SYNC WRITE.
This is because the nfs_write_match_verf() check in
nfs_direct_commit_complete() fails for every WRITE.
Buffered writes use nfs_write_completion(), which sets req->wb_verf
correctly. Direct writes use nfs_direct_write_completion(), which
does not set req->wb_verf at all. This leaves req->wb_verf set to
all zeroes for every direct WRITE, and thus
nfs_direct_commit_completion() always sets NFS_ODIRECT_RESCHED_WRITES.
This fix appears to restore nearly all of the lost performance.
Fixes: 1f28476dcb ("NFS: Fix O_DIRECT commit verifier handling")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Use the following command to test nfsv4(size of file1M is 1MB):
mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt
cp file1M /mnt
du -h /mnt/file1M -->0 within 60s, then 1M
When write is done(cp file1M /mnt), will call this:
nfs_writeback_done
nfs4_write_done
nfs4_write_done_cb
nfs_writeback_update_inode
nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime
nfs_post_op_update_inode_force_wcc_locked
nfs_set_cache_invalid
nfs_refresh_inode_locked
nfs_update_inode
nfsd write response contains change, ctime, mtime, the flag will be
clear after nfs_update_inode. Howerver, write response does not contain
space_used, previous open response contains space_used whose value is 0,
so inode->i_blocks is still 0.
nfs_getattr -->called by "du -h"
do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s
cache_validity = READ_ONCE(NFS_I(inode)->cache_validity)
do_update |= cache_validity & (NFS_INO_INVALID_ATTR -->false
if (do_update) {
__nfs_revalidate_inode
}
Within 60s, does not send getattr request to nfsd, thus "du -h /mnt/file1M"
is 0.
Add a NFS_INO_INVALID_BLOCKS flag, set it when nfsv4 write is done.
Fixes: 16e1437517 ("NFS: More fine grained attribute tracking")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
The variable result is being initialized with a value that is never read
and it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
A short read can generate an -EIO error without there being an error
on the wire. This tracepoint acts as an eyecatcher when there is no
obvious I/O error.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
When sunrpc trace points are not enabled, the recorded task ID
information alone is not helpful.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
After an NFS page has been written it is considered "unstable" until a
COMMIT request succeeds. If the COMMIT fails, the page will be
re-written.
These "unstable" pages are currently accounted as "reclaimable", either
in WB_RECLAIMABLE, or in NR_UNSTABLE_NFS which is included in a
'reclaimable' count. This might have made sense when sending the COMMIT
required a separate action by the VFS/MM (e.g. releasepage() used to
send a COMMIT). However now that all writes generated by ->writepages()
will automatically be followed by a COMMIT (since commit 919e3bd9a8
("NFS: Ensure we commit after writeback is complete")) it makes more
sense to treat them as writeback pages.
So this patch removes NR_UNSTABLE_NFS and accounts unstable pages in
NR_WRITEBACK and WB_WRITEBACK.
A particular effect of this change is that when
wb_check_background_flush() calls wb_over_bg_threshold(), the latter
will report 'true' a lot less often as the 'unstable' pages are no
longer considered 'dirty' (as there is nothing that writeback can do
about them anyway).
Currently wb_check_background_flush() will trigger writeback to NFS even
when there are relatively few dirty pages (if there are lots of unstable
pages), this can result in small writes going to the server (10s of
Kilobytes rather than a Megabyte) which hurts throughput. With this
patch, there are fewer writes which are each larger on average.
Where the NR_UNSTABLE_NFS count was included in statistics
virtual-files, the entry is retained, but the value is hard-coded as
zero. static trace points and warning printks which mentioned this
counter no longer report it.
[akpm@linux-foundation.org: re-layout comment]
[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Acked-by: Michal Hocko <mhocko@suse.com> [mm]
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chuck Lever <chuck.lever@oracle.com>
Link: http://lkml.kernel.org/r/87d06j7gqa.fsf@notabene.neil.brown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Remove duplicate headers which are included twice.
Signed-off-by: Chen Zhou <chenzhou10@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Fixes: 02a95dee8c ("NFS add callback_ops to nfs4_proc_bind_conn_to_session_callback")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
This patch corrects the SPDX License Identifier style in
header file related to NFS Client support.
For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used).
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
In nfs3_proc_lookup, if nfs_alloc_fattr fails, will only print
"NFS call lookup". This may be confusing, move dprintk after
nfs_alloc_fattr.
Signed-off-by: Xu Wang <vulab@iscas.ac.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Prior to commit e3d3ab64dd66 ("SUNRPC: Use au_rslack when
computing reply buffer size"), there was enough slack in the reply
buffer to commodate filehandles of size 60bytes. However, the real
problem was that the reply buffer size for the MOUNT operation was
not correctly calculated. Received buffer size used the filehandle
size for NFSv2 (32bytes) which is much smaller than the allowed
filehandle size for the v3 mounts.
Fix the reply buffer size (decode arguments size) for the MNT command.
Fixes: 2c94b8eca1 ("SUNRPC: Use au_rslack when computing reply buffer size")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When we're doing pnfs then the credential being used for the RPC call
is not necessarily the same as the one used in the open context, so
don't use RPC_TASK_CRED_NOREF.
Fixes: 6129650720 ("NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We are not guaranteed that the credential will remain pinned.
Fixes: 6129650720 ("NFSv4: Avoid referencing the cred unnecessarily during NFSv4 I/O")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
(1) The reorganisation of bmap() use accidentally caused the return value
of cachefiles_read_or_alloc_pages() to get corrupted.
(2) The NFS superblock index key accidentally got changed to include a
number of kernel pointers - meaning that the key isn't matchable after
a reboot.
(3) A redundant check in nfs_fscache_get_super_cookie().
(4) The NFS change_attr sometimes set in the auxiliary data for the
caching of an file and sometimes not, which causes the cache to get
discarded when it shouldn't.
(5) There's a race between cachefiles_read_waiter() and
cachefiles_read_copier() that causes an occasional assertion failure.
We add the new state to the nfsi->open_states list, making it
potentially visible to other threads, before we've finished initializing
it.
That wasn't a problem when all the readers were also taking the i_lock
(as we do here), but since we switched to RCU, there's now a possibility
that a reader could see the partially initialized state.
Symptoms observed were a crash when another thread called
nfs4_get_valid_delegation() on a NULL inode, resulting in an oops like:
BUG: unable to handle page fault for address: ffffffffffffffb0 ...
RIP: 0010:nfs4_get_valid_delegation+0x6/0x30 [nfsv4] ...
Call Trace:
nfs4_open_prepare+0x80/0x1c0 [nfsv4]
__rpc_execute+0x75/0x390 [sunrpc]
? finish_task_switch+0x75/0x260
rpc_async_schedule+0x29/0x40 [sunrpc]
process_one_work+0x1ad/0x370
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x10c/0x130
? kthread_park+0x80/0x80
ret_from_fork+0x22/0x30
Fixes: 9ae075fdd1 "NFSv4: Convert open state lookup to use RCU"
Reviewed-by: Seiichi Ikarashi <s.ikarashi@fujitsu.com>
Tested-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Commit 402cb8dda9 ("fscache: Attach the index key and aux data to
the cookie") added the aux_data and aux_data_len to parameters to
fscache_acquire_cookie(), and updated the callers in the NFS client.
In the process it modified the aux_data to include the change_attr,
but missed adding change_attr to a couple places where aux_data was
used. Specifically, when opening a file and the change_attr is not
added, the following attempt to lookup an object will fail inside
cachefiles_check_object_xattr() = -116 due to
nfs_fscache_inode_check_aux() failing memcmp on auxdata and returning
FSCACHE_CHECKAUX_OBSOLETE.
Fix this by adding nfs_fscache_update_auxdata() to set the auxdata
from all relevant fields in the inode, including the change_attr.
Fixes: 402cb8dda9 ("fscache: Attach the index key and aux data to the cookie")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Commit f2aedb713c ("NFS: Add fs_context support.") reworked
NFS mount code paths for fs_context support which included
super_block initialization. In the process there was an extra
return left in the code and so we never call
nfs_fscache_get_super_cookie even if 'fsc' is given on as mount
option. In addition, there is an extra check inside
nfs_fscache_get_super_cookie for the NFS_OPTION_FSCACHE which
is unnecessary since the only caller nfs_get_cache_cookie
checks this flag.
Fixes: f2aedb713c ("NFS: Add fs_context support.")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Commit 402cb8dda9 ("fscache: Attach the index key and aux data to
the cookie") added the index_key and index_key_len parameters to
fscache_acquire_cookie(), and updated the callers in the NFS client.
One of the callers was inside nfs_fscache_get_super_cookie()
and was changed to use the full struct nfs_fscache_key as the
index_key. However, a couple members of this structure contain
pointers and thus will change each time the same NFS share is
remounted. Since index_key is used for fscache_cookie->key_hash
and this subsequently is used to compare cookies, the effectiveness
of fscache with NFS is reduced to the point at which a umount
occurs. Any subsequent remount of the same share will cause a
unique NFS super_block index_key and key_hash to be generated for
the same data, rendering any prior fscache data unable to be
found. A simple reproducer demonstrates the problem.
1. Mount share with 'fsc', create a file, drop page cache
systemctl start cachefilesd
mount -o vers=3,fsc 127.0.0.1:/export /mnt
dd if=/dev/zero of=/mnt/file1.bin bs=4096 count=1
echo 3 > /proc/sys/vm/drop_caches
2. Read file into page cache and fscache, then unmount
dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1
umount /mnt
3. Remount and re-read which should come from fscache
mount -o vers=3,fsc 127.0.0.1:/export /mnt
echo 3 > /proc/sys/vm/drop_caches
dd if=/mnt/file1.bin of=/dev/null bs=4096 count=1
4. Check for READ ops in mountstats - there should be none
grep READ: /proc/self/mountstats
Looking at the history and the removed function, nfs_super_get_key(),
we should only use nfs_fscache_key.key plus any uniquifier, for
the fscache index_key.
Fixes: 402cb8dda9 ("fscache: Attach the index key and aux data to the cookie")
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
The struct nfs_server gets put on the cl_superblocks list before
the server->super field has been initialised, in which case the
call to nfs_sb_active() will Oops. Add a check to ensure that
we skip such a list entry.
Fixes: 3c9e502b59 ("NFS: Add a helper nfs_client_for_each_server()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Currently, if the client sends BIND_CONN_TO_SESSION with
NFS4_CDFC4_FORE_OR_BOTH but only gets NFS4_CDFS4_FORE back it ignores
that it wasn't able to enable a backchannel.
To make sure, the client sends BIND_CONN_TO_SESSION as the first
operation on the connections (ie., no other session compounds haven't
been sent before), and if the client's request to bind the backchannel
is not satisfied, then reset the connection and retry.
Cc: stable@vger.kernel.org
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
nfs4_proc_layoutget() invokes rpc_run_task(), which return the value to
"task". Since rpc_run_task() is impossible to return an ERR pointer,
there is no need to add the IS_ERR() condition on "task" here. So we
need to remove it.
Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Xin Tan <tanxin.ctf@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
nfs3_set_acl keeps track of the acl it allocated locally to determine if an acl
needs to be released at the end. This results in a memory leak when the
function allocates an acl as well as a default acl. Fix by releasing acls
that differ from the acl originally passed into nfs3_set_acl.
Fixes: b7fa0554cf ("[PATCH] NFS: Add support for NFSv3 ACLs")
Reported-by: Xiyu Yang <xiyuyang19@fudan.edu.cn>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If the credential returned by pnfs_prepare_layoutreturn()
does not match the credential of the RPC call, then we do
end up calling pnfs_send_layoutreturn() with that credential,
so don't free it!
Fixes: 44ea8dfce0 ("NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We require that any outstanding layout return completes before we can
free up the inode so that the layout itself can be freed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We need to drop the inode spinlock while calling nfs4_select_rw_stateid(),
since nfs4_copy_delegation_stateid() could take the delegation lock.
Note that it is safe to do this, since all other calls to
pnfs_update_layout() for that inode will find themselves blocked by
the lock we hold on NFS_LAYOUT_FIRST_LAYOUTGET.
Fixes: fc51b1cf39 ("NFS: Beware when dereferencing the delegation cred")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking
the RCU lock.
Fixes: a9901899b6 ("pNFS: Add infrastructure for cleaning up per-layout commit structures")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Highlights include:
Stable fixes:
- Fix a page leak in nfs_destroy_unlinked_subrequests()
- Fix use-after-free issues in nfs_pageio_add_request()
- Fix new mount code constant_table array definitions
- finish_automount() requires us to hold 2 refs to the mount record
Features:
- Improve the accuracy of telldir/seekdir by using 64-bit cookies when
possible.
- Allow one RDMA active connection and several zombie connections to
prevent blocking if the remote server is unresponsive.
- Limit the size of the NFS access cache by default
- Reduce the number of references to credentials that are taken by NFS
- pNFS files and flexfiles drivers now support per-layout segment
COMMIT lists.
- Enable partial-file layout segments in the pNFS/flexfiles driver.
- Add support for CB_RECALL_ANY to the pNFS flexfiles layout type
- pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
the DS using the layouterror mechanism.
Bugfixes and cleanups:
- SUNRPC: Fix krb5p regressions
- Don't specify NFS version in "UDP not supported" error
- nfsroot: set tcp as the default transport protocol
- pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
- alloc_nfs_open_context() must use the file cred when available
- Fix locking when dereferencing the delegation cred
- Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails
- Various clean ups of the NFS O_DIRECT commit code
- Clean up RDMA connect/disconnect
- Replace zero-length arrays with C99-style flexible arrays
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAl6LhhsACgkQZwvnipYK
APIOJxAAiQOgmIg1CV4mrlcVhkwy09N5JAia6AENtoTmwm08nAYg5Y8REb9uX46a
/MJsM2WG8hBCgI6eYmRY8LTr4Ft9rTQEJM9DRMuwQREXwMWwBhUv/QakCeqY1lHE
lyB1z4hj5XKeUoN/OcfALC/GXFFf56A0UyN05nMzeCkBTdd3+qu+hW8Ge1wkAXcr
f0pyLbzdFZlJuTmI4tr8F93g9p3ezuFBuEroT7XPIVJylAdZVumHqnOnz/Mvb99x
rNTsX2dc44GhSAfRnTzPumU3MT6BOLvUzNH1xzdiqKzJrbOnG8WjFodrGr3JWpfp
HkeyYQxJ+Hnfb2LiZBjvMQE8M7kVMZ1jVbrGJEbCxfSqgTly8lOHboqAeKsFaReK
LStnusizdA1LHQVZxPdvn+oL49RDxnzm9dY+DkrXK1qT0GE+icN1CyTyLLfkSCp8
tYvZSJ/qPk5BNZegqH1nBqXkMDkOJ4eEA7+luXDmajRkdRrZ3IWY2M1DpMEoueJ2
j/zoj/NFr1oErU4o7PV9oolA1Euhn1L3wIDuzsbVtjySmbXJNQTtaVVRFpGw3SsZ
7rbqi4BB0SzOooNhQ4q8mLNi4qT7bl/3D04eL8UVzEM73plexhQ8XiOEz/VrIRX7
L9viXH49g4DHQ0rZIaWefxFueqpgbNvQwnlLZl2uQotG9hwhTts=
=YUcP
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Highlights include:
Stable fixes:
- Fix a page leak in nfs_destroy_unlinked_subrequests()
- Fix use-after-free issues in nfs_pageio_add_request()
- Fix new mount code constant_table array definitions
- finish_automount() requires us to hold 2 refs to the mount record
Features:
- Improve the accuracy of telldir/seekdir by using 64-bit cookies
when possible.
- Allow one RDMA active connection and several zombie connections to
prevent blocking if the remote server is unresponsive.
- Limit the size of the NFS access cache by default
- Reduce the number of references to credentials that are taken by
NFS
- pNFS files and flexfiles drivers now support per-layout segment
COMMIT lists.
- Enable partial-file layout segments in the pNFS/flexfiles driver.
- Add support for CB_RECALL_ANY to the pNFS flexfiles layout type
- pNFS/flexfiles Report NFS4ERR_DELAY and NFS4ERR_GRACE errors from
the DS using the layouterror mechanism.
Bugfixes and cleanups:
- SUNRPC: Fix krb5p regressions
- Don't specify NFS version in "UDP not supported" error
- nfsroot: set tcp as the default transport protocol
- pnfs: Return valid stateids in nfs_layout_find_inode_by_stateid()
- alloc_nfs_open_context() must use the file cred when available
- Fix locking when dereferencing the delegation cred
- Fix memory leaks in O_DIRECT when nfs_get_lock_context() fails
- Various clean ups of the NFS O_DIRECT commit code
- Clean up RDMA connect/disconnect
- Replace zero-length arrays with C99-style flexible arrays"
* tag 'nfs-for-5.7-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (86 commits)
NFS: Clean up process of marking inode stale.
SUNRPC: Don't start a timer on an already queued rpc task
NFS/pnfs: Reference the layout cred in pnfs_prepare_layoutreturn()
NFS/pnfs: Fix dereference of layout cred in pnfs_layoutcommit_inode()
NFS: Beware when dereferencing the delegation cred
NFS: Add a module parameter to set nfs_mountpoint_expiry_timeout
NFS: finish_automount() requires us to hold 2 refs to the mount record
NFS: Fix a few constant_table array definitions
NFS: Try to join page groups before an O_DIRECT retransmission
NFS: Refactor nfs_lock_and_join_requests()
NFS: Reverse the submission order of requests in __nfs_pageio_add_request()
NFS: Clean up nfs_lock_and_join_requests()
NFS: Remove the redundant function nfs_pgio_has_mirroring()
NFS: Fix memory leaks in nfs_pageio_stop_mirroring()
NFS: Fix a request reference leak in nfs_direct_write_clear_reqs()
NFS: Fix use-after-free issues in nfs_pageio_add_request()
NFS: Fix races nfs_page_group_destroy() vs nfs_destroy_unlinked_subrequests()
NFS: Fix a page leak in nfs_destroy_unlinked_subrequests()
NFS: Remove unused FLUSH_SYNC support in nfs_initiate_pgio()
pNFS/flexfiles: Specify the layout segment range in LAYOUTGET
...
Instead of the various open coded calls to set the NFS_INO_STALE bit
and call nfs_zap_caches(), consolidate them into a single function
nfs_set_inode_stale().
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
- Fix EXCHANGE_ID response when NFSD runs in a container
- A battery of new static trace points
- Socket transports now use bio_vec to send Replies
- NFS/RDMA now supports filesystems with no .splice_read method
- Favor memcpy() over DMA mapping for small RPC/RDMA Replies
- Add pre-requisites for supporting multiple Write chunks
- Numerous minor fixes and clean-ups
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABAgAGBQJegj9pAAoJEDNqszNvZn+XNGgP/RsRul/UGe70YoPS6AwxI+c1
2JVni5LV83aVGSN1df/xRNdugWh4j8e8stBIJPCnWFzUERFvrzVeVyW0/dlIy37l
SRL1L62EzFUejAL45O+CkF5+KI2kAWMgDCv+rPnFnIuXVa/sThx63F1AJikVMPjB
7We3vd5Kh/CrMeMflebJYuY12xE6di2b3ifkZRO0/yuMaAuqJrDreYf4L6xpA4rC
QnKQcNl7LGlOwGSI2WvDrCLE056PJFhTuzTawI80NKnkXMMFNc6/7NoXJqasVlHG
fiki2mHbJrbYd8isIm3Vl/QkFsM8QjijtpVxC9gd151w0P7DfpMYmSzlZL7nvq/R
Nt6IIqbaxWSS1VULsuS7rDtBwwZpW/LRWaUhEvMwimR2jeOxcwtlDVTX/dRH2mxq
Ume64Hn8xMEhhx9tHCPQ+Rgjqv5m+ZEAvmV6B7RM9nT2z2MSzQQESeMB14VZZmF/
2oH1dDCVdCmb4ZOcD5yxL6Y1hijn45s+YHdts9uIsCudKYPI906vPhogFC+PVJv+
MrOiUf8d40H0ra8VAUFCjAceOulkv90aLhBjoHbPsP4SQOTsRuUXnsKESZpSHY72
nT/uPM23ULv4kQ6tHB8yQ3ordjCBRgb4zIKtotc3Wpi7dhO8u6ptPj4soiflRShO
8/3N5dYfqdt9FRyr7Z8/
=o5G0
-----END PGP SIGNATURE-----
Merge tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6
Pull nfsd updates from Chuck Lever:
- Fix EXCHANGE_ID response when NFSD runs in a container
- A battery of new static trace points
- Socket transports now use bio_vec to send Replies
- NFS/RDMA now supports filesystems with no .splice_read method
- Favor memcpy() over DMA mapping for small RPC/RDMA Replies
- Add pre-requisites for supporting multiple Write chunks
- Numerous minor fixes and clean-ups
[ Chuck is filling in for Bruce this time while he and his family settle
into a new house ]
* tag 'nfsd-5.7' of git://git.linux-nfs.org/projects/cel/cel-2.6: (39 commits)
svcrdma: Fix leak of transport addresses
SUNRPC: Fix a potential buffer overflow in 'svc_print_xprts()'
SUNRPC/cache: don't allow invalid entries to be flushed
nfsd: fsnotify on rmdir under nfsd/clients/
nfsd4: kill warnings on testing stateids with mismatched clientids
nfsd: remove read permission bit for ctl sysctl
NFSD: Fix NFS server build errors
sunrpc: Add tracing for cache events
SUNRPC/cache: Allow garbage collection of invalid cache entries
nfsd: export upcalls must not return ESTALE when mountd is down
nfsd: Add tracepoints for update of the expkey and export cache entries
nfsd: Add tracepoints for exp_find_key() and exp_get_by_name()
nfsd: Add tracing to nfsd_set_fh_dentry()
nfsd: Don't add locks to closed or closing open stateids
SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends
SUNRPC: Refactor xs_sendpages()
svcrdma: Avoid DMA mapping small RPC Replies
svcrdma: Fix double sync of transport header buffer
svcrdma: Refactor chunk list encoders
SUNRPC: Add encoders for list item discriminators
...
When we're sending a layoutreturn, ensure that we reference the
layout cred atomically with the copy of the stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When we look up the delegation cred, we are usually doing so in
conjunction with a read of the stateid, and we want to ensure
that the look up is atomic with that read.
Fixes: 57f188e047 ("NFSv4: nfs_update_inplace_delegation() should update delegation cred")
[sfr@canb.auug.org.au: Fixed up borken Fixes: line from Trond :-)]
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Setting nfs_mountpoint_expiry_timeout() to a negative value stops
mountpoint expiration, while setting it to a positive value restarts
the scheduler.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We must not return from nfs_d_automount() without holding 2 references
to the mount record. Doing so, will trigger the BUG() in finish_automount().
Also ensure that we don't try to reschedule the automount timer with
a negative or zero timeout value.
Fixes: 22a1ae9a93 ("NFS: If nfs_mountpoint_expiry_timeout < 0, do not expire submounts")
Cc: stable@vger.kernel.org # v5.5+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
nfs_vers_tokens, nfs_xprt_protocol_tokens, and nfs_secflavor_tokens were
all missing an empty item at the end of the array, allowing
lookup_constant() to potentially walk off the end and trigger and oops.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: e38bb238ed ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
Cc: stable@vger.kernel.org # v5.6
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Refactor nfs_lock_and_join_requests() in order to separate out the
subrequest merging into its own function nfs_lock_and_join_group()
that can be used by O_DIRECT.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we have to split the request up into subrequests, we have to submit
the request pointed to by the function call parameter last, in case
there is an error or other issue that causes us to exit before the
last request is submitted. The reason is that the caller is expected
to perform cleanup in those cases.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Clean up nfs_lock_and_join_requests() to simplify the calculation
of the range covered by the page group, taking into account the
presence of mirrors.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We need to trust that desc->pg_mirror_idx is set correctly, whether
or not mirroring is enabled.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
If we just set the mirror count to 1 without first clearing out
the mirrors, we can leak queued up requests.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
nfs_direct_write_scan_commit_list() will lock the request and bump
the reference count, but we also need to account for the reference
that was taken when we initially added the request to the commit list.
Fixes: fb5f7f20cd ("NFS: commit errors should be fatal")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
We need to ensure that we create the mirror requests before calling
nfs_pageio_add_request_mirror() on the request we are adding.
Otherwise, we can end up with a use-after-free if the call to
nfs_pageio_add_request_mirror() triggers I/O.
Fixes: c917cfaf9b ("NFS: Fix up NFS I/O subrequest creation")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When a subrequest is being detached from the subgroup, we want to
ensure that it is not holding the group lock, or in the process
of waiting for the group lock.
Fixes: 5b2b5187fa ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
When we detach a subrequest from the list, we must also release the
reference it holds to the parent.
Fixes: 5b2b5187fa ("NFS: Fix nfs_page_group_destroy() and nfs_lock_and_join_requests() race cases")
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAl6Ch6wUHHBhdWxAcGF1
bC1tb29yZS5jb20ACgkQ6iDy2pc3iXPdcg/9FDMS/n0Xl1HQBUyu26EwLu3aUpNE
BdghXW1LKSTp7MrOENE60PGzZSAiC07ci1DqFd7zfLPZf2q5IwPwOBa/Avy8z95V
oHKqcMT6WO1SPOm/PxZn16FCKyET4gZDTXvHBAyiyFsbk36R522ZY615P9T3eLu/
ZA1NFsSjj68SqMCUlAWfeqjcbQiX63bryEpugOIg0qWy7R/+rtWxj9TjriZ+v9tq
uC45UcjBqphpmoPG8BifA3jjyclwO3DeQb5u7E8//HPPraGeB19ntsymUg7ljoGk
NrqCkZtv6E+FRCDTR5f0O7M1T4BWJodxw2NwngnTwKByLC25EZaGx80o+VyMt0eT
Pog+++JZaa5zZr2OYOtdlPVMLc2ALL6p/8lHOqFU3GKfIf04hWOm6/Lb2IWoXs3f
CG2b6vzoXYyjbF0Q7kxadb8uBY2S1Ds+CVu2HMBBsXsPdwbbtFWOT/6aRAQu61qn
PW+f47NR8By3SO6nMzWts2SZEERZNIEdSKeXHuR7As1jFMXrHLItreb4GCSPay5h
2bzRpxt2br5CDLh7Jv2pZnHtUqBWOSbslTix77+Z/hPKaNowvD9v3tc5hX87rDmB
dYXROD6/KoyXFYDcMdphlvORFhqGqd5bEYuHHum/VjSIRh237+/nxFY/vZ4i4bzU
2fvpAmUlVX1c4rw=
=LlWA
-----END PGP SIGNATURE-----
Merge tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull SELinux updates from Paul Moore:
"We've got twenty SELinux patches for the v5.7 merge window, the
highlights are below:
- Deprecate setting /sys/fs/selinux/checkreqprot to 1.
This flag was originally created to deal with legacy userspace and
the READ_IMPLIES_EXEC personality flag. We changed the default from
1 to 0 back in Linux v4.4 and now we are taking the next step of
deprecating it, at some point in the future we will take the final
step of rejecting 1.
- Allow kernfs symlinks to inherit the SELinux label of the parent
directory. In order to preserve backwards compatibility this is
protected by the genfs_seclabel_symlinks SELinux policy capability.
- Optimize how we store filename transitions in the kernel, resulting
in some significant improvements to policy load times.
- Do a better job calculating our internal hash table sizes which
resulted in additional policy load improvements and likely general
SELinux performance improvements as well.
- Remove the unused initial SIDs (labels) and improve how we handle
initial SIDs.
- Enable per-file labeling for the bpf filesystem.
- Ensure that we properly label NFS v4.2 filesystems to avoid a
temporary unlabeled condition.
- Add some missing XFS quota command types to the SELinux quota
access controls.
- Fix a problem where we were not updating the seq_file position
index correctly in selinuxfs.
- We consolidate some duplicated code into helper functions.
- A number of list to array conversions.
- Update Stephen Smalley's email address in MAINTAINERS"
* tag 'selinux-pr-20200330' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: clean up indentation issue with assignment statement
NFS: Ensure security label is set for root inode
MAINTAINERS: Update my email address
selinux: avtab_init() and cond_policydb_init() return void
selinux: clean up error path in policydb_init()
selinux: remove unused initial SIDs and improve handling
selinux: reduce the use of hard-coded hash sizes
selinux: Add xfs quota command types
selinux: optimize storage of filename transitions
selinux: factor out loop body from filename_trans_read()
security: selinux: allow per-file labeling for bpffs
selinux: generalize evaluate_cond_node()
selinux: convert cond_expr to array
selinux: convert cond_av_list to array
selinux: convert cond_list to array
selinux: sel_avc_get_stat_idx should increase position index
selinux: allow kernfs symlinks to inherit parent directory context
selinux: simplify evaluate_cond_node()
Documentation,selinux: deprecate setting checkreqprot to 1
selinux: move status variables out of selinux_ss