svc threads are currently stopped using kthread_stop(). This requires
identifying a specific thread. However we don't care which thread
stops, just as long as one does.
So instead, set a flag in the svc_pool to say that a thread needs to
die, and have each thread check this flag instead of calling
kthread_should_stop(). The first thread to find and clear this flag
then moves towards exiting.
This removes an explicit dependency on sp_all_threads which will make a
future patch simpler.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Both nfsd and nfsv4-callback take a temporary reference to the svc_serv
while calling svc_set_num_threads() to stop the last thread. lockd does
not.
This extra reference prevents the scv_serv from being freed when the
last thread drops its reference count. This is not currently needed
for lockd as the svc_serv is not accessed after the last thread is told
to exit.
However a future patch will require svc_exit_thread() to access the
svc_serv after the svc_put() so it will need the code that calls
svc_set_num_threads() to keep a reference and keep the svc_serv active.
So copy the pattern from nfsd and nfsv4-cb to lockd, and take a
reference around svc_set_num_threads(.., 0)
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch fixes a race in async lock request handling between adding
the relevant struct nlm_block to nlm_blocked list after the request was
sent by vfs_lock_file() and nlmsvc_grant_deferred() does a lookup of the
nlm_block in the nlm_blocked list. It could be that the async request is
completed before the nlm_block was added to the list. This would end
in a -ENOENT and a kernel log message of "lockd: grant for unknown
block".
To solve this issue we add the nlm_block before the vfs_lock_file() call
to be sure it has been added when a possible nlmsvc_grant_deferred() is
called. If the vfs_lock_file() results in an case when it wouldn't be
added to nlm_blocked list, the nlm_block struct will be removed from
this list again.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch returns nlm_lck_blocked in nlmsvc_lock() when an asynchronous
lock request is pending. During testing I ran into the case with the
side-effects that lockd is waiting for only one lm_grant() callback
because it's already part of the nlm_blocked list. If another
asynchronous for the same nlm_block is triggered two lm_grant()
callbacks will occur but lockd was only waiting for one.
To avoid any change of existing users this handling will only being made
when export_op_support_safe_async_lock() returns true.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch reverts mostly commit 40595cdc93 ("nfs: block notification
on fs with its own ->lock") and introduces an EXPORT_OP_ASYNC_LOCK
export flag to signal that the "own ->lock" implementation supports
async lock requests. The only main user is DLM that is used by GFS2 and
OCFS2 filesystem. Those implement their own lock() implementation and
return FILE_LOCK_DEFERRED as return value. Since commit 40595cdc93
("nfs: block notification on fs with its own ->lock") the DLM
implementation were never updated. This patch should prepare for DLM
to set the EXPORT_OP_ASYNC_LOCK export flag and update the DLM
plock implementation regarding to it.
Acked-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
In addition to the benefits of using an enum rather than a set of
macros, we now have a named type that can improve static type
checking of function return values.
As part of this change, I removed a stale comment from svcauth.h;
the return values from current implementations of the
auth_ops::release method are all zero/negative errno, not the SVC_OK
enum values as the old comment suggested.
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Most svc threads have no interest in a timeout.
nfsd sets it to 1 hour, but this is a wart of no significance.
lockd uses the timeout so that it can call nlmsvc_retry_blocked().
It also sometimes calls svc_wake_up() to ensure this is called.
So change lockd to be consistent and always use svc_wake_up() to trigger
nlmsvc_retry_blocked() - using a timer instead of a timeout to
svc_recv().
And change svc_recv() to not take a timeout arg.
This makes the sp_threads_timedout counter always zero.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_recv() currently returns a 0 on success or one of two errors:
- -EAGAIN means no message was successfully received
- -EINTR means the thread has been told to stop
Previously nfsd would stop as the result of a signal as well as
following kthread_stop(). In that case the difference was useful: EINTR
means stop unconditionally. EAGAIN means stop if kthread_should_stop(),
continue otherwise.
Now threads only exit when kthread_should_stop() so we don't need the
distinction.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
All callers of svc_recv() go on to call svc_process() on success.
Simplify callers by having svc_recv() do that for them.
This loses one call to validate_process_creds() in nfsd. That was
debugging code added 14 years ago. I don't think we need to keep it.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
lockd allows SIGKILL and responds by dropping all locks and restarting
the grace period. This functionality has been present since 2.1.32 when
lockd was added to Linux.
This functionality is undocumented and most likely added as a useful
debug aid. When there is a need to drop locks, the better approach is
to use /proc/fs/nfsd/unlock_*.
This patch removes SIGKILL handling as part of preparation for removing
all signal handling from sunrpc service threads.
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
clang's static analysis warning: fs/lockd/mon.c: line 293, column 2:
Null pointer passed as 2nd argument to memory copy function.
Assuming 'hostname' is NULL and calling 'nsm_create_handle()', this will
pass NULL as 2nd argument to memory copy function 'memcpy()'. So return
NULL if 'hostname' is invalid.
Fixes: 77a3ef33e2 ("NSM: More clean up of nsm_get_handle()")
Signed-off-by: Su Hui <suhui@nfschina.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This patch fixes races when lockd accesses the global nlm_blocked list.
It was mostly safe to access the list because everything was accessed
from the lockd kernel thread context but there exist cases like
nlmsvc_grant_deferred() that could manipulate the nlm_blocked list and
it can be called from any context.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Highlights include:
Stable fixes and other bugfixes:
- nfs: don't report STATX_BTIME in ->getattr
- Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
since it breaks NFSv4 state recovery.
- NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
- Fix the NFSv4.2 xattr cache shrinker_id
- Force a ctime update after a NFSv4.2 SETXATTR call
Features and cleanups:
- NFS and RPC over TLS client code from Chuck Lever.
- Support for use of abstract unix socket addresses with the rpcbind
daemon.
- Sysfs API to allow shutdown of the kernel RPC client and prevent
umount() hangs if the server is known to be permanently down.
- XDR cleanups from Anna.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEESQctxSBg8JpV8KqEZwvnipYKAPIFAmSgmmUACgkQZwvnipYK
APJwUA/+J6uEjJFoigSDU5dpCwQr4pHZPgUn3T2heplcyalGMxLo1VjDTVuFXb+a
NZqdUZF2ePmYqss/UYzJC7R6/z9OanVBcpiGqp66foJt9ncs9BSm5AzdV5Gvi4VX
6SrBM98nSqvD47l45LQ90bqIdR6WgMP9OiDC257PzYnaMZJcB0xObD4HWXh1zbIz
3xynJTSQnRGbv9I5EjJJGVIHDWLfSKY61NUXjrUcmMZ2L39ITNy0CRi8sIdj3oY/
A2Iz52IHtAhE77+EetThPskbTLa07raQSWRo3X6XJqCKiJIXa5giNDoG/zLq6sOT
hi1AV7Tdxaed2EYibeRWzsSVQIClBb7T/hdro5dWs5u/bxM6Bt+yY90ZWUMZVOAQ
/kGTYQXhI31vUgRaEN+2xci0wKDy9wqyAWcD8u8Gz01KaK09sfJSIvvYn+srSeaz
wEUQHZCdBGtNFVP2q18q4x8BN27uObh1DdMvNhrxrA7YraXSQvL/rIIsD0jmDInb
6olMm9g9nZSHgq62+CYs2v7J/AJKQzE7PsWrTMJDX1rso+/Lyc6x7oUGxv2IFt5H
VZVZNdstKeNzfcnNKsGG2ZbufhasKHqiHJxJTdNOuOi0YBi+ixtJVRpupId3+6aZ
ysng0IfzqiWSuiq5Axjreva+480IDSMW+7cqcw5urKEfYY5uVcc=
=leGh
-----END PGP SIGNATURE-----
Merge tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
Pull NFS client updates from Trond Myklebust:
"Stable fixes and other bugfixes:
- nfs: don't report STATX_BTIME in ->getattr
- Revert 'NFSv4: Retry LOCK on OLD_STATEID during delegation return'
since it breaks NFSv4 state recovery.
- NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
- Fix the NFSv4.2 xattr cache shrinker_id
- Force a ctime update after a NFSv4.2 SETXATTR call
Features and cleanups:
- NFS and RPC over TLS client code from Chuck Lever
- Support for use of abstract unix socket addresses with the rpcbind
daemon
- Sysfs API to allow shutdown of the kernel RPC client and prevent
umount() hangs if the server is known to be permanently down
- XDR cleanups from Anna"
* tag 'nfs-for-6.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (33 commits)
Revert "NFSv4: Retry LOCK on OLD_STATEID during delegation return"
NFS: Don't cleanup sysfs superblock entry if uninitialized
nfs: don't report STATX_BTIME in ->getattr
NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
NFSv4.2: fix wrong shrinker_id
NFSv4: Clean up some shutdown loops
NFS: Cancel all existing RPC tasks when shutdown
NFS: add sysfs shutdown knob
NFS: add a sysfs link to the acl rpc_client
NFS: add a sysfs link to the lockd rpc_client
NFS: Add sysfs links to sunrpc clients for nfs_clients
NFS: add superblock sysfs entries
NFS: Make all of /sys/fs/nfs network-namespace unique
NFS: Open-code the nfs_kset kset_create_and_add()
NFS: rename nfs_client_kobj to nfs_net_kobj
NFS: rename nfs_client_kset to nfs_kset
NFS: Add an "xprtsec=" NFS mount option
NFS: Have struct nfs_client carry a TLS policy field
SUNRPC: Add a TCP-with-TLS RPC transport class
SUNRPC: Capture CMSG metadata on client-side receive
...
After lockd is started, add a symlink for lockd's rpc_client under
NFS' superblock sysfs.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
The below-mentioned patch was intended to simplify refcounting on the
svc_serv used by locked. The goal was to only ever have a single
reference from the single thread. To that end we dropped a call to
lockd_start_svc() (except when creating thread) which would take a
reference, and dropped the svc_put(serv) that would drop that reference.
Unfortunately we didn't also remove the svc_get() from
lockd_create_svc() in the case where the svc_serv already existed.
So after the patch:
- on the first call the svc_serv was allocated and the one reference
was given to the thread, so there are no extra references
- on subsequent calls svc_get() was called so there is now an extra
reference.
This is clearly not consistent.
The inconsistency is also clear in the current code in lockd_get()
takes *two* references, one on nlmsvc_serv and one by incrementing
nlmsvc_users. This clearly does not match lockd_put().
So: drop that svc_get() from lockd_get() (which used to be in
lockd_create_svc().
Reported-by: Ido Schimmel <idosch@idosch.org>
Closes: https://lore.kernel.org/linux-nfs/ZHsI%2FH16VX9kJQX1@shredder/T/#u
Fixes: b73a297204 ("lockd: move lockd_start_svc() call into lockd_create_svc()")
Signed-off-by: NeilBrown <neilb@suse.de>
Tested-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
- A collection of minor bug fixes
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmRiQVEACgkQM2qzM29m
f5eYfBAAg5Qz45PL+fo1qWxkJ1ZKaNV1vPdi4tqCt9NEItDTjAnjj0am+rKNGZAz
EOM2yFt4xaZGyMYgXe4VnYl0N+rSpbI/H+Rk/wOq4OHPURQD5EO9VeP86qZ7rmGl
ECPqb39TFTwAiRomC/DHO4eNpoe1rQuXu0tW9+GmqDGxeuh8xdxTk33g17ZXwCFN
tdPkkPjxVPdWd8X7HQg9kWm8AWfV+GyuzE2rKAoOjbs6Wv6d9GCY8Cb5HXkRsQhF
4Zh0PVQuTuXurZwtPXwnS0k4kfvQwjlTIKHlXuo0ZLh+SuFbrWHzv0fVyD+kUpSK
HtWbJ8JcruUvz0WGMtZatzRLHCZLDguV6oVXPp7rtmuxTj4szzHSFpEeAV901sIm
Nkvuomvd02K/fiTo7s3yr6t1VG2vju9LDwhBe197iA3leHAlockfbbxE3NJMGbzQ
NoOPd+lu95cfsanOM1LZZLNfbLrZofoSLK9K1+HD0yAVdyq7u47FyHRrymvCaMrj
GiheuqrBfBMEq+2mCwUn37aM0FblYEXQM0xTVXPQcHtBBN/nGZxPJukmpr7ScNlR
aqMtDoOLu4OEFuo6fe2/94eNi+N5XAZgWmx/mSyaytE8Xw9LJxeQ83UTigaGcYKc
3YIuG1YXg9IyIoIdLkghB+Aj/6fivsGFK9Gud6g7I3xw4f15noA=
=PiNG
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- A collection of minor bug fixes
* tag 'nfsd-6.4-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
NFSD: Remove open coding of string copy
SUNRPC: Fix trace_svc_register() call site
SUNRPC: always free ctxt when freeing deferred request
SUNRPC: double free xprt_ctxt while still in use
SUNRPC: Fix error handling in svc_setup_socket()
SUNRPC: Fix encoding of accepted but unsuccessful RPC replies
lockd: define nlm_port_min,max with CONFIG_SYSCTL
nfsd: define exports_proc_ops with CONFIG_PROC_FS
SUNRPC: Avoid relying on crypto API to derive CBC-CTS output IV
gcc with W=1 and ! CONFIG_SYSCTL
fs/lockd/svc.c:80:51: error: ‘nlm_port_max’ defined but not used [-Werror=unused-const-variable=]
80 | static const int nlm_port_min = 0, nlm_port_max = 65535;
| ^~~~~~~~~~~~
fs/lockd/svc.c:80:33: error: ‘nlm_port_min’ defined but not used [-Werror=unused-const-variable=]
80 | static const int nlm_port_min = 0, nlm_port_max = 65535;
| ^~~~~~~~~~~~
The only use of these variables is when CONFIG_SYSCTL
is defined, so their definition should be likewise conditional.
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The big ticket item for this release is support for RPC-with-TLS
[RFC 9289] has been added to the Linux NFS server. The goal is to
provide a simple-to-deploy, low-overhead in-transit confidentiality
and peer authentication mechanism. It can supplement NFS Kerberos
and it can protect the use of legacy non-cryptographic user
authentication flavors such as AUTH_SYS. The TLS Record protocol is
handled entirely by kTLS, meaning it can use either software
encryption or offload encryption to smart NICs.
Work continues on improving NFSD's open file cache. Among the many
clean-ups in that area is a patch to convert the rhashtable to use
the list-hashing version of that data structure.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmRK/JMACgkQM2qzM29m
f5cF5A/+JZFRSPlfSYt0YHzUQQSDdYn5n/IG9TwJQd62xheu083WuKRaCOYYoOhg
06nZd6p7nuF1E0n2ZWOKSE6YkBSE6z4M6KrQlm6lCe/nmxYCR87IYfJCXuL+Yf0e
/LdL4OTvDHzY5ec1DreERldPIUJ8GFzwChH8/z4XwbNDR7qJkF/gf8YxpFr+8K+j
Cfyl8woZiEze/Nvxy1YtAqa7HMEpitt0aWJN55rHwTh9c3b0nmDzziYFcVqXgybJ
/qUHfHBak66ll8RqhcQ3BMuyfszwASERbPsaZ2a5F/RaxLL5ZWfFyhgQwm+PZWT+
J5DdSBwLEQYtKQGD41A1aorP6v/u2QelfWrl4S7/qjRpREp8Ba2IU4fYLjGb1499
Imk68BA7NwFp87tdMi/7en1VVgina4U/S3X71aUYWe+C0g48BfTrVwq4SVbQSAo4
1638vbZnrJbsJMr9OaaysKWfv4KZB020Ji1KVwuqmgy5F8kdfJCCQ2UR/fHuJ3DY
R0Zrd1Ryjwr83viP+Xj0ERiW405gPdCT0RJqoA7rznRPCqT5M42tf5z65uO7iZeE
C1udgDaoQOtioKlem6FcDXLkryf986slGA7V91lat/Jt8A5jLKQfjVe3Q+kaaqXP
ka1DQnYelzMzILQQs39cqW5pShrH8e3tfRZ7JhdBgrpxVXz9ZZM=
=lA2+
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"The big ticket item for this release is that support for RPC-with-TLS
[RFC 9289] has been added to the Linux NFS server.
The goal is to provide a simple-to-deploy, low-overhead in-transit
confidentiality and peer authentication mechanism. It can supplement
NFS Kerberos and it can protect the use of legacy non-cryptographic
user authentication flavors such as AUTH_SYS. The TLS Record protocol
is handled entirely by kTLS, meaning it can use either software
encryption or offload encryption to smart NICs.
Aside from that, work continues on improving NFSD's open file cache.
Among the many clean-ups in that area is a patch to convert the
rhashtable to use the list-hashing version of that data structure"
* tag 'nfsd-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (31 commits)
NFSD: Handle new xprtsec= export option
SUNRPC: Support TLS handshake in the server-side TCP socket code
NFSD: Clean up xattr memory allocation flags
NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop
SUNRPC: Clear rq_xid when receiving a new RPC Call
SUNRPC: Recognize control messages in server-side TCP socket code
SUNRPC: Be even lazier about releasing pages
SUNRPC: Convert svc_xprt_release() to the release_pages() API
SUNRPC: Relocate svc_free_res_pages()
nfsd: simplify the delayed disposal list code
SUNRPC: Ignore return value of ->xpo_sendto
SUNRPC: Ensure server-side sockets have a sock->file
NFSD: Watch for rq_pages bounds checking errors in nfsd_splice_actor()
sunrpc: simplify two-level sysctl registration for svcrdma_parm_table
SUNRPC: return proper error from get_expiry()
lockd: add some client-side tracepoints
nfs: move nfs_fhandle_hash to common include file
lockd: server should unlock lock if client rejects the grant
lockd: fix races in client GRANTED_MSG wait logic
lockd: move struct nlm_wait to lockd.h
...
Currently lockd just dequeues the block and ignores it if the client
sends a GRANT_RES with a status of nlm_lck_denied. That status is an
indicator that the client has rejected the lock, so the right thing to
do is to unlock the lock we were trying to grant.
Reported-by: Yongcheng Yang <yoyang@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
After the wait for a grant is done (for whatever reason), nlmclnt_block
updates the status of the nlm_rqst with the status of the block. At the
point it does this, however, the block is still queued its status could
change at any time.
This is particularly a problem when the waiting task is signaled during
the wait. We can end up giving up on the lock just before the GRANTED_MSG
callback comes in, and accept it even though the lock request gets back
an error, leaving a dangling lock on the server.
Since the nlm_wait never lives beyond the end of nlmclnt_lock, put it on
the stack and add functions to allow us to enqueue and dequeue the
block. Enqueue it just before the lock/wait loop, and dequeue it
just after we exit the loop instead of waiting until the end of
the function. Also, scrape the status at the time that we dequeue it to
ensure that it's final.
Reported-by: Yongcheng Yang <yoyang@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
The next patch needs struct nlm_wait in fs/lockd/clntproc.c, so move
the definition to a shared header file. As an added clean-up, drop
the unused b_reclaim field.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
It's easily possible for the server to have an outstanding lock when we
go to shut down. When that happens, we often get a warning like this in
the kernel log:
lockd: couldn't shutdown host module for net f0000000!
This is because the shutdown procedures skip removing any hosts that
still have outstanding resources (locks). Eventually, things seem to get
cleaned up anyway, but the log message is unsettling, and server
shutdown doesn't seem to be working the way it was intended.
Ensure that we tear down any resources held on behalf of a client when
tearing one down for server shutdown.
Reported-by: Yongcheng Yang <yoyang@redhat.com>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2063818
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
There is no need to declare two tables to just create directories,
this can be easily be done with a prefix path with register_sysctl().
Simplify this registration.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Commit 6930bcbfb6 dropped the setting of the file_lock range when
decoding a nlm_lock off the wire. This causes the client side grant
callback to miss matching blocks and reject the lock, only to rerequest
it 30s later.
Add a helper function to set the file_lock range from the start and end
values that the protocol uses, and have the nlm_lock decoder call that to
set up the file_lock args properly.
Fixes: 6930bcbfb6 ("lockd: detect and reject lock arguments that overflow")
Reported-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Tested-by: Amir Goldstein <amir73il@gmail.com>
Cc: stable@vger.kernel.org #6.0
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Just one fix which just came in, this just hit linux-next just yesterday
with a success build report. But since its a fix and reviewed I think its
good to take in.
Sadly the eager beavers willing to help with the sysctl moves have slowed.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmP2ndUSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinyLYQAM8HBVhPcXi4gp2DHbUY6Bd6MeXsQ4Mk
EulLxYZXJFIfgXcKYN9IMNcPtdYJ4xnGWLNzPoixiMb61KY6Cjw+l62AtNBiwvxw
/GliVtrwz0EU1Bw7vilm4UBn/BUH7vXk7HMRExBwMXLwY7y11TYWMtb6+V+7Zzf1
VImyPZ1MlfOYlvgRUdaNUnuvHLE19x2pdVG/oRDLez9gs38FYEAiBd34adUKDf2U
FxZBQPd+4xaEht2sdTp0ws52YoHHx3k9i8mSiRwQqsiydKMk32iD59hXxeA4r3Oc
lzf10VgN7EdSDCTzdfDYhpIxq04RuA1s1gtXU/eePMqfJZenR0FbFHhcHmIhon4j
D5EuON5eMISzzT9h9DSI5k7kXPDv0iIZPb/vunAfklC1UPVv8uZ2/RV6M1jNR1Bn
S5WYURzdlwpS+4+uwl479061W1YnnhmtwVAMNw+AzfTxv9eQQ3oa2X70L0xirfDx
xJxZS8lzc1rLgL2y3lO2/8W1yI6hA0xhME+pxuopVTMWcXt6aG4efq3tIrxEx3Kf
0HukI1KbN6O/Rmf7hqprHGsfIq4EP4IEOk2XIvMSOrDyf2fOXVGCQQQ0eMOw4ejR
WTtHp1TW2byA8lSOp4K+rMZamt9Mokci0tOEc/52OXFA2jDtq4KZNHvHAOdZv4/Y
kLMflEpOmU++
=Txc3
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl update from Luis Chamberlain:
"Just one fix which just came in.
Sadly the eager beavers willing to help with the sysctl moves have
slowed"
* tag 'sysctl-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
sysctl: fix proc_dobool() usability
Two significant security enhancements are part of this release:
* NFSD's RPC header encoding and decoding, including RPCSEC GSS
and gssproxy header parsing, has been overhauled to make it
more memory-safe.
* Support for Kerberos AES-SHA2-based encryption types has been
added for both the NFS client and server. This provides a clean
path for deprecating and removing insecure encryption types
based on DES and SHA-1. AES-SHA2 is also FIPS-140 compliant, so
that NFS with Kerberos may now be used on systems with fips
enabled.
In addition to these, NFSD is now able to handle crossing into an
auto-mounted mount point on an exported NFS mount. A number of
fixes have been made to NFSD's server-side copy implementation.
RPC metrics have been converted to per-CPU variables. This helps
reduce unnecessary cross-CPU and cross-node memory bus traffic,
and significantly reduces noise when KCSAN is enabled.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmPzgiYACgkQM2qzM29m
f5dB2A//eqjpj+FgAN+UjygrwMC4ahAsPX3Sc3FG8/lTAiao3NFVFY2gxAiCPyVE
CFk+tUyfL23oXvbyfIBe3LhxSBOf621xU6up2OzqAzJqh1Q9iUWB6as3I14to8ZU
sWpxXo5ofwk1hzkbrvOAVkyfY0emwsr00iBeWMawkpBe8FZEQA31OYj3/xHr6bBI
zEVlZPBZAZlp0DZ74tb+bBLs/EOnqKj+XLWcogCH13JB3sn2umF6cQNkYgsxvHGa
TNQi4LEdzWZGme242LfBRiGGwm1xuVIjlAhYV/R1wIjaknE3QBzqfXc6lJx74WII
HaqpRJGrKqdo7B+1gaXCl/AMS7YluED1CBrxuej0wBG7l2JEB7m2MFMQ4LTQjgsn
nrr3P70DgbB4LuPCPyUS7dtsMmUXabIqP7niiCR4T1toH6lBmHAgEi4cFmkzg7Cd
EoFzn888mtDpfx4fghcsRWS5oKXEzbPJfu5+IZOD63+UB+NGpi0Xo2s23sJPK8vz
kqK/X63JYOUxWUvK0zkj/c/wW1cLqIaBwnSKbShou5/BL+cZVI+uJYrnEesgpoB2
5fh/cZv3hdcoOPO7OfcjCLQYy4J6RCWajptnk/hcS3lMvBTBrnq697iAqCVURDKU
Xfmlf7XbBwje+sk4eHgqVGEqqVjrEmoqbmA2OS44WSS5LDvxXdI=
=ZG/7
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"Two significant security enhancements are part of this release:
- NFSD's RPC header encoding and decoding, including RPCSEC GSS and
gssproxy header parsing, has been overhauled to make it more
memory-safe.
- Support for Kerberos AES-SHA2-based encryption types has been added
for both the NFS client and server. This provides a clean path for
deprecating and removing insecure encryption types based on DES and
SHA-1. AES-SHA2 is also FIPS-140 compliant, so that NFS with
Kerberos may now be used on systems with fips enabled.
In addition to these, NFSD is now able to handle crossing into an
auto-mounted mount point on an exported NFS mount. A number of fixes
have been made to NFSD's server-side copy implementation.
RPC metrics have been converted to per-CPU variables. This helps
reduce unnecessary cross-CPU and cross-node memory bus traffic, and
significantly reduces noise when KCSAN is enabled"
* tag 'nfsd-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (121 commits)
NFSD: Clean up nfsd_symlink()
NFSD: copy the whole verifier in nfsd_copy_write_verifier
nfsd: don't fsync nfsd_files on last close
SUNRPC: Fix occasional warning when destroying gss_krb5_enctypes
nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
NFSD: fix problems with cleanup on errors in nfsd4_copy
nfsd: fix race to check ls_layouts
nfsd: don't hand out delegation on setuid files being opened for write
SUNRPC: Remove ->xpo_secure_port()
SUNRPC: Clean up the svc_xprt_flags() macro
nfsd: remove fs/nfsd/fault_inject.c
NFSD: fix leaked reference count of nfsd4_ssc_umount_item
nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
nfsd: zero out pointers after putting nfsd_files on COPY setup error
SUNRPC: Fix whitespace damage in svcauth_unix.c
nfsd: eliminate __nfs4_get_fd
nfsd: add some kerneldoc comments for stateid preprocessing functions
nfsd: eliminate find_deleg_file_locked
nfsd: don't take nfsd4_copy ref for OP_OFFLOAD_STATUS
SUNRPC: Add encryption self-tests
...
Currently proc_dobool expects a (bool *) in table->data, but sizeof(int)
in table->maxsize, because it uses do_proc_dointvec() directly.
This is unsafe for at least two reasons:
1. A sysctl table definition may use { .data = &variable, .maxsize =
sizeof(variable) }, not realizing that this makes the sysctl unusable
(see the Fixes: tag) and that they need to use the completely
counterintuitive sizeof(int) instead.
2. proc_dobool() will currently try to parse an array of values if given
.maxsize >= 2*sizeof(int), but will try to write values of type bool
by offsets of sizeof(int), so it will not work correctly with neither
an (int *) nor a (bool *). There is no .maxsize validation to prevent
this.
Fix this by:
1. Constraining proc_dobool() to allow only one value and .maxsize ==
sizeof(bool).
2. Wrapping the original struct ctl_table in a temporary one with .data
pointing to a local int variable and .maxsize set to sizeof(int) and
passing this one to proc_dointvec(), converting the value to/from
bool as needed (using proc_dou8vec_minmax() as an example).
3. Extending sysctl_check_table() to enforce proc_dobool() expectations.
4. Fixing the proc_dobool() docstring (it was just copy-pasted from
proc_douintvec, apparently...).
5. Converting all existing proc_dobool() users to set .maxsize to
sizeof(bool) instead of sizeof(int).
Fixes: 83efeeeb3d ("tty: Allow TIOCSTI to be disabled")
Fixes: a2071573d6 ("sysctl: introduce new proc handler proc_dobool")
Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Currently, svcauth_gss_accept() pre-reserves response buffer space
for the RPC payload length and GSS sequence number before returning
to the dispatcher, which then adds the header's accept_stat field.
The problem is the accept_stat field is supposed to go before the
length and seq_num fields. So svcauth_gss_release() has to relocate
the accept_stat value (see svcauth_gss_prepare_to_wrap()).
To enable these fields to be added to the response buffer in the
correct (final) order, the pointer to the accept_stat has to be made
available to svcauth_gss_accept() so that it can set it before
reserving space for the length and seq_num fields.
As a first step, move the pointer to the location of the accept_stat
field into struct svc_rqst.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that all vs_dispatch functions invoke svcxdr_init_encode(), it
is common code and can be pushed down into the generic RPC server.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Now that all vs_dispatch functions invoke svcxdr_init_decode(), it
is common code and can be pushed down into the generic RPC server.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
locks_inode was turned into a wrapper around file_inode in de2a4a501e
(Partially revert "locks: fix file locking on overlayfs"). Finish
replacing locks_inode invocations everywhere with file_inode.
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
The file locking definitions have lived in fs.h since the dawn of time,
but they are only used by a small subset of the source files that
include it.
Move the file locking definitions to a new header file, and add the
appropriate #include directives to the source files that need them. By
doing this we trim down fs.h a bit and limit the amount of rebuilding
that has to be done when we make changes to the file locking APIs.
Reviewed-by: Xiubo Li <xiubli@redhat.com>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Howells <dhowells@redhat.com>
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Acked-by: Steve French <stfrench@microsoft.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
This release introduces support for the CB_RECALL_ANY operation.
NFSD can send this operation to request that clients return any
delegations they choose. The server uses this operation to handle
low memory scenarios or indicate to a client when that client has
reached the maximum number of delegations the server supports.
The NFSv4.2 READ_PLUS operation has been simplified temporarily
whilst support for sparse files in local filesystems and the VFS is
improved.
Two major data structure fixes appear in this release:
* The nfs4_file hash table is replaced with a resizable hash table
to reduce the latency of NFSv4 OPEN operations.
* Reference counting in the NFSD filecache has been hardened against
races.
In furtherance of removing support for NFSv2 in a subsequent kernel
release, a new Kconfig option enables server-side support for NFSv2
to be left out of a kernel build.
MAINTAINERS has been updated to indicate that changes to fs/exportfs
should go through the NFSD tree.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEKLLlsBKG3yQ88j7+M2qzM29mf5cFAmOXPE4ACgkQM2qzM29m
f5faaRAAh7YT5V61afPbfgBybO5AbDzztpZSNjNjLZs78piSnFp6hP75yNtTviwQ
1o7St13/NkCmDaIdGUpr02U01zbM1BDOq2wGckImOJLNSgb7xHV5r4PqkRiFkh0t
QYSnwG+wp8fDUJeCL/nAOAu9I9EQUqHzWchxiU/h8ln2hN3rXUlIRSeo17Wy7zkD
cNIcoAjTi9fzY3dE6H4r+lZTdNCYH+AdzChmKrHdRZQwq0Xs3FWv4gAMTLbDuD4P
B6NDHz0Umn6XnFsJGptwozkwaWeMQw4GyJj/3iUiO8JF209SaoYXMPjJAyG6tYYa
fUrgv4UXGeXjigDbLBA5IYxfhX7GXjMQSaj3edhzyrl8P74q4/Cq/8fDUnAZ841m
E+TGSCPIQD0QuIjdXxLv9KLY8JNThSfcAt6jr5GBXhPZQr8xpS0BqK/Onr68fgZC
Lpull5xN68L4A1B7cf2GNPuMyvkBKxwSGXOehldh/BkvpVMjFwqd4/q5xWC+6CcQ
tbOkjTbbSS71nzJwZip0NphaYCa3qQPzKT4SZzn/I4I9W5otbwYBx734Bw46gTDE
ZPUXTuJ00VPgX07wbLRahg521Fwzr+8sk1WnVYq82PoaMh1l9FjzLNGouQWBdo3E
UzIo/KUfQKmoZce6O723L6OI4ffdK5oMtfaTpe+SiUPpV1lUAcA=
=jNlu
-----END PGP SIGNATURE-----
Merge tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd updates from Chuck Lever:
"This release introduces support for the CB_RECALL_ANY operation. NFSD
can send this operation to request that clients return any delegations
they choose. The server uses this operation to handle low memory
scenarios or indicate to a client when that client has reached the
maximum number of delegations the server supports.
The NFSv4.2 READ_PLUS operation has been simplified temporarily whilst
support for sparse files in local filesystems and the VFS is improved.
Two major data structure fixes appear in this release:
- The nfs4_file hash table is replaced with a resizable hash table to
reduce the latency of NFSv4 OPEN operations.
- Reference counting in the NFSD filecache has been hardened against
races.
In furtherance of removing support for NFSv2 in a subsequent kernel
release, a new Kconfig option enables server-side support for NFSv2 to
be left out of a kernel build.
MAINTAINERS has been updated to indicate that changes to fs/exportfs
should go through the NFSD tree"
* tag 'nfsd-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (49 commits)
NFSD: Avoid clashing function prototypes
SUNRPC: Fix crasher in unwrap_integ_data()
SUNRPC: Make the svc_authenticate tracepoint conditional
NFSD: Use only RQ_DROPME to signal the need to drop a reply
SUNRPC: Clean up xdr_write_pages()
SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
NFSD: add CB_RECALL_ANY tracepoints
NFSD: add delegation reaper to react to low memory condition
NFSD: add support for sending CB_RECALL_ANY
NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
trace: Relocate event helper files
NFSD: pass range end to vfs_fsync_range() instead of count
lockd: fix file selection in nlmsvc_cancel_blocked
lockd: ensure we use the correct file descriptor when unlocking
lockd: set missing fl_flags field when retrieving args
NFSD: Use struct_size() helper in alloc_session()
nfsd: return error if nfs4_setacl fails
lockd: set other missing fields when unlocking files
NFSD: Add an nfsd_file_fsync tracepoint
sunrpc: svc: Remove an unused static function svc_ungetu32()
...
We currently do a lock_to_openmode call based on the arguments from the
NLM_UNLOCK call, but that will always set the fl_type of the lock to
F_UNLCK, and the O_RDONLY descriptor is always chosen.
Fix it to use the file_lock from the block instead.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Shared locks are set on O_RDONLY descriptors and exclusive locks are set
on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once
for each descriptor, but it doesn't reset fl_file. Ensure that it does.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
vfs_lock_file() expects the struct file_lock to be fully initialised by
the caller. Re-exported NFSv3 has been seen to Oops if the fl_file field
is NULL.
Fixes: aec158242b ("lockd: set fl_owner when unlocking files")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216582
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
lockd currently doesn't access i_flctx safely. This requires a
smp_load_acquire, as the pointer is set via cmpxchg (a release
operation).
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Anna Schumaker <anna@kernel.org>
Cc: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Currently, SUNRPC clears the whole of .pc_argsize before processing
each incoming RPC transaction. Add an extra parameter to struct
svc_procedure to enable upper layers to reduce the amount of each
operation's argument structure that is zeroed by SUNRPC.
The size of struct nfsd4_compoundargs, in particular, is a lot to
clear on each incoming RPC Call. A subsequent patch will cut this
down to something closer to what NFSv2 and NFSv3 uses.
This patch should cause no behavior changes.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
lockd doesn't currently vet the start and length in nlm4 requests like
it should, and can end up generating lock requests with arguments that
overflow when passed to the filesystem.
The NLM4 protocol uses unsigned 64-bit arguments for both start and
length, whereas struct file_lock tracks the start and end as loff_t
values. By the time we get around to calling nlm4svc_retrieve_args,
we've lost the information that would allow us to determine if there was
an overflow.
Start tracking the actual start and len for NLM4 requests in the
nlm_lock. In nlm4svc_retrieve_args, vet these values to ensure they
won't cause an overflow, and return NLM4_FBIG if they do.
Link: https://bugzilla.linux-nfs.org/show_bug.cgi?id=392
Reported-by: Jan Kasiak <j.kasiak@gmail.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org> # 5.14+
Instead of trusting that struct file_lock returns completely unchanged
after vfs_test_lock() when there's no conflicting lock, stash away our
nlm_lockowner reference so we can properly release it for all cases.
This defends against another file_lock implementation overwriting fl_owner
when the return type is F_UNLCK.
Reported-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Tested-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
This loop condition tries a bit too hard to be clever. Just test for
the two indices we care about explicitly.
Cc: J. Bruce Fields <bfields@fieldses.org>
Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Unlocking a POSIX lock on an inode with vfs_lock_file only works if
the owner matches. Ensure we set it in the request.
Cc: J. Bruce Fields <bfields@fieldses.org>
Fixes: 7f024fcd5c ("Keep read and write fds with each nlm_file")
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Hoist svo_function back into svc_serv and remove struct
svc_serv_ops, since the struct is now devoid of fields.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
struct svc_serv_ops is about to be removed.
Neil Brown says:
> I suspect svo_module can go as well - I don't think the thread is
> ever the thing that primarily keeps a module active.
A random sample of kthread_create() callers shows sunrpc is the only
one that manages module reference count in this way.
Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Clean up: svc_shutdown_net() now does nothing but call
svc_close_net(). Replace all external call sites.
svc_close_net() is renamed to be the inverse of svc_xprt_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>