RPC-over-RDMA transport to use the new rdma_rw API.
Christoph cleaned the way nfs operations are declared, removing a bunch
of function-pointer casts and declaring the operation vectors as const.
Christoph's changes touch both client and server, and both client and
server pulls this time around should be based on the same commits from
Christoph.
(Note: Anna and I initially didn't coordinate this well and we realized
our pull requests were going to leave you with Christoph's 33 patches
duplicated between our two trees. We decided a last-minute rebase was
the lesser of two evils, so her pull request will show that last-minute
rebase. Yell if that was the wrong choice, and we'll know better for
next time....)
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZZ80JAAoJECebzXlCjuG+PiMP/jmw4IbzY4qt/X8aldVTMPZ8
TkEXuZSrc7FbmroqAR0XN/qJjzENKUcrnlYm7HKVe6iItTZUvJuVThtHQVGzZUZD
wP2VRzgkky59aDs9cphfTPGKPKL1MtoC3qQdFmKd/8ZhBDHIq89A2pQJwl7PI4rA
IHzvLmZtTKL+xWoypqZQxepONhEY2ZPrffGWL+5OVF/dPmWfJ6m/M6jRTb7zV/YD
PZyRqWQ8UY/HwZTwRrxZDCCxUsmRUPZz195iFjM8wvBl7auWNetC22gyyITlvfzf
1m0zJqw3qn09+v2xnAWs/ZVxypg6rsEiIcL2mf0JC/tQh+iIzabc4e/TwDEWqSq+
ocQrvXJuZCjsrMqg4oaIuDFogaZCsGR5wxDAEyfYDS/8fMdiKq8xJzT7v31/2U37
Bsr1hvgAmD4eZWaTrJg11V5RnTzDgns+EtNfISR8t4/k+wehDfyzav8A+j72sqvR
JT+7iUEd0QcBwo+MCC7AOnLLsIX45QUjZKKrvZNAC1fmr8RyAF1zo5HHO+NNjLuP
J2PUG2GbNxsQkm/JAFKDvyklLpEXZc6uyYAcEefirxYbh1x0GfuetzqtH58DtrQL
/1e80MRG9Qgq5S8PvYyvp1bIQPDRaQ188chEvzZy+3QeNXydq2LzDh0bjlM+4A9I
DZhP2pNGLh0ImaPtX0q+
=mR/a
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Chuck's RDMA update overhauls the "call receive" side of the
RPC-over-RDMA transport to use the new rdma_rw API.
Christoph cleaned the way nfs operations are declared, removing a
bunch of function-pointer casts and declaring the operation vectors as
const.
Christoph's changes touch both client and server, and both client and
server pulls this time around should be based on the same commits from
Christoph"
* tag 'nfsd-4.13' of git://linux-nfs.org/~bfields/linux: (53 commits)
svcrdma: fix an incorrect check on -E2BIG and -EINVAL
nfsd4: factor ctime into change attribute
svcrdma: Remove svc_rdma_chunk_ctxt::cc_dir field
svcrdma: use offset_in_page() macro
svcrdma: Clean up after converting svc_rdma_recvfrom to rdma_rw API
svcrdma: Clean-up svc_rdma_unmap_dma
svcrdma: Remove frmr cache
svcrdma: Remove unused Read completion handlers
svcrdma: Properly compute .len and .buflen for received RPC Calls
svcrdma: Use generic RDMA R/W API in RPC Call path
svcrdma: Add recvfrom helpers to svc_rdma_rw.c
sunrpc: Allocate up to RPCSVC_MAXPAGES per svc_rqst
svcrdma: Don't account for Receive queue "starvation"
svcrdma: Improve Reply chunk sanity checking
svcrdma: Improve Write chunk sanity checking
svcrdma: Improve Read chunk sanity checking
svcrdma: Remove svc_rdma_marshal.c
svcrdma: Avoid Send Queue overflow
svcrdma: Squelch disconnection messages
sunrpc: Disable splice for krb5i
...
Factoring ctime into the nfsv4 change attribute gives us better
properties than just i_version alone.
Eventually we'll likely also expose this (as opposed to raw i_version)
to userspace, at which point we'll want to move it to a common helper,
called from either userspace or individual filesystems. For now, nfsd
is the only user.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull read/write updates from Al Viro:
"Christoph's fs/read_write.c series - consolidation and cleanups"
* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: remove nfsd_vfs_read
nfsd: use vfs_iter_read/write
fs: implement vfs_iter_write using do_iter_write
fs: implement vfs_iter_read using do_iter_read
fs: move more code into do_iter_read/do_iter_write
fs: remove __do_readv_writev
fs: remove do_compat_readv_writev
fs: remove do_readv_writev
Pull misc user access cleanups from Al Viro:
"The first pile is assorted getting rid of cargo-culted access_ok(),
cargo-culted set_fs() and field-by-field copyouts.
The same description applies to a lot of stuff in other branches -
this is just the stuff that didn't fit into a more specific topical
branch"
* 'work.misc-set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
Switch flock copyin/copyout primitives to copy_{from,to}_user()
fs/fcntl: return -ESRCH in f_setown when pid/pgid can't be found
fs/fcntl: f_setown, avoid undefined behaviour
fs/fcntl: f_setown, allow returning error
lpfc debugfs: get rid of pointless access_ok()
adb: get rid of pointless access_ok()
isdn: get rid of pointless access_ok()
compat statfs: switch to copy_to_user()
fs/locks: don't mess with the address limit in compat_fcntl64
nfsd_readlink(): switch to vfs_get_link()
drbd: ->sendpage() never needed set_fs()
fs/locks: pass kernel struct flock to fcntl_getlk/setlk
fs: locks: Fix some troubles at kernel-doc comments
Instead of messing with the address limit to use vfs_read/vfs_writev.
Note that this requires that exported file implement ->read_iter and
->write_iter. All currently exportable file systems do this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
=m2Gx
-----END PGP SIGNATURE-----
Merge tag 'v4.12-rc5' into nfsd tree
Update to get f0c3192cee "virtio_net: lower limit on buffer size".
That bug was interfering with my nfsd testing.
Instead of explicitly calling scsi_req_init() after blk_get_request(),
call that function from inside blk_get_request(). Add an
.initialize_rq_fn() callback function to the block drivers that need
it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn()
because it is too small to keep it as a separate function. Keep the
scsi_req_init() call in ide_prep_sense() because it follows a
blk_rq_init() call.
References: commit 82ed4db499 ("block: split scsi_request out of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-----BEGIN PGP SIGNATURE-----
iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
=m2Gx
-----END PGP SIGNATURE-----
Merge tag 'v4.12-rc5' into for-4.13/block
We've already got a few conflicts and upcoming work depends on some of the
changes that have gone into mainline as regression fixes for this series.
Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream
trees to continue working on 4.13 changes.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Since using scsi_req() is only allowed against request queues for
which struct scsi_request is the first member of their private
request data, refuse to submit SCSI commands against a queue for
which this is not the case.
References: commit 82ed4db499 ("block: split scsi_request out of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Omar Sandoval <osandov@fb.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
if we receive a compound such that:
- the sessionid, slot, and sequence number in the SEQUENCE op
match a cached succesful reply with N ops, and
- the Nth operation of the compound is a PUTFH, PUTPUBFH,
PUTROOTFH, or RESTOREFH,
then nfsd4_sequence will return 0 and set cstate->status to
nfserr_replay_cache. The current filehandle will not be set. This will
cause us to call check_nfsd_access with first argument NULL.
To nfsd4_compound it looks like we just succesfully executed an
operation that set a filehandle, but the current filehandle is not set.
Fix this by moving the nfserr_replay_cache earlier. There was never any
reason to have it after the encode_op label, since the only case where
he hit that is when opdesc->op_func sets it.
Note that there are two ways we could hit this case:
- a client is resending a previously sent compound that ended
with one of the four PUTFH-like operations, or
- a client is sending a *new* compound that (incorrectly) shares
sessionid, slot, and sequence number with a previously sent
compound, and the length of the previously sent compound
happens to match the position of a PUTFH-like operation in the
new compound.
The second is obviously incorrect client behavior. The first is also
very strange--the only purpose of a PUTFH-like operation is to set the
current filehandle to be used by the following operation, so there's no
point in having it as the last in a compound.
So it's likely this requires a buggy or malicious client to reproduce.
Reported-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@kernel.vger.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This reverts commit 51f5677777 "nfsd: check for oversized NFSv2/v3
arguments", which breaks support for NFSv3 ACLs.
That patch was actually an earlier draft of a fix for the problem that
was eventually fixed by e6838a29ec "nfsd: check for oversized NFSv2/v3
arguments". But somehow I accidentally left this earlier draft in the
branch that was part of my 2.12 pull request.
Reported-by: Eryu Guan <eguan@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
nfsd4_ops contains function pointers, and marking it as constant avoids
it being able to be used as an attach vector for code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
struct svc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
pc_count is the only writeable memeber of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pass union nfsd4_op_u to the op_func callbacks instead of using unsafe
function pointer casts.
It also adds two missing structures to struct nfsd4_op.u to facilitate
this.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Except for a lot of unnecessary casts this typedef only has one user,
so remove the casts and expand it in struct nfsd4_operation.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pass union nfsd4_op_u to the op_set_currentstateid callbacks instead of
using unsafe function pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Given the args union in struct nfsd4_op a name, and pass it to the
op_set_currentstateid callbacks instead of using unsafe function
pointer casts.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Drop the resp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Drop the argp argument as it can trivially be derived from the rqstp
argument. With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument. With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>
struct rpc_procinfo contains function pointers, and marking it as
constant avoids it being able to be used as an attach vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
p_count is the only writeable memeber of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.
This patch moves it into out out struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Declare the p_encode callbacks with the proper prototype instead of
casting to kxdreproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
bugfixes.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZE2UeAAoJECebzXlCjuG+St8P/0vG+ps9sY012E6Wh9gy4Ev4
BtxG/c3CtcxrbNzW+cFhdEloBGtC0VvcrKNCozJTK4LdaPYErkyRBpjgXvIggT9I
GWY4ftpH3eJ6uByN9Okgc3/1la2poDflJO/nYhdRed3YHOnXTtx/746tu1xAnVCV
tFtDGrbJZTprt5c3zETtdtquCUSy2aMT5ZbrdU3yWBCwQMNSIufN3an8epfB++xx
Ct+G0HTRffcWAdYuLT0N1HKqm8pkncdNMFpm7mVw0hMCRy552G3fuj8LtkhVTvKE
1KN3zXY4jhaUYWD5Yt6AJcpLEro65b8swYk4e9FP2TNUpCmuRdXT9cb9vE8YztxC
8s4N23RHaEx9I6pC3OU64a2HfhiQM/oOIvjlhTBsjojXsQcqZFD1vsoSYA8Byl0w
m9EQWqPqge4m6yEYl7uAyL6xSthbrhcU1Ks5jvNXGcWzEQj7BATnynJANsfZ+y6r
ZoVcsRNX49m1BG+p9br+9DFffPiNFUMqxbfr73L9HRep3OsPeFKazFG0bKd3hOqA
E6L/AnBd9soSqTuTvbisWrGWbomhtd5G/fAa1uHrWTPHMXUWCmkguiau51FNfcHu
xcJlBBVCvUmmd5u3wF6QeiyjPs4KEBzQzsOUsWKHRxDBp6s+5PX/lHuXRBlDP+fN
TQq0KbvBtea1OyMaRtoV
=Rtl/
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linux
Pull nfsd updates from Bruce Fields:
"Another RDMA update from Chuck Lever, and a bunch of miscellaneous
bugfixes"
* tag 'nfsd-4.12' of git://linux-nfs.org/~bfields/linux: (26 commits)
nfsd: Fix up the "supattr_exclcreat" attributes
nfsd: encoders mustn't use unitialized values in error cases
nfsd: fix undefined behavior in nfsd4_layout_verify
lockd: fix lockd shutdown race
NFSv4: Fix callback server shutdown
SUNRPC: Refactor svc_set_num_threads()
NFSv4.x/callback: Create the callback service through svc_create_pooled
lockd: remove redundant check on block
svcrdma: Clean out old XDR encoders
svcrdma: Remove the req_map cache
svcrdma: Remove unused RDMA Write completion handler
svcrdma: Reduce size of sge array in struct svc_rdma_op_ctxt
svcrdma: Clean up RPC-over-RDMA backchannel reply processing
svcrdma: Report Write/Reply chunk overruns
svcrdma: Clean up RDMA_ERROR path
svcrdma: Use rdma_rw API in RPC reply path
svcrdma: Introduce local rdma_rw API helpers
svcrdma: Clean up svc_rdma_get_inv_rkey()
svcrdma: Add helper to save pages under I/O
svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT
...
If an NFSv4 client asks us for the supattr_exclcreat, then we must
not return attributes that are unsupported by this minor version.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 75976de655 ("NFSD: Return word2 bitmask if setting security..,")
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In error cases, lgp->lg_layout_type may be out of bounds; so we
shouldn't be using it until after the check of nfserr.
This was seen to crash nfsd threads when the server receives a LAYOUTGET
request with a large layout type.
GETDEVICEINFO has the same problem.
Reported-by: Ari Kauppi <Ari.Kauppi@synopsys.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
UBSAN: Undefined behaviour in fs/nfsd/nfs4proc.c:1262:34
shift exponent 128 is too large for 32-bit type 'int'
Depending on compiler+architecture, this may cause the check for
layout_type to succeed for overly large values (which seems to be the
case with amd64). The large value will be later used in de-referencing
nfsd4_layout_ops for function pointers.
Reported-by: Jani Tuovila <tuovila@synopsys.com>
Signed-off-by: Ari Kauppi <ari@synopsys.com>
[colin.king@canonical.com: use LAYOUT_TYPE_MAX instead of 32]
Cc: stable@vger.kernel.org
Reviewed-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull misc vfs updates from Al Viro:
"Assorted bits and pieces from various people. No common topic in this
pile, sorry"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/affs: add rename exchange
fs/affs: add rename2 to prepare multiple methods
Make stat/lstat/fstatat pass AT_NO_AUTOMOUNT to vfs_statx()
fs: don't set *REFERENCED on single use objects
fs: compat: Remove warning from COMPATIBLE_IOCTL
remove pointless extern of atime_need_update_rcu()
fs: completely ignore unknown open flags
fs: add a VALID_OPEN_FLAGS
fs: remove _submit_bh()
fs: constify tree_descr arrays passed to simple_fill_super()
fs: drop duplicate header percpu-rwsem.h
fs/affs: bugfix: Write files greater than page size on OFS
fs/affs: bugfix: enable writes on OFS disks
fs/affs: remove node generation check
fs/affs: import amigaffs.h
fs/affs: bugfix: make symbolic links work again
Pull block layer updates from Jens Axboe:
- Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
was initially a fork of CFQ, but subsequently changed to implement
fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
to be used on desktop type single drives, providing good fairness.
From Paolo.
- Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
using a scalable token based algorithm that throttles IO based on
live completion IO stats, similary to blk-wbt. From Omar.
- A series from Jan, moving users to separately allocated backing
devices. This continues the work of separating backing device life
times, solving various problems with hot removal.
- A series of updates for lightnvm, mostly from Javier. Includes a
'pblk' target that exposes an open channel SSD as a physical block
device.
- A series of fixes and improvements for nbd from Josef.
- A series from Omar, removing queue sharing between devices on mostly
legacy drivers. This helps us clean up other bits, if we know that a
queue only has a single device backing. This has been overdue for
more than a decade.
- Fixes for the blk-stats, and improvements to unify the stats and user
windows. This both improves blk-wbt, and enables other users to
register a need to receive IO stats for a device. From Omar.
- blk-throttle improvements from Shaohua. This provides a scalable
framework for implementing scalable priotization - particularly for
blk-mq, but applicable to any type of block device. The interface is
marked experimental for now.
- Bucketized IO stats for IO polling from Stephen Bates. This improves
efficiency of polled workloads in the presence of mixed block size
IO.
- A few fixes for opal, from Scott.
- A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
From a variety of folks, mostly Sagi and James Smart.
- A series from Bart, improving our exposed info and capabilities from
the blk-mq debugfs support.
- A series from Christoph, cleaning up how handle WRITE_ZEROES.
- A series from Christoph, cleaning up the block layer handling of how
we track errors in a request. On top of being a nice cleanup, it also
shrinks the size of struct request a bit.
- Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
never used by platforms, and the latter has outlived it's usefulness.
- Various little bug fixes and cleanups from a wide variety of folks.
* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
block: hide badblocks attribute by default
blk-mq: unify hctx delay_work and run_work
block: add kblock_mod_delayed_work_on()
blk-mq: unify hctx delayed_run_work and run_work
nbd: fix use after free on module unload
MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
blk-mq-sched: alloate reserved tags out of normal pool
mtip32xx: use runtime tag to initialize command header
scsi: Implement blk_mq_ops.show_rq()
blk-mq: Add blk_mq_ops.show_rq()
blk-mq: Show operation, cmd_flags and rq_flags names
blk-mq: Make blk_flags_show() callers append a newline character
blk-mq: Move the "state" debugfs attribute one level down
blk-mq: Unregister debugfs attributes earlier
blk-mq: Only unregister hctxs for which registration succeeded
blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
blk-mq: Let blk_mq_debugfs_register() look up the queue name
blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
ide-pm: always pass 0 error to __blk_end_request_all
..
simple_fill_super() is passed an array of tree_descr structures which
describe the files to create in the filesystem's root directory. Since
these arrays are never modified intentionally, they should be 'const' so
that they are placed in .rodata and benefit from memory protection.
This patch updates the function signature and all users, and also
constifies tree_descr.name.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
consider the sequence of commands:
mkdir -p /import/nfs /import/bind /import/etc
mount --bind / /import/bind
mount --make-private /import/bind
mount --bind /import/etc /import/bind/etc
exportfs -o rw,no_root_squash,crossmnt,async,no_subtree_check localhost:/
mount -o vers=4 localhost:/ /import/nfs
ls -l /import/nfs/etc
You would not expect this to report a stale file handle.
Yet it does.
The manipulations under /import/bind cause the dentry for
/etc to get the DCACHE_MOUNTED flag set, even though nothing
is mounted on /etc. This causes nfsd to call
nfsd_cross_mnt() even though there is no mountpoint. So an
upcall to mountd for "/etc" is performed.
The 'crossmnt' flag on the export of / causes mountd to
report that /etc is exported as it is a descendant of /. It
assumes the kernel wouldn't ask about something that wasn't
a mountpoint. The filehandle returned identifies the
filesystem and the inode number of /etc.
When this filehandle is presented to rpc.mountd, via
"nfsd.fh", the inode cannot be found associated with any
name in /etc/exports, or with any mountpoint listed by
getmntent(). So rpc.mountd says the filehandle doesn't
exist. Hence ESTALE.
This is fixed by teaching nfsd not to trust DCACHE_MOUNTED
too much. It is just a hint, not a guarantee.
Change nfsd_mountpoint() to return '1' for a certain mountpoint,
'2' for a possible mountpoint, and 0 otherwise.
Then change nfsd_crossmnt() to check if follow_down()
actually found a mountpount and, if not, to avoid performing
a lookup if the location is not known to certainly require
an export-point.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
kstrdup() already checks for NULL.
(Brought to our attention by Jason Yann noticing (from sparse output)
that it should have been declared static.)
Signed-off-by: NeilBrown <neilb@suse.com>
Reported-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.
So, insist that the argument not be any longer than we expect.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer. This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative. Add
checks to catch these.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.
Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply. This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE). But a client that sends an incorrectly long reply
can violate those assumptions. This was observed to cause crashes.
Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.
So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.
As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.
We may also consider rejecting calls that have any extra garbage
appended. That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.
Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
This passes on the scsi_cmnd result field to users of passthrough
requests. Currently we abuse req->errors for this purpose, but that
field will go away in its current form.
Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it. I didn't dare
to touch this magic, so the abuses are brought forward 1:1.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The function only returns -EIO if rq->errors is non-zero, which is not
very useful and lets a large number of callers ignore the return value.
Just let the callers figure out their error themselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>