NFS client updates for Linux 4.9

Highlights include:
 
 Stable bugfixes:
 - sunrpc: fix writ espace race causing stalls
 - NFS: Fix inode corruption in nfs_prime_dcache()
 - NFSv4: Don't report revoked delegations as valid in
   nfs_have_delegation()
 - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is
   invalid
 - NFSv4: Open state recovery must account for file permission changes
 - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
 
 Features:
 - Add support for tracking multiple layout types with an ordered list
 - Add support for using multiple backchannel threads on the client
 - Add support for pNFS file layout session trunking
 - Delay xprtrdma use of DMA API (for device driver removal)
 - Add support for xprtrdma remote invalidation
 - Add support for larger xprtrdma inline thresholds
 - Use a scatter/gather list for sending xprtrdma RPC calls
 - Add support for the CB_NOTIFY_LOCK callback
 - Improve hashing sunrpc auth_creds by using both uid and gid
 
 Bugfixes:
 - Fix xprtrdma use of DMA API
 - Validate filenames before adding to the dcache
 - Fix corruption of xdr->nwords in xdr_copy_to_scratch
 - Fix setting buffer length in xdr_set_next_buffer()
 - Don't deadlock the state manager on the SEQUENCE status flags
 - Various delegation and stateid related fixes
 - Retry operations if an interrupted slot receives EREMOTEIO
 - Make nfs boot time y2038 safe
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABCAAGBQJX/+ZfAAoJENfLVL+wpUDr5MUP/16s2Kp9ZZZZ7ICi3yrHOzb0
 9WpCOmbKUIELXl8YgkxlvPUYMzTQTIc32TwbVgdFV0g41my/0+O3z3+IiTrUGxH5
 8LgouMWBZ9KKmyUB//+KQAXr3j/bvDdF6Li6wJfz8a2o+9xT4oTkK1+Js8p0kn6e
 HNKfRknfCKwvE+j4tPCLfs2RX5qDyBFILXwWhj1fAbmT3rbnp+QqkXD4mWUrXb9z
 DBgxciXRhOkOQQAD2KQBFd2kUqWDZ5ED23b+aYsu9D3VCW45zitBqQFAxkQWL0hp
 x8Mp+MDCxlgdEaGQPUmUiDtPkG1X9ZxUJCAwaJWWsZaItwR2Il+en2sETctnTZ1X
 0IAxZVFdolzSeLzIfNx3OG32JdWJdaNjUzkIZam8gO6i1f6PAmK4alR0J3CT31nJ
 /OEN76o1E7acGWRMmj+MAZ2U5gPfR7EitOzyE8ZUPcHgyeGMiynjwi56WIpeSvT2
 F/Sp5kRe5+D5gtnYuppGp7Srp5vYdtFaz1zgPDUKpDLcxfDweO8AHGjJf3Zmrunx
 X24yia4A14CnfcUy4vKpISXRykmkG/3Z0tpWwV53uXZm4nlQfRc7gPibiW7Ay521
 af8sDoItW98K3DK5NQU7IUn83ua1TStzpoqlAEafRw//g9zPMTbhHvNvOyrRfrcX
 kjWn6hNblMu9M34JOjtu
 =XOrF
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
 "Highlights include:

  Stable bugfixes:
   - sunrpc: fix writ espace race causing stalls
   - NFS: Fix inode corruption in nfs_prime_dcache()
   - NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
   - NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
   - NFSv4: Open state recovery must account for file permission changes
   - NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic

  Features:
   - Add support for tracking multiple layout types with an ordered list
   - Add support for using multiple backchannel threads on the client
   - Add support for pNFS file layout session trunking
   - Delay xprtrdma use of DMA API (for device driver removal)
   - Add support for xprtrdma remote invalidation
   - Add support for larger xprtrdma inline thresholds
   - Use a scatter/gather list for sending xprtrdma RPC calls
   - Add support for the CB_NOTIFY_LOCK callback
   - Improve hashing sunrpc auth_creds by using both uid and gid

  Bugfixes:
   - Fix xprtrdma use of DMA API
   - Validate filenames before adding to the dcache
   - Fix corruption of xdr->nwords in xdr_copy_to_scratch
   - Fix setting buffer length in xdr_set_next_buffer()
   - Don't deadlock the state manager on the SEQUENCE status flags
   - Various delegation and stateid related fixes
   - Retry operations if an interrupted slot receives EREMOTEIO
   - Make nfs boot time y2038 safe"

* tag 'nfs-for-4.9-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (100 commits)
  NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
  fs: nfs: Make nfs boot time y2038 safe
  sunrpc: replace generic auth_cred hash with auth-specific function
  sunrpc: add RPCSEC_GSS hash_cred() function
  sunrpc: add auth_unix hash_cred() function
  sunrpc: add generic_auth hash_cred() function
  sunrpc: add hash_cred() function to rpc_authops struct
  Retry operation on EREMOTEIO on an interrupted slot
  pNFS: Fix atime updates on pNFS clients
  sunrpc: queue work on system_power_efficient_wq
  NFSv4.1: Even if the stateid is OK, we may need to recover the open modes
  NFSv4: If recovery failed for a specific open stateid, then don't retry
  NFSv4: Fix retry issues with nfs41_test/free_stateid
  NFSv4: Open state recovery must account for file permission changes
  NFSv4: Mark the lock and open stateids as invalid after freeing them
  NFSv4: Don't test open_stateid unless it is set
  NFSv4: nfs4_do_handle_exception() handle revoke/expiry of a single stateid
  NFS: Always call nfs_inode_find_state_and_recover() when revoking a delegation
  NFSv4: Fix a race when updating an open_stateid
  NFSv4: Fix a race in nfs_inode_reclaim_delegation()
  ...
This commit is contained in:
Linus Torvalds 2016-10-13 21:28:20 -07:00
commit c4a86165d1
58 changed files with 2217 additions and 986 deletions

View File

@ -2470,6 +2470,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
nfsrootdebug [NFS] enable nfsroot debugging messages.
See Documentation/filesystems/nfs/nfsroot.txt.
nfs.callback_nr_threads=
[NFSv4] set the total number of threads that the
NFS client will assign to service NFSv4 callback
requests.
nfs.callback_tcpport=
[NFS] set the TCP port on which the NFSv4 callback
channel should listen.
@ -2493,6 +2498,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
of returning the full 64-bit number.
The default is to return 64-bit inode numbers.
nfs.max_session_cb_slots=
[NFSv4.1] Sets the maximum number of session
slots the client will assign to the callback
channel. This determines the maximum number of
callbacks the client will process in parallel for
a particular server.
nfs.max_session_slots=
[NFSv4.1] Sets the maximum number of session slots
the client will attempt to negotiate with the server.

View File

@ -76,7 +76,7 @@ static void nfs_dns_cache_revisit(struct cache_deferred_req *d, int toomany)
dreq = container_of(d, struct nfs_cache_defer_req, deferred_req);
complete_all(&dreq->completion);
complete(&dreq->completion);
nfs_cache_defer_req_put(dreq);
}

View File

@ -31,8 +31,6 @@
struct nfs_callback_data {
unsigned int users;
struct svc_serv *serv;
struct svc_rqst *rqst;
struct task_struct *task;
};
static struct nfs_callback_data nfs_callback_info[NFS4_MAX_MINOR_VERSION + 1];
@ -89,15 +87,6 @@ nfs4_callback_svc(void *vrqstp)
return 0;
}
/*
* Prepare to bring up the NFSv4 callback service
*/
static struct svc_rqst *
nfs4_callback_up(struct svc_serv *serv)
{
return svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
}
#if defined(CONFIG_NFS_V4_1)
/*
* The callback service for NFSv4.1 callbacks
@ -139,29 +128,6 @@ nfs41_callback_svc(void *vrqstp)
return 0;
}
/*
* Bring up the NFSv4.1 callback service
*/
static struct svc_rqst *
nfs41_callback_up(struct svc_serv *serv)
{
struct svc_rqst *rqstp;
INIT_LIST_HEAD(&serv->sv_cb_list);
spin_lock_init(&serv->sv_cb_lock);
init_waitqueue_head(&serv->sv_cb_waitq);
rqstp = svc_prepare_thread(serv, &serv->sv_pools[0], NUMA_NO_NODE);
dprintk("--> %s return %d\n", __func__, PTR_ERR_OR_ZERO(rqstp));
return rqstp;
}
static void nfs_minorversion_callback_svc_setup(struct svc_serv *serv,
struct svc_rqst **rqstpp, int (**callback_svc)(void *vrqstp))
{
*rqstpp = nfs41_callback_up(serv);
*callback_svc = nfs41_callback_svc;
}
static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
struct svc_serv *serv)
{
@ -173,13 +139,6 @@ static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
xprt->bc_serv = serv;
}
#else
static void nfs_minorversion_callback_svc_setup(struct svc_serv *serv,
struct svc_rqst **rqstpp, int (**callback_svc)(void *vrqstp))
{
*rqstpp = ERR_PTR(-ENOTSUPP);
*callback_svc = ERR_PTR(-ENOTSUPP);
}
static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
struct svc_serv *serv)
{
@ -189,45 +148,22 @@ static inline void nfs_callback_bc_serv(u32 minorversion, struct rpc_xprt *xprt,
static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt,
struct svc_serv *serv)
{
struct svc_rqst *rqstp;
int (*callback_svc)(void *vrqstp);
struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
int nrservs = nfs_callback_nr_threads;
int ret;
nfs_callback_bc_serv(minorversion, xprt, serv);
if (cb_info->task)
if (nrservs < NFS4_MIN_NR_CALLBACK_THREADS)
nrservs = NFS4_MIN_NR_CALLBACK_THREADS;
if (serv->sv_nrthreads-1 == nrservs)
return 0;
switch (minorversion) {
case 0:
/* v4.0 callback setup */
rqstp = nfs4_callback_up(serv);
callback_svc = nfs4_callback_svc;
break;
default:
nfs_minorversion_callback_svc_setup(serv,
&rqstp, &callback_svc);
}
if (IS_ERR(rqstp))
return PTR_ERR(rqstp);
svc_sock_update_bufs(serv);
cb_info->serv = serv;
cb_info->rqst = rqstp;
cb_info->task = kthread_create(callback_svc, cb_info->rqst,
"nfsv4.%u-svc", minorversion);
if (IS_ERR(cb_info->task)) {
ret = PTR_ERR(cb_info->task);
svc_exit_thread(cb_info->rqst);
cb_info->rqst = NULL;
cb_info->task = NULL;
ret = serv->sv_ops->svo_setup(serv, NULL, nrservs);
if (ret) {
serv->sv_ops->svo_setup(serv, NULL, 0);
return ret;
}
rqstp->rq_task = cb_info->task;
wake_up_process(cb_info->task);
dprintk("nfs_callback_up: service started\n");
return 0;
}
@ -281,19 +217,41 @@ err_bind:
return ret;
}
static struct svc_serv_ops nfs_cb_sv_ops = {
static struct svc_serv_ops nfs40_cb_sv_ops = {
.svo_function = nfs4_callback_svc,
.svo_enqueue_xprt = svc_xprt_do_enqueue,
.svo_setup = svc_set_num_threads,
.svo_module = THIS_MODULE,
};
#if defined(CONFIG_NFS_V4_1)
static struct svc_serv_ops nfs41_cb_sv_ops = {
.svo_function = nfs41_callback_svc,
.svo_enqueue_xprt = svc_xprt_do_enqueue,
.svo_setup = svc_set_num_threads,
.svo_module = THIS_MODULE,
};
struct svc_serv_ops *nfs4_cb_sv_ops[] = {
[0] = &nfs40_cb_sv_ops,
[1] = &nfs41_cb_sv_ops,
};
#else
struct svc_serv_ops *nfs4_cb_sv_ops[] = {
[0] = &nfs40_cb_sv_ops,
[1] = NULL,
};
#endif
static struct svc_serv *nfs_callback_create_svc(int minorversion)
{
struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
struct svc_serv *serv;
struct svc_serv_ops *sv_ops;
/*
* Check whether we're already up and running.
*/
if (cb_info->task) {
if (cb_info->serv) {
/*
* Note: increase service usage, because later in case of error
* svc_destroy() will be called.
@ -302,6 +260,17 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
return cb_info->serv;
}
switch (minorversion) {
case 0:
sv_ops = nfs4_cb_sv_ops[0];
break;
default:
sv_ops = nfs4_cb_sv_ops[1];
}
if (sv_ops == NULL)
return ERR_PTR(-ENOTSUPP);
/*
* Sanity check: if there's no task,
* we should be the first user ...
@ -310,11 +279,12 @@ static struct svc_serv *nfs_callback_create_svc(int minorversion)
printk(KERN_WARNING "nfs_callback_create_svc: no kthread, %d users??\n",
cb_info->users);
serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, &nfs_cb_sv_ops);
serv = svc_create(&nfs4_callback_program, NFS4_CALLBACK_BUFSIZE, sv_ops);
if (!serv) {
printk(KERN_ERR "nfs_callback_create_svc: create service failed\n");
return ERR_PTR(-ENOMEM);
}
cb_info->serv = serv;
/* As there is only one thread we need to over-ride the
* default maximum of 80 connections
*/
@ -357,6 +327,8 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
* thread exits.
*/
err_net:
if (!cb_info->users)
cb_info->serv = NULL;
svc_destroy(serv);
err_create:
mutex_unlock(&nfs_callback_mutex);
@ -374,18 +346,18 @@ err_start:
void nfs_callback_down(int minorversion, struct net *net)
{
struct nfs_callback_data *cb_info = &nfs_callback_info[minorversion];
struct svc_serv *serv;
mutex_lock(&nfs_callback_mutex);
nfs_callback_down_net(minorversion, cb_info->serv, net);
serv = cb_info->serv;
nfs_callback_down_net(minorversion, serv, net);
cb_info->users--;
if (cb_info->users == 0 && cb_info->task != NULL) {
kthread_stop(cb_info->task);
dprintk("nfs_callback_down: service stopped\n");
svc_exit_thread(cb_info->rqst);
if (cb_info->users == 0) {
svc_get(serv);
serv->sv_ops->svo_setup(serv, NULL, 0);
svc_destroy(serv);
dprintk("nfs_callback_down: service destroyed\n");
cb_info->serv = NULL;
cb_info->rqst = NULL;
cb_info->task = NULL;
}
mutex_unlock(&nfs_callback_mutex);
}

View File

@ -179,6 +179,15 @@ extern __be32 nfs4_callback_devicenotify(
struct cb_devicenotifyargs *args,
void *dummy, struct cb_process_state *cps);
struct cb_notify_lock_args {
struct nfs_fh cbnl_fh;
struct nfs_lowner cbnl_owner;
bool cbnl_valid;
};
extern __be32 nfs4_callback_notify_lock(struct cb_notify_lock_args *args,
void *dummy,
struct cb_process_state *cps);
#endif /* CONFIG_NFS_V4_1 */
extern int check_gss_callback_principal(struct nfs_client *, struct svc_rqst *);
extern __be32 nfs4_callback_getattr(struct cb_getattrargs *args,
@ -198,6 +207,9 @@ extern void nfs_callback_down(int minorversion, struct net *net);
#define NFS41_BC_MIN_CALLBACKS 1
#define NFS41_BC_MAX_CALLBACKS 1
#define NFS4_MIN_NR_CALLBACK_THREADS 1
extern unsigned int nfs_callback_set_tcpport;
extern unsigned short nfs_callback_nr_threads;
#endif /* __LINUX_FS_NFS_CALLBACK_H */

View File

@ -628,4 +628,20 @@ out:
dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
return status;
}
__be32 nfs4_callback_notify_lock(struct cb_notify_lock_args *args, void *dummy,
struct cb_process_state *cps)
{
if (!cps->clp) /* set in cb_sequence */
return htonl(NFS4ERR_OP_NOT_IN_SESSION);
dprintk_rcu("NFS: CB_NOTIFY_LOCK request from %s\n",
rpc_peeraddr2str(cps->clp->cl_rpcclient, RPC_DISPLAY_ADDR));
/* Don't wake anybody if the string looked bogus */
if (args->cbnl_valid)
__wake_up(&cps->clp->cl_lock_waitq, TASK_NORMAL, 0, args);
return htonl(NFS4_OK);
}
#endif /* CONFIG_NFS_V4_1 */

View File

@ -35,6 +35,7 @@
(1 + 3) * 4) // seqid, 3 slotids
#define CB_OP_RECALLANY_RES_MAXSZ (CB_OP_HDR_RES_MAXSZ)
#define CB_OP_RECALLSLOT_RES_MAXSZ (CB_OP_HDR_RES_MAXSZ)
#define CB_OP_NOTIFY_LOCK_RES_MAXSZ (CB_OP_HDR_RES_MAXSZ)
#endif /* CONFIG_NFS_V4_1 */
#define NFSDBG_FACILITY NFSDBG_CALLBACK
@ -72,7 +73,7 @@ static int nfs4_encode_void(struct svc_rqst *rqstp, __be32 *p, void *dummy)
return xdr_ressize_check(rqstp, p);
}
static __be32 *read_buf(struct xdr_stream *xdr, int nbytes)
static __be32 *read_buf(struct xdr_stream *xdr, size_t nbytes)
{
__be32 *p;
@ -534,6 +535,49 @@ static __be32 decode_recallslot_args(struct svc_rqst *rqstp,
return 0;
}
static __be32 decode_lockowner(struct xdr_stream *xdr, struct cb_notify_lock_args *args)
{
__be32 *p;
unsigned int len;
p = read_buf(xdr, 12);
if (unlikely(p == NULL))
return htonl(NFS4ERR_BADXDR);
p = xdr_decode_hyper(p, &args->cbnl_owner.clientid);
len = be32_to_cpu(*p);
p = read_buf(xdr, len);
if (unlikely(p == NULL))
return htonl(NFS4ERR_BADXDR);
/* Only try to decode if the length is right */
if (len == 20) {
p += 2; /* skip "lock id:" */
args->cbnl_owner.s_dev = be32_to_cpu(*p++);
xdr_decode_hyper(p, &args->cbnl_owner.id);
args->cbnl_valid = true;
} else {
args->cbnl_owner.s_dev = 0;
args->cbnl_owner.id = 0;
args->cbnl_valid = false;
}
return 0;
}
static __be32 decode_notify_lock_args(struct svc_rqst *rqstp, struct xdr_stream *xdr, struct cb_notify_lock_args *args)
{
__be32 status;
status = decode_fh(xdr, &args->cbnl_fh);
if (unlikely(status != 0))
goto out;
status = decode_lockowner(xdr, args);
out:
dprintk("%s: exit with status = %d\n", __func__, ntohl(status));
return status;
}
#endif /* CONFIG_NFS_V4_1 */
static __be32 encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
@ -746,6 +790,7 @@ preprocess_nfs41_op(int nop, unsigned int op_nr, struct callback_op **op)
case OP_CB_RECALL_SLOT:
case OP_CB_LAYOUTRECALL:
case OP_CB_NOTIFY_DEVICEID:
case OP_CB_NOTIFY_LOCK:
*op = &callback_ops[op_nr];
break;
@ -753,7 +798,6 @@ preprocess_nfs41_op(int nop, unsigned int op_nr, struct callback_op **op)
case OP_CB_PUSH_DELEG:
case OP_CB_RECALLABLE_OBJ_AVAIL:
case OP_CB_WANTS_CANCELLED:
case OP_CB_NOTIFY_LOCK:
return htonl(NFS4ERR_NOTSUPP);
default:
@ -1006,6 +1050,11 @@ static struct callback_op callback_ops[] = {
.decode_args = (callback_decode_arg_t)decode_recallslot_args,
.res_maxsize = CB_OP_RECALLSLOT_RES_MAXSZ,
},
[OP_CB_NOTIFY_LOCK] = {
.process_op = (callback_process_op_t)nfs4_callback_notify_lock,
.decode_args = (callback_decode_arg_t)decode_notify_lock_args,
.res_maxsize = CB_OP_NOTIFY_LOCK_RES_MAXSZ,
},
#endif /* CONFIG_NFS_V4_1 */
};

View File

@ -313,7 +313,10 @@ static struct nfs_client *nfs_match_client(const struct nfs_client_initdata *dat
continue;
/* Match the full socket address */
if (!rpc_cmp_addr_port(sap, clap))
continue;
/* Match all xprt_switch full socket addresses */
if (!rpc_clnt_xprt_switch_has_addr(clp->cl_rpcclient,
sap))
continue;
atomic_inc(&clp->cl_count);
return clp;
@ -785,7 +788,8 @@ int nfs_probe_fsinfo(struct nfs_server *server, struct nfs_fh *mntfh, struct nfs
}
fsinfo.fattr = fattr;
fsinfo.layouttype = 0;
fsinfo.nlayouttypes = 0;
memset(fsinfo.layouttype, 0, sizeof(fsinfo.layouttype));
error = clp->rpc_ops->fsinfo(server, mntfh, &fsinfo);
if (error < 0)
goto out_error;
@ -1078,7 +1082,7 @@ void nfs_clients_init(struct net *net)
idr_init(&nn->cb_ident_idr);
#endif
spin_lock_init(&nn->nfs_client_lock);
nn->boot_time = CURRENT_TIME;
nn->boot_time = ktime_get_real();
}
#ifdef CONFIG_PROC_FS

View File

@ -41,6 +41,17 @@ void nfs_mark_delegation_referenced(struct nfs_delegation *delegation)
set_bit(NFS_DELEGATION_REFERENCED, &delegation->flags);
}
static bool
nfs4_is_valid_delegation(const struct nfs_delegation *delegation,
fmode_t flags)
{
if (delegation != NULL && (delegation->type & flags) == flags &&
!test_bit(NFS_DELEGATION_REVOKED, &delegation->flags) &&
!test_bit(NFS_DELEGATION_RETURNING, &delegation->flags))
return true;
return false;
}
static int
nfs4_do_check_delegation(struct inode *inode, fmode_t flags, bool mark)
{
@ -50,8 +61,7 @@ nfs4_do_check_delegation(struct inode *inode, fmode_t flags, bool mark)
flags &= FMODE_READ|FMODE_WRITE;
rcu_read_lock();
delegation = rcu_dereference(NFS_I(inode)->delegation);
if (delegation != NULL && (delegation->type & flags) == flags &&
!test_bit(NFS_DELEGATION_RETURNING, &delegation->flags)) {
if (nfs4_is_valid_delegation(delegation, flags)) {
if (mark)
nfs_mark_delegation_referenced(delegation);
ret = 1;
@ -185,15 +195,13 @@ void nfs_inode_reclaim_delegation(struct inode *inode, struct rpc_cred *cred,
rcu_read_unlock();
put_rpccred(oldcred);
trace_nfs4_reclaim_delegation(inode, res->delegation_type);
} else {
/* We appear to have raced with a delegation return. */
spin_unlock(&delegation->lock);
rcu_read_unlock();
nfs_inode_set_delegation(inode, cred, res);
return;
}
} else {
rcu_read_unlock();
/* We appear to have raced with a delegation return. */
spin_unlock(&delegation->lock);
}
rcu_read_unlock();
nfs_inode_set_delegation(inode, cred, res);
}
static int nfs_do_return_delegation(struct inode *inode, struct nfs_delegation *delegation, int issync)
@ -642,28 +650,49 @@ static void nfs_client_mark_return_unused_delegation_types(struct nfs_client *cl
rcu_read_unlock();
}
static void nfs_revoke_delegation(struct inode *inode)
static void nfs_mark_delegation_revoked(struct nfs_server *server,
struct nfs_delegation *delegation)
{
struct nfs_delegation *delegation;
rcu_read_lock();
delegation = rcu_dereference(NFS_I(inode)->delegation);
if (delegation != NULL) {
set_bit(NFS_DELEGATION_REVOKED, &delegation->flags);
nfs_mark_return_delegation(NFS_SERVER(inode), delegation);
}
rcu_read_unlock();
set_bit(NFS_DELEGATION_REVOKED, &delegation->flags);
delegation->stateid.type = NFS4_INVALID_STATEID_TYPE;
nfs_mark_return_delegation(server, delegation);
}
void nfs_remove_bad_delegation(struct inode *inode)
static bool nfs_revoke_delegation(struct inode *inode,
const nfs4_stateid *stateid)
{
struct nfs_delegation *delegation;
nfs4_stateid tmp;
bool ret = false;
rcu_read_lock();
delegation = rcu_dereference(NFS_I(inode)->delegation);
if (delegation == NULL)
goto out;
if (stateid == NULL) {
nfs4_stateid_copy(&tmp, &delegation->stateid);
stateid = &tmp;
} else if (!nfs4_stateid_match(stateid, &delegation->stateid))
goto out;
nfs_mark_delegation_revoked(NFS_SERVER(inode), delegation);
ret = true;
out:
rcu_read_unlock();
if (ret)
nfs_inode_find_state_and_recover(inode, stateid);
return ret;
}
void nfs_remove_bad_delegation(struct inode *inode,
const nfs4_stateid *stateid)
{
struct nfs_delegation *delegation;
nfs_revoke_delegation(inode);
if (!nfs_revoke_delegation(inode, stateid))
return;
delegation = nfs_inode_detach_delegation(inode);
if (delegation) {
nfs_inode_find_state_and_recover(inode, &delegation->stateid);
if (delegation)
nfs_free_delegation(delegation);
}
}
EXPORT_SYMBOL_GPL(nfs_remove_bad_delegation);
@ -786,8 +815,15 @@ static void nfs_delegation_mark_reclaim_server(struct nfs_server *server)
{
struct nfs_delegation *delegation;
list_for_each_entry_rcu(delegation, &server->delegations, super_list)
list_for_each_entry_rcu(delegation, &server->delegations, super_list) {
/*
* If the delegation may have been admin revoked, then we
* cannot reclaim it.
*/
if (test_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags))
continue;
set_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags);
}
}
/**
@ -851,6 +887,141 @@ restart:
rcu_read_unlock();
}
static inline bool nfs4_server_rebooted(const struct nfs_client *clp)
{
return (clp->cl_state & (BIT(NFS4CLNT_CHECK_LEASE) |
BIT(NFS4CLNT_LEASE_EXPIRED) |
BIT(NFS4CLNT_SESSION_RESET))) != 0;
}
static void nfs_mark_test_expired_delegation(struct nfs_server *server,
struct nfs_delegation *delegation)
{
if (delegation->stateid.type == NFS4_INVALID_STATEID_TYPE)
return;
clear_bit(NFS_DELEGATION_NEED_RECLAIM, &delegation->flags);
set_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags);
set_bit(NFS4CLNT_DELEGATION_EXPIRED, &server->nfs_client->cl_state);
}
static void nfs_inode_mark_test_expired_delegation(struct nfs_server *server,
struct inode *inode)
{
struct nfs_delegation *delegation;
rcu_read_lock();
delegation = rcu_dereference(NFS_I(inode)->delegation);
if (delegation)
nfs_mark_test_expired_delegation(server, delegation);
rcu_read_unlock();
}
static void nfs_delegation_mark_test_expired_server(struct nfs_server *server)
{
struct nfs_delegation *delegation;
list_for_each_entry_rcu(delegation, &server->delegations, super_list)
nfs_mark_test_expired_delegation(server, delegation);
}
/**
* nfs_mark_test_expired_all_delegations - mark all delegations for testing
* @clp: nfs_client to process
*
* Iterates through all the delegations associated with this server and
* marks them as needing to be checked for validity.
*/
void nfs_mark_test_expired_all_delegations(struct nfs_client *clp)
{
struct nfs_server *server;
rcu_read_lock();
list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link)
nfs_delegation_mark_test_expired_server(server);
rcu_read_unlock();
}
/**
* nfs_reap_expired_delegations - reap expired delegations
* @clp: nfs_client to process
*
* Iterates through all the delegations associated with this server and
* checks if they have may have been revoked. This function is usually
* expected to be called in cases where the server may have lost its
* lease.
*/
void nfs_reap_expired_delegations(struct nfs_client *clp)
{
const struct nfs4_minor_version_ops *ops = clp->cl_mvops;
struct nfs_delegation *delegation;
struct nfs_server *server;
struct inode *inode;
struct rpc_cred *cred;
nfs4_stateid stateid;
restart:
rcu_read_lock();
list_for_each_entry_rcu(server, &clp->cl_superblocks, client_link) {
list_for_each_entry_rcu(delegation, &server->delegations,
super_list) {
if (test_bit(NFS_DELEGATION_RETURNING,
&delegation->flags))
continue;
if (test_bit(NFS_DELEGATION_TEST_EXPIRED,
&delegation->flags) == 0)
continue;
if (!nfs_sb_active(server->super))
continue;
inode = nfs_delegation_grab_inode(delegation);
if (inode == NULL) {
rcu_read_unlock();
nfs_sb_deactive(server->super);
goto restart;
}
cred = get_rpccred_rcu(delegation->cred);
nfs4_stateid_copy(&stateid, &delegation->stateid);
clear_bit(NFS_DELEGATION_TEST_EXPIRED, &delegation->flags);
rcu_read_unlock();
if (cred != NULL &&
ops->test_and_free_expired(server, &stateid, cred) < 0) {
nfs_revoke_delegation(inode, &stateid);
nfs_inode_find_state_and_recover(inode, &stateid);
}
put_rpccred(cred);
if (nfs4_server_rebooted(clp)) {
nfs_inode_mark_test_expired_delegation(server,inode);
iput(inode);
nfs_sb_deactive(server->super);
return;
}
iput(inode);
nfs_sb_deactive(server->super);
goto restart;
}
}
rcu_read_unlock();
}
void nfs_inode_find_delegation_state_and_recover(struct inode *inode,
const nfs4_stateid *stateid)
{
struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
struct nfs_delegation *delegation;
bool found = false;
rcu_read_lock();
delegation = rcu_dereference(NFS_I(inode)->delegation);
if (delegation &&
nfs4_stateid_match_other(&delegation->stateid, stateid)) {
nfs_mark_test_expired_delegation(NFS_SERVER(inode), delegation);
found = true;
}
rcu_read_unlock();
if (found)
nfs4_schedule_state_manager(clp);
}
/**
* nfs_delegations_present - check for existence of delegations
* @clp: client state handle
@ -893,7 +1064,7 @@ bool nfs4_copy_delegation_stateid(struct inode *inode, fmode_t flags,
flags &= FMODE_READ|FMODE_WRITE;
rcu_read_lock();
delegation = rcu_dereference(nfsi->delegation);
ret = (delegation != NULL && (delegation->type & flags) == flags);
ret = nfs4_is_valid_delegation(delegation, flags);
if (ret) {
nfs4_stateid_copy(dst, &delegation->stateid);
nfs_mark_delegation_referenced(delegation);

View File

@ -32,6 +32,7 @@ enum {
NFS_DELEGATION_REFERENCED,
NFS_DELEGATION_RETURNING,
NFS_DELEGATION_REVOKED,
NFS_DELEGATION_TEST_EXPIRED,
};
int nfs_inode_set_delegation(struct inode *inode, struct rpc_cred *cred, struct nfs_openres *res);
@ -47,11 +48,14 @@ void nfs_expire_unused_delegation_types(struct nfs_client *clp, fmode_t flags);
void nfs_expire_unreferenced_delegations(struct nfs_client *clp);
int nfs_client_return_marked_delegations(struct nfs_client *clp);
int nfs_delegations_present(struct nfs_client *clp);
void nfs_remove_bad_delegation(struct inode *inode);
void nfs_remove_bad_delegation(struct inode *inode, const nfs4_stateid *stateid);
void nfs_delegation_mark_reclaim(struct nfs_client *clp);
void nfs_delegation_reap_unclaimed(struct nfs_client *clp);
void nfs_mark_test_expired_all_delegations(struct nfs_client *clp);
void nfs_reap_expired_delegations(struct nfs_client *clp);
/* NFSv4 delegation-related procedures */
int nfs4_proc_delegreturn(struct inode *inode, struct rpc_cred *cred, const nfs4_stateid *stateid, int issync);
int nfs4_open_delegation_recall(struct nfs_open_context *ctx, struct nfs4_state *state, const nfs4_stateid *stateid, fmode_t type);
@ -62,6 +66,8 @@ void nfs_mark_delegation_referenced(struct nfs_delegation *delegation);
int nfs4_have_delegation(struct inode *inode, fmode_t flags);
int nfs4_check_delegation(struct inode *inode, fmode_t flags);
bool nfs4_delegation_flush_on_close(const struct inode *inode);
void nfs_inode_find_delegation_state_and_recover(struct inode *inode,
const nfs4_stateid *stateid);
#endif

View File

@ -435,11 +435,11 @@ int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
return 0;
nfsi = NFS_I(inode);
if (entry->fattr->fileid == nfsi->fileid)
return 1;
if (nfs_compare_fh(entry->fh, &nfsi->fh) == 0)
return 1;
return 0;
if (entry->fattr->fileid != nfsi->fileid)
return 0;
if (entry->fh->size && nfs_compare_fh(entry->fh, &nfsi->fh) != 0)
return 0;
return 1;
}
static
@ -496,6 +496,14 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry)
return;
if (!(entry->fattr->valid & NFS_ATTR_FATTR_FSID))
return;
if (filename.len == 0)
return;
/* Validate that the name doesn't contain any illegal '\0' */
if (strnlen(filename.name, filename.len) != filename.len)
return;
/* ...or '/' */
if (strnchr(filename.name, filename.len, '/'))
return;
if (filename.name[0] == '.') {
if (filename.len == 1)
return;
@ -517,6 +525,8 @@ again:
&entry->fattr->fsid))
goto out;
if (nfs_same_file(dentry, entry)) {
if (!entry->fh->size)
goto out;
nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
status = nfs_refresh_inode(d_inode(dentry), entry->fattr);
if (!status)
@ -529,6 +539,10 @@ again:
goto again;
}
}
if (!entry->fh->size) {
d_lookup_done(dentry);
goto out;
}
inode = nfs_fhget(dentry->d_sb, entry->fh, entry->fattr, entry->label);
alias = d_splice_alias(inode, dentry);

View File

@ -387,7 +387,7 @@ static void nfs_direct_complete(struct nfs_direct_req *dreq)
dreq->iocb->ki_complete(dreq->iocb, res, 0);
}
complete_all(&dreq->completion);
complete(&dreq->completion);
nfs_direct_req_release(dreq);
}

View File

@ -520,7 +520,9 @@ const struct address_space_operations nfs_file_aops = {
.invalidatepage = nfs_invalidate_page,
.releasepage = nfs_release_page,
.direct_IO = nfs_direct_IO,
#ifdef CONFIG_MIGRATION
.migratepage = nfs_migrate_page,
#endif
.launder_page = nfs_launder_page,
.is_dirty_writeback = nfs_check_dirty_writeback,
.error_remove_page = generic_error_remove_page,
@ -685,11 +687,6 @@ out_noconflict:
goto out;
}
static int do_vfs_lock(struct file *file, struct file_lock *fl)
{
return locks_lock_file_wait(file, fl);
}
static int
do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
{
@ -722,7 +719,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
if (!is_local)
status = NFS_PROTO(inode)->lock(filp, cmd, fl);
else
status = do_vfs_lock(filp, fl);
status = locks_lock_file_wait(filp, fl);
return status;
}
@ -747,7 +744,7 @@ do_setlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
if (!is_local)
status = NFS_PROTO(inode)->lock(filp, cmd, fl);
else
status = do_vfs_lock(filp, fl);
status = locks_lock_file_wait(filp, fl);
if (status < 0)
goto out;

View File

@ -1080,7 +1080,7 @@ static int ff_layout_async_handle_error_v4(struct rpc_task *task,
case -NFS4ERR_BAD_STATEID:
if (state == NULL)
break;
nfs_remove_bad_delegation(state->inode);
nfs_remove_bad_delegation(state->inode, NULL);
case -NFS4ERR_OPENMODE:
if (state == NULL)
break;

View File

@ -534,12 +534,9 @@ void nfs_clear_pnfs_ds_commit_verifiers(struct pnfs_ds_commit_info *cinfo)
}
#endif
#ifdef CONFIG_MIGRATION
extern int nfs_migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
#else
#define nfs_migrate_page NULL
#endif
static inline int
@ -562,7 +559,6 @@ void nfs_init_cinfo_from_dreq(struct nfs_commit_info *cinfo,
extern ssize_t nfs_dreq_bytes_left(struct nfs_direct_req *dreq);
/* nfs4proc.c */
extern void __nfs4_read_done_cb(struct nfs_pgio_header *);
extern struct nfs_client *nfs4_init_client(struct nfs_client *clp,
const struct nfs_client_initdata *);
extern int nfs40_walk_client_list(struct nfs_client *clp,
@ -571,6 +567,9 @@ extern int nfs40_walk_client_list(struct nfs_client *clp,
extern int nfs41_walk_client_list(struct nfs_client *clp,
struct nfs_client **result,
struct rpc_cred *cred);
extern int nfs4_test_session_trunk(struct rpc_clnt *,
struct rpc_xprt *,
void *);
static inline struct inode *nfs_igrab_and_active(struct inode *inode)
{

View File

@ -29,7 +29,7 @@ struct nfs_net {
int cb_users[NFS4_MAX_MINOR_VERSION + 1];
#endif
spinlock_t nfs_client_lock;
struct timespec boot_time;
ktime_t boot_time;
#ifdef CONFIG_PROC_FS
struct proc_dir_entry *proc_nfsfs;
#endif

View File

@ -443,6 +443,7 @@ int nfs42_proc_layoutstats_generic(struct nfs_server *server,
task = rpc_run_task(&task_setup);
if (IS_ERR(task))
return PTR_ERR(task);
rpc_put_task(task);
return 0;
}

View File

@ -39,6 +39,7 @@ enum nfs4_client_state {
NFS4CLNT_BIND_CONN_TO_SESSION,
NFS4CLNT_MOVED,
NFS4CLNT_LEASE_MOVED,
NFS4CLNT_DELEGATION_EXPIRED,
};
#define NFS4_RENEW_TIMEOUT 0x01
@ -57,8 +58,11 @@ struct nfs4_minor_version_ops {
struct nfs_fsinfo *);
void (*free_lock_state)(struct nfs_server *,
struct nfs4_lock_state *);
int (*test_and_free_expired)(struct nfs_server *,
nfs4_stateid *, struct rpc_cred *);
struct nfs_seqid *
(*alloc_seqid)(struct nfs_seqid_counter *, gfp_t);
int (*session_trunk)(struct rpc_clnt *, struct rpc_xprt *, void *);
const struct rpc_call_ops *call_sync_ops;
const struct nfs4_state_recovery_ops *reboot_recovery_ops;
const struct nfs4_state_recovery_ops *nograce_recovery_ops;
@ -156,6 +160,7 @@ enum {
NFS_STATE_RECLAIM_NOGRACE, /* OPEN stateid needs to recover state */
NFS_STATE_POSIX_LOCKS, /* Posix locks are supported */
NFS_STATE_RECOVERY_FAILED, /* OPEN stateid state recovery failed */
NFS_STATE_MAY_NOTIFY_LOCK, /* server may CB_NOTIFY_LOCK */
};
struct nfs4_state {
@ -203,6 +208,11 @@ struct nfs4_state_recovery_ops {
struct rpc_cred *);
};
struct nfs4_add_xprt_data {
struct nfs_client *clp;
struct rpc_cred *cred;
};
struct nfs4_state_maintenance_ops {
int (*sched_state_renewal)(struct nfs_client *, struct rpc_cred *, unsigned);
struct rpc_cred * (*get_state_renewal_cred_locked)(struct nfs_client *);
@ -278,6 +288,8 @@ extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
struct nfs_fsinfo *fsinfo);
extern int nfs4_proc_layoutcommit(struct nfs4_layoutcommit_data *data,
bool sync);
extern int nfs4_detect_session_trunking(struct nfs_client *clp,
struct nfs41_exchange_id_res *res, struct rpc_xprt *xprt);
static inline bool
is_ds_only_client(struct nfs_client *clp)
@ -439,7 +451,7 @@ extern void nfs4_schedule_path_down_recovery(struct nfs_client *clp);
extern int nfs4_schedule_stateid_recovery(const struct nfs_server *, struct nfs4_state *);
extern int nfs4_schedule_migration_recovery(const struct nfs_server *);
extern void nfs4_schedule_lease_moved_recovery(struct nfs_client *);
extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags);
extern void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags, bool);
extern void nfs41_handle_server_scope(struct nfs_client *,
struct nfs41_server_scope **);
extern void nfs4_put_lock_state(struct nfs4_lock_state *lsp);
@ -471,6 +483,7 @@ extern struct nfs_subversion nfs_v4;
struct dentry *nfs4_try_mount(int, const char *, struct nfs_mount_info *, struct nfs_subversion *);
extern bool nfs4_disable_idmapping;
extern unsigned short max_session_slots;
extern unsigned short max_session_cb_slots;
extern unsigned short send_implementation_id;
extern bool recover_lost_locks;

View File

@ -199,6 +199,9 @@ struct nfs_client *nfs4_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_minorversion = cl_init->minorversion;
clp->cl_mvops = nfs_v4_minor_ops[cl_init->minorversion];
clp->cl_mig_gen = 1;
#if IS_ENABLED(CONFIG_NFS_V4_1)
init_waitqueue_head(&clp->cl_lock_waitq);
#endif
return clp;
error:
@ -562,15 +565,15 @@ out:
/*
* Returns true if the client IDs match
*/
static bool nfs4_match_clientids(struct nfs_client *a, struct nfs_client *b)
static bool nfs4_match_clientids(u64 a, u64 b)
{
if (a->cl_clientid != b->cl_clientid) {
if (a != b) {
dprintk("NFS: --> %s client ID %llx does not match %llx\n",
__func__, a->cl_clientid, b->cl_clientid);
__func__, a, b);
return false;
}
dprintk("NFS: --> %s client ID %llx matches %llx\n",
__func__, a->cl_clientid, b->cl_clientid);
__func__, a, b);
return true;
}
@ -578,17 +581,15 @@ static bool nfs4_match_clientids(struct nfs_client *a, struct nfs_client *b)
* Returns true if the server major ids match
*/
static bool
nfs4_check_clientid_trunking(struct nfs_client *a, struct nfs_client *b)
nfs4_check_serverowner_major_id(struct nfs41_server_owner *o1,
struct nfs41_server_owner *o2)
{
struct nfs41_server_owner *o1 = a->cl_serverowner;
struct nfs41_server_owner *o2 = b->cl_serverowner;
if (o1->major_id_sz != o2->major_id_sz)
goto out_major_mismatch;
if (memcmp(o1->major_id, o2->major_id, o1->major_id_sz) != 0)
goto out_major_mismatch;
dprintk("NFS: --> %s server owners match\n", __func__);
dprintk("NFS: --> %s server owner major IDs match\n", __func__);
return true;
out_major_mismatch:
@ -597,6 +598,100 @@ out_major_mismatch:
return false;
}
/*
* Returns true if server minor ids match
*/
static bool
nfs4_check_serverowner_minor_id(struct nfs41_server_owner *o1,
struct nfs41_server_owner *o2)
{
/* Check eir_server_owner so_minor_id */
if (o1->minor_id != o2->minor_id)
goto out_minor_mismatch;
dprintk("NFS: --> %s server owner minor IDs match\n", __func__);
return true;
out_minor_mismatch:
dprintk("NFS: --> %s server owner minor IDs do not match\n", __func__);
return false;
}
/*
* Returns true if the server scopes match
*/
static bool
nfs4_check_server_scope(struct nfs41_server_scope *s1,
struct nfs41_server_scope *s2)
{
if (s1->server_scope_sz != s2->server_scope_sz)
goto out_scope_mismatch;
if (memcmp(s1->server_scope, s2->server_scope,
s1->server_scope_sz) != 0)
goto out_scope_mismatch;
dprintk("NFS: --> %s server scopes match\n", __func__);
return true;
out_scope_mismatch:
dprintk("NFS: --> %s server scopes do not match\n",
__func__);
return false;
}
/**
* nfs4_detect_session_trunking - Checks for session trunking.
*
* Called after a successful EXCHANGE_ID on a multi-addr connection.
* Upon success, add the transport.
*
* @clp: original mount nfs_client
* @res: result structure from an exchange_id using the original mount
* nfs_client with a new multi_addr transport
*
* Returns zero on success, otherwise -EINVAL
*
* Note: since the exchange_id for the new multi_addr transport uses the
* same nfs_client from the original mount, the cl_owner_id is reused,
* so eir_clientowner is the same.
*/
int nfs4_detect_session_trunking(struct nfs_client *clp,
struct nfs41_exchange_id_res *res,
struct rpc_xprt *xprt)
{
/* Check eir_clientid */
if (!nfs4_match_clientids(clp->cl_clientid, res->clientid))
goto out_err;
/* Check eir_server_owner so_major_id */
if (!nfs4_check_serverowner_major_id(clp->cl_serverowner,
res->server_owner))
goto out_err;
/* Check eir_server_owner so_minor_id */
if (!nfs4_check_serverowner_minor_id(clp->cl_serverowner,
res->server_owner))
goto out_err;
/* Check eir_server_scope */
if (!nfs4_check_server_scope(clp->cl_serverscope, res->server_scope))
goto out_err;
/* Session trunking passed, add the xprt */
rpc_clnt_xprt_switch_add_xprt(clp->cl_rpcclient, xprt);
pr_info("NFS: %s: Session trunking succeeded for %s\n",
clp->cl_hostname,
xprt->address_strings[RPC_DISPLAY_ADDR]);
return 0;
out_err:
pr_info("NFS: %s: Session trunking failed for %s\n", clp->cl_hostname,
xprt->address_strings[RPC_DISPLAY_ADDR]);
return -EINVAL;
}
/**
* nfs41_walk_client_list - Find nfs_client that matches a client/server owner
*
@ -650,7 +745,7 @@ int nfs41_walk_client_list(struct nfs_client *new,
if (pos->cl_cons_state != NFS_CS_READY)
continue;
if (!nfs4_match_clientids(pos, new))
if (!nfs4_match_clientids(pos->cl_clientid, new->cl_clientid))
continue;
/*
@ -658,7 +753,8 @@ int nfs41_walk_client_list(struct nfs_client *new,
* client id trunking. In either case, we want to fall back
* to using the existing nfs_client.
*/
if (!nfs4_check_clientid_trunking(pos, new))
if (!nfs4_check_serverowner_major_id(pos->cl_serverowner,
new->cl_serverowner))
continue;
/* Unlike NFSv4.0, we know that NFSv4.1 always uses the

File diff suppressed because it is too large Load Diff

View File

@ -9,6 +9,7 @@
/* maximum number of slots to use */
#define NFS4_DEF_SLOT_TABLE_SIZE (64U)
#define NFS4_DEF_CB_SLOT_TABLE_SIZE (1U)
#define NFS4_MAX_SLOT_TABLE (1024U)
#define NFS4_NO_SLOT ((u32)-1)
@ -22,6 +23,7 @@ struct nfs4_slot {
u32 slot_nr;
u32 seq_nr;
unsigned int interrupted : 1,
privileged : 1,
seq_done : 1;
};

View File

@ -991,6 +991,8 @@ int nfs4_select_rw_stateid(struct nfs4_state *state,
{
int ret;
if (!nfs4_valid_open_stateid(state))
return -EIO;
if (cred != NULL)
*cred = NULL;
ret = nfs4_copy_lock_stateid(dst, state, lockowner);
@ -1303,6 +1305,8 @@ void nfs4_schedule_path_down_recovery(struct nfs_client *clp)
static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_state *state)
{
if (!nfs4_valid_open_stateid(state))
return 0;
set_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
/* Don't recover state that expired before the reboot */
if (test_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags)) {
@ -1316,6 +1320,8 @@ static int nfs4_state_mark_reclaim_reboot(struct nfs_client *clp, struct nfs4_st
int nfs4_state_mark_reclaim_nograce(struct nfs_client *clp, struct nfs4_state *state)
{
if (!nfs4_valid_open_stateid(state))
return 0;
set_bit(NFS_STATE_RECLAIM_NOGRACE, &state->flags);
clear_bit(NFS_STATE_RECLAIM_REBOOT, &state->flags);
set_bit(NFS_OWNER_RECLAIM_NOGRACE, &state->owner->so_flags);
@ -1327,9 +1333,8 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
{
struct nfs_client *clp = server->nfs_client;
if (!nfs4_valid_open_stateid(state))
if (!nfs4_state_mark_reclaim_nograce(clp, state))
return -EBADF;
nfs4_state_mark_reclaim_nograce(clp, state);
dprintk("%s: scheduling stateid recovery for server %s\n", __func__,
clp->cl_hostname);
nfs4_schedule_state_manager(clp);
@ -1337,6 +1342,35 @@ int nfs4_schedule_stateid_recovery(const struct nfs_server *server, struct nfs4_
}
EXPORT_SYMBOL_GPL(nfs4_schedule_stateid_recovery);
static struct nfs4_lock_state *
nfs_state_find_lock_state_by_stateid(struct nfs4_state *state,
const nfs4_stateid *stateid)
{
struct nfs4_lock_state *pos;
list_for_each_entry(pos, &state->lock_states, ls_locks) {
if (!test_bit(NFS_LOCK_INITIALIZED, &pos->ls_flags))
continue;
if (nfs4_stateid_match_other(&pos->ls_stateid, stateid))
return pos;
}
return NULL;
}
static bool nfs_state_lock_state_matches_stateid(struct nfs4_state *state,
const nfs4_stateid *stateid)
{
bool found = false;
if (test_bit(LK_STATE_IN_USE, &state->flags)) {
spin_lock(&state->state_lock);
if (nfs_state_find_lock_state_by_stateid(state, stateid))
found = true;
spin_unlock(&state->state_lock);
}
return found;
}
void nfs_inode_find_state_and_recover(struct inode *inode,
const nfs4_stateid *stateid)
{
@ -1351,14 +1385,18 @@ void nfs_inode_find_state_and_recover(struct inode *inode,
state = ctx->state;
if (state == NULL)
continue;
if (!test_bit(NFS_DELEGATED_STATE, &state->flags))
if (nfs4_stateid_match_other(&state->stateid, stateid) &&
nfs4_state_mark_reclaim_nograce(clp, state)) {
found = true;
continue;
if (!nfs4_stateid_match(&state->stateid, stateid))
continue;
nfs4_state_mark_reclaim_nograce(clp, state);
found = true;
}
if (nfs_state_lock_state_matches_stateid(state, stateid) &&
nfs4_state_mark_reclaim_nograce(clp, state))
found = true;
}
spin_unlock(&inode->i_lock);
nfs_inode_find_delegation_state_and_recover(inode, stateid);
if (found)
nfs4_schedule_state_manager(clp);
}
@ -1498,6 +1536,9 @@ restart:
__func__, status);
case -ENOENT:
case -ENOMEM:
case -EACCES:
case -EROFS:
case -EIO:
case -ESTALE:
/* Open state on this file cannot be recovered */
nfs4_state_mark_recovery_failed(state, status);
@ -1656,15 +1697,9 @@ static void nfs4_state_end_reclaim_reboot(struct nfs_client *clp)
put_rpccred(cred);
}
static void nfs_delegation_clear_all(struct nfs_client *clp)
{
nfs_delegation_mark_reclaim(clp);
nfs_delegation_reap_unclaimed(clp);
}
static void nfs4_state_start_reclaim_nograce(struct nfs_client *clp)
{
nfs_delegation_clear_all(clp);
nfs_mark_test_expired_all_delegations(clp);
nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_nograce);
}
@ -2195,7 +2230,7 @@ static void nfs41_handle_all_state_revoked(struct nfs_client *clp)
static void nfs41_handle_some_state_revoked(struct nfs_client *clp)
{
nfs4_state_mark_reclaim_helper(clp, nfs4_state_mark_reclaim_nograce);
nfs4_state_start_reclaim_nograce(clp);
nfs4_schedule_state_manager(clp);
dprintk("%s: state revoked on server %s\n", __func__, clp->cl_hostname);
@ -2227,13 +2262,22 @@ static void nfs41_handle_cb_path_down(struct nfs_client *clp)
nfs4_schedule_state_manager(clp);
}
void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags)
void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags,
bool recovery)
{
if (!flags)
return;
dprintk("%s: \"%s\" (client ID %llx) flags=0x%08x\n",
__func__, clp->cl_hostname, clp->cl_clientid, flags);
/*
* If we're called from the state manager thread, then assume we're
* already handling the RECLAIM_NEEDED and/or STATE_REVOKED.
* Those flags are expected to remain set until we're done
* recovering (see RFC5661, section 18.46.3).
*/
if (recovery)
goto out_recovery;
if (flags & SEQ4_STATUS_RESTART_RECLAIM_NEEDED)
nfs41_handle_server_reboot(clp);
@ -2246,6 +2290,7 @@ void nfs41_handle_sequence_flag_errors(struct nfs_client *clp, u32 flags)
nfs4_schedule_lease_moved_recovery(clp);
if (flags & SEQ4_STATUS_RECALLABLE_STATE_REVOKED)
nfs41_handle_recallable_state_revoked(clp);
out_recovery:
if (flags & SEQ4_STATUS_BACKCHANNEL_FAULT)
nfs41_handle_backchannel_fault(clp);
else if (flags & (SEQ4_STATUS_CB_PATH_DOWN |
@ -2410,6 +2455,13 @@ static void nfs4_state_manager(struct nfs_client *clp)
nfs4_state_end_reclaim_reboot(clp);
}
/* Detect expired delegations... */
if (test_and_clear_bit(NFS4CLNT_DELEGATION_EXPIRED, &clp->cl_state)) {
section = "detect expired delegations";
nfs_reap_expired_delegations(clp);
continue;
}
/* Now recover expired state... */
if (test_and_clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &clp->cl_state)) {
section = "reclaim nograce";

View File

@ -1850,7 +1850,7 @@ static void encode_create_session(struct xdr_stream *xdr,
*p++ = cpu_to_be32(RPC_AUTH_UNIX); /* auth_sys */
/* authsys_parms rfc1831 */
*p++ = cpu_to_be32(nn->boot_time.tv_nsec); /* stamp */
*p++ = cpu_to_be32(ktime_to_ns(nn->boot_time)); /* stamp */
p = xdr_encode_array(p, clnt->cl_nodename, clnt->cl_nodelen);
*p++ = cpu_to_be32(0); /* UID */
*p++ = cpu_to_be32(0); /* GID */
@ -4725,34 +4725,37 @@ static int decode_getfattr(struct xdr_stream *xdr, struct nfs_fattr *fattr,
}
/*
* Decode potentially multiple layout types. Currently we only support
* one layout driver per file system.
* Decode potentially multiple layout types.
*/
static int decode_first_pnfs_layout_type(struct xdr_stream *xdr,
uint32_t *layouttype)
static int decode_pnfs_layout_types(struct xdr_stream *xdr,
struct nfs_fsinfo *fsinfo)
{
__be32 *p;
int num;
uint32_t i;
p = xdr_inline_decode(xdr, 4);
if (unlikely(!p))
goto out_overflow;
num = be32_to_cpup(p);
fsinfo->nlayouttypes = be32_to_cpup(p);
/* pNFS is not supported by the underlying file system */
if (num == 0) {
*layouttype = 0;
if (fsinfo->nlayouttypes == 0)
return 0;
}
if (num > 1)
printk(KERN_INFO "NFS: %s: Warning: Multiple pNFS layout "
"drivers per filesystem not supported\n", __func__);
/* Decode and set first layout type, move xdr->p past unused types */
p = xdr_inline_decode(xdr, num * 4);
p = xdr_inline_decode(xdr, fsinfo->nlayouttypes * 4);
if (unlikely(!p))
goto out_overflow;
*layouttype = be32_to_cpup(p);
/* If we get too many, then just cap it at the max */
if (fsinfo->nlayouttypes > NFS_MAX_LAYOUT_TYPES) {
printk(KERN_INFO "NFS: %s: Warning: Too many (%u) pNFS layout types\n",
__func__, fsinfo->nlayouttypes);
fsinfo->nlayouttypes = NFS_MAX_LAYOUT_TYPES;
}
for(i = 0; i < fsinfo->nlayouttypes; ++i)
fsinfo->layouttype[i] = be32_to_cpup(p++);
return 0;
out_overflow:
print_overflow_msg(__func__, xdr);
@ -4764,7 +4767,7 @@ out_overflow:
* Note we must ensure that layouttype is set in any non-error case.
*/
static int decode_attr_pnfstype(struct xdr_stream *xdr, uint32_t *bitmap,
uint32_t *layouttype)
struct nfs_fsinfo *fsinfo)
{
int status = 0;
@ -4772,10 +4775,9 @@ static int decode_attr_pnfstype(struct xdr_stream *xdr, uint32_t *bitmap,
if (unlikely(bitmap[1] & (FATTR4_WORD1_FS_LAYOUT_TYPES - 1U)))
return -EIO;
if (bitmap[1] & FATTR4_WORD1_FS_LAYOUT_TYPES) {
status = decode_first_pnfs_layout_type(xdr, layouttype);
status = decode_pnfs_layout_types(xdr, fsinfo);
bitmap[1] &= ~FATTR4_WORD1_FS_LAYOUT_TYPES;
} else
*layouttype = 0;
}
return status;
}
@ -4856,7 +4858,7 @@ static int decode_fsinfo(struct xdr_stream *xdr, struct nfs_fsinfo *fsinfo)
status = decode_attr_time_delta(xdr, bitmap, &fsinfo->time_delta);
if (status != 0)
goto xdr_error;
status = decode_attr_pnfstype(xdr, bitmap, &fsinfo->layouttype);
status = decode_attr_pnfstype(xdr, bitmap, fsinfo);
if (status != 0)
goto xdr_error;

View File

@ -30,6 +30,7 @@
#include <linux/nfs_fs.h>
#include <linux/nfs_page.h>
#include <linux/module.h>
#include <linux/sort.h>
#include "internal.h"
#include "pnfs.h"
#include "iostat.h"
@ -98,36 +99,80 @@ unset_pnfs_layoutdriver(struct nfs_server *nfss)
nfss->pnfs_curr_ld = NULL;
}
/*
* When the server sends a list of layout types, we choose one in the order
* given in the list below.
*
* FIXME: should this list be configurable in some fashion? module param?
* mount option? something else?
*/
static const u32 ld_prefs[] = {
LAYOUT_SCSI,
LAYOUT_BLOCK_VOLUME,
LAYOUT_OSD2_OBJECTS,
LAYOUT_FLEX_FILES,
LAYOUT_NFSV4_1_FILES,
0
};
static int
ld_cmp(const void *e1, const void *e2)
{
u32 ld1 = *((u32 *)e1);
u32 ld2 = *((u32 *)e2);
int i;
for (i = 0; ld_prefs[i] != 0; i++) {
if (ld1 == ld_prefs[i])
return -1;
if (ld2 == ld_prefs[i])
return 1;
}
return 0;
}
/*
* Try to set the server's pnfs module to the pnfs layout type specified by id.
* Currently only one pNFS layout driver per filesystem is supported.
*
* @id layout type. Zero (illegal layout type) indicates pNFS not in use.
* @ids array of layout types supported by MDS.
*/
void
set_pnfs_layoutdriver(struct nfs_server *server, const struct nfs_fh *mntfh,
u32 id)
struct nfs_fsinfo *fsinfo)
{
struct pnfs_layoutdriver_type *ld_type = NULL;
u32 id;
int i;
if (id == 0)
goto out_no_driver;
if (!(server->nfs_client->cl_exchange_flags &
(EXCHGID4_FLAG_USE_NON_PNFS | EXCHGID4_FLAG_USE_PNFS_MDS))) {
printk(KERN_ERR "NFS: %s: id %u cl_exchange_flags 0x%x\n",
__func__, id, server->nfs_client->cl_exchange_flags);
printk(KERN_ERR "NFS: %s: cl_exchange_flags 0x%x\n",
__func__, server->nfs_client->cl_exchange_flags);
goto out_no_driver;
}
ld_type = find_pnfs_driver(id);
if (!ld_type) {
request_module("%s-%u", LAYOUT_NFSV4_1_MODULE_PREFIX, id);
sort(fsinfo->layouttype, fsinfo->nlayouttypes,
sizeof(*fsinfo->layouttype), ld_cmp, NULL);
for (i = 0; i < fsinfo->nlayouttypes; i++) {
id = fsinfo->layouttype[i];
ld_type = find_pnfs_driver(id);
if (!ld_type) {
dprintk("%s: No pNFS module found for %u.\n",
__func__, id);
goto out_no_driver;
request_module("%s-%u", LAYOUT_NFSV4_1_MODULE_PREFIX,
id);
ld_type = find_pnfs_driver(id);
}
if (ld_type)
break;
}
if (!ld_type) {
dprintk("%s: No pNFS module found!\n", __func__);
goto out_no_driver;
}
server->pnfs_curr_ld = ld_type;
if (ld_type->set_layoutdriver
&& ld_type->set_layoutdriver(server, mntfh)) {
@ -2185,10 +2230,8 @@ static void pnfs_ld_handle_read_error(struct nfs_pgio_header *hdr)
*/
void pnfs_ld_read_done(struct nfs_pgio_header *hdr)
{
if (likely(!hdr->pnfs_error)) {
__nfs4_read_done_cb(hdr);
if (likely(!hdr->pnfs_error))
hdr->mds_ops->rpc_call_done(&hdr->task, hdr);
}
trace_nfs4_pnfs_read(hdr, hdr->pnfs_error);
if (unlikely(hdr->pnfs_error))
pnfs_ld_handle_read_error(hdr);

View File

@ -236,7 +236,7 @@ void pnfs_get_layout_hdr(struct pnfs_layout_hdr *lo);
void pnfs_put_lseg(struct pnfs_layout_segment *lseg);
void pnfs_put_lseg_locked(struct pnfs_layout_segment *lseg);
void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *, u32);
void set_pnfs_layoutdriver(struct nfs_server *, const struct nfs_fh *, struct nfs_fsinfo *);
void unset_pnfs_layoutdriver(struct nfs_server *);
void pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *, struct nfs_page *);
int pnfs_generic_pg_readpages(struct nfs_pageio_descriptor *desc);
@ -657,7 +657,8 @@ pnfs_wait_on_layoutreturn(struct inode *ino, struct rpc_task *task)
}
static inline void set_pnfs_layoutdriver(struct nfs_server *s,
const struct nfs_fh *mntfh, u32 id)
const struct nfs_fh *mntfh,
struct nfs_fsinfo *fsinfo)
{
}

View File

@ -690,13 +690,50 @@ static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
dprintk("%s: DS %s: trying address %s\n",
__func__, ds->ds_remotestr, da->da_remotestr);
clp = nfs4_set_ds_client(mds_srv,
(struct sockaddr *)&da->da_addr,
da->da_addrlen, IPPROTO_TCP,
timeo, retrans, minor_version,
au_flavor);
if (!IS_ERR(clp))
break;
if (!IS_ERR(clp) && clp->cl_mvops->session_trunk) {
struct xprt_create xprt_args = {
.ident = XPRT_TRANSPORT_TCP,
.net = clp->cl_net,
.dstaddr = (struct sockaddr *)&da->da_addr,
.addrlen = da->da_addrlen,
.servername = clp->cl_hostname,
};
struct nfs4_add_xprt_data xprtdata = {
.clp = clp,
.cred = nfs4_get_clid_cred(clp),
};
struct rpc_add_xprt_test rpcdata = {
.add_xprt_test = clp->cl_mvops->session_trunk,
.data = &xprtdata,
};
/**
* Test this address for session trunking and
* add as an alias
*/
rpc_clnt_add_xprt(clp->cl_rpcclient, &xprt_args,
rpc_clnt_setup_test_and_add_xprt,
&rpcdata);
if (xprtdata.cred)
put_rpccred(xprtdata.cred);
} else {
clp = nfs4_set_ds_client(mds_srv,
(struct sockaddr *)&da->da_addr,
da->da_addrlen, IPPROTO_TCP,
timeo, retrans, minor_version,
au_flavor);
if (IS_ERR(clp))
continue;
status = nfs4_init_ds_session(clp,
mds_srv->nfs_client->cl_lease_time);
if (status) {
nfs_put_client(clp);
clp = ERR_PTR(-EIO);
continue;
}
}
}
if (IS_ERR(clp)) {
@ -704,18 +741,11 @@ static int _nfs4_pnfs_v4_ds_connect(struct nfs_server *mds_srv,
goto out;
}
status = nfs4_init_ds_session(clp, mds_srv->nfs_client->cl_lease_time);
if (status)
goto out_put;
smp_wmb();
ds->ds_clp = clp;
dprintk("%s [new] addr: %s\n", __func__, ds->ds_remotestr);
out:
return status;
out_put:
nfs_put_client(clp);
goto out;
}
/*

View File

@ -2848,19 +2848,23 @@ out_invalid_transport_udp:
* NFS client for backwards compatibility
*/
unsigned int nfs_callback_set_tcpport;
unsigned short nfs_callback_nr_threads;
/* Default cache timeout is 10 minutes */
unsigned int nfs_idmap_cache_timeout = 600;
/* Turn off NFSv4 uid/gid mapping when using AUTH_SYS */
bool nfs4_disable_idmapping = true;
unsigned short max_session_slots = NFS4_DEF_SLOT_TABLE_SIZE;
unsigned short max_session_cb_slots = NFS4_DEF_CB_SLOT_TABLE_SIZE;
unsigned short send_implementation_id = 1;
char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN] = "";
bool recover_lost_locks = false;
EXPORT_SYMBOL_GPL(nfs_callback_nr_threads);
EXPORT_SYMBOL_GPL(nfs_callback_set_tcpport);
EXPORT_SYMBOL_GPL(nfs_idmap_cache_timeout);
EXPORT_SYMBOL_GPL(nfs4_disable_idmapping);
EXPORT_SYMBOL_GPL(max_session_slots);
EXPORT_SYMBOL_GPL(max_session_cb_slots);
EXPORT_SYMBOL_GPL(send_implementation_id);
EXPORT_SYMBOL_GPL(nfs4_client_id_uniquifier);
EXPORT_SYMBOL_GPL(recover_lost_locks);
@ -2887,6 +2891,9 @@ static const struct kernel_param_ops param_ops_portnr = {
#define param_check_portnr(name, p) __param_check(name, p, unsigned int);
module_param_named(callback_tcpport, nfs_callback_set_tcpport, portnr, 0644);
module_param_named(callback_nr_threads, nfs_callback_nr_threads, ushort, 0644);
MODULE_PARM_DESC(callback_nr_threads, "Number of threads that will be "
"assigned to the NFSv4 callback channels.");
module_param(nfs_idmap_cache_timeout, int, 0644);
module_param(nfs4_disable_idmapping, bool, 0644);
module_param_string(nfs4_unique_id, nfs4_client_id_uniquifier,
@ -2896,6 +2903,9 @@ MODULE_PARM_DESC(nfs4_disable_idmapping,
module_param(max_session_slots, ushort, 0644);
MODULE_PARM_DESC(max_session_slots, "Maximum number of outstanding NFSv4.1 "
"requests the client will negotiate");
module_param(max_session_cb_slots, ushort, 0644);
MODULE_PARM_DESC(max_session_slots, "Maximum number of parallel NFSv4.1 "
"callbacks the client will process for a given server");
module_param(send_implementation_id, ushort, 0644);
MODULE_PARM_DESC(send_implementation_id,
"Send implementation ID with NFSv4.1 exchange_id");

View File

@ -67,6 +67,7 @@ struct nfs4_stateid_struct {
NFS4_DELEGATION_STATEID_TYPE,
NFS4_LAYOUT_STATEID_TYPE,
NFS4_PNFS_DS_STATEID_TYPE,
NFS4_REVOKED_STATEID_TYPE,
} type;
};

View File

@ -103,6 +103,9 @@ struct nfs_client {
#define NFS_SP4_MACH_CRED_WRITE 5 /* WRITE */
#define NFS_SP4_MACH_CRED_COMMIT 6 /* COMMIT */
#define NFS_SP4_MACH_CRED_PNFS_CLEANUP 7 /* LAYOUTRETURN */
#if IS_ENABLED(CONFIG_NFS_V4_1)
wait_queue_head_t cl_lock_waitq;
#endif /* CONFIG_NFS_V4_1 */
#endif /* CONFIG_NFS_V4 */
/* Our own IP address, as a null-terminated string.

View File

@ -124,6 +124,11 @@ struct nfs_fattr {
| NFS_ATTR_FATTR_SPACE_USED \
| NFS_ATTR_FATTR_V4_SECURITY_LABEL)
/*
* Maximal number of supported layout drivers.
*/
#define NFS_MAX_LAYOUT_TYPES 8
/*
* Info on the file system
*/
@ -139,7 +144,8 @@ struct nfs_fsinfo {
__u64 maxfilesize;
struct timespec time_delta; /* server time granularity */
__u32 lease_time; /* in seconds */
__u32 layouttype; /* supported pnfs layout driver */
__u32 nlayouttypes; /* number of layouttypes */
__u32 layouttype[NFS_MAX_LAYOUT_TYPES]; /* supported pnfs layout driver */
__u32 blksize; /* preferred pnfs io block size */
__u32 clone_blksize; /* granularity of a CLONE operation */
};

View File

@ -131,6 +131,7 @@ struct rpc_authops {
struct rpc_auth * (*create)(struct rpc_auth_create_args *, struct rpc_clnt *);
void (*destroy)(struct rpc_auth *);
int (*hash_cred)(struct auth_cred *, unsigned int);
struct rpc_cred * (*lookup_cred)(struct rpc_auth *, struct auth_cred *, int);
struct rpc_cred * (*crcreate)(struct rpc_auth*, struct auth_cred *, int, gfp_t);
int (*list_pseudoflavors)(rpc_authflavor_t *, int);

View File

@ -125,6 +125,13 @@ struct rpc_create_args {
struct svc_xprt *bc_xprt; /* NFSv4.1 backchannel */
};
struct rpc_add_xprt_test {
int (*add_xprt_test)(struct rpc_clnt *,
struct rpc_xprt *,
void *calldata);
void *data;
};
/* Values for "flags" field */
#define RPC_CLNT_CREATE_HARDRTRY (1UL << 0)
#define RPC_CLNT_CREATE_AUTOBIND (1UL << 2)
@ -198,6 +205,16 @@ int rpc_clnt_add_xprt(struct rpc_clnt *, struct xprt_create *,
void rpc_cap_max_reconnect_timeout(struct rpc_clnt *clnt,
unsigned long timeo);
int rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *,
struct rpc_xprt_switch *,
struct rpc_xprt *,
void *);
const char *rpc_proc_name(const struct rpc_task *task);
void rpc_clnt_xprt_switch_put(struct rpc_clnt *);
void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *, struct rpc_xprt *);
bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt,
const struct sockaddr *sap);
#endif /* __KERNEL__ */
#endif /* _LINUX_SUNRPC_CLNT_H */

View File

@ -46,6 +46,10 @@
#define RPCRDMA_VERSION 1
#define rpcrdma_version cpu_to_be32(RPCRDMA_VERSION)
enum {
RPCRDMA_V1_DEF_INLINE_SIZE = 1024,
};
struct rpcrdma_segment {
__be32 rs_handle; /* Registered memory handle */
__be32 rs_length; /* Length of the chunk in bytes */

View File

@ -239,8 +239,8 @@ struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
void *);
void rpc_wake_up_status(struct rpc_wait_queue *, int);
void rpc_delay(struct rpc_task *, unsigned long);
void * rpc_malloc(struct rpc_task *, size_t);
void rpc_free(void *);
int rpc_malloc(struct rpc_task *);
void rpc_free(struct rpc_task *);
int rpciod_up(void);
void rpciod_down(void);
int __rpc_wait_for_completion_task(struct rpc_task *task, wait_bit_action_f *);

View File

@ -67,6 +67,18 @@ struct xdr_buf {
len; /* Length of XDR encoded message */
};
static inline void
xdr_buf_init(struct xdr_buf *buf, void *start, size_t len)
{
buf->head[0].iov_base = start;
buf->head[0].iov_len = len;
buf->tail[0].iov_len = 0;
buf->page_len = 0;
buf->flags = 0;
buf->len = 0;
buf->buflen = len;
}
/*
* pre-xdr'ed macros.
*/

View File

@ -83,9 +83,11 @@ struct rpc_rqst {
void (*rq_release_snd_buf)(struct rpc_rqst *); /* release rq_enc_pages */
struct list_head rq_list;
__u32 * rq_buffer; /* XDR encode buffer */
size_t rq_callsize,
rq_rcvsize;
void *rq_xprtdata; /* Per-xprt private data */
void *rq_buffer; /* Call XDR encode buffer */
size_t rq_callsize;
void *rq_rbuffer; /* Reply XDR decode buffer */
size_t rq_rcvsize;
size_t rq_xmit_bytes_sent; /* total bytes sent */
size_t rq_reply_bytes_recvd; /* total reply bytes */
/* received */
@ -127,8 +129,8 @@ struct rpc_xprt_ops {
void (*rpcbind)(struct rpc_task *task);
void (*set_port)(struct rpc_xprt *xprt, unsigned short port);
void (*connect)(struct rpc_xprt *xprt, struct rpc_task *task);
void * (*buf_alloc)(struct rpc_task *task, size_t size);
void (*buf_free)(void *buffer);
int (*buf_alloc)(struct rpc_task *task);
void (*buf_free)(struct rpc_task *task);
int (*send_request)(struct rpc_task *task);
void (*set_retrans_timeout)(struct rpc_task *task);
void (*timer)(struct rpc_xprt *xprt, struct rpc_task *task);

View File

@ -66,4 +66,6 @@ extern struct rpc_xprt *xprt_iter_xprt(struct rpc_xprt_iter *xpi);
extern struct rpc_xprt *xprt_iter_get_xprt(struct rpc_xprt_iter *xpi);
extern struct rpc_xprt *xprt_iter_get_next(struct rpc_xprt_iter *xpi);
extern bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps,
const struct sockaddr *sap);
#endif

View File

@ -53,8 +53,8 @@
#define RPCRDMA_MAX_SLOT_TABLE (256U)
#define RPCRDMA_MIN_INLINE (1024) /* min inline thresh */
#define RPCRDMA_DEF_INLINE (1024) /* default inline thresh */
#define RPCRDMA_MAX_INLINE (3068) /* max inline thresh */
#define RPCRDMA_DEF_INLINE (4096) /* default inline thresh */
#define RPCRDMA_MAX_INLINE (65536) /* max inline thresh */
/* Memory registration strategies, by number.
* This is part of a kernel / user space API. Do not remove. */

View File

@ -551,7 +551,7 @@ rpcauth_lookup_credcache(struct rpc_auth *auth, struct auth_cred * acred,
*entry, *new;
unsigned int nr;
nr = hash_long(from_kuid(&init_user_ns, acred->uid), cache->hashbits);
nr = auth->au_ops->hash_cred(acred, cache->hashbits);
rcu_read_lock();
hlist_for_each_entry_rcu(entry, &cache->hashtable[nr], cr_hash) {

View File

@ -78,6 +78,14 @@ static struct rpc_cred *generic_bind_cred(struct rpc_task *task,
return auth->au_ops->lookup_cred(auth, acred, lookupflags);
}
static int
generic_hash_cred(struct auth_cred *acred, unsigned int hashbits)
{
return hash_64(from_kgid(&init_user_ns, acred->gid) |
((u64)from_kuid(&init_user_ns, acred->uid) <<
(sizeof(gid_t) * 8)), hashbits);
}
/*
* Lookup generic creds for current process
*/
@ -258,6 +266,7 @@ generic_key_timeout(struct rpc_auth *auth, struct rpc_cred *cred)
static const struct rpc_authops generic_auth_ops = {
.owner = THIS_MODULE,
.au_name = "Generic",
.hash_cred = generic_hash_cred,
.lookup_cred = generic_lookup_cred,
.crcreate = generic_create_cred,
.key_timeout = generic_key_timeout,

View File

@ -1298,6 +1298,12 @@ gss_destroy_cred(struct rpc_cred *cred)
gss_destroy_nullcred(cred);
}
static int
gss_hash_cred(struct auth_cred *acred, unsigned int hashbits)
{
return hash_64(from_kuid(&init_user_ns, acred->uid), hashbits);
}
/*
* Lookup RPCSEC_GSS cred for the current process
*/
@ -1982,6 +1988,7 @@ static const struct rpc_authops authgss_ops = {
.au_name = "RPCSEC_GSS",
.create = gss_create,
.destroy = gss_destroy,
.hash_cred = gss_hash_cred,
.lookup_cred = gss_lookup_cred,
.crcreate = gss_create_cred,
.list_pseudoflavors = gss_mech_list_pseudoflavors,

View File

@ -46,6 +46,14 @@ unx_destroy(struct rpc_auth *auth)
rpcauth_clear_credcache(auth->au_credcache);
}
static int
unx_hash_cred(struct auth_cred *acred, unsigned int hashbits)
{
return hash_64(from_kgid(&init_user_ns, acred->gid) |
((u64)from_kuid(&init_user_ns, acred->uid) <<
(sizeof(gid_t) * 8)), hashbits);
}
/*
* Lookup AUTH_UNIX creds for current process
*/
@ -220,6 +228,7 @@ const struct rpc_authops authunix_ops = {
.au_name = "UNIX",
.create = unx_create,
.destroy = unx_destroy,
.hash_cred = unx_hash_cred,
.lookup_cred = unx_lookup_cred,
.crcreate = unx_create_cred,
};

View File

@ -76,13 +76,7 @@ static int xprt_alloc_xdr_buf(struct xdr_buf *buf, gfp_t gfp_flags)
page = alloc_page(gfp_flags);
if (page == NULL)
return -ENOMEM;
buf->head[0].iov_base = page_address(page);
buf->head[0].iov_len = PAGE_SIZE;
buf->tail[0].iov_base = NULL;
buf->tail[0].iov_len = 0;
buf->page_len = 0;
buf->len = 0;
buf->buflen = PAGE_SIZE;
xdr_buf_init(buf, page_address(page), PAGE_SIZE);
return 0;
}

View File

@ -353,7 +353,7 @@ void sunrpc_init_cache_detail(struct cache_detail *cd)
spin_unlock(&cache_list_lock);
/* start the cleaning process */
schedule_delayed_work(&cache_cleaner, 0);
queue_delayed_work(system_power_efficient_wq, &cache_cleaner, 0);
}
EXPORT_SYMBOL_GPL(sunrpc_init_cache_detail);
@ -476,7 +476,8 @@ static void do_cache_clean(struct work_struct *work)
delay = 0;
if (delay)
schedule_delayed_work(&cache_cleaner, delay);
queue_delayed_work(system_power_efficient_wq,
&cache_cleaner, delay);
}

View File

@ -184,7 +184,6 @@ static int __rpc_clnt_handle_event(struct rpc_clnt *clnt, unsigned long event,
struct super_block *sb)
{
struct dentry *dentry;
int err = 0;
switch (event) {
case RPC_PIPEFS_MOUNT:
@ -201,7 +200,7 @@ static int __rpc_clnt_handle_event(struct rpc_clnt *clnt, unsigned long event,
printk(KERN_ERR "%s: unknown event: %ld\n", __func__, event);
return -ENOTSUPP;
}
return err;
return 0;
}
static int __rpc_pipefs_event(struct rpc_clnt *clnt, unsigned long event,
@ -988,7 +987,6 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
{
if (clnt != NULL) {
rpc_task_release_client(task);
if (task->tk_xprt == NULL)
task->tk_xprt = xprt_iter_get_next(&clnt->cl_xpi);
task->tk_client = clnt;
@ -1693,6 +1691,7 @@ call_allocate(struct rpc_task *task)
struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = req->rq_xprt;
struct rpc_procinfo *proc = task->tk_msg.rpc_proc;
int status;
dprint_status(task);
@ -1718,11 +1717,14 @@ call_allocate(struct rpc_task *task)
req->rq_rcvsize = RPC_REPHDRSIZE + slack + proc->p_replen;
req->rq_rcvsize <<= 2;
req->rq_buffer = xprt->ops->buf_alloc(task,
req->rq_callsize + req->rq_rcvsize);
if (req->rq_buffer != NULL)
return;
status = xprt->ops->buf_alloc(task);
xprt_inject_disconnect(xprt);
if (status == 0)
return;
if (status != -ENOMEM) {
rpc_exit(task, status);
return;
}
dprintk("RPC: %5u rpc_buffer allocation failed\n", task->tk_pid);
@ -1748,18 +1750,6 @@ rpc_task_force_reencode(struct rpc_task *task)
task->tk_rqstp->rq_bytes_sent = 0;
}
static inline void
rpc_xdr_buf_init(struct xdr_buf *buf, void *start, size_t len)
{
buf->head[0].iov_base = start;
buf->head[0].iov_len = len;
buf->tail[0].iov_len = 0;
buf->page_len = 0;
buf->flags = 0;
buf->len = 0;
buf->buflen = len;
}
/*
* 3. Encode arguments of an RPC call
*/
@ -1772,12 +1762,12 @@ rpc_xdr_encode(struct rpc_task *task)
dprint_status(task);
rpc_xdr_buf_init(&req->rq_snd_buf,
req->rq_buffer,
req->rq_callsize);
rpc_xdr_buf_init(&req->rq_rcv_buf,
(char *)req->rq_buffer + req->rq_callsize,
req->rq_rcvsize);
xdr_buf_init(&req->rq_snd_buf,
req->rq_buffer,
req->rq_callsize);
xdr_buf_init(&req->rq_rcv_buf,
req->rq_rbuffer,
req->rq_rcvsize);
p = rpc_encode_header(task);
if (p == NULL) {
@ -2615,6 +2605,70 @@ int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
}
EXPORT_SYMBOL_GPL(rpc_clnt_test_and_add_xprt);
/**
* rpc_clnt_setup_test_and_add_xprt()
*
* This is an rpc_clnt_add_xprt setup() function which returns 1 so:
* 1) caller of the test function must dereference the rpc_xprt_switch
* and the rpc_xprt.
* 2) test function must call rpc_xprt_switch_add_xprt, usually in
* the rpc_call_done routine.
*
* Upon success (return of 1), the test function adds the new
* transport to the rpc_clnt xprt switch
*
* @clnt: struct rpc_clnt to get the new transport
* @xps: the rpc_xprt_switch to hold the new transport
* @xprt: the rpc_xprt to test
* @data: a struct rpc_add_xprt_test pointer that holds the test function
* and test function call data
*/
int rpc_clnt_setup_test_and_add_xprt(struct rpc_clnt *clnt,
struct rpc_xprt_switch *xps,
struct rpc_xprt *xprt,
void *data)
{
struct rpc_cred *cred;
struct rpc_task *task;
struct rpc_add_xprt_test *xtest = (struct rpc_add_xprt_test *)data;
int status = -EADDRINUSE;
xprt = xprt_get(xprt);
xprt_switch_get(xps);
if (rpc_xprt_switch_has_addr(xps, (struct sockaddr *)&xprt->addr))
goto out_err;
/* Test the connection */
cred = authnull_ops.lookup_cred(NULL, NULL, 0);
task = rpc_call_null_helper(clnt, xprt, cred,
RPC_TASK_SOFT | RPC_TASK_SOFTCONN,
NULL, NULL);
put_rpccred(cred);
if (IS_ERR(task)) {
status = PTR_ERR(task);
goto out_err;
}
status = task->tk_status;
rpc_put_task(task);
if (status < 0)
goto out_err;
/* rpc_xprt_switch and rpc_xprt are deferrenced by add_xprt_test() */
xtest->add_xprt_test(clnt, xprt, xtest->data);
/* so that rpc_clnt_add_xprt does not call rpc_xprt_switch_add_xprt */
return 1;
out_err:
xprt_put(xprt);
xprt_switch_put(xps);
pr_info("RPC: rpc_clnt_test_xprt failed: %d addr %s not added\n",
status, xprt->address_strings[RPC_DISPLAY_ADDR]);
return status;
}
EXPORT_SYMBOL_GPL(rpc_clnt_setup_test_and_add_xprt);
/**
* rpc_clnt_add_xprt - Add a new transport to a rpc_clnt
* @clnt: pointer to struct rpc_clnt
@ -2697,6 +2751,34 @@ rpc_cap_max_reconnect_timeout(struct rpc_clnt *clnt, unsigned long timeo)
}
EXPORT_SYMBOL_GPL(rpc_cap_max_reconnect_timeout);
void rpc_clnt_xprt_switch_put(struct rpc_clnt *clnt)
{
xprt_switch_put(rcu_dereference(clnt->cl_xpi.xpi_xpswitch));
}
EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_put);
void rpc_clnt_xprt_switch_add_xprt(struct rpc_clnt *clnt, struct rpc_xprt *xprt)
{
rpc_xprt_switch_add_xprt(rcu_dereference(clnt->cl_xpi.xpi_xpswitch),
xprt);
}
EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_add_xprt);
bool rpc_clnt_xprt_switch_has_addr(struct rpc_clnt *clnt,
const struct sockaddr *sap)
{
struct rpc_xprt_switch *xps;
bool ret;
xps = rcu_dereference(clnt->cl_xpi.xpi_xpswitch);
rcu_read_lock();
ret = rpc_xprt_switch_has_addr(xps, sap);
rcu_read_unlock();
return ret;
}
EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_has_addr);
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
static void rpc_show_header(void)
{

View File

@ -849,14 +849,17 @@ static void rpc_async_schedule(struct work_struct *work)
}
/**
* rpc_malloc - allocate an RPC buffer
* @task: RPC task that will use this buffer
* @size: requested byte size
* rpc_malloc - allocate RPC buffer resources
* @task: RPC task
*
* A single memory region is allocated, which is split between the
* RPC call and RPC reply that this task is being used for. When
* this RPC is retired, the memory is released by calling rpc_free.
*
* To prevent rpciod from hanging, this allocator never sleeps,
* returning NULL and suppressing warning if the request cannot be serviced
* immediately.
* The caller can arrange to sleep in a way that is safe for rpciod.
* returning -ENOMEM and suppressing warning if the request cannot
* be serviced immediately. The caller can arrange to sleep in a
* way that is safe for rpciod.
*
* Most requests are 'small' (under 2KiB) and can be serviced from a
* mempool, ensuring that NFS reads and writes can always proceed,
@ -865,8 +868,10 @@ static void rpc_async_schedule(struct work_struct *work)
* In order to avoid memory starvation triggering more writebacks of
* NFS requests, we avoid using GFP_KERNEL.
*/
void *rpc_malloc(struct rpc_task *task, size_t size)
int rpc_malloc(struct rpc_task *task)
{
struct rpc_rqst *rqst = task->tk_rqstp;
size_t size = rqst->rq_callsize + rqst->rq_rcvsize;
struct rpc_buffer *buf;
gfp_t gfp = GFP_NOIO | __GFP_NOWARN;
@ -880,28 +885,28 @@ void *rpc_malloc(struct rpc_task *task, size_t size)
buf = kmalloc(size, gfp);
if (!buf)
return NULL;
return -ENOMEM;
buf->len = size;
dprintk("RPC: %5u allocated buffer of size %zu at %p\n",
task->tk_pid, size, buf);
return &buf->data;
rqst->rq_buffer = buf->data;
rqst->rq_rbuffer = (char *)rqst->rq_buffer + rqst->rq_callsize;
return 0;
}
EXPORT_SYMBOL_GPL(rpc_malloc);
/**
* rpc_free - free buffer allocated via rpc_malloc
* @buffer: buffer to free
* rpc_free - free RPC buffer resources allocated via rpc_malloc
* @task: RPC task
*
*/
void rpc_free(void *buffer)
void rpc_free(struct rpc_task *task)
{
void *buffer = task->tk_rqstp->rq_buffer;
size_t size;
struct rpc_buffer *buf;
if (!buffer)
return;
buf = container_of(buffer, struct rpc_buffer, data);
size = buf->len;

View File

@ -401,6 +401,21 @@ int svc_bind(struct svc_serv *serv, struct net *net)
}
EXPORT_SYMBOL_GPL(svc_bind);
#if defined(CONFIG_SUNRPC_BACKCHANNEL)
static void
__svc_init_bc(struct svc_serv *serv)
{
INIT_LIST_HEAD(&serv->sv_cb_list);
spin_lock_init(&serv->sv_cb_lock);
init_waitqueue_head(&serv->sv_cb_waitq);
}
#else
static void
__svc_init_bc(struct svc_serv *serv)
{
}
#endif
/*
* Create an RPC service
*/
@ -443,6 +458,8 @@ __svc_create(struct svc_program *prog, unsigned int bufsize, int npools,
init_timer(&serv->sv_temptimer);
spin_lock_init(&serv->sv_lock);
__svc_init_bc(serv);
serv->sv_nrpools = npools;
serv->sv_pools =
kcalloc(serv->sv_nrpools, sizeof(struct svc_pool),

View File

@ -767,7 +767,7 @@ static void xdr_set_next_page(struct xdr_stream *xdr)
newbase -= xdr->buf->page_base;
if (xdr_set_page_base(xdr, newbase, PAGE_SIZE) < 0)
xdr_set_iov(xdr, xdr->buf->tail, xdr->buf->len);
xdr_set_iov(xdr, xdr->buf->tail, xdr->nwords << 2);
}
static bool xdr_set_next_buffer(struct xdr_stream *xdr)
@ -776,7 +776,7 @@ static bool xdr_set_next_buffer(struct xdr_stream *xdr)
xdr_set_next_page(xdr);
else if (xdr->iov == xdr->buf->head) {
if (xdr_set_page_base(xdr, 0, PAGE_SIZE) < 0)
xdr_set_iov(xdr, xdr->buf->tail, xdr->buf->len);
xdr_set_iov(xdr, xdr->buf->tail, xdr->nwords << 2);
}
return xdr->p != xdr->end;
}
@ -859,12 +859,15 @@ EXPORT_SYMBOL_GPL(xdr_set_scratch_buffer);
static __be32 *xdr_copy_to_scratch(struct xdr_stream *xdr, size_t nbytes)
{
__be32 *p;
void *cpdest = xdr->scratch.iov_base;
char *cpdest = xdr->scratch.iov_base;
size_t cplen = (char *)xdr->end - (char *)xdr->p;
if (nbytes > xdr->scratch.iov_len)
return NULL;
memcpy(cpdest, xdr->p, cplen);
p = __xdr_inline_decode(xdr, cplen);
if (p == NULL)
return NULL;
memcpy(cpdest, p, cplen);
cpdest += cplen;
nbytes -= cplen;
if (!xdr_set_next_buffer(xdr))

View File

@ -1295,7 +1295,7 @@ void xprt_release(struct rpc_task *task)
xprt_schedule_autodisconnect(xprt);
spin_unlock_bh(&xprt->transport_lock);
if (req->rq_buffer)
xprt->ops->buf_free(req->rq_buffer);
xprt->ops->buf_free(task);
xprt_inject_disconnect(xprt);
if (req->rq_cred != NULL)
put_rpccred(req->rq_cred);

View File

@ -15,6 +15,7 @@
#include <asm/cmpxchg.h>
#include <linux/spinlock.h>
#include <linux/sunrpc/xprt.h>
#include <linux/sunrpc/addr.h>
#include <linux/sunrpc/xprtmultipath.h>
typedef struct rpc_xprt *(*xprt_switch_find_xprt_t)(struct list_head *head,
@ -49,7 +50,8 @@ void rpc_xprt_switch_add_xprt(struct rpc_xprt_switch *xps,
if (xprt == NULL)
return;
spin_lock(&xps->xps_lock);
if (xps->xps_net == xprt->xprt_net || xps->xps_net == NULL)
if ((xps->xps_net == xprt->xprt_net || xps->xps_net == NULL) &&
!rpc_xprt_switch_has_addr(xps, (struct sockaddr *)&xprt->addr))
xprt_switch_add_xprt_locked(xps, xprt);
spin_unlock(&xps->xps_lock);
}
@ -232,6 +234,26 @@ struct rpc_xprt *xprt_iter_current_entry(struct rpc_xprt_iter *xpi)
return xprt_switch_find_current_entry(head, xpi->xpi_cursor);
}
bool rpc_xprt_switch_has_addr(struct rpc_xprt_switch *xps,
const struct sockaddr *sap)
{
struct list_head *head;
struct rpc_xprt *pos;
if (xps == NULL || sap == NULL)
return false;
head = &xps->xps_xprt_list;
list_for_each_entry_rcu(pos, head, xprt_switch) {
if (rpc_cmp_addr_port(sap, (struct sockaddr *)&pos->addr)) {
pr_info("RPC: addr %s already in xprt switch\n",
pos->address_strings[RPC_DISPLAY_ADDR]);
return true;
}
}
return false;
}
static
struct rpc_xprt *xprt_switch_find_next_entry(struct list_head *head,
const struct rpc_xprt *cur)

View File

@ -27,7 +27,7 @@ static void rpcrdma_bc_free_rqst(struct rpcrdma_xprt *r_xprt,
list_del(&req->rl_all);
spin_unlock(&buf->rb_reqslock);
rpcrdma_destroy_req(&r_xprt->rx_ia, req);
rpcrdma_destroy_req(req);
kfree(rqst);
}
@ -35,10 +35,8 @@ static void rpcrdma_bc_free_rqst(struct rpcrdma_xprt *r_xprt,
static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
struct rpc_rqst *rqst)
{
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_regbuf *rb;
struct rpcrdma_req *req;
struct xdr_buf *buf;
size_t size;
req = rpcrdma_create_req(r_xprt);
@ -46,30 +44,19 @@ static int rpcrdma_bc_setup_rqst(struct rpcrdma_xprt *r_xprt,
return PTR_ERR(req);
req->rl_backchannel = true;
size = RPCRDMA_INLINE_WRITE_THRESHOLD(rqst);
rb = rpcrdma_alloc_regbuf(ia, size, GFP_KERNEL);
rb = rpcrdma_alloc_regbuf(RPCRDMA_HDRBUF_SIZE,
DMA_TO_DEVICE, GFP_KERNEL);
if (IS_ERR(rb))
goto out_fail;
req->rl_rdmabuf = rb;
size += RPCRDMA_INLINE_READ_THRESHOLD(rqst);
rb = rpcrdma_alloc_regbuf(ia, size, GFP_KERNEL);
size = r_xprt->rx_data.inline_rsize;
rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, GFP_KERNEL);
if (IS_ERR(rb))
goto out_fail;
rb->rg_owner = req;
req->rl_sendbuf = rb;
/* so that rpcr_to_rdmar works when receiving a request */
rqst->rq_buffer = (void *)req->rl_sendbuf->rg_base;
buf = &rqst->rq_snd_buf;
buf->head[0].iov_base = rqst->rq_buffer;
buf->head[0].iov_len = 0;
buf->tail[0].iov_base = NULL;
buf->tail[0].iov_len = 0;
buf->page_len = 0;
buf->len = 0;
buf->buflen = size;
xdr_buf_init(&rqst->rq_snd_buf, rb->rg_base, size);
rpcrdma_set_xprtdata(rqst, req);
return 0;
out_fail:
@ -219,7 +206,6 @@ int rpcrdma_bc_marshal_reply(struct rpc_rqst *rqst)
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
struct rpcrdma_msg *headerp;
size_t rpclen;
headerp = rdmab_to_msg(req->rl_rdmabuf);
headerp->rm_xid = rqst->rq_xid;
@ -231,26 +217,9 @@ int rpcrdma_bc_marshal_reply(struct rpc_rqst *rqst)
headerp->rm_body.rm_chunks[1] = xdr_zero;
headerp->rm_body.rm_chunks[2] = xdr_zero;
rpclen = rqst->rq_svec[0].iov_len;
#ifdef RPCRDMA_BACKCHANNEL_DEBUG
pr_info("RPC: %s: rpclen %zd headerp 0x%p lkey 0x%x\n",
__func__, rpclen, headerp, rdmab_lkey(req->rl_rdmabuf));
pr_info("RPC: %s: RPC/RDMA: %*ph\n",
__func__, (int)RPCRDMA_HDRLEN_MIN, headerp);
pr_info("RPC: %s: RPC: %*ph\n",
__func__, (int)rpclen, rqst->rq_svec[0].iov_base);
#endif
req->rl_send_iov[0].addr = rdmab_addr(req->rl_rdmabuf);
req->rl_send_iov[0].length = RPCRDMA_HDRLEN_MIN;
req->rl_send_iov[0].lkey = rdmab_lkey(req->rl_rdmabuf);
req->rl_send_iov[1].addr = rdmab_addr(req->rl_sendbuf);
req->rl_send_iov[1].length = rpclen;
req->rl_send_iov[1].lkey = rdmab_lkey(req->rl_sendbuf);
req->rl_niovs = 2;
if (!rpcrdma_prepare_send_sges(&r_xprt->rx_ia, req, RPCRDMA_HDRLEN_MIN,
&rqst->rq_snd_buf, rpcrdma_noch))
return -EIO;
return 0;
}
@ -402,7 +371,7 @@ out_overflow:
out_short:
pr_warn("RPC/RDMA short backward direction call\n");
if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, rep))
xprt_disconnect_done(xprt);
else
pr_warn("RPC: %s: reposting rep %p\n",

View File

@ -160,9 +160,8 @@ static int
fmr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
struct rpcrdma_create_data_internal *cdata)
{
rpcrdma_set_max_header_sizes(ia, cdata, max_t(unsigned int, 1,
RPCRDMA_MAX_DATA_SEGS /
RPCRDMA_MAX_FMR_SGES));
ia->ri_max_segs = max_t(unsigned int, 1, RPCRDMA_MAX_DATA_SEGS /
RPCRDMA_MAX_FMR_SGES);
return 0;
}
@ -274,6 +273,7 @@ fmr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
*/
list_for_each_entry(mw, &req->rl_registered, mw_list)
list_add_tail(&mw->fmr.fm_mr->list, &unmap_list);
r_xprt->rx_stats.local_inv_needed++;
rc = ib_unmap_fmr(&unmap_list);
if (rc)
goto out_reset;
@ -331,4 +331,5 @@ const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops = {
.ro_init_mr = fmr_op_init_mr,
.ro_release_mr = fmr_op_release_mr,
.ro_displayname = "fmr",
.ro_send_w_inv_ok = 0,
};

View File

@ -67,6 +67,8 @@
* pending send queue WRs before the transport is reconnected.
*/
#include <linux/sunrpc/rpc_rdma.h>
#include "xprt_rdma.h"
#if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
@ -161,7 +163,7 @@ __frwr_reset_mr(struct rpcrdma_ia *ia, struct rpcrdma_mw *r)
return PTR_ERR(f->fr_mr);
}
dprintk("RPC: %s: recovered FRMR %p\n", __func__, r);
dprintk("RPC: %s: recovered FRMR %p\n", __func__, f);
f->fr_state = FRMR_IS_INVALID;
return 0;
}
@ -242,9 +244,8 @@ frwr_op_open(struct rpcrdma_ia *ia, struct rpcrdma_ep *ep,
depth;
}
rpcrdma_set_max_header_sizes(ia, cdata, max_t(unsigned int, 1,
RPCRDMA_MAX_DATA_SEGS /
ia->ri_max_frmr_depth));
ia->ri_max_segs = max_t(unsigned int, 1, RPCRDMA_MAX_DATA_SEGS /
ia->ri_max_frmr_depth);
return 0;
}
@ -329,7 +330,7 @@ frwr_wc_localinv_wake(struct ib_cq *cq, struct ib_wc *wc)
frmr = container_of(cqe, struct rpcrdma_frmr, fr_cqe);
if (wc->status != IB_WC_SUCCESS)
__frwr_sendcompletion_flush(wc, frmr, "localinv");
complete_all(&frmr->fr_linv_done);
complete(&frmr->fr_linv_done);
}
/* Post a REG_MR Work Request to register a memory region
@ -396,7 +397,7 @@ frwr_op_map(struct rpcrdma_xprt *r_xprt, struct rpcrdma_mr_seg *seg,
goto out_mapmr_err;
dprintk("RPC: %s: Using frmr %p to map %u segments (%u bytes)\n",
__func__, mw, mw->mw_nents, mr->length);
__func__, frmr, mw->mw_nents, mr->length);
key = (u8)(mr->rkey & 0x000000FF);
ib_update_fast_reg_key(mr, ++key);
@ -449,6 +450,8 @@ __frwr_prepare_linv_wr(struct rpcrdma_mw *mw)
struct rpcrdma_frmr *f = &mw->frmr;
struct ib_send_wr *invalidate_wr;
dprintk("RPC: %s: invalidating frmr %p\n", __func__, f);
f->fr_state = FRMR_IS_INVALID;
invalidate_wr = &f->fr_invwr;
@ -472,6 +475,7 @@ static void
frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
{
struct ib_send_wr *invalidate_wrs, *pos, *prev, *bad_wr;
struct rpcrdma_rep *rep = req->rl_reply;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_mw *mw, *tmp;
struct rpcrdma_frmr *f;
@ -487,6 +491,12 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
f = NULL;
invalidate_wrs = pos = prev = NULL;
list_for_each_entry(mw, &req->rl_registered, mw_list) {
if ((rep->rr_wc_flags & IB_WC_WITH_INVALIDATE) &&
(mw->mw_handle == rep->rr_inv_rkey)) {
mw->frmr.fr_state = FRMR_IS_INVALID;
continue;
}
pos = __frwr_prepare_linv_wr(mw);
if (!invalidate_wrs)
@ -496,6 +506,8 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
prev = pos;
f = &mw->frmr;
}
if (!f)
goto unmap;
/* Strong send queue ordering guarantees that when the
* last WR in the chain completes, all WRs in the chain
@ -510,6 +522,7 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
* replaces the QP. The RPC reply handler won't call us
* unless ri_id->qp is a valid pointer.
*/
r_xprt->rx_stats.local_inv_needed++;
rc = ib_post_send(ia->ri_id->qp, invalidate_wrs, &bad_wr);
if (rc)
goto reset_mrs;
@ -521,6 +534,8 @@ frwr_op_unmap_sync(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req)
*/
unmap:
list_for_each_entry_safe(mw, tmp, &req->rl_registered, mw_list) {
dprintk("RPC: %s: unmapping frmr %p\n",
__func__, &mw->frmr);
list_del_init(&mw->mw_list);
ib_dma_unmap_sg(ia->ri_device,
mw->mw_sg, mw->mw_nents, mw->mw_dir);
@ -576,4 +591,5 @@ const struct rpcrdma_memreg_ops rpcrdma_frwr_memreg_ops = {
.ro_init_mr = frwr_op_init_mr,
.ro_release_mr = frwr_op_release_mr,
.ro_displayname = "frwr",
.ro_send_w_inv_ok = RPCRDMA_CMP_F_SND_W_INV_OK,
};

View File

@ -53,14 +53,6 @@
# define RPCDBG_FACILITY RPCDBG_TRANS
#endif
enum rpcrdma_chunktype {
rpcrdma_noch = 0,
rpcrdma_readch,
rpcrdma_areadch,
rpcrdma_writech,
rpcrdma_replych
};
static const char transfertypes[][12] = {
"inline", /* no chunks */
"read list", /* some argument via rdma read */
@ -118,10 +110,12 @@ static unsigned int rpcrdma_max_reply_header_size(unsigned int maxsegs)
return size;
}
void rpcrdma_set_max_header_sizes(struct rpcrdma_ia *ia,
struct rpcrdma_create_data_internal *cdata,
unsigned int maxsegs)
void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *r_xprt)
{
struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
unsigned int maxsegs = ia->ri_max_segs;
ia->ri_max_inline_write = cdata->inline_wsize -
rpcrdma_max_call_header_size(maxsegs);
ia->ri_max_inline_read = cdata->inline_rsize -
@ -155,42 +149,6 @@ static bool rpcrdma_results_inline(struct rpcrdma_xprt *r_xprt,
return rqst->rq_rcv_buf.buflen <= ia->ri_max_inline_read;
}
static int
rpcrdma_tail_pullup(struct xdr_buf *buf)
{
size_t tlen = buf->tail[0].iov_len;
size_t skip = tlen & 3;
/* Do not include the tail if it is only an XDR pad */
if (tlen < 4)
return 0;
/* xdr_write_pages() adds a pad at the beginning of the tail
* if the content in "buf->pages" is unaligned. Force the
* tail's actual content to land at the next XDR position
* after the head instead.
*/
if (skip) {
unsigned char *src, *dst;
unsigned int count;
src = buf->tail[0].iov_base;
dst = buf->head[0].iov_base;
dst += buf->head[0].iov_len;
src += skip;
tlen -= skip;
dprintk("RPC: %s: skip=%zu, memmove(%p, %p, %zu)\n",
__func__, skip, dst, src, tlen);
for (count = tlen; count; count--)
*dst++ = *src++;
}
return tlen;
}
/* Split "vec" on page boundaries into segments. FMR registers pages,
* not a byte range. Other modes coalesce these segments into a single
* MR when they can.
@ -229,7 +187,8 @@ rpcrdma_convert_kvec(struct kvec *vec, struct rpcrdma_mr_seg *seg, int n)
static int
rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg)
enum rpcrdma_chunktype type, struct rpcrdma_mr_seg *seg,
bool reminv_expected)
{
int len, n, p, page_base;
struct page **ppages;
@ -271,6 +230,13 @@ rpcrdma_convert_iovs(struct xdr_buf *xdrbuf, unsigned int pos,
if (type == rpcrdma_readch)
return n;
/* When encoding the Write list, some servers need to see an extra
* segment for odd-length Write chunks. The upper layer provides
* space in the tail iovec for this purpose.
*/
if (type == rpcrdma_writech && reminv_expected)
return n;
if (xdrbuf->tail[0].iov_len) {
/* the rpcrdma protocol allows us to omit any trailing
* xdr pad bytes, saving the server an RDMA operation. */
@ -327,7 +293,7 @@ rpcrdma_encode_read_list(struct rpcrdma_xprt *r_xprt,
if (rtype == rpcrdma_areadch)
pos = 0;
seg = req->rl_segments;
nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg);
nsegs = rpcrdma_convert_iovs(&rqst->rq_snd_buf, pos, rtype, seg, false);
if (nsegs < 0)
return ERR_PTR(nsegs);
@ -391,7 +357,8 @@ rpcrdma_encode_write_list(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
seg = req->rl_segments;
nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf,
rqst->rq_rcv_buf.head[0].iov_len,
wtype, seg);
wtype, seg,
r_xprt->rx_ia.ri_reminv_expected);
if (nsegs < 0)
return ERR_PTR(nsegs);
@ -456,7 +423,8 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
}
seg = req->rl_segments;
nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg);
nsegs = rpcrdma_convert_iovs(&rqst->rq_rcv_buf, 0, wtype, seg,
r_xprt->rx_ia.ri_reminv_expected);
if (nsegs < 0)
return ERR_PTR(nsegs);
@ -491,74 +459,184 @@ rpcrdma_encode_reply_chunk(struct rpcrdma_xprt *r_xprt,
return iptr;
}
/*
* Copy write data inline.
* This function is used for "small" requests. Data which is passed
* to RPC via iovecs (or page list) is copied directly into the
* pre-registered memory buffer for this request. For small amounts
* of data, this is efficient. The cutoff value is tunable.
/* Prepare the RPC-over-RDMA header SGE.
*/
static void rpcrdma_inline_pullup(struct rpc_rqst *rqst)
static bool
rpcrdma_prepare_hdr_sge(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
u32 len)
{
int i, npages, curlen;
int copy_len;
unsigned char *srcp, *destp;
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(rqst->rq_xprt);
int page_base;
struct page **ppages;
struct rpcrdma_regbuf *rb = req->rl_rdmabuf;
struct ib_sge *sge = &req->rl_send_sge[0];
destp = rqst->rq_svec[0].iov_base;
curlen = rqst->rq_svec[0].iov_len;
destp += curlen;
if (unlikely(!rpcrdma_regbuf_is_mapped(rb))) {
if (!__rpcrdma_dma_map_regbuf(ia, rb))
return false;
sge->addr = rdmab_addr(rb);
sge->lkey = rdmab_lkey(rb);
}
sge->length = len;
dprintk("RPC: %s: destp 0x%p len %d hdrlen %d\n",
__func__, destp, rqst->rq_slen, curlen);
ib_dma_sync_single_for_device(ia->ri_device, sge->addr,
sge->length, DMA_TO_DEVICE);
req->rl_send_wr.num_sge++;
return true;
}
copy_len = rqst->rq_snd_buf.page_len;
/* Prepare the Send SGEs. The head and tail iovec, and each entry
* in the page list, gets its own SGE.
*/
static bool
rpcrdma_prepare_msg_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
struct xdr_buf *xdr, enum rpcrdma_chunktype rtype)
{
unsigned int sge_no, page_base, len, remaining;
struct rpcrdma_regbuf *rb = req->rl_sendbuf;
struct ib_device *device = ia->ri_device;
struct ib_sge *sge = req->rl_send_sge;
u32 lkey = ia->ri_pd->local_dma_lkey;
struct page *page, **ppages;
if (rqst->rq_snd_buf.tail[0].iov_len) {
curlen = rqst->rq_snd_buf.tail[0].iov_len;
if (destp + copy_len != rqst->rq_snd_buf.tail[0].iov_base) {
memmove(destp + copy_len,
rqst->rq_snd_buf.tail[0].iov_base, curlen);
r_xprt->rx_stats.pullup_copy_count += curlen;
/* The head iovec is straightforward, as it is already
* DMA-mapped. Sync the content that has changed.
*/
if (!rpcrdma_dma_map_regbuf(ia, rb))
return false;
sge_no = 1;
sge[sge_no].addr = rdmab_addr(rb);
sge[sge_no].length = xdr->head[0].iov_len;
sge[sge_no].lkey = rdmab_lkey(rb);
ib_dma_sync_single_for_device(device, sge[sge_no].addr,
sge[sge_no].length, DMA_TO_DEVICE);
/* If there is a Read chunk, the page list is being handled
* via explicit RDMA, and thus is skipped here. However, the
* tail iovec may include an XDR pad for the page list, as
* well as additional content, and may not reside in the
* same page as the head iovec.
*/
if (rtype == rpcrdma_readch) {
len = xdr->tail[0].iov_len;
/* Do not include the tail if it is only an XDR pad */
if (len < 4)
goto out;
page = virt_to_page(xdr->tail[0].iov_base);
page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
/* If the content in the page list is an odd length,
* xdr_write_pages() has added a pad at the beginning
* of the tail iovec. Force the tail's non-pad content
* to land at the next XDR position in the Send message.
*/
page_base += len & 3;
len -= len & 3;
goto map_tail;
}
/* If there is a page list present, temporarily DMA map
* and prepare an SGE for each page to be sent.
*/
if (xdr->page_len) {
ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
page_base = xdr->page_base & ~PAGE_MASK;
remaining = xdr->page_len;
while (remaining) {
sge_no++;
if (sge_no > RPCRDMA_MAX_SEND_SGES - 2)
goto out_mapping_overflow;
len = min_t(u32, PAGE_SIZE - page_base, remaining);
sge[sge_no].addr = ib_dma_map_page(device, *ppages,
page_base, len,
DMA_TO_DEVICE);
if (ib_dma_mapping_error(device, sge[sge_no].addr))
goto out_mapping_err;
sge[sge_no].length = len;
sge[sge_no].lkey = lkey;
req->rl_mapped_sges++;
ppages++;
remaining -= len;
page_base = 0;
}
dprintk("RPC: %s: tail destp 0x%p len %d\n",
__func__, destp + copy_len, curlen);
rqst->rq_svec[0].iov_len += curlen;
}
r_xprt->rx_stats.pullup_copy_count += copy_len;
page_base = rqst->rq_snd_buf.page_base;
ppages = rqst->rq_snd_buf.pages + (page_base >> PAGE_SHIFT);
page_base &= ~PAGE_MASK;
npages = PAGE_ALIGN(page_base+copy_len) >> PAGE_SHIFT;
for (i = 0; copy_len && i < npages; i++) {
curlen = PAGE_SIZE - page_base;
if (curlen > copy_len)
curlen = copy_len;
dprintk("RPC: %s: page %d destp 0x%p len %d curlen %d\n",
__func__, i, destp, copy_len, curlen);
srcp = kmap_atomic(ppages[i]);
memcpy(destp, srcp+page_base, curlen);
kunmap_atomic(srcp);
rqst->rq_svec[0].iov_len += curlen;
destp += curlen;
copy_len -= curlen;
page_base = 0;
/* The tail iovec is not always constructed in the same
* page where the head iovec resides (see, for example,
* gss_wrap_req_priv). To neatly accommodate that case,
* DMA map it separately.
*/
if (xdr->tail[0].iov_len) {
page = virt_to_page(xdr->tail[0].iov_base);
page_base = (unsigned long)xdr->tail[0].iov_base & ~PAGE_MASK;
len = xdr->tail[0].iov_len;
map_tail:
sge_no++;
sge[sge_no].addr = ib_dma_map_page(device, page,
page_base, len,
DMA_TO_DEVICE);
if (ib_dma_mapping_error(device, sge[sge_no].addr))
goto out_mapping_err;
sge[sge_no].length = len;
sge[sge_no].lkey = lkey;
req->rl_mapped_sges++;
}
/* header now contains entire send message */
out:
req->rl_send_wr.num_sge = sge_no + 1;
return true;
out_mapping_overflow:
pr_err("rpcrdma: too many Send SGEs (%u)\n", sge_no);
return false;
out_mapping_err:
pr_err("rpcrdma: Send mapping error\n");
return false;
}
bool
rpcrdma_prepare_send_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req,
u32 hdrlen, struct xdr_buf *xdr,
enum rpcrdma_chunktype rtype)
{
req->rl_send_wr.num_sge = 0;
req->rl_mapped_sges = 0;
if (!rpcrdma_prepare_hdr_sge(ia, req, hdrlen))
goto out_map;
if (rtype != rpcrdma_areadch)
if (!rpcrdma_prepare_msg_sges(ia, req, xdr, rtype))
goto out_map;
return true;
out_map:
pr_err("rpcrdma: failed to DMA map a Send buffer\n");
return false;
}
void
rpcrdma_unmap_sges(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
{
struct ib_device *device = ia->ri_device;
struct ib_sge *sge;
int count;
sge = &req->rl_send_sge[2];
for (count = req->rl_mapped_sges; count--; sge++)
ib_dma_unmap_page(device, sge->addr, sge->length,
DMA_TO_DEVICE);
req->rl_mapped_sges = 0;
}
/*
* Marshal a request: the primary job of this routine is to choose
* the transfer modes. See comments below.
*
* Prepares up to two IOVs per Call message:
*
* [0] -- RPC RDMA header
* [1] -- the RPC header/data
*
* Returns zero on success, otherwise a negative errno.
*/
@ -626,12 +704,11 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
*/
if (rpcrdma_args_inline(r_xprt, rqst)) {
rtype = rpcrdma_noch;
rpcrdma_inline_pullup(rqst);
rpclen = rqst->rq_svec[0].iov_len;
rpclen = rqst->rq_snd_buf.len;
} else if (ddp_allowed && rqst->rq_snd_buf.flags & XDRBUF_WRITE) {
rtype = rpcrdma_readch;
rpclen = rqst->rq_svec[0].iov_len;
rpclen += rpcrdma_tail_pullup(&rqst->rq_snd_buf);
rpclen = rqst->rq_snd_buf.head[0].iov_len +
rqst->rq_snd_buf.tail[0].iov_len;
} else {
r_xprt->rx_stats.nomsg_call_count++;
headerp->rm_type = htonl(RDMA_NOMSG);
@ -673,34 +750,18 @@ rpcrdma_marshal_req(struct rpc_rqst *rqst)
goto out_unmap;
hdrlen = (unsigned char *)iptr - (unsigned char *)headerp;
if (hdrlen + rpclen > RPCRDMA_INLINE_WRITE_THRESHOLD(rqst))
goto out_overflow;
dprintk("RPC: %5u %s: %s/%s: hdrlen %zd rpclen %zd\n",
rqst->rq_task->tk_pid, __func__,
transfertypes[rtype], transfertypes[wtype],
hdrlen, rpclen);
req->rl_send_iov[0].addr = rdmab_addr(req->rl_rdmabuf);
req->rl_send_iov[0].length = hdrlen;
req->rl_send_iov[0].lkey = rdmab_lkey(req->rl_rdmabuf);
req->rl_niovs = 1;
if (rtype == rpcrdma_areadch)
return 0;
req->rl_send_iov[1].addr = rdmab_addr(req->rl_sendbuf);
req->rl_send_iov[1].length = rpclen;
req->rl_send_iov[1].lkey = rdmab_lkey(req->rl_sendbuf);
req->rl_niovs = 2;
if (!rpcrdma_prepare_send_sges(&r_xprt->rx_ia, req, hdrlen,
&rqst->rq_snd_buf, rtype)) {
iptr = ERR_PTR(-EIO);
goto out_unmap;
}
return 0;
out_overflow:
pr_err("rpcrdma: send overflow: hdrlen %zd rpclen %zu %s/%s\n",
hdrlen, rpclen, transfertypes[rtype], transfertypes[wtype]);
iptr = ERR_PTR(-EIO);
out_unmap:
r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req, false);
return PTR_ERR(iptr);
@ -916,8 +977,10 @@ rpcrdma_conn_func(struct rpcrdma_ep *ep)
* allowed to timeout, to discover the errors at that time.
*/
void
rpcrdma_reply_handler(struct rpcrdma_rep *rep)
rpcrdma_reply_handler(struct work_struct *work)
{
struct rpcrdma_rep *rep =
container_of(work, struct rpcrdma_rep, rr_work);
struct rpcrdma_msg *headerp;
struct rpcrdma_req *req;
struct rpc_rqst *rqst;
@ -1132,6 +1195,6 @@ out_duplicate:
repost:
r_xprt->rx_stats.bad_reply_count++;
if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, &r_xprt->rx_ep, rep))
if (rpcrdma_ep_post_recv(&r_xprt->rx_ia, rep))
rpcrdma_recv_buffer_put(rep);
}

View File

@ -159,33 +159,34 @@ out_unmap:
/* Server-side transport endpoint wants a whole page for its send
* buffer. The client RPC code constructs the RPC header in this
* buffer before it invokes ->send_request.
*
* Returns NULL if there was a temporary allocation failure.
*/
static void *
xprt_rdma_bc_allocate(struct rpc_task *task, size_t size)
static int
xprt_rdma_bc_allocate(struct rpc_task *task)
{
struct rpc_rqst *rqst = task->tk_rqstp;
struct svc_xprt *sxprt = rqst->rq_xprt->bc_xprt;
size_t size = rqst->rq_callsize;
struct svcxprt_rdma *rdma;
struct page *page;
rdma = container_of(sxprt, struct svcxprt_rdma, sc_xprt);
/* Prevent an infinite loop: try to make this case work */
if (size > PAGE_SIZE)
if (size > PAGE_SIZE) {
WARN_ONCE(1, "svcrdma: large bc buffer request (size %zu)\n",
size);
return -EINVAL;
}
page = alloc_page(RPCRDMA_DEF_GFP);
if (!page)
return NULL;
return -ENOMEM;
return page_address(page);
rqst->rq_buffer = page_address(page);
return 0;
}
static void
xprt_rdma_bc_free(void *buffer)
xprt_rdma_bc_free(struct rpc_task *task)
{
/* No-op: ctxt and page have already been freed. */
}

View File

@ -97,7 +97,7 @@ static struct ctl_table xr_tunables_table[] = {
.data = &xprt_rdma_max_inline_read,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec,
.proc_handler = proc_dointvec_minmax,
.extra1 = &min_inline_size,
.extra2 = &max_inline_size,
},
@ -106,7 +106,7 @@ static struct ctl_table xr_tunables_table[] = {
.data = &xprt_rdma_max_inline_write,
.maxlen = sizeof(unsigned int),
.mode = 0644,
.proc_handler = proc_dointvec,
.proc_handler = proc_dointvec_minmax,
.extra1 = &min_inline_size,
.extra2 = &max_inline_size,
},
@ -477,115 +477,152 @@ xprt_rdma_connect(struct rpc_xprt *xprt, struct rpc_task *task)
}
}
/*
* The RDMA allocate/free functions need the task structure as a place
* to hide the struct rpcrdma_req, which is necessary for the actual send/recv
* sequence.
*
* The RPC layer allocates both send and receive buffers in the same call
* (rq_send_buf and rq_rcv_buf are both part of a single contiguous buffer).
* We may register rq_rcv_buf when using reply chunks.
/* Allocate a fixed-size buffer in which to construct and send the
* RPC-over-RDMA header for this request.
*/
static void *
xprt_rdma_allocate(struct rpc_task *task, size_t size)
static bool
rpcrdma_get_rdmabuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
gfp_t flags)
{
struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(xprt);
size_t size = RPCRDMA_HDRBUF_SIZE;
struct rpcrdma_regbuf *rb;
if (req->rl_rdmabuf)
return true;
rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
if (IS_ERR(rb))
return false;
r_xprt->rx_stats.hardway_register_count += size;
req->rl_rdmabuf = rb;
return true;
}
static bool
rpcrdma_get_sendbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
size_t size, gfp_t flags)
{
struct rpcrdma_regbuf *rb;
if (req->rl_sendbuf && rdmab_length(req->rl_sendbuf) >= size)
return true;
rb = rpcrdma_alloc_regbuf(size, DMA_TO_DEVICE, flags);
if (IS_ERR(rb))
return false;
rpcrdma_free_regbuf(req->rl_sendbuf);
r_xprt->rx_stats.hardway_register_count += size;
req->rl_sendbuf = rb;
return true;
}
/* The rq_rcv_buf is used only if a Reply chunk is necessary.
* The decision to use a Reply chunk is made later in
* rpcrdma_marshal_req. This buffer is registered at that time.
*
* Otherwise, the associated RPC Reply arrives in a separate
* Receive buffer, arbitrarily chosen by the HCA. The buffer
* allocated here for the RPC Reply is not utilized in that
* case. See rpcrdma_inline_fixup.
*
* A regbuf is used here to remember the buffer size.
*/
static bool
rpcrdma_get_recvbuf(struct rpcrdma_xprt *r_xprt, struct rpcrdma_req *req,
size_t size, gfp_t flags)
{
struct rpcrdma_regbuf *rb;
if (req->rl_recvbuf && rdmab_length(req->rl_recvbuf) >= size)
return true;
rb = rpcrdma_alloc_regbuf(size, DMA_NONE, flags);
if (IS_ERR(rb))
return false;
rpcrdma_free_regbuf(req->rl_recvbuf);
r_xprt->rx_stats.hardway_register_count += size;
req->rl_recvbuf = rb;
return true;
}
/**
* xprt_rdma_allocate - allocate transport resources for an RPC
* @task: RPC task
*
* Return values:
* 0: Success; rq_buffer points to RPC buffer to use
* ENOMEM: Out of memory, call again later
* EIO: A permanent error occurred, do not retry
*
* The RDMA allocate/free functions need the task structure as a place
* to hide the struct rpcrdma_req, which is necessary for the actual
* send/recv sequence.
*
* xprt_rdma_allocate provides buffers that are already mapped for
* DMA, and a local DMA lkey is provided for each.
*/
static int
xprt_rdma_allocate(struct rpc_task *task)
{
struct rpc_rqst *rqst = task->tk_rqstp;
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(rqst->rq_xprt);
struct rpcrdma_req *req;
size_t min_size;
gfp_t flags;
req = rpcrdma_buffer_get(&r_xprt->rx_buf);
if (req == NULL)
return NULL;
return -ENOMEM;
flags = RPCRDMA_DEF_GFP;
if (RPC_IS_SWAPPER(task))
flags = __GFP_MEMALLOC | GFP_NOWAIT | __GFP_NOWARN;
if (req->rl_rdmabuf == NULL)
goto out_rdmabuf;
if (req->rl_sendbuf == NULL)
goto out_sendbuf;
if (size > req->rl_sendbuf->rg_size)
goto out_sendbuf;
if (!rpcrdma_get_rdmabuf(r_xprt, req, flags))
goto out_fail;
if (!rpcrdma_get_sendbuf(r_xprt, req, rqst->rq_callsize, flags))
goto out_fail;
if (!rpcrdma_get_recvbuf(r_xprt, req, rqst->rq_rcvsize, flags))
goto out_fail;
dprintk("RPC: %5u %s: send size = %zd, recv size = %zd, req = %p\n",
task->tk_pid, __func__, rqst->rq_callsize,
rqst->rq_rcvsize, req);
out:
dprintk("RPC: %s: size %zd, request 0x%p\n", __func__, size, req);
req->rl_connect_cookie = 0; /* our reserved value */
req->rl_task = task;
return req->rl_sendbuf->rg_base;
out_rdmabuf:
min_size = RPCRDMA_INLINE_WRITE_THRESHOLD(task->tk_rqstp);
rb = rpcrdma_alloc_regbuf(&r_xprt->rx_ia, min_size, flags);
if (IS_ERR(rb))
goto out_fail;
req->rl_rdmabuf = rb;
out_sendbuf:
/* XDR encoding and RPC/RDMA marshaling of this request has not
* yet occurred. Thus a lower bound is needed to prevent buffer
* overrun during marshaling.
*
* RPC/RDMA marshaling may choose to send payload bearing ops
* inline, if the result is smaller than the inline threshold.
* The value of the "size" argument accounts for header
* requirements but not for the payload in these cases.
*
* Likewise, allocate enough space to receive a reply up to the
* size of the inline threshold.
*
* It's unlikely that both the send header and the received
* reply will be large, but slush is provided here to allow
* flexibility when marshaling.
*/
min_size = RPCRDMA_INLINE_READ_THRESHOLD(task->tk_rqstp);
min_size += RPCRDMA_INLINE_WRITE_THRESHOLD(task->tk_rqstp);
if (size < min_size)
size = min_size;
rb = rpcrdma_alloc_regbuf(&r_xprt->rx_ia, size, flags);
if (IS_ERR(rb))
goto out_fail;
rb->rg_owner = req;
r_xprt->rx_stats.hardway_register_count += size;
rpcrdma_free_regbuf(&r_xprt->rx_ia, req->rl_sendbuf);
req->rl_sendbuf = rb;
goto out;
rpcrdma_set_xprtdata(rqst, req);
rqst->rq_buffer = req->rl_sendbuf->rg_base;
rqst->rq_rbuffer = req->rl_recvbuf->rg_base;
return 0;
out_fail:
rpcrdma_buffer_put(req);
return NULL;
return -ENOMEM;
}
/*
* This function returns all RDMA resources to the pool.
/**
* xprt_rdma_free - release resources allocated by xprt_rdma_allocate
* @task: RPC task
*
* Caller guarantees rqst->rq_buffer is non-NULL.
*/
static void
xprt_rdma_free(void *buffer)
xprt_rdma_free(struct rpc_task *task)
{
struct rpcrdma_req *req;
struct rpcrdma_xprt *r_xprt;
struct rpcrdma_regbuf *rb;
struct rpc_rqst *rqst = task->tk_rqstp;
struct rpcrdma_xprt *r_xprt = rpcx_to_rdmax(rqst->rq_xprt);
struct rpcrdma_req *req = rpcr_to_rdmar(rqst);
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
if (buffer == NULL)
return;
rb = container_of(buffer, struct rpcrdma_regbuf, rg_base[0]);
req = rb->rg_owner;
if (req->rl_backchannel)
return;
r_xprt = container_of(req->rl_buffer, struct rpcrdma_xprt, rx_buf);
dprintk("RPC: %s: called on 0x%p\n", __func__, req->rl_reply);
r_xprt->rx_ia.ri_ops->ro_unmap_safe(r_xprt, req,
!RPC_IS_ASYNC(req->rl_task));
ia->ri_ops->ro_unmap_safe(r_xprt, req, !RPC_IS_ASYNC(task));
rpcrdma_unmap_sges(ia, req);
rpcrdma_buffer_put(req);
}
@ -685,10 +722,11 @@ void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
r_xprt->rx_stats.failed_marshal_count,
r_xprt->rx_stats.bad_reply_count,
r_xprt->rx_stats.nomsg_call_count);
seq_printf(seq, "%lu %lu %lu\n",
seq_printf(seq, "%lu %lu %lu %lu\n",
r_xprt->rx_stats.mrs_recovered,
r_xprt->rx_stats.mrs_orphaned,
r_xprt->rx_stats.mrs_allocated);
r_xprt->rx_stats.mrs_allocated,
r_xprt->rx_stats.local_inv_needed);
}
static int

View File

@ -129,15 +129,6 @@ rpcrdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
wc->status, wc->vendor_err);
}
static void
rpcrdma_receive_worker(struct work_struct *work)
{
struct rpcrdma_rep *rep =
container_of(work, struct rpcrdma_rep, rr_work);
rpcrdma_reply_handler(rep);
}
/* Perform basic sanity checking to avoid using garbage
* to update the credit grant value.
*/
@ -161,13 +152,13 @@ rpcrdma_update_granted_credits(struct rpcrdma_rep *rep)
}
/**
* rpcrdma_receive_wc - Invoked by RDMA provider for each polled Receive WC
* rpcrdma_wc_receive - Invoked by RDMA provider for each polled Receive WC
* @cq: completion queue (ignored)
* @wc: completed WR
*
*/
static void
rpcrdma_receive_wc(struct ib_cq *cq, struct ib_wc *wc)
rpcrdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
{
struct ib_cqe *cqe = wc->wr_cqe;
struct rpcrdma_rep *rep = container_of(cqe, struct rpcrdma_rep,
@ -185,6 +176,9 @@ rpcrdma_receive_wc(struct ib_cq *cq, struct ib_wc *wc)
__func__, rep, wc->byte_len);
rep->rr_len = wc->byte_len;
rep->rr_wc_flags = wc->wc_flags;
rep->rr_inv_rkey = wc->ex.invalidate_rkey;
ib_dma_sync_single_for_cpu(rep->rr_device,
rdmab_addr(rep->rr_rdmabuf),
rep->rr_len, DMA_FROM_DEVICE);
@ -204,6 +198,36 @@ out_fail:
goto out_schedule;
}
static void
rpcrdma_update_connect_private(struct rpcrdma_xprt *r_xprt,
struct rdma_conn_param *param)
{
struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
const struct rpcrdma_connect_private *pmsg = param->private_data;
unsigned int rsize, wsize;
/* Default settings for RPC-over-RDMA Version One */
r_xprt->rx_ia.ri_reminv_expected = false;
rsize = RPCRDMA_V1_DEF_INLINE_SIZE;
wsize = RPCRDMA_V1_DEF_INLINE_SIZE;
if (pmsg &&
pmsg->cp_magic == rpcrdma_cmp_magic &&
pmsg->cp_version == RPCRDMA_CMP_VERSION) {
r_xprt->rx_ia.ri_reminv_expected = true;
rsize = rpcrdma_decode_buffer_size(pmsg->cp_send_size);
wsize = rpcrdma_decode_buffer_size(pmsg->cp_recv_size);
}
if (rsize < cdata->inline_rsize)
cdata->inline_rsize = rsize;
if (wsize < cdata->inline_wsize)
cdata->inline_wsize = wsize;
pr_info("rpcrdma: max send %u, max recv %u\n",
cdata->inline_wsize, cdata->inline_rsize);
rpcrdma_set_max_header_sizes(r_xprt);
}
static int
rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
@ -244,6 +268,7 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event)
" (%d initiator)\n",
__func__, attr->max_dest_rd_atomic,
attr->max_rd_atomic);
rpcrdma_update_connect_private(xprt, &event->param.conn);
goto connected;
case RDMA_CM_EVENT_CONNECT_ERROR:
connstate = -ENOTCONN;
@ -454,11 +479,12 @@ int
rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
struct rpcrdma_create_data_internal *cdata)
{
struct rpcrdma_connect_private *pmsg = &ep->rep_cm_private;
struct ib_cq *sendcq, *recvcq;
unsigned int max_qp_wr;
int rc;
if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_IOVS) {
if (ia->ri_device->attrs.max_sge < RPCRDMA_MAX_SEND_SGES) {
dprintk("RPC: %s: insufficient sge's available\n",
__func__);
return -ENOMEM;
@ -487,7 +513,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
ep->rep_attr.cap.max_recv_wr = cdata->max_requests;
ep->rep_attr.cap.max_recv_wr += RPCRDMA_BACKWARD_WRS;
ep->rep_attr.cap.max_recv_wr += 1; /* drain cqe */
ep->rep_attr.cap.max_send_sge = RPCRDMA_MAX_IOVS;
ep->rep_attr.cap.max_send_sge = RPCRDMA_MAX_SEND_SGES;
ep->rep_attr.cap.max_recv_sge = 1;
ep->rep_attr.cap.max_inline_data = 0;
ep->rep_attr.sq_sig_type = IB_SIGNAL_REQ_WR;
@ -536,9 +562,14 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
/* Initialize cma parameters */
memset(&ep->rep_remote_cma, 0, sizeof(ep->rep_remote_cma));
/* RPC/RDMA does not use private data */
ep->rep_remote_cma.private_data = NULL;
ep->rep_remote_cma.private_data_len = 0;
/* Prepare RDMA-CM private message */
pmsg->cp_magic = rpcrdma_cmp_magic;
pmsg->cp_version = RPCRDMA_CMP_VERSION;
pmsg->cp_flags |= ia->ri_ops->ro_send_w_inv_ok;
pmsg->cp_send_size = rpcrdma_encode_buffer_size(cdata->inline_wsize);
pmsg->cp_recv_size = rpcrdma_encode_buffer_size(cdata->inline_rsize);
ep->rep_remote_cma.private_data = pmsg;
ep->rep_remote_cma.private_data_len = sizeof(*pmsg);
/* Client offers RDMA Read but does not initiate */
ep->rep_remote_cma.initiator_depth = 0;
@ -849,6 +880,10 @@ rpcrdma_create_req(struct rpcrdma_xprt *r_xprt)
req->rl_cqe.done = rpcrdma_wc_send;
req->rl_buffer = &r_xprt->rx_buf;
INIT_LIST_HEAD(&req->rl_registered);
req->rl_send_wr.next = NULL;
req->rl_send_wr.wr_cqe = &req->rl_cqe;
req->rl_send_wr.sg_list = req->rl_send_sge;
req->rl_send_wr.opcode = IB_WR_SEND;
return req;
}
@ -865,17 +900,21 @@ rpcrdma_create_rep(struct rpcrdma_xprt *r_xprt)
if (rep == NULL)
goto out;
rep->rr_rdmabuf = rpcrdma_alloc_regbuf(ia, cdata->inline_rsize,
GFP_KERNEL);
rep->rr_rdmabuf = rpcrdma_alloc_regbuf(cdata->inline_rsize,
DMA_FROM_DEVICE, GFP_KERNEL);
if (IS_ERR(rep->rr_rdmabuf)) {
rc = PTR_ERR(rep->rr_rdmabuf);
goto out_free;
}
rep->rr_device = ia->ri_device;
rep->rr_cqe.done = rpcrdma_receive_wc;
rep->rr_cqe.done = rpcrdma_wc_receive;
rep->rr_rxprt = r_xprt;
INIT_WORK(&rep->rr_work, rpcrdma_receive_worker);
INIT_WORK(&rep->rr_work, rpcrdma_reply_handler);
rep->rr_recv_wr.next = NULL;
rep->rr_recv_wr.wr_cqe = &rep->rr_cqe;
rep->rr_recv_wr.sg_list = &rep->rr_rdmabuf->rg_iov;
rep->rr_recv_wr.num_sge = 1;
return rep;
out_free:
@ -966,17 +1005,18 @@ rpcrdma_buffer_get_rep_locked(struct rpcrdma_buffer *buf)
}
static void
rpcrdma_destroy_rep(struct rpcrdma_ia *ia, struct rpcrdma_rep *rep)
rpcrdma_destroy_rep(struct rpcrdma_rep *rep)
{
rpcrdma_free_regbuf(ia, rep->rr_rdmabuf);
rpcrdma_free_regbuf(rep->rr_rdmabuf);
kfree(rep);
}
void
rpcrdma_destroy_req(struct rpcrdma_ia *ia, struct rpcrdma_req *req)
rpcrdma_destroy_req(struct rpcrdma_req *req)
{
rpcrdma_free_regbuf(ia, req->rl_sendbuf);
rpcrdma_free_regbuf(ia, req->rl_rdmabuf);
rpcrdma_free_regbuf(req->rl_recvbuf);
rpcrdma_free_regbuf(req->rl_sendbuf);
rpcrdma_free_regbuf(req->rl_rdmabuf);
kfree(req);
}
@ -1009,15 +1049,13 @@ rpcrdma_destroy_mrs(struct rpcrdma_buffer *buf)
void
rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
{
struct rpcrdma_ia *ia = rdmab_to_ia(buf);
cancel_delayed_work_sync(&buf->rb_recovery_worker);
while (!list_empty(&buf->rb_recv_bufs)) {
struct rpcrdma_rep *rep;
rep = rpcrdma_buffer_get_rep_locked(buf);
rpcrdma_destroy_rep(ia, rep);
rpcrdma_destroy_rep(rep);
}
buf->rb_send_count = 0;
@ -1030,7 +1068,7 @@ rpcrdma_buffer_destroy(struct rpcrdma_buffer *buf)
list_del(&req->rl_all);
spin_unlock(&buf->rb_reqslock);
rpcrdma_destroy_req(ia, req);
rpcrdma_destroy_req(req);
spin_lock(&buf->rb_reqslock);
}
spin_unlock(&buf->rb_reqslock);
@ -1129,7 +1167,7 @@ rpcrdma_buffer_put(struct rpcrdma_req *req)
struct rpcrdma_buffer *buffers = req->rl_buffer;
struct rpcrdma_rep *rep = req->rl_reply;
req->rl_niovs = 0;
req->rl_send_wr.num_sge = 0;
req->rl_reply = NULL;
spin_lock(&buffers->rb_lock);
@ -1171,70 +1209,81 @@ rpcrdma_recv_buffer_put(struct rpcrdma_rep *rep)
spin_unlock(&buffers->rb_lock);
}
/*
* Wrappers for internal-use kmalloc memory registration, used by buffer code.
*/
/**
* rpcrdma_alloc_regbuf - kmalloc and register memory for SEND/RECV buffers
* @ia: controlling rpcrdma_ia
* rpcrdma_alloc_regbuf - allocate and DMA-map memory for SEND/RECV buffers
* @size: size of buffer to be allocated, in bytes
* @direction: direction of data movement
* @flags: GFP flags
*
* Returns pointer to private header of an area of internally
* registered memory, or an ERR_PTR. The registered buffer follows
* the end of the private header.
* Returns an ERR_PTR, or a pointer to a regbuf, a buffer that
* can be persistently DMA-mapped for I/O.
*
* xprtrdma uses a regbuf for posting an outgoing RDMA SEND, or for
* receiving the payload of RDMA RECV operations. regbufs are not
* used for RDMA READ/WRITE operations, thus are registered only for
* LOCAL access.
* receiving the payload of RDMA RECV operations. During Long Calls
* or Replies they may be registered externally via ro_map.
*/
struct rpcrdma_regbuf *
rpcrdma_alloc_regbuf(struct rpcrdma_ia *ia, size_t size, gfp_t flags)
rpcrdma_alloc_regbuf(size_t size, enum dma_data_direction direction,
gfp_t flags)
{
struct rpcrdma_regbuf *rb;
struct ib_sge *iov;
rb = kmalloc(sizeof(*rb) + size, flags);
if (rb == NULL)
goto out;
return ERR_PTR(-ENOMEM);
iov = &rb->rg_iov;
iov->addr = ib_dma_map_single(ia->ri_device,
(void *)rb->rg_base, size,
DMA_BIDIRECTIONAL);
if (ib_dma_mapping_error(ia->ri_device, iov->addr))
goto out_free;
rb->rg_device = NULL;
rb->rg_direction = direction;
rb->rg_iov.length = size;
iov->length = size;
iov->lkey = ia->ri_pd->local_dma_lkey;
rb->rg_size = size;
rb->rg_owner = NULL;
return rb;
}
out_free:
kfree(rb);
out:
return ERR_PTR(-ENOMEM);
/**
* __rpcrdma_map_regbuf - DMA-map a regbuf
* @ia: controlling rpcrdma_ia
* @rb: regbuf to be mapped
*/
bool
__rpcrdma_dma_map_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
{
if (rb->rg_direction == DMA_NONE)
return false;
rb->rg_iov.addr = ib_dma_map_single(ia->ri_device,
(void *)rb->rg_base,
rdmab_length(rb),
rb->rg_direction);
if (ib_dma_mapping_error(ia->ri_device, rdmab_addr(rb)))
return false;
rb->rg_device = ia->ri_device;
rb->rg_iov.lkey = ia->ri_pd->local_dma_lkey;
return true;
}
static void
rpcrdma_dma_unmap_regbuf(struct rpcrdma_regbuf *rb)
{
if (!rpcrdma_regbuf_is_mapped(rb))
return;
ib_dma_unmap_single(rb->rg_device, rdmab_addr(rb),
rdmab_length(rb), rb->rg_direction);
rb->rg_device = NULL;
}
/**
* rpcrdma_free_regbuf - deregister and free registered buffer
* @ia: controlling rpcrdma_ia
* @rb: regbuf to be deregistered and freed
*/
void
rpcrdma_free_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
rpcrdma_free_regbuf(struct rpcrdma_regbuf *rb)
{
struct ib_sge *iov;
if (!rb)
return;
iov = &rb->rg_iov;
ib_dma_unmap_single(ia->ri_device,
iov->addr, iov->length, DMA_BIDIRECTIONAL);
rpcrdma_dma_unmap_regbuf(rb);
kfree(rb);
}
@ -1248,39 +1297,28 @@ rpcrdma_ep_post(struct rpcrdma_ia *ia,
struct rpcrdma_ep *ep,
struct rpcrdma_req *req)
{
struct ib_device *device = ia->ri_device;
struct ib_send_wr send_wr, *send_wr_fail;
struct rpcrdma_rep *rep = req->rl_reply;
struct ib_sge *iov = req->rl_send_iov;
int i, rc;
struct ib_send_wr *send_wr = &req->rl_send_wr;
struct ib_send_wr *send_wr_fail;
int rc;
if (rep) {
rc = rpcrdma_ep_post_recv(ia, ep, rep);
if (req->rl_reply) {
rc = rpcrdma_ep_post_recv(ia, req->rl_reply);
if (rc)
return rc;
req->rl_reply = NULL;
}
send_wr.next = NULL;
send_wr.wr_cqe = &req->rl_cqe;
send_wr.sg_list = iov;
send_wr.num_sge = req->rl_niovs;
send_wr.opcode = IB_WR_SEND;
for (i = 0; i < send_wr.num_sge; i++)
ib_dma_sync_single_for_device(device, iov[i].addr,
iov[i].length, DMA_TO_DEVICE);
dprintk("RPC: %s: posting %d s/g entries\n",
__func__, send_wr.num_sge);
__func__, send_wr->num_sge);
if (DECR_CQCOUNT(ep) > 0)
send_wr.send_flags = 0;
send_wr->send_flags = 0;
else { /* Provider must take a send completion every now and then */
INIT_CQCOUNT(ep);
send_wr.send_flags = IB_SEND_SIGNALED;
send_wr->send_flags = IB_SEND_SIGNALED;
}
rc = ib_post_send(ia->ri_id->qp, &send_wr, &send_wr_fail);
rc = ib_post_send(ia->ri_id->qp, send_wr, &send_wr_fail);
if (rc)
goto out_postsend_err;
return 0;
@ -1290,32 +1328,24 @@ out_postsend_err:
return -ENOTCONN;
}
/*
* (Re)post a receive buffer.
*/
int
rpcrdma_ep_post_recv(struct rpcrdma_ia *ia,
struct rpcrdma_ep *ep,
struct rpcrdma_rep *rep)
{
struct ib_recv_wr recv_wr, *recv_wr_fail;
struct ib_recv_wr *recv_wr_fail;
int rc;
recv_wr.next = NULL;
recv_wr.wr_cqe = &rep->rr_cqe;
recv_wr.sg_list = &rep->rr_rdmabuf->rg_iov;
recv_wr.num_sge = 1;
ib_dma_sync_single_for_cpu(ia->ri_device,
rdmab_addr(rep->rr_rdmabuf),
rdmab_length(rep->rr_rdmabuf),
DMA_BIDIRECTIONAL);
rc = ib_post_recv(ia->ri_id->qp, &recv_wr, &recv_wr_fail);
if (!rpcrdma_dma_map_regbuf(ia, rep->rr_rdmabuf))
goto out_map;
rc = ib_post_recv(ia->ri_id->qp, &rep->rr_recv_wr, &recv_wr_fail);
if (rc)
goto out_postrecv;
return 0;
out_map:
pr_err("rpcrdma: failed to DMA map the Receive buffer\n");
return -EIO;
out_postrecv:
pr_err("rpcrdma: ib_post_recv returned %i\n", rc);
return -ENOTCONN;
@ -1333,7 +1363,6 @@ rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *r_xprt, unsigned int count)
{
struct rpcrdma_buffer *buffers = &r_xprt->rx_buf;
struct rpcrdma_ia *ia = &r_xprt->rx_ia;
struct rpcrdma_ep *ep = &r_xprt->rx_ep;
struct rpcrdma_rep *rep;
int rc;
@ -1344,7 +1373,7 @@ rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *r_xprt, unsigned int count)
rep = rpcrdma_buffer_get_rep_locked(buffers);
spin_unlock(&buffers->rb_lock);
rc = rpcrdma_ep_post_recv(ia, ep, rep);
rc = rpcrdma_ep_post_recv(ia, rep);
if (rc)
goto out_rc;
}

View File

@ -70,9 +70,11 @@ struct rpcrdma_ia {
struct ib_pd *ri_pd;
struct completion ri_done;
int ri_async_rc;
unsigned int ri_max_segs;
unsigned int ri_max_frmr_depth;
unsigned int ri_max_inline_write;
unsigned int ri_max_inline_read;
bool ri_reminv_expected;
struct ib_qp_attr ri_qp_attr;
struct ib_qp_init_attr ri_qp_init_attr;
};
@ -87,6 +89,7 @@ struct rpcrdma_ep {
int rep_connected;
struct ib_qp_init_attr rep_attr;
wait_queue_head_t rep_connect_wait;
struct rpcrdma_connect_private rep_cm_private;
struct rdma_conn_param rep_remote_cma;
struct sockaddr_storage rep_remote_addr;
struct delayed_work rep_connect_worker;
@ -112,9 +115,9 @@ struct rpcrdma_ep {
*/
struct rpcrdma_regbuf {
size_t rg_size;
struct rpcrdma_req *rg_owner;
struct ib_sge rg_iov;
struct ib_device *rg_device;
enum dma_data_direction rg_direction;
__be32 rg_base[0] __attribute__ ((aligned(256)));
};
@ -162,7 +165,10 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
* The smallest inline threshold is 1024 bytes, ensuring that
* at least 750 bytes are available for RPC messages.
*/
#define RPCRDMA_MAX_HDR_SEGS (8)
enum {
RPCRDMA_MAX_HDR_SEGS = 8,
RPCRDMA_HDRBUF_SIZE = 256,
};
/*
* struct rpcrdma_rep -- this structure encapsulates state required to recv
@ -182,10 +188,13 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
struct rpcrdma_rep {
struct ib_cqe rr_cqe;
unsigned int rr_len;
int rr_wc_flags;
u32 rr_inv_rkey;
struct ib_device *rr_device;
struct rpcrdma_xprt *rr_rxprt;
struct work_struct rr_work;
struct list_head rr_list;
struct ib_recv_wr rr_recv_wr;
struct rpcrdma_regbuf *rr_rdmabuf;
};
@ -276,19 +285,30 @@ struct rpcrdma_mr_seg { /* chunk descriptors */
char *mr_offset; /* kva if no page, else offset */
};
#define RPCRDMA_MAX_IOVS (2)
/* Reserve enough Send SGEs to send a maximum size inline request:
* - RPC-over-RDMA header
* - xdr_buf head iovec
* - RPCRDMA_MAX_INLINE bytes, possibly unaligned, in pages
* - xdr_buf tail iovec
*/
enum {
RPCRDMA_MAX_SEND_PAGES = PAGE_SIZE + RPCRDMA_MAX_INLINE - 1,
RPCRDMA_MAX_PAGE_SGES = (RPCRDMA_MAX_SEND_PAGES >> PAGE_SHIFT) + 1,
RPCRDMA_MAX_SEND_SGES = 1 + 1 + RPCRDMA_MAX_PAGE_SGES + 1,
};
struct rpcrdma_buffer;
struct rpcrdma_req {
struct list_head rl_free;
unsigned int rl_niovs;
unsigned int rl_mapped_sges;
unsigned int rl_connect_cookie;
struct rpc_task *rl_task;
struct rpcrdma_buffer *rl_buffer;
struct rpcrdma_rep *rl_reply;/* holder for reply buffer */
struct ib_sge rl_send_iov[RPCRDMA_MAX_IOVS];
struct rpcrdma_regbuf *rl_rdmabuf;
struct rpcrdma_regbuf *rl_sendbuf;
struct rpcrdma_rep *rl_reply;
struct ib_send_wr rl_send_wr;
struct ib_sge rl_send_sge[RPCRDMA_MAX_SEND_SGES];
struct rpcrdma_regbuf *rl_rdmabuf; /* xprt header */
struct rpcrdma_regbuf *rl_sendbuf; /* rq_snd_buf */
struct rpcrdma_regbuf *rl_recvbuf; /* rq_rcv_buf */
struct ib_cqe rl_cqe;
struct list_head rl_all;
@ -298,14 +318,16 @@ struct rpcrdma_req {
struct rpcrdma_mr_seg rl_segments[RPCRDMA_MAX_SEGS];
};
static inline void
rpcrdma_set_xprtdata(struct rpc_rqst *rqst, struct rpcrdma_req *req)
{
rqst->rq_xprtdata = req;
}
static inline struct rpcrdma_req *
rpcr_to_rdmar(struct rpc_rqst *rqst)
{
void *buffer = rqst->rq_buffer;
struct rpcrdma_regbuf *rb;
rb = container_of(buffer, struct rpcrdma_regbuf, rg_base);
return rb->rg_owner;
return rqst->rq_xprtdata;
}
/*
@ -356,15 +378,6 @@ struct rpcrdma_create_data_internal {
unsigned int padding; /* non-rdma write header padding */
};
#define RPCRDMA_INLINE_READ_THRESHOLD(rq) \
(rpcx_to_rdmad(rq->rq_xprt).inline_rsize)
#define RPCRDMA_INLINE_WRITE_THRESHOLD(rq)\
(rpcx_to_rdmad(rq->rq_xprt).inline_wsize)
#define RPCRDMA_INLINE_PAD_VALUE(rq)\
rpcx_to_rdmad(rq->rq_xprt).padding
/*
* Statistics for RPCRDMA
*/
@ -386,6 +399,7 @@ struct rpcrdma_stats {
unsigned long mrs_recovered;
unsigned long mrs_orphaned;
unsigned long mrs_allocated;
unsigned long local_inv_needed;
};
/*
@ -409,6 +423,7 @@ struct rpcrdma_memreg_ops {
struct rpcrdma_mw *);
void (*ro_release_mr)(struct rpcrdma_mw *);
const char *ro_displayname;
const int ro_send_w_inv_ok;
};
extern const struct rpcrdma_memreg_ops rpcrdma_fmr_memreg_ops;
@ -461,15 +476,14 @@ void rpcrdma_ep_disconnect(struct rpcrdma_ep *, struct rpcrdma_ia *);
int rpcrdma_ep_post(struct rpcrdma_ia *, struct rpcrdma_ep *,
struct rpcrdma_req *);
int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_ep *,
struct rpcrdma_rep *);
int rpcrdma_ep_post_recv(struct rpcrdma_ia *, struct rpcrdma_rep *);
/*
* Buffer calls - xprtrdma/verbs.c
*/
struct rpcrdma_req *rpcrdma_create_req(struct rpcrdma_xprt *);
struct rpcrdma_rep *rpcrdma_create_rep(struct rpcrdma_xprt *);
void rpcrdma_destroy_req(struct rpcrdma_ia *, struct rpcrdma_req *);
void rpcrdma_destroy_req(struct rpcrdma_req *);
int rpcrdma_buffer_create(struct rpcrdma_xprt *);
void rpcrdma_buffer_destroy(struct rpcrdma_buffer *);
@ -482,10 +496,24 @@ void rpcrdma_recv_buffer_put(struct rpcrdma_rep *);
void rpcrdma_defer_mr_recovery(struct rpcrdma_mw *);
struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
size_t, gfp_t);
void rpcrdma_free_regbuf(struct rpcrdma_ia *,
struct rpcrdma_regbuf *);
struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(size_t, enum dma_data_direction,
gfp_t);
bool __rpcrdma_dma_map_regbuf(struct rpcrdma_ia *, struct rpcrdma_regbuf *);
void rpcrdma_free_regbuf(struct rpcrdma_regbuf *);
static inline bool
rpcrdma_regbuf_is_mapped(struct rpcrdma_regbuf *rb)
{
return rb->rg_device != NULL;
}
static inline bool
rpcrdma_dma_map_regbuf(struct rpcrdma_ia *ia, struct rpcrdma_regbuf *rb)
{
if (likely(rpcrdma_regbuf_is_mapped(rb)))
return true;
return __rpcrdma_dma_map_regbuf(ia, rb);
}
int rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *, unsigned int);
@ -507,15 +535,25 @@ rpcrdma_data_dir(bool writing)
*/
void rpcrdma_connect_worker(struct work_struct *);
void rpcrdma_conn_func(struct rpcrdma_ep *);
void rpcrdma_reply_handler(struct rpcrdma_rep *);
void rpcrdma_reply_handler(struct work_struct *);
/*
* RPC/RDMA protocol calls - xprtrdma/rpc_rdma.c
*/
enum rpcrdma_chunktype {
rpcrdma_noch = 0,
rpcrdma_readch,
rpcrdma_areadch,
rpcrdma_writech,
rpcrdma_replych
};
bool rpcrdma_prepare_send_sges(struct rpcrdma_ia *, struct rpcrdma_req *,
u32, struct xdr_buf *, enum rpcrdma_chunktype);
void rpcrdma_unmap_sges(struct rpcrdma_ia *, struct rpcrdma_req *);
int rpcrdma_marshal_req(struct rpc_rqst *);
void rpcrdma_set_max_header_sizes(struct rpcrdma_ia *,
struct rpcrdma_create_data_internal *,
unsigned int);
void rpcrdma_set_max_header_sizes(struct rpcrdma_xprt *);
/* RPC/RDMA module init - xprtrdma/transport.c
*/

View File

@ -473,7 +473,16 @@ static int xs_nospace(struct rpc_task *task)
spin_unlock_bh(&xprt->transport_lock);
/* Race breaker in case memory is freed before above code is called */
sk->sk_write_space(sk);
if (ret == -EAGAIN) {
struct socket_wq *wq;
rcu_read_lock();
wq = rcu_dereference(sk->sk_wq);
set_bit(SOCKWQ_ASYNC_NOSPACE, &wq->flags);
rcu_read_unlock();
sk->sk_write_space(sk);
}
return ret;
}
@ -2533,35 +2542,38 @@ static void xs_tcp_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
* we allocate pages instead doing a kmalloc like rpc_malloc is because we want
* to use the server side send routines.
*/
static void *bc_malloc(struct rpc_task *task, size_t size)
static int bc_malloc(struct rpc_task *task)
{
struct rpc_rqst *rqst = task->tk_rqstp;
size_t size = rqst->rq_callsize;
struct page *page;
struct rpc_buffer *buf;
WARN_ON_ONCE(size > PAGE_SIZE - sizeof(struct rpc_buffer));
if (size > PAGE_SIZE - sizeof(struct rpc_buffer))
return NULL;
if (size > PAGE_SIZE - sizeof(struct rpc_buffer)) {
WARN_ONCE(1, "xprtsock: large bc buffer request (size %zu)\n",
size);
return -EINVAL;
}
page = alloc_page(GFP_KERNEL);
if (!page)
return NULL;
return -ENOMEM;
buf = page_address(page);
buf->len = PAGE_SIZE;
return buf->data;
rqst->rq_buffer = buf->data;
return 0;
}
/*
* Free the space allocated in the bc_alloc routine
*/
static void bc_free(void *buffer)
static void bc_free(struct rpc_task *task)
{
void *buffer = task->tk_rqstp->rq_buffer;
struct rpc_buffer *buf;
if (!buffer)
return;
buf = container_of(buffer, struct rpc_buffer, data);
free_page((unsigned long)buf);
}