linux/net/sunrpc
Chuck Lever 431af645cf xprtrdma: Fix client lock-up after application signal fires
After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.

The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.

With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.

This sets up a race between two processes:

1.  After the signal, xprt_rdma_free calls ro_unmap_safe.
2.  While ro_unmap_safe is still running, the server replies and
    rpcrdma_reply_handler runs, calling ro_unmap_sync.

Both processes invoke ib_unmap_fmr on the same FMR.

The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.

If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.

But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.

Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649a7 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2017-07-13 16:00:11 -04:00
..
auth_gss sunrpc: mark all struct rpc_procinfo instances as const 2017-07-13 15:57:57 -04:00
xprtrdma xprtrdma: Fix client lock-up after application signal fires 2017-07-13 16:00:11 -04:00
addr.c replace strict_strto calls 2014-07-12 18:45:49 -04:00
auth_generic.c NFS client updates for Linux 4.9 2016-10-13 21:28:20 -07:00
auth_null.c sunrpc: remove dead codes of cr_magic in rpc_cred 2017-02-08 17:02:46 -05:00
auth_unix.c sunrpc: rename NFS_NGROUPS to UNX_NGROUPS for auth unix 2017-02-08 17:02:45 -05:00
auth.c sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h> 2017-03-02 08:42:31 +01:00
backchannel_rqst.c SUNRPC: Refactor rpc_xdr_buf_init() 2016-09-19 13:08:37 -04:00
cache.c NFS client updates for Linux 4.11 2017-03-01 16:10:30 -08:00
clnt.c sunrpc: mark all struct rpc_procinfo instances as const 2017-07-13 15:57:57 -04:00
debugfs.c sunrpc: record rpc client pointer in seq->private directly 2017-02-08 17:02:47 -05:00
Kconfig svcrdma: Introduce local rdma_rw API helpers 2017-04-25 17:25:55 -04:00
Makefile SUNRPC: Add a structure to track multiple transports 2016-02-05 18:48:54 -05:00
netns.h netns: make struct pernet_operations::id unsigned int 2016-11-18 10:59:15 -05:00
rpc_pipe.c fs: Replace CURRENT_TIME with current_time() for inode timestamps 2016-09-27 21:06:21 -04:00
rpcb_clnt.c sunrpc: mark all struct rpc_procinfo instances as const 2017-07-13 15:57:57 -04:00
sched.c sunrpc: don't check for failure from mempool_alloc() 2017-04-20 13:44:57 -04:00
socklib.c sunrpc: do not pull udp headers on receive 2016-04-11 15:31:33 -04:00
stats.c sunrpc: move pc_count out of struct svc_procinfo 2017-07-13 15:58:02 -04:00
sunrpc_syms.c SUNRPC: cleanup ida information when removing sunrpc module 2017-01-24 15:29:24 -05:00
sunrpc.h
svc_xprt.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-02-20 13:23:30 -08:00
svc.c sunrpc: mark all struct svc_version instances as const 2017-07-13 15:58:03 -04:00
svcauth_unix.c sunrpc: rename NFS_NGROUPS to UNX_NGROUPS for auth unix 2017-02-08 17:02:45 -05:00
svcauth.c locking/atomic, kref: Implement kref_put_lock() 2017-01-18 10:03:29 +01:00
svcsock.c SUNRPC/backchanel: set XPT_CONG_CTRL flag for bc xprt 2017-03-09 15:20:46 -05:00
sysctl.c Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
timer.c
xdr.c sunrpc: Fix xdr_init_decode_pages() documenting comment 2017-04-25 16:12:34 -04:00
xprt.c SUNRPC: Make slot allocation more reliable 2017-07-13 15:58:04 -04:00
xprtmultipath.c SUNRPC search xprt switch for sockaddr 2016-09-19 13:08:36 -04:00
xprtsock.c SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() 2017-05-31 12:26:44 -04:00