Each transport implementation can now set unique bind, connect,
reestablishment, and idle timeout values. These are variables,
allowing the values to be modified dynamically. This permits
exponential backoff of any of these values, for instance.
As an example, we implement exponential backoff for the connection
reestablishment timeout.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Implement a best practice: if the remote end drops our connection, try to
reconnect using the same port number. This is important because the NFS
server's Duplicate Reply Cache often hashes on the source port number.
If the client reuses the port number when it reconnects, the server's DRC
will be more effective.
Based on suggestions by Mike Eisler, Olaf Kirch, and Alexey Kuznetsky.
Test-plan:
Destructive testing.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Select an RPC client source port between 650 and 1023 instead of between
1 and 800. The old range conflicts with a number of network services.
Provide sysctls to allow admins to select a different port range.
Note that this doesn't affect user-level RPC library behavior, which
still uses 1 to 800.
Based on a suggestion by Olaf Kirch <okir@suse.de>.
Test-plan:
Repeated mount and unmount. Destructive testing. Idle timeouts.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: Move some macros that are specific to the Van Jacobson
implementation into xprt.c. Get rid of the cong_wait field in
rpc_xprt, which is no longer used. Get rid of xprt_clear_backlog.
Test-plan:
Compile with CONFIG_NFS enabled.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Get rid of the "xprt->nocong" variable.
Test-plan:
Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
Look for significant regression in performance or client stability.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The final place where congestion control state is adjusted is in
xprt_release, where each request is finally released. Add a callout
there to allow transports to perform additional processing when a
request is about to be released.
Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A new interface that allows transports to adjust their congestion window
using the Van Jacobson implementation in xprt.c is provided.
Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow transports to hook the retransmit timer interrupt. Some transports
calculate their congestion window here so that a retransmit timeout has
immediate effect on the congestion window.
Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The next method we abstract is the one that releases a transport,
allowing another task to have access to the transport.
Again, one generic version of this is provided for transports that
don't need the RPC client to perform congestion control, and one
version is for transports that can use the original Van Jacobson
implementation in xprt.c.
Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for
significant regression in performance or client stability.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The next several patches introduce an API that allows transports to
choose whether the RPC client provides congestion control or whether
the transport itself provides it.
The first method we abstract is the one that serializes access to the
RPC transport to prevent the bytes from different requests from mingling
together. This method provides proper request serialization and the
opportunity to prevent new requests from being started because the
transport is congested.
The normal situation is for the transport to handle congestion control
itself. Although NFS over UDP was first, it has been recognized after
years of experience that having the transport provide congestion control
is much better than doing it in the RPC client. Thus TCP, and probably
every future transport implementation, will use the default method,
xprt_lock_write, provided in xprt.c, which does not provide any kind
of congestion control. UDP can continue using the xprt.c-provided
Van Jacobson congestion avoidance implementation.
Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Prepare the way to remove the "xprt->nocong" variable by adding a callout
to the RPC client transport switch API to handle setting RPC retransmit
timeouts.
Add a pair of generic helper functions that provide the ability to set a
simple fixed timeout, or to set a timeout based on the state of a round-
trip estimator.
Test-plan:
Use WAN simulation to cause sporadic bursty packet loss. Look for significant
regression in performance or client stability.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Now we can fix up the last few places that use the "xprt->stream"
variable, and get rid of it from the rpc_xprt structure.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a generic mechanism for skipping over transport-specific headers
when constructing an RPC request. This removes another "xprt->stream"
dependency.
Test-plan:
Write-intensive workload on a single mount point (try both UDP and
TCP).
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Split the RPC client's main socket write path into a TCP version and a UDP
version to eliminate another dependency on the "xprt->stream" variable.
Compiler optimization removes unneeded code from xs_sendpages, as this
function is now called with some constant arguments.
We can now cleanly perform transport protocol-specific return code testing
and error recovery in each path.
Test-plan:
Millions of fsx operations. Performance characterization such as
"sio" or "iozone". Examine oprofile results for any changes before and
after this patch is applied.
Version: Thu, 11 Aug 2005 16:08:46 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Create separate connection worker functions for managing UDP and TCP
transport sockets. This eliminates several dependencies on "xprt->stream".
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon with
v2, v3, and v4.
Version: Thu, 11 Aug 2005 16:08:18 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Split the socket write space callback function into a TCP version and UDP
version, eliminating one dependence on the "xprt->stream" variable.
Keep the common pieces of this path in xprt.c so other transports can use
it too.
Test-plan:
Write-intensive workload on a single mount point.
Version: Thu, 11 Aug 2005 16:07:51 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: change some comments to reflect the realities of the new RPC
transport switch mechanism. Get rid of unused xprt_receive() prototype.
Also, organize function prototypes in xprt.h by usage and scope.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Version: Thu, 11 Aug 2005 16:07:21 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: remove only reference to xprt->pending from the socket transport
implementation. This makes a cleaner interface for other transport
implementations as well.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Version: Thu, 11 Aug 2005 16:06:52 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: get rid of unnecessary socket.h and in.h includes in the generic
parts of the RPC client.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Version: Thu, 11 Aug 2005 16:06:23 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: get rid of a name reference to sockets in the generic parts of the
RPC client by renaming the sockstate field in the rpc_xprt structure.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Version: Thu, 11 Aug 2005 16:05:53 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: Replace the xprt_lock with something more aptly named. This lock
single-threads the XID and request slot reservation process.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Version: Thu, 11 Aug 2005 16:05:26 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: replace a name reference to sockets in the generic parts of the RPC
client by renaming sock_lock in the rpc_xprt structure.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Version: Thu, 11 Aug 2005 16:05:00 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reduce stack utilization of the RPC socket transport's send path.
A couple of unlikely()s are added to ensure the compiler places the
tail processing at the end of the csect.
Test-plan:
Millions of fsx operations. Performance characterization such as "sio" or
"iozone".
Version: Thu, 11 Aug 2005 16:04:30 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Introduce block header comments and a function naming convention to the
socket transport implementation. Provide a debug setting for transports
that is separate from RPCDBG_XPRT. Eliminate xprt_default_timeout().
Provide block comments for exposed interfaces in xprt.c, and eliminate
the useless obvious comments.
Convert printk's to dprintk's.
Test-plan:
Compile kernel with CONFIG_NFS enabled.
Version: Thu, 11 Aug 2005 16:04:04 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Move the bulk of client-side socket-specific code into a separate source
file, net/sunrpc/xprtsock.c.
Test-plan:
Millions of fsx operations. Performance characterization such as "sio" or
"iozone". Destructive testing (unplugging the network temporarily, server
reboots). Connectathon with v2, v3, and v4.
Version: Thu, 11 Aug 2005 16:03:38 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean-up: Move some code that is common to both RPC client- and server-side
socket transports into its own source file, net/sunrpc/socklib.c.
Test-plan:
Compile kernel with CONFIG_NFS enabled. Millions of fsx operations over
UDP, client and server. Connectathon over UDP.
Version: Thu, 11 Aug 2005 16:03:09 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The in-kernel portmapper does not require a reserved port for making
bind queries.
Test-plan:
Tens of runs of the Connectathon locking suite with TCP and UDP
against several other NFS server implementations using NFSv3,
not NFSv4 (which doesn't require rpcbind).
Version: Thu, 11 Aug 2005 16:02:43 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Implement a best practice: don't use exponential backoff when computing
retransmit timeout values on TCP connections, but simply retransmit
at regular intervals.
This also fixes a bug introduced when xprt_reset_majortimeo() was added.
Test-plan:
Enable RPC debugging and watch timeout behavior on a NFS/TCP mount.
Version: Thu, 11 Aug 2005 16:02:19 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Implement a best practice: for soft mounts, an rpcbind timeout should
cause an RPC request to fail.
This also provides an FSM hook for retrying an rpcbind with a different
rpcbind protocol version. We'll use this later to try multiple rpcbind
protocol versions when binding. To enable this, expose the RPC error
code returned during a portmap request to the FSM so it can make some
decision about how to report, retry, or fail the request.
Test-plan:
Hundreds of passes with connectathon NFSv3 locking suite, on the client
and server.
Version: Thu, 11 Aug 2005 16:01:53 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Fix up xprt_connect_status: the soft timeout logic was clobbering tk_status,
so TCP connect errors were not properly reported on soft mounts.
Test-plan:
Destructive testing (unplugging the network temporarily). Connectathon
with UDP and TCP.
Version: Thu, 11 Aug 2005 16:01:28 -0400
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Change a printk(KERN_WARNING to dprintk, and it is really only interesting
when trying to debug a problem, and can occur normally without error.
Remove various gratuitous gotos in surrounding code, and remove some
type-cast assignments from inside 'if' conditionals, as that is just
obscuring what it going on.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Use schedule_timeout_{,un}interruptible() instead of
set_current_state()/schedule_timeout() to reduce kernel size. Also use
human-time conversion functions instead of hard-coded division to avoid
rounding issues.
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The sunrpc stats are collected in unsigned integers, but they are printed
with '%d'. That can result in negative numbers in /proc/net/rpc when the
highest bit of a counter is set. The following patch changes '%d' to '%u'
where appropriate.
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
When registering an RPC cache, cache_register() always sets the owner as the
sunrpc module. However, there are RPC caches owned by other modules. With
the incorrect owner setting, the real owning module can be removed potentially
with an open reference to the cache from userspace.
For example, if one were to stop the nfs server and unmount the nfsd
filesystem, the nfsd module could be removed eventhough rpc.idmapd had
references to the idtoname and nametoid caches (i.e.
/proc/net/rpc/nfs4.<cachename>/channel is still open). This resulted in a
system panic on one of our machines when attempting to restart the nfs
services after reloading the nfsd module.
The following patch adds a 'struct module *owner' field in struct
cache_detail. The owner is further assigned to the struct proc_dir_entry
in cache_register() so that the module cannot be unloaded while user-space
daemons have an open reference on the associated file under /proc.
Signed-off-by: Bruce Allan <bwa@us.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This patch is against 2.6.10, but still applies cleanly. It's just
s/driverfs/sysfs/ in these two files.
Signed-off-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Since the patch to add a NULL short-circuit to crypto_free_tfm() went in,
there's no longer any need for callers of that function to check for NULL.
This patch removes the redundant NULL checks and also a few similar checks
for NULL before calls to kfree() that I ran into while doing the
crypto_free_tfm bits.
I've succesfuly compile tested this patch, and a kernel with the patch
applied boots and runs just fine.
When I posted the patch to LKML (and other lists/people on Cc) it drew the
following comments :
J. Bruce Fields commented
"I've no problem with the auth_gss or nfsv4 bits.--b."
Sridhar Samudrala said
"sctp change looks fine."
Herbert Xu signed off on the patch.
So, I guess this is ready to be dropped into -mm and eventually mainline.
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch goes through the current users of the crypto layer and sets
CRYPTO_TFM_REQ_MAY_SLEEP at crypto_alloc_tfm() where all crypto operations
are performed in process context.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch puts mostly read only data in the right section
(read_mostly), to help sharing of these data between CPUS without
memory ping pongs.
On one of my production machine, tcp_statistics was sitting in a
heavily modified cache line, so *every* SNMP update had to force a
reload.
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lots of places just needs the states, not even linux/tcp.h, where this
enum was, needs it.
This speeds up development of the refactorings as less sources are
rebuilt when things get moved from net/tcp.h.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
While I was going through the crypto users recently, I noticed this
bogus kmap in sunrpc. It's totally unnecessary since the crypto
layer will do its own kmap before touching the data. Besides, the
kmap is throwing the return value away.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
We need to divide, not multiply. While we're here,
use NSEC_PER_USEC instead of a magic constant.
Based upon a report from Josip Loncaric and a patch
by Andrew Morton.
Signed-off-by: David S. Miller <davem@davemloft.net>
In __xprt_lock_write() we check to see if `task' is NULL, but in other places
we just go and dereference it.
`task' shouldn't be NULL anyway, so remove this test.
This defect was found automatically by Coverity Prevent, a static analysis
tool.
Signed-off-by: Zaur Kambarov <zkambarov@coverity.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
1. Establish a simple API for process freezing defined in linux/include/sched.h:
frozen(process) Check for frozen process
freezing(process) Check if a process is being frozen
freeze(process) Tell a process to freeze (go to refrigerator)
thaw_process(process) Restart process
frozen_process(process) Process is frozen now
2. Remove all references to PF_FREEZE and PF_FROZEN from all
kernel sources except sched.h
3. Fix numerous locations where try_to_freeze is manually done by a driver
4. Remove the argument that is no longer necessary from two function calls.
5. Some whitespace cleanup
6. Clear potential race in refrigerator (provides an open window of PF_FREEZE
cleared before setting PF_FROZEN, recalc_sigpending does not check
PF_FROZEN).
This patch does not address the problem of freeze_processes() violating the rule
that a task may only modify its own flags by setting PF_FREEZE. This is not clean
in an SMP environment. freeze(process) is therefore not SMP safe!
Signed-off-by: Christoph Lameter <christoph@lameter.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
rpc_create_client was modified recently to do its own (synchronous) NULL ping
of the server. We'd rather do that on our own, asynchronously, so that we
don't have to block the nfsd thread doing the probe, and so that setclientid
handling (hence, client mounts) can proceed normally whether the callback is
succesful or not. (We can still function fine without the callback
channel--we just won't be able to give out delegations till it's verified to
work.)
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Neil Brown <neilb@cse.unsw.edu.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>