Commit Graph

904 Commits

Author SHA1 Message Date
Andi Kleen
6904996101 gcc-4.6: nfsd: fix initialized but not read warnings
Fixes at least one real minor bug: the nfs4 recovery dir sysctl
would not return its status properly.

Also I finished Al's 1e41568d73 ("Take ima_path_check() in nfsd
past dentry_open() in nfsd_open()") commit, it moved the IMA
code, but left the old path initializer in there.

The rest is just dead code removed I think, although I was not
fully sure about the "is_borc" stuff. Some more review
would be still good.

Found by gcc 4.6's new warnings.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29 19:32:17 -04:00
J. Bruce Fields
f9d7562fdb nfsd4: share file descriptors between stateid's
The vfs doesn't really allow us to "upgrade" a file descriptor from
read-only to read-write, and our attempt to do so in nfs4_upgrade_open
is ugly and incomplete.

Move to a different scheme where we keep multiple opens, shared between
open stateid's, in the nfs4_file struct.  Each file will be opened at
most 3 times (for read, write, and read-write), and those opens will be
shared between all clients and openers.  On upgrade we will do another
open if necessary instead of attempting to upgrade an existing open.
We keep count of the number of readers and writers so we know when to
close the shared files.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-07-29 18:19:23 -04:00
J. Bruce Fields
0292191417 nfsd4: fix openmode checking on IO using lock stateid
It is legal to perform a write using the lock stateid that was
originally associated with a read lock, or with a file that was
originally opened for read, but has since been upgraded.

So, when checking the openmode, check the mode associated with the
open stateid from which the lock was derived.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29 16:37:12 -04:00
J. Bruce Fields
21fb4016bd nfsd4: miscellaneous process_open2 cleanup
Move more work into helper functions.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29 16:34:29 -04:00
J. Bruce Fields
c3e4808086 nfsd4: don't pretend to support write delegations
The delegation code mostly pretends to support either read or write
delegations.  However, correct support for write delegations would
require, for example, breaking of delegations (and/or implementation of
cb_getattr) on stat.  Currently all that stops us from handing out
delegations is a subtle reference-counting issue.

Avoid confusion by adding an earlier check that explicitly refuses write
delegations.

For now, though, I'm not going so far as to rip out existing
half-support for write delegations, in case we get around to using that
soon.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-29 16:05:51 -04:00
J. Bruce Fields
fa0a21269f nfsd: bypass readahead cache when have struct file
The readahead cache compensates for the fact that the NFS server
currently does an open and close on every IO operation in the NFSv2 and
NFSv3 case.

In the NFSv4 case we have long-lived struct files associated with client
opens, so there's no need for this.  In fact, concurrent IO's using
trying to modify the same file->f_ra may cause problems.

So, don't bother with the readahead cache in that case.

Note eventually we'll likely do this in the v2/v3 case as well by
keeping a cache of struct files instead of struct file_ra_state's.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-27 18:15:54 -04:00
J. Bruce Fields
af4718f3f9 nfsd: minor nfsd_svc() cleanup
More idiomatic to put the error case in the if clause.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:51:27 -04:00
J. Bruce Fields
59db4a0c10 nfsd: move more into nfsd_startup()
This is just cleanup--it's harmless to call nfsd_rachache_init,
nfsd_init_socks, and nfsd_reset_versions more than once.  But there's no
point to it.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:51:26 -04:00
Jeff Layton
ac77efbe2b nfsd: just keep single lockd reference for nfsd
Right now, nfsd keeps a lockd reference for each socket that it has
open. This is unnecessary and complicates the error handling on
startup and shutdown. Change it to just do a lockd_up when starting
the first nfsd thread just do a single lockd_down when taking down the
last nfsd thread. Because of the strange way the sv_count is handled
this requires an extra flag to tell whether the nfsd_serv holds a
reference for lockd or not.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:51:26 -04:00
Jeff Layton
628b368728 nfsd: clean up nfsd_create_serv error handling
There doesn't seem to be any need to reset the nfssvc_boot time if the
nfsd startup failed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:51:25 -04:00
Jeff Layton
0cd14a061e nfsd: fix error handling in __write_ports_addxprt
__write_ports_addxprt calls nfsd_create_serv. That increases the
refcount of nfsd_serv (which is tracked in sv_nrthreads). The service
only decrements the thread count on error, not on success like
__write_ports_addfd does, so using this interface leaves the nfsd
thread count high.

Fix this by having this function call svc_destroy() on error to release
the reference (and possibly to tear down the service) and simply
decrement the refcount without tearing down the service on success.

This makes the sv_threads handling work basically the same in both
__write_ports_addxprt and __write_ports_addfd.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:51:24 -04:00
Jeff Layton
78a8d7c8ca nfsd: fix error handling when starting nfsd with rpcbind down
The refcounting for nfsd is a little goofy. What happens is that we
create the nfsd RPC service, attach sockets to it but don't actually
start the threads until someone writes to the "threads" procfile. To do
this, __write_ports_addfd will create the nfsd service and then will
decrement the refcount when exiting but won't actually destroy the
service.

This is fine when there aren't errors, but when there are this can
cause later attempts to start nfsd to fail. nfsd_serv will be set,
and that causes __write_versions to return EBUSY.

Fix this by calling svc_destroy on nfsd_serv when this function is
going to return error.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:51:23 -04:00
Jeff Layton
4ad9a344be nfsd4: fix v4 state shutdown error paths
If someone tries to shut down the laundry_wq while it isn't up it'll
cause an oops.

This can happen because write_ports can create a nfsd_svc before we
really start the nfs server, and we may fail before the server is ever
started.

Also make sure state is shutdown on error paths in nfsd_svc().

Use a common global nfsd_up flag instead of nfs4_init, and create common
helper functions for nfsd start/shutdown, as there will be other work
that we want done only when we the number of nfsd threads transitions
between zero and nonzero.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:51:22 -04:00
J. Bruce Fields
55b13354d7 nfsd: remove unused assignment from nfsd_link
Trivial cleanup, since "dest" is never used.

Reported-by: Anshul Madan <Anshul.Madan@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-07-23 08:50:39 -04:00
Chuck Lever
43a9aa64a2 NFSD: Fill in WCC data for REMOVE, RMDIR, MKNOD, and MKDIR
Some well-known NFSv3 clients drop their directory entry caches when
they receive replies with no WCC data.  Without this data, they
employ extra READ, LOOKUP, and GETATTR requests to ensure their
directory entry caches are up to date, causing performance to suffer
needlessly.

In order to return WCC data, our server has to have both the pre-op
and the post-op attribute data on hand when a reply is XDR encoded.
The pre-op data is filled in when the incoming fh is locked, and the
post-op data is filled in when the fh is unlocked.

Unfortunately, for REMOVE, RMDIR, MKNOD, and MKDIR, the directory fh
is not unlocked until well after the reply has been XDR encoded.  This
means that encode_wcc_data() does not have wcc_data for the parent
directory, so none is returned to the client after these operations
complete.

By unlocking the parent directory fh immediately after the internal
operations for each NFS procedure is complete, the post-op data is
filled in before XDR encoding starts, so it can be returned to the
client properly.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-07-07 17:12:32 -04:00
J. Bruce Fields
6a85d6c769 nfsd4: comment nitpick
Reported-by: "Madan, Anshul" <Anshul.Madan@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-07-06 12:40:22 -04:00
J. Bruce Fields
cba9ba4b90 nfsd4: fix delegation recall race use-after-free
When the rarely-used callback-connection-changing setclientid occurs
simultaneously with a delegation recall, we rerun the recall by
requeueing it on a workqueue.  But we also need to take a reference on
the delegation in that case, since the delegation held by the rpc itself
will be released by the rpc_release callback.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-24 12:24:55 -04:00
J. Bruce Fields
ac94bf5825 nfsd4: fix deleg leak on callback error
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-24 12:24:53 -04:00
J. Bruce Fields
ec8acac84a nfsd4: remove some debugging code
This is overkill.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-22 22:29:03 -04:00
Benny Halevy
9303bbd3de nfsd: nfs4callback encode_stateid helper function
To be used also for the pnfs cb_layoutrecall callback

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd4: fix cb_recall encoding]
    "nfsd: nfs4callback encode_stateid helper function" forgot to reserve
    more space after return from the new helper.
Reported-by: Michael Groshans <groshans@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-22 17:19:51 -04:00
J. Bruce Fields
4731030d58 nfsd4: translate memory errors to delay, not serverfault
If the server is out of memory is better for clients to back off and
retry than to just error out.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-22 17:19:36 -04:00
J. Bruce Fields
76407f76e0 nfsd4; fix session reference count leak
Note the session has to be put() here regardless of what happens to the
client.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-06-22 17:19:28 -04:00
J. Bruce Fields
68a4b48ce6 nfsd4: don't bother storing callback reply tag
We don't use this, and probably never will.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-31 12:43:59 -04:00
J. Bruce Fields
24a0111e40 nfsd4: fix use of op_share_access
NFSv4.1 adds additional flags to the share_access argument of the open
call.  These flags need to be masked out in some of the existing code,
but current code does that inconsistently.

Tested-by: Michael Groshans <groshans@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-31 12:43:55 -04:00
J. Bruce Fields
172c85dd57 nfsd4: treat more recall errors as failures
If a recall fails for some unexpected reason, instead of ignoring it and
treating it like a success, it's safer to treat it as a failure,
preventing further delgation grants and returning CB_PATH_DOWN.

Also put put switches in a (two me) more logical order, with normal case
first.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-31 12:43:53 -04:00
J. Bruce Fields
378b7d37f9 nfsd4: remove extra put() on callback errors
Since rpc_call_async() guarantees that the release method will be called
even on failure, this put is wrong.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-31 12:43:51 -04:00
Alexey Dobriyan
4be929be34 kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Christoph Hellwig
8018ab0574 sanitize vfs_fsync calling conventions
Now that the last user passing a NULL file pointer is gone we can remove
the redundant dentry argument and associated hacks inside vfs_fsynmc_range.

The next step will be removig the dentry argument from ->fsync, but given
the luck with the last round of method prototype changes I'd rather
defer this until after the main merge window.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:21 -04:00
Christoph Hellwig
e970a573ce nfsd: open a file descriptor for fsync in nfs4 recovery
Instead of just looking up a path use do_filp_open to get us a file
structure for the nfs4 recovery directory.  This allows us to get
rid of the last non-standard vfs_fsync caller with a NULL file
pointer.

[AV: should be using fput(), not filp_close()]

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-05-21 18:31:21 -04:00
J. Bruce Fields
e4e83ea47b Revert "nfsd4: distinguish expired from stale stateids"
This reverts commit 78155ed75f.

We're depending here on the boot time that we use to generate the
stateid being monotonic, but get_seconds() is not necessarily.

We still depend at least on boot_time being different every time, but
that is a safer bet.

We have a few reports of errors that might be explained by this problem,
though we haven't been able to confirm any of them.

But the minor gain of distinguishing expired from stale errors seems not
worth the risk.

Conflicts:

	fs/nfsd/nfs4state.c

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18 19:03:50 -04:00
Pavel Emelyanov
47cee541a4 nfsd: safer initialization order in find_file()
The alloc_init_file() first adds a file to the hash and then
initializes its fi_inode, fi_id and fi_had_conflict.

The uninitialized fi_inode could thus be erroneously checked by
the find_file(), so move the hash insertion lower.

The client_mutex should prevent this race in practice; however, we
eventually hope to make less use of the client_mutex, so the ordering
here is an accident waiting to happen.

I didn't find whether the same can be true for two other fields,
but the common sense tells me it's better to initialize an object
before putting it into a global hash table :)

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18 12:05:20 -04:00
J. Bruce Fields
b7299f4439 nfs4: minor callback code simplification, comment
Note the position in the version array doesn't have to match the actual
rpc version number--to me it seems clearer to maintain the distinction.

Also document choice of rpc callback version number, as discussed in
e.g. http://www.ietf.org/mail-archive/web/nfsv4/current/msg07985.html
and followups.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-18 11:51:38 -04:00
Pavel Emelyanov
15ddb4aec5 NFSD: don't report compiled-out versions as present
The /proc/fs/nfsd/versions file calls nfsd_vers() to check whether
the particular nfsd version is present/available. The problem is
that once I turn off e.g. NFSD-V4 this call returns -1 which is
true from the callers POV which is wrong.

The proposal is to report false in that case.

The bug has existed since 6658d3a7bb "[PATCH] knfsd: remove
nfsd_versbits as intermediate storage for desired versions".

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: stable@kernel.org
Acked-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-14 18:46:14 -04:00
J. Bruce Fields
4dc6ec00f6 nfsd4: implement reclaim_complete
This is a mandatory operation.  Also, here (not in open) is where we
should be committing the reboot recovery information.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13 12:03:11 -04:00
Benny Halevy
ab707e1565 nfsd4: nfsd4_destroy_session must set callback client under the state lock
nfsd4_set_callback_client must be called under the state lock to atomically
set or unset the callback client and shutting down the previous one.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13 11:59:11 -04:00
Benny Halevy
d76829889a nfsd4: keep a reference count on client while in use
Get a refcount on the client on SEQUENCE,
Release the refcount and renew the client when all respective compounds completed.
Do not expire the client by the laundromat while in use.
If the client was expired via another path, free it when the compounds
complete and the refcount reaches 0.

Note that unhash_client_locked must call list_del_init on cl_lru as
it may be called twice for the same client (once from nfs4_laundromat
and then from expire_client)

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13 11:58:54 -04:00
Benny Halevy
07cd4909a6 nfsd4: mark_client_expired
Mark the client as expired under the client_lock so it won't be renewed
when an nfsv4.1 session is done, after it was explicitly expired
during processing of the compound.

Do not renew a client mark as expired (in particular, it is not
on the lru list anymore)

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13 11:47:22 -04:00
Benny Halevy
46583e2597 nfsd4: introduce nfs4_client.cl_refcount
Currently just initialize the cl_refcount to 1
and decrement in expire_client(), conditionally freeing the
client when the refcount reaches 0.

To be used later by nfsv4.1 compounds to keep the client from
timing out while in use.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-13 11:47:03 -04:00
Benny Halevy
84d38ac9ab nfsd4: refactor expire_client
Separate out unhashing of the client and session.
To be used later by the laundromat.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11 21:02:02 -04:00
Benny Halevy
36acb66bda nfsd4: extend the client_lock to cover cl_lru
To be used later on to hold a reference count on the client while in use by a
nfsv4.1 compound.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11 21:02:02 -04:00
Benny Halevy
328efbab0f nfsd4: use list_move in move_to_confirmed
rather than list_del_init, list_add

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11 21:02:01 -04:00
Benny Halevy
be1fdf6c43 nfsd4: fold release_session into expire_client
and grab the client lock once for all the client's sessions.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11 21:02:01 -04:00
Benny Halevy
9089f1b478 nfsd4: rename sessionid_lock to client_lock
In preparation to share the lock's scope to both client
and session hash tables.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-11 21:02:01 -04:00
J. Bruce Fields
5d4cec2f2f nfsd4: fix bare destroy_session null dereference
It's legal to send a DESTROY_SESSION outside any session (as the only
operation in a compound), in which case cstate->session will be NULL;
check for that case.

While we're at it, move these checks into a separate helper function.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-07 19:08:47 -04:00
J. Bruce Fields
5306293c9c Merge commit 'v2.6.34-rc6'
Conflicts:
	fs/nfsd/nfs4callback.c
2010-05-04 11:29:05 -04:00
Benny Halevy
dbd65a7e44 nfsd4: use local variable in nfs4svc_encode_compoundres
'cs' is already computed, re-use it.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-05-04 10:10:36 -04:00
J. Bruce Fields
26c0c75e69 nfsd4: fix unlikely race in session replay case
In the replay case, the

	renew_client(session->se_client);

happens after we've droppped the sessionid_lock, and without holding a
reference on the session; so there's nothing preventing the session
being freed before we get here.

Thanks to Benny Halevy for catching a bug in an earlier version of this
patch.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Acked-by: Benny Halevy <bhalevy@panasas.com>
2010-05-03 08:32:31 -04:00
Neil Brown
2bc3c1179c nfsd4: bug in read_buf
When read_buf is called to move over to the next page in the pagelist
of an NFSv4 request, it sets argp->end to essentially a random
number, certainly not an address within the page which argp->p now
points to.  So subsequent calls to READ_BUF will think there is much
more than a page of spare space (the cast to u32 ensures an unsigned
comparison) so we can expect to fall off the end of the second
page.

We never encountered thsi in testing because typically the only
operations which use more than two pages are write-like operations,
which have their own decoding logic.  Something like a getattr after a
write may cross a page boundary, but it would be very unusual for it to
cross another boundary after that.

Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-26 15:39:08 -04:00
Dan Carpenter
d03859a4ac nfsd: potential ERR_PTR dereference on exp_export() error paths.
We "goto finish" from several places where "exp" is an ERR_PTR.  Also I
changed the check for "fsid_key" so that it was consistent with the check
I added.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-22 12:03:02 -04:00
J. Bruce Fields
5771635592 nfsd4: complete enforcement of 4.1 op ordering
Enforce the rules about compound op ordering.

Motivated by implementing RECLAIM_COMPLETE, for which the client is
implicit in the current session, so it is important to ensure a
succesful SEQUENCE proceeds the RECLAIM_COMPLETE.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2010-04-22 11:35:14 -04:00