2189 Commits

Author SHA1 Message Date
Trond Myklebust
141aeb9f26 NFSv4: Fix two unbalanced put_rpccred() issues.
Commits 29fba38b (nfs41: lease renewal) and fc01cea9 (nfs41: sequence
operation) introduce a couple of put_rpccred() calls on credentials for
which there is no corresponding get_rpccred().

See http://bugzilla.kernel.org/show_bug.cgi?id=14249

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-26 08:09:46 -04:00
Trond Myklebust
52567b03ca NFSv4: Fix a bug when the server returns NFS4ERR_RESOURCE
RFC 3530 states that when we receive the error NFS4ERR_RESOURCE, we are not
supposed to bump the sequence number on OPEN, LOCK, LOCKU, CLOSE, etc
operations. The problem is that we map that error into EREMOTEIO in the XDR
layer, and so the NFSv4 middle-layer routines like seqid_mutating_err(),
and nfs_increment_seqid() don't recognise it.

The fix is to defer the mapping until after the middle layers have
processed the error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-23 14:46:42 -04:00
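
The ordering problem described in 52567b03ca can be sketched in user space as follows. This is only an illustration, not the kernel code: `sketch_seqid_mutating()` and `sketch_map_error()` are hypothetical stand-ins for the middle-layer check and the XDR-layer mapping, and only the NFS4ERR_RESOURCE case is modelled.

```c
#include <errno.h>
#include <stdbool.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121 /* Linux value; fallback for platforms that lack it */
#endif

/* NFSv4 status code as defined in RFC 3530. */
#define NFS4ERR_RESOURCE 10018

/* Hypothetical middle-layer check: does this result bump the seqid?
 * Only the NFS4ERR_RESOURCE case from the commit message is modelled. */
static bool sketch_seqid_mutating(int status)
{
    return status != -NFS4ERR_RESOURCE;
}

/* Hypothetical mapping from a protocol status to a local errno. */
static int sketch_map_error(int status)
{
    return status == -NFS4ERR_RESOURCE ? -EREMOTEIO : status;
}

/* Old order: the error is mapped first, so the middle layer only sees
 * -EREMOTEIO and no longer recognises NFS4ERR_RESOURCE. */
static bool bumps_seqid_early_mapping(int status)
{
    return sketch_seqid_mutating(sketch_map_error(status));
}

/* New order: the middle layer sees the raw status; mapping is deferred. */
static bool bumps_seqid_deferred_mapping(int status, int *mapped)
{
    bool bump = sketch_seqid_mutating(status);

    *mapped = sketch_map_error(status);
    return bump;
}
```

With the early mapping the sequence number would be bumped for NFS4ERR_RESOURCE; with the deferred mapping it is not, and the caller still ends up with EREMOTEIO.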
Terry Loftin
a8b40bc7e6 nfs: Panic when commit fails
Actually pass the NFS_FILE_SYNC option to the server to avoid a
Panic in nfs_direct_write_complete() when a commit fails.

At the end of an nfs write, if the nfs commit fails, all the writes
will be rescheduled.  They are supposed to be rescheduled as NFS_FILE_SYNC
writes, but the rpc_task structure is not completely initialized and so
the option is not passed.  When the rescheduled writes complete, the
return indicates that they are NFS_UNSTABLE and we try to do another
commit.  This leads to a Panic because the commit data structure pointer
was set to null in the initial (failed) commit attempt.

Signed-off-by: Terry Loftin <terry.loftin@hp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-23 14:16:30 -04:00
Yinghai Lu
4223a4a155 nfs: Fix nfs_parse_mount_options() kfree() leak
Fix a (small) memory leak in one of the error paths of the NFS mount
options parsing code.

Regression introduced in 2.6.30 by commit a67d18f (NFS: load the
rpc/rdma transport module automatically).

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-22 08:15:23 +09:00
Stefan Richter
a1be9eee29 NFS: suppress a build warning
struct sockaddr_storage * can safely be used as struct sockaddr *.
Suppress an "incompatible pointer type" warning.

Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-12 10:25:12 -07:00
Trond Myklebust
3050141bae NFSv4: Kill nfs4_renewd_prepare_shutdown()
The NFSv4 renew daemon is shared between all active super blocks that refer
to a particular NFS server, so it is wrong to be shutting it down in
nfs4_kill_super every time a super block is destroyed.

This patch therefore kills nfs4_renewd_prepare_shutdown altogether, and
leaves it up to nfs4_shutdown_client() to also shut down the renew daemon
by means of the existing call to nfs4_kill_renewd().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-08 11:50:55 -04:00
Trond Myklebust
517be09def NFSv4: Fix the referral mount code
Fix a typo which causes try_location() to use the wrong length argument
when calling nfs_parse_server_name(). This, in turn, causes the initialisation
of the mount's sockaddr structure to fail.

Also ensure that if nfs4_pathname_string() returns an error, then we pass
that error back up the stack instead of ENOENT.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-06 15:42:20 -04:00
Ben Hutchings
f4373bf9e6 nfs: Avoid overrun when copying client IP address string
As seen in <http://bugs.debian.org/549002>, nfs4_init_client() can
overrun the source string when copying the client IP address from
nfs_parsed_mount_data::client_address to nfs_client::cl_ipaddr.  Since
these are both treated as null-terminated strings elsewhere, the copy
should be done with strlcpy() not memcpy().

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-06 15:42:18 -04:00
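
The difference between the two copy styles in f4373bf9e6 can be illustrated with a small user-space sketch. `sketch_strlcpy()` is a hypothetical stand-in written for this example (the kernel provides its own strlcpy()); the point is that a string-aware copy stops at the source terminator, whereas a fixed-size memcpy() reads the full destination size from the source regardless of how short the source string is.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's strlcpy(): copies at most
 * size - 1 bytes, always NUL-terminates when size > 0, and returns
 * strlen(src). It never reads past the terminator of src. */
static size_t sketch_strlcpy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len >= size) ? size - 1 : len;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
```

A return value greater than or equal to the destination size tells the caller that the string was truncated.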
Trond Myklebust
bcd2ea17da NFS: Fix port initialisation in nfs_remount()
The recent changeset 53a0b9c4c9 (NFS: Replace
nfs_parse_ip_address() with rpc_pton()) broke nfs_remount, since the call
to rpc_pton() will zero out the port number in data->nfs_server.address.

This is actually due to a bug in nfs_remount: it should be looking at the
port number in nfs_server.port instead...

This fixes bug
   http://bugzilla.kernel.org/show_bug.cgi?id=14276

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-06 15:41:22 -04:00
Trond Myklebust
f5855fecda NFS: Fix port and mountport display in /proc/self/mountinfo
Currently, the port and mount port will both display as 65535 if you do not
specify a port number. That would be wrong...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-06 15:40:37 -04:00
Trond Myklebust
c5811dbdd2 NFS: Fix a default mount regression...
With the recent spate of changes, the nfs protocol version will now default
to 2 instead of 3, while the mount protocol version defaults to 3.

The following patch should ensure the defaults are consistent with the
previous defaults of vers=3,proto=tcp,mountvers=3,mountproto=tcp.

This fixes the bug
   http://bugzilla.kernel.org/show_bug.cgi?id=14259

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-10-06 15:40:15 -04:00
Christoph Lameter
fce22848a1 this_cpu: Use this_cpu operations for NFS statistics
Simplify NFS statistics and allow the use of optimized
arch instructions.

Acked-by: Tejun Heo <tj@kernel.org>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2009-10-03 19:48:22 +09:00
Alexey Dobriyan
f0f37e2f77 const: mark struct vm_struct_operations
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-27 11:39:25 -07:00
Al Viro
36dd2fdb37 nfs[23] tcp breakage in mount with binary options
We forget to set nfs_server.protocol in tcp case when old-style binary
options are passed to mount.  The thing remains zero and never validated
afterwards.  As the result, we hit BUG in fs/nfs/client.c:588.

Breakage has been introduced in NFS: Add nfs_alloc_parsed_mount_data
merged yesterday...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-24 14:58:42 -04:00
Linus Torvalds
6c5daf012c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  truncate: use new helpers
  truncate: new helpers
  fs: fix overflow in sys_mount() for in-kernel calls
  fs: Make unload_nls() NULL pointer safe
  freeze_bdev: grab active reference to frozen superblocks
  freeze_bdev: kill bd_mount_sem
  exofs: remove BKL from super operations
  fs/romfs: correct error-handling code
  vfs: seq_file: add helpers for data filling
  vfs: remove redundant position check in do_sendfile
  vfs: change sb->s_maxbytes to a loff_t
  vfs: explicitly cast s_maxbytes in fiemap_check_ranges
  libfs: return error code on failed attr set
  seq_file: return a negative error code when seq_path_root() fails.
  vfs: optimize touch_time() too
  vfs: optimization for touch_atime()
  vfs: split generic_forget_inode() so that hugetlbfs does not have to copy it
  fs/inode.c: add dev-id and inode number for debugging in init_special_inode()
  libfs: make simple_read_from_buffer conventional
2009-09-24 08:32:11 -07:00
Linus Torvalds
db16826367 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6
* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
  HWPOISON: Enable error_remove_page on btrfs
  HWPOISON: Add simple debugfs interface to inject hwpoison on arbitrary PFNs
  HWPOISON: Add madvise() based injector for hardware poisoned pages v4
  HWPOISON: Enable error_remove_page for NFS
  HWPOISON: Enable .remove_error_page for migration aware file systems
  HWPOISON: The high level memory error handler in the VM v7
  HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
  HWPOISON: shmem: call set_page_dirty() with locked page
  HWPOISON: Define a new error_remove_page address space op for async truncation
  HWPOISON: Add invalidate_inode_page
  HWPOISON: Refactor truncate to allow direct truncating of page v2
  HWPOISON: check and isolate corrupted free pages v2
  HWPOISON: Handle hardware poisoned pages in try_to_unmap
  HWPOISON: Use bitmask/action code for try_to_unmap behaviour
  HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
  HWPOISON: Add poison check to page fault handling
  HWPOISON: Add basic support for poisoned pages in fault handler v3
  HWPOISON: Add new SIGBUS error codes for hardware poison signals
  HWPOISON: Add support for poison swap entries v2
  HWPOISON: Export some rmap vma locking to outside world
  ...
2009-09-24 07:53:22 -07:00
npiggin@suse.de
c08d3b0e33 truncate: use new helpers
Update some fs code to make use of new helper functions introduced
in the previous patch. Should be no significant change in behaviour
(except CIFS now calls send_sig under i_lock, via inode_newsize_ok).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-nfs@vger.kernel.org
Cc: Trond.Myklebust@netapp.com
Cc: linux-cifs-client@lists.samba.org
Cc: sfrench@samba.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-09-24 08:41:47 -04:00
Alexey Dobriyan
2bcd57ab61 headers: utsname.h redux
* remove asm/atomic.h inclusion from linux/utsname.h --
  not needed after kref conversion
* remove linux/utsname.h inclusion from files which do not need it

NOTE: it looks like fs/binfmt_elf.c does not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 18:13:10 -07:00
David Howells
2df5480638 NFS: Propagate 'fsc' mount option through automounts
Propagate the NFS 'fsc' mount option through NFS automounts of various types.

This is now required as commit:

	commit c02d7adf8c
	Author: Trond Myklebust <Trond.Myklebust@netapp.com>
	Date:   Mon Jun 22 15:09:14 2009 -0400

	NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace

uses VFS-driven automounting to reach all submounts barring the root, thus
preventing fscaching from being enabled on any submount other than the root.

This patch gets around that by propagating the NFS_OPTION_FSCACHE flag across
automounts.  If a uniquifier is supplied to a mount then this is propagated to
all automounts of that mount too.

Signed-off-by: David Howells <dhowells@redhat.com>
[Trond: Fixed up the definition of nfs_fscache_get_super_cookie for the
        case of #undef CONFIG_NFS_FSCACHE]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-23 14:36:39 -04:00
Chuck Lever
9423a08ad5 NFS: Add nfs_alloc_parsed_mount_data
Allocating nfs_parsed_mount_data and setting up the defaults is nearly
the same for both nfs and nfs4 mounts.

Both paths seem to use nfs_validate_transport_protocol(), so setting a
default value for nfs_server.protocol ought to be unnecessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-23 14:36:38 -04:00
Trond Myklebust
8a6e5deb8a NFS: Get rid of the NFS_MOUNT_VER3 and NFS_MOUNT_TCP flags
Keep it in the case of the legacy binary mount interface, but purge it from
the nfs_server structure.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-23 14:36:37 -04:00
James Morris
88e9d34c72 seq_file: constify seq_operations
Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 07:39:29 -07:00
Linus Torvalds
342ff1a1b5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  trivial: fix typo in aic7xxx comment
  trivial: fix comment typo in drivers/ata/pata_hpt37x.c
  trivial: typo in kernel-parameters.txt
  trivial: fix typo in tracing documentation
  trivial: add __init/__exit macros in drivers/gpio/bt8xxgpio.c
  trivial: add __init macro/ fix of __exit macro location in ipmi_poweroff.c
  trivial: remove unnecessary semicolons
  trivial: Fix duplicated word "options" in comment
  trivial: kbuild: remove extraneous blank line after declaration of usage()
  trivial: improve help text for mm debug config options
  trivial: doc: hpfall: accept disk device to unload as argument
  trivial: doc: hpfall: reduce risk that hpfall can do harm
  trivial: SubmittingPatches: Fix reference to renumbered step
  trivial: fix typos "man[ae]g?ment" -> "management"
  trivial: media/video/cx88: add __init/__exit macros to cx88 drivers
  trivial: fix typo in CONFIG_DEBUG_FS in gcov doc
  trivial: fix missing printk space in amd_k7_smp_check
  trivial: fix typo s/ketymap/keymap/ in comment
  trivial: fix typo "to to" in multiple files
  trivial: fix typos in comments s/DGBU/DBGU/
  ...
2009-09-22 07:51:45 -07:00
Alexey Dobriyan
6aed62853c const: make file_lock_operations const
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-22 07:17:25 -07:00
Jens Axboe
48d0764998 nfs: initialize the backing_dev_info when creating the server
NFS may free the server structure without ever having used the
bdi, so we either need to flag the bdi as being uninitialized or
initialize it up front. This does the latter.

This fixes a crash with mounting more than one NFS file system,
should people ever need that kind of obscure NFS functionality.

Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-21 15:40:33 +02:00
Jens Axboe
92f25053c0 nfs: nfs_kill_super() should call bdi_unregister() after killing super
Otherwise we could be attempting to flush data for a writeback
thread and bdi that have already disappeared.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-21 15:40:32 +02:00
Joe Perches
a419aef8b8 trivial: remove unnecessary semicolons
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-09-21 15:14:58 +02:00
Jens Axboe
32a88aa1b6 fs: Assign bdi in super_block
We do this automatically in get_sb_bdev() from the set_bdev_super()
callback. Filesystems that have their own private backing_dev_info
must assign that in ->fill_super().

Note that ->s_bdi assignment is required for proper writeback!

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16 15:18:51 +02:00
Jens Axboe
1fe06ad892 writeback: get rid of wbc->for_writepages
It's only set, it's never checked. Kill it.

Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-16 15:16:18 +02:00
Andi Kleen
f590f333fb HWPOISON: Enable error_remove_page for NFS
Enable hardware memory error handling for NFS

Truncation of data pages at runtime should be safe in NFS,
even when it doesn't support migration so far.

Trond tells me migration is also queued up for 2.6.32.

Acked-by: Trond.Myklebust@netapp.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
2009-09-16 11:50:17 +02:00
Trond Myklebust
ab3bbaa8b2 Merge branch 'nfs-for-2.6.32' 2009-09-11 14:59:37 -04:00
Jens Axboe
d993831fa7 writeback: add name to backing_dev_info
This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 09:20:26 +02:00
Trond Myklebust
2ecda72b49 NFSv4: Disallow 'mount -t nfs4 -overs=2' and 'mount -t nfs4 -overs=3'
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08 19:50:07 -04:00
Chuck Lever
764302ccb8 NFS: Allow the "nfs" file system type to support NFSv4
When mounting an "nfs" type file system, recognize "v4," "vers=4," or
"nfsvers=4" mount options, and convert the file system to "nfs4" under
the covers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[trondmy: fixed up binary mount code so it sets the 'version' field too]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08 19:50:03 -04:00
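
As a rough illustration of the option matching described in 764302ccb8, the following user-space sketch scans a comma-separated option string for any of the three spellings. The function name `sketch_options_select_v4()` is hypothetical, and the kernel's own mount parser is table-driven rather than written like this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the comma-separated option list contains "v4",
 * "vers=4" or "nfsvers=4" as a complete option. */
static bool sketch_options_select_v4(const char *options)
{
    const char *p = options;

    while (*p) {
        size_t n = strcspn(p, ","); /* length of the current option */

        if ((n == 2 && strncmp(p, "v4", 2) == 0) ||
            (n == 6 && strncmp(p, "vers=4", 6) == 0) ||
            (n == 9 && strncmp(p, "nfsvers=4", 9) == 0))
            return true;

        p += n;
        if (*p == ',')
            p++; /* skip the separator */
    }
    return false;
}
```

Comparing the option length as well as its text keeps options such as "mountvers=4" from matching by accident.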
Chuck Lever
a6fe23be90 NFS: Move details of nfs4_get_sb() to a helper
Clean up: Refactor nfs4_get_sb() to allow its guts to be invoked by
nfs_get_sb().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08 19:50:00 -04:00
Chuck Lever
7630c852e1 NFS: Refactor NFSv4 text-based mount option validation
Clean up: Refactor the part of nfs4_validate_mount_options() that
handles text-based options, so we can call it from the NFSv2/v3
option validation function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08 19:49:57 -04:00
Chuck Lever
4cfd74fc99 NFS: Mount option parser should detect missing "port="
The meaning of not specifying the "port=" mount option is different
for "-t nfs" and "-t nfs4" mounts.  The default port value for
NFSv2/v3 mounts is 0, but the default for NFSv4 mounts is 2049.

To support "-t nfs -o vers=4", the mount option parser must detect
when "port=" is missing so that the correct default port value can be
set depending on which NFS version is requested.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08 19:49:47 -04:00
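
One way to picture the "missing port=" detection from 4cfd74fc99 is a sentinel value that is only resolved once the NFS version is known. This is a sketch under assumptions: `SKETCH_PORT_UNSPEC`, `struct sketch_mount_opts` and `sketch_effective_port()` are hypothetical names, and only the default values (0 for NFSv2/v3, 2049 for NFSv4) come from the commit message.

```c
/* Hypothetical sentinel meaning "no port= option was given". */
#define SKETCH_PORT_UNSPEC (-1)

struct sketch_mount_opts {
    int version; /* 2, 3 or 4 */
    int port;    /* SKETCH_PORT_UNSPEC until a "port=" option is parsed */
};

/* Resolve the effective port once the NFS version is known. */
static int sketch_effective_port(const struct sketch_mount_opts *opts)
{
    if (opts->port != SKETCH_PORT_UNSPEC)
        return opts->port;                /* an explicit port= always wins */
    return opts->version == 4 ? 2049 : 0; /* defaults from the commit message */
}
```

The sentinel is what distinguishes "port=0 was requested" from "nothing was requested", which a plain zero-initialised field cannot do.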
Harshula Jayasuriya
dbab8360ed NFS: out of date comment regarding O_EXCL above nfs3_proc_create()
Hi Trond,

Recently we were observing the behaviour difference between a 2.4.x and
2.6.x kernel with respect to O_EXCL. A comment from 2.4.x era, "For now,
we don't implement O_EXCL." seems inaccurate in TOT.

If so, here's a patch to remove the comment.

This patch is against:
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6

Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-09-08 19:49:33 -04:00
Trond Myklebust
7111dc7392 NFSv4: Fix an infinite looping problem with the nfs4_state_manager
Commit 76db6d9500 (nfs41: add session setup
to the state manager) introduces an infinite loop possibility in the NFSv4
state manager. By first checking nfs4_has_session() before clearing the
NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets
that flag, but it never gets cleared, and so the state manager loops.

In fact commit c3fad1b1aa (nfs41: add session
reset to state manager) causes this to happen every time we get a network
partition error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-24 16:28:42 -07:00
Chuck Lever
5eecfde615 NFS: Handle a zero-length auth flavor list
Some releases of Linux rpc.mountd (nfs-utils 1.1.4 and later) return an
empty auth flavor list if no sec= was specified for the export.  This is
notably broken server behavior.

The new auth flavor list checking added in a recent commit rejects this
case.  The OpenSolaris client does too.

The broken mountd implementation is already widely deployed.  To avoid
a behavioral regression, the kernel's mount client skips flavor checking
(ie reverts to the pre-2.6.32 behavior) if mountd returns an empty
flavor list.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-23 23:43:57 -04:00
Jan Kara
e1af88a1ad nfs: Remove reference to generic_osync_inode from a comment
generic_file_direct_write() no longer calls generic_osync_inode() so remove the
comment.

CC: linux-nfs@vger.kernel.org
CC: Neil Brown <neilb@suse.de>
CC: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-19 19:48:08 -04:00
Trond Myklebust
7d7ea88289 NFS: Use the DNS resolver in the mount code.
In the referral code, use it to look up the new server's ip address if the
fs_locations attribute contains a hostname.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-19 18:22:15 -04:00
Trond Myklebust
e571cbf1a4 NFS: Add a dns resolver for use with NFSv4 referrals and migration
The NFSv4 and NFSv4.1 protocols both allow for the redirection of a client
from one server to another in order to support filesystem migration and
replication. For full protocol support, we need to add the ability to
convert a DNS host name into an IP address that we can feed to the RPC
client.

We'll reuse the sunrpc cache, now that it has been converted to work with
rpc_pipefs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-19 18:22:15 -04:00
Trond Myklebust
6a396f67d2 Merge branch 'nfsv4_xdr_cleanups-for-2.6.32' into nfs-for-2.6.32
Conflicts:
	fs/nfs/nfs4xdr.c
2009-08-19 18:21:52 -04:00
Benny Halevy
cccddf4f55 nfs: nfs4xdr: optimize low level decoding
do not increment decoding ptr if not needed.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 14:02:26 -04:00
Benny Halevy
c0eae66ece nfs: nfs4xdr: get rid of READ_BUF
Use xdr_inline_decode instead.
Open code debug printout and error return.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 14:02:23 -04:00
Benny Halevy
2460ba57c4 nfs: nfs4xdr: simplify decode_exchange_id by reusing decode_opaque_inline
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 14:02:20 -04:00
Benny Halevy
99398d0655 nfs: nfs4xdr: get rid of COPYMEM
Just directly call memcpy.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 14:02:17 -04:00
Benny Halevy
e78291e4e0 nfs: nfs4xdr: introduce decode_sessionid helper
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 14:02:14 -04:00
Benny Halevy
db942bbd09 nfs: nfs4xdr: introduce decode_verifier helper
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Trond: Fixed up an 'uninitialised variable' issue in decode_readdir]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:57:58 -04:00
Benny Halevy
07d30434cf nfs: nfs4xdr: introduce decode_opaque_fixed and decode_stateid helpers
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:26:27 -04:00
Benny Halevy
686841b3cc nfs: nfs4xdr: introduce print_overflow_msg
Part of the nfs4xdr cleanup.  READ_BUF will go away.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:24:38 -04:00
Benny Halevy
c816fd3406 nfs: nfs4xdr: get rid of READTIME
It has no users.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:24:32 -04:00
Benny Halevy
3ceb4dbb99 nfs: nfs4xdr: get rid of READ64
s/READ64\(\*(.*)\)/p = xdr_decode_hyper(p, \1)/
s/READ64\((.*)\)/p = xdr_decode_hyper(p, &\1)/

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:24:13 -04:00
Benny Halevy
6f723f7710 nfs: nfs4xdr: get rid of READ32
s/READ32\((.*)\)/\1 = be32_to_cpup(p++)/

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:23:58 -04:00
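
The open-coded forms that replace READ32 and READ64 in the two commits above decode big-endian words from an XDR buffer and advance a cursor. A user-space sketch of the same idea, assuming plain `uint32_t` in place of the kernel's `__be32` and using hypothetical `sketch_` helpers rather than the real be32_to_cpup() and xdr_decode_hyper():

```c
#include <stdint.h>
#include <string.h>

/* Read one big-endian 32-bit word, independent of host byte order. */
static uint32_t sketch_be32_to_cpup(const uint32_t *p)
{
    unsigned char b[4];

    memcpy(b, p, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}

/* Decode a 64-bit value stored as two big-endian words and return the
 * advanced cursor, mirroring the "p = xdr_decode_hyper(p, &val)" pattern. */
static const uint32_t *sketch_decode_hyper(const uint32_t *p, uint64_t *val)
{
    *val = ((uint64_t)sketch_be32_to_cpup(p) << 32) |
           sketch_be32_to_cpup(p + 1);
    return p + 2;
}
```

A caller would write `count = sketch_be32_to_cpup(p++);` followed by `p = sketch_decode_hyper(p, &offset);`, which makes every cursor update visible at the call site instead of hiding it inside a macro.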
Benny Halevy
811652bd6e nfs: nfs4xdr: merge xdr_encode_int+xdr_encode_opaque_fixed into xdr_encode_opaque
use encode_string where appropriate.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:19:24 -04:00
Benny Halevy
345585132a nfs: nfs4xdr: optimize low level encoding
do not increment encoding ptr if not needed.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:18:03 -04:00
Benny Halevy
13c65ce900 nfs: nfs4xdr: change RESERVE_SPACE macro into a static helper
In order to open code and expose the result pointer assignment.

Alternatively, we can open code the call to xdr_reserve_space
and do the BUG_ON in the error case at the call site.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:17:17 -04:00
Benny Halevy
2220f13a8b nfs: nfs4xdr: encode_compound_hdr does not have to round up reserved bytes
This is already done by xdr_reserve_space and since encode_compound_hdr
is adding a byte count to "12" which is already word aligned, the xdr
level rounding will work just as well.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:16:00 -04:00
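
The argument in 2220f13a8b rests on a small piece of arithmetic: because 12 is already a multiple of 4, rounding the sum up gives the same result as adding 12 to the rounded byte count. A sketch, with `sketch_xdr_round_up()` as a hypothetical helper that mimics rounding to a whole XDR word:

```c
#include <stddef.h>

/* Round a byte count up to the next multiple of 4 (one XDR word). */
static size_t sketch_xdr_round_up(size_t nbytes)
{
    return (nbytes + 3) & ~(size_t)3;
}
```

For any length n, `sketch_xdr_round_up(12 + n)` equals `12 + sketch_xdr_round_up(n)`, so rounding in the caller as well as in the reservation routine is redundant.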
Benny Halevy
42edd69812 nfs: nfs4xdr: optimize RESERVE_SPACE in encode_create_session and encode_sequence
Coalesce multiple constant RESERVE_SPACEs into one

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:15:20 -04:00
Benny Halevy
93f0cf2594 nfs: nfs4xdr: get rid of WRITEMEM
s/WRITEMEM(/p = xdr_encode_opaque_fixed(p, /

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:13:58 -04:00
Benny Halevy
b95be5a976 nfs: nfs4xdr: get rid of WRITE64
s/WRITE64/p = xdr_encode_hyper(p, /

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:13:15 -04:00
Benny Halevy
e75bc1c89e nfs: nfs4xdr: get rid of WRITE32
s/WRITE32/*p++ = cpu_to_be32/

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-14 13:12:55 -04:00
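
The encode side of the cleanup (WRITE32, WRITE64 and WRITEMEM in the three commits above) follows the same cursor-returning pattern. The following is a user-space sketch only: the `sketch_` helpers are hypothetical stand-ins for cpu_to_be32(), xdr_encode_hyper() and xdr_encode_opaque_fixed(), and plain `uint32_t` is assumed in place of `__be32`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store one 32-bit value in big-endian order and advance the cursor. */
static uint32_t *sketch_encode_u32(uint32_t *p, uint32_t val)
{
    unsigned char b[4] = {
        (unsigned char)(val >> 24), (unsigned char)(val >> 16),
        (unsigned char)(val >> 8), (unsigned char)val
    };

    memcpy(p, b, sizeof(b));
    return p + 1;
}

/* Store a 64-bit value as two big-endian words, high word first. */
static uint32_t *sketch_encode_hyper(uint32_t *p, uint64_t val)
{
    p = sketch_encode_u32(p, (uint32_t)(val >> 32));
    return sketch_encode_u32(p, (uint32_t)(val & 0xffffffffu));
}

/* Copy a fixed-length opaque and zero-pad it to a whole number of words. */
static uint32_t *sketch_encode_opaque_fixed(uint32_t *p, const void *data,
                                            size_t len)
{
    size_t words = (len + 3) / 4;

    if (words > 0) {
        memset(p, 0, words * 4); /* clears the padding bytes */
        memcpy(p, data, len);
    }
    return p + words;
}
```

Each helper returns the new cursor, so a sequence of encodes reads as a chain of `p = ...` assignments.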
Trond Myklebust
1ae88b2e44 NFS: Fix an O_DIRECT Oops...
We can't call nfs_readdata_release()/nfs_writedata_release() without
first initialising and referencing args.context. Doing so inside
nfs_direct_read_schedule_segment()/nfs_direct_write_schedule_segment()
causes an Oops.

We should rather be calling nfs_readdata_free()/nfs_writedata_free() in
those cases.

Looking at the O_DIRECT code, the "struct nfs_direct_req" is already
referencing the nfs_open_context for us. Since the readdata and writedata
structures carry a reference to that, we can simplify things by getting rid
of the extra nfs_open_context references, so that we can replace all
instances of nfs_readdata_release()/nfs_writedata_release().

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-08-12 08:21:39 -07:00
Trond Myklebust
f884dcaead Merge branch 'sunrpc_cache-for-2.6.32' into nfs-for-2.6.32 2009-08-10 17:45:58 -04:00
Trond Myklebust
976a6f921c Merge branch 'patches_cel-for-2.6.32' into nfs-for-2.6.32 2009-08-10 17:45:50 -04:00
Bartlomiej Zolnierkiewicz
e576e05a73 nfs: remove superfluous BUG_ON()s
Remove duplicated BUG_ON()s from nfs[4]_create_server()
(we make the same checks earlier in both functions).

This takes care of the following entries from Dan's list:

fs/nfs/client.c +1078 nfs_create_server(47) warning: variable derefenced before check 'server->nfs_client'
fs/nfs/client.c +1079 nfs_create_server(48) warning: variable derefenced before check 'server->nfs_client->rpc_ops'
fs/nfs/client.c +1363 nfs4_create_server(43) warning: variable derefenced before check 'server->nfs_client'
fs/nfs/client.c +1364 nfs4_create_server(44) warning: variable derefenced before check 'server->nfs_

Reported-by: Dan Carpenter <error27@gmail.com>
Cc: corbet@lwn.net
Cc: eteo@redhat.com
Cc: Julia Lawall <julia@diku.dk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-10 08:54:16 -04:00
Peter Staubach
38c73044f5 NFS: read-modify-write page updating
Hi.

I have a proposal for possibly resolving this issue.

I believe that this situation occurs due to the way that the
Linux NFS client handles writes which modify partial pages.

The Linux NFS client handles partial page modifications by
allocating a page from the page cache, copying the data from
the user level into the page, and then keeping track of the
offset and length of the modified portions of the page.  The
page is not marked as up to date because there are portions
of the page which do not contain valid file contents.

When a read call comes in for a portion of the page, the
contents of the page must be read in from the server.
However, since the page may already contain some modified
data, that modified data must be written to the server
before the file contents can be read back in from the server.
And, since the writing and reading can not be done atomically,
the data must be written and committed to stable storage on
the server for safety purposes.  This means either a
FILE_SYNC WRITE or an UNSTABLE WRITE followed by a COMMIT.
This has been discussed at length previously.

This algorithm could be described as modify-write-read.  It
is most efficient when the application only updates pages
and does not read them.

My proposed solution is to add a heuristic to decide whether
to do this modify-write-read algorithm or switch to a read-
modify-write algorithm when initially allocating the page
in the write system call path.  The heuristic uses the modes
that the file was opened with, the offset in the page to
read from, and the size of the region to read.

If the file was opened for reading in addition to writing
and the page would not be filled completely with data from
the user level, then read in the old contents of the page
and mark it as Uptodate before copying in the new data.  If
the page would be completely filled with data from the user
level, then there would be no reason to read in the old
contents because they would just be copied over.

This would optimize for applications which randomly access
and update portions of files.  The linkage editor for the
C compiler is an example of such a thing.

I tested the attached patch by using rpmbuild to build the
current Fedora rawhide kernel.  The kernel without the
patch generated about 269,500 WRITE requests.  The modified
kernel containing the patch generated about 261,000 WRITE
requests.  Thus, about 8,500 fewer WRITE requests were
generated.  I suspect that many of these additional
WRITE requests were probably FILE_SYNC requests to WRITE
a single page, but I didn't test this theory.

The difference between this patch and the previous one was
to remove the unneeded PageDirty() test.  I then retested to
ensure that the resulting system continued to behave as
desired.

	Thanx...

		ps

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-10 08:54:16 -04:00
Trond Myklebust
074cc1deec NFS: Add a ->migratepage() aop for NFS
Make NFS a bit more friendly to NUMA and memory hot removal...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-10 08:54:13 -04:00
Trond Myklebust
7d217caca5 SUNRPC: Replace rpc_client->cl_dentry and cl_mnt, with a cl_path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:14:24 -04:00
Trond Myklebust
b693ba4a33 SUNRPC: Constify rpc_pipe_ops...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:14:15 -04:00
Chuck Lever
ec6ee61250 NFS: Replace nfs_set_port() with rpc_set_port()
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:37 -04:00
Chuck Lever
53a0b9c4c9 NFS: Replace nfs_parse_ip_address() with rpc_pton()
Clean up: Use the common routine now provided in sunrpc.ko for parsing mount
addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:36 -04:00
Chuck Lever
a02d692611 SUNRPC: Provide functions for managing universal addresses
Introduce a set of functions in the kernel's RPC implementation for
converting between a socket address and either a standard
presentation address string or an RPC universal address.

The universal address functions will be used to encode and decode
RPCB_FOO and NFSv4 SETCLIENTID arguments.  The other functions are
part of a previous promise to deliver shared functions that can be
used by upper-layer protocols to display and manipulate IP
addresses.

The kernel's current address printf formatters were designed
specifically for kernel to user-space APIs that require a particular
string format for socket addresses, thus are somewhat limited for the
purposes of sunrpc.ko.  The formatter for IPv6 addresses, %pI6, does
not support short-handing or scope IDs.  Also, these printf formatters
are unique per address family, so a separate formatter string is
required for printing AF_INET and AF_INET6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:34 -04:00
Chuck Lever
ec88f28d1e NFS: Use the authentication flavor list returned by mountd
Commit a14017db added support in the kernel's NFS mount client to
decode the authentication flavor list returned by mountd.

The NFS client can now use this list to determine whether the
authentication flavor requested by the user is actually supported
by the server.

Note we don't actually negotiate the security flavor if none was
specified by the user.  Instead, we try to use AUTH_SYS, and fail if
the server does not support it.  This prevents us from negotiating
an inappropriate security flavor (some servers list AUTH_NULL first).

If the server does not support AUTH_SYS, the user must provide an
appropriate security flavor by specifying the "sec=" mount option.
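
The selection rule above can be sketched roughly as follows.  The
function and macro names are hypothetical; only the RPC flavor
numbers (0 for AUTH_NULL, 1 for AUTH_SYS) are taken from the RPC
specification.  The sketch does not treat an empty server list
specially, which a real client may want to do:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_AUTH_NULL 0u   /* RPC flavor number for AUTH_NULL */
#define SKETCH_AUTH_SYS  1u   /* RPC flavor number for AUTH_SYS  */

/*
 * Return 0 if the flavor to use appears in the server's list,
 * or -EACCES if it does not.  When the user gave no "sec=" option,
 * AUTH_SYS is assumed rather than taking the server's first entry.
 */
static int sketch_check_flavor(const uint32_t *server_flavors, size_t count,
                               bool user_specified, uint32_t requested)
{
    uint32_t wanted = user_specified ? requested : SKETCH_AUTH_SYS;
    size_t i;

    for (i = 0; i < count; i++)
        if (server_flavors[i] == wanted)
            return 0;

    return -EACCES;
}
```

Note that a server listing only AUTH_NULL is rejected unless the
user explicitly asked for that flavor.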

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:32 -04:00
Chuck Lever
059f90b323 NFS: Fix auth flavor len accounting
Previous logic in the NFS mount parsing code path assumed
auth_flavor_len was set to zero for simple authentication flavors
(like AUTH_UNIX), and 1 for compound flavors (like AUTH_GSS).

At some earlier point (maybe even before the option parsers were
merged?) specific checks for auth_flavor_len being zero were removed
from the functions that validate the mount option that sets the mount
point's authentication flavor.

Since we are populating an array for authentication flavors, the
auth_flavor_len should always be set to the number of flavors.  Let's
eliminate some cleverness here, and prepare for new logic that needs
to know the number of flavors in the auth_flavors[] array.

(auth_flavors[] is an array because at some point we want to allow a
list of acceptable authentication flavors to be specified via the sec=
mount option.  For now it remains a single element array).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:31 -04:00
Chuck Lever
0b524123c9 NFS: Add ability to send MOUNTPROC_UMNT to the kernel's mountd client
After certain failure modes of an NFS mount, an NFS client should send
a MOUNTPROC_UMNT request to remove the just-added mount entry from the
server's mount table.  While no-one should rely on the accuracy of the
server's mount table, sending a UMNT is simply being a good internet
neighbor.

Since NFS mount processing is handled in the kernel now, we will need
a function in the kernel's mountd client that can post a MOUNTPROC_UMNT
request, in order to handle these failure modes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:30 -04:00
Chuck Lever
f3f4f4ed26 NFS: Fix up new minorversion= option
The new minorversion= mount option (commit 3fd5be9e) was merged at
the same time as the recent sloppy parser fixes (commit a5a16bae),
so minorversion= still uses the old value parsing logic.

If the minorversion= option specifies a bogus value, it should fail
with "bad value" not "bad option."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:09:29 -04:00
Trond Myklebust
c140aa9135 NFSv4: Clean up the nfs.callback_tcpport option
Tighten up the validity checking in param_set_port: check for NULL pointers.
Ensure that the option shows up on 'modinfo' output.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:06:19 -04:00
Trond Myklebust
80e52aced1 NFSv4: Don't do idmapper upcalls for asynchronous RPC calls
We don't want to cause rpciod to hang...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:06:19 -04:00
Trond Myklebust
62ab460cf5 NFSv4: Add 'server capability' flags for NFSv4 recommended attributes
If the NFSv4 server doesn't support a POSIX attribute, the generic NFS code
needs to know that, so that it doesn't keep trying to poll for it.

However, by the same token, if the NFSv4 server does not support that
attribute, then we should ensure that the inode metadata is appropriately
labelled as being untrusted. For instance, if we don't know the correct
value of the file's uid, we should certainly not be caching ACLs or ACCESS
results.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:06:19 -04:00
Trond Myklebust
a78cb57a10 NFSv4: Don't loop forever on state recovery failure...
If the server is broken, then retrying forever won't fix it. We
should just give up after a while, and return an error to the user.
We set the number of retries to 10 for now...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:06:19 -04:00
Roel Kluin
dd8ac1da41 nfs: Keep index within mnt_errtbl[]
Ensure that index i remains within array mnt_errtbl[] and mnt3_errtbl[].

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-08-09 15:06:19 -04:00
Trond Myklebust
d953126a28 NFSv4: Fix a problem whereby a buggy server can oops the kernel
We just had a case in which a buggy server occasionally returns the wrong
attributes during an OPEN call. While the client does catch this sort of
condition in nfs4_open_done(), and causes the nfs4_atomic_open() to return
-EISDIR, the logic in nfs_atomic_lookup() is broken, since it causes a
fallback to an ordinary lookup instead of just returning the error.

When the buggy server then returns a regular file for the fallback lookup,
the VFS allows the open, and bad things start to happen, since the open
file doesn't have any associated NFSv4 state.

The fix is firstly to return the EISDIR/ENOTDIR errors immediately, and
secondly to ensure that we are always careful when dereferencing the
nfs_open_context state pointer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 19:22:38 -04:00
Trond Myklebust
fccba80455 NFSv4: Fix an NFSv4 mount regression
Commit 008f55d0e0 (nfs41: recover lease in
_nfs4_lookup_root) forces the state manager to always run on mount. This is
a bug in the case of NFSv4.0, which doesn't require us to send a
setclientid until we want to grab file state.

In any case, this is completely the wrong place to be doing state
management. Moving that code into nfs4_init_session...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 16:48:07 -04:00
Trond Myklebust
b64aec8d1e NFSv4: Fix an Oops in nfs4_free_lock_state
The oops http://www.kerneloops.org/raw.php?rawid=537858&msgid= appears to
be due to the nfs4_lock_state->ls_state field being uninitialised. This
happens if the call to nfs4_free_lock_state() is triggered at the end of
nfs4_get_lock_state().

The fix is to move the initialisation of ls_state into the allocator.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-07-21 16:47:46 -04:00
Alexey Dobriyan
405f55712d headers: smp_lock.h redux
* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
  It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

  This will make hardirq.h inclusion cheaper for every PREEMPT=n config
  (which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-12 12:22:34 -07:00
Linus Torvalds
81e4e1ba7e Revert "fuse: Fix build error" as unnecessary
This reverts commit 097041e576.

Trond had a better fix, which is the parent of this one ("Fix compile
error due to congestion_wait() changes")

Requested-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-11 11:22:34 -07:00
Larry Finger
097041e576 fuse: Fix build error
When building v2.6.31-rc2-344-g69ca06c, the following build errors are
found due to missing includes:

 CC [M]  fs/fuse/dev.o
fs/fuse/dev.c: In function ‘request_end’:
fs/fuse/dev.c:289: error: ‘BLK_RW_SYNC’ undeclared (first use in this function)
...
fs/nfs/write.c: In function ‘nfs_set_page_writeback’:
fs/nfs/write.c:207: error: ‘BLK_RW_ASYNC’ undeclared (first use in this function)

Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-10 19:09:46 -07:00
Jens Axboe
8aa7e847d8 Fix congestion_wait() sync/async vs read/write confusion
Commit 1faa16d228 accidentally broke
the bdi congestion wait queue logic, causing us to wait on congestion
for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-07-10 20:31:53 +02:00
Alexey Dobriyan
b43f3cbd21 headers: mnt_namespace.h redux
Fix various silly problems wrt mnt_namespace.h:

 - exit_mnt_ns() isn't used, remove it
 - done that, sched.h and nsproxy.h inclusions aren't needed
 - mount.h inclusion was needed for vfsmount_lock, but no longer
 - remove mnt_namespace.h inclusion from files which don't use anything
   from mnt_namespace.h

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-07-08 09:31:56 -07:00
Trond Myklebust
b88f8a546f NFS: Correct the NFS mount path when following a referral
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
Trond Myklebust
0b75b35c7c NFS: Fix nfs_path() to always return a '/' at the beginning of the path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
Trond Myklebust
c02d7adf8c NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
As noted in the previous patch, the NFSv4 client mount code currently
has several limitations. If the mount path contains symlinks, or
referrals, or even if it just contains a '..', then the client code in
nfs4_path_walk() will fail with an error.

This patch replaces the nfs4_path_walk()-based lookup with a helper
function that sets up a private namespace to represent the namespace on the
server, then uses the ordinary VFS and NFS path lookup code to walk down the
mount path in that namespace.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-22 21:28:25 -07:00
Linus Torvalds
7e0338c0de Merge branch 'for-2.6.31' of git://fieldses.org/git/linux-nfsd
* 'for-2.6.31' of git://fieldses.org/git/linux-nfsd: (60 commits)
  SUNRPC: Fix the TCP server's send buffer accounting
  nfsd41: Backchannel: minorversion support for the back channel
  nfsd41: Backchannel: cleanup nfs4.0 callback encode routines
  nfsd41: Remove ip address collision detection case
  nfsd: optimise the starting of zero threads when none are running.
  nfsd: don't take nfsd_mutex twice when setting number of threads.
  nfsd41: sanity check client drc maxreqs
  nfsd41: move channel attributes from nfsd4_session to a nfsd4_channel_attr struct
  NFS: kill off complicated macro 'PROC'
  sunrpc: potential memory leak in function rdma_read_xdr
  nfsd: minor nfsd_vfs_write cleanup
  nfsd: Pull write-gathering code out of nfsd_vfs_write
  nfsd: track last inode only in use_wgather case
  sunrpc: align cache_clean work's timer
  nfsd: Use write gathering only with NFSv2
  NFSv4: kill off complicated macro 'PROC'
  NFSv4: do exact check about attribute specified
  knfsd: remove unreported filehandle stats counters
  knfsd: fix reply cache memory corruption
  knfsd: reply cache cleanups
  ...
2009-06-22 12:55:50 -07:00
Benny Halevy
578e458568 nfs41: Move initialization of nfs4_opendata seq_res to nfs4_init_opendata_res
nfs4_open_recover_helper clears opendata->o_res
before calling nfs4_init_opendata_res, thus causing
NFSv4.0 OPEN operations to be sent rather than NFSv4.1.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-20 14:55:12 -04:00
Trond Myklebust
1f84603c09 Merge branch 'devel-for-2.6.31' into for-2.6.31
Conflicts:
	fs/nfs/client.c
	fs/nfs/super.c
2009-06-18 18:13:44 -07:00
James Morris
4bf259e3ae nfs: remove unnecessary NFS_INO_INVALID_ACL checks
Unless I'm mistaken, NFS_INO_INVALID_ACL is being checked twice during
getacl calls (i.e. first via nfs_revalidate_inode() and then by each call
site).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:14 -07:00
Chuck Lever
a5a16bae70 NFS: More "sloppy" parsing problems
Specifying "port=-5" with the kernel's current mount option parser
generates "unrecognized mount option".  If "sloppy" is set, this
causes the mount to succeed and use the default values; the desired
behavior is that, since this is a valid option with an invalid value,
the mount should fail, even with "sloppy."

To properly handle "sloppy" parsing, we need to distinguish between
correct options with invalid values, and incorrect options.  We will
need to parse integer values by hand, therefore, and not rely on
match_token().

For instance, these must all fail with "invalid value":

	port=12345678
	port=-5
	port=samuel

and not with "unrecognized option," as they do currently.

Thus, for the sake of match_token() we need to treat the values for
these options as strings, and do the conversion to integers using
strict_strtol().

This is basically the same solution we used for the earlier "retry="
fix (commit ecbb3845), except in this case the kernel actually has to
parse the value, rather than ignore it.
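
As a user-space illustration of the idea (the kernel patch uses
strict_strtol(); this sketch substitutes the standard C strtol()),
the value is first matched as a plain string and only then converted,
so a recognized option with a bad value can be reported as
"invalid value".  The function and enum names are hypothetical:

```c
#include <errno.h>
#include <stdlib.h>

enum sketch_parse_result {
    SKETCH_OK = 0,
    SKETCH_BAD_VALUE = 1    /* known option, unusable value */
};

/* Convert the string value of a "port=" option, rejecting junk. */
static enum sketch_parse_result sketch_parse_port(const char *value,
                                                  unsigned short *port)
{
    char *end;
    long n;

    if (value == NULL || *value == '\0')
        return SKETCH_BAD_VALUE;

    errno = 0;
    n = strtol(value, &end, 10);

    /* Trailing characters ("samuel") or overflow are bad values. */
    if (errno != 0 || *end != '\0')
        return SKETCH_BAD_VALUE;

    /* Negative ("-5") or too large ("12345678") ports are bad values. */
    if (n < 0 || n > 65535)
        return SKETCH_BAD_VALUE;

    *port = (unsigned short)n;
    return SKETCH_OK;
}
```

All three examples listed above are rejected by this check, while
a value such as "2049" is accepted.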

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:14 -07:00
Chuck Lever
d23c45fd84 NFS: Invalid mount option values should always fail, even with "sloppy"
Ian Kent reports:

"I've noticed a couple of other regressions with the options vers
and proto option of mount.nfs(8).

The commands:

mount -t nfs -o vers=<invalid version> <server>:/<path> /<mountpoint>
mount -t nfs -o proto=<invalid proto> <server>:/<path> /<mountpoint>

both immediately fail.

But if the "-s" option is also used they both succeed with the
mount falling back to defaults (by the look of it).

In the past these failed even when the sloppy option was given, as
I think they should. I believe the sloppy option is meant to allow
the mount command to still function for mount options (for example
in shared autofs maps) that exist on other Unix implementations but
aren't present in the Linux mount.nfs(8). So, an invalid value
specified for a known mount option is different to an unknown mount
option and should fail appropriately."

See RH bugzilla 486266.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:13 -07:00
Chuck Lever
065015e5ef NFS: Remove unused XDR decoder functions
Clean up: Remove xdr_decode_fhstatus() and xdr_decode_fhstatus3(), now
that they are unused.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:13 -07:00
Chuck Lever
8e02f6b9aa NFS: Update MNT and MNT3 reply decoding functions
Solder xdr_stream-based XDR decoding functions into the in-kernel mountd
client that are more careful about checking data types and watching for
buffer overflows.  The new MNT3 decoder includes support for auth-flavor
list decoding.

The "_sz" macro for MNT3 replies was missing the size of the file handle.
I've added this back, and included the size of the auth flavor array.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:13 -07:00
Chuck Lever
a14017db28 NFS: add XDR decoder for mountd version 3 auth-flavor lists
Introduce an xdr_stream-based XDR decoder that can unpack the auth-
flavor list returned in a MNT3 reply.

The nfs_mount() function's caller allocates an array, and passes the
size and a pointer to it.  The decoder decodes all the flavors it can
into the array, and returns the number of decoded flavors.

If the caller is not interested in the auth flavors, it can pass a
value of zero as the size of the pre-allocated array.
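
A rough user-space sketch of that calling convention, assuming the
list arrives as an XDR array (a big-endian 32-bit count followed by
that many big-endian 32-bit flavors).  It does not use the kernel's
xdr_stream API, and all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Read one big-endian 32-bit XDR word. */
static uint32_t sketch_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Decode as many flavors as fit into the caller's array.
 * Returns the number of flavors stored, or -1 if the buffer is
 * too short for the count it announces.  Passing max_flavors == 0
 * means the caller is not interested in the flavors.
 */
static int sketch_decode_auth_flavors(const unsigned char *buf, size_t buflen,
                                      uint32_t *flavors, size_t max_flavors)
{
    size_t count, n, i;

    if (buflen < 4)
        return -1;

    count = sketch_be32(buf);
    if (count > (buflen - 4) / 4)
        return -1;                      /* announced list overruns buffer */

    n = count < max_flavors ? count : max_flavors;
    for (i = 0; i < n; i++)
        flavors[i] = sketch_be32(buf + 4 + 4 * i);

    return (int)n;
}
```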

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:12 -07:00
Chuck Lever
4fdcd9966d NFS: add new file handle decoders to in-kernel mountd client
Introduce xdr_stream-based XDR file handle decoders to the in-kernel
mountd client.  These are more careful than the existing decoder
functions about buffer overflows and data type and range checking.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:12 -07:00
Chuck Lever
fb12529577 NFS: Add separate mountd status code decoders for each mountd version
Introduce data structures and xdr_stream-based decoding functions for
unmarshalling mountd status codes properly.

Mountd version 3 uses specific standard error return codes that are
not errno values and not NFS3ERR_ values.  These have a well-defined
standard mapping to local errno values.  Introduce data structures
and a decoder function that map these status codes to local errno
values properly.  This is new functionality (but not used yet).

Version 1 mountd status values are defined by RFC 1094 as UNIX error
values (errno values).  Errno values on heterogeneous systems do not
necessarily match each other.  To avoid exposing possibly incorrect
errno values to upper layers, the current XDR decoder converts all
non-zero MNT version 1 status codes to -EACCES.

The OpenGroup XNFS standard provides a mapping similar to but smaller
than the version 3 error codes.  Implement a decoder that uses the XNFS
error codes, replacing the current decoder.

For both mountd protocol versions, map unrecognized errors to -EACCES.

Finally we introduce a replacement data structure for mnt_fhstatus
at this time, which is used by the new XDR decoders.  In addition to
documenting that the status value returned by the XDR decoders is
always an errno, this new structure will be expanded in subsequent
patches.
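
The mapping described above might look roughly like this in
user-space C.  The table holds only a subset of the version 3 status
values from RFC 1813, and the structure and function names are
hypothetical rather than those used in the patch:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_mnt_err {
    uint32_t status;        /* status value on the wire */
    int      errno_value;   /* local (negative) errno   */
};

/* Partial, illustrative table of MNT version 3 status codes. */
static const struct sketch_mnt_err sketch_mnt3_errtbl[] = {
    {  0, 0 },               /* MNT3_OK              */
    {  1, -EPERM },          /* MNT3ERR_PERM         */
    {  2, -ENOENT },         /* MNT3ERR_NOENT        */
    {  5, -EIO },            /* MNT3ERR_IO           */
    { 13, -EACCES },         /* MNT3ERR_ACCES        */
    { 20, -ENOTDIR },        /* MNT3ERR_NOTDIR       */
    { 22, -EINVAL },         /* MNT3ERR_INVAL        */
    { 63, -ENAMETOOLONG },   /* MNT3ERR_NAMETOOLONG  */
};

/* Map a wire status to an errno; unrecognized values become -EACCES. */
static int sketch_mnt3_status_to_errno(uint32_t status)
{
    const size_t n = sizeof(sketch_mnt3_errtbl) / sizeof(sketch_mnt3_errtbl[0]);
    size_t i;

    /* The loop bound keeps the index inside the table. */
    for (i = 0; i < n; i++)
        if (sketch_mnt3_errtbl[i].status == status)
            return sketch_mnt3_errtbl[i].errno_value;

    return -EACCES;
}
```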

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:12 -07:00
Chuck Lever
99835db430 NFS: remove unused function in fs/nfs/mount_clnt.c
Clean up: remove xdr_encode_dirpath() now that it has been replaced.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:11 -07:00
Chuck Lever
29a1bd6bf8 NFS: Use xdr_stream-based XDR encoder for MNT's dirpath argument
Check the length of the supplied dirpath, and see that it fits
properly in the RPC buffer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:11 -07:00
Chuck Lever
2ad780978b NFS: Clean up MNT program definitions
Clean up:  Relocate MNT program procedure number definitions to the
only file that uses them.  Relocate the version number definitions,
which are shared, to nfs.h.  Remove duplicate program number
definitions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:11 -07:00
Chuck Lever
18fc316419 NFS: Fix false error return from nfs_callback_up() if ipv6.ko is not available
Clear "ret" if the error return from svc_create_xprt(AF_INET6) was
-EAFNOSUPPORT.  Otherwise, callback start-up will succeed, but
nfs_callback_up() will return -EAFNOSUPPORT anyway, and the first
NFSv4 mount attempt after a reboot will fail.

Bug introduced by commit f738f517 in 2.6.30-rc1.
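
In outline, the fix amounts to something like the following sketch.
The function is hypothetical and takes the two listener results as
plain integers instead of calling svc_create_xprt() itself:

```c
#include <errno.h>

/*
 * Combine the results of creating the IPv4 and IPv6 callback
 * listeners.  A missing IPv6 stack (-EAFNOSUPPORT) is not an error
 * as long as the IPv4 listener came up.
 */
static int sketch_callback_listeners_result(int ipv4_ret, int ipv6_ret)
{
    if (ipv4_ret < 0)
        return ipv4_ret;

    if (ipv6_ret == -EAFNOSUPPORT)
        return 0;               /* clear "ret": IPv6 is simply absent */

    if (ipv6_ret < 0)
        return ipv6_ret;

    return 0;
}
```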

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:10 -07:00
Chuck Lever
a21bdd9b96 NFS: Return error code from nfs_callback_up() to user space
If the kernel cannot start the NFSv4 callback service during a mount
request, it returns -ENOMEM to user space, resulting in this message:

   mount.nfs4: Cannot allocate memory

Adjust nfs_alloc_client() and nfs_get_client() to pass NFSv4 callback
start-up errors back to user space so a less mysterious error message
can be displayed by the mount command.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:10 -07:00
Chuck Lever
c381ad2cf2 NFS: Do not display the setting of the "intr" mount option
The "intr" mount option has been deprecated for a while, but
/proc/mounts continues to display "nointr" whether "intr" or "nointr"
has been specified for a mount point.

Since these options do not have any effect, simply do not display
them.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:09 -07:00
Suresh Jayaraman
bf40d3435c NFS: add support for splice writes
Adds support for splice writes. It effectively calls
generic_file_splice_write() to do the writes.

We need not worry about O_APPEND case as the combination of splice()
writes and O_APPEND is disallowed. This patch propagates NFS write
errors back to the caller. The number of bytes written via splice are
being added to NFSIO_NORMALWRITTENBYTES as these are effectively
cached writes.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 18:02:09 -07:00
Trond Myklebust
301933a0ac Merge commit 'linux-pnfs/nfs41-for-2.6.31' into nfsv41-for-2.6.31 2009-06-17 17:59:58 -07:00
Ricardo Labiaga
68f3f90133 nfs41: Backchannel: CB_SEQUENCE validation
Validates the callback's sessionID, the slot number, and the sequence ID.
Increments the slot's sequence.

Detects replays, but simply prints a debug message (if debugging is enabled)
since we don't yet implement a duplicate request cache for the backchannel.
This should not present a problem, since only idempotent callbacks are
currently implemented.
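
A simplified sketch of these checks, assuming a 16-byte session ID
and a small fixed slot table whose sequence numbers start at 0.  The
types, names, and result codes are hypothetical and do not mirror the
kernel structures:

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_SESSIONID_LEN 16
#define SKETCH_MAX_SLOTS     4     /* hypothetical table size */

enum sketch_cb_result {
    SKETCH_CB_OK,
    SKETCH_CB_BADSESSION,
    SKETCH_CB_BADSLOT,
    SKETCH_CB_REPLAY,
    SKETCH_CB_MISORDERED
};

struct sketch_session {
    unsigned char id[SKETCH_SESSIONID_LEN];
    uint32_t nslots;                       /* slots actually in use */
    uint32_t seq_nr[SKETCH_MAX_SLOTS];     /* last accepted sequence */
};

static enum sketch_cb_result
sketch_validate_cb_sequence(struct sketch_session *s,
                            const unsigned char *sessionid,
                            uint32_t slotid, uint32_t seqid)
{
    if (memcmp(s->id, sessionid, SKETCH_SESSIONID_LEN) != 0)
        return SKETCH_CB_BADSESSION;

    if (slotid >= s->nslots || slotid >= SKETCH_MAX_SLOTS)
        return SKETCH_CB_BADSLOT;

    /* The next expected request: accept it and bump the slot. */
    if (seqid == (uint32_t)(s->seq_nr[slotid] + 1u)) {
        s->seq_nr[slotid] = seqid;
        return SKETCH_CB_OK;
    }

    /* Same sequence as last time: a replay (only reported here). */
    if (seqid == s->seq_nr[slotid])
        return SKETCH_CB_REPLAY;

    return SKETCH_CB_MISORDERED;
}
```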

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: Be more obvious about the return value]
[nfs41: Backchannel: dprintk in host order]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:43 -07:00
Ricardo Labiaga
963891ac43 nfs41: Backchannel: New find_client_with_session()
Finds the 'struct nfs_client' that matches the server's address, major
version number, and session ID.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:43 -07:00
Ricardo Labiaga
f8625a6a4b nfs41: Backchannel: Add a backchannel slot table to the session
Defines a new 'struct nfs4_slot_table' in the 'struct nfs4_session'
for use by the backchannel.  Initializes, resets, and destroys the backchannel
slot table in the same manner the forechannel slot table is initialized,
reset, and destroyed.

The sequenceid for each slot in the backchannel slot table is initialized
to 0, whereas the forechannel slotid's sequenceid is set to 1.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:42 -07:00
Ricardo Labiaga
050047ce71 nfs41: Backchannel: Refactor nfs4_init_slot_table()
Generalize nfs4_init_slot_table() so it can be used to initialize the
backchannel slot table in addition to the forechannel slot table.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:42 -07:00
Ricardo Labiaga
b73dafa7ac nfs41: Backchannel: Refactor nfs4_reset_slot_table()
Generalize nfs4_reset_slot_table() so it can be used to reset the
backchannel slot table in addition to the forechannel slot table.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:41 -07:00
Ricardo Labiaga
65fc64e547 nfs41: Backchannel: update cb_sequence args and results
Change the type of cs_addr and csr_status to 'struct sockaddr' and
'__be32' since the cb_sequence processing function will use existing
functionality that expects these types.

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:40 -07:00
Benny Halevy
281fe15dc1 nfs41: verify CB_SEQUENCE position in callback compound
CB_SEQUENCE must appear first in the callback compound RPC.
If it is not the first operation NFS4ERR_SEQUENCE_POS must be returned.
If the first operation in the CB_COMPOUND is not CB_SEQUENCE then
NFS4ERR_OP_NOT_IN_SESSION must be returned.
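
The position rule can be sketched as a small check run before each
operation is processed.  The function and enum names are
hypothetical; the enum values only stand in for the NFSv4.1 error
codes named above:

```c
#include <stdbool.h>
#include <stddef.h>

enum sketch_pos_result {
    SKETCH_POS_OK,
    SKETCH_POS_SEQUENCE_POS,        /* stands in for NFS4ERR_SEQUENCE_POS      */
    SKETCH_POS_OP_NOT_IN_SESSION    /* stands in for NFS4ERR_OP_NOT_IN_SESSION */
};

/*
 * op_index is the zero-based position of the operation inside the
 * CB_COMPOUND; is_cb_sequence says whether it is a CB_SEQUENCE.
 */
static enum sketch_pos_result sketch_check_cb_position(size_t op_index,
                                                       bool is_cb_sequence)
{
    if (op_index == 0)
        return is_cb_sequence ? SKETCH_POS_OK
                              : SKETCH_POS_OP_NOT_IN_SESSION;

    /* CB_SEQUENCE anywhere but first is out of position. */
    return is_cb_sequence ? SKETCH_POS_SEQUENCE_POS : SKETCH_POS_OK;
}
```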

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: refactor op preprocessing out of process_op]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:39 -07:00
Benny Halevy
4aece6a19c nfs41: cb_sequence xdr implementation
[nfs41: get rid of READMEM and COPYMEM for callback_xdr.c]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: get rid of READ64 in callback_xdr.c]
See http://linux-nfs.org/pipermail/pnfs/2009-June/007846.html
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:38 -07:00
Benny Halevy
d49433e1e3 nfs41: cb_sequence proc implementation
Currently, just free up any referring calls information.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: fix csr_{,target}highestslotid]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:38 -07:00
Benny Halevy
2d9b9ec344 nfs41: cb_sequence protocol level data structures
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:37 -07:00
Benny Halevy
34bc47c941 nfs41: consider minorversion in callback_xdr:process_op
Note that this patch changes the nfsv4.0 behavior also when
CONFIG_NFS_V4_1 is not defined where NFS4ERR_MINOR_VERS_MISMATCH
will be returned if the client received a CB_COMPOUND
with minorversion != 0.  Previously, it would have
returned NFS4ERR_OP_ILLEGAL for CB_SEQUENCE.
(Or, if the server is broken and sent OP_CB_GETATTR or OP_CB_RECALL
with minorversion != 0, they would have been processed normally.)

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: refactor op preprocessing out of process_op]
See http://linux-nfs.org/pipermail/pnfs/2009-June/007845.html
[nfs41: define CB_NOTIFY_DEVICEID as not supported]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:37 -07:00
Benny Halevy
45377b94ed nfs41: callback numbers definitions
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:36 -07:00
Benny Halevy
48a9e2d228 nfs41: decode minorversion 1 cb_compound header
decode cb_compound header conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Get rid of cb_compound_hdr_arg.callback_ident

callback_ident is not used anywhere so we shouldn't waste any memory to
store it.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: no need to break read_buf in decode_compound_hdr_arg]
See http://linux-nfs.org/pipermail/pnfs/2009-June/007844.html
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:35 -07:00
Benny Halevy
b8f2ef84b0 nfs41: store minorversion in cb_compound_hdr_arg
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:35 -07:00
Andy Adamson
5a0ffe544c nfs41: Release backchannel resources associated with session
Frees the preallocated backchannel resources that are associated with
this session when the session is destroyed.

A backchannel is currently created once per session. Destroy the backchannel
only when the session is destroyed.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:34 -07:00
Andy Adamson
0f91421e8e nfs41: Client indicates presence of NFSv4.1 callback channel.
Set the SESSION4_BACK_CHAN flag to indicate the client supports a backchannel.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:33 -07:00
Andy Adamson
0b5b7ae0a8 nfs41: Setup the backchannel
The NFS v4.1 callback service has already been set up, and
rpc_xprt->serv points to the svc_serv structure describing it.
Invoke the xprt_setup_backchannel() initialization to pre-
allocate the necessary backchannel structures.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: change nfs4_put_session(nfs4_session**) to nfs4_destroy_session(nfs_session*)]
Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[moved xprt_setup_backchannel from nfs4_init_session to nfs4_init_backchannel]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:32 -07:00
Andy Adamson
e82dc22dac nfs41: Allow NFSv4 and NFSv4.1 callback services to coexist
Tracks the nfs_callback_info for both versions, enabling the callback
service for v4 and v4.1 to run concurrently and be stopped independently
of each other.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:32 -07:00
Benny Halevy
8f97524235 nfs41: create a svc_xprt for nfs41 callback thread and use for incoming callbacks
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:31 -07:00
Ricardo Labiaga
a43cde94fe nfs41: Implement NFSv4.1 callback service process.
nfs41_callback_up() initializes the necessary queues and creates the new
nfs41_callback_svc thread.  This thread executes the callback service which
waits for requests to arrive on the svc_serv->sv_cb_list.

NFS41_BC_MIN_CALLBACKS is set to 1 because we expect callbacks to not
cause substantial latency.

The actual processing of the callback will be implemented as a separate patch.

There is only one NFSv4.1 callback service.  The first caller of
nfs4_callback_up() creates the service, subsequent callers increment a
reference count on the service.  The service is destroyed when the last
caller invokes nfs_callback_down().

The transport needs to hold a reference to the callback service in order
to invoke it during callback processing.  Currently this reference is only
obtained when the service is first created.  This is incorrect, since
subsequent registrations for other transports will leave the xprt->serv
pointer uninitialized, leading to an oops when a callback arrives on
the "unreferenced" transport.

This patch fixes the problem by ensuring that a reference to the service
is saved in xprt->serv, either because the service is created by this
invocation to nfs4_callback_up() or by a prior invocation.
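
The reference handling described above can be sketched as a small Python
model. This is an illustration only, not kernel code: CallbackService,
Transport, callback_up(), callback_down() and service_is_up() are hypothetical
stand-ins for svc_serv, rpc_xprt and the callback up/down routines.

```python
class CallbackService:
    """Hypothetical stand-in for the shared svc_serv."""

    def __init__(self):
        self.users = 0  # reference count held by callers of callback_up()


class Transport:
    """Hypothetical stand-in for rpc_xprt; `serv` models xprt->serv."""

    def __init__(self):
        self.serv = None


_service = None  # the single callback service, or None while it is down


def callback_up(xprt):
    """Create the service on first use, otherwise take another reference."""
    global _service
    if _service is None:
        _service = CallbackService()
    _service.users += 1
    # Save the reference on every call, not only when the service is created.
    xprt.serv = _service
    return _service


def callback_down(xprt):
    """Drop one reference; the last caller tears the service down."""
    global _service
    xprt.serv = None
    _service.users -= 1
    if _service.users == 0:
        _service = None


def service_is_up():
    return _service is not None
```

In this model a second transport registered after the service exists still
ends up with a valid serv pointer, which is the case the fix addresses.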

Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Add a reference to svc_serv during callback service bring up]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Type check arguments of nfs_callback_up]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: save svc_serv in nfs_callback_info]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Removal of ugly #ifdefs]
[nfs41: Update to removal of ugly #ifdefs]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 14:11:29 -07:00
Trond Myklebust
5cd973c44a NFSv4/NLM: Push file locking BKL dependencies down into the NLM layer
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:23:01 -07:00
Trond Myklebust
3f09df70e3 NFS: Ensure we always hold the BKL when dereferencing inode->i_flock
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:23:00 -07:00
Trond Myklebust
965b5d6791 NFSv4: Handle more errors when recovering open file and locking state
It is possible for servers to return NFS4ERR_BAD_STATEID when
the state management code is recovering locks or is reclaiming state when
returning a delegation. Ensure that we handle that case.
While we're at it, add in handlers for NFS4ERR_STALE,
NFS4ERR_ADMIN_REVOKED, NFS4ERR_OPENMODE, NFS4ERR_DENIED and
NFS4ERR_STALE_STATEID, since the protocol appears to allow for them too.

Also handle ENOMEM...

Finally, rather than add new NFSv4.0-specific errors and error handling into
the generic delegation code, move that open file and locking state error
handling into the NFSv4 layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:59 -07:00
Trond Myklebust
d5122201a7 NFSv4: Move error handling out of the delegation generic code
The NFSv4 delegation recovery code is required by the protocol to handle
more errors. Rather than add NFSv4.0 specific errors into 'generic'
delegation code, we should move the error handling into the NFSv4 layer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:58 -07:00
Trond Myklebust
01c3f05228 NFSv4: Fix the 'nolock' option regression
NFSv4 should just ignore the 'nolock' option. It is an NFSv2/v3 thing...
This fixes the Oops in http://bugzilla.kernel.org/show_bug.cgi?id=13330

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 13:22:58 -07:00
Benny Halevy
7146851376 nfs41: minorversion support for nfs4_{init,destroy}_callback
move nfs4_init_callback into nfs4_init_client_minor_version
and nfs4_destroy_callback into nfs4_clear_client_minor_version

as these need to happen also when auto-negotiating the minorversion
once the callback service for nfs41 becomes different than for nfs4.0

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Fix checkpatch warning]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Type check arguments of nfs_callback_up]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Backchannel: Remove FIXME comment]
Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 13:06:01 -07:00
Benny Halevy
9bdaa86d2a nfs41: Refactor nfs4_{init,destroy}_callback for nfs4.0
Refactor-out code to bring the callback service up and down.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:47 -07:00
Benny Halevy
34dc1ad752 nfs41: increment_{open,lock}_seqid
Unlike minorversion0, in nfsv4.1 the open and lock seqids need
not be incremented by the client and should always be set to zero.

This is implemented using new nfs_rpc_ops methods -
increment_open_seqid and increment_lock_seqid
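
A minimal Python sketch of the idea, under stated assumptions: Seqid,
increment_seqid_v40(), increment_seqid_v41() and select_increment_op() are
hypothetical names, and the NFSv4.0 rule is simplified to "always bump" (the
real rule also depends on the operation's status).

```python
from dataclasses import dataclass


@dataclass
class Seqid:
    counter: int = 0


def increment_seqid_v40(seqid):
    # NFSv4.0 model: the client advances the open/lock seqid itself.
    seqid.counter += 1


def increment_seqid_v41(seqid):
    # NFSv4.1 model: the seqid is not incremented and is always zero.
    seqid.counter = 0


def select_increment_op(has_session):
    # Dispatch on the presence of a session rather than on the minorversion.
    return increment_seqid_v41 if has_session else increment_seqid_v40
```

Selecting the method once keeps the callers free of per-version branches.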

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: check for session not minorversion]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:43:45 -07:00
Andy Adamson
78722e9c92 nfs41: only retry EXCHANGE_ID on recoverable errors
Stops an infinite loop of EXCHANGE_ID.

Signed-off-by: Andy Adamson <andros@netapp.com>
[fixed checkpatch warnings]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:43:44 -07:00
Benny Halevy
008f55d0e0 nfs41: recover lease in _nfs4_lookup_root
This creates the nfsv4.1 session on mount.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:13 -07:00
Andy Adamson
b4b82607ff nfs41: get_clid_cred for EXCHANGE_ID
Unlike SETCLIENTID, EXCHANGE_ID requires a machine credential. Do not search
for credentials other than the machine credential.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:13 -07:00
Andy Adamson
90a16617ee nfs41: add a get_clid_cred function to nfs4_state_recovery_ops
EXCHANGE_ID has different credential requirements than SETCLIENTID.
Prepare for a separate credential function.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:12 -07:00
Andy Adamson
591d71cbde nfs41: establish sessions-based clientid
nfsv4.1 clientid is established via EXCHANGE_ID rather than
SETCLIENTID{,_CONFIRM}

This is implemented using a new establish_clid method in
nfs4_state_recovery_ops.

nfs41: establish clientid via exchange id only if cred != NULL

From 2.6.26, reclaimer() uses the machine cred for setting up the client id,
so the cred is never expected to be NULL.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
[removed dprintk]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: lease renewal]
[revamped patch for new nfs4_state_manager design]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:11 -07:00
Andy Adamson
a7b721037f nfs41: introduce get_state_renewal_cred
Use the machine cred for sending SEQUENCE to renew
the client's lease.

[revamp patch for new state management design starting 2.6.29]
[nfs41: support minorversion 1 for nfs4_check_lease]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: get cred in exchange_id when cred arg is NULL]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use cl_machine_cred instead of cl_ex_cred]
    Since EXCHANGE_ID insists on using the machine credential, cl_ex_cred is
    not needed. nfs4_proc_exchange_id() is only called if the machine credential
    is available. Remove the credential logic from nfs4_proc_exchange_id.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:11 -07:00
Benny Halevy
8e69514f29 nfs41: support minorversion 1 for nfs4_check_lease
[moved nfs4_get_renew_cred related changes to
 "nfs41: introduce get_state_renewal_cred"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:10 -07:00
Benny Halevy
29fba38b79 nfs41: lease renewal
Send a NFSv4.1 SEQUENCE op rather than RENEW that was deprecated in
minorversion 1.
Use the nfs_client minorversion to select reboot_recover/
network_partition_recovery/state_renewal ops.

Note: we use reclaimer to create the nfs41 session before there are any
cl_superblocks for the nfs_client.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: check for session not minorversion]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[revamped patch for new nfs4_state_manager design]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: obliterate nfs4_state_recovery_ops.renew_lease method]
    moved to nfs4_state_maintenance_ops
[also undid per-minorversion nfs4_state_recovery_ops here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:09 -07:00
Andy Adamson
b069d94af7 nfs41: schedule async session reset
Define a new session reset state which is set upon a sequence operation error
in both the sync and async error handlers.

Place all new requests and all but the last outstanding rpc on the
slot_tbl_waitq. Spawn the recovery thread when the last slot is free.
Call nfs4_proc_destroy_session, reinitialize the session, call
nfs4_proc_create_session, clear the session reset state, and wake up the next
task on the slot_tbl_waitq.

Return the nfs4_proc_destroy_session status to the session reclaimer and
check for NFS4ERR_BADSESSION and NFS4ERR_DEADSESSION. Other destroy session
errors should be handled in nfs4_proc_destroy_session where the call can
be retried with adjusted arguments.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: make nfs4_wait_bit_killable public]
    nfs4_wait_bit_killable to be used by NFSv4.1 session recover logic.
Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: have create_session work on nfs_client]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: trigger the state manager for session reset]
    Replace the session reset state with the NFS4CLNT_SESSION_SETUP cl_state.
    Place all rpc tasks to sleep on the slot table waitqueue until the slot
    table is drained, then schedule state recovery and wait for it to complete.
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: remove nfs41_session_recovery.[ch]]
Replaced by using the nfs4_state_manager.
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: nfs4_wait_bit_killable only used locally]
[nfs41: keep nfs4_wait_bit_killable static]
[nfs41: keep const nfs_server in nfs4_handle_exception]
[nfs41: remove session parameter from nfs4_find_slot]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: reset the session from nfs41_setup_sequence]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:09 -07:00
Andy Adamson
4745e3154b nfs41: kick start nfs41 session recovery when handling errors
Remove checking for any errors that the SEQUENCE operation does not return.
-NFS4ERR_STALE_CLIENTID, NFS4ERR_EXPIRED, NFS4ERR_CB_PATH_DOWN, NFS4ERR_BACK_CHAN_BUSY, NFS4ERR_OP_NOT_IN_SESSION.

SEQUENCE operation error recovery is very primitive; we only reset the session.

Remove checking for any errors that are returned by the SEQUENCE operation, but
that resetting the session won't address.
NFS4ERR_RETRY_UNCACHED_REP, NFS4ERR_SEQUENCE_POS, NFS4ERR_TOO_MANY_OPS.

Add error checking for missing SEQUENCE errors that a session reset will
address.
NFS4ERR_BAD_HIGH_SLOT, NFS4ERR_DEADSESSION, NFS4ERR_SEQ_FALSE_RETRY.

A reset of the session is currently our only response to a SEQUENCE operation
error. Don't reset the session on errors where a new session won't help.

[nfs41: nfs4_async_handle_error update error checking]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: trigger the state manager for session reset]
    Replace session state bit with nfs_client state bit.  Set the
    NFS4CLNT_SESSION_SETUP bit upon a session related error in the sync/async
    error handlers.
[nfs41: _nfs4_async_handle_error fix session reset error list]
Sequence operation errors that a session reset could help:
NFS4ERR_BADSESSION
NFS4ERR_BADSLOT
NFS4ERR_BAD_HIGH_SLOT
NFS4ERR_DEADSESSION
NFS4ERR_CONN_NOT_BOUND_TO_SESSION
NFS4ERR_SEQ_FALSE_RETRY
NFS4ERR_SEQ_MISORDERED

Sequence operation errors that a session reset would not help:

NFS4ERR_BADXDR
NFS4ERR_DELAY
NFS4ERR_REP_TOO_BIG
NFS4ERR_REP_TOO_BIG_TO_CACHE
NFS4ERR_REQ_TOO_BIG
NFS4ERR_RETRY_UNCACHED_REP
NFS4ERR_SEQUENCE_POS
NFS4ERR_TOO_MANY_OPS
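
The two lists above can be expressed as a lookup, sketched here in Python.
The error names are taken from the lists; the function names and the use of a
plain set for cl_state are assumptions made for illustration, not the kernel
implementation.

```python
# Errors for which a fresh session may help (first list above).
SESSION_RESET_HELPS = frozenset({
    "NFS4ERR_BADSESSION",
    "NFS4ERR_BADSLOT",
    "NFS4ERR_BAD_HIGH_SLOT",
    "NFS4ERR_DEADSESSION",
    "NFS4ERR_CONN_NOT_BOUND_TO_SESSION",
    "NFS4ERR_SEQ_FALSE_RETRY",
    "NFS4ERR_SEQ_MISORDERED",
})

# Errors that a new session would not fix (second list above).
SESSION_RESET_DOES_NOT_HELP = frozenset({
    "NFS4ERR_BADXDR",
    "NFS4ERR_DELAY",
    "NFS4ERR_REP_TOO_BIG",
    "NFS4ERR_REP_TOO_BIG_TO_CACHE",
    "NFS4ERR_REQ_TOO_BIG",
    "NFS4ERR_RETRY_UNCACHED_REP",
    "NFS4ERR_SEQUENCE_POS",
    "NFS4ERR_TOO_MANY_OPS",
})


def handle_sequence_error(err_name, cl_state):
    """Flag the client for session reset only when a reset can help.

    cl_state is a set of state-bit names (hypothetical model of cl_state).
    Returns True when a session reset was requested.
    """
    if err_name in SESSION_RESET_HELPS:
        cl_state.add("NFS4CLNT_SESSION_SETUP")
        return True
    return False
```

Any error outside the first set leaves the state bits untouched, so the
caller falls back to its ordinary error handling.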

Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41 nfs4_handle_exception fix session reset error list]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[moved nfs41_sequence_call_done code to nfs41: sequence operation]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:08 -07:00
Andy Adamson
eedc020e71 nfs41: use rpc prepare call state for session reset
[nfs41: change nfs4_restart_rpc argument]
[nfs41: check for session not minorversion]
[nfs41: trigger the state manager for session reset]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[always define nfs4_restart_rpc]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:07 -07:00
Andy Adamson
c3fad1b1aa nfs41: add session reset to state manager
Move the code to reset a session from the session_reclaimer to the
nfs4_state_manager.  Destroy the session, and create a new one. Treat
NFS4ERR_BADSESSION and NFS4ERR_DEADSESSION as a successful
nfs4_proc_destroy_session. Signal nfs4_proc_create_session that this is a
session reset so that the session slot table is re-used.

If the clientid is stale, set both NFS4CLNT_LEASE_EXPIRED and
NFS4CLNT_SESSION_SETUP bits and retry.
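
A rough Python model of this reset path, for illustration only. The function
reset_session() and its callable arguments are hypothetical, statuses are
plain strings, and treating a stale clientid from either call the same way is
an assumption of the sketch.

```python
def reset_session(destroy_session, create_session, cl_state):
    """Destroy the old session, then create a new one on the same slot table.

    destroy_session() and create_session(reset=...) are caller-supplied
    callables returning a status string; cl_state is a set of bit names.
    """
    status = destroy_session()
    if status in ("NFS4ERR_BADSESSION", "NFS4ERR_DEADSESSION"):
        # The server no longer knows the session: count that as destroyed.
        status = "OK"
    if status == "OK":
        # reset=True tells create_session to re-use the slot table.
        status = create_session(reset=True)
    if status == "NFS4ERR_STALE_CLIENTID":
        # Both the clientid and the session must be re-established.
        cl_state.update({"NFS4CLNT_LEASE_EXPIRED", "NFS4CLNT_SESSION_SETUP"})
    elif status == "OK":
        cl_state.discard("NFS4CLNT_SESSION_SETUP")
    return status
```

The caller is expected to loop while either bit remains set.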

Use a switch statement in nfs4_session_recovery_handle_error for future
patches which will add handling for other errors.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>

[nfs41: session reset in nfs4_recovery_handle_error]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: reset session on nfs4_do_reclaim session reset error]
    If nfs4_do_reclaim gets a session reset error, nfs4_recovery_handle_error
    will set the NFS4CLNT_SESSION_SETUP bit, and the state manager should
    continue processing to reset the session.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[move nfs4_proc_destroy_session declaration here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:06 -07:00
Andy Adamson
76db6d9500 nfs41: add session setup to the state manager
At mount, nfs_alloc_client sets the cl_state NFS4CLNT_LEASE_EXPIRED bit
and nfs4_alloc_session sets the NFS4CLNT_SESSION_SETUP bit, so both bits are
set when nfs4_lookup_root calls nfs4_recover_expired_lease which schedules
the nfs4_state_manager and waits for it to complete.

Place the session setup after the clientid establishment in nfs4_state_manager
so that the session is set up right after the clientid has been established
without rescheduling the state manager.

Unlike nfsv4.0, the nfs_client struct is not ready to use until the session
has been established.  Postpone marking the nfs_client struct to NFS_CS_READY
until after a successful CREATE_SESSION call so that other threads cannot use
the client until the session is established.

If the EXCHANGE_ID call fails and the session has not been set up (the
NFS4CLNT_SESSION_SETUP bit is set), mark the client with the error and return.

If the session setup CREATE_SESSION call fails with NFS4ERR_STALE_CLIENTID,
which could occur due to a server reboot or network partition between the
EXCHANGE_ID and CREATE_SESSION calls, reset the NFS4CLNT_LEASE_EXPIRED and
NFS4CLNT_SESSION_SETUP bits and try again.

If the CREATE_SESSION call fails with other errors, mark the client with
the error and return.
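
The ordering and retry rules above can be modelled in a few lines of Python.
This is a sketch under assumptions: run_session_setup(), its callable
arguments and the max_rounds bound are hypothetical, and statuses are plain
strings rather than kernel error codes.

```python
def run_session_setup(exchange_id, create_session, max_rounds=4):
    """Model of clientid establishment followed by session setup.

    Returns ("NFS_CS_READY", "OK") on success or ("error", status) on failure.
    """
    # At mount both bits are set before the state manager runs.
    cl_state = {"NFS4CLNT_LEASE_EXPIRED", "NFS4CLNT_SESSION_SETUP"}
    for _ in range(max_rounds):
        if "NFS4CLNT_LEASE_EXPIRED" in cl_state:
            status = exchange_id()
            if status != "OK":
                return ("error", status)  # mark the client with the error
            cl_state.discard("NFS4CLNT_LEASE_EXPIRED")
        # Session setup follows immediately, without rescheduling.
        status = create_session()
        if status == "NFS4ERR_STALE_CLIENTID":
            # Server rebooted or partitioned in between: start over.
            cl_state.update({"NFS4CLNT_LEASE_EXPIRED", "NFS4CLNT_SESSION_SETUP"})
            continue
        if status != "OK":
            return ("error", status)
        cl_state.discard("NFS4CLNT_SESSION_SETUP")
        # Only now may other threads use the client.
        return ("NFS_CS_READY", status)
    return ("error", "NFS4ERR_STALE_CLIENTID")
```

The client is reported ready only after CREATE_SESSION succeeds, matching the
postponed NFS_CS_READY marking described above.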

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>

[nfs41: NFS_CS_SESSION_SETUP cl_cons_state for back channel setup]
  On session setup, the CREATE_SESSION reply races with the server back channel
  probe, which needs to succeed to set up the back channel. Set a new
  cl_cons_state NFS_CS_SESSION_SETUP just prior to the CREATE_SESSION call
  and add it as a valid state to nfs_find_client so that the client back channel
  can find the nfs_client struct and won't drop the server backchannel probe.
  Use a new cl_cons_state so that NFSv4.0 back channel behaviour which only
  sets NFS_CS_READY is unchanged.
  Adjust waiting on the nfs_client_active_wq accordingly.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>

[nfs41: rename NFS_CS_SESSION_SETUP to NFS_CS_SESSION_INITING]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: set NFS_CL_SESSION_INITING in alloc_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: move session setup into a function]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[moved nfs4_proc_create_session declaration here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:06 -07:00
Andy Adamson
ac72b7b3b3 nfs41: reset the session slot table
Separated from nfs41: schedule async session reset

Do not kfree the session slot table upon session reset, just re-initialize it.
Add a boolean to nfs4_proc_create_session to indicate if this is a
session reset or a session initialization.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:05 -07:00
Andy Adamson
fc01cea963 nfs41: sequence operation
Implement the sequence operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Check returned sessionid, slotid and slot sequenceid in decode_sequence.

If the server returns different values for sessionID, slotID or slot sequence
number than what was sent, the server is looney tunes.

Pass the sequence operation status to nfs41_sequence_done in order to
determine when to increment the slot sequence ID.

Free slot is separated from sequence done.
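
As an illustration, the reply check and the sequence ID bookkeeping might look
like this in Python. SequenceIds, decode_sequence_check() and sequence_done()
are hypothetical names, and bumping the slot sequence ID only on an "OK"
status is an assumption of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SequenceIds:
    sessionid: bytes
    slotid: int
    seqid: int


def decode_sequence_check(sent, received):
    """Compare what the server echoed with what the client sent."""
    if received != sent:
        # Any mismatch in sessionid, slotid or seqid is a server fault.
        return "ESERVERFAULT"
    return "OK"


def sequence_done(slot_seqids, slotid, status):
    """Advance the slot's sequence ID depending on the SEQUENCE status.

    slot_seqids maps slotid -> current sequence ID. Freeing the slot is a
    separate step and is deliberately not done here.
    """
    if status == "OK":
        slot_seqids[slotid] += 1
```

Keeping the slot release out of sequence_done() mirrors the separation noted
above.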

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@umich.edu>
[nfs41: sequence res use slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: deref slot table in decode_sequence only for minorversion!=0]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_call_sync]
[nfs41: remove SEQ4_STATUS_USE_TK_STATUS]
[nfs41: return ESERVERFAULT in decode_sequence]
[no sr_session, no sr_flags]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use nfs4_call_sync_sequence to renew session lease]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove nfs4_call_sync_sequence forward definition]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: use struct nfs_client for nfs41_proc_async_sequence]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: pass *session in seq_args and seq_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41 nfs41_sequence_call_done update error checking]
[nfs41 nfs41_sequence_done update error checking]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove switch on error from nfs41_sequence_call_done]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:04 -07:00
Andy Adamson
8328d59f38 nfs41: enable nfs_client only nfs4_async_handle_error
The session is per struct nfs_client, not per nfs_server. Allow the handler
to be called with no nfs_server, which simplifies the
nfs4_proc_async_sequence session renewal call and will let it be used by
pnfs file layout data servers.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:25:04 -07:00
Andy Adamson
0f3e66c6a6 nfs41: destroy_session operation
Implement the destroy_session operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove extraneous rpc_clnt pointer]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: NFS_CS_READY required for DESTROY_SESSION]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: pass *session in seq_args and seq_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[nfs41: fix encode_destroy_session's xdr Xcoding pointer type]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:24:55 -07:00
Andy Adamson
96b09e024f nfs41: use session attributes for rsize and wsize
Set the mount point's rsize and wsize to the negotiated session fore channel
maximum response and request size. These values will be bounds-checked in
nfs_server_set_fsinfo.
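
A small Python sketch of the mapping, assuming that reads are limited by the
response size and writes by the request size. ForeChannelAttrs, Server and
session_set_rwsize() are hypothetical models; the later bounds check is not
shown.

```python
from dataclasses import dataclass


@dataclass
class ForeChannelAttrs:
    max_rqst_sz: int  # negotiated maximum request size
    max_resp_sz: int  # negotiated maximum response size


@dataclass
class Server:
    rsize: int = 0
    wsize: int = 0


def session_set_rwsize(server, attrs):
    # Read data comes back in the reply, so the response size limits rsize.
    server.rsize = attrs.max_resp_sz
    # Write data travels in the request, so the request size limits wsize.
    server.wsize = attrs.max_rqst_sz
```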

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[move nfs4_session_set_rwsize into CONFIG_NFS_V4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:24:53 -07:00
Andy Adamson
8d35301d7d nfs41: verify session channel attributes
Invalidate the session if the server returns invalid fore or back channel
attributes.

Use a KERN_WARNING to report the fatal session establishment error.

Signed-off-by: Andy Adamson <andros@netapp.com>
[refactor nfs4_verify_channel_attrs]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:24:52 -07:00
Andy Adamson
fc931582c2 nfs41: create_session operation
Implement the create_session operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Set the real fore channel max operations to preserve server resources.
Note: If the server returns < NFS4_MAX_OPS, the client will very soon
get an NFS4ERR_TOO_MANY_OPS. A later patch will handle this.

Set the max_rqst_sz and max_resp_sz to PAGE_SIZE - we preallocate the buffers.

Set the back channel max_resp_sz_cached to zero to force the client to
always set csa_cachethis to FALSE because the current implementation
of the back channel DRC only supports caching the CB_SEQUENCE operation.

The client back channel server supports one slot, and desires 2 operations
per compound.

Signed-off-by: Ricardo Labiaga <ricardo.labiaga@netapp.com>
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove extraneous rpc_clnt pointer]
Use the struct nfs_client cl_rpcclient.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_init_channel_attrs, just use nfs41_create_session_args]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use rsize and wsize for session channel attributes]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: set channel max operations]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: set back channel attributes]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: obliterate nfs4_adjust_channel_attrs]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: have create_session work on nfs_client]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: move CONFIG_NFS_V4_1 endif]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: pass *session in seq_args and seq_res]
[moved nfs4_init_slot_table definition here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use kcalloc to allocate slot table]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[nfs41: fix Xcode_create_session's xdr Xcoding pointer type]
[nfs41: refactor decoding of channel attributes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:24:34 -07:00
Andy Adamson
2050f0cc07 nfs41: get_lease_time
get_lease_time uses the FSINFO rpc operation to
get the lease time attribute.

nfs4_get_lease_time() is only called from the state manager on session setup
so don't recover from clientid or sequence level errors.

We do need to recover from NFS4ERR_DELAY or NFS4ERR_GRACE.
Use NFS4_POLL_RETRY_MIN - the Linux server returns NFS4ERR_DELAY when an
upcall is needed to resolve an uncached export referenced by a file handle.
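
The retry rule can be sketched in Python as follows. get_lease_time() here is
a hypothetical model: fsinfo and sleep are caller-supplied callables, and
retry_delay and max_tries are placeholder values, not the kernel's
NFS4_POLL_RETRY_MIN.

```python
def get_lease_time(fsinfo, sleep, retry_delay=0.1, max_tries=10):
    """Fetch the lease time, retrying only on transient errors.

    fsinfo() returns a (status, lease_time) pair. Clientid and sequence
    level errors are returned to the caller without recovery.
    """
    status, lease = "NFS4ERR_DELAY", None
    for _ in range(max_tries):
        status, lease = fsinfo()
        if status in ("NFS4ERR_DELAY", "NFS4ERR_GRACE"):
            sleep(retry_delay)  # wait briefly, then ask again
            continue
        break
    return status, lease
```

Passing sleep in as a parameter keeps the sketch testable without real delays.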

[nfs41: sequence res use slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove extraneous rpc_clnt pointer]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: have get_lease_time work on nfs_client]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: get_lease_time recover from NFS4ERR_DELAY]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: pass *session in seq_args and seq_res]
[define nfs4_get_lease_time_{args,res}]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 12:24:32 -07:00
Benny Halevy
99fe60d062 nfs41: exchange_id operation
Implement the exchange_id operation conforming to
http://tools.ietf.org/html/draft-ietf-nfsv4-minorversion1-26

Unlike NFSv4.0, NFSv4.1 requires machine credentials. RPC_AUTH_GSS machine
credentials will be passed into the kernel at mount time to be available for
the exchange_id operation.

RPC_AUTH_UNIX root mounts can use the UNIX root credential. Store the root
credential in the nfs_client struct.

Without a credential, NFSv4.1 state renewal fails.

[nfs41: establish clientid via exchange id only if cred != NULL]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfsd41: move nfstime4 from under CONFIG_NFS_V4_1]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: do not wait a lease time in exchange id]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: pass *session in seq_args and seq_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[nfs41: Ignoring impid in decode_exchange_id is missing a READ_BUF]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: fix Xcode_exchange_id's xdr Xcoding pointer type]
[nfs41: get rid of unused struct nfs41_exchange_id_res members]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
2009-06-17 12:23:57 -07:00
Andy Adamson
938e101091 nfs41 delegreturn sequence setup done support
Separate delegreturn calls from nfs41: sequence setup/done support

Implement the delegreturn rpc_call_prepare method for
asynchronous nfs rpcs, call nfs41_setup_sequence from
respective rpc_call_validate_args methods.

Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:51 -07:00
Andy Adamson
21d9a851aa nfs41 commit sequence setup done support
Separate commit calls from nfs41: sequence setup/done support

Implement the commit rpc_call_prepare method for
asynchronous nfs rpcs, call nfs41_setup_sequence from
respective rpc_call_validate_args methods.

Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Support sessions with O_DIRECT.]
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: separate free slot from sequence done]
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:50 -07:00
Andy Adamson
def6ed7ef4 nfs41 write sequence setup done support
Separate write calls from nfs41: sequence setup/done support

Implement the write rpc_call_prepare method for
asynchronous nfs rpcs, call nfs41_setup_sequence from
respective rpc_call_validate_args methods.

Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[move the nfs4_sequence_free_slot call in nfs_readpage_retry from]
[nfs41: separate free slot from sequence done]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Support sessions with O_DIRECT.]
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:49 -07:00
Andy Adamson
f11c88af26 nfs41: read sequence setup/done support
Implement the read rpc_call_prepare method for
asynchronous nfs rpcs, call nfs41_setup_sequence from
respective rpc_call_validate_args methods.

Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[move the nfs4_sequence_free_slot call in nfs_readpage_retry from]
[nfs41: separate free slot from sequence done]
[remove nfs_readargs.nfs_server, use calldata->inode instead]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Support sessions with O_DIRECT]
Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:48 -07:00
Andy Adamson
472cfbd9b9 nfs41: unlink sequence setup/done support
Implement the rpc_call_prepare methods for
asynchronous nfs rpcs, call nfs41_setup_sequence from
respective rpc_call_validate_args methods.

Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: separate free slot from sequence done]
[nfs41: sequence res use slotid]
[nfs41: remove SEQ4_STATUS_USE_TK_STATUS]
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:47 -07:00
Andy Adamson
a893693c15 nfs41: locku sequence setup/done support
Separate nfs4_locku calls from nfs41: sequence setup/done support
Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:46 -07:00
Andy Adamson
66179efee3 nfs41: lock sequence setup/done support
Separate nfs4_lock calls from nfs41: sequence setup/done support
Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
[use nfs4_sequence_done_free_slot]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:45 -07:00
Andy Adamson
d898528cdb nfs41: open sequence setup/done support
Separate nfs4_open calls from nfs41: sequence setup/done support
Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
[use nfs4_sequence_done_free_slot]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:44 -07:00
Andy Adamson
19ddab06ed nfs41: close sequence setup/done support
Separate nfs4_close calls from nfs41: sequence setup/done support
Call nfs4_sequence_done from respective rpc_call_done methods.

Note that we need to pass a pointer to the nfs_server in calls data
for passing on to nfs4_sequence_done.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[pnfs: client data server write validate and release]
Signed-off-by: Andy Adamson <andros@umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: separate free slot from sequence done]
[nfs41: sequence res use slotid]
[nfs41: remove SEQ4_STATUS_USE_TK_STATUS]
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:43 -07:00
Andy Adamson
69ab40c4c3 nfs41: nfs41_call_sync_done
Implement the nfs4.1 synchronous rpc_call_done method,
which essentially just calls nfs4_sequence_done, which in turn
calls nfs41_sequence_done for minorversion1 rpcs.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: check for session not minorversion]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[move adding nfs4_sequence_free_slot from nfs41-separate-free-slot-from-sequence-done]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs41_call_sync_data use nfs_client not nfs_server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:42 -07:00
Andy Adamson
b0df806c0f nfs41: nfs41_sequence_done
Handle session level errors, update slot sequence id and
sessions bookkeeping, free slot.

[nfs41: sequence res use slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove SEQ4_STATUS_USE_TK_STATUS]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: check for session not minorversion]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: bail out early out of nfs41_sequence_done if !res->sr_session]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[move nfs4_sequence_done from nfs41: nfs41_call_sync_done]
Signed-off-by: Andy Adamson <andros@netapp.com>
[move nfs4_sequence_free_slot from nfs41: separate free slot from sequence done]
    Don't free the slot until after all rpc_restart_calls have completed.
    Session reset will require more work.
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[moved reset sr_slotid to nfs41_sequence_free_slot]
[free slot also on unexpected error]
[remove seq_res.sr_session member, use nfs_client's instead]
[ditch seq_res.sr_flags until used]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[look at sr_slotid for bailing out early from nfs41_sequence_done]
[nfs41: rpc_wake_up_next if sessions slot was not consumed.]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove unused error checking in nfs41_sequence_done]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: remove nfs4_has_session check in nfs41_sequence_done]
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: remove nfs_client pointer check]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:41 -07:00
Andy Adamson
13615871cd nfs41: nfs41_sequence_free_slot
[from nfs41: separate free slot from sequence done]

Don't free the slot until after all rpc_restart_calls have completed.
Session reset will require more work.

As noted by Trond, since we're using rpc_wake_up_next rather than
rpc_wake_up() we must always wake up the next task in the queue
either by going through nfs4_free_slot, or just calling
rpc_wake_up_next if no slot is to be freed.

[nfs41: sequence res use slotid]
[nfs41: remove SEQ4_STATUS_USE_TK_STATUS]
[got rid of nfs4_sequence_res.sr_session, use nfs_client.cl_session instead]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: rpc_wake_up_next if sessions slot was not consumed.]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_sequence_free_slot use nfs_client for data server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:41 -07:00
Andy Adamson
e2c4ab3ce2 nfs41: free slot
Free a slot in the slot table.

Mark the slot as free in the bitmap-based allocation table
by clearing a bit corresponding to the slotid.

Update lowest_free_slotid if freed slotid is lower than that.
Update highest_used_slotid.  In the case the freed slotid
equals the highest_used_slotid, scan downwards for the next
highest used slotid using the optimized fls* functions.

Finally, wake up a thread waiting on slot_tbl_waitq for a free slot
to become available.
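
The freeing steps above can be sketched in plain C. This is only an
illustration, not the kernel code: struct slot_table, slot_free, MAX_SLOTS
and NO_SLOT are hypothetical names, the table is a single 32-bit word, and
the downward loop stands in for the optimized fls*/find_last_bit helpers.
lowest_free_slotid is left out, since the notes below say it was later
removed.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 32u   /* hypothetical table size, one bit per slot */
#define NO_SLOT   (-1)  /* hypothetical marker: no slot is in use */

struct slot_table {
	uint32_t used;            /* bit i set => slot i is in use */
	int highest_used_slotid;  /* NO_SLOT when the table is empty */
};

/* Free slotid: clear its bit, then fix up highest_used_slotid. */
void slot_free(struct slot_table *tbl, unsigned int slotid)
{
	assert(slotid < MAX_SLOTS);
	tbl->used &= ~(UINT32_C(1) << slotid);

	if ((int)slotid == tbl->highest_used_slotid) {
		int i = (int)slotid - 1;

		/* Scan downwards for the next slot that is still in use. */
		while (i >= 0 && !(tbl->used & (UINT32_C(1) << i)))
			i--;
		tbl->highest_used_slotid = i;  /* ends at NO_SLOT if none left */
	}
	/* The real code would now wake a waiter on slot_tbl_waitq. */
}
```

Freeing a slot below the highest used one only clears its bit; the scan
runs only when the freed slot was itself the highest.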

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: free slot use slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use find_first_zero_bit for nfs4_find_slot]
    While at it, obliterate lowest_free_slotid and fix-up related comments.
    As per review comment 21/85.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use __clear_bit for nfs4_free_slot]
    While at it, fix-up function comment.
    Part of review comment 22/85.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: use find_last_bit in nfs4_free_slot to determine highest used slot.]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: rpc_sleep_on slot_tbl_waitq must be called under slot_tbl_lock]
    Otherwise there's a race (we've hit) with nfs4_free_slot where
    nfs41_setup_sequence sees a full slot table, unlocks slot_tbl_lock,
    nfs4_free_slots happen concurrently and call rpc_wake_up_next
    where there's nobody to wake up yet, context goes back to
    nfs41_setup_sequence which goes to sleep when the slot table
    is actually empty now and there's no-one to wake it up anymore.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:39 -07:00
Andy Adamson
fbcd4abcb3 nfs41: setup_sequence method
Allocate a slot in the session slot table and set the sequence op arguments.

Called at the rpc prepare stage.

Add a status to nfs41_sequence_res, initialize it to one so that we catch
rpc level failures which do not go through decode_sequence which sets
the new status field.

Note that upon an rpc level failure, we don't know if the server processed the
sequence operation or not. Proceed as if the server did process the sequence
operation.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
[nfs41: sequence args use slotid]
[nfs41: find slot return slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove SEQ4_STATUS_USE_TK_STATUS]
As per 11-14-08 review
[move extern declaration from nfs41: sequence setup/done support]
[removed sa_session definition, changed sa_cache_this into a u8 to reduce footprint]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: rpc_sleep_on slot_tbl_waitq must be called under slot_tbl_lock]
    Otherwise there's a race (we've hit) with nfs4_free_slot where
    nfs41_setup_sequence sees a full slot table, unlocks slot_tbl_lock,
    nfs4_free_slots happen concurrently and call rpc_wake_up_next
    where there's nobody to wake up yet, context goes back to
    nfs41_setup_sequence which goes to sleep when the slot table
    is actually empty now and there's no-one to wake it up anymore.
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:39 -07:00
Benny Halevy
510b81756f nfs41: find slot
Find a free slot using bitmap-based allocation.
Use the optimized ffz function to find a zero bit
in the bitmap that indicates a free slot, starting
the search from the 'lowest_free_slotid' position.

If found, mark the slot as used in the bitmap, get
the slot's slotid and seqid, and update max_slotid
to be used by the SEQUENCE operation.

Also, update lowest_free_slotid for next search.

If no free slot was found, the caller has to wait
for a free slot (outside the scope of this function).
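
A minimal sketch of the allocation side, under stated assumptions: the
names struct slot_table, slot_find and MAX_SLOTS are hypothetical, the
table is one 32-bit word, and a linear scan stands in for
ffz/find_first_zero_bit. Returning MAX_SLOTS for "no free slot" mirrors
the NFS4_MAX_SLOT_TABLE note below; seqid handling is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SLOTS 32u  /* hypothetical size; also the "no free slot" result */

struct slot_table {
	uint32_t used;            /* bit i set => slot i is in use */
	int highest_used_slotid;  /* -1 when the table is empty */
};

/* Claim the lowest free slot and return its id, or MAX_SLOTS if full. */
unsigned int slot_find(struct slot_table *tbl)
{
	assert(tbl != NULL);
	for (unsigned int i = 0; i < MAX_SLOTS; i++) {
		uint32_t bit = UINT32_C(1) << i;

		if (!(tbl->used & bit)) {
			tbl->used |= bit;  /* mark the slot as used */
			if ((int)i > tbl->highest_used_slotid)
				tbl->highest_used_slotid = (int)i;
			return i;
		}
	}
	return MAX_SLOTS;  /* caller has to wait for a slot to be freed */
}
```

With slots 0 and 2 in use, the first call returns 1 and the next returns
3; on a full table the caller gets MAX_SLOTS and must sleep.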

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: find slot return slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[use find_first_zero_bit for nfs4_find_slot as per review comment 21/85.]
[use NFS4_MAX_SLOT_TABLE rather than NFS4_NO_SLOT]
[nfs41: rpc_sleep_on slot_tbl_waitq must be called under slot_tbl_lock]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:37 -07:00
Andy Adamson
ce5039c1be nfs41: nfs4_setup_sequence
Perform the nfs4_setup_sequence in the rpc_call_prepare state.

If a session slot is not available, we will rpc_sleep_on the
slot wait queue leaving the tk_action as rpc_call_prepare.

Once we have a session slot, hang on to it even through rpc_restart_calls.
Ensure the nfs41_sequence_res sr_slot pointer is NULL before rpc_run_task is
called as nfs41_setup_sequence will only find a new slot if it is NULL.

A future patch will call free slot after any rpc_restart_calls, and handle the
rpc restarts that result from a sequence operation error.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
[nfs41: sequence res use slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: simplify nfs4_call_sync]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_call_sync]
[nfs41: check for session not minorversion]
[nfs41: remove rpc_message from nfs41_call_sync_args]
[moved NFS4_MAX_SLOT_TABLE logic into nfs41_setup_sequence]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs41_call_sync_data use nfs_client not nfs_server]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: expose nfs4_call_sync_session for lease renewal]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: remove unnecessary return check]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:36 -07:00
Andy Adamson
9b7b9fcc9c nfs41: xdr {encode,decode}_sequence
Implement stubs for encode and decode sequence, defined as no-ops when
CONFIG_NFS_V4_1 is not defined.
Add the nfsv41 encode and decode sizes. Add encode_sequence to all
nfs4_enc_* routines and decode_sequence to all nfs4_dec_* routines as required
by v41.

[was nfs41: minorversion support for xdr]
[added nfs_client argument to encode_sequence so not to use sequence_args to pass sa_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: pass *session in seq_args and seq_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:36 -07:00
Benny Halevy
66cc042970 nfs41: encode minorversion in compound header
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: pass *session in seq_args and seq_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:35 -07:00
Benny Halevy
28f566942c NFS: use dynamically computed compound_hdr.replen for xdr_inline_pages offset
As Trond suggested, rather than passing a constant to xdr_inline_pages,
keep a running count of the expected reply bytes.  This is in preparation
for nfs41, where additional sequence ops are expected when talking to nfs41
servers.

[NFS: cb_compoundhdr.replen is in words not bytes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: get fs_locations replen before encoding the GETATTR]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: get getacl replen before encoding the GETATTR]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:34 -07:00
Benny Halevy
dadf0c2767 NFS: update hdr->replen for every encode op
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:33 -07:00
Benny Halevy
0c4e8c1877 NFS: define and initialize compound_hdr.replen
replen holds the running count of expected reply bytes.
replen will then be used by encoding routines for xdr_inline_pages offset
after which data bytes are to be received directly into the xdr
buffer pages.

NOTE: According to the nfsv4 and v4.1 RFCs, the replied tag SHOULD be the same
as the one sent, but this is not a MUST for the server.
The server may screw us if it replies with a tag of a different length in the
compound result.
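
The running count and its dependence on the tag length can be sketched as
follows. Everything here is an assumption made for illustration: the
struct layout, the hdr_* helper names and the per-op reply sizes are
hypothetical, and lengths are counted in 4-byte XDR words. The only part
taken from the XDR layout is the reply header itself: status, tag length,
the tag padded to a word boundary, and the op count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the compound header. */
struct compound_hdr {
	size_t taglen;  /* bytes in the tag we send */
	size_t nops;    /* ops encoded so far */
	size_t replen;  /* expected reply length so far, in XDR words */
};

/* Reply header: status + tag length + padded tag + op count. */
void hdr_init(struct compound_hdr *hdr, size_t taglen)
{
	assert(hdr != NULL);
	hdr->taglen = taglen;
	hdr->nops = 0;
	hdr->replen = 3 + (taglen + 3) / 4;  /* assumes the same tag comes back */
}

/* Every encoded op adds its expected reply size to the running count. */
void hdr_add_op(struct compound_hdr *hdr, size_t reply_words)
{
	assert(hdr != NULL);
	hdr->nops++;
	hdr->replen += reply_words;
}

/* Byte offset at which page data is expected to start in the reply. */
size_t hdr_page_offset(const struct compound_hdr *hdr)
{
	return hdr->replen * 4;
}
```

If the server replies with a tag of a different padded length, the offset
computed this way no longer matches where the data lands, which is the
risk described in the note above.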

[NFS: cb_compoundhdr.replen is in words not bytes]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:32 -07:00
Benny Halevy
6ce183919b NFS: use decode_change_info_maxsz for xdr maxsz calculations
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:31 -07:00
Andy Adamson
5f7dbd5c75 nfs41: set up seq_res.sr_slotid
Initialize nfs4_sequence_res sr_slotid to NFS4_MAX_SLOT_TABLE.

[was nfs41: sequence res use slotid]
Signed-off-by: Andy Adamson <andros@netapp.com>
[pulled definition of struct nfs4_sequence_res.sr_slotid to here]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:30 -07:00
Benny Halevy
f3752975ca nfs41: pass *session in seq_args and seq_res
To be used for getting the rpc's minorversion and for nfs41 xdr
{en,de}coding of the sequence operation.
Reset the seq session ptrs for minorversion=0 rpc calls.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:29 -07:00
Andy Adamson
cccef3b96a nfs41: introduce nfs4_call_sync
Use nfs4_call_sync rather than rpc_call_sync to provide
an nfs41 sessions-enabled interface for sessions manipulation.

The nfs41 rpc logic uses the rpc_call_prepare method to
recover and create the session, as well as selecting a free slot id
and the rpc_call_done to free the slot and update slot table
related metadata.

In the coming patches we'll add rpc prepare and done routines
for setting up the sequence op and processing the sequence result.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: nfs4_call_sync]
As per 11-14-08 review.
Squash into "nfs41: introduce nfs4_call_sync" and "nfs41: nfs4_setup_sequence"
Define two functions one for v4 and one for v41
add a pointer to struct nfs4_client to the correct one.
Signed-off-by: Andy Adamson <andros@netapp.com>
[added BUG() in _nfs4_call_sync_session if !CONFIG_NFS_V4_1]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: check for session not minorversion]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[group minorversion specific stuff together]
Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
[nfs41: fixup nfs4_clear_client_minor_version]
[introduce nfs4_init_client_minor_version() in this patch]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[cleaned-up patch: got rid of nfs_call_sync_t, dprintks, cosmetics, extra server defs]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:28 -07:00
Benny Halevy
22958463d5 nfs41: use nfs4_fs_locations_res
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
[find nfs4_fs_locations_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:27 -07:00
Benny Halevy
73c403a9a9 nfs41: use nfs4_setaclres
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
[define nfs_setaclres]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:26 -07:00
Benny Halevy
9e9ecc03d6 NFS: get rid of unused xdr decode_setattr(, res) argument
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:26 -07:00
Benny Halevy
663c79b3cd nfs41: use nfs4_getaclres
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: embed resp_len in nfs_getaclres]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:25 -07:00
Benny Halevy
d45b2989a7 nfs41: use nfs4_pathconf_res
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
[define nfs4_pathconf_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:24 -07:00
Benny Halevy
3dda5e4347 nfs41: use nfs4_fsinfo_res
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
[define nfs4_fsinfo_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:23 -07:00
Benny Halevy
24ad148a0f nfs41: use nfs4_statfs_res
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
[define nfs4_statfs_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:22 -07:00
Benny Halevy
f50c700081 nfs41: use nfs4_readlink_res
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
[define nfs4_readlink_res]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:21 -07:00
Benny Halevy
43652ad553 nfs41: use nfs4_server_caps_arg
In preparation for nfs41 sequence processing.

Signed-off-by: Andy Adamson <andros@netapp.com>
[define nfs4_server_caps_arg]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:20 -07:00
Andy Adamson
557134a39c nfs41: sessions client infrastructure
NFSv4.1 Sessions basic data types, initialization, and destruction.

The session is always associated with a struct nfs_client that holds
the exchange_id results.

Signed-off-by: Rahul Iyer <iyer@netapp.com>
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[remove extraneous rpc_clnt pointer, use the struct nfs_client cl_rpcclient.
remove the rpc_clnt parameter from nfs4 nfs4_init_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Use the presence of a session to determine behaviour instead of the
minorversion number.]
Signed-off-by: Andy Adamson <andros@netapp.com>
[constified nfs4_has_session's struct nfs_client parameter]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Rename nfs4_put_session() to nfs4_destroy_session() and call it from nfs4_free_client() not nfs4_free_server().
Also get rid of nfs4_get_session() and the ref_count in nfs4_session struct as keeping track of nfs_client should be sufficient]
Signed-off-by: Alexandros Batsakis <Alexandros.Batsakis@netapp.com>
[nfs41: pass rsize and wsize into nfs4_init_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
[separated out removal of rpc_clnt parameter from nfs4_init_session to a
 patch of its own]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Pass the nfs_client pointer into nfs4_alloc_session]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: don't assign to session->clp->cl_session in nfs4_destroy_session]
[nfs41: fixup nfs4_clear_client_minor_version]
[introduce nfs4_clear_client_minor_version() in this patch]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[Refactor nfs4_init_session]
    Moved session allocation into nfs4_init_client_minor_version, called from
    nfs4_init_client.
    Leave rsize and wsize initialization in nfs4_init_session, called from
    nfs4_init_server.
    Reverted moving of nfs_fsid definition to nfs_fs_sb.h
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: Move NFS4_MAX_SLOT_TABLE define from under CONFIG_NFS_V4_1]
[Fix compile error when CONFIG_NFS_V4_1 is not set.]
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[moved nfs4_init_slot_table definition to "create_session operation"]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[nfs41: alloc session with GFP_KERNEL]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:19 -07:00
Benny Halevy
c2e713dd83 nfs41: translate NFS4ERR_MINOR_VERS_MISMATCH to EPROTONOSUPPORT
To be returned to the mount command when trying to mount a v4 server
using minorversion 1.
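
As a rough illustration of the translation, the sketch below maps the
wire status to a negative errno. The function name sketch_stat_to_errno is
hypothetical, and falling back to -EIO for every other error is an
assumption of this sketch, not what the kernel's mapping table does. The
numeric value 10021 is the one RFC 3530 assigns to
NFS4ERR_MINOR_VERS_MISMATCH.

```c
#include <assert.h>
#include <errno.h>

#define NFS4_OK                     0
#define NFS4ERR_MINOR_VERS_MISMATCH 10021  /* value from RFC 3530 */

/* Hypothetical helper: translate an NFSv4 status into a negative errno. */
int sketch_stat_to_errno(int nfs4_status)
{
	assert(nfs4_status >= 0);  /* wire status codes are non-negative */

	switch (nfs4_status) {
	case NFS4_OK:
		return 0;
	case NFS4ERR_MINOR_VERS_MISMATCH:
		return -EPROTONOSUPPORT;  /* server does not speak this minorversion */
	default:
		return -EIO;  /* assumption: generic fallback for this sketch */
	}
}
```

The mount command can then report an unsupported protocol version rather
than a generic I/O error.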

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:18 -07:00
Benny Halevy
5aae4a9ae0 nfs41: Use mount minorversion option
Use the mount minorversion option to initialize the nfs_client cl_minorversion
and match it in nfs_match_client() when looking up a nfs_client.

[nfs41: remove ifdefs around nfs_client_initdata.minorversion]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:17 -07:00
Benny Halevy
94a417f3d7 nfs41: nfs_client.cl_minorversion
This field is set to the nfsv4 minor version for this mount.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>

Note: This patch sets the referral to the same minorversion as the
current mount. Revisit in future patch.

Signed-off-by: Andy Adamson <andros@netapp.com>
[removed cl_minorversion assignment in nfs_set_client]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
[always define nfs_client.cl_minorversion]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:16 -07:00
Mike Sager
3fd5be9e19 nfs41: add mount command option minorversion
mount -t nfs4 -o minorversion=[0|1] specifies whether to use 4.0 or 4.1.
By default, the minorversion is set to 0.

Signed-off-by: Mike Sager <sager@netapp.com>
[set default minorversion to 0 as per Trond and SteveD's request]
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:15 -07:00
Ricardo Labiaga
1efae38140 nfs41: Add Kconfig symbols for NFSv4.1
Added CONFIG_NFS_V4_1 and made it depend upon CONFIG_NFS_V4 and EXPERIMENTAL.
Indicate that CONFIG_NFS_V4_1 is for NFS developers at the moment.

At the moment we're expecting folks trying out nfs41 to
actively participate in the development process by helping us
debug issues and ideally send patches to fix problems.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-06-17 10:46:13 -07:00
Thomas Gleixner
8b0b1db013 remove put_cpu_no_resched()
put_cpu_no_resched() is an optimization of put_cpu() which unfortunately
can cause high latencies.

The nfs iostats code uses put_cpu_no_resched() in a code sequence where a
reschedule request caused by an interrupt between the get_cpu() and the
put_cpu_no_resched() can delay the reschedule for at least HZ.

The other users of put_cpu_no_resched() optimize correctly in interrupt
code, but there is no real harm in using the put_cpu() function which is
an alias for preempt_enable().  The extra check of the preempt count is
not as critical as the potential source of missing a reschedule.

Debugged in the preempt-rt tree and verified in mainline.

Impact: remove a high latency source

[akpm@linux-foundation.org: build fix]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-06-16 19:47:48 -07:00
J. Bruce Fields
7eef4091a6 Merge commit 'v2.6.30' into for-2.6.31 2009-06-15 18:08:07 -07:00
Alessio Igor Bogani
337eb00a2c Push BKL down into ->remount_fs()
[xfs, btrfs, capifs, shmem don't need BKL, exempt]

Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:11 -04:00
Al Viro
9393bd07cf switch follow_down()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-06-11 21:36:01 -04:00
Trond Myklebust
95baa25c73 NFSv4: Fix the case where NFSv4 renewal fails
If the asynchronous lease renewal fails (usually due to a soft timeout),
then we _must_ schedule state recovery in order to ensure that we don't
lose the lease unnecessarily or, if the lease is already lost, that we
recover the locking state promptly...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-05-26 14:51:00 -04:00
Sam Ravnborg
d0367a508a nfs: fix build error in nfsroot with initconst
fix build error with latest kbuild adjustments to initconst.

The commit a447c09324 ("vfs: Use const for kernel parser table") changed:

    static match_table_t __initdata tokens = {
to
    static match_table_t __initconst tokens = {

But the missing const causes powerpc to fail with latest
updates to __initconst like this:

fs/nfs/nfsroot.c:400: error: __setup_str_nfs_root_setup causes a section type conflict
fs/nfs/nfsroot.c:400: error: __setup_str_nfs_root_setup causes a section type conflict

The bug is only present with kbuild-next.
Following patch has been build tested.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-05-26 14:51:00 -04:00
Frank Filz
7ee2cb7f32 nfs: Fix NFS v4 client handling of MAY_EXEC in nfs_permission.
The problem is that permission checking is skipped if atomic open is
possible, but when exec opens a file, it just opens it O_READONLY which
means EXEC permission will not be checked at that time.

This problem is observed by the following sequence (executed as root):

  mount -t nfs4 server:/ /mnt4
  echo "ls" >/mnt4/foo
  chmod 744 /mnt4/foo
  su guest -c "mnt4/foo"
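
To illustrate why the last command in the sequence above should be refused,
here is a minimal userspace sketch of a classic owner/group/other execute
check.  The name toy_may_exec() and its parameters are hypothetical; the
sketch ignores supplementary groups, ACLs and root overrides, and it is not
the kernel's nfs_permission().

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper: may (uid, gid) execute a file with this mode and
 * ownership?  Only the first matching class (owner, then group, then
 * other) is consulted. */
static bool toy_may_exec(mode_t mode, uid_t file_uid, gid_t file_gid,
                         uid_t uid, gid_t gid)
{
    assert((mode & 07777) == mode);   /* permission bits only */
    if (uid == file_uid)
        return (mode & S_IXUSR) != 0;
    if (gid == file_gid)
        return (mode & S_IXGRP) != 0;
    return (mode & S_IXOTH) != 0;
}
```

With mode 0744 and a root-owned file, the guest user falls into the "other"
class, which only has read permission, so the check returns false.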

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Tested-by: Eugene Teo <eugeneteo@kernel.sg>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-18 20:11:12 -07:00
Randy Dunlap
dd4dc82d4c lockd: fix FILE_LOCKING=n build error
lockd/svclock.c is missing a header file <linux/fs.h>.

<linux/fs.h> is missing a definition of locks_release_private()
for the config case of FILE_LOCKING=n, causing a build error:

fs/lockd/svclock.c:330: error: implicit declaration of function 'locks_release_private'

lockd without FILE_LOCKING doesn't make sense, so make LOCKD and LOCKD_V4
depend on FILE_LOCKING, and make NFS depend on FILE_LOCKING.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-05-13 15:59:10 -04:00
Al Viro
6f5bbff9a1 Convert obvious places to deactivate_locked_super()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-05-09 10:49:40 -04:00
Alessio Igor Bogani
67e55205ec vfs: umount_begin BKL pushdown
Push BKL down into ->umount_begin()

Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-05-09 10:49:38 -04:00
Trond Myklebust
7fdf523067 NFS: Close page_mkwrite() races
Follow up to Nick Piggin's patches to ensure that nfs_vm_page_mkwrite
returns with the page lock held, and sets the VM_FAULT_LOCKED flag.

See http://bugzilla.kernel.org/show_bug.cgi?id=12913

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-05-02 19:42:39 -07:00
Trond Myklebust
8340437210 NFS: Fix the XDR iovec calculation in nfs3_xdr_setaclargs
Commit ae46141ff0 (NFSv3: Fix posix ACL code)
introduces a bug in the calculation of the XDR header iovec. In the case
where we are inlining the acls, we need to adjust the length of the iovec
req->rq_svec, in addition to adjusting the total buffer length.
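
The bookkeeping rule described above can be sketched with a toy buffer.
struct toy_xdr_buf and toy_inline_bytes() are hypothetical stand-ins, not the
kernel's struct xdr_buf or the actual fix; the only point is that bytes
placed inline in the header have to be counted in both the header iovec
length and the total length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a send buffer: a header iovec plus page data. */
struct toy_xdr_buf {
    size_t head_len;   /* bytes in the header iovec */
    size_t page_len;   /* bytes carried in pages */
    size_t len;        /* total message length */
};

/* The invariant that breaks if only one of the two lengths is updated. */
static bool toy_buf_consistent(const struct toy_xdr_buf *buf)
{
    assert(buf != NULL);
    return buf->len == buf->head_len + buf->page_len;
}

/* Account for nbytes encoded inline in the header. */
static void toy_inline_bytes(struct toy_xdr_buf *buf, size_t nbytes)
{
    assert(buf != NULL);
    buf->head_len += nbytes;   /* the iovec length */
    buf->len += nbytes;        /* and the total buffer length */
}
```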

Tested-by: Leonardo Chiquitto <leonardo.lists@gmail.com>
Tested-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-21 07:46:49 -07:00
Trond Myklebust
2b2ec7554c NFS: Fix the return value in nfs_page_mkwrite()
Commit c2ec175c39 ("mm: page_mkwrite
change prototype to match fault") exposed a bug in the NFS
implementation of page_mkwrite.  We should be returning 0 on success...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-07 14:07:03 -07:00
Trond Myklebust
d508afb437 NFS: Fix a double free in nfs_parse_mount_options()
Due to an apparent typo, commit a67d18f89f
(NFS: load the rpc/rdma transport module automatically) led to the
'proto=' mount option doing a double free, while Opt_mountproto leaks a
string.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-06 17:19:48 -07:00
David Howells
b797cac748 NFS: Add mount options to enable local caching on NFS
Add NFS mount options to allow the local caching support to be enabled.

The attached patch makes it possible for the NFS filesystem to be told to make
use of the network filesystem local caching service (FS-Cache).

To be able to use this, a recent nfs-utils package is required.

There are three variant NFS mount options that can be added to a mount command
to control caching for a mount.  Only the last one specified takes effect:

 (*) Adding "fsc" will request caching.

 (*) Adding "fsc=<string>" will request caching and also specify a uniquifier.

 (*) Adding "nofsc" will disable caching.

For example:

	mount warthog:/ /a -o fsc
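
The "only the last one specified takes effect" rule for the three options
can be sketched in userspace C.  toy_parse_fsc() and struct toy_fsc_setting
are hypothetical names, the 64-byte uniquifier limit is an assumption, and
this is not the kernel's mount option parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct toy_fsc_setting {
    bool enabled;
    char uniquifier[64];   /* assumed limit, for illustration only */
};

/* Walk a comma-separated option string; later fsc/nofsc options
 * override earlier ones.  Unrelated options are ignored. */
static struct toy_fsc_setting toy_parse_fsc(const char *opts)
{
    struct toy_fsc_setting s = { false, "" };
    const char *p = opts;

    assert(opts != NULL);
    while (*p != '\0') {
        size_t len = strcspn(p, ",");

        if (len == 3 && strncmp(p, "fsc", 3) == 0) {
            s.enabled = true;
            s.uniquifier[0] = '\0';
        } else if (len >= 4 && strncmp(p, "fsc=", 4) == 0) {
            size_t n = len - 4;

            if (n >= sizeof(s.uniquifier))
                n = sizeof(s.uniquifier) - 1;
            memcpy(s.uniquifier, p + 4, n);
            s.uniquifier[n] = '\0';
            s.enabled = true;
        } else if (len == 5 && strncmp(p, "nofsc", 5) == 0) {
            s.enabled = false;
            s.uniquifier[0] = '\0';
        }
        p += len;
        if (*p == ',')
            p++;
    }
    return s;
}
```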

The cache of a particular superblock (NFS FSID) will be shared between all
mounts of that volume, provided they have the same connection parameters and
are not marked 'nosharecache'.

Where it is otherwise impossible to distinguish superblocks because all the
parameters are identical, but the 'nosharecache' option is supplied, a
uniquifying string must be supplied, else only the first mount will be
permitted to use the cache.

If there's a key collision, then the second mount will disable caching and give
a warning into the kernel log.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:48 +01:00
David Howells
5d1acff159 NFS: Display local caching state
Display the local caching state in /proc/fs/nfsfs/volumes.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:47 +01:00
David Howells
7f8e05f60c NFS: Store pages from an NFS inode into a local cache
Store pages from an NFS inode into the cache data storage object associated
with that inode.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:45 +01:00
David Howells
9a9fc1c033 NFS: Read pages from FS-Cache into an NFS inode
Read pages from an FS-Cache data storage object representing an inode into an
NFS inode.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:44 +01:00
David Howells
f42b293d6d NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
nfs_readpage_async() needs to be non-static so that it can be used as a
fallback for the local on-disk caching should an EIO crop up when reading the
cache.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:44 +01:00
David Howells
1fcdf53488 NFS: Add read context retention for FS-Cache to call back with
Add read context retention so that FS-Cache can call back into NFS when a read
operation on the cache fails with EIO rather than reading data.  This permits NFS to
then fetch the data from the server instead using the appropriate security
context.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:44 +01:00
David Howells
545db45f0f NFS: FS-Cache page management
FS-Cache page management for NFS.  This includes hooking the releasing and
invalidation of pages marked with PG_fscache (aka PG_private_2) and waiting for
completion of the write-to-cache flag (PG_fscache_write aka PG_owner_priv_2).

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:44 +01:00
David Howells
6a51091d07 NFS: Add some new I/O counters for FS-Cache doing things for NFS
Add some new NFS I/O counters for FS-Cache doing things for NFS.  A new line is
emitted into /proc/pid/mountstats if caching is enabled that looks like:

	fsc: <rok> <rfl> <wok> <wfl> <unc>

Where <rok> is the number of pages read successfully from the cache, <rfl> is
the number of failed page reads against the cache, <wok> is the number of
successful page writes to the cache, <wfl> is the number of failed page writes
to the cache, and <unc> is the number of NFS pages that have been disconnected
from the cache.
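
A tool reading that line could pull the five counters out as follows.  This
is a minimal sketch: toy_parse_fsc_line() and struct toy_fsc_stats are
hypothetical names, and it assumes the counters are printed as unsigned
decimal numbers separated by whitespace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_fsc_stats {
    unsigned long long rok;   /* pages read from the cache */
    unsigned long long rfl;   /* failed page reads */
    unsigned long long wok;   /* pages written to the cache */
    unsigned long long wfl;   /* failed page writes */
    unsigned long long unc;   /* pages disconnected from the cache */
};

/* Returns true only if the line is an "fsc:" line with all five counters. */
static bool toy_parse_fsc_line(const char *line, struct toy_fsc_stats *out)
{
    assert(line != NULL && out != NULL);
    return sscanf(line, " fsc: %llu %llu %llu %llu %llu",
                  &out->rok, &out->rfl, &out->wok,
                  &out->wfl, &out->unc) == 5;
}
```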

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:43 +01:00
David Howells
d599064a1b NFS: Invalidate FsCache page flags when cache removed
Invalidate the FsCache page flags on the pages belonging to an inode when the
cache backing that NFS inode is removed.

This allows a live cache to be withdrawn.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:43 +01:00
David Howells
ef79c097bb NFS: Use local disk inode cache
Bind data storage objects in the local cache to NFS inodes.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:43 +01:00
David Howells
10329a5d48 NFS: Define and create inode-level cache objects
Define and create inode-level cache data storage objects (as managed by
nfs_inode structs).

Each inode-level object is created in a superblock-level index object and is
itself a data storage object into which pages from the inode are stored.

The inode object key is the NFS file handle for the inode.

The inode object is given coherency data to carry in the auxiliary data
permitted by the cache.  This is a sequence made up of:

 (1) i_mtime from the NFS inode.

 (2) i_ctime from the NFS inode.

 (3) i_size from the NFS inode.

 (4) change_attr from the NFSv4 attribute data.

As the cache is a persistent cache, the auxiliary data is checked when a new
NFS in-memory inode is set up that matches an already existing data storage
object in the cache.  If the coherency data is the same, the on-disk object is
retained and used; if not, it is scrapped and a new one created.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:43 +01:00
David Howells
08734048b3 NFS: Define and create superblock-level objects
Define and create superblock-level cache index objects (as managed by
nfs_server structs).

Each superblock object is created in a server level index object and is itself
an index into which inode-level objects are inserted.

Ideally there would be one superblock-level object per server, and the former
would be folded into the latter; however, since the "nosharecache" option
exists this isn't possible.

The superblock object key is a sequence consisting of:

 (1) Certain superblock s_flags.

 (2) Various connection parameters that serve to distinguish superblocks for
     sget().

 (3) The volume FSID.

 (4) The security flavour.

 (5) The uniquifier length.

 (6) The uniquifier text.  This is normally an empty string, unless the fsc=xyz
     mount option was used to explicitly specify a uniquifier.

The key blob is of variable length, depending on the length of (6).

The superblock object is given no coherency data to carry in the auxiliary data
permitted by the cache.  It is assumed that the superblock is always coherent.

This patch also adds uniquification handling such that two otherwise identical
superblocks, at least one of which is marked "nosharecache", won't end up
trying to share the on-disk cache.  It will be possible to manually provide a
uniquifier through a mount option with a later patch to avoid the error
otherwise produced.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:42 +01:00
David Howells
147272813e NFS: Define and create server-level objects
Define and create server-level cache index objects (as managed by nfs_client
structs).

Each server object is created in the NFS top-level index object and is itself
an index into which superblock-level objects are inserted.

Ideally there would be one superblock-level object per server, and the former
would be folded into the latter; however, since the "nosharecache" option
exists this isn't possible.

The server object key is a sequence consisting of:

 (1) NFS version

 (2) Server address family (eg: AF_INET or AF_INET6)

 (3) Server port.

 (4) Server IP address.

The key blob is of variable length, depending on the length of (4).
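
To make the variable-length point concrete, here is a sketch that packs the
four fields into a byte buffer.  toy_server_key(), the 2-byte field widths
and the big-endian byte order are all assumptions made for illustration; the
patch's real key layout may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_KEY_HDR 6   /* 2 bytes each: version, family, port (assumed) */

/* Returns the key length, or 0 if the buffer is too small. */
static size_t toy_server_key(uint8_t *buf, size_t cap, uint16_t nfs_version,
                             uint16_t family, uint16_t port,
                             const uint8_t *addr, size_t addr_len)
{
    size_t need = TOY_KEY_HDR + addr_len;

    assert(buf != NULL && addr != NULL);
    if (cap < need)
        return 0;
    buf[0] = (uint8_t)(nfs_version >> 8);
    buf[1] = (uint8_t)(nfs_version & 0xff);
    buf[2] = (uint8_t)(family >> 8);
    buf[3] = (uint8_t)(family & 0xff);
    buf[4] = (uint8_t)(port >> 8);
    buf[5] = (uint8_t)(port & 0xff);
    memcpy(buf + TOY_KEY_HDR, addr, addr_len);   /* 4 or 16 address bytes */
    return need;
}
```

In this layout an IPv4 server yields a 10-byte key and an IPv6 server a
22-byte key, which is the length dependence on (4) noted above.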

The server object is given no coherency data to carry in the auxiliary data
permitted by the cache.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:42 +01:00
David Howells
8ec442ae4c NFS: Register NFS for caching and retrieve the top-level index
Register NFS for caching and retrieve the top-level cache index object cookie.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:42 +01:00
David Howells
3b9ce977b2 NFS: Permit local filesystem caching to be enabled for NFS
Permit local filesystem caching to be enabled for NFS in the kernel
configuration.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:42 +01:00
David Howells
6b9b3514aa NFS: Add comment banners to some NFS functions
Add comment banners to some NFS functions so that they can be modified by the
NFS fscache patches for further information.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Steve Dickson <steved@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Daire Byrne <Daire.Byrne@framestore.com>
2009-04-03 16:42:41 +01:00
Linus Torvalds
8fe74cf053 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
  Remove two unneeded exports and make two symbols static in fs/mpage.c
  Cleanup after commit 585d3bc06f
  Trim includes of fdtable.h
  Don't crap into descriptor table in binfmt_som
  Trim includes in binfmt_elf
  Don't mess with descriptor table in load_elf_binary()
  Get rid of indirect include of fs_struct.h
  New helper - current_umask()
  check_unsafe_exec() doesn't care about signal handlers sharing
  New locking/refcounting for fs_struct
  Take fs_struct handling to new file (fs/fs_struct.c)
  Get rid of bumping fs_struct refcount in pivot_root(2)
  Kill unsharing fs_struct in __set_personality()
2009-04-02 21:09:10 -07:00
Trond Myklebust
cc85906110 Merge branch 'devel' into for-linus 2009-04-01 13:28:15 -04:00
Nick Piggin
c2ec175c39 mm: page_mkwrite change prototype to match fault
Change the page_mkwrite prototype to take a struct vm_fault, and return
VM_FAULT_xxx flags.  There should be no functional change.

This makes it possible to return much more detailed error information to
the VM (and also can provide more information eg.  virtual_address to the
driver, which might be important in some special cases).

This is required for a subsequent fix.  And will also make it easier to
merge page_mkwrite() with fault() in future.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Artem Bityutskiy <dedekind@infradead.org>
Cc: Felix Blyakher <felixb@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-04-01 08:59:14 -07:00
Al Viro
ce3b0f8d5c New helper - current_umask()
current->fs->umask is what most of fs_struct users are doing.
Put that into a helper function.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-31 23:00:26 -04:00
Alexey Dobriyan
99b7623380 proc 2/2: remove struct proc_dir_entry::owner
Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
as correctly noted at bug #12454. Someone can look up an entry with NULL
->owner, thus not pinning anything, and release it later, resulting
in module refcount underflow.

We can keep ->owner and supply it at registration time like ->proc_fops
and ->data.

But this leaves ->owner as an easily manipulated field (just one C assignment)
and somebody will forget to unpin previous/pin current module when
switching ->owner. ->proc_fops is declared as "const" which should give
some thoughts.

->read_proc/->write_proc were just fixed to not require ->owner for
protection.

rmmod'ed directories will be empty and return "." and ".." -- no harm.
And directories with tricky enough readdir and lookup shouldn't be modular.
We definitely don't want such modular code.

Removing ->owner will also make PDE smaller.

So, let's nuke it.

Kudos to Jeff Layton for reminding about this, let's say, oversight.

http://bugzilla.kernel.org/show_bug.cgi?id=12454

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-03-31 01:14:44 +04:00
Chuck Lever
3c8c45dfab NFS: Simplify logic to compare socket addresses in client.c
Callback requests from IPv4 servers are now always guaranteed to be
AF_INET, and never mapped IPv4 AF_INET6 addresses.  Both
nfs_match_client() and nfs_find_client() can now share the same
address comparison logic, so fold them together.

We can also dispense with most of the conditional compilation
in here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-28 16:51:04 -04:00
Trond Myklebust
d188262d60 Merge commit '9f4c899c0d90e1b51b6864834f3877b47c161a0e' into devel 2009-03-28 16:50:58 -04:00
Chuck Lever
f738f51703 NFS: Start PF_INET6 callback listener only if IPv6 support is available
Apparently a lot of people need to disable IPv6 completely on their
distributor-built systems, which have CONFIG_IPV6_MODULE enabled at
build time.

They do this by blacklisting the ipv6.ko module.  This causes the
creation of the NFSv4 callback service listener to fail if
CONFIG_IPV6_MODULE is set, but the module cannot be loaded.

Now that the kernel's PF_INET6 RPC listeners are completely separate
from PF_INET listeners, we can always start PF_INET.  Then the NFS
client can try to start a PF_INET6 listener, but it isn't required
to be available.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-28 16:02:43 -04:00
Chuck Lever
26298caaca NFS: Revert creation of IPv6 listeners for lockd and NFSv4 callbacks
We're about to convert over to using separate PF_INET and PF_INET6
listeners, instead of a single PF_INET6 listener that also receives
AF_INET requests and maps them to AF_INET6.

Clear the way by removing the logic in lockd and the NFSv4 callback
server that creates an AF_INET6 service listener.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-28 15:55:06 -04:00
Chuck Lever
49a9072f29 SUNRPC: Remove @family argument from svc_create() and svc_create_pooled()
Since an RPC service listener's protocol family is specified now via
svc_create_xprt(), it no longer needs to be passed to svc_create() or
svc_create_pooled().  Remove that argument from the synopsis of those
functions, and remove the sv_family field from the svc_serv struct.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-28 15:54:48 -04:00
Chuck Lever
9652ada3fb SUNRPC: Change svc_create_xprt() to take a @family argument
The sv_family field is going away.  Pass a protocol family argument to
svc_create_xprt() instead of extracting the family from the passed-in
svc_serv struct.

Again, as this is a listener socket and not an address, we make this
new argument an "int" protocol family, instead of an "sa_family_t."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-28 15:54:36 -04:00
Al Viro
f786aa90e0 constify dentry_operations: NFS
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-03-27 14:43:59 -04:00
Trond Myklebust
7fe5c398fc NFS: Optimise NFS close()
Close-to-open cache consistency rules really only require us to flush out
writes on calls to close(), and require us to revalidate attributes on the
very last close of the file.

Currently we appear to be doing a lot of extra attribute revalidation
and cache flushes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-19 15:35:50 -04:00
Trond Myklebust
b1e4adf4ea NFS: Fix the notifications when renaming onto an existing file
NFS appears to be returning an unnecessary "delete" notification when
we're doing an atomic rename. See

  http://bugzilla.gnome.org/show_bug.cgi?id=575684

The fix is to get rid of the redundant call to d_delete().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-19 15:35:49 -04:00
Trond Myklebust
47c6256420 NFS: Fix up a mismerged patch
Move the definition of nfs_need_commit() into the #ifdef CONFIG_NFS_V3
section as originally intended in the patch "NFS: cleanup - remove
struct nfs_inode->ncommit"

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-19 15:17:40 -04:00
Trond Myklebust
9f4c899c0d NFS: Fix the fix to Bugzilla #11061, when IPv6 isn't defined...
Stephen Rothwell reports:

Today's linux-next build (powerpc ppc64_defconfig) failed like this:

fs/built-in.o: In function `.nfs_get_client':
client.c:(.text+0x115010): undefined reference to `.__ipv6_addr_type'

Fix by moving the IPV6 specific parts of commit
d7371c41b0 ("Bug 11061, NFS mounts dropped")
into the '#ifdef IPV6..." section.

Also fix up a couple of formatting issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-12 14:51:32 -04:00
Tom Talpey
a67d18f89f NFS: load the rpc/rdma transport module automatically
When mounting an NFS/RDMA server with the "-o proto=rdma" or
"-o rdma" options, attempt to dynamically load the necessary
"xprtrdma" client transport module. Doing so improves usability,
while avoiding a static module dependency and any unnecessary
resources.

Signed-off-by: Tom Talpey <tmtalpey@gmail.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:56 -04:00
Trond Myklebust
e1ebfd33be NFS: Kill the "defined but not used" compile error on nommu machines
Bryan Wu reports that when compiling NFS on nommu machines he gets a
"defined but not used" error on nfs_file_mmap().

The easiest fix is simply to get rid of the special casing in NFS, and
just always call generic_file_mmap() to set up the file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:37:54 -04:00
Trond Myklebust
72cb77f4a5 NFS: Throttle page dirtying while we're flushing to disk
The following patch is a combination of a patch by myself and Peter
Staubach.

Trond: If we allow other processes to dirty pages while a process is doing
a consistency sync to disk, we can end up never making progress.

Peter: Attached is a patch which addresses a continuing problem with
the NFS client generating out of order WRITE requests.  While
this is compliant with all of the current protocol
specifications, there are servers in the market which can not
handle out of order WRITE requests very well.  Also, this may
lead to sub-optimal block allocations in the underlying file
system on the server.  This may cause the read throughputs to
be reduced when reading the file from the server.

Peter: There has been a lot of work recently done to address out of
order issues on a systemic level.  However, the NFS client is
still susceptible to the problem.  Out of order WRITE
requests can occur when pdflush is in the middle of writing
out pages while the process dirtying the pages calls
generic_file_buffered_write which calls
generic_perform_write which calls
balance_dirty_pages_rate_limited which ends up calling
writeback_inodes which ends up calling back into the NFS
client to write out dirty pages for the same file that
pdflush happens to be working with.

Signed-off-by: Peter Staubach <staubach@redhat.com>
[modification by Trond to merge the two similar patches]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:30 -04:00
Trond Myklebust
fb8a1f11b6 NFS: cleanup - remove struct nfs_inode->ncommit
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:29 -04:00
Trond Myklebust
a65318bf3a NFSv4: Simplify some cache consistency post-op GETATTRs
Certain asynchronous operations such as write() do not expect
(or care) that other metadata such as the file owner, mode, acls, ...
change. All they want to do is update and/or check the change attribute,
ctime, and mtime.
By skipping the file owner and group update, we also avoid having to do a
potential idmapper upcall for these asynchronous RPC calls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:28 -04:00
Trond Myklebust
69aaaae18f NFSv4: A referral is assumed to always point to a directory.
Fix a bug whereby we would fail to create a mount point for a referral.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:28 -04:00
Trond Myklebust
409924e4c9 NFSv4: Make decode_getfattr() set fattr->valid to reflect what was decoded
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:27 -04:00
Trond Myklebust
f26c7a7887 NFSv4: Clean up decode_getfattr()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:26 -04:00
Trond Myklebust
bca794785c NFS: Fix the type of struct nfs_fattr->mode
There is no point in using anything other than umode_t, since we copy the
content pretty much directly into inode->i_mode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:26 -04:00
Trond Myklebust
1ca277d88d NFS: Shrink the struct nfs_fattr
We don't need the bitmap[] field anymore, since the 'valid' field tells us
all we need to know about which attributes were filled in...
Also move the pre-op attributes in order to improve the structure packing.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:25 -04:00
Trond Myklebust
9e6e70f8d8 NFSv4: Support NFSv4 optional attributes in the struct nfs_fattr
Currently, filling struct nfs_fattr is more or less an all or nothing
operation, since NFSv2 and NFSv3 have only mandatory attributes.
In NFSv4, some attributes are optional, and so we may simply not be able to
fill in those fields. Furthermore, NFSv4 allows you to specify which
attributes you are interested in retrieving, thus permitting you to
optimise away retrieval of attributes that you know will not change...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:24 -04:00
Trond Myklebust
78f945f88e NFSv4: Ignore errors on the post-op attributes in SETATTR calls
There is no need to fail or retry a SETATTR call just because the post-op
GETATTR failed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:23 -04:00
NeilBrown
37d9d76d8b NFS: flush cached directory information slightly more readily.
If cached directory contents become incorrect, there is no way to
flush the contents.  This contrasts with files where file locking is
the recommended way to ensure cache consistency between multiple
applications (a read-lock always flushes the cache).

Also while changes to files often change the size of the file (thus
triggering a cache flush), changes to directories often do not change
the apparent size (as the size is often rounded to a block size).

So it is particularly important with directories to avoid the
possibility of an incorrect cache wherever possible.

When the link count on a directory changes it implies a change in the
number of child directories, and so a change in the contents of this
directory.  So use that as a trigger to flush cached contents.

When the ctime changes but the mtime does not, there are two possible
reasons.
 1/ The owner/mode information has been changed.
 2/ utimes has been used to set the mtime backwards.

In the first case, a data-cache flush is not required.
In the second case it is.

So on the basis that correctness trumps performance, flush the
directory contents cache in this case also.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:23 -04:00
Suresh Jayaraman
2b57dc6cf9 NFS: Minor __nfs_revalidate_inode cleanup
Remove redundant NFS_STALE() check, a leftover due to the commit
691beb13cd

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-11 14:10:22 -04:00
Ian Dall
d7371c41b0 Bug 11061, NFS mounts dropped
Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=11061

sockaddr structures can't be reliably compared using memcmp() because
there are padding bytes in the structure which can't be guaranteed to
be the same even when the sockaddr structures refer to the same
socket. Instead compare all the relevant fields. In the case of IPv6
sin6_flowinfo is not compared because it only affects QoS and
sin6_scope_id is only compared if the address is "link local" because
"link local" addresses need only be unique to a specific link.

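A userspace sketch of the field-by-field comparison described above.
toy_sockaddr_same() is a hypothetical name, and comparing the port is an
assumption of this sketch; it is not the function added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Compare only the meaningful fields, never the padding bytes. */
static bool toy_sockaddr_same(const struct sockaddr *a,
                              const struct sockaddr *b)
{
    assert(a != NULL && b != NULL);
    if (a->sa_family != b->sa_family)
        return false;

    switch (a->sa_family) {
    case AF_INET: {
        const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
        const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

        return a4->sin_addr.s_addr == b4->sin_addr.s_addr &&
               a4->sin_port == b4->sin_port;
    }
    case AF_INET6: {
        const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
        const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

        if (memcmp(&a6->sin6_addr, &b6->sin6_addr,
                   sizeof(a6->sin6_addr)) != 0)
            return false;
        if (a6->sin6_port != b6->sin6_port)
            return false;
        /* sin6_flowinfo is deliberately ignored (QoS only). */
        if (IN6_IS_ADDR_LINKLOCAL(&a6->sin6_addr))
            return a6->sin6_scope_id == b6->sin6_scope_id;
        return true;
    }
    default:
        return false;
    }
}
```

Two AF_INET addresses that differ only in their sin_zero padding compare
unequal under memcmp() but equal under this helper.
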
Signed-off-by: Ian Dall <ian@beware.dropbear.id.au>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10 20:33:22 -04:00
Suresh Jayaraman
a71ee337b3 NFS: Handle -ESTALE error in access()
Hi Trond,

I have been looking at a bugreport where trying to open applications on KDE
on a NFS mounted home fails temporarily. There have been multiple reports on
different kernel versions pointing to this common issue:
http://bugzilla.kernel.org/show_bug.cgi?id=12557
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/269954
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=508866.html

This issue can be reproduced consistently by doing the following on an NFS mounted
home (KDE):
1. Open 2 xterm sessions
2. From one of the xterm session, do "ssh -X <remote host>"
3. "stat ~/.Xauthority" on the remote SSH session
4. Close the two xterm sessions
5. On the server do a "stat ~/.Xauthority"
6. Now on the client, try to open xterm
This will fail.

Even if the filehandle had become stale, the NFS client should invalidate
the cache/inode and should repeat LOOKUP. Looking at the packet capture when
the failure occurs shows that there were two subsequent ACCESS() calls with
the same filehandle and both fail with an -ESTALE error.

I have tested the fix below. Now the client issues a LOOKUP after the
ACCESS() call fails with -ESTALE. If all this makes sense to you, can you
consider this for inclusion?

Thanks,


If the server returns an -ESTALE error due to a stale filehandle in response
to an ACCESS() call, we need to invalidate the cache and inode so that
LOOKUP() can be retried. Without this change, the nfs client retries ACCESS()
with the same filehandle and fails again, which could lead to temporary
failure of applications running on an nfs mounted home.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10 20:33:21 -04:00
Trond Myklebust
ae46141ff0 NFSv3: Fix posix ACL code
Fix a memory leak due to allocation in the XDR layer. In cases where the
RPC call needs to be retransmitted, we end up allocating new pages without
clearing the old ones. Fix this by moving the allocation into
nfs3_proc_setacls().

Also fix an issue discovered by Kevin Rudd, whereby the amount of memory
reserved for the acls in the xdr_buf->head was miscalculated, causing
corruption.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10 20:33:18 -04:00
Trond Myklebust
ef95d31e6d NFS: Fix misparsing of nfsv4 fs_locations attribute (take 2)
The changeset ea31a4437c (nfs: Fix
misparsing of nfsv4 fs_locations attribute) causes the mountpath that is
calculated at the beginning of try_location() to be clobbered when we
later strncpy a non-nul terminated hostname using an incorrect buffer
length.
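
As an illustration of the class of bug described here (not the actual fix), a length-delimited string such as an XDR hostname can be copied into a fixed buffer by bounding the copy with both the source length and the destination size, then terminating explicitly. The helper name `sketch_copy_counted` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy src_len bytes of a string that is NOT nul-terminated into dst,
 * never writing more than dst_size bytes, and always nul-terminate.
 * Returns the number of characters copied (excluding the terminator).
 */
size_t sketch_copy_counted(char *dst, size_t dst_size,
			   const char *src, size_t src_len)
{
	size_t n;

	if (dst_size == 0)
		return 0;
	/* Leave room for the terminator; truncate if the source is longer. */
	n = src_len < dst_size - 1 ? src_len : dst_size - 1;
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

Because the write is bounded by `dst_size`, data that follows the destination in the same buffer (such as a previously computed mount path) is left untouched.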

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2009-03-10 20:33:17 -04:00
Alexey Dobriyan
97afe47ac3 fs/Kconfig: move nfs out
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-01-22 13:16:00 +03:00
Nick Piggin
54566b2c15 fs: symlink write_begin allocation context fix
With the write_begin/write_end aops, page_symlink was broken because it
could no longer pass a GFP_NOFS type mask into the point where the
allocations happened.  They are done in write_begin, which would always
assume that the filesystem can be entered from reclaim.  This bug could
cause filesystem deadlocks.

The funny thing with having a gfp_t mask there is that it doesn't really
allow the caller to arbitrarily tinker with the context in which it can be
called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
take the page lock.  The only thing any callers care about is __GFP_FS
anyway, so turn that into a single flag.

Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
this flag in their write_begin function.  Change __grab_cache_page to
accept a nofs argument as well, to honour that flag (while we're there,
change the name to grab_cache_page_write_begin which is more instructive
and does away with random leading underscores).
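
A rough sketch of how a write_begin helper might honour such a flag: start from the mapping's allocation mask and drop the "may enter the filesystem" bit when the caller passes the NOFS flag. The `SKETCH_*` bit values below are stand-ins invented for this example; the kernel defines its own values for __GFP_FS and AOP_FLAG_NOFS.

```c
/* Stand-in bit values for this sketch only; not the kernel's definitions. */
#define SKETCH_GFP_IO		0x1u
#define SKETCH_GFP_FS		0x2u
#define SKETCH_GFP_KERNEL	(SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_AOP_FLAG_NOFS	0x4u

/*
 * Compute the allocation mask for a page-cache allocation made on behalf
 * of write_begin: clear the FS bit if the caller asked for NOFS context.
 */
unsigned int sketch_write_begin_gfp(unsigned int mapping_gfp,
				    unsigned int aop_flags)
{
	unsigned int gfp = mapping_gfp;

	if (aop_flags & SKETCH_AOP_FLAG_NOFS)
		gfp &= ~SKETCH_GFP_FS;
	return gfp;
}
```

With a single flag, the caller cannot request contexts the function could never support (such as an atomic allocation); it can only say whether re-entering the filesystem is allowed.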

This is really a more flexible way to go in the end anyway -- if a
filesystem happens to want any extra allocations aside from the pagecache
ones in its write_begin function, it may now use GFP_KERNEL (rather than
GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
random example).

[kosaki.motohiro@jp.fujitsu.com: fix ubifs]
[kosaki.motohiro@jp.fujitsu.com: fix fuse]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: <stable@kernel.org>		[2.6.28.x]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Cleaned up the calling convention: just pass in the AOP flags
  untouched to the grab_cache_page_write_begin() function.  That
  just simplifies everybody, and may even allow future expansion of the
  logic.   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-04 13:33:20 -08:00
Trond Myklebust
08cc36cbd1 Merge branch 'devel' into next 2008-12-30 16:51:43 -05:00
WANG Cong
46f72f57d2 fs/nfs/nfs4proc.c: make nfs4_map_errors() static
nfs4_map_errors() can become static.

Signed-off-by: WANG Cong <wangcong@zeuux.org>
Cc: J. Bruce Fields <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-30 16:35:55 -05:00
Olga Kornievskaia
945b34a772 rpc: allow gss callbacks to client
This patch adds client-side support to allow for callbacks other than
AUTH_SYS.

Signed-off-by: Olga Kornievskaia <aglo@citi.umich.edu>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:18:34 -05:00
Andy Adamson
cf8cdbe5bd NFS: remove unused status from encode routines
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:18 -05:00
Andy Adamson
d017931cff NFS: increment number of operations in each encode routine
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:17 -05:00
Benny Halevy
49c2559e29 NFS: fix comment placement in nfs4xdr.c
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:16 -05:00
Andy Adamson
05d564fe00 NFS: fix tabs in nfs4xdr.c
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:15 -05:00
Andy Adamson
6c0195a468 NFS: remove white space from nfs4xdr.c
Clean-up

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:15 -05:00
Benny Halevy
374130770e nfs: remove incorrect usage of nfs4 compound response hdr.status
3 call sites look at hdr.status before returning success.
hdr.status must be zero in this case so there's no point in this.

Currently, hdr.status is correctly processed at decode_op_hdr time
if the op status cannot be decoded.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:14 -05:00
Benny Halevy
aadf615211 nfs: return compound hdr.status when there are no op replies
When there are no op replies encoded in the compound reply
hdr.status still contains the overall status of the compound
rpc.  This can happen, e.g., when the server returns a
NFS4ERR_MINOR_VERS_MISMATCH error.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:06:13 -05:00
Trond Myklebust
027b6ca021 NFSv4: Fix an infinite loop in the NFS state recovery code
Marten Gajda <marten.gajda@fernuni-hagen.de> states:

I tracked the problem down to the function nfs4_do_open_expired.
Within this function _nfs4_open_expired is called and may return
-NFS4ERR_DELAY. When a further call to _nfs4_open_expired is
executed and does not return -NFS4ERR_DELAY the "exception.retry"
variable is not reset to 0, causing the loop to iterate again
(and as long as err != -NFS4ERR_DELAY, probably forever)
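
The failure mode can be reproduced with a small standalone sketch. Everything here is illustrative: `sketch_open_expired` is a fake operation that reports a delay once and then succeeds, and `sketch_recover` is a hypothetical loop, not the kernel function or the actual patch. The iteration cap exists only so the buggy variant terminates.

```c
#define SKETCH_ERR_DELAY 10008	/* stand-in for NFS4ERR_DELAY */

struct sketch_exception {
	int retry;
};

/* Fake operation: returns a delay error on the first call, then success. */
int sketch_open_expired(int *calls)
{
	return (*calls)++ == 0 ? -SKETCH_ERR_DELAY : 0;
}

/*
 * Run the retry loop and return how many iterations it took.
 * With reset_retry == 0 the retry flag stays set after the first delay,
 * so the loop only stops at max_iterations.
 */
int sketch_recover(int reset_retry, int max_iterations)
{
	struct sketch_exception exception = { 0 };
	int calls = 0;
	int iterations = 0;
	int err;

	do {
		if (reset_retry)
			exception.retry = 0;	/* clear stale state first */
		err = sketch_open_expired(&calls);
		if (err == -SKETCH_ERR_DELAY)
			exception.retry = 1;	/* ask for another attempt */
		iterations++;
	} while (exception.retry && iterations < max_iterations);

	return iterations;
}
```

Clearing the flag at the top of each iteration lets the loop exit on the second pass; leaving it sticky makes the loop spin until the cap is reached.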

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 16:04:13 -05:00
Peter Staubach
64672d55d9 optimize attribute timeouts for "noac" and "actimeo=0"
Hi.

I've been looking at a bugzilla which describes a problem where
a customer was advised to use either the "noac" or "actimeo=0"
mount options to solve a consistency problem that they were
seeing in the file attributes.  It turned out that this solution
did not work reliably for them because sometimes, the local
attribute cache was believed to be valid and not timed out.
(With an attribute cache timeout of 0, the cache should always
appear to be timed out.)

In looking at this situation, it appears to me that the problem
is that the attribute cache timeout code has an off-by-one
error in it.  It is assuming that the cache is valid in the
region, [read_cache_jiffies, read_cache_jiffies + attrtimeo].  The
cache should be considered valid only in the region,
[read_cache_jiffies, read_cache_jiffies + attrtimeo).  With this
change, the options, "noac" and "actimeo=0", work as originally
expected.
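
The difference between the two intervals can be shown with a short sketch. The function names are hypothetical, and jiffies wraparound is deliberately ignored here to keep the comparison readable.

```c
#include <stdbool.h>

/* Closed interval [read_cache, read_cache + attrtimeo]: the old behaviour. */
bool sketch_cache_valid_closed(unsigned long now, unsigned long read_cache,
			       unsigned long attrtimeo)
{
	return now >= read_cache && now <= read_cache + attrtimeo;
}

/* Half-open interval [read_cache, read_cache + attrtimeo): the fix. */
bool sketch_cache_valid_half_open(unsigned long now, unsigned long read_cache,
				  unsigned long attrtimeo)
{
	return now >= read_cache && now < read_cache + attrtimeo;
}
```

With attrtimeo == 0 and now == read_cache, the closed form still reports the cache as valid, while the half-open form reports it as timed out.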

This problem was previously addressed by special casing the
attrtimeo == 0 case.  However, since the problem is only an off-by-one
error, the cleaner solution is to address the off-by-one error and thus
not require the special case.

    Thanx...

        ps

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:56 -05:00
Trond Myklebust
dc0b027dfa NFSv4: Convert the open and close ops to use fmode
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:56 -05:00
Trond Myklebust
7a50c60e46 NFS: Use delegations to optimise ACCESS calls
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:55 -05:00
Trond Myklebust
15860ab1d7 NFSv4: Ensure that we set the verifier when revalidating delegated dentries
This ensures that we don't have to look up the dentry again after we return
the delegation if we know that the directory didn't change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:54 -05:00
Trond Myklebust
5584c30630 NFSv4: Clean up is_atomic_open()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:54 -05:00
Trond Myklebust
bd7bf9d540 NFSv4: Convert delegation->type field to fmode_t
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:53 -05:00
Trond Myklebust
9082a5cc1e NFSv4: Fix up delegation callbacks
Currently, the callback server is listening on IPv6 if it is enabled. This
means that IPv4 addresses will always be mapped.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:53 -05:00
Trond Myklebust
b7391f44f2 NFSv4: Return unreferenced delegations more promptly
If the client is not using a delegation, the right thing to do is to return
it as soon as possible. This helps reduce the amount of state the server
has to track, as well as reducing the potential for conflicts with other
clients.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:52 -05:00
Trond Myklebust
6411bd4a47 NFSv4: Clean up the asynchronous delegation return
Reuse the state management thread in order to return delegations when we
get a callback.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:51 -05:00
Trond Myklebust
b0d3ded1a2 NFSv4: Clean up nfs_expire_all_delegations()
Let the actual delegreturn stuff be run in the state manager thread rather
than allocating a separate kthread.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:50 -05:00
Trond Myklebust
0d62f85a81 NFSv4: Fix a BAD_SEQUENCEID condition.
We really shouldn't be resetting the sequence ids when doing state
expiration recovery, since we don't know if the server still remembers our
previous state owners. There are servers out there that do attempt to
preserve client state even if the lease has expired. Such a server would
only release that state if a conflicting OPEN request occurs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:49 -05:00
Trond Myklebust
f3c76491e7 NFSv4: Don't exit the state management if there are still tasks to do
Fix up a potential race...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:48 -05:00
Trond Myklebust
e005e8041c NFSv4: Rename the state reclaimer thread
It is really a more general purpose state management thread at this point.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:48 -05:00
Trond Myklebust
707fb4b324 NFSv4: Clean up NFS4ERR_CB_PATH_DOWN error management...
Add a delegation cleanup phase to the state management loop, and do the
NFS4ERR_CB_PATH_DOWN recovery there.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:47 -05:00
Trond Myklebust
515d861177 NFSv4: Clean up the support for returning multiple delegations
Add a flag to mark delegations as requiring return, then run a garbage
collector. In the future, this will allow for more flexible delegation
management, where delegations may be marked for return if it turns out
that they are not being referenced.
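
The mark-then-collect pattern might look roughly like this. It is a simplified sketch over a plain array: the structure, flag name and functions are hypothetical and omit the locking and RCU list handling the real client needs.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_DELEG_NEED_RETURN (1u << 0)

struct sketch_delegation {
	unsigned int flags;
	bool returned;
};

/* Phase 1: any code path may mark a delegation as requiring return. */
void sketch_mark_for_return(struct sketch_delegation *d)
{
	d->flags |= SKETCH_DELEG_NEED_RETURN;
}

/* Phase 2: a single collector pass returns every marked delegation. */
size_t sketch_return_marked(struct sketch_delegation *delegs, size_t n)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!(delegs[i].flags & SKETCH_DELEG_NEED_RETURN))
			continue;
		if (delegs[i].returned)
			continue;
		delegs[i].returned = true;
		delegs[i].flags &= ~SKETCH_DELEG_NEED_RETURN;
		count++;
	}
	return count;
}
```

Separating marking from collection means new policies (for example, marking unreferenced delegations) only need to set the flag.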

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:46 -05:00
Trond Myklebust
9e33bed552 NFSv4: Add recovery for individual stateids
NFSv4 defines a number of state errors which the client does not currently
handle. Among those we should worry about are:
  NFS4ERR_ADMIN_REVOKED - the server's administrator revoked our locks
  			  and/or delegations.
  NFS4ERR_BAD_STATEID - the client and server are out of sync, possibly
                        due to a delegation return racing with an OPEN
			request.
  NFS4ERR_OPENMODE - the client attempted to do something not sanctioned
  		     by the open mode of the stateid. Should normally just
		     occur as a result of a delegation return race.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:46 -05:00
Trond Myklebust
95d35cb4c4 NFSv4: Remove nfs_client->cl_sem
Now that we're using the flags to indicate state that needs to be
recovered, as well as having implemented proper refcounting and spinlocking
on the state and open_owners, we can get rid of nfs_client->cl_sem. The
only remaining case that was dubious was the file locking, and that case is
now covered by the nfsi->rwsem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:45 -05:00
Trond Myklebust
19e03c570e NFSv4: Ensure that file unlock requests don't conflict with state recovery
The unlock path is currently failing to take the nfs_client->cl_sem read
lock, and hence the recovery path may see locks disappear from underneath
it.
Also ensure that it takes the nfs_inode->rwsem read lock so that there
is no conflict with delegation recalls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:44 -05:00
Trond Myklebust
65de872ed6 NFS: Remove the unnecessary argument to nfs4_wait_clnt_recover()
...and move some code around in order to clear out an unnecessary
forward declaration.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:44 -05:00
Trond Myklebust
fe1d81952e NFSv4: Ensure that nfs4_reclaim_open_state() doesn't depend on cl_sem
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:43 -05:00
Trond Myklebust
7eff03aec9 NFSv4: Add a recovery marking scheme for state owners
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:43 -05:00
Trond Myklebust
0f605b5600 NFSv4: Don't tell server we rebooted when not necessary
Instead of doing a full setclientid, try doing a RENEW call first.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:42 -05:00
Trond Myklebust
e598d843c0 NFSv4: Remove redundant RENEW calls if we know the lease has expired
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:42 -05:00
Trond Myklebust
b79a4a1b45 NFSv4: Fix state recovery when the client runs over the grace period
If the client for some reason is not able to recover all its state within
the time allotted for the grace period, and the server reboots again, the
client is not allowed to recover the state that was 'lost' using reboot
recovery.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:41 -05:00
Trond Myklebust
6dc9d57af9 NFSv4: Callers to nfs4_get_renew_cred() need to hold nfs_client->cl_lock
Ditto for nfs4_get_setclientid_cred().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:41 -05:00
Trond Myklebust
0286001430 NFSv4: Clean up for the state loss reclaimer
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:40 -05:00
Trond Myklebust
15c831bf1a NFS: Use atomic bitops when changing struct nfs_delegation->flags
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:39 -05:00
Trond Myklebust
86e8948998 NFSv4: Fix up the dereferencing of delegation->inode
Without an extra lock, we cannot just assume that the delegation->inode is
valid when we're traversing the rcu-protected nfs_client lists. Use the
delegation->lock to ensure that it is truly valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:39 -05:00
Trond Myklebust
343104308a NFSv4: Fix up another delegation related race
When we call update_open_stateid(), we need to be certain that we don't
race with a delegation return. While we could do this by grabbing the
nfs_client->cl_lock, a dedicated spin lock in the delegation structure
will scale better.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:38 -05:00
Chuck Lever
0cb2659b81 NLM: allow lockd requests from an unprivileged port
If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport.  The kernel's lockd client should use an
unprivileged port in this case as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:38 -05:00
Chuck Lever
50a737f86d NFS: "[no]resvport" mount option changes mountd client too
If the admin has specified the "noresvport" option for an NFS mount
point, the kernel's NFS client uses an unprivileged source port for
the main NFS transport.  The kernel's mountd client should use an
unprivileged port in this case as well.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:37 -05:00
Chuck Lever
d740351bf0 NFS: add "[no]resvport" mount option
The standard default security setting for NFS is AUTH_SYS.  An NFS
client connects to NFS servers via a privileged source port and a
fixed standard destination port (2049).  The client sends raw uid and
gid numbers to identify users making NFS requests, and the server
assumes an appropriate authority on the client has vetted these
values because the source port is privileged.

On Linux, by default in-kernel RPC services use a privileged port in
the range between 650 and 1023 to avoid using source ports of well-
known IP services.  Using such a small range limits the number of NFS
mount points and the number of unique NFS servers to which a client
can connect concurrently.

An NFS client can use unprivileged source ports to expand the range of
source port numbers, allowing more concurrent server connections and
more NFS mount points.  Servers must explicitly allow NFS connections
from unprivileged ports for this to work.

In the past, bumping the value of the sunrpc.max_resvport sysctl on
the client would permit the NFS client to use unprivileged ports.
Bumping this setting also changes the maximum port number used by
other in-kernel RPC services, some of which still required a port
number less than 1023.

This is exacerbated by the way source port numbers are chosen by the
Linux RPC client, which starts at the top of the range and works
downwards.  Bumping the maximum therefore means that all RPC services
requesting a source port will likely get an unprivileged port instead
of a privileged one.
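
A toy model of top-down port selection shows why raising the maximum affects every requester. The function names and the `used` table are hypothetical; the real RPC client's selection logic is more involved.

```c
#include <stdbool.h>

#define SKETCH_PRIVILEGED_LIMIT 1024	/* ports below this are privileged */

/*
 * Scan from max_port down to min_port and return the first free port,
 * or -1 if the whole range is in use.  used[] is indexed by port number.
 */
int sketch_pick_source_port(int min_port, int max_port, const bool *used)
{
	int port;

	for (port = max_port; port >= min_port; port--)
		if (!used[port])
			return port;
	return -1;
}

bool sketch_is_privileged(int port)
{
	return port > 0 && port < SKETCH_PRIVILEGED_LIMIT;
}
```

With the range 650 to 1023 from the text, the first port handed out is 1023, which is privileged. If the maximum is raised to, say, 2000, the first port handed out to any RPC service is 2000, which is not.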

Changing this setting affects all NFS mount points on a client.  A
sysadmin could not selectively choose which mount points would use
non-privileged ports and which could not.

Lastly, this mechanism of expanding the limit on the number of NFS
mount points was entirely undocumented.

To address the need for the NFS client to use a large range of source
ports without interfering with the activity of other in-kernel RPC
services, we introduce a new NFS mount option.  This option explicitly
tells only the NFS client to use a non-privileged source port when
communicating with the NFS server for one specific mount point.

This new mount option is called "resvport," like the similar NFS mount
option on FreeBSD and Mac OS X.  A sister patch for nfs-utils will be
submitted that documents this new option in nfs(5).

The default setting for this new mount option requires the NFS client
to use a privileged port, as before.  Explicitly specifying the
"noresvport" mount option allows the NFS client to use an unprivileged
source port for this mount point when connecting to the NFS server
port.

This mount option is supported only for text-based NFS mounts.

[ Sidebar: it is widely known that security mechanisms based on the
  use of privileged source ports are ineffective.  However, the NFS
  client can combine the use of unprivileged ports with the use of
  secure authentication mechanisms, such as Kerberos.  This allows a
  large number of connections and mount points while ensuring a useful
  level of security.

  Eventually we may change the default setting for this option
  depending on the security flavor used for the mount.  For example,
  if the mount is using only AUTH_SYS, then the default setting will
  be "resvport;" if the mount is using a strong security flavor such
  as krb5, the default setting will be "noresvport." ]

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Trond.Myklebust@netapp.com: Fixed a bug whereby nfs4_init_client()
was being called with incorrect arguments.]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:37 -05:00
Chuck Lever
542fcc334a NFS: move nfs_server flag initialization
Make it possible for the NFSv4 mount set up logic to pass mount option
flags down the stack to nfs_create_rpc_client().

This is immediately useful if we want NFS mount options to modulate
settings of the underlying RPC transport, but it may be useful at some
later point if other parts of the NFSv4 mount initialization logic
want to know what the mount options are.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:36 -05:00
Chuck Lever
4a01b8a4ee NFS: expand flags passed to nfs_create_rpc_client()
The nfs_create_rpc_client() function sets up an RPC client for an NFS
mount point.  Add an option that allows it to set up an RPC transport
from an unprivileged port.

Instead of having nfs_create_rpc_client()'s callers retain local
knowledge about how to set up an RPC client, create a couple of flag
arguments to control the use of RPC_CLNT_CREATE flags.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:35 -05:00
Chuck Lever
c5d120f8e8 NFS: introduce nfs_mount_info struct for calling nfs_mount()
Clean up: convert nfs_mount() to take a single data structure argument to make
it simpler to add more arguments.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:35 -05:00
Chuck Lever
146ec944bb NFS: Move declaration of nfs_mount() to fs/nfs/internal.h
Clean up:  The nfs_mount() function is not to be used outside of the
NFS client.  Move its public declaration to fs/nfs/internal.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:34 -05:00
Chuck Lever
7b5d2b98e1 NFS: rename nfs_path variable
Clean up: I'm about to move the declaration of nfs_mount into
fs/nfs/internal.h and include it in fs/nfs/nfsroot.c.  There's a
conflicting definition of nfs_path in fs/nfs/internal.h and
fs/nfs/nfsroot.c, so rename the private one.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:34 -05:00
Wu Fengguang
136221fc32 nfs: remove redundant tests on reading new pages
aops->readpages() and its NFS helper readpage_async_filler() will only
be called to do readahead I/O for newly allocated pages. So it's not
necessary to test for the always 0 dirty/uptodate page flags.

The removal of nfs_wb_page() call also fixes a readahead bug: the NFS
readahead has been synchronous since 2.6.23, because that call will
clear PG_readahead, which is the reminder for asynchronous readahead.

More background: the PG_readahead page flag is shared with PG_reclaim,
one for read path and the other for write path. clear_page_dirty_for_io()
unconditionally clears PG_readahead to prevent possible readahead residuals,
assuming itself to be always called in the write path. However, NFS is the
one and only exception in that it _always_ calls clear_page_dirty_for_io()
in the read path, i.e. for readpages()/readpage().

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Wu Fengguang <wfg@linux.intel.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-12-23 15:21:30 -05:00
Harvey Harrison
be85940548 fs: replace NIPQUAD()
Using NIPQUAD() with NIPQUAD_FMT, %d.%d.%d.%d or %u.%u.%u.%u
can be replaced with %pI4

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-31 00:56:28 -07:00
David S. Miller
a1744d3bee Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/wireless/p54/p54common.c
2008-10-31 00:17:34 -07:00
Harvey Harrison
5b095d9892 net: replace %p6 with %pI6
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-29 12:52:50 -07:00
Harvey Harrison
1afa67f5e7 misc: replace NIP6_FMT with %p6 format specifier
The iscsi_ibft.c changes are almost certainly a bugfix as the
pointer 'ip' is a u8 *, so they never print the last 8 bytes
of the IPv6 address, and the eight bytes they do print have
a zero byte with them in each 16-bit word.

Other than that, this should cause no difference in functionality.

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-10-28 16:06:44 -07:00
Trond Myklebust
ae05f26940 NFS: Convert nfs_attr_generation_counter into an atomic_long
The most important property we need from nfs_attr_generation_counter is
monotonicity, which is not guaranteed by the current system of smp memory
barriers. We should convert it to an atomic_long_t, and drop the memory
barriers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-28 15:21:40 -04:00
Alan Cox
526719ba51 Switch to a valid email address...
Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-27 08:40:17 -07:00
Miklos Szeredi
f696a3659f [PATCH] move executable checking into ->permission()
For execute permission on regular files we need to check if the file has
any execute bits at all, regardless of capabilities.

This check is normally performed by generic_permission() but was also
added to the case when the filesystem defines its own ->permission()
method.  In the latter case the filesystem should be responsible for
performing this check.

Move the check from inode_permission() inside filesystems which are
not calling generic_permission().

Create a helper function execute_ok() that returns true if the inode
is a directory or if any execute bits are present in i_mode.

Also fix up the following code:

 - coda control file is never executable
 - sysctl files are never executable
 - hfs_permission seems broken on MAY_EXEC, remove
 - hfsplus_permission is equivalent to generic_permission(), remove

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2008-10-23 05:13:25 -04:00
Christoph Hellwig
440037287c [PATCH] switch all filesystems over to d_obtain_alias
Switch all users of d_alloc_anon to d_obtain_alias.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:13:01 -04:00
Al Viro
3516586a42 [PATCH] make O_EXCL in nd->intent.flags visible in nd->flags
New flag: LOOKUP_EXCL.  Set before doing the final step of pathname
resolution on the paths that have LOOKUP_CREATE and O_EXCL.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-10-23 05:12:56 -04:00
Linus Torvalds
52c6738b7f Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: use correct fs type for v4 submounts and referrals
  Make nfs_file_cred more robust.
  NFS: Enable NFSv4 callback server to listen on AF_INET6 sockets
2008-10-20 09:39:20 -07:00
Rik van Riel
4f98a2fee8 vmscan: split LRU lists into anon & file sets
Split the LRU lists in two, one set for pages that are backed by real file
systems ("file") and one for pages that are backed by memory and swap
("anon").  The latter includes tmpfs.

The advantage of doing this is that the VM will not have to scan over lots
of anonymous pages (which we generally do not want to swap out), just to
find the page cache pages that it should evict.

This patch has the infrastructure and a basic policy to balance how much
we scan the anon lists and how much we scan the file lists.  The big
policy changes are in separate patches.

[lee.schermerhorn@hp.com: collect lru meminfo statistics from correct offset]
[kosaki.motohiro@jp.fujitsu.com: prevent incorrect oom under split_lru]
[kosaki.motohiro@jp.fujitsu.com: fix pagevec_move_tail() doesn't treat unevictable page]
[hugh@veritas.com: memcg swapbacked pages active]
[hugh@veritas.com: splitlru: BDI_CAP_SWAP_BACKED]
[akpm@linux-foundation.org: fix /proc/vmstat units]
[nishimura@mxp.nes.nec.co.jp: memcg: fix handling of shmem migration]
[kosaki.motohiro@jp.fujitsu.com: adjust Quicklists field of /proc/meminfo]
[kosaki.motohiro@jp.fujitsu.com: fix style issue of get_scan_ratio()]
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-20 08:50:25 -07:00
Andy Adamson
ec9a05c94c NFS: use correct fs type for v4 submounts and referrals
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-17 13:06:48 -04:00
Neil Brown
504e518953 Make nfs_file_cred more robust.
As not all files have an associated open_context (e.g. device special
files), it is safest to test for the existence of the open context
before de-referencing it.

Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-17 13:06:45 -04:00
Chuck Lever
18de973530 NFS: Enable NFSv4 callback server to listen on AF_INET6 sockets
Allow the NFS callback server to listen for requests via an AF_INET6 or
AF_INET socket when IPv6 support is present in the kernel.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-17 13:06:41 -04:00
Trond Myklebust
6925bac120 Merge branch 'next' 2008-10-15 15:54:56 -04:00
Trond Myklebust
011935a0a7 NFS: Fix a resolution problem with nfs_inode->cache_change_attribute
The cache_change_attribute is used to decide whether or not a directory has
changed, in which case we may need to look it up again. Again, the use of
'jiffies' leads to an issue of resolution.

Once again, the fix is to change nfs_inode->cache_change_attribute, and
just make it a simple counter.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-14 19:24:50 -04:00
Trond Myklebust
4704f0e274 NFS: Fix the resolution problem with nfs_inode_attrs_need_update()
It appears that 'jiffies' timestamps do not have high enough resolution for
nfs_inode_attrs_need_update(). One problem is that a GETATTR can be
launched within < 1 jiffy of the last operation that updated the attribute.
Another problem is that RPC calls can take < 1 jiffy to execute.

We can fix this by switching the variables to use a simple global counter
that gets incremented every time we start another GETATTR call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-14 19:23:17 -04:00
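The counter-based ordering from 4704f0e274 might look roughly like the sketch below. All names here are hypothetical, the sketch is single-threaded, and stamping the inode from the same counter on update is an assumption made for illustration; the commit message only states that a global counter is incremented whenever a GETATTR is started.

```c
#include <stdbool.h>

/* Hypothetical global generation counter (single-threaded sketch). */
static unsigned long attr_generation;

struct cached_inode { unsigned long attr_gencount; };
struct fattr { unsigned long gencount; };

/* Stamp the reply buffer when the attribute-fetching call is started. */
static void fattr_start(struct fattr *f)
{
    f->gencount = ++attr_generation;
}

/* Assumption: the inode is stamped from the same counter when updated. */
static void inode_mark_updated(struct cached_inode *ino)
{
    ino->attr_gencount = ++attr_generation;
}

/* A reply is worth applying only if its call started after the last update.
 * The signed difference keeps the test valid across counter wraparound. */
static bool attrs_need_update(const struct cached_inode *ino,
                              const struct fattr *f)
{
    return (long)(f->gencount - ino->attr_gencount) > 0;
}
```

Unlike a jiffies timestamp, two events that happen within the same tick still receive distinct, ordered values.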
Trond Myklebust
921615f111 NFS: Changes to inode->i_nlinks must set the NFS_INO_INVALID_ATTR flag
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-14 19:23:07 -04:00
Linus Torvalds
8acd3a60bc Merge branch 'for-2.6.28' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.28' of git://linux-nfs.org/~bfields/linux: (59 commits)
  svcrdma: Fix IRD/ORD polarity
  svcrdma: Update svc_rdma_send_error to use DMA LKEY
  svcrdma: Modify the RPC reply path to use FRMR when available
  svcrdma: Modify the RPC recv path to use FRMR when available
  svcrdma: Add support to svc_rdma_send to handle chained WR
  svcrdma: Modify post recv path to use local dma key
  svcrdma: Add a service to register a Fast Reg MR with the device
  svcrdma: Query device for Fast Reg support during connection setup
  svcrdma: Add FRMR get/put services
  NLM: Remove unused argument from svc_addsock() function
  NLM: Remove "proto" argument from lockd_up()
  NLM: Always start both UDP and TCP listeners
  lockd: Remove unused fields in the nlm_reboot structure
  lockd: Add helper to sanity check incoming NOTIFY requests
  lockd: change nlmclnt_grant() to take a "struct sockaddr *"
  lockd: Adjust nlmsvc_lookup_host() to accommodate AF_INET6 addresses
  lockd: Adjust nlmclnt_lookup_host() signature to accommodate non-AF_INET
  lockd: Support non-AF_INET addresses in nlm_lookup_host()
  NLM: Convert nlm_lookup_host() to use a single argument
  svcrdma: Add Fast Reg MR Data Types
  ...
2008-10-14 12:31:14 -07:00
Steven Whitehouse
a447c09324 vfs: Use const for kernel parser table
This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where it's required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe it's been in -mm
since then.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-10-13 10:10:37 -07:00
Chuck Lever
5e2e7721f0 NFS: fix nfs_parse_ip_address() corner case
Bruce observed that nfs_parse_ip_address() will successfully parse an
IPv6 address that looks like this:

  "::1%"

A scope delimiter is present, but there is no scope ID following it.
This is harmless, as it would simply set the scope ID to zero.  However,
in some cases we would like to flag this as an improperly formed
address.

We are now also careful to reject addresses where garbage follows the
address (up to the length of the string), instead of ignoring the
non-address characters; and where the scope ID is nonsense (not a valid
device name, but also not numeric).  Before, both of these cases would
result in a harmless zero scope ID.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 14:41:51 -04:00
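The stricter checks described in 5e2e7721f0 can be sketched in userspace as follows. This is not the kernel's nfs_parse_ip_address(): the function name is hypothetical, the input is assumed to be NUL-terminated, and only numeric scope IDs are handled (the commit also accepts device names, which are omitted here).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse "addr" or "addr%scope", where scope is a decimal number.
 * Returns 0 on success, -1 on malformed input. */
static int parse_ipv6_with_scope(const char *str, struct in6_addr *addr,
                                 unsigned long *scope_id)
{
    char buf[INET6_ADDRSTRLEN];
    const char *delim = strchr(str, '%');
    size_t addr_len = delim ? (size_t)(delim - str) : strlen(str);
    char *end;

    if (addr_len == 0 || addr_len >= sizeof(buf))
        return -1;
    memcpy(buf, str, addr_len);
    buf[addr_len] = '\0';

    /* inet_pton() rejects trailing garbage in the address part. */
    if (inet_pton(AF_INET6, buf, addr) != 1)
        return -1;

    *scope_id = 0;
    if (delim == NULL)
        return 0;

    /* A delimiter with no digits after it ("::1%") is malformed. */
    if (delim[1] < '0' || delim[1] > '9')
        return -1;
    errno = 0;
    *scope_id = strtoul(delim + 1, &end, 10);
    if (errno != 0 || *end != '\0')
        return -1;
    return 0;
}
```

With this shape, "::1%", "::1garbage" and "::1%eth0x!" are all rejected instead of quietly yielding a zero scope ID.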
J. Bruce Fields
456018d791 NFS: Cleanup nfs_set_port
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-10 14:41:50 -04:00
Trond Myklebust
03254e65a6 NFS: Fix attribute updates
This fixes a regression seen when running the Connectathon testsuite
against an ext3 filesystem. The reason was that the inode was constantly
being marked as 'just updated' by the jiffy wraparound test.
This again meant that newer GETATTR calls were failing to pass the
nfs_inode_attrs_need_update() test unless the changes caused a ctime update
on the server, since they were perceived as having been started before the
latest inode update.

Given that nfs_inode_attrs_need_update() already checks for wraparound
of nfsi->last_updated, we can drop the buggy "protection" in
nfs_update_inode().

Also make a slight micro-optimisation of nfs_inode_attrs_need_update(): we
are more often going to see time_after(fattr->time_start, nfsi->last_updated)
be true, rather than seeing an update of ctime/size, so put that test
first to ensure that we optimise away the ctime/size tests.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-09 13:34:07 -04:00
Trond Myklebust
d7fb120774 NFS: Don't use range_cyclic for data integrity syncs
It is more efficient to write linearly starting from the beginning of the
file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:19:05 -04:00
Steve Dickson
8491945f11 NFS: Client mounts hang when exported directory do not exist
This patch fixes a regression that was introduced by the string based mounts.

nfs_mount() statically returns -EACCES for every error returned
by the remote mountd. This is incorrect because -EACCES is
a non-fatal error to the mount.nfs command. This error causes
mount.nfs to retry the mount even in the case when the exported
directory does not exist.

This patch maps the errors returned by the remote mountd into
valid errno values, exactly how it was done pre-string based
mounts. Returning the correct errno enables mount.nfs
to do the right thing.

Signed-off-by: Steve Dickson <steved@redhat.com>
[Trond.Myklebust@netapp.com: nfs_stat_to_errno() now correctly returns
 negative errors, so remove the sign change.]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:19:01 -04:00
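The status-to-errno mapping that 8491945f11 restores can be pictured with a small table. The status values are the MOUNT version 3 codes from RFC 1813; the function name, the userspace errno choices for MNT3ERR_NOTSUPP and MNT3ERR_SERVERFAULT, and the fallback for unknown codes are assumptions for this sketch, not what the kernel's nfs_stat_to_errno() does.

```c
#include <errno.h>
#include <stddef.h>

/* MOUNT protocol version 3 status codes, as listed in RFC 1813. */
enum mountstat3 {
    MNT3_OK = 0, MNT3ERR_PERM = 1, MNT3ERR_NOENT = 2, MNT3ERR_IO = 5,
    MNT3ERR_ACCES = 13, MNT3ERR_NOTDIR = 20, MNT3ERR_INVAL = 22,
    MNT3ERR_NAMETOOLONG = 63, MNT3ERR_NOTSUPP = 10004,
    MNT3ERR_SERVERFAULT = 10006
};

static const struct { int status; int err; } mnt_errtbl[] = {
    { MNT3_OK,             0 },
    { MNT3ERR_PERM,        -EPERM },
    { MNT3ERR_NOENT,       -ENOENT },
    { MNT3ERR_IO,          -EIO },
    { MNT3ERR_ACCES,       -EACCES },
    { MNT3ERR_NOTDIR,      -ENOTDIR },
    { MNT3ERR_INVAL,       -EINVAL },
    { MNT3ERR_NAMETOOLONG, -ENAMETOOLONG },
    { MNT3ERR_NOTSUPP,     -ENOTSUP },   /* userspace stand-in */
    { MNT3ERR_SERVERFAULT, -EIO },       /* userspace stand-in */
};

/* Map a mountd status to a negative errno value. */
static int mountd_status_to_errno(int status)
{
    for (size_t i = 0; i < sizeof(mnt_errtbl) / sizeof(mnt_errtbl[0]); i++)
        if (mnt_errtbl[i].status == status)
            return mnt_errtbl[i].err;
    return -EIO;   /* assumption: unknown codes become a generic I/O error */
}
```

The important property is that a missing export surfaces as -ENOENT, which mount.nfs can treat as fatal, rather than as a retryable -EACCES.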
J. Bruce Fields
ea31a4437c nfs: Fix misparsing of nfsv4 fs_locations attribute
The code incorrectly assumes here that the server name (or ip address)
is null-terminated.  This can cause referrals to fail in some cases.

Also support ipv6 addresses.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:17:47 -04:00
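The bug class fixed in ea31a4437c, treating a counted string as if it were NUL-terminated, is easy to show in isolation. The `struct counted_string` type and the helper below are hypothetical; the real fs_locations decoding uses the kernel's own types.

```c
#include <stdlib.h>
#include <string.h>

/* A counted string as decoded from the wire: `data` is NOT NUL-terminated. */
struct counted_string {
    const char *data;
    size_t len;
};

/* Return a freshly allocated NUL-terminated copy, or NULL if allocation
 * fails. The caller frees the result. */
static char *counted_string_dup(const struct counted_string *s)
{
    char *copy = malloc(s->len + 1);

    if (copy == NULL)
        return NULL;
    memcpy(copy, s->data, s->len);   /* copy exactly len bytes */
    copy[s->len] = '\0';
    return copy;
}
```

Passing `s->data` straight to a function that expects a C string would read past the server name into whatever bytes follow it in the buffer.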
J. Bruce Fields
f0c929251e nfs: prepare to share nfs_set_port
We plan to use this function elsewhere.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:17:36 -04:00
J. Bruce Fields
460cdbc832 nfs: replace while loop by for loops in nfs_follow_referral
Whoever wrote this had a bizarre allergy to for loops.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:17:20 -04:00
J. Bruce Fields
4ada29d5c4 nfs: break up nfs_follow_referral
This function is a little longer and more deeply nested than necessary.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:16:40 -04:00
EG Keizer
37ca8f5c60 nfs: authenticated deep mounting
Allow mount to do authenticated mounts below the root of the exported tree.
The wording in RFC 2623, sec 2.3.2. allows fsinfo with UNIX authentication
on the root of the export. Mounts are not always done on the root
of the exported tree. Especially automounts often mount below the root of
the exported tree.
Some server implementations (justly) require full authentication for the
so-called deep mounts. The old code used AUTH_SYS only. This caused deep
mounts to fail on systems requiring stronger authentication.
The client should try both authentication types and use the first one that
succeeds.
This method was already partially implemented. This patch completes
the implementation for NFS2 and NFS3.
This patch was developed to allow Debian systems to automount home directories
on Solaris servers with krb5 authentication.

Tested on kernel 2.6.24-etchnhalf.1

Signed-off-by: E.G. Keizer <keie@few.vu.nl>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:16:22 -04:00
Jeff Layton
f25b874d39 NFS: missing nfs_fattr_init in nfs3_proc_getacl and nfs3_proc_setacls (resend #2)
The fattrs used in the NFSv3 getacl/setacl calls are not being properly
initialized. This occasionally causes nfs_update_inode to fall into
NFSv4 specific codepaths when handling post-op attrs from these calls.

Thanks to Cai Qian for noticing the spurious NFSv4 messages in debug
output from a v3 mount...

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:16:22 -04:00
J. Bruce Fields
f200c11c25 nfs: remove an obsolete nfs_flock comment
We *do* now allow bsd flocks over nfs.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:16:21 -04:00
Denis V. Lunev
44d5759d3f nfs: BUG_ON in nfs_follow_mountpoint
Unfortunately, BUG_ON(IS_ROOT(dentry)) can happen inside
nfs_follow_mountpoint with NFS running Fedora 8 using a
specific setup.
https://bugzilla.redhat.com/show_bug.cgi?id=458622

So, the situation should be handled on NFS client gracefully.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Trond Myklebust <Trond.Myklebust@netapp.com>
CC: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:15:16 -04:00
Denis V. Lunev
fd08d7e9d1 nfs: ERR_PTR is expected on failure from nfs_do_clone_mount
Replace NULL with ERR_PTR(-EINVAL).

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 18:14:34 -04:00
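Why returning NULL is a problem for fd08d7e9d1 becomes clearer with a userspace rendition of the kernel's error-pointer helpers. The definitions below follow the well-known ERR_PTR/PTR_ERR/IS_ERR pattern but are rewritten here for illustration, so treat them as a sketch rather than the kernel header.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers in the last MAX_ERRNO bytes of the address space. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}
```

Because IS_ERR(NULL) is false, a caller that checks only IS_ERR() would treat a NULL return as a valid pointer and go on to dereference it. Returning ERR_PTR(-EINVAL) keeps the failure visible to that check.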
Adrian Bunk
bb8a3b53c2 fix fs/nfs/nfsroot.c compilation
This patch fixes the following compile error caused by
commit f9247273cb
(UFS: add const to parser token table):

<--  snip  -->

...
  CC      fs/nfs/nfsroot.o
/home/bunk/linux/kernel-2.6/git/linux-2.6/fs/nfs/nfsroot.c:130: error: tokens causes a section type conflict
make[3]: *** [fs/nfs/nfsroot.o] Error 1

<--  snip  -->

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:59:49 -04:00
Trond Myklebust
691beb13cd NFS: Allow concurrent inode revalidation
Currently, if two processes are both trying to revalidate metadata for the
same inode, they will find themselves being serialised. There is no good
justification for this now that we have improved our ability to detect
stale attribute data, so we should remove that serialisation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:59:43 -04:00
Trond Myklebust
2f28ea614f NFS: Fix up nfs_setattr_update_inode()
Ensure that it sets the inode metadata under the correct spinlock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:41:46 -04:00
Trond Myklebust
076f1fc94c NFS: Don't clear nfsi->cache_validity in nfs_check_inode_attributes()
If we're merely checking the inode attributes because we suspect that the
'updated' attributes returned by the RPC call are stale, then we shouldn't
be doing weak cache consistency updates or clearing the cache_validity
flags.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:41:33 -04:00
Trond Myklebust
4dc05efb86 NFS: Convert __nfs_revalidate_inode() to use nfs_refresh_inode()
In the case where there are parallel RPC calls to the same inode, we may
receive stale metadata due to the lack of ordering, hence the sanity
checking of metadata in nfs_refresh_inode().
Currently, __nfs_revalidate_inode() is calling nfs_update_inode() directly,
without any further sanity checks, and hence may end up setting the inode
up with stale metadata.

Fix is to use nfs_refresh_inode() instead of nfs_update_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:41:17 -04:00
Trond Myklebust
d65f557f39 NFS: Fix nfs_post_op_update_inode_force_wcc()
If we believe that the attributes are old (see nfs_refresh_inode()), then
we shouldn't force an update.
Also ensure that we hold the inode->i_lock across attribute checks and the
call to nfs_refresh_inode_locked() to ensure that we don't race with other
attribute updates.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:41:00 -04:00
Trond Myklebust
a10ad17630 NFS: Fix the NFS attribute update
Currently nfs_refresh_inode() will only update the inode metadata if it
sees that the RPC call that returned the nfs_fattr was started
after the last update of the inode. This means that if we have parallel
RPC calls to the same inode (when sending WRITE calls, for instance), we
may often miss updates.

This patch attempts to recover those missed updates by also accepting
them if the ctime in the nfs_fattr is more recent than the inode's
cached ctime.
It also recovers the case where the file size has increased, but the
ctime has not been updated due to limited ctime resolution.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:34:17 -04:00
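The acceptance rule from a10ad17630 (apply the reply if its call started after the last update, or its ctime is newer, or the file has grown) can be written as one predicate. The structures and field names are hypothetical simplifications; `time_after()` is redefined locally in the style of the kernel macro so that the tick comparison survives counter wraparound.

```c
#include <stdbool.h>

/* Wraparound-safe "a is later than b" for free-running tick counters. */
#define time_after(a, b) ((long)((b) - (a)) < 0)

struct timestamp { long sec; long nsec; };

struct fattr_reply {
    unsigned long time_start;      /* tick count when the RPC was started */
    struct timestamp ctime;
    unsigned long long size;
};

struct inode_cache {
    unsigned long last_updated;    /* tick count of the last applied update */
    struct timestamp ctime;
    unsigned long long size;
};

static bool ctime_newer(const struct timestamp *a, const struct timestamp *b)
{
    if (a->sec != b->sec)
        return a->sec > b->sec;
    return a->nsec > b->nsec;
}

/* The cheap and most commonly true test goes first, so the ctime/size
 * comparisons are skipped whenever it succeeds. */
static bool attrs_need_update(const struct inode_cache *ino,
                              const struct fattr_reply *f)
{
    return time_after(f->time_start, ino->last_updated) ||
           ctime_newer(&f->ctime, &ino->ctime) ||
           f->size > ino->size;
}
```

The size test covers servers whose ctime resolution is too coarse to register a change between two writes.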
Trond Myklebust
870a5be8b9 NFS: Clean up nfs_refresh_inode() and nfs_post_op_update_inode()
Try to avoid taking and dropping the inode->i_lock more than once. Do so by
moving the code in nfs_refresh_inode() that needs to be done under the
spinlock into a function nfs_refresh_inode_locked(), and then having both
nfs_refresh_inode() and nfs_post_op_update_inode() call it directly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:29:49 -04:00
Trond Myklebust
7973c1f15a NFS: Add mount options for controlling the lookup cache
Add the following NFS-specific mount options to the parser.

    -o lookupcache=all          /* Default: cache positive & negative
                                   dentries */
    -o lookupcache=pos[itive]   /* Don't cache negative dentries */
    -o lookupcache=none         /* Strict revalidation of all dentries */

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:23:57 -04:00
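The three lookupcache values added in 7973c1f15a could be recognised along these lines. The enum and function names are hypothetical, and reading "pos[itive]" as "either `pos` or `positive`" is an assumption; the kernel uses its own token tables for this.

```c
#include <string.h>

enum lookup_cache_mode {
    LOOKUP_CACHE_ALL,        /* cache positive and negative dentries */
    LOOKUP_CACHE_POSITIVE,   /* do not cache negative dentries */
    LOOKUP_CACHE_NONE,       /* strict revalidation of all dentries */
    LOOKUP_CACHE_INVALID     /* unrecognised value */
};

/* Map the text after "lookupcache=" to a mode. */
static enum lookup_cache_mode parse_lookupcache(const char *value)
{
    if (strcmp(value, "all") == 0)
        return LOOKUP_CACHE_ALL;
    if (strcmp(value, "pos") == 0 || strcmp(value, "positive") == 0)
        return LOOKUP_CACHE_POSITIVE;
    if (strcmp(value, "none") == 0)
        return LOOKUP_CACHE_NONE;
    return LOOKUP_CACHE_INVALID;
}
```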
Trond Myklebust
ff3525a539 NFS: Don't apply NFS_MOUNT_FLAGMASK to text-based mounts
The point of introducing text-based mounts was to allow us to add
functionality without having to worry about legacy binary mount formats.
The mask should be there in order to ensure that binary formats don't start
enabling features that they cannot support. There is no justification for
applying it to the text mount path.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:23:56 -04:00
Trond Myklebust
4eec952e42 NFS: Add options for finer control of the lookup cache
Add the flag NFS_MOUNT_LOOKUP_CACHE_NONEG to turn off the caching of
negative dentries. In reality what we do is to force
nfs_lookup_revalidate() to always discard negative dentries.

Add the flag NFS_MOUNT_LOOKUP_CACHE_NONE for enforcing stricter
revalidation of dentries. It forces the revalidate code to always do a
lookup instead of just checking the cached mtime of the parent directory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-07 17:22:20 -04:00
Trond Myklebust
1daef0a868 NFS: Clean up nfs_sb_active/nfs_sb_deactive
Instead of causing umount requests to block on server->active_wq while the
asynchronous sillyrename deletes are executing, we can use the sb->s_active
counter to obtain a reference to the super_block, and then release that
reference in nfs_async_unlink_release().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-06 20:08:26 -04:00
Trond Myklebust
d5e66348bb NFS: Fix nfs_file_llseek()
After the BKL removal patches were applied to the rest of the NFS code, the
BKL protection in nfs_file_llseek() is no longer sufficient to ensure that
inode->i_size is read safely in generic_file_llseek_unlocked().

In order to fix the situation, we either have to replace the naked read of
inode->i_size in generic_file_llseek_unlocked() with i_size_read(), or the
whole thing needs to be executed under the inode->i_lock.
In order to avoid disrupting other filesystems, avoid touching
generic_file_llseek_unlocked() for now...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-10-06 20:08:26 -04:00
Chuck Lever
e851db5b05 SUNRPC: Add address family field to svc_serv data structure
Introduce and initialize an address family field in the svc_serv structure.

This field will determine what family to use for the service's listener
sockets and what families are advertised via the local rpcbind daemon.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-09-29 17:56:56 -04:00
Chuck Lever
af904deaf6 NFS: Restore missing hunk in NFS mount option parser
Automounter maps can contain mount options valid for other NFS
implementations but not for Linux.  The Linux automounter uses the
mount command's "-s" command line option ("s" for "sloppy") so that
mount requests containing such options are not rejected.

Commit f45663ce5f attempted to address a
known regression with text-based NFS mount option parsing.  Unrecognized
mount options would cause mount requests to fail, even if the "-s"
option was used on the mount command line.

Unfortunately, this commit was not complete as submitted.  It adds a
new mount option, "sloppy".  But it is missing a hunk, so it now allows
NFS mounts with unrecognized mount options, even if the "sloppy" option
is not present.  This could be a problem if a required critical mount
option such as "sync" is misspelled, for example, and is considered a
regression from 2.6.26.

This patch restores the missing hunk.  Now, the default behavior of
text-based NFS mount options is as before: any unrecognized mount option
will cause the mount to fail.

Please include this in 2.6.27-rc.

Thanks to Neil Brown for reporting this.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-09-08 15:35:19 -07:00
Linus Torvalds
1a3f7d98e5 Revert "UFS: add const to parser token table"
This reverts commit f9247273cb (and
fb2e405fc1 - "fix fs/nfs/nfsroot.c
compilation" - that fixed a missed conversion).

The changes cause problems for at least the sparc build.  Let's re-do
them when the exact issues are resolved.

Requested-by: Andrew Morton <akpm@linux-foundation.org>
Requested-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-08-04 16:50:38 -07:00
Al Viro
8d66bf5481 [PATCH] pass struct path * to do_add_mount()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-08-01 11:25:32 -04:00
Trond Myklebust
744d18dbfa NFS: Ensure we call nfs_sb_deactive() after releasing the directory inode
In order to avoid the "Busy inodes after unmount" error message, we need to
ensure that nfs_async_unlink_release() releases the super block after the
call to nfs_free_unlinkdata().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-27 18:20:51 -04:00
Marc Zyngier
31c9446993 nfs_remount oops when rebooting + possible fix
Jeff, Trond,

The commit

48b605f83c (NFS: implement option checking
when remounting NFS filesystems (resend))

generates an Oops on my platform when rebooting while its root FS is on
an NFS share (NFSv3, TCP):

Unmounting local filesystems...done.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c3d00000
[00000000] *pgd=a3d72031, *pte=00000000, *ppte=00000000
Internal error: Oops: 17 [#1]
Modules linked in: cpufreq_powersave cpufreq_ondemand cpufreq_userspace cpufreq_conservative ext3 jbd sd_mod pata_pcmcia libata scsi_mod pcmcia loop firmware_class pxafb cfbcopyarea cfbimgblt cfbfillrect pxa2xx_cs pxa2xx_core pcmcia_core snd_pxa2xx_ac97 snd_ac97_codec ac97_bus snd_pxa2xx_pcm snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd isp116x_hcd soundcore rtc_sa1100 snd_page_alloc pxa25x_udc usbcore rtc_ds1307 rtc_core
CPU: 0    Not tainted  (2.6.26-03414-g33af79d-dirty #15)
PC is at nfs_remount+0x40/0x264
LR is at do_remount_sb+0x158/0x194
pc : [<c00bbf54>]    lr : [<c0076c40>]    psr: 60000013
sp : c2dd1e70  ip : c2dd1e98  fp : c2dd1e94
r10: 00000040  r9 : c3d17000  r8 : c3c3fc40
r7 : 00000000  r6 : 00000000  r5 : c3d2b200  r4 : 00000000
r3 : 00000003  r2 : 00000000  r1 : c2dd1e9c  r0 : c3c3fc00
Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 0000397f  Table: a3d00000  DAC: 00000015
Process mount (pid: 1462, stack limit = 0xc2dd0270)
Stack: (0xc2dd1e70 to 0xc2dd2000)
1e60:                                     00000000 c3c3fc00 00000000 00000000
1e80: c3c3fc40 c3d17000 c2dd1ebc c2dd1e98 c0076c40 c00bbf20 c01c61e4 00000001
1ea0: c2dd1ebc 00000001 c3c3fc00 c2dd1ef0 c2dd1ee4 c2dd1ec0 c008c6d8 c0076af4
1ec0: 00000021 00000040 c2dd1ef0 c3d77000 c3eaa000 00000000 c2dd1f6c c2dd1ee8
1ee0: c008d1bc c008c5f8 00000000 c2dd0000 c3c0c320 c3805b38 c002064c 0001f820
1f00: 0001f810 00000001 00000001 00000000 c2dd0000 00000000 c2dd1f34 c2dd1f28
1f20: c005ead8 c005e6f8 c2dd1f44 c2dd1f38 c005eaf8 c005ead0 c2dd1f6c c2dd1f48
1f40: c008ae3c 00000000 c3d77000 0001f810 c0ed0021 c0020ca8 c2dd0000 00000000
1f60: c2dd1fa4 c2dd1f70 c008d2d4 c008d0bc 00000000 0001f810 c2dd1f9c c3eaa000
1f80: c3d17000 00000000 00000000 be8b6aa8 be8b6ad0 00000015 00000000 c2dd1fa8
1fa0: c0020b00 c008d254 00000000 be8b6aa8 0001f810 0001f820 0001f830 c0ed0021
1fc0: 00000000 be8b6aa8 be8b6ad0 00000015 00000000 be8b6ad0 0001f810 be8b6aa8
1fe0: 0001f810 be8b6964 0000aab8 40125124 60000010 0001f810 00000000 00000000
Backtrace:
[<c00bbf14>] (nfs_remount+0x0/0x264) from [<c0076c40>] (do_remount_sb+0x158/0x194)
  r9:c3d17000 r8:c3c3fc40 r7:00000000 r6:00000000 r5:c3c3fc00
r4:00000000
[<c0076ae8>] (do_remount_sb+0x0/0x194) from [<c008c6d8>] (do_remount+0xec/0x118)
  r6:c2dd1ef0 r5:c3c3fc00 r4:00000001
[<c008c5ec>] (do_remount+0x0/0x118) from [<c008d1bc>] (do_mount+0x10c/0x198)
[<c008d0b0>] (do_mount+0x0/0x198) from [<c008d2d4>] (sys_mount+0x8c/0xd4)
[<c008d248>] (sys_mount+0x0/0xd4) from [<c0020b00>] (ret_fast_syscall+0x0/0x2c)
  r7:00000015 r6:be8b6ad0 r5:be8b6aa8 r4:00000000
Code: 0a000086 ea000006 e3530003 8a000004 (e5923000)
---[ end trace 55e1b689cf8c8a6a ]---
------------[ cut here ]------------
WARNING: at kernel/exit.c:966 do_exit+0x3c/0x628()
Modules linked in: cpufreq_powersave cpufreq_ondemand cpufreq_userspace cpufreq_conservative ext3 jbd sd_mod pata_pcmcia libata scsi_mod pcmcia loop firmware_class pxafb cfbcopyarea cfbimgblt cfbfillrect pxa2xx_cs pxa2xx_core pcmcia_core snd_pxa2xx_ac97 snd_ac97_codec ac97_bus snd_pxa2xx_pcm snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd isp116x_hcd soundcore rtc_sa1100 snd_page_alloc pxa25x_udc usbcore rtc_ds1307 rtc_core
[<c0025168>] (dump_stack+0x0/0x14) from [<c0032154>] (warn_on_slowpath+0x4c/0x68)
[<c0032108>] (warn_on_slowpath+0x0/0x68) from [<c003531c>] (do_exit+0x3c/0x628)
  r6:0000000b r5:c3c3dc80 r4:c2dd0000
[<c00352e0>] (do_exit+0x0/0x628) from [<c0025004>] (die+0x2b0/0x30c)
[<c0024d54>] (die+0x0/0x30c) from [<c00270bc>] (__do_kernel_fault+0x6c/0x80)
[<c0027050>] (__do_kernel_fault+0x0/0x80) from [<c00272e0>] (do_page_fault+0x210/0x230)
  r7:c3fa7118 r6:c3c3dc80 r5:c3d166a8 r4:00010000
[<c00270d0>] (do_page_fault+0x0/0x230) from [<c00201ec>] (do_DataAbort+0x3c/0xa0)
[<c00201b0>] (do_DataAbort+0x0/0xa0) from [<c002064c>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc2dd1e28 to 0xc2dd1e70)
1e20:                   c3c3fc00 c2dd1e9c 00000000 00000003 00000000 c3d2b200
1e40: 00000000 00000000 c3c3fc40 c3d17000 00000040 c2dd1e94 c2dd1e98 c2dd1e70
1e60: c0076c40 c00bbf54 60000013 ffffffff
  r8:c3c3fc40 r7:00000000 r6:00000000 r5:c2dd1e5c r4:ffffffff
[<c00bbf14>] (nfs_remount+0x0/0x264) from [<c0076c40>] (do_remount_sb+0x158/0x194)
  r9:c3d17000 r8:c3c3fc40 r7:00000000 r6:00000000 r5:c3c3fc00
r4:00000000
[<c0076ae8>] (do_remount_sb+0x0/0x194) from [<c008c6d8>] (do_remount+0xec/0x118)
  r6:c2dd1ef0 r5:c3c3fc00 r4:00000001
[<c008c5ec>] (do_remount+0x0/0x118) from [<c008d1bc>] (do_mount+0x10c/0x198)
[<c008d0b0>] (do_mount+0x0/0x198) from [<c008d2d4>] (sys_mount+0x8c/0xd4)
[<c008d248>] (sys_mount+0x0/0xd4) from [<c0020b00>] (ret_fast_syscall+0x0/0x2c)
  r7:00000015 r6:be8b6ad0 r5:be8b6aa8 r4:00000000
---[ end trace 55e1b689cf8c8a6a ]---
/etc/rc6.d/S60umountroot: line 17:  1462 Segmentation fault      mount $MOUNT_FORCE_OPT -n -o remount,ro -t dummytype dummydev / 2> /dev/null

The new super.c:nfs_remount function doesn't check the validity of the
options/options4 pointers. Unfortunately, this seems to happen.
The obvious patch seems to be to check the pointers, and not to do anything
if they happen to be NULL.

Tested on an XScale PXA255 system, latest git.

Regards,

	M.

Signed-off-by: Marc Zyngier <marc.zyngier@altran.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-27 18:20:41 -04:00
Al Viro
e6305c43ed [PATCH] sanitize ->permission() prototype
* kill nameidata * argument; map the 3 bits in ->flags anybody cares
  about to new MAY_... ones and pass with the mask.
* kill redundant gfs2_iop_permission()
* sanitize ecryptfs_permission()
* fix remaining places where ->permission() instances might barf on new
  MAY_... found in mask.

The obvious next target in that direction is permission(9)

folded fix for nfs_permission() breakage from Miklos Szeredi <mszeredi@suse.cz>

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-07-26 20:53:14 -04:00
Alexey Dobriyan
51cc50685a SL*B: drop kmem cache argument from constructor
Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexers.  Nobody uses this "feature", nor does anybody use
the passed kmem cache in a non-trivial way, so pass only a pointer to the object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-26 12:00:07 -07:00
Adrian Bunk
fb2e405fc1 fix fs/nfs/nfsroot.c compilation
This fixes the following compile error caused by commit
f9247273cb ("UFS: add const to parser
token table"):

    CC      fs/nfs/nfsroot.o
  /home/bunk/linux/kernel-2.6/git/linux-2.6/fs/nfs/nfsroot.c:130: error: tokens causes a section type conflict
  make[3]: *** [fs/nfs/nfsroot.o] Error 1

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-24 17:32:41 -07:00
Trond Myklebust
cadc723cc1 Merge branch 'bkl-removal' into next
2008-07-15 18:34:58 -04:00
Trond Myklebust
e89e896d31 Merge branch 'devel' into next
Conflicts:

	fs/nfs/file.c

Fix up the conflict with Jon Corbet's bkl-removal tree
2008-07-15 18:34:16 -04:00
Trond Myklebust
f839c4c199 NFSv4: Remove BKL from the nfsv4 state recovery
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:57 -04:00
Trond Myklebust
c3cc8c019c NFS: Remove BKL from the readdir code
Page accesses are serialised using the page locks, whereas all attribute
updates are serialised using the inode->i_lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:56 -04:00
Trond Myklebust
76566991f9 NFS: Remove BKL from the symlink code
Page cache accesses are serialised using page locks, whereas attribute
updates are serialised using inode->i_lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:56 -04:00
Trond Myklebust
52e2e8d37e NFS: Remove BKL from the sillydelete operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:55 -04:00
Trond Myklebust
bd9bb454b7 NFS: Remove the BKL from the rename, rmdir and unlink operations
Attribute updates are safe, and dentry operations are protected using VFS
level locks. Defer removing the BKL from sillyrename until a separate
patch.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:55 -04:00
Trond Myklebust
fc0f684c21 NFS: Remove BKL from NFS lookup code
All dentry-related operations are already BKL-safe, since they are
protected by the VFS locking. No extra locks should be needed in the NFS
code.

In the case of nfs_revalidate_inode(), we're only doing an attribute
update (protected by the inode->i_lock).
In the case of nfs_lookup(), we're instantiating a new dentry, so there
should be no contention possible until after we call d_materialise_unique.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:54 -04:00
Trond Myklebust
fc81af535e NFS: Remove the BKL from nfs_link()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:54 -04:00
Trond Myklebust
f1e2eda235 NFS: Remove the BKL from the inode creation operations
nfs_instantiate() does not require the BKL, neither do the attribute
updates or the RPC code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:53 -04:00
Trond Myklebust
bba67e0e3f NFS: Remove BKL usage from open()
All the NFSv4 stateful operations are already protected by other locks (in
particular by the rpc_sequence locks).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:53 -04:00
Trond Myklebust
b6a2e569e2 NFS: Remove BKL usage from the write path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:52 -04:00
Trond Myklebust
4d80f2ecd5 NFS: Remove the BKL from the permission checking code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:52 -04:00
Trond Myklebust
fa6dc9dc59 NFS: Remove attribute update related BKL references
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:51 -04:00
Trond Myklebust
a3d01454bc NFS: Remove BKL requirement from attribute updates
The main problem is dealing with inode->i_size: we need to set the
inode->i_lock on all attribute updates, and so vmtruncate won't cut it.
Make an NFS-private version of vmtruncate that has the necessary locking
semantics.

The result should be that the following inode attribute updates are
protected by inode->i_lock
	nfsi->cache_validity
	nfsi->read_cache_jiffies
	nfsi->attrtimeo
	nfsi->attrtimeo_timestamp
	nfsi->change_attr
	nfsi->last_updated
	nfsi->cache_change_attribute
	nfsi->access_cache
	nfsi->access_cache_entry_lru
	nfsi->access_cache_inode_lru
	nfsi->acl_access
	nfsi->acl_default
	nfsi->nfs_page_tree
	nfsi->ncommit
	nfsi->npages
	nfsi->open_files
	nfsi->silly_list
	nfsi->acl
	nfsi->open_states
	inode->i_size
	inode->i_atime
	inode->i_mtime
	inode->i_ctime
	inode->i_nlink
	inode->i_uid
	inode->i_gid

The following is protected by dir->i_mutex
	nfsi->cookieverf

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:51 -04:00
Trond Myklebust
1b83d70703 NFS: Protect inode->i_nlink updates using inode->i_lock
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-15 18:10:50 -04:00
Jonathan Corbet
2fceef397f Merge commit 'v2.6.26' into bkl-removal
2008-07-14 15:29:34 -06:00
Chuck Lever
f45663ce5f NFS: Allow either strict or sloppy mount option parsing
The kernel's NFS client mount option parser currently doesn't allow
unrecognized or incorrect mount options.  This prevents misspellings or
incorrectly specified mount options from possibly causing silent data
corruption.

However, NFS mount options are not standardized, so different operating
systems can use differently spelled mount options to support similar
features, or can support mount options which no other operating system
supports.

"Sloppy" mount option parsing, which allows the parser to ignore any
option it doesn't recognize, is needed to support automounters that often
use maps that are shared between heterogeneous operating systems.

The legacy mount command ignores the validity of the values of mount
options entirely, except for the "sec=" and "proto=" options.  If an
incorrect value is specified, the out-of-range value is passed to the
kernel; if a value is specified that contains non-numeric characters,
it appears as though the legacy mount command sets that option to zero
(probably incorrect behavior in general).

In any case, this sets a precedent which we will partially follow for
the kernel mount option parser:

	+ if "sloppy" is not set, the parser will be strict about both
	  unrecognized options (same as legacy) and invalid option
	  values (stricter than legacy)

	+ if "sloppy" is set, the parser will ignore unrecognized
	  options and invalid option values (same as legacy)

An "invalid" option value in this case means that either the type
(integer, short, or string) or sign (for integer values) of the specified
value is incorrect.

This patch does two things: it changes the NFS client's mount option
parsing loop so that it parses the whole string instead of failing at
the first unrecognized option or invalid option value.  An unrecognized
option or an invalid option value causes the option to be skipped.

Then, the patch adds a "sloppy" mount option that allows the parsing
to succeed anyway if there were any problems during parsing.  When
parsing a set of options is complete, if there are errors and "sloppy"
was specified, return success anyway.  Otherwise, only return success
if there are no errors.
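
As a rough user-space sketch of the behaviour described above (not the
kernel parser itself): the whole string is always scanned, bad options
are counted and skipped, and the final result depends on whether "sloppy"
was seen.  The option table in parse_one() and both function names are
hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option table: only these few names are "recognized". */
static bool parse_one(const char *opt)
{
	const char *eq = strchr(opt, '=');

	if (strncmp(opt, "rsize=", 6) == 0 || strncmp(opt, "wsize=", 6) == 0) {
		char *end;
		long v = strtol(eq + 1, &end, 10);

		/* Reject empty, non-numeric, or negative values. */
		return end != eq + 1 && *end == '\0' && v >= 0;
	}
	return strcmp(opt, "ro") == 0 || strcmp(opt, "rw") == 0;
}

/*
 * Returns 1 if the option string is accepted, 0 otherwise.
 * Parsing never stops early; a bad option is counted and skipped.
 */
static int parse_mount_options(const char *raw)
{
	char buf[256];
	char *p;
	bool sloppy = false;
	int errors = 0;

	assert(raw != NULL);
	if (strlen(raw) >= sizeof(buf))
		return 0;
	strcpy(buf, raw);

	p = buf;
	while (p != NULL && *p != '\0') {
		char *opt = p;
		char *comma = strchr(p, ',');

		if (comma != NULL) {
			*comma = '\0';
			p = comma + 1;
		} else {
			p = NULL;
		}
		if (*opt == '\0')
			continue;
		if (strcmp(opt, "sloppy") == 0)
			sloppy = true;
		else if (!parse_one(opt))
			errors++;
	}
	/* Errors are fatal only when "sloppy" was not specified. */
	return errors == 0 || sloppy;
}
```

With this sketch, "rw,bogus,rsize=8192" is rejected, while
"rw,bogus,sloppy" is accepted with the unrecognized option ignored.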

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:44 -04:00
Chuck Lever
6738b2512b NFS4: Set security flavor default for NFSv4 mounts like other defaults
Set the default security flavor when we set the other mount option
default values for NFSv4.  This cleans up the NFSv4 mount option parsing
path to look like the NFSv2/v3 one.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:43 -04:00
Chuck Lever
dd07c94750 NFS: Set security flavor default for NFSv2/3 mounts like other defaults
Set the default security flavor when we set the other mount option default
values.  After this change, only the legacy user-space mount path needs to
set the NFS_MOUNT_SECFLAVOUR flag.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:42 -04:00
Chuck Lever
01060c896e NFS: Refactor logic for parsing NFS security flavor mount options
Clean up: Refactor the NFS mount option parsing function to extract the
security flavor parsing logic into a separate function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:41 -04:00
Chuck Lever
0e0cab744b NFS: use documenting macro constants for initializing ac{reg, dir}{min, max}
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:40 -04:00
Chuck Lever
ed596a8adb NFS: Move the nfs_set_port() call out of nfs_parse_mount_options()
The remount path does not need to set the port in the server address.
Since it's not really a part of option parsing, move the nfs_set_port()
call to nfs_parse_mount_options()'s callers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:39 -04:00
Trond Myklebust
259875efed NFS: set transport defaults after mount option parsing is finished
Move the UDP/TCP default timeo/retrans settings for text mounts to
nfs_init_timeout_values(), which is where they were always being
initialised (and sanity checked) for binary mounts.
Document the default timeout values using appropriate #defines.

Ensure that we initialise and sanity check the transport protocols that
may have been specified by the user.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:38 -04:00
Jeff Layton
5afc597c5f nfs4: fix potential race with rapid nfs_callback_up/down cycle
If the nfsv4 callback thread is rapidly brought up and down, it's
possible that nfs_callback_svc might never get a chance to run. If
this happens, the cleanup at thread exit might never occur, throwing
the refcounting off and leaving nfs_callback_info in an incorrect state.

Move the cleanup functions into nfs_callback_down. Also change the
nfs_callback_info struct to track the svc_rqst rather than svc_serv
since we need to know that to call svc_exit_thread.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:32 -04:00
Jeff Layton
ee84dfc454 nfs4: remove BKL from nfs_callback_up and nfs_callback_down
The nfs_callback_mutex is sufficient protection.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:31 -04:00
Benny Halevy
77e03677ac nfs: initialize timeout variable in nfs4_proc_setclientid_confirm
gcc (4.3.0) rightfully warns about this:
/usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/nfs/nfs4proc.c: In function nfs4_proc_setclientid_confirm:
/usr0/export/dev/bhalevy/git/linux-pnfs-bh-nfs41/fs/nfs/nfs4proc.c:2936: warning: timeout may be used uninitialized in this function

nfs4_delay(), which is passed a pointer to 'timeout', looks at its value
and clamps it to the range NFS4_POLL_RETRY_MIN..NFS4_POLL_RETRY_MAX:
	if (*timeout <= 0)
		*timeout = NFS4_POLL_RETRY_MIN;
	if (*timeout > NFS4_POLL_RETRY_MAX)
		*timeout = NFS4_POLL_RETRY_MAX;

Therefore it will end up set to some sane, though rather nondeterministic, value.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:30 -04:00
Chuck Lever
d8e7748ab8 NFS: handle interface identifiers in incoming IPv6 addresses
Add support in the kernel NFS client's address parser for interface
identifiers.

IPv6 link-local addresses require an additional "interface identifier",
which is a network device name or an integer that indexes the array of
local network interfaces.  They are suffixed to the address with a '%'.
For example:

	fe80::215:c5ff:fe3b:e1b2%2

indicates an interface index of 2.  Or

	fe80::215:c5ff:fe3b:e1b2%eth0

indicates that requests should be routed through the eth0 device.
Without the interface ID, link-local addresses are not usable for NFS.

Both the kernel NFS client mount option parser and the mount.nfs command
can take either form.  The mount.nfs command always passes the address
through getnameinfo(3), which usually re-writes interface indices as
device names.
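
The split at '%' can be sketched in user space roughly as follows.  This
is an illustration only, not the kernel's address parser; split_scope()
and scope_is_index() are hypothetical names, and resolving a device name
such as "eth0" to an interface is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split "addr%scope" into its two parts.
 * Returns 1 if a scope was present, 0 if not, -1 if a buffer is too small.
 */
static int split_scope(const char *in, char *addr, size_t addrlen,
		       char *scope, size_t scopelen)
{
	const char *pct;
	size_t alen;

	assert(in != NULL && addr != NULL && scope != NULL);
	if (scopelen == 0)
		return -1;
	pct = strchr(in, '%');
	alen = pct ? (size_t)(pct - in) : strlen(in);
	if (alen >= addrlen)
		return -1;
	memcpy(addr, in, alen);
	addr[alen] = '\0';
	scope[0] = '\0';
	if (pct == NULL)
		return 0;
	if (strlen(pct + 1) >= scopelen)
		return -1;
	strcpy(scope, pct + 1);
	return 1;
}

/* Returns 1 and stores the index if the scope is purely numeric, else 0. */
static int scope_is_index(const char *scope, unsigned long *index)
{
	char *end;
	unsigned long v;

	if (!isdigit((unsigned char)scope[0]))
		return 0;
	v = strtoul(scope, &end, 10);
	if (*end != '\0')
		return 0;
	*index = v;
	return 1;
}
```

For "fe80::215:c5ff:fe3b:e1b2%2" the scope is the index 2; for
"fe80::215:c5ff:fe3b:e1b2%eth0" the scope is the string "eth0", which
would still need to be looked up as a device name.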

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:29 -04:00
Chuck Lever
ce3b7e1906 NFS: Add string length argument to nfs_parse_server_address
To make nfs_parse_server_address() more generally useful, allow it to
accept input strings that are not terminated with '\0'.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:28 -04:00
Chuck Lever
d1aa082573 NFS: Support raw IPv6 address hostnames during NFS mount operation
Traditionally the mount command has looked for a ":" to separate the
server's hostname from the export path in the mounted on device name,
like this:

	mount server:/export /mounted/on/dir

The server's hostname is "server" and the export path is "/export".

You can also substitute a specific IPv4 network address for the server
hostname, like this:

	mount 192.168.0.55:/export /mounted/on/dir

Raw IPv6 addresses present a problem, however, because they look
something like this:

	fe80::200:5aff:fe00:30b

Note the use of colons.

To get around the presence of colons, copy the Solaris convention used for
mounting IPv6 servers by address: wrap a raw IPv6 address with square
brackets.
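
A minimal user-space sketch of such a split, assuming the bracketed form
looks like "[fe80::200:5aff:fe00:30b]:/export", is shown below.  It is not
the kernel implementation, and split_devname() is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/*
 * Split a device name into hostname and export path.
 * Accepts "server:/export" and "[ipv6-address]:/export".
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
static int split_devname(const char *dev, char *host, size_t hostlen,
			 char *path, size_t pathlen)
{
	const char *hstart, *hend, *colon;
	size_t hlen;

	assert(dev != NULL && host != NULL && path != NULL);
	if (dev[0] == '[') {
		/* Bracketed form: the hostname ends at ']', not at the first ':'. */
		hstart = dev + 1;
		hend = strchr(hstart, ']');
		if (hend == NULL || hend[1] != ':')
			return -1;
		colon = hend + 1;
	} else {
		/* Traditional form: the first ':' separates host and path. */
		hstart = dev;
		colon = strchr(dev, ':');
		if (colon == NULL)
			return -1;
		hend = colon;
	}
	hlen = (size_t)(hend - hstart);
	if (hlen == 0 || hlen >= hostlen || strlen(colon + 1) >= pathlen)
		return -1;
	memcpy(host, hstart, hlen);
	host[hlen] = '\0';
	strcpy(path, colon + 1);
	return 0;
}
```

Without the brackets, the same address would be cut at its first colon,
leaving "fe80" as the hostname, which is exactly the ambiguity the
convention avoids.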

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:27 -04:00
Chuck Lever
dc04589827 NFS: Use common device name parsing logic for NFSv4 and NFSv2/v3
To support passing a raw IPv6 address as a server hostname, we need to
expand the logic that handles splitting the passed-in device name into
a server hostname and export path.

Start by pulling device name parsing out of the mount option validation
functions and into separate helper functions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:26 -04:00
Trond Myklebust
cd10072562 NFS: Fix a dependency on CONFIG_NFS_V4 in nfs_remount
Fix the 'nfs4_fs_type' undeclared error in nfs_remount when compiling sans
NFSv4...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Jeff Layton <jlayton@redhat.com>
2008-07-09 12:09:25 -04:00
Trond Myklebust
e468bae97d NFS: Allow redirtying of a completed unstable write.
Currently, if an unstable write completes, we cannot redirty the page in
order to reflect a new change in the page data until after we've sent a
COMMIT request.

This patch allows a page rewrite to proceed without the unnecessary COMMIT
step, putting it immediately back onto the dirty page list, undoing the
VM unstable write accounting, and removing the NFS_PAGE_TAG_COMMIT tag from
the NFS radix tree.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:24 -04:00
Trond Myklebust
e7d39069e3 NFS: Clean up nfs_update_request()
Simplify the loop in nfs_update_request by moving into a separate function
the code that attempts to update an existing cached NFS write.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:23 -04:00
Chuck Lever
396cee977f NFS: missing newline in NFS mount debugging message
Clean up.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:22 -04:00
Chuck Lever
d33e4dfeab NFS: Treat "intr" and "nointr" options as deprecated
Clean up:  the "intr" and "nointr" mount options were recently retired.
Document this in the NFS mount option parser.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:21 -04:00
Chuck Lever
ecbb3845dd NFS: Allow any value for the "retry" option
The kernel NFS mount option parser should ignore the retry= mount option
since it is meaningful only in user space.  Today it expects a number
rather than arbitrary text, so it ignores the option if the value is
numeric, but chokes if there are other characters in the value.

Change it to allow any text (except ",") as its value.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:20 -04:00
Trond Myklebust
f41f741838 NFS: Ensure we zap only the access and acl caches when setting new acls
...and ensure that we obey the NFS_INO_INVALID_ACL flag when retrieving the
acls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:19 -04:00
Trond Myklebust
2e96d28672 NFS: Fix a warning in nfs4_async_handle_error
We're not modifying the nfs_server when we call nfs_inc_server_stats and
friends, so allow the compiler to pass 'const' pointers too.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:18 -04:00
Chuck Lever
34e8f92831 NFS: Move fs/nfs/iostat.h to include/linux
The fs/nfs/iostat.h header has definitions that were designed to be exposed
to user space.  Move these definitions under include/linux so user space can
use the definitions in applications that read /proc/self/mountstats.

Also address a handful of coding style issues called out by checkpatch.pl in
fs/nfs/iostat.h.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:17 -04:00
Trond Myklebust
46cb650c22 NFS: Remove the redundant file_open entry from struct nfs_rpc_ops
All instances are set to nfs_open(), so we should just remove the redundant
indirection. Ditto for the file_release op

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:16 -04:00
Trond Myklebust
659bfcd6dd NFS: Fix the ftruncate() credential problem
ftruncate() access checking is supposed to be performed at open() time,
just like reads and writes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:14 -04:00
Jeff Layton
48b605f83c NFS: implement option checking when remounting NFS filesystems (resend)
When remounting an NFS or NFS4 filesystem, the new NFS options are not
respected, yet the remount will still return success. This patch adds
a remount_fs sb op for NFS that checks any new nfs mount options against
the existing ones and fails the mount if any have changed.

This is only implemented for string-based mount options since doing
this with binary options isn't really feasible.

This is essentially the same as the original patch I sent out, but
adds a check to see if the addr= option has changed.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:09 -04:00
Adrian Bunk
c2d946e55e fs/nfs/nfsroot.c: remove CVS keyword
This patch removes a CVS keyword that wasn't updated for a long time
from a comment.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:08 -04:00
Chuck Lever
48186c7d57 NFS: Fix trace debugging nits in write.c
Clean up: fix a few dprintk messages that still need to show the RPC task ID
correctly, and be sure we use the preferred %lld or %llu instead of %Ld or
%Lu.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:05 -04:00
Chuck Lever
6da24bc9cf NFS: Use NFSDBG_FILE for all fops
Clean up: some fops use NFSDBG_FILE, some use NFSDBG_VFS.  Let's use
NFSDBG_FILE for all fops, and consistently report file names instead
of inode numbers.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:04 -04:00
Chuck Lever
b7eaefaa87 NFS: Add debugging facility for NFS aops
Recent work in fs/nfs/file.c neglected to add appropriate trace debugging
for the NFS client's address space operations.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:03 -04:00
Chuck Lever
cc0dd2d105 NFS: Make nfs_open methods consistent
Clean up: Report the same debugging info and count function calls the
same for files and directories in nfs_opendir() and nfs_file_open().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:02 -04:00
Chuck Lever
b84e06c58f NFS: Make nfs_llseek methods consistent
Clean up: Report the same debugging info in nfs_llseek_dir() and
nfs_llseek_file().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:01 -04:00
Chuck Lever
549177863b NFS: Make nfs_fsync methods consistent
Clean up: Report the same debugging info, count function calls the same,
and use similar function naming in nfs_fsync_dir() and nfs_fsync().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:09:00 -04:00
Trond Myklebust
b5418383ef NFS: do_setlk(): don't flush caches when we have a delegation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:50 -04:00
Trond Myklebust
7e5f614660 NFS: Revert commit 44dd151d
Revert commit 44dd151d "NFS: Don't mark a written page as uptodate until it
is on disk". While it is true that the write may fail, that is always the
case. There is no reason why we should treat data on pages that are not
already marked as PG_uptodate as being special. The only thing we gain is a
noticeable slowdown when re-reading these pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:46 -04:00
Trond Myklebust
efc91ed019 NFS: Optimise append writes with holes
If a file is being extended, and we're creating a hole, we might as well
declare the entire page to be up to date.

This patch significantly improves the write performance for sparse files
in the case where lseek(SEEK_END) is used to append several non-contiguous
writes at intervals of < PAGE_SIZE.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:45 -04:00
Trond Myklebust
2116271a34 NFS: Add correct bounds checking to NFSv2 locks
NFSv2 file locking currently fails the Connectathon tests, because the
calls to the VFS locking code do not return an EINVAL error if the
struct file_lock overflows the 32-bit boundaries.

The problem is due to the fact that we occasionally call helpers from
fs/locks.c in order to avoid RPC calls to the server when we know that a
local process holds the lock. These helpers are, of course, always
64-bit enabled, so EINVAL is not returned in cases when it would if
the call had gone to the NLM code.

For consistency, we therefore add support for a bounds-checking helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:40 -04:00
Trond Myklebust
f3d47a3a6a NFS: Fix a preemption count leak in nfs_update_request
The commit 2785259631 (nfs: use GFP_NOFS
preloads for radix-tree insertion) appears to have introduced a bug:
We only want to call radix_tree_preload() once after creating a request.
Calling it every time we loop after we created the request, will cause
preemption count leaks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Nick Piggin <npiggin@suse.de>
2008-07-09 12:08:39 -04:00
Trond Myklebust
0b4aae7aad NFS: Reduce the stack usage in NFSv3 create operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:38 -04:00
Trond Myklebust
57dc9a5747 NFS: Reduce the stack usage in NFSv4 create operations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-09 12:08:37 -04:00
Trond Myklebust
2aac05a919 NFS: Fix readdir cache invalidation
invalidate_inode_pages2_range() takes page offset arguments, not byte
ranges.

Another thought is that individual pages might perhaps get evicted by VM
pressure, in which case we might perhaps want to re-read not only the
evicted page, but all subsequent pages too (in case the server returns
more/less data per page so that the alignment of the next entry
changes). We should therefore remove the condition that we only do this on
page->index==0.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-07-08 15:22:40 -04:00
Andi Kleen
9465efc9e9 Remove BKL from remote_llseek v2
- Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
failures in all users)
- Change all users to either use generic_file_llseek_unlocked directly or
take the BKL around. I changed the file systems that don't use the BKL
for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
take the BKL, but explicitly in their own source now.

I moved them all over in a single patch to avoid unbisectable sections.

Open problem: 32bit kernels can corrupt fpos because its modification
is not atomic, but they can do that anyway because there are other paths that
modify it without BKL.

Do we need a special lock for the pos/f_version = 0 checks?

Trond says the NFS BKL is likely not needed, but keep it for now
until his full audit.

v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
    and factor duplicated code (suggested by hch)

Cc: Trond.Myklebust@netapp.com
Cc: swhiteho@redhat.com
Cc: sfrench@samba.org
Cc: vandrove@vc.cvut.cz

Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
2008-07-02 15:06:27 -06:00
Trond Myklebust
03fa9e84e5 NFS: nfs_updatepage(): don't mark page as dirty if an error occurred
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23 17:09:07 -04:00
Trond Myklebust
b7e2445737 NFS: Fix filehandle size comparisons in the mount code
Fix a sign issue in xdr_decode_fhstatus3()
Fix incorrect comparison in nfs_validate_mount_data()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23 17:09:06 -04:00
Trond Myklebust
33852a1f2b NFS: Reduce the NFS mount code stack usage.
This appears to fix the Oops reported in
  http://bugzilla.kernel.org/show_bug.cgi?id=10826

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-06-23 17:09:05 -04:00
Adrian Bunk
1d2e88e73e nfs: make nfs4_drop_state_owner() static
nfs4_drop_state_owner() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:31 -07:00
Jan Blunck
31f31db1a1 nfs: path_{get,put}() cleanups
Here are some more places where path_{get,put}() can be used instead of
dput()/mntput() pair.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:30 -07:00
Harvey Harrison
3110ff8048 nfs: replace remaining __FUNCTION__ occurrences
__FUNCTION__ is gcc-specific, use __func__

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:29 -07:00
Eric Paris
46c8ac7425 nfs/lsm: make NFSv4 set LSM mount options
NFSv3 get_sb operations call into the LSM layer to set security options passed
from userspace.  NFSv4 hooks were not originally added since it was reasonably
late in the merge window and NFSv3 was the only thing that had regressed (v4
has never supported any LSM options)

This patch makes NFSv4 call into the LSM to set security options rather than
just blindly dropping them with no notice to the user as happens today.  This
patch was tested in a simple NFSv4 environment with the context= option and
appeared to work as expected.

Signed-off-by: Eric Paris <eparis@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <jmorris@namei.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:27 -07:00
Trond Myklebust
3a6258e1fb NFSv4: Check the return value of decode_compound_hdr_arg()
If decode_compound_hdr_arg() returns a resource error, then we cannot
proceed to process the callback. Return a 'GARBAGE_ARGS' rpc-level error to
the caller instead.
If, however, the minor version field is incorrect, then we need to
propagate the resulting NFS4ERR_MINOR_VERS_MISMATCH error back as the
compound status field (setting the nops field to 0).

Finally, if encode_compound_hdr_res() returns an error, we need to return
an RPC_SYSTEM_ERR to the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:26 -07:00
Fred Isaman
38def50fab nfs: fix race in nfs_dirty_request
When called from nfs_flush_incompatible, the req is not locked, so
req->wb_page might be set to NULL before it is used by PageWriteback.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:23 -07:00
Trond Myklebust
b0b539739f NFS: Ensure that 'noac' and/or 'actimeo=0' turn off attribute caching
Both the 'noac' and 'actimeo=0' mount options should ensure that attributes
are not cached, however a bug in nfs_attribute_timeout() means that
currently, the attributes may in fact get cached for up to one jiffy. This
has been seen to cause corruption in some applications.

The reason for the bug is that the time_in_range() test returns 'true' as
long as the current time lies between nfsi->read_cache_jiffies and
nfsi->read_cache_jiffies + nfsi->attrtimeo. In other words, if jiffies
equals nfsi->read_cache_jiffies, then we still cache the attribute data.
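
The off-by-one can be reproduced in a few lines of user-space C.  The
helpers below only imitate the style of the kernel's wrap-safe jiffies
comparisons, and the half-open variant is shown as one possible way to
avoid the problem, not necessarily the change made by this patch.  All
function names are hypothetical.

```c
#include <stdbool.h>

/* Wrap-safe comparisons in the style of the kernel's jiffies helpers. */
static bool after_eq(unsigned long a, unsigned long b)
{
	return (long)(a - b) >= 0;
}

static bool before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Closed interval [start, end]: true even when now == start == end. */
static bool in_range_closed(unsigned long now, unsigned long start,
			    unsigned long end)
{
	return after_eq(now, start) && after_eq(end, now);
}

/* Half-open interval [start, end): empty when start == end. */
static bool in_range_open(unsigned long now, unsigned long start,
			  unsigned long end)
{
	return after_eq(now, start) && before(now, end);
}

/* Returns true if cached attributes would still be trusted at 'now'. */
static bool attrs_cached(unsigned long now, unsigned long read_cache_jiffies,
			 unsigned long attrtimeo, bool half_open)
{
	unsigned long end = read_cache_jiffies + attrtimeo;

	return half_open ? in_range_open(now, read_cache_jiffies, end)
			 : in_range_closed(now, read_cache_jiffies, end);
}
```

With attrtimeo set to 0 and now equal to read_cache_jiffies, the closed
test still reports the attributes as cached, while the half-open test
does not.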

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-05-16 09:43:21 -07:00
Miklos Szeredi
fa799759f9 mm: bdi: expose the BDI object in sysfs for NFS
Register NFS' backing_dev_info under sysfs with the name "nfs-MAJOR:MINOR"

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-30 08:29:49 -07:00
Denis V. Lunev
34b37235c6 nfs: use proc_create to setup de->proc_fops
Use proc_create() to make sure that ->proc_fops is set up before gluing the PDE to
the main tree.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:20 -07:00
Alexey Dobriyan
36a5aeb878 proc: remove proc_root_fs
Use creation by full path instead: "fs/foo".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-29 08:06:18 -07:00
Al Viro
42faad9965 [PATCH] restore sane ->umount_begin() API
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-25 09:23:25 -04:00
Linus Torvalds
563307b2fa Merge git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (80 commits)
  SUNRPC: Invalidate the RPCSEC_GSS session if the server dropped the request
  make nfs_automount_list static
  NFS: remove duplicate flags assignment from nfs_validate_mount_data
  NFS - fix potential NULL pointer dereference v2
  SUNRPC: Don't change the RPCSEC_GSS context on a credential that is in use
  SUNRPC: Fix a race in gss_refresh_upcall()
  SUNRPC: Don't disconnect more than once if retransmitting NFSv4 requests
  SUNRPC: Remove the unused export of xprt_force_disconnect
  SUNRPC: remove XS_SENDMSG_RETRY
  SUNRPC: Protect creds against early garbage collection
  NFSv4: Attempt to use machine credentials in SETCLIENTID calls
  NFSv4: Reintroduce machine creds
  NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
  nfs: fix printout of multiword bitfields
  nfs: return negative error value from nfs{,4}_stat_to_errno
  NLM/lockd: Ensure client locking calls use correct credentials
  NFS: Remove the buggy lock-if-signalled case from do_setlk()
  NLM/lockd: Fix a race when cancelling a blocking lock
  NLM/lockd: Ensure that nlmclnt_cancel() returns results of the CANCEL call
  NLM: Remove the signal masking in nlmclnt_proc/nlmclnt_cancel
  ...
2008-04-24 11:46:16 -07:00
Trond Myklebust
233607dbbc Merge branch 'devel'
2008-04-24 14:01:02 -04:00
Jeff Layton
06e02d66fa NFS: don't let nfs_callback_svc exit on unexpected svc_recv errors (try #2)
When svc_recv returns an unexpected error, nfs_callback_svc will print a
warning and exit. This is problematic for several reasons. In particular,
it will cause the reference counts for the thread to be wrong, and no
new thread will be started until all nfs4 mounts are unmounted.

Rather than exiting on error from svc_recv, have the thread do a 1s
sleep and then retry the loop. This is unlikely to cause any harm, and
if the error turns out to be something temporary then it may be able to
recover.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:42 -04:00
J. Bruce Fields
e1ba1ab76e nfsd: fix comment
Obvious comment nit.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:42 -04:00
Jeff Layton
a277e33cbe NFS: convert nfs4 callback thread to kthread API
There's a general push to convert kernel threads to use the (much
cleaner) kthread API. This patch converts the NFSv4 callback kernel
thread to the kthread API. In addition to being generally cleaner this
also removes the dependency on signals when shutting down the thread.

Note that this patch depends on the recent patches to svc_recv() to
make it check kthread_should_stop() periodically. Those patches are
in Bruce's tree at the moment and are slated for 2.6.26 along with
the lockd conversion, so this conversion is probably also appropriate
for 2.6.26.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-04-23 16:13:40 -04:00
J. Bruce Fields
065f30ec14 nfs: remove unnecessary NFS_NEED_* defines
Thanks to Robert Day for pointing out that these two defines are unused.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <trond@netapp.com>
Cc: Neil Brown <neilb@suse.de>
Cc: "Robert P. J. Day" <rpjday@crashcourse.ca>
2008-04-23 16:13:37 -04:00
Adrian Bunk
a3dab29353 make nfs_automount_list static
nfs_automount_list can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:29 -04:00
Jeff Layton
daa7da5fd3 NFS: remove duplicate flags assignment from nfs_validate_mount_data
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:25 -04:00
Cyrill Gorcunov
63649bd708 NFS - fix potential NULL pointer dereference v2
There is a possible NULL pointer dereference if kstr[n]dup() fails.
Fix these cases for safety.

Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:55:22 -04:00
Trond Myklebust
a2b2bb8822 NFSv4: Attempt to use machine credentials in SETCLIENTID calls
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:59 -04:00
Trond Myklebust
7c67db3a8a NFSv4: Reintroduce machine creds
We need to try to ensure that we always use the same credentials whenever
we re-establish the clientid on the server. If not, the server won't
recognise that we're the same client, and so may not allow us to recover
state.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:56 -04:00
Trond Myklebust
78ea323be6 NFSv4: Don't use cred->cr_ops->cr_name in nfs4_proc_setclientid()
With the recent change to generic creds, we can no longer use
cred->cr_ops->cr_name to distinguish between RPCSEC_GSS principals and
AUTH_SYS/AUTH_NULL identities. Replace it with the rpc_authops->au_name
instead...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:53 -04:00
Fred Isaman
4410924157 nfs: fix printout of multiword bitfields
Benny points out that zero-padding of multiword bitfields is necessary,
and that delimiting each word is nice to avoid endianness confusion.

bhalevy: without zero padding output can be ambiguous. Also,
since the printed array of two 32-bit unsigned integers is not a
64-bit number, delimiting the output with a semicolon makes more sense.
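
A stand-alone sketch of the formatting idea: each 32-bit word is printed
as exactly eight hex digits, with a delimiter between the words.  The
function name is hypothetical and the delimiter is left to the caller,
since this sketch does not reproduce the actual debug message.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Format a two-word bitmask as zero-padded hex, one field per word.
 * Returns the number of characters snprintf() would have written.
 */
static int format_bitmask(char *buf, size_t len, const uint32_t bm[2],
			  char delim)
{
	return snprintf(buf, len, "%08" PRIx32 "%c%08" PRIx32,
			bm[0], delim, bm[1]);
}
```

Without padding, the words 0x1 and 0x10 would print as "110", which could
equally be read as 0x11 followed by 0x0; with padding and a delimiter the
two fields are unambiguous.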

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:50 -04:00
Benny Halevy
856dff3d38 nfs: return negative error value from nfs{,4}_stat_to_errno
All use sites for nfs{,4}_stat_to_errno negate their return value.
It's more efficient to return a negative error from the stat_to_errno converters
rather than negating their return value everywhere. This also produces slightly
smaller code.
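
A rough sketch of the idea, assuming a small lookup table. The demo_* names
and the shortened table are hypothetical, and the fallback to -EIO for
unknown codes is an assumption made for the sketch. The table stores the
errno already negated, so call sites can return the value directly.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table for illustration; a real table is much longer. */
struct demo_errtbl_entry {
	int stat;	/* NFS status code as sent on the wire */
	int err;	/* matching errno, already negated */
};

static const struct demo_errtbl_entry demo_errtbl[] = {
	{ 0, 0 },		/* NFS_OK */
	{ 1, -EPERM },		/* NFSERR_PERM */
	{ 2, -ENOENT },		/* NFSERR_NOENT */
	{ 5, -EIO },		/* NFSERR_IO */
};

/* Return a negative errno so callers need no "-" at each call site. */
int demo_stat_to_errno(int stat)
{
	size_t i;

	for (i = 0; i < sizeof(demo_errtbl) / sizeof(demo_errtbl[0]); i++) {
		if (demo_errtbl[i].stat == stat)
			return demo_errtbl[i].err;
	}
	return -EIO;	/* assumption: unknown codes fall back to -EIO */
}
```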

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:54:47 -04:00
Trond Myklebust
c4d7c402b7 NFS: Remove the buggy lock-if-signalled case from do_setlk()
Both NLM and NFSv4 should be able to clean up adequately in the case where
the user interrupts the RPC call...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:52 -04:00
Trond Myklebust
536ff0f809 NFSv4: Ensure we don't corrupt fl->fl_flags in nfs4_proc_unlck
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:33 -04:00
Trond Myklebust
c1d519312d NFSv4: Only increment the sequence id if the server saw it
It is quite possible that the OPEN, CLOSE, LOCK, LOCKU,... compounds fail
before the actual stateful operation has been executed (for instance in the
PUTFH call). There is no way to tell from the overall status result which
operations were executed from the COMPOUND.

The fix is to move incrementing of the sequence id into the XDR layer,
so that we do it as we process the results from the stateful operation.
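
A simplified model of that approach, not the actual XDR code. The demo_*
names are hypothetical, and the sketch ignores error codes that by
themselves leave the sequence id unchanged. The sequence id is bumped only
when a result for the stateful operation is actually decoded.

```c
#include <stdbool.h>
#include <stddef.h>

struct demo_op_result {
	bool stateful;	/* OPEN, CLOSE, LOCK, LOCKU, ... */
	int status;	/* 0 on success, nonzero on failure */
};

/*
 * Walk the results the server actually returned. A COMPOUND stops at the
 * first failing operation, so later operations have no result at all.
 */
unsigned int demo_process_results(const struct demo_op_result *res,
				  size_t nres, unsigned int seqid)
{
	size_t i;

	for (i = 0; i < nres; i++) {
		if (res[i].stateful)
			seqid++;	/* the server saw this seqid */
		if (res[i].status != 0)
			break;		/* nothing after this was executed */
	}
	return seqid;
}
```

If PUTFH fails, the only result is the failed PUTFH and the sequence id is
left alone; if the stateful operation is reached, the id advances by one.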

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:15 -04:00
Trond Myklebust
35d05778e2 NFSv4: Remove bogus call to nfs4_drop_state_owner() in _nfs4_open_expired()
There should be no need to invalidate a perfectly good state owner just
because of a stale filehandle. Doing so can cause the state recovery code
to break, since nfs4_get_renew_cred() and nfs4_get_setclientid_cred() rely
on finding active state owners.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:12 -04:00
Trond Myklebust
dbae4c73f0 NFS: Ensure that rpc_run_task() errors are propagated back to the caller
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:08 -04:00
Trond Myklebust
c9d8f89d98 NFS: Ensure that the write code cleans up properly when rpc_run_task() fails
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:05 -04:00
Trond Myklebust
fdd1e74c89 NFS: Ensure that the read code cleans up properly when rpc_run_task() fails
In the case of readpage() we need to ensure that the pages get unlocked,
and that the error is flagged.

In the case of O_DIRECT, we need to ensure that the pages are all released.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:53:01 -04:00
Trond Myklebust
73e3302f60 NFS: Fix nfs_wb_page() to always exit with an error or a clean page
It is possible for nfs_wb_page() to sometimes exit with 0 return value, yet
the page is left in a dirty state.
For instance in the case where the server rebooted, and the COMMIT request
failed, then all the previously "clean" pages which were cached by the
server, but were not guaranteed to have been written out to disk,
have to be redirtied and resent to the server.
The fix is to have nfs_wb_page_priority() check that the page is clean
before it exits...

This fixes a condition that triggers the BUG_ON(PagePrivate(page)) in
nfs_create_request() when we're in the nfs_readpage() path.

Also eliminate a redundant BUG_ON(!PageLocked(page)) while we're at it. It
turns out that clear_page_dirty_for_io() has the exact same test.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-19 16:52:58 -04:00
Dave Hansen
2c463e9548 [PATCH] r/o bind mounts: check mnt instead of superblock directly
If we depend on the inodes for writeability, we will not catch the r/o mounts
when implemented.

This patch uses __mnt_want_write().  It does not guarantee that the mount
will stay writeable after the check.  But, this is OK for one of the checks
because it is just for a printk().

The other two are probably unnecessary and duplicate existing checks in the
VFS.  This won't make them better checks than before, but it will make them
detect r/o mounts.

Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-04-19 00:29:27 -04:00
Bryan Wu
240ee83118 fix bug - executing FDPIC ELF on NFS mount triggers BUG() at mm/nommu.c:862:/do_mmap_private()
NFS needs a NOMMU version of its mmap function to support uClinux on NOMMU machines
http://blackfin.uclinux.org/gf/project/uclinux-dist/tracker/?action=TrackerItemEdit&tracker_id=141&tracker_item_id=3992

Signed-off-by: Bryan Wu <cooloney@kernel.org>
Cc: Mike Frysinger <vapier.adi@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08 21:06:56 -04:00
Jeff Layton
66d3aac041 NFS: initialize flags field in nfs_open_context
The nfs_open_context struct had a "flags" field added recently, but the
allocator isn't initializing it. It also looks like the allocator isn't
initializing the mode or list either, but they seem to be overwritten
by the caller, so that's less of an issue.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-04-08 21:06:53 -04:00
Al Viro
c35038beca [PATCH] do shrink_submounts() for all fs types
... and take it out of ->umount_begin() instances.  Call with all locks
already taken (by do_umount()) and leave calling release_mounts() to
caller (it will do release_mounts() anyway, so we can just put into
the same list).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2008-03-27 20:47:58 -04:00
Linus Torvalds
7d3628b230 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (46 commits)
  [NET] ifb: set separate lockdep classes for queue locks
  [IPV6] KCONFIG: Fix description about IPV6_TUNNEL.
  [TCP]: Fix shrinking windows with window scaling
  netpoll: zap_completion_queue: adjust skb->users counter
  bridge: use time_before() in br_fdb_cleanup()
  [TG3]: Fix build warning on sparc32.
  MAINTAINERS: bluez-devel is subscribers-only
  audit: netlink socket can be auto-bound to pid other than current->pid (v2)
  [NET]: Fix permissions of /proc/net
  [SCTP]: Fix a race between module load and protosw access
  [NETFILTER]: ipt_recent: sanity check hit count
  [NETFILTER]: nf_conntrack_h323: logical-bitwise & confusion in process_setup()
  [RT2X00] drivers/net/wireless/rt2x00/rt2x00dev.c: remove dead code, fix warning
  [IPV4]: esp_output() misannotations
  [8021Q]: vlan_dev misannotations
  xfrm: ->eth_proto is __be16
  [IPV4]: ipv4_is_lbcast() misannotations
  [SUNRPC]: net/* NULL noise
  [SCTP]: fix misannotated __sctp_rcv_asconf_lookup()
  [PKT_SCHED]: annotate cls_u32
  ...
2008-03-21 07:57:45 -07:00
Chuck Lever
ecfc555a83 NFS: Always enable NFS direct I/O
Since O_DIRECT is a standard feature that is enabled in most distros,
eliminate the CONFIG_NFS_DIRECTIO build option, and change the
fs/nfs/Makefile to always build in the NFS direct I/O engine.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:34 -04:00
Chuck Lever
82d101d58a NFS: Show most mount options via nfs_show_options()
Display all mount options in /proc/mounts which may be needed to reconstruct
a previous mount.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:29 -04:00
Chuck Lever
3f8400d1f1 NFS: Save the values of the "mount*=" mount options
Save the value of the mountproto= mountport= mountvers= and mountaddr=
options so that these values can be displayed later via
nfs_show_options().

This preserves the intent of the original mount options, should the file
system need to be remounted based on what's displayed in /proc/mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:22 -04:00
Chuck Lever
f22d6d79fe NFS: Save the value of the "port=" mount option
During a remount based on the mount options displayed in /proc/mounts, we
want to preserve the original behavior of the mount request.  Let's save
the original setting of the "port=" mount option in the mount's nfs_server
structure.

This allows us to simplify the default behavior of port setting for NFSv4
mounts.  By default, NFSv2/3 mounts first try an RPC bind to determine the
NFS server's port, unless the user specified the "port=" mount option.
Users can force the client to skip the RPC bind by explicitly specifying
"port=<value>".

NFSv4, by contrast, assumes the NFS server port is 2049 and skips the RPC
bind, unless the user specifies "port=".  Users can force an RPC bind for
NFSv4 by explicitly specifying "port=0".
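
The rules above can be sketched as a small decision function. This is an
illustration only: demo_choose_port() and its types are hypothetical, and
treating "port=0" as "use rpcbind" for NFSv2/3 as well is an assumption.

```c
#include <stdbool.h>

#define DEMO_NFS_PORT 2049	/* default NFS port named in the text above */

struct demo_port_choice {
	bool use_rpcbind;	/* ask the server's rpcbind for the port */
	unsigned short port;	/* port to connect to when not binding */
};

struct demo_port_choice demo_choose_port(int nfs_version, bool port_given,
					 unsigned short port)
{
	struct demo_port_choice c = { false, 0 };

	if (!port_given) {
		if (nfs_version >= 4)
			c.port = DEMO_NFS_PORT;	/* NFSv4: assume 2049, no bind */
		else
			c.use_rpcbind = true;	/* NFSv2/3: bind by default */
		return c;
	}
	if (port == 0) {
		/* "port=0" forces a bind for NFSv4; assumed the same for v2/3 */
		c.use_rpcbind = true;
		return c;
	}
	c.port = port;			/* explicit port: skip the bind */
	return c;
}
```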

I added a couple of extra comments to clarify this behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:19 -04:00
Chuck Lever
78fa701f34 NFS: Fix up data types of fields in nfs_parsed_mount_options
Clean up: make data types of fields in nfs_parsed_mount_options more
consistent with other uses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:16 -04:00
Chuck Lever
2d76743227 NFS: numeric mount parameters are unsigned
Clean up: use %u instead of %d when displaying NFS mount options.

Nit: Fix reporting of "namlen=" option in nfs_show_mount_stats.  The mount
option is called "namlen" without the "e".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:13 -04:00
Jeff Layton
7bda2cdf48 NFS: clean up short packet handling for NFSv4 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in decode_readdir().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:10 -04:00
Jeff Layton
643f81115b NFS: clean up short packet handling for NFSv3 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs3_xdr_readdirres().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:06 -04:00
Jeff Layton
caa02bd540 NFS: clean up short packet handling for NFSv2 readdir
Currently, the NFS readdir decoders have a workaround for buggy servers
that send an empty readdir response with the EOF bit unset. If the
server sends a malformed response in some cases, this workaround kicks
in and just returns an empty response rather than returning a proper
error to the caller.

This patch does 3 things:

1) have malformed responses with no entries return error (-EIO)

2) preserve existing workaround for servers that send empty
   responses with the EOF marker unset.

3) Add some comments to clarify the logic in nfs_xdr_readdirres().

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 18:00:03 -04:00
Fred Isaman
4af68bffac nfs: remove duplicate initializations of nfs_read_data field
Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:59 -04:00
Fred
6d884e8fc8 nfs: nfs_redirty_request
Both flush functions have the same error handling routine.  Pull
it out as a function.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:56 -04:00
Trond Myklebust
c7c350e92a Merge branch 'hotfixes' into devel
2008-03-19 17:59:44 -04:00
Fred Isaman
f8512ad0da nfs: don't ignore return value from nfs_pageio_add_request
Ignoring the return value from nfs_pageio_add_request can cause deadlocks.

In read path:
  call nfs_pageio_add_request from readpage_async_filler
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
    request.
  BUT, since return code is ignored, readpage_async_filler assumes it has
    been added, and does nothing further, leaving page locked.
  do_generic_mapping_read will eventually call lock_page, resulting in deadlock

In write path:
  page is marked dirty by generic_perform_write
  nfs_writepages is called
  call nfs_pageio_add_request from nfs_page_async_flush
  assume at this point that there are requests already in desc, that
    can't be merged with the current request.
  so nfs_pageio_doio is fired up to clear out desc.
  assume something goes wrong in setting up the io, so desc->pg_error is set.
  This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
    request, yet marking the request as locked (PG_BUSY) and in writeback,
    clearing dirty marks.
  The next time a write is done to the page, deadlock will result as
    nfs_write_end calls nfs_update_request

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-19 17:59:02 -04:00
David S. Miller
2f633928cb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6
2008-03-17 23:44:31 -07:00
Al Viro
e6f1cebf71 [NET] endianness noise: INADDR_ANY
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-03-17 22:44:53 -07:00
Fred Isaman
2f42b5d043 NFS: fix encode_fsinfo_maxsz
The previous value was not taking into account space for bitmap array size.

Signed-off-by: Fred Isaman <iisaman@citi.umich.edu>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-14 13:47:17 -04:00
Trond Myklebust
98a8e32394 SUNRPC: Add a helper rpcauth_lookup_generic_cred()
The NFSv4 protocol allows clients to negotiate security protocols on the
fly in the case where an administrator on the server changes the export
settings and/or in the case where we may have a filesystem migration event.

Instead of having the NFS client code cache credentials that are tied to a
particular AUTH method it is therefore preferable to have a generic credential
that can be converted into whatever AUTH is in use by the RPC client when
the read/write/sillyrename/... is put on the wire.

We do this by means of the new "generic" credential, which basically just
caches the minimal information that is needed to look up an RPCSEC_GSS,
AUTH_SYS, or AUTH_NULL credential.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-14 13:42:49 -04:00
Trond Myklebust
9446389ef6 Merge commit 'origin' into devel
2008-03-08 11:49:24 -05:00
Linus Torvalds
4c1aa6f8b9 Merge branch 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'hotfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
  NFS: Fix the fsid revalidation in nfs_update_inode()
  SUNRPC: Fix a nfs4 over rdma transport oops
  NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
2008-03-07 12:08:07 -08:00
Trond Myklebust
4e99a1ff34 NFS: Fix dentry revalidation for NFSv4 referrals and mountpoint crossings
As long as the directory contents haven't changed, we should just let the
path walk proceed to cross the mountpoint. Apart from being an optimisation
in the case of 'nohide' mountpoint traversals, it also fixes an issue with
referrals: referral inodes don't have valid filehandles, so calling
nfs_revalidate_inode() on them is a bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:35:41 -05:00
Trond Myklebust
c37dcd334c NFS: Fix the fsid revalidation in nfs_update_inode()
When we detect that we've crossed a mountpoint on the remote server, we
must take care not to use that inode to revalidate the fsid on our
current superblock. To do so, we label the inode as a remote mountpoint,
and check for that in nfs_update_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:35:37 -05:00
Trond Myklebust
af1b8c2ff7 NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
O_SYNC is stored in filp->f_flags.
Thanks to Al Viro for pointing out the bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-03-07 14:33:40 -05:00
Eric Paris
f9c3a38021 NFS: use new LSM interfaces to explicitly set mount options
NFS and SELinux worked together previously because SELinux had NFS
specific knowledge built in.  This design was approved by both groups
back in 2004 but the recent NFS changes to use nfs_parsed_mount_data and
the usage of nfs_clone_mount_data showed this to be a poor fragile
solution.  This patch fixes the NFS functionality regression by making
use of the new LSM interfaces to allow an FS to explicitly set its own
mount options.

The explicit setting of mount options is done in the nfs get_sb
functions which are called before the generic vfs hooks try to set mount
options for filesystems which use text mount data.

This does not currently support NFSv4 as that functionality did not
exist in previous kernels and thus there is no regression.  I will be
adding the needed code, which I believe to be the exact same as the v3
code, in nfs4_get_sb for 2.6.26.

Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: James Morris <jmorris@namei.org>
2008-03-06 08:40:59 +11:00
Trond Myklebust
cdd0972945 Merge branch 'cleanups' into next
2008-02-28 23:48:05 -08:00
Trond Myklebust
5e4424af9a SUNRPC: Remove now-redundant RCU-safe rpc_task free path
Now that we've tightened up the locking rules for RPC queue wakeups, we can
remove the RCU-safe kfree calls...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-28 23:26:28 -08:00
Trond Myklebust
f6a1cc8930 SUNRPC: Add a (empty for the moment) destructor for rpc_wait_queues
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-28 23:17:27 -08:00
Trond Myklebust
5d00837b90 SUNRPC: Run rpc timeout functions as callbacks instead of in softirqs
An audit of the current RPC timeout functions shows that they don't really
ever need to run in the softirq context. As long as the softirq is
able to signal that the wakeup is due to a timeout (which it can do by
setting task->tk_status to -ETIMEDOUT) then the callback functions can just
run as standard task->tk_callback functions (in the rpciod/process
context).

The only possible border-line case would be xprt_timer() for the case of
UDP, when the callback is used to reduce the size of the transport
congestion window. In testing, however, the effect of moving that update
to a callback would appear to be minor.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:44 -08:00
Trond Myklebust
fda1393938 SUNRPC: Convert users of rpc_wake_up_task to use rpc_wake_up_queued_task
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:42 -08:00
Trond Myklebust
101070ca2f NFS: Ensure that the asynchronous RPC calls complete on nfsiod.
We want to ensure that rpc_call_ops that involve mntput() are run on nfsiod
rather than on rpciod, so that they don't deadlock when the resulting
umount calls rpc_shutdown_client(). Hence we specify that read, write and
commit calls must complete on nfsiod.
Ditto for NFSv4 open, lock, locku and close asynchronous calls.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:37 -08:00
Trond Myklebust
5746006f1d NFS: Add an nfsiod workqueue
NFS post-rpciod cleanups often involve tasks that cannot be safely
performed within the rpciod context (due to deadlock concerns). We
therefore add a dedicated NFS workqueue that can perform tasks like
cleaning up state after an interrupted NFSv4 open() call, or calling
put_nfs_open_context() after an asynchronous read or write call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:36 -08:00
Trond Myklebust
383ba71938 NFS: Fix a deadlock with lazy umount
We can't allow rpc callback functions like task->tk_ops->rpc_call_prepare()
and task->tk_ops->rpc_call_done() to call mntput() in any way, since
that will cause a deadlock when the call to rpc_shutdown_client() attempts
to wait on 'task' to complete.

We can avoid the above deadlock by moving calls to mntput to
task->tk_ops->rpc_release() callback, since at that time the task will be
marked as completed, and so rpc_shutdown_client won't attempt to wait on
it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 21:40:33 -08:00
Trond Myklebust
4b5621f6b1 NFS: Fix an f_mode/f_flags confusion in fs/nfs/write.c
O_SYNC is stored in filp->f_flags.
Thanks to Al Viro for pointing out the bug.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-25 15:56:29 -08:00
Pavel Emelyanov
5216a8e70e Wrap buffers used for rpc debug printks into RPC_IFDEBUG
Sorry for the noise, but here's the v3 of this compilation fix :)

There are some places, which declare the char buf[...] on the stack
to push it later into dprintk(). Since the dprintk sometimes (if the
CONFIG_SYSCTL=n) becomes an empty do { } while (0) stub, these buffers
cause gcc to produce appropriate warnings.

Wrap these buffers with RPC_IFDEBUG macro, as Trond proposed, to
compile them out when not needed.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: J. Bruce Fields <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-21 18:42:29 -05:00
Harvey Harrison
90dc7d2796 nfs: fix sparse warnings
fs/nfs/nfs4state.c:788:34: warning: Using plain integer as NULL pointer
fs/nfs/delegation.c:52:34: warning: Using plain integer as NULL pointer
fs/nfs/idmap.c:312:12: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:257:6: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:270:6: warning: Using plain integer as NULL pointer
fs/nfs/callback_xdr.c:281:6: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-20 16:15:44 -05:00
Jeff Layton
1227a74e2e NFS: flush signals before taking down callback thread
Now that the reference counting on the callback thread is working as
expected, it uncovers another problem.  Peter Staubach noticed while
testing that patch on an older kernel that he would occasionally see
this printk in rpc_register fire:

    "RPC: failed to contact portmap (errno -512)."

The NFSv4 callback thread is signaled by nfs_callback_down(), but never
flushes that signal. All of the shutdown processing is done with that
signal pending. This makes it fail the call to unregister the port with
the portmapper.

In actuality, this rpc_register call isn't necessary at all since the
port isn't actually registered with the portmapper anymore. Regardless,
there doesn't seem to be any reason to leave the signal pending while
the thread is being shut down and flushing it should generally silence
that printk.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-20 13:32:43 -05:00
Trond Myklebust
52833e897f Merge branch 'linus_origin' into hotfixes
2008-02-15 13:36:30 -05:00
Jan Blunck
1d957f9bf8 Introduce path_put()
* Add path_put() functions for releasing a reference to the dentry and
  vfsmount of a struct path in the right order

* Switch from path_release(nd) to path_put(&nd->path)

* Rename dput_path() to path_put_conditional()

[akpm@linux-foundation.org: fix cifs]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: <linux-fsdevel@vger.kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven French <sfrench@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Jan Blunck
4ac9137858 Embed a struct path into struct nameidata instead of nd->{dentry,mnt}
This is the central patch of a cleanup series. In most cases there is no good
reason why someone would want to use a dentry for itself. This series reflects
that fact and embeds a struct path into nameidata.

Together with the other patches of this series
- it enforces the correct order of getting/releasing the reference count on
  <dentry,vfsmount> pairs
- it prepares the VFS for stacking support since it is essential to have a
  struct path in every place where the stack can be traversed
- it reduces the overall code size:

without patch series:
   text    data     bss     dec     hex filename
5321639  858418  715768 6895825  6938d1 vmlinux

with patch series:
   text    data     bss     dec     hex filename
5320026  858418  715768 6894212  693284 vmlinux

This patch:

Switch from nd->{dentry,mnt} to nd->path.{dentry,mnt} everywhere.

[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix cifs]
[akpm@linux-foundation.org: fix smack]
Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Acked-by: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-14 21:13:33 -08:00
Nick Piggin
2785259631 nfs: use GFP_NOFS preloads for radix-tree insertion
NFS should use GFP_NOFS mode radix tree preloads rather than GFP_ATOMIC
allocations at radix-tree insertion-time.  This is important to reduce the
atomic memory requirement.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:09 -05:00
Olga Kornievskaia
8d042218b0 NFS: add missing spkm3 strings to mount option parser
This patch adds previously missing spkm3 string values that are needed
to parse mount options in the kernel.
2008-02-13 23:24:08 -05:00
Jeff Layton
25606656b1 NFS: remove error field from nfs_readdir_descriptor_t
The error field in nfs_readdir_descriptor_t is never used outside of the
function in which it is set. Remove the field and change the place that
does use it to use an existing local variable.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:07 -05:00
Dan Muntz
497799e7c0 NFS: missing spaces in KERN_WARNING
The warning message for a v4 server returning various bad sequence-ids is
missing spaces.

Signed-off-by: Dan Muntz <dmuntz@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:06 -05:00
Jeff Layton
8e60029f40 NFS: fix reference counting for NFSv4 callback thread
The reference counting for the NFSv4 callback thread stays artificially
high. When this thread comes down, it doesn't properly tear down the
svc_serv, causing a memory leak. In my testing on an older kernel on
x86_64, memory would leak out of the 8k kmalloc slab. So, we're leaking
at least a page of memory every time the thread comes down.

svc_create() creates the svc_serv with a sv_nrthreads count of 1, and
then svc_create_thread() increments that count. Whenever the callback
thread is started it has a sv_nrthreads count of 2. When coming down, it
calls svc_exit_thread() which decrements that count and if it hits 0, it
tears everything down. That never happens here since the count is always
at 2 when the thread exits.

The problem is that nfs_callback_up() should be calling svc_destroy() on
the svc_serv on both success and failure. This is how lockd_up_proto()
handles the reference counting, and doing that here fixes the leak.
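
A toy model of the counting described above, assuming only a single counter
matters. The demo_* names are hypothetical stand-ins and none of this is
the sunrpc code itself. It shows why the count never reaches zero unless
the creator's reference is dropped after the thread has been started.

```c
#include <stdbool.h>

/* Toy stand-in for svc_serv: only the thread/reference count is modelled. */
struct demo_serv {
	int nrthreads;
	bool torn_down;
};

void demo_create(struct demo_serv *s)		/* like svc_create() */
{
	s->nrthreads = 1;
	s->torn_down = false;
}

void demo_create_thread(struct demo_serv *s)	/* like svc_create_thread() */
{
	s->nrthreads++;
}

/* Drop one reference; tear everything down when the count reaches zero. */
void demo_put(struct demo_serv *s)
{
	if (--s->nrthreads == 0)
		s->torn_down = true;
}

/* Returns true if the service is torn down once the thread has exited. */
bool demo_run(bool drop_creator_ref)
{
	struct demo_serv s;

	demo_create(&s);		/* count = 1 */
	demo_create_thread(&s);		/* count = 2 */
	if (drop_creator_ref)
		demo_put(&s);		/* the fix: count back to 1 */
	demo_put(&s);			/* thread exits */
	return s.torn_down;
}
```

Here demo_run(false) models the leak (the count stops at 1) and
demo_run(true) models the fixed behaviour.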

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-13 23:24:04 -05:00
Trond Myklebust
5d47a35600 NFS: Fix a potential file corruption issue when writing
If the inode is flagged as having an invalid mapping, then we can't rely on
the PageUptodate() flag. Ensure that we don't use the "anti-fragmentation"
write optimisation in nfs_updatepage(), since that will cause NFS to write
out areas of the page that are no longer guaranteed to be up to date.

A potential corruption could occur in the following scenario:

client 1			client 2
===============			===============
				fd=open("f",O_CREAT|O_WRONLY,0644);
				write(fd,"fubar\n",6);	// cache last page
				close(fd);
fd=open("f",O_WRONLY|O_APPEND);
write(fd,"foo\n",4);
close(fd);

				fd=open("f",O_WRONLY|O_APPEND);
				write(fd,"bar\n",4);
				close(fd);
-----
The bug may lead to the file "f" reading 'fubar\n\0\0\0\nbar\n' because
client 2 does not update the cached page after re-opening the file for
write. Instead it keeps it marked as PageUptodate() until someone calls
invalidate_inode_pages2() (typically by calling read()).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-02-07 19:20:20 -05:00
David Howells
e231c2ee64 Convert ERR_PTR(PTR_ERR(p)) instances to ERR_CAST(p)
Convert instances of ERR_PTR(PTR_ERR(p)) to ERR_CAST(p) using:

perl -spi -e 's/ERR_PTR[(]PTR_ERR[(](.*)[)][)]/ERR_CAST(\1)/' `grep -rl 'ERR_PTR[(]*PTR_ERR' fs crypto net security`
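
For readers unfamiliar with the idiom, here is a user-space imitation of
the helpers involved. The demo_* functions and types are made up for this
sketch and only mimic the behaviour of the kernel macros: an error code is
carried inside a pointer value, and the cast helper changes the pointer
type without decoding and re-encoding the error.

```c
#include <errno.h>
#include <stdint.h>

/* User-space imitations of the kernel helpers; demo_* names are made up. */
void *demo_err_ptr(long error)
{
	return (void *)(intptr_t)error;		/* encode errno in the pointer */
}

long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;		/* decode it again */
}

void *demo_err_cast(const void *ptr)
{
	return (void *)(uintptr_t)ptr;		/* same value, new pointer type */
}

struct demo_inode;	/* two unrelated, opaque types */
struct demo_dentry;

/* Old style: decode and re-encode just to change the pointer type. */
struct demo_dentry *demo_old_style(struct demo_inode *inode)
{
	return demo_err_ptr(demo_ptr_err(inode));
}

/* New style: a single cast-like helper. */
struct demo_dentry *demo_new_style(struct demo_inode *inode)
{
	return demo_err_cast(inode);
}
```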

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07 08:42:26 -08:00
Christoph Lameter
eebd2aa355 Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user
Simplify page cache zeroing of segments of pages through 3 functions

zero_user_segments(page, start1, end1, start2, end2)

        Zeros two segments of the page. It takes the position where to
        start and end the zeroing which avoids length calculations and
        makes code clearer.

zero_user_segment(page, start, end)

        Same for a single segment.

zero_user(page, start, length)

        Length variant for the case where we know the length.
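
The relationship between the three helpers can be sketched in user space
on a plain byte buffer. This is an analogy only: the demo_* names and
DEMO_PAGE_SIZE are assumptions, there is no kmap or cache flushing here,
and segment ends are treated as exclusive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096	/* assumed page size for the sketch */

/* Zero [start1, end1) and [start2, end2) of a page-sized buffer. */
void demo_zero_segments(unsigned char *page, size_t start1, size_t end1,
			size_t start2, size_t end2)
{
	assert(start1 <= end1 && end1 <= DEMO_PAGE_SIZE);
	assert(start2 <= end2 && end2 <= DEMO_PAGE_SIZE);
	if (end1 > start1)
		memset(page + start1, 0, end1 - start1);
	if (end2 > start2)
		memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: the second segment is empty. */
void demo_zero_segment(unsigned char *page, size_t start, size_t end)
{
	demo_zero_segments(page, start, end, 0, 0);
}

/* Length variant: convert (start, length) to (start, end). */
void demo_zero(unsigned char *page, size_t start, size_t length)
{
	demo_zero_segment(page, start, start + length);
}
```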

We remove the zero_user_page macro. Issues:

1. It's a macro. Inline functions are preferable.

2. The KM_USER0 macro is only defined for HIGHMEM.

   Having to treat this special case everywhere makes the
   code needlessly complex. The parameter for zeroing is always
   KM_USER0 except in one single case that we open code.

Avoiding KM_USER0 means a lot of code no longer has to deal with the
special casing for HIGHMEM. Dealing with kmap is only necessary for
HIGHMEM configurations. In those configurations we use KM_USER0 like
we do for a series of other functions defined in highmem.h.

Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
could not be an inline function and had to be a macro. The zero_user_*
functions introduced here can be inline because that constant is not
used when these functions are called.

Also extract the flushing of the caches to be outside of the kmap.

[akpm@linux-foundation.org: fix nfs and ntfs build]
[akpm@linux-foundation.org: fix ntfs build some more]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: <linux-ext4@vger.kernel.org>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-05 09:44:13 -08:00
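The three zeroing helpers described in eebd2aa355 can be pictured with a userspace sketch that works on a plain byte buffer instead of a struct page. The sketch_ names and SKETCH_PAGE_SIZE are invented for this illustration, the end positions are assumed to be exclusive, and the kmap and cache flushing steps of the real helpers are left out entirely.

```c
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the sketch */

/* Zero [start1, end1) and [start2, end2) of a page-sized buffer. */
static void sketch_zero_segments(unsigned char *page,
				 unsigned start1, unsigned end1,
				 unsigned start2, unsigned end2)
{
	if (end1 > start1 && end1 <= SKETCH_PAGE_SIZE)
		memset(page + start1, 0, end1 - start1);
	if (end2 > start2 && end2 <= SKETCH_PAGE_SIZE)
		memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: the second segment is empty. */
static void sketch_zero_segment(unsigned char *page,
				unsigned start, unsigned end)
{
	sketch_zero_segments(page, start, end, 0, 0);
}

/* Length variant for callers that already know the size. */
static void sketch_zero(unsigned char *page, unsigned start, unsigned size)
{
	sketch_zero_segments(page, start, start + size, 0, 0);
}
```

Passing start and end positions lets a caller zero the head and tail of a page around a partial write in one call, for example sketch_zero_segments(page, 0, 16, 4000, 4096), without computing two lengths.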
Tom Tucker
d7c9f1ed97 svc: Change services to use new svc_create_xprt service
Modify the various kernel RPC svcs to use the svc_create_xprt service.

Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Neil Brown <neilb@suse.de>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Greg Banks <gnb@sgi.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2008-02-01 16:42:09 -05:00
Linus Torvalds
75659ca0c1 Merge branch 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc
* 'task_killable' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc: (22 commits)
  Remove commented-out code copied from NFS
  NFS: Switch from intr mount option to TASK_KILLABLE
  Add wait_for_completion_killable
  Add wait_event_killable
  Add schedule_timeout_killable
  Use mutex_lock_killable in vfs_readdir
  Add mutex_lock_killable
  Use lock_page_killable
  Add lock_page_killable
  Add fatal_signal_pending
  Add TASK_WAKEKILL
  exit: Use task_is_*
  signal: Use task_is_*
  sched: Use task_contributes_to_load, TASK_ALL and TASK_NORMAL
  ptrace: Use task_is_*
  power: Use task_is_*
  wait: Use TASK_NORMAL
  proc/base.c: Use task_is_*
  proc/array.c: Use TASK_REPORT
  perfmon: Use task_is_*
  ...

Fixed up conflicts in NFS/sunrpc manually..
2008-02-01 11:45:47 +11:00
Trond Myklebust
3fbd67ad61 NFSv4: Iterate through all nfs_clients when the server recalls a delegation
The same delegation may have been handed out to more than one nfs_client.
Ensure that if a recall occurs, we return all instances.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:12 -05:00
Trond Myklebust
57bfa89171 NFSv4: Deal more correctly with duplicate delegations
If a (broken?) server hands out two different delegations for the same
file, then we should return one of them.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:12 -05:00
Trond Myklebust
6f23e3872c NFS: Fix a potential race between umount and nfs_access_cache_shrinker()
Thanks to Yawei Niu for spotting the race.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:12 -05:00
Trond Myklebust
e6f8107595 NFS: Add an asynchronous delegreturn operation for use in nfs_clear_inode
Otherwise, there is a potential deadlock if the last dput() from an NFSv4
close() or other asynchronous operation leads to nfs_clear_inode calling
the synchronous delegreturn.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:12 -05:00
Benny Halevy
99fadcd764 nfs: convert NFS_*(inode) helpers to static inline
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:11 -05:00
Benny Halevy
3a10c30acc nfs: obliterate NFS_FLAGS macro
use NFS_I(inode)->flags instead

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:11 -05:00
Chuck Lever
fc6014771b NFS: Address memory leaks in the NFS client mount option parser
David Howells noticed that repeating the same mount option twice during an
NFS mount request can result in orphaned memory in certain cases.

Only the client_address and mount_server.hostname strings are initialized
in the mount parsing loop, so those appear to be the only two pointers that
might be written over by repeating a mount option.  The strings in the
nfs_server section of the nfs_parsed_mount_data structure are set only once
after the options are parsed, thus these are not susceptible to being
overwritten.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:11 -05:00
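The leak described in fc6014771b arises when a repeated option overwrites a pointer that already owns a string. One common way to avoid that, sketched here in userspace C, is to release the old string before storing the new copy. The struct and function names are hypothetical, and this is not claimed to be the exact change the commit makes.

```c
#include <stdlib.h>
#include <string.h>

struct sketch_mount_data {
	char *client_address;	/* owned string, NULL until an option sets it */
};

/*
 * Store a copy of the option value. Any string left by an earlier
 * occurrence of the same option is freed first so it is not orphaned.
 */
static int sketch_set_client_address(struct sketch_mount_data *mnt,
				     const char *value, size_t len)
{
	char *copy = malloc(len + 1);

	if (copy == NULL)
		return -1;
	memcpy(copy, value, len);
	copy[len] = '\0';

	free(mnt->client_address);	/* free(NULL) is a no-op */
	mnt->client_address = copy;
	return 0;
}
```

With this shape, parsing the same option twice leaves exactly one allocation live, holding the value given last.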
J. Bruce Fields
3d1c550874 nfs4: allow nfsv4 acls on non-regular-files
The rfc doesn't give any reason it shouldn't be possible to set an
attribute on a non-regular file.  And if the server supports it, then it
shouldn't be up to us to prevent it.

Thanks to Erez for the report and Trond for further analysis.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Tested-by: Erez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:10 -05:00
Trond Myklebust
f3c391e89c NFS: Optimise away the sigmask code in aio/dio reads and writes
There are no interruptible waits for asynchronous RPC tasks, so we don't
need to wrap calls to rpc_run_task() with an
rpc_clnt_sigmask/rpc_clnt_unsigmask pair.

Instead we can wrap the wait_for_completion_interruptible() in
nfs_direct_wait(). This means that we completely optimise away sigmask
setting for the case of non-blocking aio/dio.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:10 -05:00
Chuck Lever
883bb163f8 NLM: Introduce an arguments structure for nlmclnt_init()
Clean up: pass 5 arguments to nlmclnt_init() in a structure similar to the
new nfs_client_initdata structure.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2008-01-30 02:06:07 -05:00
Chuck Lever
1093a60ef3 NLM/NFS: Use cached nlm_host when calling nlmclnt_proc()
Now that each NFS mount point caches its own nlm_host structure, it can be
passed to nlmclnt_proc() for each lock request.  By pinning an nlm_host for
each mount point, we trade the overhead of looking up or creating a fresh
nlm_host struct during every NLM procedure call for a little extra memory.

We also restrict the nlmclnt_proc symbol to limit the use of this call to
in-tree modules.

Note that nlm_lookup_host() (just removed from the client's per-request
NLM processing) could also trigger an nlm_host garbage collection.  Now
client-side nlm_host garbage collection occurs only during NFS mount
processing.  Since the NFS client now holds a reference on these nlm_host
structures, they wouldn't have been affected by garbage collection
anyway.

Given that nlm_lookup_host() reorders the global nlm_host chain after
every successful lookup, and that a garbage collection could be triggered
during the call, we've removed a significant amount of per-NLM-request
CPU processing overhead.

Sidebar: there are only a few remaining references to the internals of
NFS inodes in the client-side NLM code.  The only references I found are
related to extracting or comparing the inode's file handle via NFS_FH().
One is in nlmclnt_grant(); the other is in nlmclnt_setlockargs().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:07 -05:00
Chuck Lever
9289e7f91a NFS: Invoke nlmclnt_init during NFS mount processing
Cache an appropriate nlm_host structure in the NFS client's mount point
metadata for later use.

Note that there is no need to set NFS_MOUNT_NONLM in the error case -- if
nfs_start_lockd() returns a non-zero value, its callers ensure that the
mount request fails outright.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:07 -05:00
Chuck Lever
3d509e5454 NFS: nfs_write_end clean up
Clean up: commit 4899f9c8 added nfs_write_end(), which introduces a
conditional expression that returns an unsigned integer in one arm and
a signed integer in the other.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:02 -05:00
Chuck Lever
bf4285e75c NFS: Fix minor mixed sign comparison in NFS client's write logic
Clean up: PAGE_CACHE_SIZE is unsigned, and nfs_pageio_init() takes a size_t.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:01 -05:00
Chuck Lever
d24aae41b4 NFS: Use size_t for storing name lengths
Clean up: always use the same type when handling buffer lengths.  As a
bonus, this prevents a mixed sign comparison in idmap_lookup_name.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:01 -05:00
Chuck Lever
a661b77fc1 NFS: Fix use of copy_to_user() in idmap_pipe_upcall
The idmap_pipe_upcall() function expects the copy_to_user() function to
return a negative error value if the call fails, but copy_to_user()
returns an unsigned long number of bytes that couldn't be copied.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:01 -05:00
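The pitfall fixed in a661b77fc1 can be shown with a userspace stand-in for copy_to_user(). The sketch_copy_out() function below is hypothetical and merely imitates the "returns the number of bytes that could not be copied" convention; the writable parameter exists only to simulate a partial failure.

```c
#include <errno.h>
#include <string.h>

/* Stand-in for copy_to_user(): returns the number of bytes NOT copied. */
static unsigned long sketch_copy_out(void *dst, const void *src,
				     unsigned long n, unsigned long writable)
{
	unsigned long done = n < writable ? n : writable;

	memcpy(dst, src, done);
	return n - done;
}

/* Caller that turns a short copy into a negative errno value. */
static long sketch_upcall(void *dst, const void *src,
			  unsigned long n, unsigned long writable)
{
	unsigned long left = sketch_copy_out(dst, src, n, writable);

	/* Testing "left < 0" would never fire: the value is unsigned. */
	if (left != 0)
		return -EFAULT;
	return (long)n;
}
```

The caller has to compare the result against zero and choose the error code itself, since the copy routine never reports a negative value.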
Chuck Lever
369af0f116 NFS: Clean up fs/nfs/idmap.c
Clean up white space damage and use standard kernel coding conventions for
return statements.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:00 -05:00
Trond Myklebust
59dca3b28c NFS: Fix the 'proto=' mount option
Currently, if you have a server mounted using one networking protocol, you
cannot specify a different value using the 'proto=' option on another
mountpoint.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:06:00 -05:00
Trond Myklebust
331702337f NFS: Support per-mountpoint timeout parameters.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:59 -05:00
Trond Myklebust
7a3e3e18e4 NFS: Ensure that we respect NFS_MAX_TCP_TIMEOUT
It isn't sufficient just to limit timeout->to_initval, we also need to
limit to_maxval.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:59 -05:00
Trond Myklebust
69dd716c5f NFSv4: Add socket proto argument to setclientid
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:58 -05:00
Chuck Lever
3c7c7e4812 NFS: Pull covers off IPv6 address parsing
Now that the needed IPv6 infrastructure is in place, allow the NFS client's
IP address parser to generate AF_INET6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:57 -05:00
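As a userspace analogue of the address parsing enabled by 3c7c7e4812, the sketch below fills a struct sockaddr_storage from text, trying IPv4 first and then IPv6. It uses the POSIX inet_pton() call rather than the in-kernel in4_pton()/in6_pton() functions, and sketch_parse_address() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Parse a textual IPv4 or IPv6 address; returns 0 on success, -1 otherwise. */
static int sketch_parse_address(const char *text, struct sockaddr_storage *ss)
{
	struct sockaddr_in *sin = (struct sockaddr_in *)ss;
	struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)ss;

	memset(ss, 0, sizeof(*ss));
	if (inet_pton(AF_INET, text, &sin->sin_addr) == 1) {
		sin->sin_family = AF_INET;
		return 0;
	}

	memset(ss, 0, sizeof(*ss));
	if (inet_pton(AF_INET6, text, &sin6->sin6_addr) == 1) {
		sin6->sin6_family = AF_INET6;
		return 0;
	}

	memset(ss, 0, sizeof(*ss));
	ss->ss_family = AF_UNSPEC;	/* nothing recognisable was parsed */
	return -1;
}
```

Storing the result in a sockaddr_storage is what makes room for the larger AF_INET6 form; a sockaddr_in field would only fit the IPv4 case.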
Chuck Lever
4c56801770 NFS: Support non-IPv4 addresses in nfs_parsed_mount_data
Replace the nfs_server and mount_server address fields in the
nfs_parsed_mount_data structure with a "struct sockaddr_storage"
instead of a "struct sockaddr_in".

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:57 -05:00
Chuck Lever
9412b92772 NFS: Refactor mount option address parsing into separate function
Refactor the logic to parse incoming text-based IP addresses.  Use the
in4_pton() function instead of the older in_aton(), following the lead
of the in-kernel CIFS client.

Later we'll add IPv6 address parsing using the matching in6_pton()
function.  For now we can't allow IPv6 address parsing: we must expand
the size of the address storage fields in the nfs_parsed_mount_options
struct before we can parse and store IPv6 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:56 -05:00
Chuck Lever
338320345b NFS: Remove the NIPQUAD from nfs_try_mount
In the name of address family compatibility, we can't have the NIP_FMT and
NIPQUAD macros in nfs_try_mount().  Instead, we can make use of an unused
mount option to display the mount server's hostname.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:56 -05:00
Chuck Lever
6677d09513 NFS: Adjust nfs_clone_mount structure to store "struct sockaddr *"
Change the addr field in the nfs_clone_mount structure to store a "struct
sockaddr *" to support non-IPv4 addresses in the NFS client.

Note this is mostly a cosmetic change, and does not actually allow
referrals using IPv6 addresses.  The existing referral code assumes that
the server returns a string that represents an IPv4 address.  This code
needs to support hostnames and IPv6 addresses as well as IPv4 addresses,
thus it will need to be reorganized completely (to handle DNS resolution
in user space).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:56 -05:00
Chuck Lever
dcecae0ff4 NFS: Change nfs4_set_client() to accept struct sockaddr *
Adjust the arguments and callers of nfs4_set_client() to pass a "struct
sockaddr *" instead of a "struct sockaddr_in *" to support non-IPv4
addresses in the NFS client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:56 -05:00
Chuck Lever
d7422c472b NFS: Change nfs_get_client() to take sockaddr *
Adjust arguments and callers of nfs_get_client() to pass a
"struct sockaddr *" instead of "struct sockaddr_in *" to support
non-IPv4 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:55 -05:00
Chuck Lever
ff052645c9 NFS: Change nfs_find_client() to take "struct sockaddr *"
Adjust arguments and callers of nfs_find_client() to pass a
"struct sockaddr *" instead of "struct sockaddr_in *" to support non-IPv4
addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>

Trond: Also fix up protocol version number argument in nfs_find_client() to
use the correct u32 type.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:55 -05:00
Chuck Lever
c1d3586656 NFS: Change cb_recallargs to pass "struct sockaddr *" instead of sockaddr_in
Change the addr field in the cb_recallargs struct to a "struct sockaddr *"
to support non-IPv4 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:55 -05:00
Chuck Lever
671beed7e2 NFS: Change cb_getattrargs to pass "struct sockaddr *" instead of sockaddr_in
Change the addr field in the cb_getattrargs struct to a "struct sockaddr *"
to support non-IPv4 addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:54 -05:00
Chuck Lever
6e4cffd7b2 NFS: Expand server address storage in nfs_client struct
Prepare for managing larger addresses in the NFS client by widening the
nfs_client struct's cl_addr field.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>

(Modified to work with the new parameters for nfs_alloc_client)
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:54 -05:00
Trond Myklebust
3b0d3f93d0 NFS: Add support for AF_INET6 addresses in __nfs_find_client()
Introduce AF_INET6-specific address checking to __nfs_find_client().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:54 -05:00
Chuck Lever
0d0f0c192d NFS: Set default port for NFSv4, with support for AF_INET6
Create a helper function to set the default NFS port for NFSv4 mount
points.  The helper supports both AF_INET and AF_INET6 family addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:54 -05:00
Chuck Lever
04dcd6e3ac NFS: Make setting a port number agnostic
We'll need to set the port number of an AF_INET or AF_INET6 address in
several places in fs/nfs/super.c, so introduce a helper that can manage
this for us.  We put this helper to immediate use.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:53 -05:00
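A helper of the kind 04dcd6e3ac describes might look roughly like the following userspace sketch, which switches on the address family and writes the port into the matching field. The name sketch_set_port() and its return convention are assumptions made for this illustration, not the helper added to fs/nfs/super.c.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Set the port on an AF_INET or AF_INET6 address; -1 for other families. */
static int sketch_set_port(struct sockaddr *sap, unsigned short port)
{
	switch (sap->sa_family) {
	case AF_INET:
		((struct sockaddr_in *)sap)->sin_port = htons(port);
		return 0;
	case AF_INET6:
		((struct sockaddr_in6 *)sap)->sin6_port = htons(port);
		return 0;
	default:
		return -1;	/* unknown family: leave the address untouched */
	}
}
```

Callers then no longer need to know which sockaddr variant they hold; they pass a generic struct sockaddr pointer and a host-order port.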
Chuck Lever
cdcd7f9abc NFS: Verify IPv6 addresses properly
Add support to nfs_verify_server_address for recognizing AF_INET6
addresses.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:53 -05:00
Chuck Lever
fd00a8ff8e NFS: Add support for AF_INET6 addresses in nfs_compare_super()
Refactor nfs_compare_super() and add AF_INET6 support.

Replace the generic memcmp() to document explicitly what parts of the
addresses must match in this check, and make the comparison independent
of the lengths of both addresses.

A side benefit is both tests are more computationally efficient than a
memcmp().

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:53 -05:00
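The idea in fd00a8ff8e of comparing only the relevant address fields, per family, instead of running memcmp() over whole sockaddr structures can be sketched in userspace as below. Which fields must match (here: family, address and port) is an assumption of the sketch, and sketch_same_address() is a hypothetical name.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Return 1 when both addresses have the same family, address and port. */
static int sketch_same_address(const struct sockaddr *a,
			       const struct sockaddr *b)
{
	if (a->sa_family != b->sa_family)
		return 0;

	switch (a->sa_family) {
	case AF_INET: {
		const struct sockaddr_in *a4 = (const struct sockaddr_in *)a;
		const struct sockaddr_in *b4 = (const struct sockaddr_in *)b;

		return a4->sin_addr.s_addr == b4->sin_addr.s_addr &&
		       a4->sin_port == b4->sin_port;
	}
	case AF_INET6: {
		const struct sockaddr_in6 *a6 = (const struct sockaddr_in6 *)a;
		const struct sockaddr_in6 *b6 = (const struct sockaddr_in6 *)b;

		/* Compare just the 16 address bytes, not the whole struct. */
		return memcmp(&a6->sin6_addr, &b6->sin6_addr,
			      sizeof(a6->sin6_addr)) == 0 &&
		       a6->sin6_port == b6->sin6_port;
	}
	default:
		return 0;
	}
}
```

Because each case names the fields it checks, padding bytes and unrelated members such as the IPv6 flow label cannot influence the result, and the two addresses need not have the same length.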
Chuck Lever
3f43c6667a NFS: Address a couple of nits in nfs_follow_referral()
Clean up: fix an outdated block comment, and address a comparison
between a signed and unsigned integer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:52 -05:00
Chuck Lever
1d98fe6717 NFS: Move dprintks from callback.c to callback_proc.c
Clean up: The client side peer address is available in callback_proc.c,
so move a dprintk out of fs/nfs/callback.c and into
fs/nfs/callback_proc.c.

This is more consistent with other debugging messages, and the proc
routines have more information about each request to display.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:52 -05:00
Chuck Lever
5d8515caeb NFS: eliminate NIPQUAD(clp->cl_addr.sin_addr)
To ensure the NFS client displays IPv6 addresses properly, replace
address family-specific NIPQUAD() invocations with a call to the RPC
client to get a formatted string representing the remote peer's
address.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:52 -05:00
Chuck Lever
d4d3c50749 NFS: Enable NFS client to generate CLIENTID strings with IPv6 addresses
We recently added methods to RPC transports that provide string versions of
the remote peer address information.  Convert the NFSv4 SETCLIENTID
procedure to use those methods instead of building the client ID out of
whole cloth.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:51 -05:00
Chuck Lever
cc38bac3a0 NFS: Ensure NFSv4 SETCLIENTID send buffer is large enough
Ensure that the RPC buffer size specified for NFSv4 SETCLIENTID procedures
matches what we are encoding into the buffer.  See the definition of
struct nfs4_setclientid {} and the encode_setclientid() function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:51 -05:00
Trond Myklebust
40c553193d NFS: Remove the redundant nfs_client->cl_nfsversion
We can get the same information from the rpc_ops structure instead.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:49 -05:00
Trond Myklebust
c81468a1a7 NFS: Clean up the nfs_find_client function.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:48 -05:00
Trond Myklebust
3a498026ee NFS: Clean up the nfs_client initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:48 -05:00
Trond Myklebust
bfc69a4566 NFS: define a function to update nfsi->cache_change_attribute
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:47 -05:00
Chuck Lever
5cce428d95 NFS: Remove an unneeded check in decode_compound_header_arg()
Clean up:  The header tag length is unsigned, so checking that it is less
than zero is unnecessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:46 -05:00
Chuck Lever
d45273ed6f NFS: Clean up address comparison in __nfs_find_client()
The address comparison in the __nfs_find_client() function is deceptive.
It uses a memcmp() to check a pair of u32 fields for equality.  Not only is
this inefficient, but usually memcmp() is used for comparing two *whole*
sockaddr_in's (which includes comparisons of the address family and port
number), so it's easy to mistake the comparison here for a whole sockaddr
comparison, which it isn't.

So for clarity and efficiency, we replace the memcmp() with a simple test
for equality between the two s_addr fields.  This should have no
behavioral effect.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:46 -05:00
Chuck Lever
6a0ed1de8e NFS: Clean up: copy hostname with kstrndup during mount processing
Clean up: mount option parsing uses kstrndup in several places, rather than
using kzalloc.  Replace the few remaining uses of kzalloc with kstrndup,
for consistency.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:46 -05:00
Chuck Lever
e887cbcf91 NFS: Remove support for the 'mountprog' option
Remove the mount option that allows users to specify an alternate mountd
program number.  The client hasn't supported setting an alternate mountd
program number for a very long time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:46 -05:00
Chuck Lever
ad879cef85 NFS: Remove support for the 'nfsprog' option
Remove the mount option that allows users to specify an alternate NFS
program number.  The client hasn't supported setting an alternate NFS
program number for a very long time.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:45 -05:00
Chuck Lever
0eb2574121 NFS: Ensure that NFS version 4 mounts use NFS_PORT if nfsport wasn't set
Text-based mount option parsing introduced a minor regression in the
behavior of NFS version 4 mounts.  NFS version 4 is not supposed to require
a running rpcbind service on the server in order for a mount to succeed.

In other words, if the mount options don't specify a port number, the port
number is supposed to default to 2049.  For earlier versions of NFS, the
default port number was zero in order to cause the RPC client to autobind
to the server's NFS service.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:45 -05:00
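The port defaulting rule described in 0eb2574121 can be written down as a small decision function. This is only a sketch: sketch_pick_port() and the port_given flag are hypothetical, and how the real parser records an unset port is not shown in the commit message.

```c
#define SKETCH_NFS_PORT 2049	/* the default port named in the commit */

/*
 * Pick the port to connect to. port_given is nonzero when the mount
 * options named a port explicitly (how that is tracked is an assumption).
 */
static unsigned short sketch_pick_port(unsigned int nfs_version,
				       int port_given, unsigned short port)
{
	if (port_given)
		return port;
	if (nfs_version == 4)
		return SKETCH_NFS_PORT;	/* no rpcbind lookup required */
	return 0;			/* 0 asks the RPC client to autobind */
}
```

For an NFSv4 mount without a port option this yields 2049, so the mount does not depend on rpcbind running on the server; for earlier versions it yields 0 and leaves discovery to the RPC client.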
Chuck Lever
28c494c5c8 NFS: Prevent nfs_getattr() hang during heavy write workloads
POSIX requires that ctime and mtime, as reported by the stat(2) call,
reflect the activity of the most recent write(2).  To that end, nfs_getattr()
flushes pending dirty writes to a file before doing a GETATTR to allow the
NFS server to set the file's size, ctime, and mtime properly.

However, nfs_getattr() can be starved when a constant stream of application
writes to a file prevents nfs_wb_nocommit() from completing.  This usually
results in hangs of programs doing a stat against an NFS file that is being
written.  "ls -l" is a common victim of this behavior.

To prevent starvation, hold the file's i_mutex in nfs_getattr() to
freeze application writes temporarily so the client can more quickly obtain
clean values for a file's size, mtime, and ctime.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:45 -05:00
Chuck Lever
464ad6b1ad NFS: Change sign of some loop indices in nfs4xdr.c
Nit: Eliminate some mixed sign comparisons in loop indices.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:44 -05:00
Chuck Lever
bcecff77a9 NFS: Use unsigned intermediates for manipulating header lengths (NFSv4 XDR)
Clean up: prevent length underflow and mixed sign comparison when
unmarshalling NFS version 4 getacl, readdir, and readlink replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:44 -05:00
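The length underflow that bcecff77a9 and its NFSv3/NFSv2 siblings guard against can be illustrated with a generic sketch. The function below is hypothetical and is not the XDR decoding code; it only shows the pattern of keeping lengths unsigned and checking before subtracting.

```c
#include <stddef.h>

/*
 * Return how many payload bytes may be used: the requested count,
 * clamped to what actually arrived after the reply header.
 */
static size_t sketch_clamp_payload(size_t hdrlen, size_t received,
				   size_t count)
{
	size_t avail;

	/* Check first: received - hdrlen would wrap if the reply is short. */
	if (received < hdrlen)
		return 0;

	avail = received - hdrlen;
	return count > avail ? avail : count;
}
```

With every operand unsigned there is no signed/unsigned comparison for the compiler to convert silently, and the explicit check replaces any reliance on a negative intermediate value.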
Chuck Lever
c957c526ef NFS: Use unsigned intermediates for manipulating header lengths (NFSv3 XDR)
Clean up: prevent length underflow and mixed sign comparisons when
unmarshalling NFS version 3 read, readdir, and readlink replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:44 -05:00
Chuck Lever
6232dbbcff NFS: Use unsigned intermediates for manipulating header lengths (NFSv2 XDR)
Clean up: prevent length underflow and mixed sign comparisons when
unmarshalling NFS version 2 read, readdir, and readlink replies.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:44 -05:00
Chuck Lever
8a8c74bf94 NFS: Ensure nfs_wcc_update_inode always converts file size to loff_t
The nfs_wcc_update_inode() function omits logic to convert the type of
the NFS on-the-wire value of a file's size (__u64) to the type of file
size value stored in struct inode (loff_t, which is signed).

Everywhere else in the NFS client I checked already correctly converts the
file size type.

This affects only very large files.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:43 -05:00
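The conversion that 8a8c74bf94 says was missing, from an unsigned 64-bit wire size to a signed file offset, can be sketched as a clamp. The helper name and the exact limit are assumptions for this illustration, with int64_t standing in for loff_t.

```c
#include <stdint.h>

#define SKETCH_OFFSET_MAX INT64_MAX	/* stand-in for the largest loff_t */

/* Convert an unsigned wire size to a signed offset without going negative. */
static int64_t sketch_size_to_loff(uint64_t size)
{
	if (size > (uint64_t)SKETCH_OFFSET_MAX - 1)
		return SKETCH_OFFSET_MAX - 1;
	return (int64_t)size;
}
```

Only sizes above 2^63 - 2 are altered by the clamp, which matches the commit's remark that only very large files are affected.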
Trond Myklebust
0773769191 NFS/SUNRPC: Convert users of rpc_init_task+rpc_execute to rpc_run_task()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:39 -05:00
Trond Myklebust
5138fde011 NFS/SUNRPC: Convert all users of rpc_call_setup()
Replace use of rpc_call_setup() with rpc_init_task(), and in cases where we
need to initialise task->tk_action, with rpc_call_start().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:32 -05:00
Trond Myklebust
bdc7f021f3 NFS: Clean up the (commit|read|write)_setup() callback routines
Move the common code for setting up the nfs_write_data and nfs_read_data
structures into fs/nfs/read.c, fs/nfs/write.c and fs/nfs/direct.c.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:32 -05:00
Trond Myklebust
3ff7576dda SUNRPC: Clean up the initialisation of priority queue scheduling info.
We want the default scheduling priority (priority == 0) to remain
RPC_PRIORITY_NORMAL.

Also ensure that the priority wait queue scheduling is per process id
instead of sometimes being per thread, and sometimes being per inode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
c970aa85e7 SUNRPC: Clean up rpc_run_task
Make it use the new task initialiser structure instead of acting as a
wrapper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Trond Myklebust
84115e1cd4 SUNRPC: Cleanup of rpc_task initialisation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:30 -05:00
Steve Dickson
ef818a28fa NFS: Stop sillyname renames and unmounts from racing
Added an active/deactive mechanism to the nfs_server structure
allowing async operations to hold off umount until the
operations are done.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Trond Myklebust
2f74c0a056 NFSv4: Clean up the OPEN/CLOSE serialisation code
Reduce the time spent locking the rpc_sequence structure by queuing the
nfs_seqid only when we are ready to take the lock (when calling
nfs_wait_on_sequence).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Trond Myklebust
acee478afc NFS: Clean up the write request locking.
Ensure that we set/clear NFS_PAGE_TAG_LOCKED when the nfs_page is hashed.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:24 -05:00
Trond Myklebust
8b1f9ee56e NFS: Optimise nfs_vm_page_mkwrite()
The current model locks the page twice for no good reason. Optimise by
inlining the parts of nfs_write_begin()/nfs_write_end() that we care about.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:23 -05:00
Trond Myklebust
77f111929d NFS: Ensure that we eject stale inodes as soon as possible
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:22 -05:00
Trond Myklebust
d45b9d8baf NFS: Handle -ENOENT errors in unlink()/rmdir()/rename()
If the server returns an ENOENT error, we still need to do a d_delete() in
order to ensure that the dentry is deleted.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:22 -05:00
Trond Myklebust
609005c319 NFS: Sillyrename: in the case of a race, check aliases are really positive
In nfs_do_call_unlink() we check that we haven't raced, and that lookup()
hasn't created an aliased dentry to our sillydeleted dentry. If somebody
has deleted the file on the server and the lookup() resulted in a negative
dentry, then ignore...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:22 -05:00
Trond Myklebust
fccca7fc6a NFS: Fix a sillyrename race...
Ensure that readdir revalidates its data cache after blocking on
sillyrename.

Also fix a typo in nfs_do_call_unlink(): swap the ^= for an |=. The result
is the same, since we've already checked that the flag is unset, but it
makes the code more readable.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-30 02:05:21 -05:00
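The remark in fccca7fc6a that ^= and |= give the same result once the flag is known to be unset can be checked with a tiny sketch. SKETCH_FLAG and both function names are hypothetical.

```c
#define SKETCH_FLAG 0x4u	/* hypothetical flag bit */

/* Toggles the bit: sets it only if it was clear beforehand. */
static unsigned sketch_set_with_xor(unsigned flags)
{
	return flags ^ SKETCH_FLAG;
}

/* Sets the bit unconditionally, whatever its previous state. */
static unsigned sketch_set_with_or(unsigned flags)
{
	return flags | SKETCH_FLAG;
}
```

When the bit is clear on entry both functions return the same value. They differ only when the bit is already set, where the XOR form clears it again, which is why the OR form states the intent more plainly.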
Trond Myklebust
d0dc3701cb NFSv4: Give the lock stateid its own sequence queue
Sharing the open sequence queue causes a deadlock when we try to take
both a lock sequence id and an open sequence id.

This fixes the regression reported by Dimitri Puzin and Jeff Garzik: See

	http://bugzilla.kernel.org/show_bug.cgi?id=9712

for details.

Reported-and-tested-by: Dimitri Puzin <bugs@psycast.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Jeff Garzik <jgarzik@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-10 13:35:32 -08:00
Trond Myklebust
e6e21970ba NFSv4: Fix open_to_lock_owner sequenceid allocation...
NFSv4 file locking is currently completely broken since it doesn't respect
the OPEN sequencing when it is given an unconfirmed lock_owner and needs to
do an open_to_lock_owner. Worse: it breaks the sunrpc rules by doing a
GFP_KERNEL allocation inside an rpciod callback.

Fix is to preallocate the open seqid structure in nfs4_alloc_lockdata if we
see that the lock_owner is unconfirmed.
Then, in nfs4_lock_prepare() we wait for either the open_seqid, if
the lock_owner is still unconfirmed, or else fall back to waiting on the
standard lock_seqid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:17 -05:00
Trond Myklebust
bb22629ee8 NFSv4: nfs4_open_confirm must not set the open_owner as confirmed on error
RFC3530 states that the open_owner is confirmed if and only if the client
sends an OPEN_CONFIRM request with the appropriate sequence id and stateid
within the lease period.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:17 -05:00
Trond Myklebust
b274b48f3e NFSv4: Fix circular locking dependency in nfs4_kill_renewd
Erez Zadok reports:

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.24-rc6-unionfs2 #80
-------------------------------------------------------
umount.nfs4/4017 is trying to acquire lock:
 (&(&clp->cl_renewd)->work){--..}, at: [<c0223e53>]
__cancel_work_timer+0x83/0x17f

but task is already holding lock:
 (&clp->cl_sem){----}, at: [<f8879897>] nfs4_kill_renewd+0x17/0x29 [nfs]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #1 (&clp->cl_sem){----}:
       [<c0230699>] __lock_acquire+0x9cc/0xb95
       [<c0230c39>] lock_acquire+0x5f/0x78
       [<c0397cb8>] down_read+0x3a/0x4c
       [<f88798e6>] nfs4_renew_state+0x1c/0x1b8 [nfs]
       [<c0223821>] run_workqueue+0xd9/0x1ac
       [<c0224220>] worker_thread+0x7a/0x86
       [<c0226b49>] kthread+0x3b/0x62
       [<c02033a3>] kernel_thread_helper+0x7/0x10
       [<ffffffff>] 0xffffffff

-> #0 (&(&clp->cl_renewd)->work){--..}:
       [<c0230589>] __lock_acquire+0x8bc/0xb95
       [<c0230c39>] lock_acquire+0x5f/0x78
       [<c0223e87>] __cancel_work_timer+0xb7/0x17f
       [<c0223f5a>] cancel_delayed_work_sync+0xb/0xd
       [<f887989e>] nfs4_kill_renewd+0x1e/0x29 [nfs]
       [<f885a8f6>] nfs_free_client+0x37/0x9e [nfs]
       [<f885ab20>] nfs_put_client+0x5d/0x62 [nfs]
       [<f885ab9a>] nfs_free_server+0x75/0xae [nfs]
       [<f8862672>] nfs4_kill_super+0x27/0x2b [nfs]
       [<c0258aab>] deactivate_super+0x3f/0x51
       [<c0269668>] mntput_no_expire+0x42/0x67
       [<c025d0e4>] path_release_on_umount+0x15/0x18
       [<c0269d30>] sys_umount+0x1a3/0x1cb
       [<c0269d71>] sys_oldumount+0x19/0x1b
       [<c02026ca>] sysenter_past_esp+0x5f/0xa5
       [<ffffffff>] 0xffffffff

Looking at the code, it would seem that taking the clp->cl_sem in
nfs4_kill_renewd is completely redundant, since we're already guaranteed to
have exclusive access to the nfs_client (we're shutting down).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:16 -05:00
Trond Myklebust
e9cc6c234b NFS: Fix a possible Oops in fs/nfs/super.c
Sigh... commit 4584f520e1 (NFS: Fix NFS mountpoint crossing...) had a
slight flaw: server can be NULL if sget() returned an existing superblock.

Fix the fix by dereferencing s->s_fs_info.

Thanks to Coverity/Adrian Bunk and Frank Filz for spotting the bug.
(See http://bugzilla.kernel.org/show_bug.cgi?id=9647)

Also add in the same namespace Oops fix for NFSv4 in both the mountpoint
crossing case, and the referral case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2008-01-03 09:37:11 -05:00
Trond Myklebust
a10db50a4a NFS: Fix an Oops in NFS unmount
Ensure that the dummy 'root dentry' is invisible to d_find_alias(). If not,
then it may be spliced into the tree if a parent directory from the same
filesystem gets mounted at a later time.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-12-12 11:12:15 -05:00
Trond Myklebust
a5576cfa5c Revert "NFS: Ensure we return zero if applications attempt to write zero bytes"
This reverts commit b9148c6b80.

On Wed, 12 Dec 2007 10:57:30 -0500, Chuck Lever wrote
> commit b9148c6b should be reverted.  It was recently forward-ported
> from some years-old patches, and is clearly not needed now.
>
> On Dec 11, 2007, at 5:21 PM, Adrian Bunk wrote:
>
>> This code became dead after commit
>> b9148c6b80
>> (which BTW doesn't seem to have changed any behaviour) and can
>> therefore
>> be removed.
>>
>> Spotted by the Coverity checker.
>>
>> Signed-off-by: Adrian Bunk <bunk@kernel.org>
>>
>> ---
>> --- linux-2.6/fs/nfs/direct.c.old     2007-12-02 21:54:53.000000000 +0100
>> +++ linux-2.6/fs/nfs/direct.c 2007-12-02 21:55:10.000000000 +0100
>> @@ -897,15 +897,12 @@ ssize_t nfs_file_direct_write(struct kio
>>       if (!count)
>>               goto out;       /* return 0 */
>>
>>       retval = -EINVAL;
>>       if ((ssize_t) count < 0)
>>               goto out;
>> -     retval = 0;
>> -     if (!count)
>> -             goto out;
>>
>>       retval = nfs_sync_mapping(mapping);
>>       if (retval)
>>               goto out;
>>
>>       retval = nfs_direct_write(iocb, iov, nr_segs, pos, count);
>>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-12-12 11:08:33 -05:00
Trond Myklebust
5cef338b30 NFSv2/v3: Fix a memory leak when using -onolock
Neil Brown said:
> Hi Trond,
> 
> We found that a machine which made moderately heavy use of
> 'automount' was leaking some nfs data structures - particularly the
> 4K allocated by rpc_alloc_iostats.
> It turns out that this only happens with filesystems with -onolock
> set.

> The problem is that if NFS_MOUNT_NONLM is set, nfs_start_lockd doesn't
> set server->destroy, so when the filesystem is unmounted, the
> ->client_acl is not shutdown, and so several resources are still
> held.  Multiple mount/umount cycles will slowly eat away memory
> several pages at a time.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: NeilBrown <neilb@suse.de>
2007-12-11 22:01:56 -05:00
Trond Myklebust
4584f520e1 NFS: Fix NFS mountpoint crossing...
The check that was added to nfs_xdev_get_sb() to work around broken
servers, works fine for NFSv2, but causes mountpoint crossing on NFSv3 to
always return ESTALE.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-12-11 19:01:45 -05:00
Matthew Wilcox
150030b78a NFS: Switch from intr mount option to TASK_KILLABLE
By using the TASK_KILLABLE infrastructure, we can get rid of the 'intr'
mount option.  We have to use _killable everywhere instead of _interruptible
as we get rid of rpc_clnt_sigmask/sigunmask.

Signed-off-by: Liam R. Howlett <howlett@gmail.com>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
2007-12-06 17:40:25 -05:00
Chuck Lever
02fe494619 NFS: Clean up new multi-segment direct I/O changes
Simplify calling sequence of nfs_direct_{read,write}_schedule(), and
rename them to reflect their new role.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:40 -05:00
Chuck Lever
b9148c6b80 NFS: Ensure we return zero if applications attempt to write zero bytes
A zero byte count direct write request should be a successful no-op, not an
error.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:38 -05:00
Chuck Lever
c216fd708e NFS: Support multiple segment iovecs in the NFS direct I/O path
Allow applications to perform asynchronous scatter-gather direct I/O
to NFS files.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:36 -05:00
Chuck Lever
19f737879c NFS: Introduce iovec I/O helpers to fs/nfs/direct.c
Add helpers that iterate over multi-segment iovecs.  These will
be used to support multi-segment scatter/gather direct I/O in a
later patch.

Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:32:35 -05:00
Adrian Bunk
4c30d56edc NFS: fs/nfs/dir.c should #include "internal.h"
Every file should include the headers containing the prototypes for its global
functions (in this case nfs_access_cache_shrinker()).

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:49 -05:00
Adrian Bunk
5334eb13d4 NFS: make nfs_wb_page_priority() static
nfs_wb_page_priority() can now become static.

Signed-off-by: Adrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:48 -05:00
Russell King
f16c960332 NFS: mount failure causes bad page state
While testing a kernel based upon ecd744eec3 (with wrong boot arguments),
I got the following bad page state entry while NFS was trying to mount its
rootfs:

IP-Config: Complete:
      device=eth0, addr=192.168.1.101, mask=255.255.255.0, gw=255.255.255.255,
     host=192.168.1.101, domain=, nis-domain=(none),
     bootserver=192.168.1.100, rootserver=192.168.1.100, rootpath=
Looking up port of RPC 100003/2 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get nfsd port number from server, using default
Looking up port of RPC 100005/1 on 192.168.1.100
rpcbind: server 192.168.1.100 not responding, timed out
Root-NFS: Unable to get mountd port number from server, using default
mount: server 192.168.1.100 not responding, timed out
Root-NFS: Server returned error -5 while mounting /nfs/rootfs/
VFS: Unable to mount root fs via NFS, trying floppy.
Bad page state in process 'swapper'
page:c02b1260 flags:0x00000400 mapping:00000000 mapcount:0 count:0
Trying to fix it up, but a reboot is needed
Backtrace:
[<c0023e34>] (dump_stack+0x0/0x14) from [<c0062570>] (bad_page+0x70/0xac)
[<c0062500>] (bad_page+0x0/0xac) from [<c0064914>] (free_hot_cold_page+0x80/0x178)
[<c0064894>] (free_hot_cold_page+0x0/0x178) from [<c0064a74>] (free_hot_page+0x14/0x18)
[<c0064a60>] (free_hot_page+0x0/0x18) from [<c0067078>] (put_page+0xf8/0x154)
[<c0066f80>] (put_page+0x0/0x154) from [<c007dbc8>] (kfree+0xc8/0xd0)
[<c007db00>] (kfree+0x0/0xd0) from [<c00cbb54>] (nfs_get_sb+0x230/0x710)
[<c00cb924>] (nfs_get_sb+0x0/0x710) from [<c0084334>] (vfs_kern_mount+0x58/0xac)
[<c00842dc>] (vfs_kern_mount+0x0/0xac) from [<c00843c0>] (do_kern_mount+0x38/0xf4)
[<c0084388>] (do_kern_mount+0x0/0xf4) from [<c0099c7c>] (do_mount+0x1e8/0x614)
...

This seems to be caused by use of an uninitialised structure due to NULL
options being passed to nfs_validate_mount_data().  Ensure that the
parsed mount data is always initialised.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
     (Trond: added fix for the same bug in nfs4_validate_mount_data()).
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-26 16:24:22 -05:00
Neil Brown
4c1fe2f78a kernel BUG at fs/nfs/namespace.c:108! - can be triggered by bad server
Hi Trond,

I have discovered that the BUG_ON in nfs_follow_mountpoint:

	BUG_ON(IS_ROOT(dentry));

can be triggered by a misbehaving server.

What happens is the client does a lookup and discovers that the named
directory has a different fsid, so it initiates a mount.
It then performs a GETATTR on the mounted directory and gets a
different fsid again (due to a bug in the NFS server).
This causes nfs_follow_mountpoint to be called on the newly mounted
root, which triggers the BUG_ON.

To duplicate this, have a directory which contains some mountpoints,
and export that directory with the "crossmnt" flag using nfs-utils
1.1.1 (or 1.1.0 I think)

The GETATTR on the root of the mounted filesystem will return the
information for the top exportpoint, while a lookup will return the
correct information.  This difference causes the NFS client to BUG.

I think the best way to fix this is to trap this possibility early, so
just before completing the mount in the NFS client, check that it isn't
going to use nfs_mountpoint_inode_operations.
As long as i_op will never change once set (is that true?), this
should be adequately safe.

The following patch shows a possible approach, and it works for me.
i.e. when the NFS server is misbehaving, I get ESTALE on those
mountpoints, while when the NFS server is working correctly, I get
correct behaviour on the client.

NeilBrown

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17 13:08:48 -05:00
Trond Myklebust
b09b9417d0 NFS: Fix the ustat() regression
Since 2.6.18, the superblock sb->s_root has been a dummy dentry with a
dummy inode. This breaks ustat(), which actually uses sb->s_root in a
vfstat() call.

Fix this by making the s_root a dummy alias to the directory inode that was
used when creating the superblock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-11-17 13:08:44 -05:00
Neil Brown
432409eebc NFS: Fix for bug in handling of errors for O_DIRECT writes
Commit eda3cef8dd ("NFS: Fix error
handling in nfs_direct_write_result()") ensured that if a WRITE returns
an error, then data->res.verf->committed is not tested (as it is not
initialised).

Then commit 60fa3f769f ("NFS: Fix two bugs
in the O_DIRECT write code") inadvertently reverted this while fixing
other problems.

So move the test so that we never examine ->committed in an error case,
and fix a speeling error while we are there.

Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-23 16:41:21 -07:00
Trond Myklebust
55b70a0300 NFS: Fix a typo in nfs_call_unlink()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-21 13:37:07 -04:00
Trond Myklebust
bad2a52411 NFSv2: Ensure that the directory metadata gets revalidated on file create
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-21 13:37:02 -04:00
Linus Torvalds
c00046c279 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (74 commits)
  fix do_sys_open() prototype
  sysfs: trivial: fix sysfs_create_file kerneldoc spelling mistake
  Documentation: Fix typo in SubmitChecklist.
  Typo: depricated -> deprecated
  Add missing profile=kvm option to Documentation/kernel-parameters.txt
  fix typo about TBI in e1000 comment
  proc.txt: Add /proc/stat field
  small documentation fixes
  Fix compiler warning in smount example program from sharedsubtree.txt
  docs/sysfs: add missing word to sysfs attribute explanation
  documentation/ext3: grammar fixes
  Documentation/java.txt: typo and grammar fixes
  Documentation/filesystems/vfs.txt: typo fix
  include/asm-*/system.h: remove unused set_rmb(), set_wmb() macros
  trivial copy_data_pages() tidy up
  Fix typo in arch/x86/kernel/tsc_32.c
  file link fix for Pegasus USB net driver help
  remove unused return within void return function
  Typo fixes retrun -> return
  x86 hpet.h: remove broken links
  ...
2007-10-19 20:36:17 -07:00
Linus Torvalds
b35e704118 Avoid compile error in fs/nfs/unlink.c
Erez Zadok reports that certain configurations fail to build due to
schedule() TASK_[UN]INTERRUPTIBLE not being declared.  Add proper
include files to fix.

Cc: Erez Zadok <ezk@cs.sunysb.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 19:59:18 -07:00
Olof Johansson
e9a404580c nfs: Fix build break with CONFIG_NFS_V4=n
Signed-off-by: Olof Johansson <olof@lixom.net>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-19 19:27:46 -07:00
Jan Engelhardt
96de0e252c Convert files to UTF-8 and some cleanups
* Convert files to UTF-8.

  * Also correct some people's names
    (one example is Eißfeldt, which was found in a source file.
    Given that the author used an ß at all in a source file
    indicates that the real name has in fact a 'ß' and not an 'ss',
    which is commonly used as a substitute for 'ß' when limited to
    7bit.)

  * Correct town names (Goettingen -> Göttingen)

  * Update Eberhard Mönkeberg's address (http://lkml.org/lkml/2007/1/8/313)

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
2007-10-19 23:21:04 +02:00
Trond Myklebust
603c83da19 NFSv4: Fix an rpc_cred reference leakage in fs/nfs/delegation.c
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-19 17:19:30 -04:00
Trond Myklebust
a49c3c7736 NFSv4: Ensure that we wait for the CLOSE request to complete
Otherwise, we do end up breaking close-to-open semantics. We also end up
breaking some of the silly-rename tests in Connectathon on some setups.

Please refer to the bug-report at
	http://bugzilla.linux-nfs.org/show_bug.cgi?id=150

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-19 17:19:25 -04:00
Trond Myklebust
565277f63c NFS: Fix a race in sillyrename
lookup() and sillyrename() can race one another because the sillyrename()
completion cannot take the parent directory's inode->i_mutex since the
latter may be held by whoever is calling dput().

We therefore have little option but to add extra locking to ensure that
nfs_lookup() and nfs_atomic_open() do not race with the sillyrename
completion.
If somebody has looked up the sillyrenamed file in the meantime, we just
transfer the sillydelete information to the new dentry.

Please refer to the bug-report at
	http://bugzilla.linux-nfs.org/show_bug.cgi?id=150

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-19 17:19:16 -04:00
Trond Myklebust
61e930a904 NFS: Fix a writeback race...
This patch fixes a regression that was introduced by commit
44dd151d5c

We cannot zero the user page in nfs_mark_uptodate() any more, since

  a) We'd be modifying the page without holding the page lock
  b) We can race with other updates of the page, most notably
     because of the call to nfs_wb_page() in nfs_writepage_setup().

Instead, we do the zeroing in nfs_update_request() if we see that we're
creating a request that might potentially be marked as up to date.

Thanks to Olivier Paquet for reporting the bug and providing a test-case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-19 17:18:57 -04:00
Jeff Layton
188b95dd8e NFS: if ATTR_KILL_S*ID bits are set, then skip mode change
If the ATTR_KILL_S*ID bits are set then any mode change is only for clearing
the setuid/setgid bits.  For NFS, skip the mode change and let the server
handle it.
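
A hedged sketch of the masking step described above. The DEMO_ATTR_* bit values are placeholders picked for this example (the kernel defines its own ATTR_* constants), and demo_filter_setattr() is a hypothetical name, not a function from this patch.

```c
#include <assert.h>

/* Placeholder bit values for illustration only. */
#define DEMO_ATTR_MODE      (1U << 0)
#define DEMO_ATTR_KILL_SUID (1U << 11)
#define DEMO_ATTR_KILL_SGID (1U << 12)

/*
 * Return the attribute mask the client would act on.  If either
 * "kill" bit is set, the mode change exists only to clear
 * setuid/setgid, so it is dropped and left to the server.
 */
static unsigned int demo_filter_setattr(unsigned int ia_valid)
{
    if (ia_valid & (DEMO_ATTR_KILL_SUID | DEMO_ATTR_KILL_SGID))
        ia_valid &= ~DEMO_ATTR_MODE;
    return ia_valid;
}

/* Small usage example exercising both branches. */
static void demo_filter_setattr_example(void)
{
    /* A plain chmod keeps its mode bit. */
    assert(demo_filter_setattr(DEMO_ATTR_MODE) == DEMO_ATTR_MODE);
    /* A mode change requested only to kill setuid loses the mode bit. */
    assert(demo_filter_setattr(DEMO_ATTR_MODE | DEMO_ATTR_KILL_SUID)
           == DEMO_ATTR_KILL_SUID);
}
```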

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-18 14:37:22 -07:00
Christoph Lameter
4ba9b9d0ba Slab API: remove useless ctor parameter and reorder parameters
Slab constructors currently have a flags parameter that is never used.  And
the order of the arguments is opposite to other slab functions.  The object
pointer is placed before the kmem_cache pointer.

Convert

        ctor(void *object, struct kmem_cache *s, unsigned long flags)

to

        ctor(struct kmem_cache *s, void *object)

throughout the kernel
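
For illustration, a constructor written against the new two-argument order might look like the userspace sketch below. struct demo_cache is a hypothetical stand-in for struct kmem_cache (whose layout is private to the allocator), and zeroing the object is just an example body, not something this patch prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct kmem_cache; only carries the object size. */
struct demo_cache {
    size_t object_size;
};

/* New-style constructor: cache pointer first, object second, no flags. */
static void demo_ctor(struct demo_cache *s, void *object)
{
    assert(s != NULL && object != NULL);
    memset(object, 0, s->object_size);
}
```

Putting the cache pointer first matches the argument order of the other slab functions, which is the motivation given above.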

[akpm@linux-foundation.org: coupla fixes]
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra
c9e51e4180 mm: count reclaimable pages per BDI
Count per BDI reclaimable pages; nr_reclaimable = nr_dirty + nr_unstable.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra
e0bf68ddec mm: bdi init hooks
provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:45 -07:00
Peter Zijlstra
c4dc4beed2 nfs: remove congestion_end()
These patches aim to improve balance_dirty_pages() and directly address three
issues:
  1) inter device starvation
  2) stacked device deadlocks
  3) inter process starvation

1 and 2 are a direct result from removing the global dirty limit and using
per device dirty limits. By giving each device its own dirty limit it will
no longer starve another device, and the cyclic dependency on the dirty limit
is broken.

In order to efficiently distribute the dirty limit across the independent
devices a floating proportion is used, this will allocate a share of the total
limit proportional to the device's recent activity.
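
A deliberately simplified sketch of the proportional split. It is not the floating-proportion code from this series: there is no aging of old events, the function and parameter names are hypothetical, and returning the full limit when nothing has been recorded is an assumption made only so the example is total.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Share of total_limit given to one device, proportional to its
 * fraction of recently recorded events.  Assumes the product
 * total_limit * dev_events fits in 64 bits.
 */
static uint64_t demo_bdi_limit(uint64_t total_limit,
                               uint64_t dev_events,
                               uint64_t all_events)
{
    assert(dev_events <= all_events);
    if (all_events == 0)
        return total_limit; /* assumption: no history, no restriction */
    return total_limit * dev_events / all_events;
}
```

For example, a device responsible for 1 of the last 4 recorded events would get a quarter of a 1000-page limit in this model.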

3 is done by also scaling the dirty limit proportional to the current task's
recent dirty rate.

This patch:

nfs: remove congestion_end().  It's redundant, clear_bdi_congested() already
wakes the waiters.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 08:42:44 -07:00
Nick Piggin
4899f9c852 nfs: convert to new aops
[akpm@linux-foundation.org: fix against git-nfs]
[peterz@infradead.org: fix against git-nfs]
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-16 09:42:57 -07:00
Linus Torvalds
541010e4b8 Merge branch 'locks' of git://linux-nfs.org/~bfields/linux
* 'locks' of git://linux-nfs.org/~bfields/linux:
  nfsd: remove IS_ISMNDLCK macro
  Rework /proc/locks via seq_files and seq_list helpers
  fs/locks.c: use list_for_each_entry() instead of list_for_each()
  NFS: clean up explicit check for mandatory locks
  AFS: clean up explicit check for mandatory locks
  9PFS: clean up explicit check for mandatory locks
  GFS2: clean up explicit check for mandatory locks
  Cleanup macros for distinguishing mandatory locks
  Documentation: move locks.txt in filesystems/
  locks: add warning about mandatory locking races
  Documentation: move mandatory locking documentation to filesystems/
  locks: Fix potential OOPS in generic_setlease()
  Use list_first_entry in locks_wake_up_blocks
  locks: fix flock_lock_file() comment
  Memory shortage can result in inconsistent flocks state
  locks: kill redundant local variable
  locks: reverse order of posix_locks_conflict() arguments
2007-10-15 16:07:40 -07:00
Trond Myklebust
05c88babab NFSv4: Fix a typo in nfs_inode_reclaim_delegation
We were intending to put the previous instance of delegation->cred
before setting a new one.

Thanks to David Howells for spotting this.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-11 15:11:51 -04:00
Pavel Emelyanov
dfad9441be NFS: clean up explicit check for mandatory locks
The __mandatory_lock(inode) macro makes the same check, but makes the code
more readable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2007-10-09 18:32:46 -04:00
Trond Myklebust
f43bf0bebe NFS: Add a boot parameter to disable 64 bit inode numbers
This boot parameter will allow legacy 32-bit applications which call stat()
to continue to function even if the NFSv3/v4 server uses 64-bit inode
numbers.
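
One plausible way to squeeze a 64-bit file id into a 32-bit inode number is to XOR the two halves together, as sketched below. Whether the patch uses exactly this folding is an assumption, and demo_fold_fileid() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 64-bit file id into 32 bits by XOR-ing the high and low halves. */
static uint32_t demo_fold_fileid(uint64_t fileid)
{
    return (uint32_t)(fileid ^ (fileid >> 32));
}

/* Small usage example. */
static void demo_fold_fileid_example(void)
{
    /* Ids that already fit in 32 bits are unchanged. */
    assert(demo_fold_fileid(42) == 42);
    /* High half 1, low half 2: 1 XOR 2 == 3. */
    assert(demo_fold_fileid(0x0000000100000002ULL) == 3);
}
```

Folding is lossy, so two different 64-bit ids can map to the same 32-bit number. That is the price of keeping legacy stat() callers working.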

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:52 -04:00
Trond Myklebust
2a3f5fd459 NFS: nfs_refresh_inode should clear cache_validity flags on success
If the cached attributes match the ones supplied in the fattr, then assume
we've revalidated the inode.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:50 -04:00
Trond Myklebust
40d2470409 NFS: Fix a connectathon regression in NFSv3 and NFSv4
We're failing basic test6 against Linux servers because they lack a correct
change attribute. The fix is to assume that we always want to invalidate
the readdir caches when we call update_changeattr and/or
nfs_post_op_update_inode on a directory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:47 -04:00
Trond Myklebust
9e08a3c5ae NFS: Use nfs_refresh_inode() in ops that aren't expected to change the inode
nfs_post_op_update_inode() is really only meant to be used if we expect the
inode and its attributes to have changed in some way.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:45 -04:00
Trond Myklebust
c7c209730d NFS: Get rid of some obsolete macros
- NFS_READTIME, NFS_CHANGE_ATTR are completely unused.
- Inline the few remaining uses of NFS_ATTRTIMEO, and remove.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:23 -04:00
Trond Myklebust
4f48af4584 NFS: Simplify filehandle revalidation
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:20 -04:00
Trond Myklebust
9697d2342e NFS: Ensure that nfs_link() returns a hashed dentry
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:18 -04:00
Trond Myklebust
a12802cab8 NFS: Be strict about dentry revalidation when doing exclusive create
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:16 -04:00
Trond Myklebust
b050aa791f NFS: Don't zap the readdir caches upon error
If necessary, the caches will get zapped under normal revalidation.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:13 -04:00
Trond Myklebust
efbb06b7f9 NFS: Remove the redundant nfs_reval_fsid()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:11 -04:00
Trond Myklebust
81c768808c NFSv3: Always use directory post-op attributes in nfs3_proc_lookup
LOOKUP returns the directory post-op attributes whether or not the
operation was successful.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:08 -04:00
Trond Myklebust
d75340cc4d NFSv4: Fix nfs_atomic_open() to set the verifier on negative dentries too
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:06 -04:00
Trond Myklebust
216d5d0688 NFSv4: Use NFSv2/v3 rules for negative dentries in nfs_open_revalidate
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:03 -04:00
Trond Myklebust
0a5ebc1488 NFSv4: Don't revalidate the directory in nfs_atomic_lookup()
Why bother, since the call to nfs4_atomic_open() will do it for us.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:20:01 -04:00
Trond Myklebust
f2c77f4e62 NFS: Optimise nfs_lookup_revalidate()
We don't need to call nfs_revalidate_inode() on the directory if we already
know that the verifiers don't match.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:58 -04:00
Trond Myklebust
6d2b296686 NFS: Reset nfsi->last_updated only if the attribute changed
Otherwise set it to nfsi->read_cache_jiffies in order to prevent jiffy
wraparound issues.
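
For readers unfamiliar with the wraparound problem: a free-running tick counter eventually overflows, so a timestamp left untouched for too long can appear to lie in the future. The sketch below restates, in userspace, the signed-difference comparison used by the kernel's time_after() macro. demo_time_after() is a hypothetical name, and the cast assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * True if tick value a is later than b, even if the counter wrapped
 * in between.  Valid as long as the two values are less than half
 * the counter range apart.
 */
static bool demo_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Usage example across a wrap of the counter. */
static void demo_time_after_example(void)
{
    unsigned long before_wrap = (unsigned long)-10; /* 10 ticks before wrap */
    unsigned long after_wrap = 5;                   /* 15 ticks later */

    assert(demo_time_after(after_wrap, before_wrap));
    assert(!demo_time_after(before_wrap, after_wrap));
}
```

The comparison stops being meaningful once the two stamps drift more than half the counter range apart, which is why an old stamp needs refreshing from time to time.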

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:55 -04:00
Trond Myklebust
60ccd4ec41 NFS: Remove nfs_begin_data_update/nfs_end_data_update
The lower level routines in fs/nfs/proc.c, fs/nfs/nfs3proc.c and
fs/nfs/nfs4proc.c should already be dealing with the revalidation issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:53 -04:00
Trond Myklebust
80eb209def NFS: Remove NFS_I(inode)->data_updates
We have no more users...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:50 -04:00
Trond Myklebust
a1643a92f6 NFS: NFS_CACHEINV() should not test for nfs_caches_unstable()
The fact that we're in the process of modifying the inode does not mean
that we should not invalidate the attribute and data caches. The defensive
thing is to always invalidate when we're confronted with inode
mtime/ctime or change_attribute updates that we do not immediately
recognise.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:48 -04:00
Trond Myklebust
3258b4fa55 NFS: Remove bogus nfs_mark_for_revalidate() in nfs_lookup
The parent of the newly materialised dentry has just been revalidated...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:45 -04:00
Trond Myklebust
cf8ba45e05 NFS: don't cache the verifier across ->lookup() calls
If the ->lookup() call causes the directory verifier to change, then there
is still no need to use the old verifier, since our dentry has been
verified.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:42 -04:00
Trond Myklebust
7668fdbe9a NFS: nfs_post_op_update_inode don't update cache_change_attribute
If nfs_post_op_update_inode fails because the server didn't return any
attributes, then we let the subsequent inode revalidation update
cache_change_attribute.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:37 -04:00
Trond Myklebust
12b373ebf0 NFS: Don't revalidate dentries on directory size or ctime changes
We only need to look at the mtime changes...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:35 -04:00
Trond Myklebust
2f78e4313a NFS: Don't set cache_change_attribute in nfs_revalidate_mapping
The attribute revalidation code will already have taken care of resetting
nfsi->cache_change_attribute.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:32 -04:00
Trond Myklebust
446e534985 NFS: Fix a bug in nfs_open_revalidate()
We want to set the verifier when the call to nfs4_open_revalidate()
_succeeds_.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:30 -04:00
Trond Myklebust
d4d9cdcb47 NFS: Don't hash the negative dentry when optimising for an O_EXCL open
We don't want to leave an unverified hashed negative dentry if the
exclusive create fails to complete.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:27 -04:00
Trond Myklebust
5724ab3787 NFS: nfs_instantiate() should set the dentry verifier
That will also allow us to remove the calls in mknod and mkdir.
In addition it will ensure that symlinks set it correctly.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:25 -04:00
Trond Myklebust
fab728e156 NFS: Ensure nfs_instantiate() invalidates the parent dir on error
Also ensure that it drops the dentry in this case.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:22 -04:00
Trond Myklebust
4b841736bc NFS: Fix nfs_verify_change_attribute()
We don't care about whether or not some other process on our client is
changing the directory while we're in nfs_lookup_revalidate(), because the
dcache will take care of ensuring local atomicity.
We can therefore remove the test for nfs_caches_unstable().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:20 -04:00
Trond Myklebust
70ca88521f NFS: Fake up 'wcc' attributes to prevent cache invalidation after write
NFSv2 and v4 don't offer weak cache consistency attributes on WRITE calls.
In NFSv3, returning wcc data is optional. In all cases, we want to prevent
the client from invalidating our cached data whenever ->write_done()
attempts to update the inode attributes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:15 -04:00
Trond Myklebust
b64e8a5ef7 NFS: Remove bogus check of cache_change_attribute in nfs_update_inode
Remove the bogus 'data_stable' check in nfs_update_inode. The
cache_change_attribute tells you if the directory changed on the server,
and should have nothing to do with the file length.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:11 -04:00
Trond Myklebust
7fdc49c4e4 NFS: Fix the ESTALE "revalidation" in _nfs_revalidate_inode()
For one thing, the test NFS_ATTRTIMEO() == 0 makes no sense: we're
testing whether or not the cache timeout length is zero, which is totally
unrelated to the issue of whether or not we trust the file staleness.

Secondly, we do not want to retry the GETATTR once a file has been declared
stale by the server: we rather want to discard that inode as soon as
possible, since there are broken servers still in use out there that reuse
filehandles on new files.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:08 -04:00
Trond Myklebust
8850df999c NFS: Fix atime revalidation in read()
NFSv3 will correctly update atime on a read() call, so there is no need to
set the NFS_INO_INVALID_ATIME flag unless the call to nfs_refresh_inode()
fails.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:06 -04:00
Trond Myklebust
c481299839 NFS: Fix atime revalidation in readdir()
NFSv3 will correctly update atime on a readdir call, so there is no need to
set the NFS_INO_INVALID_ATIME flag unless the call to nfs_refresh_inode()
fails.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:03 -04:00
Trond Myklebust
57fa76f2da NFS: Don't use readdirplus data if the page cache is invalid
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:19:00 -04:00
Trond Myklebust
47aabaa7e4 NFSv4: Don't use ctime/mtime for determining when to invalidate the caches
In NFSv4 we should only be looking at the change attribute.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:57 -04:00
Trond Myklebust
17cadc9537 NFS: Don't force a dcache revalidation if nfs_wcc_update_inode succeeds
The reason is that if the weak cache consistency update was successful,
then we know that our client must be the only one that changed the
directory, and we've already updated the dcache to reflect the change.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:55 -04:00
Trond Myklebust
e323ea46d9 NFS: nfs_wcc_update_inode: directory caches are always invalidated
We must ensure that the readdir data is always invalidated whether or not
the weak cache consistency data update succeeds.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:51 -04:00
Trond Myklebust
6ecc5e8fca NFS: Fix dcache revalidation bugs
We don't need to force a dentry lookup just because we're making changes to
the directory.

Don't update nfsi->cache_change_attribute in nfs_end_data_update: that
overrides the NFSv3/v4 weak consistency checking that tells us our update
was the only one, and that tells us the dcache is still valid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:49 -04:00
Trond Myklebust
7957c1418f NFS: fix nfs_verify_change_attribute
We always want to check that the verifier and directory
cache_change_attribute match. This also allows us to remove the 'wraparound
hack' for the cache_change_attribute. If we're only checking for equality,
then we don't care about wraparound issues.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:46 -04:00
Trond Myklebust
68e8a70d3c NFS: nfs_post_op_update_inode() should call nfs_refresh_inode()
Ensure that we don't clobber the results from a more recent getattr call...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:43 -04:00
Trond Myklebust
f2115dc987 NFS: Fix over-conservative attribute invalidation in nfs_update_inode()
We should always be declaring the attribute cache as valid after having
updated it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:40 -04:00
Trond Myklebust
76b32999df NFSv4: Make NFSv4 ACCESS calls return attributes too...
It doesn't really make sense to cache an access call without also
revalidating the attributes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:38 -04:00
Trond Myklebust
af22f94ae0 NFSv4: Simplify _nfs4_do_access()
Currently, _nfs4_do_access() is just a copy of nfs_do_access() with added
conversion of the open flags into an access mask. This patch merges the
duplicate functionality.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:34 -04:00
Trond Myklebust
cd3758e37d NFS: Replace file->private_data with calls to nfs_file_open_context()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:31 -04:00
Chuck Lever
8fb559f87f NFS: Eliminate nfs_refresh_verifier()
nfs_set_verifier() and nfs_refresh_verifier() do exactly the same thing, so
replace one with the other.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:26 -04:00
Chuck Lever
77a55a1fe8 NFS: Eliminate nfs_renew_times()
The nfs_renew_times() function plants the current time in jiffies in
dentry->d_time.  But a call to nfs_renew_times() is always followed by
another call that overwrites dentry->d_time.  Get rid of the
nfs_renew_times() calls.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:24 -04:00
Chuck Lever
92f6c17825 NFS: Don't call nfs_renew_times() in nfs_dentry_iput()
Negative dentries need to be reverified after an asynchronous unlink.

Quoth Trond:

"Unfortunately I don't think that we can avoid revalidating the
resulting negative dentry since the UNLINK call is asynchronous,
and so the new verifier on the directory will only be known a
posteriori."

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:22 -04:00
Chuck Lever
bcf35617a7 NFS: Show "nointr" mount option
The default "intr" setting is different for NFS and NFSv4.  To avoid
confusion on this issue, don't hide the "nointr" option in /proc/mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:17 -04:00
Chuck Lever
6e88e0618c NFS: Verify server address before invoking in-kernel mount client
Re-order mount option sanity checking slightly to ensure we have a valid
server address *before* trying to do the mountd RPC call.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:14 -04:00
Talpey, Thomas
2cf7ff7a37 NFS: support RDMA mounts
Adds hooks to the string-based NFS mount to support an "rdma" protocol option.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:18:00 -04:00
Talpey, Thomas
56928edd5a NFS - print accurate transport protocol
Use the per-transport strings to display the transport protocol accurately.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:55 -04:00
Talpey, Thomas
0896a725a1 NFS/SUNRPC: use transport protocol naming
Instead of an { address family, raw IP protocol number }-tuple, use the
newly-defined RPC identifier when creating clients in the upper layers.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:53 -04:00
Talpey, Thomas
4f22ccc346 SUNRPC: mark bulk read/write data in xdrbuf
Adds a flag word to the xdrbuf struct which indicates any bulk
disposition of the data. This enables RPC transport providers to
marshal it efficiently/appropriately, and may enable other
optimizations.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:34 -04:00
Trond Myklebust
20c71f5e0f NFSv4: Fix a bug in nfs4_validate_mount_data()
The previous patch introduced a bug when copying the server address.

Also clarify a copy into the auth_flavors[] array: currently the two
size calculations are equivalent, but we may decide to change the size
of auth_flavors[] at some point.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:31 -04:00
Talpey, Thomas
91ea40b9c6 NFS: use in-kernel mount argument structure for nfsv4 mounts
The user-visible nfs4_mount_data does not contain sufficient data to
describe new mount options, and also is now a legacy structure. Replace
it with the internal nfs_parsed_mount_data for nfsv4 in-kernel use.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:28 -04:00
Talpey, Thomas
2283f8d6ed NFS: use in-kernel mount argument structure for nfsv[23] mounts
The user-visible nfs_mount_data does not contain sufficient data to
describe new mount options, and also is now a legacy structure. Replace
it with the internal nfs_parsed_mount_data for nfsv[23] in-kernel use.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:26 -04:00
Talpey, Thomas
6b18eaa082 NFS: move nfs_parsed_mount_data structure definition
In preparation for rearranging the nfs mount argument passing, make the
nfs_parsed_mount_data struct visible across nfs kernel files.

Signed-off-by: Tom Talpey <tmt@netapp.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:23 -04:00
Chuck Lever
fe82a183ca NFS: Convert printk's to dprintk's in fs/nfs/nfs?xdr.c
Following the recent edict to replace or remove printk's that can be
triggered en masse by remote misbehavior, convert them to dprintk's.  A few
that only occur just before a BUG are left in place.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:09 -04:00
Chuck Lever
0ac83779fa NFS: Add new 'mountaddr=' mount option
I got the 'mounthost=' option wrong - it shouldn't look for an address
value, but rather a hostname value.  However, the in-kernel mount client
and NFS client cannot resolve a hostname by themselves; they rely on
user-land to pass in the resolved address.

Create a new mount option that does take an address so that the mount
program's address can be passed in.  The mount hostname is now ignored
by the kernel.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:06 -04:00
James Lentini
aad7000735 [NFS] [PATCH] NFS: initialize default port in kernel mount client
If no mount server port number is specified, the previous change to the
kernel mount client inadvertently allows the NFS server's port number to be
used as the mount server's port number. If the user specifies an NFS
server port (-o port=x), the mount will fail.

The fix below sets the mount server's port to 0 if no mount server
port is specified by the user.

Signed-off-by: James Lentini <jlentini@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:04 -04:00
Chuck Lever
efd8340bb1 NFS: Kernel mount client should use async bind
Simplify the in-kernel mount client by using autobind instead of an
explicit call to rpc_getport_sync.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:17:01 -04:00
Jeff Layton
ddc01c0813 [NFS] [PATCH] NFS: show addr=ipaddr in /proc/mounts rather than
A minor thing, but useful when working with a server with multiple
addrs. This looks like it might also be necessary if Miklos' effort
to eliminate /etc/mtab ever comes to fruition.

When displaying mount options in /proc/mounts, the kernel prints
"addr=hostname". This info is redundant since we already have the
hostname displayed as part of the "device" section of the mount. This
patch changes it to display the IP address to which the socket is
connected.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:39 -04:00
Christoph Hellwig
f8cf3678f4 [NFS] [PATCH] nfs: tiny makefile cleanup
no need to set up foo-objs these days.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:36 -04:00
Fabio Olive Leite
c7e1596111 Re: [NFS] [PATCH] Attribute timeout handling and wrapping u32 jiffies
I would like to discuss the idea that the current checks for attribute
timeout using time_after are inadequate for 32bit architectures, since
time_after works correctly only when the two timestamps being compared
are within 2^31 jiffies of each other. The signed overflow caused by
comparing values more than 2^31 jiffies apart will flip the result,
causing incorrect assumptions of validity.

2^31 jiffies is a fairly large period of time (~25 days) when compared
to the lifetime of most kernel data structures, but for long lived NFS
mounts that can sit idle for months (think that for some reason autofs
cannot be used), it is easy to compare inode attribute timestamps with
very disparate or even bogus values (as in when jiffies have wrapped
many times, where the comparison doesn't even make sense).

Currently the code tests for attribute timeout by simply adding the
desired amount of jiffies to the stored timestamp and comparing that
with the current timestamp of obtained attribute data with time_after.
This is incorrect, as it returns true for the desired timeout period
and another full 2^31 range of jiffies.
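
The failure mode described above can be reproduced in userspace. What follows
is a minimal sketch, assuming a 32-bit jiffies counter. The helper names
(time_after32, time_in_range32, attr_timed_out_naive, attr_timed_out_range)
are illustrative stand-ins modelled on the kernel's time_after() and
time_in_range() macros; they are not taken from the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b", valid only while |a - b| < 2^31.
 * Relies on the usual two's-complement conversion to int32_t. */
static bool time_after32(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

static bool time_after_eq32(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* True if b <= a <= c in wrap-safe terms. */
static bool time_in_range32(uint32_t a, uint32_t b, uint32_t c)
{
	return time_after_eq32(a, b) && time_after_eq32(c, a);
}

/* One-sided check: "expired once now is after stamp + timeo". */
static bool attr_timed_out_naive(uint32_t now, uint32_t stamp, uint32_t timeo)
{
	return time_after32(now, stamp + timeo);
}

/* Bounded check: valid only while now lies inside [stamp, stamp + timeo]. */
static bool attr_timed_out_range(uint32_t now, uint32_t stamp, uint32_t timeo)
{
	return !time_in_range32(now, stamp, stamp + timeo);
}

/* Shows the two checks disagreeing once more than 2^31 ticks have passed. */
static void check_wrap_example(void)
{
	uint32_t stamp = 1000u, timeo = 3000u;
	uint32_t far = stamp + 0x80000000u + 5000u;

	assert(!attr_timed_out_naive(far, stamp, timeo)); /* stale data trusted */
	assert(attr_timed_out_range(far, stamp, timeo));  /* stale data rejected */
}
```

The one-sided check treats the half of the counter space "behind" the deadline
as still valid, which is why an idle mount can keep serving old attributes; the
bounded check only accepts timestamps inside the intended window.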

In testing with artificial jumps (several small jumps, not one big
crank) of the jiffies I was able to reproduce a problem found in a
server with very long lived NFS mounts, where attributes would not be
refreshed even after touching files and directories in the server:

Initial uptime:
03:42:01 up 6 min, 0 users, load average: 0.01, 0.12, 0.07

NFS volume is mounted and time is advanced:
03:38:09 up 25 days, 2 min, 0 users, load average: 1.22, 1.05, 1.08

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Dec 17 03:38 /local/A/foo/bar
-rw-r--r--  1 root root 0 Nov 22 00:36 /nfs/A/foo/bar

# touch /local/A/foo/bar

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Dec 17 03:47 /local/A/foo/bar
-rw-r--r--  1 root root 0 Nov 22 00:36 /nfs/A/foo/bar

We can see the local mtime is updated, but the NFS mount still shows
the old value. The patch below makes it work:

Initial setup...
07:11:02 up 25 days, 1 min,  0 users,  load average: 0.15, 0.03, 0.04

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:11 /local/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:11 /nfs/A/foo/bar

# touch /local/A/foo/bar

# ls -l /local/A/foo/bar /nfs/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:14 /local/A/foo/bar
-rw-r--r--  1 root root 0 Jan 11 07:14 /nfs/A/foo/bar

Signed-off-by: Fabio Olive Leite <fleite@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:33 -04:00
Peter Staubach
4e769b934e 64 bit ino support for NFS client
Hi.

Attached is a patch to modify the NFS client code to support
64 bit ino's, as appropriate for the system and the NFS
protocol version.

The code basically just expands the NFS interfaces for routines
which handle ino's from using ino_t to u64 and then uses the
fileid in the nfs_inode instead of i_ino in the inode.  The
code paths that were updated are in the getattr method and
the readdir methods.

This should be no real change on 64 bit platforms.  Since
the ino_t is an unsigned long, it would already be 64 bits
wide.

    Thanx...

           ps

Signed-off-by: Peter Staubach <staubach@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:29 -04:00
Trond Myklebust
7b159fc18d NFS: Fall back to synchronous writes when a background write errors...
This helps prevent huge queues of background writes from building up
whenever the server runs out of disk or quota space, or if someone changes
the file access modes behind our backs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:23 -04:00
Trond Myklebust
34901f70d1 NFS: Writeback optimisation
Schedule writes using WB_SYNC_NONE first, then come back for a second pass
using WB_SYNC_ALL.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:21 -04:00
Trond Myklebust
ed90ef51a3 NFS: Clean up NFS writeback flush code
The only user of nfs_sync_mapping_range() is nfs_getattr(), which uses it
to flush out the entire inode without sending a commit. We therefore
replace nfs_sync_mapping_range with a more appropriate helper.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:18 -04:00
Trond Myklebust
f758c88519 NFS: Clean up nfs_writepages()
Just call write_cache_pages directly instead of hacking the writeback
control structure in order to find out if we were called from writepages()
or directly from the VM.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:13 -04:00
Trond Myklebust
9cccef9505 NFS: Clean up write code...
The addition of nfs_page_mkwrite means that we should no longer need to
create requests inside nfs_writepage().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:11 -04:00
Trond Myklebust
94387fb1aa NFS: Add the helper nfs_vm_page_mkwrite
This is needed in order to set up a proper nfs_page request for mmapped
files.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-10-09 17:15:08 -04:00
Trond Myklebust
54af3bb543 NFS: Fix an Oops in encode_lookup()
It doesn't look as if the NFS file name limit is being initialised correctly
in the struct nfs_server. Make sure that we limit whatever is being set in
nfs_probe_fsinfo() and nfs_init_server().

Also ensure that readdirplus and nfs4_path_walk respect our file name
limits.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-28 15:36:42 -07:00
Alexey Dobriyan
49af7ee181 nfs: fix oops re sysctls and V4 support
NFS unregisters sysctls only if V4 support is compiled in.  However, the
sysctl table is not V4 specific, so unregister it always.

Steps to reproduce:

	[build nfs.ko with CONFIG_NFS_V4=n]
	modprobe nfs
	rmmod nfs
	ls /proc/sys

Unable to handle kernel paging request at ffffffff880661c0 RIP:
 [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
PGD 203067 PUD 207063 PMD 7e216067 PTE 0
Oops: 0000 [1] SMP
CPU 1
Modules linked in: lockd nfs_acl sunrpc
Pid: 3335, comm: ls Not tainted 2.6.23-rc3-bloat #2
RIP: 0010:[<ffffffff802af8e3>]  [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
RSP: 0018:ffff81007fd93e78  EFLAGS: 00010286
RAX: ffffffff880661c0 RBX: ffffffff80466370 RCX: ffffffff880661c0
RDX: 00000000000014c0 RSI: ffff81007f3ad020 RDI: ffff81007efd8b40
RBP: 0000000000000018 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: ffffffff802a8570 R12: ffffffff880661c0
R13: ffff81007e219640 R14: ffff81007efd8b40 R15: ffff81007ded7280
FS:  00002ba25ef03060(0000) GS:ffff81007ff81258(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: ffffffff880661c0 CR3: 000000007dfaf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process ls (pid: 3335, threadinfo ffff81007fd92000, task ffff81007d8a0000)
Stack:  ffff81007f3ad150 ffffffff80283f30 ffff81007fd93f48 ffff81007efd8b40
 ffff81007ee00440 0000000422222222 0000000200035593 ffffffff88037e9a
 2222222222222222 ffffffff80466500 ffff81007e416400 ffff81007e219640
Call Trace:
 [<ffffffff80283f30>] filldir+0x0/0xf0
 [<ffffffff80283f30>] filldir+0x0/0xf0
 [<ffffffff802840c7>] vfs_readdir+0xa7/0xc0
 [<ffffffff80284376>] sys_getdents+0x96/0xe0
 [<ffffffff8020bb3e>] system_call+0x7e/0x83

Code: 41 8b 14 24 85 d2 74 dc 49 8b 44 24 08 48 85 c0 74 e7 49 3b
RIP  [<ffffffff802af8e3>] proc_sys_readdir+0xd3/0x350
 RSP <ffff81007fd93e78>
CR2: ffffffff880661c0
Kernel panic - not syncing: Fatal exception

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19 11:24:18 -07:00
Trond Myklebust
1b3b4a1a2d NFS: Fix a write request leak in nfs_invalidate_page()
Ryusuke Konishi says:

The recent truncate_complete_page() clears the dirty flag from a page
before calling a_ops->invalidatepage(),
^^^^^^
static void
truncate_complete_page(struct address_space *mapping, struct page *page)
{
        ...
        cancel_dirty_page(page, PAGE_CACHE_SIZE);  <--- Inserted here at kernel 2.6.20

        if (PagePrivate(page))
                do_invalidatepage(page, 0);   ---> will call a_ops->invalidatepage()
        ...
}

and this is preventing nfs_wb_page_priority() from calling
nfs_writepage_locked() that is expected to handle the pending
request (=nfs_page) associated with the page.

int nfs_wb_page_priority(struct inode *inode, struct page *page, int how)
{
        ...
        if (clear_page_dirty_for_io(page)) {
                ret = nfs_writepage_locked(page, &wbc);
                if (ret < 0)
                        goto out;
        }
        ...
}

Since truncate_complete_page() will get rid of the page after
a_ops->invalidatepage() returns, the request (=nfs_page) associated
with the page becomes garbage in nfs_inode->nfs_page_tree.
------------------------

Fix this by ensuring that nfs_wb_page_priority() recognises that it may
also need to clear out non-dirty pages that have an nfs_page associated
with them.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:54 -04:00
Chuck Lever
7d1cca7299 NFS: change NFS mount error return when hostname/pathname too long
According to the mount(2) man page, the proper error return code for the
mount(2) system call when the special device name or the mounted-on
directory name is too long is ENAMETOOLONG.
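
As an illustration of the error convention described above, here is a minimal
userspace sketch. The helper name copy_mount_name and its signature are
hypothetical and are not the kernel's mount parsing code; the point is only
that an over-long name is rejected with ENAMETOOLONG rather than truncated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy a hostname or export path into a fixed buffer.
 * Returns 0 on success, -ENAMETOOLONG if the name does not fit. */
static int copy_mount_name(char *dst, size_t dstlen, const char *src)
{
	size_t len;

	assert(dst != NULL && src != NULL);
	len = strlen(src);

	/* Room is needed for len bytes plus the terminating NUL. */
	if (len >= dstlen)
		return -ENAMETOOLONG;

	memcpy(dst, src, len + 1);
	return 0;
}
```

Comparing with `>=` rather than `>` is what keeps space for the terminator;
getting that boundary wrong is a common source of off-by-one truncation.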

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:40 -04:00
Chuck Lever
350c73af6a NFS: Off-by-one length error in string handling
The hostname was getting truncated in the new text-based NFS mount API.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:40 -04:00
Chuck Lever
fdc6e2c8c0 NFS: Return a real error code from mount(2)
Don't filter the return code from the in-kernel rpcbind or NFS mount
clients.  Return the real error code so that callers of the new NFS
text-based mount API can apply a useful retry strategy.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:39 -04:00
Chuck Lever
fdb66ff4ac NFS: mount option parser chokes on proto=
The new text-based NFS mount option parsing logic doesn't recognize any
valid transport protocols due to a silly mistake in the protocol token
matching logic.  This prevents basic mount requests such as:

   mount.nfs server:/export /mnt -o proto=tcp

from working with the new text-based NFS mount API.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:38 -04:00
Trond Myklebust
deee9369b9 NFSv4: Ensure that we pass the correct dentry to nfs4_intent_set_file
This patch fixes an Oops that was reported by Gabriel Barazer.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:38 -04:00
Trond Myklebust
65bbf6bdbb NFSv4: Fix a typo in _nfs4_do_open_reclaim
This should fix the following Oops reported by Jeff Garzik:

kernel BUG at fs/nfs/nfs4xdr.c:1040!
invalid opcode: 0000 [1] SMP 
CPU 0 
Modules linked in: nfs lockd sunrpc af_packet
ipv6 cpufreq_ondemand acpi_cpufreq battery floppy nvram sg snd_hda_intel
ata_generic snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 snd_page_alloc e1000
firewire_ohci ata_piix i2c_core sr_mod cdrom sata_sil ahci libata sd_mod
scsi_mod ext3 jbd ehci_hcd uhci_hcd
Pid: 16353, comm: 10.10.10.1-recl Not tainted 2.6.23-rc3 #1
RIP: 0010:[<ffffffff88240980>] [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
RSP: 0018:ffff8100467c5c60  EFLAGS: 00010202
RAX: ffff81000f89b8b8 RBX: 00000000697a6f6d RCX: ffff81000f89b8b8
RDX: 0000000000000004 RSI: 0000000000000004 RDI: ffff8100467c5c80
RBP: ffff8100467c5c80 R08: ffff81000f89bc30 R09: ffff81000f89b83f
R10: 0000000000000001 R11: ffffffff881e79e0 R12: ffff81003cbd1808
R13: ffff81000f89b860 R14: ffff81005fc984e0 R15: ffffffff88240af0
FS:  0000000000000000(0000) GS:ffffffff8052a000(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002adb9e51a030 CR3: 000000007ea7e000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process 10.10.10.1-recl (pid: 16353, threadinfo ffff8100467c4000, task ffff8100038ce780)
Stack:  ffff81004aeb6a40 ffff81003cbd1808 ffff81003cbd1808 ffffffff88240b5d
 ffff81000f89b8bc ffff81005fc984e8 ffff81000f89bc30 ffff81005fc984e8
 0000000300000000 0000000000000000 0000000000000000 ffff81003cbd1800
Call Trace:
 [<ffffffff88240b5d>] :nfs:nfs4_xdr_enc_open_noattr+0x6d/0x90
 [<ffffffff881e74b7>] :sunrpc:rpcauth_wrap_req+0x97/0xf0
 [<ffffffff88240af0>] :nfs:nfs4_xdr_enc_open_noattr+0x0/0x90
 [<ffffffff881df57a>] :sunrpc:call_transmit+0x18a/0x290
 [<ffffffff881e5e7b>] :sunrpc:__rpc_execute+0x6b/0x290
 [<ffffffff881dff76>] :sunrpc:rpc_do_run_task+0x76/0xd0
 [<ffffffff882373f6>] :nfs:_nfs4_proc_open+0x76/0x230
 [<ffffffff88237a2e>] :nfs:nfs4_open_recover_helper+0x5e/0xc0
 [<ffffffff88237b74>] :nfs:nfs4_open_recover+0xe4/0x120
 [<ffffffff88238e14>] :nfs:nfs4_open_reclaim+0xa4/0xf0
 [<ffffffff882413c5>] :nfs:nfs4_reclaim_open_state+0x55/0x1b0
 [<ffffffff882417ea>] :nfs:reclaimer+0x2ca/0x390
 [<ffffffff88241520>] :nfs:reclaimer+0x0/0x390
 [<ffffffff8024e59b>] kthread+0x4b/0x80
 [<ffffffff8020cad8>] child_rip+0xa/0x12
 [<ffffffff8024e550>] kthread+0x0/0x80
 [<ffffffff8020cace>] child_rip+0x0/0x12


Code: 0f 0b eb fe 48 89 ef c7 00 00 00 00 02 be 08 00 00 00 e8 79 
RIP  [<ffffffff88240980>] :nfs:encode_open+0x1c0/0x330
 RSP <ffff8100467c5c60>

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:37 -04:00
Trond Myklebust
560aef7450 NFS: Fix use of cancel_delayed_work_sync in nfs_release_automount_timer
Doh! We can't use cancel_delayed_work_sync because we may have been called
from an unmount that was being performed by nfs_automount_task.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-09-01 10:14:36 -04:00
Trond Myklebust
e89a5a43b9 NFS: Fix the mount regression
This avoids the recent NFS mount regression (returning EBUSY when
mounting the same filesystem twice with different parameters).

The best I can do given the constraints appears to be to have the kernel
first look for a superblock that matches both the fsid and the
user-specified mount options, and then spawn off a new superblock if
that search fails.

Note that this is not the same as specifying nosharecache everywhere
since nosharecache will never attempt to match an existing superblock.
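
The lookup order described above can be sketched as follows. This is a
simplified userspace model, not the kernel's superblock code: the structures
(mount_opts, superblock), the fixed-size table and the function get_sb_model
are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of user-specified mount options. */
struct mount_opts {
	unsigned int flags, rsize, wsize;
};

struct superblock {
	uint64_t fsid;
	struct mount_opts opts;
	bool in_use;
};

#define MAX_SB 8
static struct superblock sb_table[MAX_SB];

static bool opts_equal(const struct mount_opts *a, const struct mount_opts *b)
{
	return a->flags == b->flags && a->rsize == b->rsize &&
	       a->wsize == b->wsize;
}

/* Reuse a superblock only if both fsid and options match; otherwise create
 * a new one instead of failing. Returns NULL when the table is full. */
static struct superblock *get_sb_model(uint64_t fsid,
				       const struct mount_opts *opts)
{
	struct superblock *free_slot = NULL;
	size_t i;

	assert(opts != NULL);
	for (i = 0; i < MAX_SB; i++) {
		struct superblock *sb = &sb_table[i];

		if (!sb->in_use) {
			if (free_slot == NULL)
				free_slot = sb;
			continue;
		}
		if (sb->fsid == fsid && opts_equal(&sb->opts, opts))
			return sb;	/* shared: same fs, same options */
	}
	if (free_slot == NULL)
		return NULL;
	free_slot->fsid = fsid;
	free_slot->opts = *opts;
	free_slot->in_use = true;
	return free_slot;		/* new superblock for this option set */
}
```

In this model a second mount of the same filesystem with identical options
shares the existing entry, while a mount with different options gets its own
entry rather than an EBUSY-style failure.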

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: Hua Zhong <hzhong@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-31 20:26:45 -07:00
Trond Myklebust
3d39c691ff NFS: Replace flush_scheduled_work with cancel_work_sync() and friends
This will avoid deadlocks of the form:

stack backtrace:
 [<c0104fda>] show_trace_log_lvl+0x1a/0x30
 [<c0105c02>] show_trace+0x12/0x20
 [<c0105d15>] dump_stack+0x15/0x20
 [<c013ee42>] __lock_acquire+0xc22/0x1030
 [<c013f2b1>] lock_acquire+0x61/0x80
 [<c012edd9>] flush_workqueue+0x49/0x70
 [<c012ee0d>] flush_scheduled_work+0xd/0x10
 [<dcf55c0c>] nfs_release_automount_timer+0x2c/0x30 [nfs]
 [<dcf45d8e>] nfs_free_server+0x9e/0xd0 [nfs]
 [<dcf4e626>] nfs_kill_super+0x16/0x20 [nfs]
 [<c017b38d>] deactivate_super+0x7d/0xa0
 [<c018f94b>] mntput_no_expire+0x4b/0x80
 [<c018fd94>] expire_mount_list+0xe4/0x140
 [<c0191219>] mark_mounts_for_expiry+0x99/0xb0
 [<dcf55d1d>] nfs_expire_automounts+0xd/0x40 [nfs]
 [<c012e61b>] run_workqueue+0x12b/0x1e0
 [<c012f05b>] worker_thread+0x9b/0x100
 [<c0131c72>] kthread+0x42/0x70
 [<c0104c0f>] kernel_thread_helper+0x7/0x18
 =======================

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-08-07 16:12:50 -04:00
Trond Myklebust
905f8d16e3 NFSv4: Don't call put_rpccred() from an rcu callback
Doing so would require us to introduce bh-safe locks into put_rpccred().
This patch fixes the lockdep complaint reported by Marc Dietrich:

inconsistent {softirq-on-W} -> {in-softirq-W} usage.
swapper/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
 (rpc_credcache_lock){-+..}, at: [<c01dc487>] _atomic_dec_and_lock+0x17/0x60
{softirq-on-W} state was registered at:
  [<c013e870>] __lock_acquire+0x650/0x1030
  [<c013f2b1>] lock_acquire+0x61/0x80
  [<c02db9ac>] _spin_lock+0x2c/0x40
  [<c01dc487>] _atomic_dec_and_lock+0x17/0x60
  [<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
  [<dced56c1>] rpcauth_unbindcred+0x21/0x60 [sunrpc]
  [<dced3fd4>] a0 [sunrpc]
  [<dcecefe0>] rpc_call_sync+0x30/0x40 [sunrpc]
  [<dcedc73b>] rpcb_register+0xdb/0x180 [sunrpc]
  [<dced65b3>] svc_register+0x93/0x160 [sunrpc]
  [<dced6ebe>] __svc_create+0x1ee/0x220 [sunrpc]
  [<dced7053>] svc_create+0x13/0x20 [sunrpc]
  [<dcf6d722>] nfs_callback_up+0x82/0x120 [nfs]
  [<dcf48f36>] nfs_get_client+0x176/0x390 [nfs]
  [<dcf49181>] nfs4_set_client+0x31/0x190 [nfs]
  [<dcf49983>] nfs4_create_server+0x63/0x3b0 [nfs]
  [<dcf52426>] nfs4_get_sb+0x346/0x5b0 [nfs]
  [<c017b444>] vfs_kern_mount+0x94/0x110
  [<c0190a62>] do_mount+0x1f2/0x7d0
  [<c01910a6>] sys_mount+0x66/0xa0
  [<c0104046>] syscall_call+0x7/0xb
  [<ffffffff>] 0xffffffff
irq event stamp: 5277830
hardirqs last  enabled at (5277830): [<c017530a>] kmem_cache_free+0x8a/0xc0
hardirqs last disabled at (5277829): [<c01752d2>] kmem_cache_free+0x52/0xc0
softirqs last  enabled at (5277798): [<c0124173>] __do_softirq+0xa3/0xc0
softirqs last disabled at (5277817): [<c01241d7>] do_softirq+0x47/0x50

other info that might help us debug this:
no locks held by swapper/0.

stack backtrace:
 [<c0104fda>] show_trace_log_lvl+0x1a/0x30
 [<c0105c02>] show_trace+0x12/0x20
 [<c0105d15>] dump_stack+0x15/0x20
 [<c013ccc3>] print_usage_bug+0x153/0x160
 [<c013d8b9>] mark_lock+0x449/0x620
 [<c013e824>] __lock_acquire+0x604/0x1030
 [<c013f2b1>] lock_acquire+0x61/0x80
 [<c02db9ac>] _spin_lock+0x2c/0x40
 [<c01dc487>] _atomic_dec_and_lock+0x17/0x60
 [<dced55fd>] put_rpccred+0x5d/0x100 [sunrpc]
 [<dcf6bf83>] nfs_free_delegation_callback+0x13/0x20 [nfs]
 [<c012f9ea>] __rcu_process_callbacks+0x6a/0x1c0
 [<c012fb52>] rcu_process_callbacks+0x12/0x30
 [<c0124218>] tasklet_action+0x38/0x80
 [<c0124125>] __do_softirq+0x55/0xc0
 [<c01241d7>] do_softirq+0x47/0x50
 [<c0124605>] irq_exit+0x35/0x40
 [<c0112463>] smp_apic_timer_interrupt+0x43/0x80
 [<c0104a77>] apic_timer_interrupt+0x33/0x38
 [<c02690df>] cpuidle_idle_call+0x6f/0x90
 [<c01023c3>] cpu_idle+0x43/0x70
 [<c02d8c27>] rest_init+0x47/0x50
 [<c03bcb6a>] start_kernel+0x22a/0x2b0
 [<00000000>] 0x0
 =======================

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-08-07 15:15:57 -04:00
Trond Myklebust
45328c354e NFS: Fix NFSv4 open stateid regressions
Do not allow cached open for O_RDONLY or O_WRONLY unless the file has been
previously opened in these modes.

Also fix the calculation of the mode in nfs4_close_prepare. We should only
issue an OPEN_DOWNGRADE if we're sure that we will still be holding the
correct open modes. This may not be the case if we've been doing delegated
opens.

Finally, there is no need to adjust the open mode bit flags in
nfs4_close_done(): that has already been done in nfs4_close_prepare().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-08-07 15:13:19 -04:00
Trond Myklebust
ba683031fa NFSv4: Fix a locking regression in nfs4_set_mode_locked()
We don't really need to clear &state->inode_states inside
nfs4_set_mode_locked, and doing so without holding the inode->i_lock would
in any case be a bug...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-08-07 15:13:18 -04:00
Trond Myklebust
5e11934d13 NFS: Fix put_nfs_open_context
We need to grab the inode->i_lock atomically with the last reference put in
order to remove the open context that is being freed from the
nfsi->open_files list.

Fix by converting the kref to a standard atomic counter and then using
atomic_dec_and_lock()...
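
As a rough illustration of the pattern described above, here is a userspace
sketch of "take the lock atomically with the last reference put". It uses
C11 atomics and a pthread mutex in place of the kernel's atomic_t and
spinlock. The names struct ctx, list_lock, dec_and_lock() and put_ctx() are
hypothetical stand-ins, not the code from this patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an object that sits on a lock-protected list. */
struct ctx {
    atomic_int refcount;
    bool on_list;                /* protected by list_lock */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Userspace analogue of atomic_dec_and_lock(): returns true with the lock
 * held only if the count dropped to zero; otherwise returns false with the
 * lock not held.
 */
static bool dec_and_lock(atomic_int *cnt, pthread_mutex_t *lock)
{
    int old = atomic_load(cnt);

    assert(old > 0);             /* a put without a matching get is a bug */

    /* Fast path: not the last reference, so the lock is not needed. */
    while (old > 1) {
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return false;
    }

    /* Slow path: this may be the last reference; decide under the lock. */
    pthread_mutex_lock(lock);
    if (atomic_fetch_sub(cnt, 1) == 1)
        return true;
    pthread_mutex_unlock(lock);
    return false;
}

/* Drop a reference; unlink from the list atomically with the final put. */
static bool put_ctx(struct ctx *c)
{
    if (!dec_and_lock(&c->refcount, &list_lock))
        return false;
    c->on_list = false;          /* nobody can find the object any more */
    pthread_mutex_unlock(&list_lock);
    return true;                 /* the caller may now free the object */
}
```

Because the final decrement happens with the lock held, a list walker that
takes list_lock can never observe an entry whose count has already reached
zero.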

Thanks to Arnd Bergmann for pointing out the problem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-08-07 15:13:17 -04:00
Al Viro
41089644c1 fix broken handling of port=... in NFS option parsing
Obviously broken on little-endian; fortunately, the option is not
frequently used...
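
The message does not spell out the defect, so the following is only a
generic userspace sketch of the class of bug: a port parsed as a host-order
integer has to be converted with htons() before it is stored in sin_port,
otherwise the two bytes end up swapped on little-endian machines. The helper
name set_port() is hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Store a host-order port number (e.g. parsed from "port=2049") into a
 * socket address. sin_port must hold the value in network byte order.
 */
static void set_port(struct sockaddr_in *sin, unsigned int port)
{
    assert(port <= 65535);
    sin->sin_port = htons((uint16_t)port);
}
```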

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ Hey, sparse is wonderful, but even better than sparse is having people
  like Al that actually _run_ it and fix bugs using it.    - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-22 11:15:18 -07:00
Paul Mundt
20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Jeff Layton
0a87cf128f NFSv4: handle lack of clientaddr in option string
If an NFSv4 mount is attempted with string-based options, and the
option string doesn't contain a clientaddr= option, the kernel will
currently oops. Check for this situation and return a proper error.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:21:40 -04:00
Benny Halevy
f9d888fcd9 NFSv4: debug print ntohl(status) in nfs client callback xdr code
status in nfs client callback xdr code is passed in network order.
print it in host order for better readability.
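
A minimal userspace sketch of the idea, assuming a 32-bit status that is
held in network byte order; format_status() is a hypothetical helper, not
the kernel's dprintk() call site.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Format a wire-order (big-endian) 32-bit status for a debug message. */
static int format_status(char *buf, size_t len, uint32_t wire_status)
{
    assert(buf != NULL && len > 0);
    /* Convert to host order first, so the number reads the same on
     * little-endian and big-endian machines. */
    return snprintf(buf, len, "status %u",
                    (unsigned int)ntohl(wire_status));
}
```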

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:21:40 -04:00
Trond Myklebust
e4eff1a622 SUNRPC: Clean up the sillyrename code
Fix a couple of bugs:
 - Don't rely on the parent dentry still being valid when the call completes.
   Fixes a race with shrink_dcache_for_umount_subtree()

 - Don't remove the file if the filehandle has been labelled as stale.

Fix a couple of inefficiencies:
 - Remove the global list of sillyrenamed files. Instead we can cache the
   sillyrename information in the dentry->d_fsdata
 - Move common code from unlink_setup/unlink_done into fs/nfs/unlink.c

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:21:39 -04:00
Trond Myklebust
4fdc17b2a7 NFS: Introduce struct nfs_removeargs+nfs_removeres
We need a common structure for setting up an unlink() rpc call in order to
fix the asynchronous unlink code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:21:39 -04:00
Trond Myklebust
3062c532ad NFS: Use dentry->d_time to store the parent directory verifier.
This will free up the d_fsdata field for other use.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:21:39 -04:00
Trond Myklebust
e3a535e173 NFSv4: Fix the nfsv4 readlink reply buffer alignment
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:09:04 -04:00
Trond Myklebust
d6ac02dfaa NFSv4: Fix the readdir reply buffer alignment
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:09:04 -04:00
Trond Myklebust
9104a55dc3 NFSv4: More NFSv4 xdr cleanups
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:09:04 -04:00
Trond Myklebust
9936781d01 NFSv4: Try to recover from getfh failures in nfs4_xdr_dec_open
Try harder to recover the open state if the server failed to return a
filehandle.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:09:03 -04:00
Trond Myklebust
56659e9926 NFSv4: 'constify' lookup arguments.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:09:03 -04:00
Trond Myklebust
365c8f589a NFSv4: Don't fail nfs4_xdr_dec_open if decode_restorefh() failed
We can already easily recover from that inside _nfs4_proc_open().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:09:03 -04:00
Trond Myklebust
6f220ed5a8 NFSv4: Fix open state recovery
Ensure that opendata->state is always initialised when we do state
recovery.

Ensure that we set the filehandle in the case where we're doing an
"OPEN_CLAIM_PREVIOUS" call due to a server reboot.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-19 15:09:03 -04:00
J. Bruce Fields
6d34ac199a locks: make posix_test_lock() interface more consistent
Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or
presence of a conflict lock by setting fl_type to, respectively, F_UNLCK
or something other than F_UNLCK, the return value is no longer needed.
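
The same convention is visible from userspace through fcntl(F_GETLK). The
sketch below is an illustration of that convention only (has_conflict() is a
hypothetical helper, and this is not the in-kernel posix_test_lock()
interface): the answer is read from l_type, while the return value only
reports whether the call itself failed.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if another process holds a lock that conflicts with a
 * whole-file lock of the given type, 0 if not, and -1 on error.
 */
static int has_conflict(int fd, short type)
{
    struct flock fl = {
        .l_type = type,          /* F_RDLCK or F_WRLCK */
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,              /* 0 means "up to end of file" */
    };

    assert(type == F_RDLCK || type == F_WRLCK);
    if (fcntl(fd, F_GETLK, &fl) == -1)
        return -1;
    /* F_UNLCK means no conflict; anything else describes the blocker. */
    return fl.l_type != F_UNLCK;
}
```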

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-07-18 19:17:19 -04:00
J. Bruce Fields
370f6599e8 nfs: disable leases over NFS
As Peter Staubach says elsewhere
(http://marc.info/?l=linux-kernel&m=118113649526444&w=2):

> The problem is that some file system such as NFSv2 and NFSv3 do
> not have sufficient support to be able to support leases correctly.
> In particular for these two file systems, there is no over the wire
> protocol support.
>
> Currently, these two file systems fail the fcntl(F_SETLEASE) call
> accidentally, due to a reference counting difference.  These file
> systems should fail more consciously, with a proper error to
> indicate that the call is invalid for them.

Define an nfs setlease method that just returns -EINVAL.

If someone can demonstrate a real need, perhaps we could reenable
them in the presence of the "nolock" mount option.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: Peter Staubach <staubach@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-18 19:17:19 -04:00
Rafael J. Wysocki
8314418629 Freezer: make kernel threads nonfreezable by default
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves.  This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (ie.  to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE.  It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear.  Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:02 -07:00
Rusty Russell
8e1f936b73 mm: clean up and kernelify shrinker registration
I can never remember what the function to register to receive VM pressure
is called.  I have to trace down from __alloc_pages() to find it.

It's called "set_shrinker()", and it needs Your Help.

1) Don't hide struct shrinker.  It contains no magic.
2) Don't allocate "struct shrinker".  It's not helpful.
3) Call them "register_shrinker" and "unregister_shrinker".
4) Call the function "shrink" not "shrinker".
5) Reduce the 17 lines of waffly comments to 13, but document it properly.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: David Chinner <dgc@sgi.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-17 10:23:00 -07:00
Pavel Emelianov
259902ea95 Make NFS client use seq_list_xxx helpers
This includes /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes entries.

Both need to show the header and use the list_head.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-16 09:05:42 -07:00
Frank Filz
137d6acaa6 NFSv4: Make sure unlock is really an unlock when cancelling a lock
I ran into a curious issue when a lock is being canceled. The
cancellation results in a lock request to the vfs layer instead of an
unlock request. This is particularly insidious when the process that
owns the lock is exiting. In that case, sometimes the erroneous lock is
applied AFTER the process has entered zombie state, preventing the lock
from ever being released. Eventually other processes block on the lock
causing a slow degradation of the system. In the 2.6.16 kernel on which
this was investigated, the problem is compounded by the fact that the cl_sem
is held while blocking on the vfs lock, which results in most processes
accessing the nfs file system in question hanging.

In more detail, here is how the situation occurs:

first _nfs4_do_setlk():

static int _nfs4_do_setlk(struct nfs4_state *state, int cmd, struct file_lock *fl, int reclaim)
...
        ret = nfs4_wait_for_completion_rpc_task(task);
        if (ret == 0) {
...
        } else
                data->cancelled = 1;

then nfs4_lock_release():

static void nfs4_lock_release(void *calldata)
...
        if (data->cancelled != 0) {
                struct rpc_task *task;
                task = nfs4_do_unlck(&data->fl, data->ctx, data->lsp,
                                data->arg.lock_seqid);

The problem is the same file_lock that was passed in to _nfs4_do_setlk()
gets passed to nfs4_do_unlck() from nfs4_lock_release(). So the type is
still F_RDLCK or F_WRLCK, not F_UNLCK. At some point, when cancelling the
lock, the type needs to be changed to F_UNLCK. It seemed easiest to do
that in nfs4_do_unlck(), but it could be done in nfs4_lock_release().
The concern I had with doing it there was if something still needed the
original file_lock, though it turns out the original file_lock still
needs to be modified by nfs4_do_unlck() because nfs4_do_unlck() uses the
original file_lock to pass to the vfs layer, and a copy of the original
file_lock for the RPC request.

It seems like the simplest solution is to force all situations where
nfs4_do_unlck() is being used to result in an unlock, so with that in
mind, I made the following change:

Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:49 -04:00
Trond Myklebust
6f2e64d3e1 NFSv4: Make the NFS state model work with the nosharedcache mount option
Consider the case where the user has mounted the remote filesystem
server:/foo on the two local directories /bar and /baz using the
nosharedcache mount option. The files /bar/file and /baz/file are
represented by different inodes in the local namespace, but refer to the
same file /foo/file on the server.
Consider the case where a process opens both /bar/file and /baz/file, then
closes /bar/file: because the nfs4_state is not shared between /bar/file
and /baz/file, the kernel will see that the nfs4_state for /bar/file is no
longer referenced, so it will send off a CLOSE rpc call. Unless the
open_owners differ, then that CLOSE call will invalidate the open state on
/baz/file too.

Conclusion: we cannot share open state owners between two different
non-shared mount instances of the same filesystem.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Trond Myklebust
275a5d24bf NFS: Error when mounting the same filesystem with different options
Unless the user sets the NFS_MOUNT_NOSHAREDCACHE mount flag, we should
return EBUSY if the filesystem is already mounted on a superblock that
has set conflicting mount options.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Trond Myklebust
75180df2ed NFS: Add the mount option "nosharecache"
Prior to David Howells' mount changes in 2.6.18, users who mounted
different directories which happened to be from the same filesystem on the
server would get different super blocks, and hence could choose different
mount options. As long as there were no hard linked files that crossed from
one subtree to another, this was quite safe.
Post the changes, if the two directories are on the same filesystem (have
the same 'fsid'), they will share the same super block, and hence the same
mount options.

Add a flag to allow users to elect not to share the NFS super block with
another mount point, even if the fsids are the same. This will allow
users to set different mount options for the two different super blocks, as
was previously possible. It is still up to the user to ensure that there
are no cache coherency issues when doing this, however the default
behaviour will be to share super blocks whenever two paths result in
the same fsid.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever
8007122520 NFS: Add support for mounting NFSv4 file systems with string options
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever
136d558ce7 NFS: Add final pieces to support in-kernel mount option parsing
Hook in final components required for supporting in-kernel mount option
parsing for NFSv2 and NFSv3 mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:48 -04:00
Chuck Lever
0076d7b7ba NFS: Introduce generic mount client API
For NFSv2 and v3 mounts, the first step is to contact the server's MOUNTD
and request the file handle for the root of the mounted share.  Add a
function to the NFS client that handles this operation.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever
bf0fd7680f NFS: Add enums and match tables for mount option parsing
This generic infrastructure works for both NFS and NFSv4 mounts.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever
013a8c1ab5 NFS: Improve debugging output in NFS in-kernel mount client
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:47 -04:00
Chuck Lever
19207231c9 NFS: Clean up in-kernel NFS mount
Clean up white space and coding conventions.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
3ea97309e6 NFS: Remake nfsroot_mount as a permanent part of NFS client
In preparation for supporting NFSv2 and NFSv3 mount option handling in the
kernel NFS client, convert mount_clnt.c to be a permanent part of the NFS
client, instead of built only when CONFIG_ROOT_NFS is enabled.

In addition, we also replace the "struct sockaddr_in *" argument with
something more generic, to help support IPv6 at some later point.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
43780b87fa SUNRPC: Add a convenient default for the hostname when calling rpc_create()
A couple of callers just use a stringified IP address for the rpc client's
hostname.  Move the logic for constructing this into rpc_create(), so it can
be shared.
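
The fallback can be sketched in userspace as follows, assuming an IPv4
address. pick_hostname() is a hypothetical helper and inet_ntop() stands in
for whatever formatting rpc_create() uses internally.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>

/*
 * Pick the name to use for the client: the caller's servername if one was
 * given, otherwise the dotted-quad form of the IPv4 address.
 */
static const char *pick_hostname(const char *servername,
                                 const struct sockaddr_in *sin,
                                 char *buf, size_t len)
{
    assert(sin != NULL && buf != NULL);
    if (servername != NULL)
        return servername;
    /* inet_ntop() returns buf on success and NULL on failure. */
    return inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t)len);
}
```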

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
cce63cd637 SUNRPC: Rename rpcb_getport_external routine
In preparation for handling NFS mount option parsing in the kernel,
rename rpcb_getport_external as rpcb_get_port_sync, and make it available
always (instead of only when CONFIG_ROOT_NFS is enabled).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:46 -04:00
Chuck Lever
f0768ebd09 NFS: Introduce nfs4_validate_mount_options
Refactor NFSv4 mount processing to break out mount data validation
in the same way it's broken out in the NFSv2/v3 mount path.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever
5df36e78da NFS: Clean up nfs_validate_mount_data
Move error handling code out of the main code path.  The switch statement
was also improperly indented, according to Documentation/CodingStyle.  This
prepares nfs_validate_mount_data for the addition of option string parsing.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:45 -04:00
Chuck Lever
fc50d58fd0 NFS: Clean-up: Refactor IP address sanity checks in NFS client
NFS and NFSv4 mounts can now share server address sanity checking.  And, it
provides an easy mechanism for adding IPv6 address checking at some later
point.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
4d81cd1611 NFS: Clean-up: fix a compiler warning in fs/nfs/super.c
/home/cel/linux/fs/nfs/super.c: In function 'nfs_pseudoflavour_to_name':
/home/cel/linux/fs/nfs/super.c:270: warning: comparison between signed and unsigned

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
0655960f76 NFS: Clean up error handling in nfs_get_sb
The error return logic in nfs_get_sb now matches nfs4_get_sb, and is more maintainable.
A subsequent patch will take advantage of this simplification.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
29eb981a3b NFS: Clean-up: Replace nfs_copy_user_string with strndup_user
The new string utility function strndup_user can be used instead of
nfs_copy_user_string, eliminating an unnecessary duplication of function.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
5680d48be8 NFS: Clean-up: Define macros for maximum host and export path name lengths
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Chuck Lever
9eaa67c6a5 NFS: Clean-up: use correct type when converting NFS blocks to local blocks
inode->i_blocks is a blkcnt_t these days, which can be a u64 or unsigned
long, depending on the setting of CONFIG_LSF.
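
Why the width of the type matters can be shown with a small userspace
sketch. The function names are hypothetical, and uint64_t and uint32_t stand
in for a wide and a narrow block count type. This is not the code from the
patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a size in bytes to 512-byte blocks, rounding up, using a result
 * type wide enough to hold the full count.
 */
static uint64_t bytes_to_blocks(uint64_t bytes)
{
    assert(bytes <= UINT64_MAX - 511);   /* keep the rounding from wrapping */
    return (bytes + 511) >> 9;
}

/* The same conversion squeezed through a 32-bit type, for comparison. */
static uint32_t bytes_to_blocks_narrow(uint64_t bytes)
{
    return (uint32_t)bytes_to_blocks(bytes);  /* silently drops high bits */
}
```

For a file of 2^41 + 1 bytes the wide version yields 2^32 + 1 blocks, while
the narrow one wraps around to 1.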

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:44 -04:00
Trond Myklebust
8bda4e4c98 NFSv4: Fix up stateid locking...
We really don't need to grab both the state->so_owner and the
inode->i_lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
1ac7e2fd35 NFSv4: Clean up the callers of nfs4_open_recover_helper()
Rely on nfs4_try_open_cached() when appropriate.

Also fix an RCU violation in _nfs4_do_open_reclaim()

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
6ee4126890 NFSv4: Don't call OPEN if we already have an open stateid for a file
If we already have a stateid with the correct open mode for a given file,
then we can reuse that stateid instead of re-issuing an OPEN call without
violating the close-to-open caching semantics.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
aac00a8d0a NFSv4: Check for the existence of a delegation in nfs4_open_prepare()
We should not be calling open() on an inode that has a delegation unless
we're doing a reclaim.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:43 -04:00
Trond Myklebust
3e309914a1 NFSv4: Clean up _nfs4_proc_open()
Use a flag instead of the 'data->rpc_status = -ENOMEM' hack.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:42 -04:00
Trond Myklebust
1b370bc28f NFSv4: Allow nfs4_opendata_to_nfs4_state to return errors.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:42 -04:00
Trond Myklebust
6f43ddccb3 NFSv4: Improve the debugging of bad sequence id errors...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:42 -04:00
Trond Myklebust
003707c722 NFSv4: Always use the delegation if we have one
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust
0f9f95e0ad NFSv4: Clean up confirmation of sequence ids...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust
412c77cee6 NFSv4: Defer inode revalidation when setting up a delegation
Currently we force a synchronous call to __nfs_revalidate_inode() in
nfs_inode_set_delegation(). This not only ensures that we cannot call
nfs_inode_set_delegation from an asynchronous context, but it also slows
down any call to open().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust
8383e4602c NFSv4: Use RCU to protect delegations
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust
13437e12fb NFSv4: Support recalling delegations by stateid part 2
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:41 -04:00
Trond Myklebust
9016302784 NFSv4: Support recalling delegations by stateid
There appear to be some rogue servers out there that issue multiple
delegations with different stateids for the same file. Ensure that when we
return delegations, we do so on a per-stateid basis rather than a per-file
basis.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust
2ced46c270 NFSv4: Fix up a bug in nfs4_open_recover()
Don't clobber the delegation info...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust
549d6ed5e8 NFSv4: set the delegation in nfs4_opendata_to_nfs4_state
This ensures that nfs4_open_release() and nfs4_open_confirm_release()
can now handle an eventual delegation that was returned with our open.
As such, it fixes a delegation "leak" when the user breaks out of an open
call.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust
1c816efa24 NFSv4: Fix a bug in __nfs4_find_state_byowner
The test for state->state == 0 does not tell you that the stateid is in the
process of being freed. It really tells you that the stateid is not yet
initialised...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust
1b45c46cf7 NFSv4: Fix atomic open for execute...
Currently we do not check for the FMODE_EXEC flag as we should. For that
particular case, we need to perform an ACCESS call to the server in order
to check that the file is executable.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:40 -04:00
Trond Myklebust
9f958ab885 NFSv4: Reduce the chances of an open_owner identifier collision
Currently we just use a 32-bit counter.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust
88d9093997 NFSv4: nfs_increment_open_seqid should not return a value
It is a void function...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust
e6889620e8 NFSv4: Fix underestimate of NFSv4 lookup request size
Also fix up the underestimate of fs_locations

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust
2cebf82883 NFSv4: Fix the underestimate of NFSv4 open request size
The maximum size depends on the filename size and a number of other
elements which are currently not being counted.
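
For reference, the filename's contribution to such an estimate follows from
the generic XDR encoding rule for variable-length data: one 4-byte length
word, then the bytes padded to a 4-byte boundary. The helper below is a
sketch of that rule only. xdr_opaque_words() is a hypothetical name, not one
of the kernel's size macros.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size, in 4-byte XDR words, of a variable-length opaque or string of
 * `len` bytes: one length word plus the data padded to a word boundary.
 */
static uint32_t xdr_opaque_words(uint32_t len)
{
    assert(len <= UINT32_MAX - 3);       /* keep the padding from wrapping */
    return 1 + ((len + 3) >> 2);
}
```

A 255-byte filename therefore needs 65 words (260 bytes) on the wire, not
255 bytes.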

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust
bd625ba80d NFSv4: Fix the NFSv4 owner and owner_group size estimates
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:39 -04:00
Trond Myklebust
7af654f8d1 NFSv4: Don't reuse expired nfs4_state_owner structs
That just confuses certain NFSv4 servers.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust
27b3f949b7 NFSv4: Fix a credential reference leak in nfs4_get_state_owner()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust
587142f85f NFS: Replace NFS_I(inode)->req_lock with inode->i_lock
There is no justification for keeping a special spinlock for the exclusive
use of the NFS writeback code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust
4e56e082dd NFSv4: Clean up _nfs4_proc_lookup() vs _nfs4_proc_lookupfh()
They differ only slightly in the arguments they take. Why have they not
been merged?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:38 -04:00
Trond Myklebust
1be27f3660 SUNRPC: Remove the tk_auth macro...
We should almost always be dereferencing the rpc_auth struct by means of the
credential's cr_auth field instead of the rpc_clnt->cl_auth anyway. Fix up
that historical mistake, and remove the macro that propagated it.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:37 -04:00
Trond Myklebust
f61534dfd3 SUNRPC: Remove redundant calls to rpciod_up()/rpciod_down()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:30 -04:00
Trond Myklebust
90c5755ff5 SUNRPC: Kill rpc_clnt->cl_oneshot
Replace it with explicit calls to rpc_shutdown_client() or
rpc_destroy_client() (for the case of asynchronous calls).

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:29 -04:00
Trond Myklebust
c6d00e639b NFSv4: Convert struct nfs4_opendata to use struct kref
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:28 -04:00
Trond Myklebust
3bec63db55 NFS: Convert struct nfs_open_context to use a kref
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:27 -04:00
Trond Myklebust
edc05fc1c2 NFS: reduce latency by using conditional rescheduling in nfs_scan_list
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:27 -04:00
Trond Myklebust
dce34ce298 NFS: Prevent integer overflow in nfs_scan_list()
Also ensure that nfs_inode ncommit and npages are large enough to represent
all possible values for the number of pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:27 -04:00
Trond Myklebust
2aefa10431 NFS: Remove the redundant 'dirty' and 'commit' lists from nfs_inode
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust
5c36968343 NFS cleanup: speed up nfs_scan_commit using radix tree tags
Add a tag for requests that are waiting for a COMMIT

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust
9fd367f0f3 NFS cleanup: Rename NFS_PAGE_TAG_WRITEBACK to NFS_PAGE_TAG_LOCKED
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust
c03b402461 NFS: Convert struct nfs_page to use krefs
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust
a50f7951a3 NFS: Fix an Oops in the nfs_access_cache_shrinker()
The nfs_access_cache_shrinker may race with nfs_access_zap_cache().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust
e2f032e9ef NFS: nfs3_proc_create() should use nfs_post_op_update_inode()
Also get rid of a redundant call to nfs_setattr_update_inode(). The call to
nfs3_proc_setattr() already takes care of that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Jeff Layton
aa53ed541a NFS4: on a O_EXCL OPEN make sure SETATTR sets the fields holding the verifier
The Linux NFS4 client simply skips over the bitmask in an O_EXCL open
call and so it doesn't bother to reset any fields that may be holding
the verifier. This patch has us save the first two words of the bitmask
(which is all the current client has #defines for). The client then
later checks this bitmask and turns on the appropriate flags in the
sattr->ia_verify field for the following SETATTR call.

This patch only currently checks to see if the server used the atime
and mtime slots for the verifier (which is what the Linux server uses
for this). I'm not sure of what other fields the server could
reasonably use, but adding checks for others should be trivial.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust
fc6ae3cf48 NFS: Re-enable forced umounts
They disappeared some time around 2.6.18.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Jeff Layton
83d93f2229 NFS: Use GFP_HIGHUSER for page allocation in nfs_symlink()
nfs_symlink() allocates a GFP_KERNEL page for the pagecache. Most
pagecache pages are allocated using GFP_HIGHUSER, and there's no reason
not to do that in nfs_symlink() as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust
a0356862bc NFS: Fix nfs_reval_fsid()
We don't need to revalidate the fsid on the root directory. It suffices to
revalidate it on the current directory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
b39e625b6e NFSv4: Clean up nfs4_call_async()
Use rpc_run_task() instead of doing it ourselves.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
4a35bd41af NFSv4: Ensure that nfs4_do_close() doesn't race with umount
nfs4_do_close() does not currently have any way to ensure that the user
won't attempt to unmount the partition while the asynchronous RPC call
is completing. This again may cause Oopses in nfs_update_inode().

Add a vfsmount argument to nfs4_close_state to ensure that the partition
remains mounted while we're closing the file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
ad389da79f NFSv4: Ensure asynchronous open() calls always pin the mountpoint
A number of race conditions may currently ensue if the user presses ^C
and then unmounts the partition while an asynchronous open() is in
progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
539cd03a57 NFSv4: Cleanup: pass the nfs_open_context to open recovery code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
88be9f990f NFS: Replace vfsmount and dentry in nfs_open_context with struct path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust
de05a0cc2a NFS: Minor read optimisation...
Since PG_uptodate may now end up getting set during the call to
nfs_wb_page(), we can avoid putting a read request on the wire in those
situations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust
44dd151d5c NFS: Don't mark a written page as uptodate until it is on disk
The write may fail, so we should not mark the page as uptodate until we are
certain that the data has been accepted and written to disk by the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust
d9df8d6b38 NFS: Don't fail an O_DIRECT read/write if get_user_pages() returns pages
There is no need to fail the entire O_DIRECT read/write just because
get_user_pages() returned fewer pages than we requested.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Chuck Lever
070ea60214 NFS: Clean ups in fs/nfs/direct.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Jens Axboe
f0930fffa9 sendfile: convert nfs to using splice_read()
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2007-07-10 08:04:14 +02:00
Trond Myklebust
b4946ffb18 NFS: Fix a refcount leakage in O_DIRECT
The current code is leaking a reference to dreq->kref when the calls to
nfs_direct_read_schedule() and nfs_direct_write_schedule() return an
error.
This patch moves the call to kref_put() from nfs_direct_wait() back into
nfs_direct_read() and nfs_direct_write() (which are the functions that
actually took the reference in the first place) fixing the leak.
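
A minimal userspace sketch of the pattern described above, not the actual fs/nfs/direct.c code. The names struct request, request_get(), request_put(), schedule_io(), leaky_submit() and balanced_submit() are hypothetical stand-ins. The point is only that the function which takes the reference should also drop it, so an early error return cannot skip the put.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kref-counted direct I/O request. */
struct request {
    int refcount;
};

static void request_get(struct request *req) { req->refcount++; }
static void request_put(struct request *req) { req->refcount--; }

/* Hypothetical scheduling step that can fail before any I/O is queued. */
static int schedule_io(struct request *req, bool fail)
{
    (void)req;
    return fail ? -5 : 0;   /* -5 plays the role of -EIO */
}

/* Hypothetical wait step; in the leaky variant it also drops the reference. */
static void wait_io(struct request *req) { (void)req; }
static void wait_io_and_put(struct request *req) { request_put(req); }

/* Leaky shape: the put lives in the wait path, which an error skips. */
static int leaky_submit(struct request *req, bool fail)
{
    int err;

    request_get(req);
    err = schedule_io(req, fail);
    if (err)
        return err;         /* the reference taken above is never dropped */
    wait_io_and_put(req);
    return 0;
}

/* Balanced shape: get and put sit in the same function. */
static int balanced_submit(struct request *req, bool fail)
{
    int err;

    request_get(req);
    err = schedule_io(req, fail);
    if (!err)
        wait_io(req);
    request_put(req);       /* dropped on both the success and error paths */
    return err;
}
```

With a request that starts at refcount 1, leaky_submit() leaves the count at 2 after a scheduling failure, while balanced_submit() returns it to 1 either way.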

Thanks to Denis V. Lunev for spotting the bug and proposing the original
fix.

Acked-by: Denis V. Lunev <dlunev@gmail.com>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-30 16:26:01 -04:00
Trond Myklebust
d4a8f3677f NFS: Fix nfs_direct_dirty_pages()
We only need to dirty the pages that were actually read in.

Also convert nfs_direct_dirty_pages() to call set_page_dirty() instead of
set_page_dirty_lock(). A call to lock_page() is unacceptable in an rpciod
callback function.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-24 11:18:18 -04:00
Chuck Lever
749e146e01 NFS: Fix handful of compiler warnings in direct.c
This patch fixes a couple of signedness issues that were causing an Oops
when running the LTP diotest4 test. get_user_pages() returns a signed
error, hence we need to be careful when comparing with the unsigned
number of pages from data->npages.
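
A small userspace sketch of the comparison hazard, assuming a negative errno-style return and an unsigned page count. The function names short_pin_buggy() and short_pin_fixed() are hypothetical and are not taken from direct.c. The cast in the buggy variant only makes explicit the conversion that C performs implicitly when an int is compared with an unsigned int.

```c
#include <stdbool.h>

/*
 * Buggy check: the signed result is converted to unsigned before the
 * comparison, so a negative error code becomes a very large number and
 * the "fewer pages than requested" test does not fire.
 */
static bool short_pin_buggy(int result, unsigned int npages)
{
    return (unsigned int)result < npages;
}

/* Fixed check: handle the error case before comparing magnitudes. */
static bool short_pin_fixed(int result, unsigned int npages)
{
    return result < 0 || (unsigned int)result < npages;
}
```

For a result of -14 and a request for 8 pages, the buggy check reports no shortfall, while the fixed check reports one.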

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-24 10:44:20 -04:00
Trond Myklebust
7fe7f8487a NFS: Avoid a deadlock situation on write
When processes are allowed to attempt to lock a non-contiguous range of nfs
write requests, it is possible for generic_writepages to 'wrap round' the
address space, and call writepage() on a request that is already locked by
the same process.

We avoid the deadlock by checking if the page index is contiguous with the
list of nfs write requests that is already held in our
nfs_pageio_descriptor prior to attempting to lock a new request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-24 10:44:20 -04:00
Alexey Dobriyan
e8edc6e03a Detach sched.h from mm.h
The first thing mm.h does is include sched.h, solely for the can_do_mlock()
inline function, which dereferences "current". By dealing with can_do_mlock(),
mm.h can be detached from sched.h, which is good. See below for why.

This patch
a) removes unconditional inclusion of sched.h from mm.h
b) makes can_do_mlock() normal function in mm/mlock.c
c) exports can_do_mlock() to not break compilation
d) adds sched.h inclusions back to files that were getting it indirectly.
e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
   getting them indirectly

Net result is:
a) mm.h users would get less code to open, read, preprocess, parse, ... if
   they don't need sched.h
b) sched.h stops being a dependency for a significant number of files:
   on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
   after patch it's only 3744 (-8.3%).

Cross-compile tested on

	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
	alpha alpha-up
	arm
	i386 i386-up i386-defconfig i386-allnoconfig
	ia64 ia64-up
	m68k
	mips
	parisc parisc-up
	powerpc powerpc-up
	s390 s390-up
	sparc sparc-up
	sparc64 sparc64-up
	um-x86_64
	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig

as well as my two usual configs.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-21 09:18:19 -07:00
Trond Myklebust
dd504ea16f Merge branch 'master' of /home/trondmy/repositories/git/linux-2.6/
2007-05-17 11:36:59 -04:00
Christoph Lameter
a35afb830f Remove SLAB_CTOR_CONSTRUCTOR
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Steven French <sfrench@us.ibm.com>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Dave Kleikamp <shaggy@austin.ibm.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Anton Altaparmakov <aia21@cantab.net>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jan Kara <jack@ucw.cz>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-17 05:23:04 -07:00
Trond Myklebust
2e42c3e2ae NFS: Fix more sparse warnings
- fs/nfs/nfs4xdr.c:2499:42: warning: incorrect type in argument 2
   (different signedness)
 - fs/nfs/nfs4xdr.c:2658:49: warning: incorrect type in argument 4
   (different explicit signedness)
 - fs/nfs/nfs4xdr.c:2683:50: warning: incorrect type in argument 4
   (different explicit signedness)
 - fs/nfs/nfs4xdr.c:3063:68: warning: incorrect type in argument 4
   (different explicit signedness)
 - fs/nfs/nfs4xdr.c:3065:68: warning: incorrect type in argument 4
   (different explicit signedness)

 - fs/nfs/callback_xdr.c:138:31: warning: incorrect type in argument 2
   (different signedness)

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-14 19:33:46 -04:00
Trond Myklebust
10afec9081 NFS: Fix some 'sparse' warnings...
- fs/nfs/dir.c:610:8: warning: symbol 'nfs_llseek_dir' was not declared.
   Should it be static?
 - fs/nfs/dir.c:636:5: warning: symbol 'nfs_fsync_dir' was not declared.
   Should it be static?
 - fs/nfs/write.c:925:19: warning: symbol 'req' shadows an earlier one
 - fs/nfs/write.c:61:6: warning: symbol 'nfs_commit_rcu_free' was not
   declared. Should it be static?
 - fs/nfs/nfs4proc.c:793:5: warning: symbol 'nfs4_recover_expired_lease'
   was not declared. Should it be static?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-14 19:33:46 -04:00
Trond Myklebust
8ae20abdd1 NFS4: Fix incorrect use of sizeof() in fs/nfs/nfs4xdr.c
The XDR code should not depend on the physical allocation size of
structures like nfs4_stateid and nfs4_verifier since those may have to
change at some future date. We therefore replace all uses of
sizeof() with constants like NFS4_VERIFIER_SIZE and NFS4_STATEID_SIZE.

This also has the side-effect of fixing some warnings of the type
	format ‘%u’ expects type ‘unsigned int’, but argument X has type
		‘long unsigned int’
on 64-bit systems
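
A minimal sketch of why the wire size and sizeof() can diverge, assuming an 8-byte verifier on the wire. WIRE_VERIFIER_SIZE, struct verifier and encode_verifier() are hypothetical names chosen for illustration, and the extra "cached" field is invented to stand in for any future in-memory bookkeeping.

```c
#include <stddef.h>
#include <string.h>

#define WIRE_VERIFIER_SIZE 8   /* hypothetical on-the-wire size in bytes */

/* In-memory layout: free to grow extra fields and padding over time. */
struct verifier {
    unsigned char data[WIRE_VERIFIER_SIZE];
    int cached;                /* hypothetical field, not part of the wire format */
};

/*
 * Encode exactly the wire-format bytes. Using sizeof(struct verifier)
 * here would also copy 'cached' and any padding into the stream.
 */
static size_t encode_verifier(unsigned char *out, const struct verifier *v)
{
    memcpy(out, v->data, WIRE_VERIFIER_SIZE);
    return WIRE_VERIFIER_SIZE;
}
```

Because the constant is a plain integer rather than a size_t expression, it also prints with an int-sized format specifier without the 64-bit warning quoted above.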

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-14 19:33:45 -04:00
Nate Diller
60945cb7c8 NFS: use zero_user_page
Use zero_user_page() instead of the newly deprecated memclear_highpage_flush().

Signed-off-by: Nate Diller <nate.diller@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-14 19:33:45 -04:00
Jesper Juhl
7a13e93228 NFS: Kill the obsolete NFS_PARANOIA
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09 17:58:01 -04:00
Milind Arun Choudhary
fee7f23fea NFS: use __set_current_state()
use __set_current_state(TASK_*) instead of current->state = TASK_*, in fs/nfs

Signed-off-by: Milind Arun Choudhary <milindchoudhary@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09 17:58:01 -04:00
Chuck Lever
e4cc6ee2e4 NFS: Clean up NFSv4 XDR error message
Make it more useful for debugging purposes.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09 17:58:00 -04:00
Chuck Lever
6ce7dc9407 NFS: NFS client underestimates how large an NFSv4 SETATTR reply can be
The maximum size of an NFSv4 SETATTR compound reply should include the
GETATTR operation that we send.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09 17:58:00 -04:00
Trond Myklebust
e70c490810 NFS: Remove redundant check in nfs_check_verifier()
The check for nfs_attribute_timeout(dir) in nfs_check_verifier is
redundant: nfs_lookup_revalidate() will already call nfs_revalidate_inode()
on the parent dir when necessary.

The only case where this is not done is the case of a negative dentry. Fix
this case by moving up the revalidation code.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09 17:57:59 -04:00
Trond Myklebust
e62c2bba1f NFS: Fix a jiffie wraparound issue
dentry verifiers are always set to the parent directory's
cache_change_attribute. There is no reason to be testing for anything other
than equality when we're trying to find out if the dentry has been checked
since the last time the directory was modified.
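
A userspace sketch of the wraparound hazard, assuming the earlier test was some form of ordered comparison (the exact pre-patch expression is not shown here). The names checked_since_ordered() and checked_since_equal() are hypothetical.

```c
#include <limits.h>
#include <stdbool.h>

/*
 * Ordered test: treats "verifier is at least as new as the directory's
 * change attribute" as valid. Once the counter wraps, an old verifier
 * near ULONG_MAX compares greater than a fresh small value.
 */
static bool checked_since_ordered(unsigned long verifier,
                                  unsigned long change_attr)
{
    return verifier >= change_attr;
}

/*
 * Equality test: the verifier is only ever set to the directory's
 * change attribute, so any difference means the directory has changed.
 */
static bool checked_since_equal(unsigned long verifier,
                                unsigned long change_attr)
{
    return verifier == change_attr;
}
```

With a verifier of ULONG_MAX - 1 and a change attribute that has wrapped to 5, the ordered test still accepts the stale dentry, while the equality test rejects it.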

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-09 17:57:58 -04:00
Peter Zijlstra
277866a0e3 nfs: fix congestion control: use atomic_longs
Change the atomic_t in struct nfs_server to atomic_long_t in anticipation
of machines that can handle 8+TB of (4K) pages under writeback.
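
The 8 TB figure follows from the width of atomic_t, which wraps a 32-bit int. A short worked calculation, using a hypothetical helper name and assuming 4 KiB pages as stated above:

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096ULL   /* 4 KiB pages, as in the text above */

/*
 * Bytes of writeback that a signed 32-bit page counter can represent
 * before it overflows: 2^31 pages * 2^12 bytes per page = 2^43 bytes.
 */
static uint64_t max_trackable_bytes(void)
{
    return ((uint64_t)INT32_MAX + 1) * PAGE_SIZE_BYTES;
}
```

2^43 bytes is 8 TiB, which is where a plain int counter of pages under writeback would run out.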

However I suspect other things in NFS will start going *bang* by then.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:21 -07:00
Randy Dunlap
e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Linus Torvalds
2d56d3c43c Merge branch 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux
* 'server-cluster-locking-api' of git://linux-nfs.org/~bfields/linux:
  gfs2: nfs lock support for gfs2
  lockd: add code to handle deferred lock requests
  lockd: always preallocate block in nlmsvc_lock()
  lockd: handle test_lock deferrals
  lockd: pass cookie in nlmsvc_testlock
  lockd: handle fl_grant callbacks
  lockd: save lock state on deferral
  locks: add fl_grant callback for asynchronous lock return
  nfsd4: Convert NFSv4 to new lock interface
  locks: add lock cancel command
  locks: allow {vfs,posix}_lock_file to return conflicting lock
  locks: factor out generic/filesystem switch from setlock code
  locks: factor out generic/filesystem switch from test_lock
  locks: give posix_test_lock same interface as ->lock
  locks: make ->lock release private data before returning in GETLK case
  locks: create posix-to-flock helper functions
  locks: trivial removal of unnecessary parentheses
2007-05-07 12:34:24 -07:00
Christoph Lameter
50953fe9e0 slab allocators: Remove SLAB_DEBUG_INITIAL flag
I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
SLAB.

I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again?  The callback is
performed before each freeing of an object.

I would think that it is much easier to check the object state manually
before the free.  That also places the check near the code that
manipulates the object.

Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on.  If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
in the kernel.  I think SLAB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e.  add debug code before kfree).

There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches.  Remove the pointless checks (they would even be
pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

This is the last slab flag that SLUB did not support.  Remove the check for
unimplemented flags from SLUB.

Signed-off-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:57 -07:00
Nick Piggin
6fe6900e1e mm: make read_cache_page synchronous
Ensure pages are uptodate after returning from read_cache_page, which allows
us to cut out most of the filesystem-internal PageUptodate calls.

I didn't have a great look down the call chains, but this appears to fix 7
possible use-before uptodate in hfs, 2 in hfsplus, 1 in jfs, a few in
ecryptfs, 1 in jffs2, and a possible cleared data overwritten with readpage in
block2mtd.  All depending on whether the filler is async and/or can return
with a !uptodate page.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-07 12:12:51 -07:00
Marc Eshel
9d6a8c5c21 locks: give posix_test_lock same interface as ->lock
posix_test_lock() and ->lock() do the same job but have gratuitously
different interfaces.  Modify posix_test_lock() so the two agree,
simplifying some code in the process.

Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
2007-05-06 17:39:00 -04:00
J. Bruce Fields
70cc6487a4 locks: make ->lock release private data before returning in GETLK case
The file_lock argument to ->lock is used to return the conflicting lock
when found.  There's no reason for the filesystem to return any private
information with this conflicting lock, but nfsv4 does.

Fix nfsv4 client, and modify locks.c to stop calling fl_release_private
for it in this case.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Cc: "Trond Myklebust" <Trond.Myklebust@netapp.com>
2007-05-06 17:38:19 -04:00
Trond Myklebust
84dde76c4a NFS: Fix a compile glitch on 64-bit systems
fs/nfs/pagelist.c:226: error: conflicting types for 'nfs_pageio_init'
include/linux/nfs_page.h:80: error: previous declaration of 'nfs_pageio_init' was here

Thanks to Andrew for spotting this...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-04 14:44:06 -04:00
Jason Uhlenkott
a19b89cad5 NFS: Clean up nfs_create_request comments
Remove some stale comments about hard limits which went away in 2.5.

Signed-off-by: Jason Uhlenkott <juhlenko@akamai.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-02 07:37:29 -07:00
J. Bruce Fields
08efa202eb NFS4: invalidate cached acl on setacl
The ACL that the server sets may not be exactly the one we set--for
example, it may silently turn off bits that it does not support.  So we
should remove any cached ACL so that any subsequent request for the ACL
will go to the server.

Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-05-02 07:36:09 -07:00
Neil Brown
83672d392f NFS: Fix directory caching problem - with test case and patch.
Try running this script in an NFS mounted directory (Client relatively
recent - 2.6.18 has the problem as does 2.6.20).

------------------------------------------------------
#!/bin/bash
#
# This script will produce the following error message from tar:
#
#   tar: newdir/innerdir/innerfile: file changed as we read it

# create dirs
rm -rf nfstest
mkdir -p nfstest/dir/innerdir

# create files (should not be empty)
echo "Hello World!" >nfstest/dir/file
echo "Hello World!" >nfstest/dir/innerdir/innerfile

# problem only happens if we sleep before chmod
sleep 1

# change file modes
chmod -R a+r nfstest

# rename dir
mv nfstest/dir nfstest/newdir

# tar it
tar -cf nfstest/nfstest.tar -C nfstest newdir

# restore old dir name
mv nfstest/newdir nfstest/dir
--------------------------------------------------------

What happens:

The 'chmod -R' does a readdir_plus in each directory and the results
get cached in the page cache.  It then updates the ctime on each file
by one second.  When this happens, the post-op attributes are used to
update the ctime stored on the client to match the value in the kernel.

The 'mv' calls shrink_dcache_parent on the directory tree which
flushes all the dentries (so a new lookup will be required) but
doesn't flush the inodes or pagecache.

The 'tar' does a readdir on each directory, but (in the case of
'innerdir' at least) satisfies it from the pagecache and uses the
READDIRPLUS data to update all the inodes.  In the case of
'innerdir/innerfile', the ctime is out of date.

'tar' then calls 'lstat' on innerdir/innerfile getting an old ctime.
It then opens the file (triggering a GETATTR), reads the content, and
then calls fstat to see if anything has changed.  It finds that ctime
has changed and so complains.

The problem seems to be that the cache readdirplus info is kept around
for too long.

My patch below discards pagecache data for directories when
dentry_iput is called on them.  This effectively removes the symptom
which convinces me that I correctly understand the problem.  However
I'm not convinced that is a proper solution, as there could easily be
other races that trigger the same problem without being affected by
this 'fix'.

One possibility would be to require that readdirplus pagecache data be
only used *once* to instantiate an inode.  Somehow it should then be
invalidated so that if the dentry subsequently disappears, it will
cause a new request to the server to fill in the stat data.

Another possibility is to compare the cache_change_attribute on the
inode with something similar for the readdirplus info and reject the
info from readdirplus if it is too old.

I haven't tried to implement these and would value other opinions
before I do.

Thanks,
NeilBrown


Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:19 -07:00
Neil Brown
1f4eab7e7c NFS: Set meaningful value for fattr->time_start in readdirplus results.
Don't use an uninitialised value for fattr->time_start in readdirplus results.

The 'fattr' structure filled in by nfs3_decode_dirent does not get a
value for ->time_start set.
Thus if an entry is for an inode that we already have in cache,
when nfs_readdir_lookup calls nfs_fhget, it will call nfs_refresh_inode
and may update the inode with out-of-date information.

Directories are read a page at a time, so each page could have a
different timestamp that "should" be used to set the time_start for
the fattr for info in that page.  However storing the timestamp per
page is awkward.  (We could stick in the first 4 bytes and only read 4092
bytes, but that is a bigger code change than I am interested in).

This patch ignores the readdir_plus attributes if a readdir finds the
information already in cache, and otherwise sets ->time_start to the time
the readdir request was sent to the server.

It might be nice to store - in the directory inode - the time stamp for
the earliest readdir request that is still in the page cache, so that we
don't ignore attribute data that we don't have to.  This patch doesn't do
that.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:18 -07:00
Steve Dickson
74dd34e6e8 NFS: Added support to turn off the NFSv3 READDIRPLUS RPC.
READDIRPLUS can be a performance hindrance when the client is working with
large directories. In addition, some servers still have bugs in their
implementations (e.g. Tru64 returns wrong values for the fsid).

Add a mount flag to enable users to turn it off at mount time following the
implementation in Apple's NFS client.

Signed-off-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:16 -07:00
Chuck Lever
df8b172a88 NFS: switch NFSROOT to use new rpcbind client
It is arguable whether NFSROOT will support IPv6, and thus whether
rpcb_getport_external needs to support rpcbind versions greater than 2.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:14 -07:00
Chuck Lever
2bea90d43a SUNRPC: RPC buffer size estimates are too large
The RPC buffer size estimation logic in net/sunrpc/clnt.c always
significantly overestimates the requirements for the buffer size.
A little instrumentation demonstrated that in fact rpc_malloc was never
allocating the buffer from the mempool, but almost always called kmalloc.

To compute the size of the RPC buffer more precisely, split p_bufsiz into
two fields; one for the argument size, and one for the result size.

Then, compute the sum of the exact call and reply header sizes, and split
the RPC buffer precisely between the two.  That should keep almost all RPC
buffers within the 2KiB buffer mempool limit.

And, we can finally be rid of RPC_SLACK_SPACE!

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:10 -07:00
Trond Myklebust
ca52fec152 NFS: Use pgoff_t in structures and functions that pass page cache offsets
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:09 -07:00
Trond Myklebust
724c439c20 NFS: Clean up nfs_sync_mapping_wait()
It has no business touching wbc->pages_skipped.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:08 -07:00
Trond Myklebust
8d5658c949 NFS: Fix a buffer overflow in the allocation of struct nfs_read/writedata
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:07 -07:00
Trond Myklebust
c63c7b0513 NFS: Fix a race when doing NFS write coalescing
Currently we do write coalescing in a very inefficient manner: one pass in
generic_writepages() in order to lock the pages for writing, then one pass
in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather
the locked pages for coalescing into RPC requests of size "wsize".

In fact, it turns out there is actually a deadlock possible here since we
only start I/O on the second pass. If the user signals the process while
we're in nfs_sync_mapping_wait(), for instance, then we may exit before
starting I/O on all the requests that have been queued up.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:06 -07:00
Trond Myklebust
8b09bee308 NFS: Cleanup for nfs_readpages()
Do the coalescing of read requests into block sized requests at start of
I/O as we scan through the pages instead of going through a second pass.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:05 -07:00
Trond Myklebust
bcb71bba7e NFS: Another cleanup of the read/write request coalescing code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:04 -07:00
Trond Myklebust
d8a5ad75cc NFS: Cleanup the coalescing code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:04 -07:00
Trond Myklebust
91e59c368c NFS: Don't wait for congestion in nfs_update_request()
It is redundant, and will interfere with the call to
balance_dirty_pages_ratelimited_nr in generic_file_write().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:03 -07:00
Amnon Aaronsohn
1a0ba9ae48 NFS: statfs error-handling fix
The nfs statfs function returns a success code on error, and fills the
output buffer with invalid values.  The attached patch makes it return a
correct error code instead.

Signed-off-by: Amnon Aaronsohn <amnonaar@gmail.com>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
 (Modified patch to reinstate the dprintk())
2007-04-30 22:17:02 -07:00
Trond Myklebust
d585158b60 NFS: Fix nfs_set_page_dirty()
Be more careful about testing page->mapping.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-04-30 22:17:02 -07:00
Trond Myklebust
2b82f190c8 NFS: Fix race in nfs_set_page_dirty
Protect nfs_set_page_dirty() against races with nfs_inode_add_request.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:30 -07:00
Trond Myklebust
612c9384fd NFS: Fix the 'desynchronized value of nfs_i.ncommit' error
Redirtying a request that is already marked for commit will screw up the
accounting for NR_UNSTABLE_NFS as well as nfs_i.ncommit.
Ensure that all requests on the commit queue are labelled with the
PG_NEED_COMMIT flag, and avoid moving them onto the dirty list inside
nfs_page_mark_flush().

Also inline nfs_mark_request_dirty() into nfs_page_mark_flush() for
atomicity reasons. Avoid dropping the spinlock until we're done marking the
request in the radix tree and have added it to the ->dirty list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Trond Myklebust
6d677e3504 NFS: Don't clear PG_writeback until after we've processed unstable writes
Ensure that we don't release the PG_writeback lock until after the page has
either been redirtied, or queued on the nfs_inode 'commit' list.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Trond Myklebust
8e821cad12 NFS: clean up the unstable write code
Get rid of the inlined #ifdefs.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-20 22:56:29 -07:00
Trond Myklebust
eb4cac10d9 NFS: Fix a list corruption problem
We must remove the request from whatever list it is currently on before we
can add it to the dirty list.
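
A self-contained sketch of the rule stated above, using a tiny circular doubly linked list written for this example. The helpers mimic the shape of the kernel's list API but are defined locally, and move_to() is a hypothetical name.

```c
/* Minimal circular doubly linked list node, local to this example. */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *n) { n->prev = n->next = n; }

/* Unlink n from whatever list it is on and leave it self-linked. */
static void list_del_init(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Append n at the tail of the list headed by head. */
static void list_add_tail(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/*
 * Safe move: unlink first, then add. Adding without unlinking would
 * leave the old list's neighbours still pointing at n.
 */
static void move_to(struct node *n, struct node *head)
{
    list_del_init(n);
    list_add_tail(n, head);
}

/* Count the entries on a list, excluding the head itself. */
static int list_len(const struct node *head)
{
    int len = 0;
    const struct node *p;

    for (p = head->next; p != head; p = p->next)
        len++;
    return len;
}
```

Moving a request from one list to another this way leaves the source list empty and the destination list with exactly one entry.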

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-15 16:48:11 -07:00
Trond Myklebust
5a6d41b32a NFS: Ensure PG_writeback is cleared when writeback fails
If the writebacks are cancelled via nfs_cancel_dirty_list, or due to the
memory allocation failing in nfs_flush_one/nfs_flush_multi, then we must
ensure that the PG_writeback flag is cleared.

Also ensure that we actually own the PG_writeback flag whenever we
schedule a new writeback by making nfs_set_page_writeback() return the
value of test_set_page_writeback().
The PG_writeback page flag ends up replacing the functionality of the
PG_FLUSHING nfs_page flag, so we rip that out too.
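
A userspace sketch of the ownership idea, using C11 atomic_flag in place of the page flag. struct page_flags, begin_writeback() and end_writeback() are hypothetical names. Note that this sketch returns true to the caller that set the flag, which is the inverse of a raw test-and-set result.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page carrying a writeback bit. */
struct page_flags {
    atomic_flag writeback;
};

/*
 * Atomically set the flag and report whether this caller was the one
 * that set it. Only that caller owns the writeback and may clear it.
 */
static bool begin_writeback(struct page_flags *p)
{
    return !atomic_flag_test_and_set(&p->writeback);
}

/* Release ownership, on success or on any failure path. */
static void end_writeback(struct page_flags *p)
{
    atomic_flag_clear(&p->writeback);
}
```

A second begin_writeback() on the same page fails until the owner calls end_writeback(), which is why every cancellation and allocation-failure path has to clear the flag.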

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:48 -07:00
Trond Myklebust
60fa3f769f NFS: Fix two bugs in the O_DIRECT write code
Do not flag an error if the COMMIT call fails and we decide to resend the
writes. Let the resend flag the error if it fails.

If a write has failed, then nfs_direct_write_result should not attempt to
send a commit. It should just exit asap and return the error to the user.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:48 -07:00
Trond Myklebust
e1552e1998 NFS: Fix an Oops in nfs_setattr()
It looks like nfs_setattr() and nfs_rename() also need to test whether the
target is a regular file before calling nfs_wb_all()...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-04-14 21:46:47 -07:00
Trond Myklebust
634707388b [PATCH] nfs: nfs_getattr() can't call nfs_sync_mapping_range() for non-regular files
Looks like we need a check in nfs_getattr() for a regular file. It makes
no sense to call nfs_sync_mapping_range() on anything else. I think that
should fix your problem: it will stop the NFS client from interfering
with dirty pages on that inode's mapping.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Olof Johansson <olof@lixom.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:06 -07:00
Peter Zijlstra
89a09141df [PATCH] nfs: fix congestion control
The current NFS client congestion logic is severely broken: it marks the
backing device congested during each nfs_writepages() call but doesn't
mirror this in nfs_writepage() which makes for deadlocks.  Also it
implements its own waitqueue.

Replace this by a more regular congestion implementation that puts a cap on
the number of active writeback pages and uses the bdi congestion waitqueue.

Also always use an interruptible wait since it makes sense to be able to
SIGKILL the process even for mounts without 'intr'.
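
The capped-writeback idea described above can be sketched in plain C. This is a minimal illustration, not the code from the patch: the structure, the function names, both threshold values, and the use of a separate lower threshold for clearing the flag are all assumptions, and the real implementation would also wake waiters on the bdi congestion waitqueue.

```c
#include <stdbool.h>

/* Hypothetical thresholds; the commit message does not state real values. */
#define CONGESTION_ON_THRESH   64	/* mark congested above this many pages */
#define CONGESTION_OFF_THRESH  48	/* clear once we drain below this many */

/* Hypothetical per-mount writeback accounting. */
struct wb_state {
	long nr_writeback;	/* pages currently under writeback */
	bool congested;
};

/* Called when a page is handed to writeback. */
static void wb_start_page(struct wb_state *s)
{
	if (++s->nr_writeback > CONGESTION_ON_THRESH)
		s->congested = true;
}

/* Called when writeback of a page completes. */
static void wb_end_page(struct wb_state *s)
{
	if (--s->nr_writeback < CONGESTION_OFF_THRESH)
		s->congested = false;
}
```

Because both the writepage and writepages paths would go through the same counter in this sketch, neither can mark the device congested without the other seeing it.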

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-03-16 19:25:05 -07:00
Eric W. Biederman
0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name, which is
a pain for caching and which the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care, as (except for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancements harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Trond Myklebust
d9bc125caf Merge branch 'master' of /home/trondmy/kernel/linux-2.6/
Conflicts:

	net/sunrpc/auth_gss/gss_krb5_crypto.c
	net/sunrpc/auth_gss/gss_spkm3_token.c
	net/sunrpc/clnt.c

Merge with mainline and fix conflicts.
2007-02-12 22:43:25 -08:00
Chuck Lever
43d78ef2ba NFS: disconnect before retrying NFSv4 requests over TCP
RFC3530 section 3.1.1 states an NFSv4 client MUST NOT send a request
twice on the same connection unless it is the NULL procedure.  Section
3.1.1 suggests that the client should disconnect and reconnect if it
wants to retry a request.

Implement this by adding an rpc_clnt flag that an ULP can use to
specify that the underlying transport should be disconnected on a
major timeout.  The NFSv4 client asserts this new flag, and requests
no retries after a minor retransmit timeout.

Note that disconnecting on a retransmit is in general not safe to do
if the RPC client does not reuse the TCP port number when reconnecting.

See http://bugzilla.linux-nfs.org/show_bug.cgi?id=6

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-12 22:40:45 -08:00
Trond Myklebust
a301b77771 NFS: Don't use ClearPageUptodate() when writeback fails
ClearPageUptodate() will just cause races here. What we really want to do
is to invalidate the page cache.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-12 22:40:38 -08:00
Trond Myklebust
b0c4fddca2 NFS: Cleanup - avoid rereading 'jiffies' more than once in the same routine
Micro-optimisations for nfs_fhget() and nfs_wcc_update_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-12 22:40:30 -08:00
Trond Myklebust
3e7d950a52 NFS: Fix a wraparound issue with nfsi->cache_change_attribute
Fix wraparound issue with nfsi->cache_change_attribute. If it is found
to lie in the future, then update it to lie in the past. Patch based on
a suggestion by Neil Brown.

Also a minor micro-optimisation: avoid reading 'jiffies' more than once in
nfs_update_inode().
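
A wraparound-safe comparison of tick counters, and the "pull a future stamp back into the past" step, might look like the sketch below. The type and function names are hypothetical, and resetting to exactly one tick before now is an assumption; the commit message does not say how far into the past the value is moved.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a free-running tick counter such as jiffies. */
typedef unsigned long tick_t;

/* True if a is later than b, even if the counter has wrapped.
 * Valid while the two values are less than half the range apart. */
static bool tick_after(tick_t a, tick_t b)
{
	return (long)(b - a) < 0;
}

/* If the cached stamp appears to lie in the future relative to now,
 * replace it with a value that lies just in the past. */
static tick_t clamp_stamp(tick_t stamp, tick_t now)
{
	if (tick_after(stamp, now))
		return now - 1;
	return stamp;
}
```

Reading the counter once into a local variable and passing it as `now` also matches the micro-optimisation mentioned above.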

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-12 22:40:22 -08:00
Josef 'Jeff' Sipek
ee9b6d61a2 [PATCH] Mark struct super_operations const
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".

Compile tested with gcc & sparse.

Signed-off-by: Josef 'Jeff' Sipek <jsipek@cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:47 -08:00
Arjan van de Ven
92e1d5be91 [PATCH] mark struct inode_operations const 2
Many struct inode_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.
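
As a small user-space illustration of the same idea, the sketch below declares a table of function pointers as const. The structure and function names are hypothetical and only loosely modelled on the kernel's operations tables.

```c
/* Hypothetical operations table. */
struct demo_operations {
	int (*open)(int id);
	int (*release)(int id);
};

static int demo_open(int id)    { return id + 1; }
static int demo_release(int id) { return id - 1; }

/* "const" lets the toolchain place the table in read-only data, and an
 * accidental write such as demo_ops.open = 0 is rejected by the compiler. */
static const struct demo_operations demo_ops = {
	.open    = demo_open,
	.release = demo_release,
};

/* Callers take a pointer to const: they can call through the table
 * but cannot modify it. */
static int demo_call_open(const struct demo_operations *ops, int id)
{
	return ops->open(id);
}
```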

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
Arjan van de Ven
00977a59b9 [PATCH] mark struct file_operations const 6
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:45 -08:00
Adrian Bunk
b5d5dfbd59 [PATCH] include/linux/nfsd/const.h: remove NFS_SUPER_MAGIC
NFS_SUPER_MAGIC is already defined in include/linux/magic.h

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:36 -08:00
Chuck Lever
27459f0940 [PATCH] knfsd: SUNRPC: Provide room in svc_rqst for larger addresses
Expand the rq_addr field to allow it to contain larger addresses.

Specifically, we replace a 'sockaddr_in' with a 'sockaddr_storage', then
everywhere the 'sockaddr_in' was referenced, we use instead an accessor
function (svc_addr_in) which safely casts the _storage to _in.
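
The pattern can be sketched in user-space C as follows. The request structure and the accessor name are hypothetical stand-ins; only the idea of storing a sockaddr_storage and viewing it through a casting accessor comes from the description above.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical request structure; only the address field matters here. */
struct request {
	struct sockaddr_storage addr;	/* large enough for IPv4 or IPv6 */
};

/* Accessor in the spirit of svc_addr_in(): view the storage as IPv4.
 * The caller is responsible for knowing the address really is AF_INET. */
static inline struct sockaddr_in *request_addr_in(struct request *rq)
{
	return (struct sockaddr_in *)&rq->addr;
}
```

Keeping the cast in one accessor means existing IPv4-only call sites change mechanically, while the field itself is ready to hold larger addresses.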

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:36 -08:00
Chuck Lever
ad06e4bd62 [PATCH] knfsd: SUNRPC: Add a function to format the address in an svc_rqst for printing
There are loads of places where the RPC server assumes that the rq_addr field
contains an IPv4 address.  Top among these are error and debugging messages
that display the server's IP address.

Let's refactor the address printing into a separate function that's smart
enough to figure out the difference between IPv4 and IPv6 addresses.
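
A user-space analogue of such a helper might look like this. The function name and signature are hypothetical; the sketch uses the standard inet_ntop() call rather than anything from the kernel.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical helper: write a printable form of addr into buf.
 * Returns buf, or NULL for an unsupported family or a too-small buffer. */
static const char *format_addr(const struct sockaddr_storage *addr,
			       char *buf, size_t len)
{
	switch (addr->ss_family) {
	case AF_INET:
		return inet_ntop(AF_INET,
				 &((const struct sockaddr_in *)addr)->sin_addr,
				 buf, (socklen_t)len);
	case AF_INET6:
		return inet_ntop(AF_INET6,
				 &((const struct sockaddr_in6 *)addr)->sin6_addr,
				 buf, (socklen_t)len);
	default:
		return NULL;
	}
}
```

Callers that print an address then no longer need to know which family they are dealing with; a buffer of INET6_ADDRSTRLEN bytes is large enough for either.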

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Chuck Lever
482fb94e1b [PATCH] knfsd: SUNRPC: allow creating an RPC service without registering with portmapper
Sometimes we need to create an RPC service but not register it with the local
portmapper.  NFSv4 delegation callback, for example.

Change the svc_makesock() API to allow optionally creating temporary or
permanent sockets, optionally registering with the local portmapper, and make
it return the ephemeral port of the new socket.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: Aurelien Charbon <aurelien.charbon@ext.bull.net>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:35 -08:00
Trond Myklebust
588a700b26 NFSv4: /proc/mounts displays the wrong server name for referrals
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:10 -08:00
Chuck Lever
a3f565b1e5 NFS: fix print format for tk_pid
The tk_pid field is an unsigned short.  The proper print format specifier for
that type is %5u, not %4d.

Also clean up some miscellaneous print formatting nits.
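
The reasoning behind the tk_pid format change can be shown with a small standalone sketch. The helper name is hypothetical; the point is that a 16-bit unsigned short can reach 65535, which needs five digits and an unsigned conversion.

```c
#include <stdio.h>

/* Hypothetical helper: format a task id held in an unsigned short. */
static int format_task_id(char *buf, size_t len, unsigned short pid)
{
	/* The value is promoted when passed through "..."; the cast makes
	 * the argument type match the %u conversion exactly. */
	return snprintf(buf, len, "%5u", (unsigned int)pid);
}
```

With a field width of five, small and large ids line up in the same column in debugging output.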

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:09 -08:00
Trond Myklebust
e148582e10 NFSv4: Add lockdep checks to nfs4_wait_clnt_recover()
Attempt to detect deadlocks due to caller holding locks on clp->cl_sem

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:08 -08:00
Trond Myklebust
a6a352e93d NFSv4: Don't start state recovery in nfs4_close_done()
We might not even have any open files at this point...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:08 -08:00
Trond Myklebust
7c85d9007d NFS: Fixup some outdated comments...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:07 -08:00
Trond Myklebust
d30c8348a4 NFS: nfs_writepages() cleanup
Strip out the call to nfs_commit_inode(), and allow that to be done by
nfs_write_inode().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:07 -08:00
Trond Myklebust
f40313ac39 NFS: Micro-optimisation for nfs_wb_page()
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:07 -08:00
Trond Myklebust
02241bc47e NFS: Ensure that ->writepage() uses flush_stable() when reclaiming pages
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:06 -08:00
Trond Myklebust
8e0969f045 NFS: Remove nfs_readpage_sync()
It makes no sense to maintain 2 parallel systems for reading in pages.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:06 -08:00
Trond Myklebust
c228fd3aee NFSv4: Cleanups for fs_locations code.
Start long arduous project...  What the hell is

	struct dentry = {};

all about?

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:06 -08:00
Trond Myklebust
faebf4e2bb NFSv4: Don't require that NFSv4 mount paths begin with '/'
Addresses the regression noted in
  http://bugzilla.linux-nfs.org/show_bug.cgi?id=134

Also mark a couple of other regressions as requiring fixing.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:05 -08:00
Trond Myklebust
c79ba787c1 NFS: Don't clobber more uptodate values in nfs_set_verifier()
nfs_lookup_revalidate and friends are not serialised, so it is currently
quite possible for the dentry to be revalidated, and then have the
updated verifier replaced with an older value by another process.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:05 -08:00
Trond Myklebust
ef75c7974b NFS: Also use readdir info to revalidate positive dentries
If the fileid of the cached dentry fails to match that returned by
the readdir call, then we should also d_drop. Try to take into account the
fact that on NFSv4, readdir may return the "mounted_on_fileid" by looking
for submounts.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:04 -08:00
Trond Myklebust
df1d5d23d3 NFS: Fix a readdir/lookup inefficiency.
Make sure that nfs_readdir_lookup() handles negative dentries correctly.
If d_lookup() returns a negative dentry, then we need to d_drop() that
since readdir shows that it should be positive.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:04 -08:00
Trond Myklebust
ccfeb50623 NFS: Fix up "rm -rf"...
When a file is being scheduled for deletion by means of the sillyrename
mechanism, it makes sense to start out writeback of the dirty data as
soon as possible in order to ensure that the delete can occur. Examples of
cases where this is an issue include "rm -rf", which will busy-wait until
the file is closed, and the sillyrename completes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:04 -08:00
Trond Myklebust
ab91f264cf NFSv4: Fix NFS4_enc_server_caps_sz/NFS4_dec_server_caps_sz
Insert missing encode_putfh_maxsz/decode_putfh_maxsz

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:03 -08:00
Trond Myklebust
f2d0d85e58 NFSv4: Fix Oops in nfs4_create_referral_server
The filehandle that is passed into nfs4_create_referral_server is
not initialised. The expectation is that nfs4_create_referral_server will
initialise it, and return it to the caller.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-02-03 15:35:03 -08:00
Neil Brown
46bae1a9a7 [PATCH] Remove warning: VFS is out of sync with lock manager
But keep it as a dprintk

The message can be generated in a quite normal situation:
 If a 'lock' request is interrupted, then the lock client needs to
  record that the server has the lock, in case it does.
 When we come to the unlock, the server might say it doesn't, even
  though we think it does (or might) and this generates the message.

Signed-off-by: Neil Brown <neilb@suse.de>
Acked-by: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-30 16:01:35 -08:00
Trond Myklebust
717d44e849 [PATCH] NFS: Fix races in nfs_revalidate_mapping()
Prevent the call to invalidate_inode_pages2() from racing with file writes
by taking the inode->i_mutex across the page cache flush and invalidate.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-24 12:31:06 -08:00
Trond Myklebust
e3db7691e9 [PATCH] NFS: Fix race in nfs_release_page()
    invalidate_inode_pages2() may find the dirty bit has been set on a page
    owing to the fact that the page may still be mapped after it was locked.
    Only after the call to unmap_mapping_range() are we sure that the page
    can no longer be dirtied.
    In order to fix this, NFS has hooked the releasepage() method and tries
    to write the page out between the call to unmap_mapping_range() and the
    call to remove_mapping(). This, however, leads to deadlocks in the page
    reclaim code, where the page may be locked without holding a reference
    to the inode or dentry.

    Fix is to add a new address_space_operation, launder_page(), which will
    attempt to write out a dirty page without releasing the page lock.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

    Also, the bare SetPageDirty() can skew all sorts of accounting leading to
    other nasties.

[akpm@osdl.org: cleanup]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2007-01-11 18:18:21 -08:00