linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-16 17:12:06 +00:00

Author	SHA1	Message	Date
Benny Halevy	a1eaecbc4c	NFSv4.1: make deviceid cache global Move deviceid cache from the pnfs files layout driver to the generic layer in preparation for the objects layout driver. Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2011-05-29 12:09:48 +03:00
Benny Halevy	45df3c8b0f	pnfs: resolve header dependency in pnfs.h Some definitions in the header file depend on nfs_fs.h so pnfs.h can't be included independently. Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2011-05-29 12:09:48 +03:00
Benny Halevy	67d51f65bd	NFSv4.1: use struct nfs_client to qualify deviceid deviceids are unique per server, per layout type. Therefore, in the global cache in the files layout driver deviceids from different servers may clash so we need to qualify them with a struct nfs_client that represents the nfs server that returned the deviceid. Introduced in 2.6.39 commit `ea8eecdd` "NFSv4.1 move deviceid cache to filelayout driver" Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2011-05-29 12:09:47 +03:00
Jim Rees	3b6445a6f6	NFSv4.1: fix typo in filelayout_check_layout Signed-off-by: Jim Rees <rees@umich.edu> Signed-off-by: Benny Halevy <bhalevy@panasas.com>	2011-05-29 12:09:46 +03:00
Chuck Lever	4251c94833	NFS: Revert NFSROOT default mount options Marek Belisko <marek.belisko@gmail.com> reports that recent attempts to fix regressions in NFSROOT have broken his configuration: > After update from 2.6.38-rc8 to 2.6.38 is mounting rootfs over nfs not possible. > Log: > VFS: Mounted root (nfs filesystem) on device 0:14. > Freeing init memory: 132K > nfs: server 10.146.1.21 not responding, still trying > nfs: server 10.146.1.21 not responding, still trying > > This is never ending. I make short bisect (not too much commits > between versions) > and bad commit was reported: `53d4737580` > > NFS: NFSROOT should default to "proto=udp" > > I've tested on mini2440 board (DM9000, static IP). > Is there some missing option or something else to be checked? An examination of a network trace captured during the failure shows that the mount is actually succeeding, but that the client is not seeing READ replies larger than 16KB. This could be a local packet filtering issue on the client, but we didn't troubleshoot this further because of the reported "git bisect" result. Last fall we removed the ad hoc mount option parser in fs/nfs/nfsroot.c in favor of using the main parser in fs/nfs/super.c (see commit `56463e50` "NFS: Use super.c for NFSROOT mount option parsing"). That commit changed the default NFSROOT mount options to be the same as those employed by user space mounts. As it turns out, these new default mount options are not tolerated by many embedded systems. So far these problems have been due to specific behavior of certain embedded NICs. The NFS community does not have such hardware on hand for running tests. Commit `53d47375` recently introduced a clean way to specify default mount options for NFSROOT, so we can now easily restore the traditional defaults for NFSROOT: vers=2,udp,rsize=4096,wsize=4096 This should revert the new default NFSROOT mount options introduced with commit `56463e50`. Tested-by: Marek Belisto <marek.belisto@open-nandra.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-27 17:42:47 -04:00
Lai Jiangshan	26f04dde68	nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu() The rcu callback nfs_free_delegation_callback() just calls a kfree(), so we use kfree_rcu() instead of the call_rcu(nfs_free_delegation_callback). Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-27 17:42:46 -04:00
Vitaliy Gusev	4b8ee2b82e	nfs41: Correct offset for LAYOUTCOMMIT A client sends offset to MDS as it was seen by DS. As a result, file size after copy is only half of original file size in case of 2 DS. Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> Cc: stable@kernel.org [2.6.39] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-27 17:42:01 -04:00
Harshula Jayasuriya	60c16ea877	NFS: nfs_update_inode: print current and new inode size in debug output Hi Trond, In nfs_update_inode debug output, print the current and new inode size when the file size changes on the NFS server. Signed-off-by: Harshula Jayasuriya <harshula@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-27 17:42:01 -04:00
Trond Myklebust	444f72fe7e	NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors Currently, the call to nfs4_schedule_session_recovery() will actually just result in a test of the lease when what we really want is to force a session reset. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-05-27 17:42:01 -04:00
Trond Myklebust	0ced63d1a2	NFSv4: Handle expired stateids when the lease is still valid Currently, if the server returns NFS4ERR_EXPIRED in reply to a READ or WRITE, but the RENEW test determines that the lease is still active, we fail to recover and end up looping forever in a READ/WRITE + RENEW death spiral. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-05-27 17:42:01 -04:00
Ying Han	1495f230fa	vmscan: change shrinker API by passing shrink_control struct Change each shrinker's API by consolidating the existing parameters into shrink_control struct. This will simplify any further features added w/o touching each file of shrinker. [akpm@linux-foundation.org: fix build] [akpm@linux-foundation.org: fix warning] [kosaki.motohiro@jp.fujitsu.com: fix up new shrinker API] [akpm@linux-foundation.org: fix xfs warning] [akpm@linux-foundation.org: update gfs2] Signed-off-by: Ying Han <yinghan@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Minchan Kim <minchan.kim@gmail.com> Acked-by: Pavel Emelyanov <xemul@openvz.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Dave Hansen <dave@linux.vnet.ibm.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-05-25 08:39:26 -07:00
Trond Myklebust	a75b9df9d3	NFSv4.1: Ensure that layoutget uses the correct gfp modes Currently, writebacks may end up recursing back into the filesystem due to GFP_KERNEL direct reclaims in the pnfs subsystem. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-11 22:52:13 -04:00
Andy Adamson	2887fe4552	NFSv4.1: remove pnfs_layout_hdr from pnfs_destroy_all_layouts tmp_list Prevents an infinite loop as list was never emptied. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-11 14:20:13 -04:00
Andy Adamson	a8a4ae3a89	NFSv41: Resend on NFS4ERR_RETRY_UNCACHED_REP Free the slot and resend the RPC with new session <slot#,seq#>. For nfs4_async_handle_error, return -EAGAIN and set the task->tk_status to 0 to restart the async rpc in the rpc_restart_call_prepare state which resets the slot. For nfs4_handle_exception, retrying a call that uses nfs4_call_sync will reset the slot via nfs41_call_sync_prepare. For open/close/lock/locku/delegreturn/layoutcommit/unlink/rename/write cachethis is true, so these operations will not trigger an NFS4ERR_RETRY_UNCACHED_REP. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-05-11 14:01:33 -04:00
Jeff Layton	26c4c17073	nfs: don't lose MS_SYNCHRONOUS on remount of noac mount On a remount, the VFS layer will clear the MS_SYNCHRONOUS bit on the assumption that the flags on the mount syscall will have it set if the remounted fs is supposed to keep it. In the case of "noac" though, MS_SYNCHRONOUS is implied. A remount of such a mount will lose the MS_SYNCHRONOUS flag since "sync" isn't part of the mount options. Reported-by: Max Matveev <makc@redhat.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-27 16:20:01 -04:00
Bryan Schumaker	613e901e1e	NFS: Return meaningful status from decode_secinfo() When compiling, I was getting this warning: fs/nfs/nfs4xdr.c: In function ‘decode_secinfo’: fs/nfs/nfs4xdr.c:4839:6: warning: variable ‘status’ set but not used [-Wunused-but-set-variable] We were unconditionally returning 0 as long as there wasn't an error coming out of xdr_inline_decode(). We probably want to check the error status coming out of decode_op_hdr() and decode_secinfo_gss(), rather than assuming that everything is OK all the time. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-27 16:17:29 -04:00
Trond Myklebust	28331a46d8	NFSv4: Ensure we request the ordinary fileid when doing readdirplus When readdir() returns a directory entry for the root of a mounted filesystem, Linux follows the old convention of returning the inode number of the covered directory (despite newer versions of POSIX declaring that this is a bug). To ensure this continues to work, the NFSv4 readdir implementation requests the 'mounted-on-fileid' from the server. However, readdirplus also needs to instantiate an inode for this entry, and for that, we also need to request the real fileid as per this patch. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-27 15:57:16 -04:00
Trond Myklebust	1bd714f2a1	NFSv4: Ensure that clientid and session establishment can time out The following patch ensures that we do not get permanently trapped in the RPC layer when trying to establish a new client id or session. This again ensures that the state manager can finish in a timely fashion when the last filesystem to reference the nfs_client exits. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-24 14:29:33 -04:00
Trond Myklebust	fd954ae124	NFSv4.1: Don't loop forever in nfs4_proc_create_session If a server for some reason keeps sending NFS4ERR_DELAY errors, we can end up looping forever inside nfs4_proc_create_session, and so the usual mechanisms for detecting if the nfs_client is dead don't work. Fix this by ensuring that we loop inside the nfs4_state_manager thread instead. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-24 14:28:18 -04:00
Bryan Schumaker	fb8a5ba811	NFSv4: Handle NFS4ERR_WRONGSEC outside of nfs4_handle_exception() I only want to try other secflavors during an initial mount if NFS4ERR_WRONGSEC is returned. nfs4_handle_exception() could potentially map other errors to EPERM, so we should handle this error specially for correctness. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-18 17:06:00 -04:00
Bryan Schumaker	468f86134e	NFSv4.1: Don't update sequence number if rpc_task is not sent If we fail to contact the gss upcall program, then no message will be sent to the server. The client still updated the sequence number, however, and this led to NFS4ERR_SEQ_MISMATCH for the next several RPC calls. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-18 17:05:48 -04:00
Trond Myklebust	47c2199b6e	NFSv4.1: Ensure state manager thread dies on last umount Currently, the state manager may continue to try recovering state forever even after the last filesystem to reference that nfs_client has umounted. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-04-15 18:28:22 -04:00
Bryan Schumaker	c3dfc2808a	NFS: Use correct variable for page bounds checking While decoding a secinfo reply, I store the list of supported sec flavors on a page accessible through res->flavors. Before reading each new flavor, I do some math to determine if there is enough space left on this page, and I break out of my read loop if there isn't. In order to perform this check correctly, I need to use the address of res->flavors, rather than the address of res. When this loop was broken early I lied to the caller and told them that the entire list had been decoded. This could lead to problems if the caller tries to use any of the garbage data claiming to be a valid sec flavor. I fixed this by using res->flavors->num_flavors as a counter, incrementing it every time a sec flavor is successfully decoded. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-13 15:12:23 -04:00
Bryan Schumaker	9b7160c55a	NFS: don't negotiate when user specifies sec flavor We were always attempting sec flavor negotiation, even if the user told us a specific sec flavor to use. If that sec flavor fails, we should return an error rather than continuing with sec flavor negotiation. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-13 15:12:23 -04:00
Bryan Schumaker	801a16dc7b	NFS: Attempt mount with default sec flavor first nfs4_lookup_root() is already configured to use either RPC_AUTH_UNIX or a user specified flavor (through -o sec=<whatever>). We should use this flavor first, and only attempt negotiation if it fails with -EPERM. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-13 15:12:23 -04:00
Bryan Schumaker	0fabee243a	NFS: flav_array honors NFS_MAX_SECFLAVORS NFS_MAX_SECFLAVORS should already take into account RPC_AUTH_UNIX and RPC_AUTH_NULL, so we don't need to set aside extra slots for them. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-13 15:12:22 -04:00
Bryan Schumaker	d1a8016a2d	NFS: Fix infinite loop in gss_create_upcall() There can be an infinite loop if gss_create_upcall() is called without the userspace program running. To prevent this, we return -EACCES if we notice that pipe_version hasn't changed (indicating that the pipe has not been opened). Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-13 15:12:22 -04:00
Weston Andros Adamson	79a48a1f5d	Don't mark_inode_dirty_sync() while holding lock mark_inode_dirty_sync() grabs the same inode lock! race conditions between holding the lock in pnfs_set_layoutcommit() and in mark_inode_dirty_sync() can result in a second call to pnfs_layoutcommit_inode(), but this will be a noop as NFS_INO_LAYOUTCOMMIT won't be set in the second call Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-13 13:15:51 -04:00
Trond Myklebust	c0d0e96b84	NFS: Get rid of pointless test in nfs_commit_done Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-12 19:34:23 -04:00
Bryan Schumaker	561f0b0ad0	NFS: Remove unused argument from nfs_find_best_sec() The inode was used in an earlier version of the code, but it isn't used anymore. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-12 19:34:23 -04:00
Trond Myklebust	4b38a6db01	NFS: Eliminate duplicate call to nfs_mark_request_dirty We only need to call nfs_mark_request_dirty() once in nfs_writepage_setup(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-12 19:34:22 -04:00
Jesper Juhl	160bc1604f	NFS: Remove dead code from nfs_fs_mount() In fs/nfs/super.c::nfs_fs_mount() we test for a NULL 'data': ... if (data == NULL || mntfh == NULL) goto out_free_fh; ... and then further down in the function we test 'data' again: ... nfs_fscache_get_super_cookie( s, data ? data->fscache_uniq : NULL, NULL); ... this second check is just dead code since there is no way 'data' could possibly be NULL here. We also rely on a non-NULL 'data' in more than one location between these two tests, further proving the point that the second test is bogus. This patch removes the dead code. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-12 19:34:21 -04:00
Dave Chinner	0d88f6e804	nfs: don't call __mark_inode_dirty while holding i_lock nfs_scan_commit() is called with the inode->i_lock held, but it then calls __mark_inode_dirty() while still holding the lock. This causes a deadlock. Push the inode->i_lock into nfs_scan_commit() so it can protect only the parts of the code it needs to and can be dropped before the call to __mark_inode_dirty() to avoid the deadlock. Signed-off-by: Dave Chinner <dchinner@redhat.com> Tested-by: Will Simoneau <simoneau@ele.uri.edu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-04-12 14:17:24 -07:00
Linus Torvalds	94c8a984ae	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC NFS: Fix a signed vs. unsigned secinfo bug Revert "net/sunrpc: Use static const char arrays"	2011-04-08 11:47:35 -07:00
Bryan Schumaker	37adb89fad	NFS: Change initial mount authflavor only when server returns NFS4ERR_WRONGSEC When attempting an initial mount, we should only attempt other authflavors if AUTH_UNIX receives a NFS4ERR_WRONGSEC error. This allows other errors to be passed back to userspace programs. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-07 13:19:40 -07:00
Bryan Schumaker	418875900e	NFS: Fix a signed vs. unsigned secinfo bug rpc_authflavor_t is cast from an unsigned int, but the initial code tried to use it as a signed int. I fix this by passing an rpc_authflavor_t pointer around, and returning signed integers from functions. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-04-06 13:25:04 -07:00
Lucas De Marchi	25985edced	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>	2011-03-31 11:26:23 -03:00
Dave Chinner	0444d76ae6	fs: don't use igrab() while holding i_lock Fix the incorrect use of igrab() inside the i_lock in NFS and Ceph. If we are already holding the i_lock, we have a reference to the inode so we can safely use ihold() to gain an extra reference. This avoids hangs due to lock recursion on the i_lock now that the inode_lock is gone and igrab() uses the i_lock itself. Signed-off-by: Dave Chinner <dchinner@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: Ryan Mallon <ryan@bluewatersys.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-29 07:50:34 -07:00
Trond Myklebust	a0e7e3cf79	NFS: Don't leak RPC clients in NFSv4 secinfo negotiation Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-27 17:48:17 +02:00
Trond Myklebust	4d65c520fb	NFS: Fix a hang in the writeback path Now that the inode scalability patches have been merged, it is no longer safe to call igrab() under the inode->i_lock. Now that we no longer call nfs_clear_request() until the nfs_page is being freed, we know that we are always holding a reference to the nfs_open_context, which again holds a reference to the path, and so the inode cannot be freed until the last nfs_page has been removed from the radix tree and freed. We can therefore skip the igrab()/iput() altogether. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-27 17:48:07 +02:00
Trond Myklebust	0acd220192	Merge branch 'nfs-for-2.6.39' into nfs-for-next	2011-03-24 17:03:14 -04:00
Weston Andros Adamson	35124a0994	Cleanup XDR parsing for LAYOUTGET, GETDEVICEINFO changes LAYOUTGET and GETDEVICEINFO XDR parsing to: - not use vmap, which doesn't work on incoherent archs - use xdr_stream parsing for all xdr Signed-off-by: Weston Andros Adamson <dros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 17:01:41 -04:00
Andy Adamson	ef31153786	NFSv4.1 convert layoutcommit sync to boolean Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 15:49:48 -04:00
Andy Adamson	de4b15c7e9	NFSv4.1 pnfs_layoutcommit_inode fixes Test NFS_INO_LAYOUTCOMMIT before kzalloc Mark inode dirty to retry LAYOUTCOMMIT on kzalloc failure. Add comments. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 15:49:48 -04:00
Bryan Schumaker	8f70e95f9f	NFS: Determine initial mount security When sec=<something> is not presented as a mount option, we should attempt to determine what security flavor the server is using. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 13:52:42 -04:00
Bryan Schumaker	7ebb931598	NFS: use secinfo when crossing mountpoints A submount may use different security than the parent mount does. We should figure out what sec flavor the submount uses at mount time. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 13:52:42 -04:00
Bryan Schumaker	5a5ea0d485	NFS: Add secinfo procedure This patch adds the nfs4 operation secinfo as a valid nfs rpc operation. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 13:52:41 -04:00
Bryan Schumaker	7c5130588d	NFS: lookup supports alternate client A later patch will need to perform a lookup using an alternate client with a different security flavor. This patch adds support for doing that on NFS v4. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 13:52:41 -04:00
Bryan Schumaker	e73b83f270	NFS: convert call_sync() to a function This patch changes nfs4_call_sync() from a macro into a static inline function. As a macro, the call_sync() function will not do any type checking and depends on the sequence arguments always having the same name. As a function, we get to have type checking and can rename the arguments if we so choose. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-24 13:52:41 -04:00
Fred Isaman	cccb4d063b	NFSv4.1 remove temp code that prevented ds commits Now that all the infrastructure is in place, we will do the right thing if we remove this special casing. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:04 -04:00
Andy Adamson	863a3c6c68	NFSv4.1: layoutcommit The filelayout driver sends LAYOUTCOMMIT only when COMMIT goes to the data server (as opposed to the MDS) and the data server WRITE is not NFS_FILE_SYNC. Only whole file layout support means that there is only one IOMODE_RW layout segment. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Alexandros Batsakis <batsakis@netapp.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Zhang Jingwang <zhangjingwang@nrchpc.ac.cn> Tested-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:04 -04:00
Fred Isaman	e0c2b38018	NFSv4.1: filelayout driver specific code for COMMIT Implement all the hooks created in the previous patches. This requires exporting quite a few functions and adding a few structure fields. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:04 -04:00
Fred Isaman	988b6dceb0	NFSv4.1: remove GETATTR from ds commits Any COMMIT compound directed to a data server needs to have the GETATTR calls suppressed. We here, make sure the field we are testing (data->lseg) is set and refcounted correctly. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:03 -04:00
Fred Isaman	a861a1e1c3	NFSv4.1: add generic layer hooks for pnfs COMMIT We create three major hooks for the pnfs code. pnfs_mark_request_commit() is called during writeback_done from nfs_mark_request_commit, which gives the driver an opportunity to claim it wants control over committing a particular req. pnfs_choose_commit_list() is called from nfs_scan_list to choose which list a given req should be added to, based on where we intend to send it for COMMIT. It is up to the driver to have preallocated list headers for each destination it may need. pnfs_commit_list() is how the driver actually takes control, it is used instead of nfs_commit_list(). In order to pass information between the above functions, we create a union in nfs_page to hold a lseg (which is possible because the req is not on any list while in transition), and add some flags to indicate if we need to use the pnfs code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:03 -04:00
Fred Isaman	425eb736cd	NFSv4.1: alloc and free commit_buckets Create a preallocated list header to hold nfs_pages for each non-MDS COMMIT destination. Note this is not necessarily each DS, but is basically each <DS, fh> pair. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:03 -04:00
Fred Isaman	c879513e91	NFSv4.1: shift filelayout_free_lseg Move it up to avoid forward declaration in later patch. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:03 -04:00
Fred Isaman	5917ce8440	NFSv4.1: pull out code from nfs_commit_release Create a separate support function for later use by data server commit code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:03 -04:00
Fred Isaman	64bfeb49bd	NFSv4.1: pull error handling out of nfs_commit_list Create a separate support function for later use by data server commit code. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:03 -04:00
Fred Isaman	5f452431e2	NFSv4.1: add callback to nfs4_commit_done Add a callback that the pnfs layout driver can use to do its own error handling of the data server's COMMIT response. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:03 -04:00
Fred Isaman	9ace33cdc6	NFSv4.1: rearrange nfs_commit_rpcsetup Reorder nfs_commit_rpcsetup, preparing for a pnfs entry point. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:02 -04:00
Fred Isaman	465d52437d	NFSv4.1: don't send COMMIT to ds for data sync writes Based on consensus reached in Feb 2011 interim IETF meeting regarding use of LAYOUTCOMMIT, it has been decided that a NFS_DATA_SYNC return from a WRITE to data server should not initiate a COMMIT. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:29:02 -04:00
Bryan Schumaker	8ef2ce3e16	NFS: Detect loops in a readdir due to bad cookies Some filesystems (such as ext4) can return the same cookie value for multiple files. If we try to start a readdir with one of these cookies, the server will return the first file found with a cookie of the same value. This can cause the client to enter an infinite loop. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:14:27 -04:00
Bryan Schumaker	480c2006eb	NFS: Create nfs_open_dir_context nfs_opendir() created a context that held much more information than we need for a readdir. This patch introduces a slimmed-down nfs_open_dir_context that contains only the cookie and the cred used for RPC operations. The new context will eventually be used to help detect readdir loops. Signed-off-by: Bryan Schumaker <bjschuma@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:13:11 -04:00
Trond Myklebust	e47c085afb	NFS: Ensure that we update the readdir filp->f_pos correctly If we're doing a search by readdir cookie, we need to ensure that the resulting f_pos is updated. To do so, we need to update the desc->current_index, in the same way that we do in the search by file offset case. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-23 15:12:12 -04:00
Gusev Vitaliy	4667058b77	nfs4: Fix NULL dereference at d_alloc_and_lookup() d_alloc_and_lookup() calls i_op->lookup method due to rootfh changes his fsid. During mount i_op of NFS root inode is set to nfs_mountpoint_inode_operations, if rpc_ops->getroot() and rpc_ops->getattr() return different fsid. After that nfs_follow_remote_path() raised oops: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [< (null)>] (null) stack trace: d_alloc_and_lookup+0x4c/0x74 do_lookup+0x1e3/0x280 link_path_walk+0x12e/0xab0 nfs4_remote_get_sb+0x56/0x2c0 [nfs] path_walk+0x67/0xe0 vfs_path_lookup+0x8e/0x100 nfs_follow_remote_path+0x16f/0x3e0 [nfs] nfs4_try_mount+0x6f/0xd0 [nfs] nfs_get_sb+0x269/0x400 [nfs] vfs_kern_mount+0x8a/0x1f0 do_kern_mount+0x52/0x130 do_mount+0x20a/0x260 sys_mount+0x90/0xe0 system_call_fastpath+0x16/0x1b So just refresh fsid, as RFC3530 doesn't specify behavior in case of rootfh changes fsid. Signed-off-by: Vitaliy Gusev <gusev.vitaliy@nexenta.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-22 20:00:25 -04:00
Trond Myklebust	b8413f98f9	NFS: Fix a hang/infinite loop in nfs_wb_page() When one of the two waits in nfs_commit_inode() is interrupted, it returns a non-negative value, which causes nfs_wb_page() to think that the operation was successful causing it to busy-loop rather than exiting. It also causes nfs_file_fsync() to incorrectly report the file as being successfully committed to disk. This patch fixes both problems by ensuring that we return an error if the attempts to wait fail. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-03-21 21:09:24 -04:00
Trond Myklebust	b31268ac79	FS: Use stable writes when not doing a bulk flush If we're only doing a single write, and there are no other unstable writes being queued up, we might want to just flip to using a stable write RPC call. Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-21 21:08:17 -04:00
Dan Carpenter	1c34092adf	nfs: lock() vs unlock() typo These should be spin_unlock() instead of spin_lock(). It's a typo. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-21 00:45:50 -04:00
Linus Torvalds	179198373c	Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits) RPC: killing RPC tasks races fixed xprt: remove redundant check SUNRPC: Convert struct rpc_xprt to use atomic_t counters SUNRPC: Ensure we always run the tk_callback before tk_action sunrpc: fix printk format warning xprt: remove redundant null check nfs: BKL is no longer needed, so remove the include NFS: Fix a warning in fs/nfs/idmap.c Cleanup: Factor out some cut-and-paste code. cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions. NFS: account direct-io into task io accounting gss:krb5 only include enctype numbers in gm_upcall_enctypes RPCRDMA: Fix FRMR registration/invalidate handling. RPCRDMA: Fix to XDR page base interpretation in marshalling logic. NFSv4: Send unmapped uid/gids to the server when using auth_sys NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr NFSv4: cleanup idmapper functions to take an nfs_server argument NFSv4: Send unmapped uid/gids to the server if the idmapper fails NFSv4: If the server sends us a numeric uid/gid then accept it NFSv4.1: reject zero layout with zeroed stripe unit ...	2011-03-17 17:40:00 -07:00
Linus Torvalds	054cfaacf8	Merge branch 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'mnt_devname' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: vfs: bury ->get_sb() nfs: switch NFS from ->get_sb() to ->mount() nfs: stop mangling ->mnt_devname on NFS vfs: new superblock methods to override /proc/*/mount{s,info} nfs: nfs_do_{ref,sub}mount() superblock argument is redundant nfs: make nfs_path() work without vfsmount nfs: store devname at disconnected NFS roots nfs: propagate devname to nfs{,4}_get_root()	2011-03-16 19:09:57 -07:00
Al Viro	011949811b	nfs: switch NFS from ->get_sb() to ->mount() The last remaining instances of ->get_sb() can be converted to ->mount() now - nothing in them uses new vfsmount anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	fd462fb51d	nfs: stop mangling ->mnt_devname on NFS now we can do that - nobody cares about its value anymore. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	c7f404b40a	vfs: new superblock methods to override /proc/*/mount{s,info} a) ->show_devname(m, mnt) - what to put into devname columns in mounts, mountinfo and mountstats b) ->show_path(m, mnt) - what to put into relative path column in mountinfo Leaving those NULL gives old behaviour. NFS switched to using those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	f8ad9c4bae	nfs: nfs_do_{ref,sub}mount() superblock argument is redundant It's always equal to dentry->d_sb Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:48:06 -04:00
Al Viro	b514f872f8	nfs: make nfs_path() work without vfsmount part 3: now we have everything to get nfs_path() just by dentry - just follow to (disconnected) root and pick the rest of the thing there. Start killing propagation of struct vfsmount * on the paths that used to bring it to nfs_path(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:47:55 -04:00
Al Viro	b1942c5f8c	nfs: store devname at disconnected NFS roots part 2: make sure that disconnected roots have corresponding mnt_devname values stashed into them. Have nfs_get_root() stuff a copy of devname into ->d_fsdata of the found root, provided that it is disconnected. Have ->d_release() free it when dentry goes away. Have the places where NFS uses ->d_fsdata for sillyrename (and that can never happen to a disconnected root - dentry will be attached to its parent) free old devname copies if they find those. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:44:24 -04:00
Al Viro	0d5839ad05	nfs: propagate devname to nfs{,4}_get_root() step 1 of ->mnt_devname fixes: make sure we have the value of devname available in ..._get_root(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-16 16:27:04 -04:00
Linus Torvalds	bd2895eead	Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix build failure introduced by s/freezeable/freezable/ workqueue: add system_freezeable_wq rds/ib: use system_wq instead of rds_ib_fmr_wq net/9p: replace p9_poll_task with a work net/9p: use system_wq instead of p9_mux_wq xfs: convert to alloc_workqueue() reiserfs: make commit_wq use the default concurrency level ocfs2: use system_wq instead of ocfs2_quota_wq ext4: convert to alloc_workqueue() scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path scsi/be2iscsi,qla2xxx: convert to alloc_workqueue() misc/iwmc3200top: use system_wq instead of dedicated workqueues i2o: use alloc_workqueue() instead of create_workqueue() acpi: kacpi*_wq don't need WQ_MEM_RECLAIM fs/aio: aio_wq isn't used in memory reclaim path input/tps6507x-ts: use system_wq instead of dedicated workqueue cpufreq: use system_wq instead of dedicated workqueues wireless/ipw2x00: use system_wq instead of dedicated workqueues arm/omap: use system_wq in mailbox workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER	2011-03-16 08:20:19 -07:00
Stephen Rothwell	8f68cd42d8	nfs: BKL is no longer needed, so remove the include Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-15 08:44:35 -04:00
Rob Landley	c5cb09b6f8	Cleanup: Factor out some cut-and-paste code. Factor out some cut-and-paste code in options parsing. Saves about 800 bytes on x86-64. Signed-off-by: Rob Landley <rlandley@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:28 -05:00
Rob Landley	c12bacec45	cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions. Eliminate two mostly duplicate functions (nfs_parse_simple_hostname() and nfs_parse_protected_hostname()) and instead just make the calling function (nfs_parse_devname()) do everything. Signed-off-by: Rob Landley <rlandley@parallels.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:28 -05:00
Konstantin Khlebnikov	7ec10f26e1	NFS: account direct-io into task io accounting Account NFS direct-io reads and writes into Task I/O Accounting. Do it before completion to handle aio. NFS has an unusual direct-io implementation, thus accounting in generic code does not work. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	b064eca2cf	NFSv4: Send unmapped uid/gids to the server when using auth_sys The new behaviour is enabled using the new module parameter 'nfs4_disable_idmapping'. Note that if the server rejects an unmapped uid or gid, then the client will automatically switch back to using the idmapper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	3ddeb7c5c6	NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr This will be required in order to switch uid/gid mapping back on if the admin has tried to disable it. Note that we also propagate NFS4ERR_BADNAME at the same time, in order to work around a Linux server bug. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Trond Myklebust	e4fd72a17d	NFSv4: cleanup idmapper functions to take an nfs_server argument ...instead of the nfs_client. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Trond Myklebust	f0b851689a	NFSv4: Send unmapped uid/gids to the server if the idmapper fails Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Trond Myklebust	5cf36cfdc8	NFSv4: If the server sends us a numeric uid/gid then accept it Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:26 -05:00
Benny Halevy	75247affd7	NFSv4.1: reject zero layout with zeroed stripe unit Allowing stripe_unit==0 causes the client to crash later on when dividing by zero. Reported-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:45 -05:00
Fred Isaman	36fe432d33	NFSv4.1: Clear lseg pointer in ->doio function Now that we have access to the pointer, clear it immediately after the put, instead of in caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:45 -05:00
Fred Isaman	c76069bda0	NFSv4.1: rearrange ->doio args This will make it possible to clear the lseg pointer in the same function as it is put, instead of in the caller nfs_pageio_doio(). Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	a69aef1496	NFSv4.1: pnfs filelayout driver write Allows the pnfs filelayout driver to write to the data servers. Note that COMMIT to data servers will be implemented in a future patch. To avoid improper behavior, for the moment any WRITE to a data server that would also require a COMMIT to the data server is sent NFS_FILE_SYNC. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	7ffd10640d	NFSv4.1: remove GETATTR from ds writes Any WRITE compound directed to a data server needs to have the GETATTR calls suppressed. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Andy Adamson	0382b74409	NFSv4.1: implement generic pnfs layer write switch Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: J. Bruce Fields <bfields@fieldses.org> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	44b83799a9	NFSv4.1: trigger LAYOUTGET for writes Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	5053aa568d	NFSv4.1: Send lseg down into nfs_write_rpcsetup We grab the lseg sent in from the doio function and attach it to each struct nfs_write_data created. This is how the lseg will be sent to the layout driver. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:44 -05:00
Fred Isaman	b029bc9b08	NFSv4.1: add callback to nfs4_write_done Add callback that pnfs layout driver can use to do its own handling of data server WRITE response. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	d138d5d17b	NFSv4.1: rearrange nfs_write_rpcsetup Reorder nfs_write_rpcsetup, preparing for a pnfs entry point. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	568e8c494d	NFSv4.1: turn off pNFS on ds connection failure If a data server is unavailable, go through MDS. Mark the deviceid containing the data server as a negative cache entry. Do not try to connect to any data server on a deviceid marked as a negative cache entry. Mark any layout that tries to use the marked deviceid as failed. Inodes with a layout marked as failed will not use the layout for I/O, and will not perform any more layoutgets. Inodes without a layout will still do layoutget, but the layout will get marked immediately. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Christoph Hellwig	ea8eecdd11	NFSv4.1 move deviceid cache to filelayout driver No need for generic cache with only one user. Keep a simple hash of deviceids in the filelayout driver. Signed-off-by: Christoph Hellwig <hch@infradead.org> Acked-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	cbdabc7f8b	NFSv4.1: filelayout async error handler Use our own async error handler. Mark the layout as failed and retry i/o through the MDS on specified errors. Update the mds_offset in nfs_readpage_retry so that a failed short-read retry to a DS gets correctly resent through the MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Andy Adamson	dc70d7b318	NFSv4.1: filelayout read Attempt a pNFS file layout read by setting up the nfs_read_data struct and calling nfs_initiate_read with the data server rpc client and the filelayout rpc call ops. Error handling is implemented in a subsequent patch. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Tested-by: Guo Mingyang <guomingyang@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Fred Isaman	cfe7f4120f	NFSv4.1: filelayout i/o helpers Prepare for filelayout_read_pagelist with helper functions that find the correct data server, filehandle, and offset. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Marc Eshel <eshel@almaden.ibm.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Tigran Mkrtchyan <tigran@anahit.desy.de> Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	d83217c135	NFSv4.1: data server connection Introduce a data server set_client and init session following the nfs4_set_client and nfs4_init_session convention. Once a new nfs_client is on the nfs_client_list, the nfs_client cl_cons_state serializes access to creating an nfs_client struct with matching properties. Use the new nfs_get_client() that initializes new clients. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	64419a9b20	NFSv4.1: generic read Separate the rpc run portion of nfs_read_rpcsetup into a new function nfs_initiate_read that is called for normal NFS I/O. Add a pNFS read_pagelist function that is called instead of nfs_initiate_read for pNFS reads. Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Mike Sager <sager@netapp.com> Signed-off-by: Mingyang Guo <guomingyang@nrchpc.ac.cn> Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	bae724ef95	NFSv4.1: shift pnfs_update_layout locations Move the pnfs_update_layout call location to nfs_pageio_do_add_request(). Grab the lseg sent in the doio function to nfs_read_rpcsetup and attach it to each nfs_read_data so it can be sent to the layout driver. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	94ad1c80e2	NFSv4.1: coelesce across layout stripes Add a pg_test layout driver hook which is used to avoid coalescing I/O across layout stripes. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Andy Adamson <andros@citi.umich.edu> Signed-off-by: Dean Hildebrand <dhildeb@us.ibm.com> Signed-off-by: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Oleg Drokin <green@linuxhacker.ru> Signed-off-by: Tao Guo <guotao@nrchpc.ac.cn> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Fred Isaman	d684d2ae10	NFSv4.1: lseg refcounting Prepare put_lseg and get_lseg to be called from the pNFS I/O code. Pull common code from pnfs_lseg_locked to call from pnfs_lseg. Inline pnfs_lseg_locked into its only caller. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Benny Halevy <bhalevy@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:42 -05:00
Andy Adamson	94de8b27d0	NFSv4.1: add MDS mount DS only check The DS only role cannot be used to mount. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	d6fb79d433	NFSv4.1: new flag for lease time check Data servers cannot send nfs4_proc_get_lease_time, but still need to set up state renewal. Add the NFS_CS_CHECK_LEASE_TIME bit to indicate if the lease time can be checked. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	d3b4c9d767	NFSv4.1: new flag for state renewal check Data servers not sharing a session with the mount MDS always have an empty cl_superblocks list. Replace the cl_superblocks empty list check to see if it is time to shut down renewd with the NFS_CS_STOP_RENEW bit which is not set by such a data server. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	89d1ea6579	NFSv4.1: send zero stateid seqid on v4.1 i/o Data servers require a zero stateid seqid, and there is no advantage to not doing the same for all NFSv4.1. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	45a52a0207	NFS move nfs_client initialization into nfs_get_client Now nfs_get_client returns an nfs_client ready to be used no matter if it was found or created. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Andy Adamson	bf9c1387ca	NFSv4.1: put_layout_hdr can remove nfsi->layout Prevents an Oops triggered by CB_LAYOUTRECALL and LAYOUTGET race on a pnfs_layout_hdr first pnfs_layout_segment. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:41 -05:00
Fred Isaman	136028967a	NFS: change nfs_writeback_done to return void The return values are not used by any callers. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	83762c56c1	NFS: remove pointless if statement in nfs_direct_write_result The code was doing nothing more in either branch of the if. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	f49f9baac8	pnfs: fix pnfs lock inversion of i_lock and cl_lock The pnfs code was using the lock order i_lock, cl_lock throughout. This conflicts with the nfs delegation code. Rework the pnfs code to avoid taking both locks simultaneously. Currently the code takes the double lock to add/remove the layout to a nfs_client list, while atomically checking that the list of lsegs is empty. To avoid this, we rely on existing serializations. When a layout is initialized with lseg count equal to zero, LAYOUTGET's openstateid serialization is in effect, making it safe to assume it stays zero unless we change it. And once a layout's lseg count drops to zero, it is set as DESTROYED and so will stay at zero. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	9f52c2525e	pnfs: do not need to clear NFS_LAYOUT_BULK_RECALL flag We do not need to clear the NFS_LAYOUT_BULK_RECALL, as setting it guarantees that NFS_LAYOUT_DESTROYED will be set once any outstanding io is finished. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	3851172244	pnfs: avoid incorrect use of layout stateid The code could violate the following from RFC5661, section 12.5.3: "Once a client has no more layouts on a file, the layout stateid is no longer valid and MUST NOT be used." This can occur when a layout already has a lseg, starts another non-overlapping LAYOUTGET, and a CB_LAYOUTRECALL for the existing lseg is processed before we hit pnfs_layout_process(). Solve by setting, each time the client has no more lsegs for a file, a flag which blocks further use of the layout and triggers its removal. This also fixes a second bug which occurs in the same instance as above. If we actually use pnfs_layout_process, we add the new lseg to the layout, but the layout has been removed from the nfs_client list by the intervening CB_LAYOUTRECALL and will not be added back. Thus the newly acquired lseg will not be properly returned in the event of a subsequent CB_LAYOUTRECALL. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:39 -05:00
Chuck Lever	53d4737580	NFS: NFSROOT should default to "proto=udp" There have been a number of recent reports that NFSROOT is no longer working with default mount options, but fails only with certain NICs. Brian Downing <bdowning@lavos.net> bisected to commit `56463e50` "NFS: Use super.c for NFSROOT mount option parsing". Among other things, this commit changes the default mount options for NFSROOT to use TCP instead of UDP as the underlying transport. TCP seems less able to deal with NICs that are slow to initialize. The system logs that have accompanied reports of problems all show that NFSROOT attempts to establish a TCP connection before the NIC is fully initialized, and thus the TCP connection attempt fails. When a TCP connection attempt fails during a mount operation, the NFS stack needs to fail the operation. Usually user space knows how and when to retry it. The network layer does not report a distinct error code for this particular failure mode. Thus, there isn't a clean way for the RPC client to see that it needs to retry in this case, but not in others. Because NFSROOT is used in some environments where it is not possible to update the kernel command line to specify "udp", the proper thing to do is change NFSROOT to use UDP by default, as it did before commit `56463e50`. To make it easier to see how to change default mount options for NFSROOT and to distinguish default settings from mandatory settings, I've adjusted a couple of areas to document the specifics. root_nfs_cat() is also modified to deal with commas properly when concatenating strings containing mount option lists. This keeps root_nfs_cat() call sites simpler, now that we may be concatenating multiple mount option strings. Tested-by: Brian Downing <bdowning@lavos.net> Tested-by: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: <stable@kernel.org> # 2.6.37 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:07 -05:00
Huang Weiyi	57df216bd8	nfs4: remove duplicated #include Remove duplicated #include('s) in fs/nfs/nfs4proc.c Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:37 -05:00
Trond Myklebust	f9feab1e18	NFSv4: nfs4_state_mark_reclaim_nograce() should be static There are no more external users of nfs4_state_mark_reclaim_nograce() or nfs4_state_mark_reclaim_reboot(), so mark them as static. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	ecac799a5e	NFSv4: Fix the setlk error handler Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:36 -05:00
Trond Myklebust	b4410c2f7f	NFSv4.1: Fix the handling of the SEQUENCE status bits We want SEQUENCE status bits to be handled by the state manager in order to avoid threading issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:35 -05:00
Trond Myklebust	0400a6b0cb	NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses nfs4_schedule_state_recovery() should only be used when we need to force the state manager to check the lease. If we just want to start the state manager in order to handle a state recovery situation, we should be using nfs4_schedule_state_manager(). This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing its use with a set of helper functions that do the right thing. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:18:22 -05:00
Andy Adamson	c34c32ea97	NFSv4.1 reclaim complete must wait for completion Signed-off-by: Andy Adamson <andros@netapp.com> [Trond: fix whitespace errors] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:01 -05:00
Andy Adamson	114f64b5f2	NFSv4: remove duplicate clientid in struct nfs_client Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:05:00 -05:00
Ricardo Labiaga	7d6d63d642	NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY Fix bug where we currently retry the EXCHANGEID call again, even though we already have a valid clientid. Instead, delay and retry the CREATE_SESSION call. Signed-off-by: Ricardo Labiaga <Ricardo.Labiaga@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:59 -05:00
Frank Filz	3fa0b4e201	(try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid The problem was use of an int32, which when converted to a uint64 is sign extended resulting in a fileid that doesn't fit in 32 bits even though the intent of the function is to fit the fileid into 32 bits. Signed-off-by: Frank Filz <ffilzlnx@us.ibm.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> [Trond: Added an include for compat.h] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:58 -05:00
Jovi Zhang	43b7c3f051	nfs: fix compilation warning This commit fixes the following compilation warning: linux-2.6/fs/nfs/nfs4proc.c:3265: warning: comparison of distinct pointer types lacks a cast Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:56 -05:00
Stanislav Fomichev	b9f810570d	nfs: add kmalloc return value check in decode_and_add_ds add kmalloc return value check in decode_and_add_ds Signed-off-by: Stanislav Fomichev <kernel@fomichev.me> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:55 -05:00
Jeff Layton	d2224e7afb	nfs: close NFSv4 COMMIT vs. CLOSE race I've been adding in more artificial delays in the NFSv4 commit and close codepaths to uncover races. The kernel I'm testing has the patch to close the race in __rpc_wait_for_completion_task that's in Trond's cthon2011 branch. The reproducer I've been using does this in a loop: mkdir("DIR"); fd = open("DIR/FILE", O_WRONLY|O_CREAT|O_EXCL, 0644); write(fd, "abcdefg", 7); close(fd); unlink("DIR/FILE"); rmdir("DIR"); The above reproducer shouldn't result in any silly-renaming. However, when I add a "msleep(100)" just after the nfs_commit_clear_lock call in nfs_commit_release, I can almost always force one to occur. If I can force it to occur with that, then it can happen without that delay given the right timing. nfs_commit_inode waits for the NFS_INO_COMMIT bit to clear when called with FLUSH_SYNC set. nfs_commit_rpcsetup on the other hand does not wait for the task to complete before putting its reference to it, so the last reference gets put in rpc_release task and gets queued to a workqueue. In this situation, the last open context reference may be put by the COMMIT release instead of the close() syscall. The close() syscall returns too quickly and the unlink runs while the d_count is still high since the COMMIT release hasn't put its dentry reference yet. Fix this by having nfs_commit_rpcsetup wait for the RPC call to complete before putting the task reference when FLUSH_SYNC is set. With this, the last reference is put by the process that's initiating the FLUSH_SYNC commit and the race is closed. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:53 -05:00
Trond Myklebust	bf294b41ce	SUNRPC: Close a race in __rpc_wait_for_completion_task() Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:52 -05:00
Neil Horman	e9e3d724e2	nfs4: Ensure that ACL pages sent over NFS were not allocated from the slab (v3) The "bad_page()" page allocator sanity check was reported recently (call chain as follows): bad_page+0x69/0x91 free_hot_cold_page+0x81/0x144 skb_release_data+0x5f/0x98 __kfree_skb+0x11/0x1a tcp_ack+0x6a3/0x1868 tcp_rcv_established+0x7a6/0x8b9 tcp_v4_do_rcv+0x2a/0x2fa tcp_v4_rcv+0x9a2/0x9f6 do_timer+0x2df/0x52c ip_local_deliver+0x19d/0x263 ip_rcv+0x539/0x57c netif_receive_skb+0x470/0x49f :virtio_net:virtnet_poll+0x46b/0x5c5 net_rx_action+0xac/0x1b3 __do_softirq+0x89/0x133 call_softirq+0x1c/0x28 do_softirq+0x2c/0x7d do_IRQ+0xec/0xf5 default_idle+0x0/0x50 ret_from_intr+0x0/0xa default_idle+0x29/0x50 cpu_idle+0x95/0xb8 start_kernel+0x220/0x225 _sinittext+0x22f/0x236 It occurs because an skb with a fraglist was freed from the tcp retransmit queue when it was acked, but a page on that fraglist had PG_Slab set (indicating it was allocated from the Slab allocator (which means the free path above can't safely free it via put_page. We tracked this back to an nfsv4 setacl operation, in which the nfs code attempted to fill convert the passed in buffer to an array of pages in __nfs4_proc_set_acl, which gets used by the skb->frags list in xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer to a page struct via virt_to_page, but the vfs allocates the buffer via kmalloc, meaning the PG_slab bit is set. We can't create a buffer with kmalloc and free it later in the tcp ack path with put_page, so we need to either: 1) ensure that when we create the list of pages, no page struct has PG_Slab set or 2) not use a page list to send this data Given that these buffers can be multiple pages and arbitrarily sized, I think (1) is the right way to go. I've written the below patch to allocate a page from the buddy allocator directly and copy the data over to it. 
This ensures that we have a put_page free-able page for every entry that winds up on an skb frag list, so it can be safely freed when the frame is acked. We do a put page on each entry after the rpc_call_sync call so as to drop our own reference count to the page, leaving only the ref count taken by tcp_sendpages. This way the data will be properly freed when the ack comes in. Successfully tested by myself to solve the above oops. Note, as this is the result of a setacl operation that exceeded a page of data, I think this amounts to a local DoS triggerable by an unprivileged user, so I'm CCing security on this as well. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: Trond Myklebust <Trond.Myklebust@netapp.com> CC: security@kernel.org CC: Jeff Layton <jlayton@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2011-03-04 17:28:52 -08:00
Tejun Heo	43d133c18b	Merge branch 'master' into for-2.6.39	2011-02-21 09:43:56 +01:00
Chuck Lever	d1205f87bb	NFS: NFSv4 readdir loses entries On recent 2.6.38-rc kernels, connectathon basic test 6 fails on NFSv4 mounts of OpenSolaris with something like: > ./test6: readdir > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.12' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.82' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) didn't read expected 'file.164' dir entry, pass 0 > ./test6: (/mnt/klimt/matisse.test) Test failed with 3 errors > basic tests failed > Tests failed, leaving /mnt/klimt mounted > [cel@matisse cthon04]$ I narrowed the problem down to nfs4_decode_dirent() reporting that the decode buffer had overflowed while decoding the entries for those missing files. verify_attr_len() assumes both its pointer arguments reside on the same page. When these arguments point to locations on two different pages, verify_attr_len() can report false errors. This can happen now that a large NFSv4 readdir result can span pages. We have reasonably good checking in nfs4_decode_dirent() anyway, so it should be safe to simply remove the extra checking. At a guess, this was introduced by commit `6650239a`, "NFS: Don't use vm_map_ram() in readdir". Cc: stable@kernel.org [2.6.37] Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:41:35 -05:00
Chuck Lever	c08e76d0cd	NFS: Micro-optimize nfs4_decode_dirent() Make the decoding of NFSv4 directory entries slightly more efficient by: 1. Avoiding unnecessary byte swapping when checking XDR booleans, and 2. Not bumping "p" when its value will be immediately replaced by xdr_inline_decode() This commit makes nfs4_decode_dirent() consistent with similar logic in the other two decode_dirent() functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-28 13:37:35 -05:00
Trond Myklebust	e00b8a2404	NFS: Fix an NFS client lockdep issue There is no reason to be freeing the delegation cred in the rcu callback, and doing so is resulting in a lockdep complaint that rpc_credcache_lock is being called from both softirq and non-softirq contexts. Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-01-28 13:37:09 -05:00
Andy Adamson	c7a360b05b	NFS construct consistent co_ownerid for v4.1 As stated in section 2.4 of RFC 5661, subsequent instances of the client need to present the same co_ownerid. Concatenate the client's IP dot address, host name, and the rpc_auth pseudoflavor to form the co_ownerid. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 22:49:14 -05:00
Trond Myklebust	27dc1cd3ad	NFS: nfs_wcc_update_inode() should set nfsi->attr_gencount If the call to nfs_wcc_update_inode() results in an attribute update, we need to ensure that the inode's attr_gencount gets bumped too, otherwise we are not protected against races with other GETATTR calls. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:28:21 -05:00
Andy Adamson	b2a2897dc4	NFS improve pnfs_put_deviceid_cache debug print What we really want to know is the ref count. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Andy Adamson	2c4cdf8f6d	NFS fix cb_sequence error processing Always assign the cb_process_state nfs_client pointer so a processing error in cb_sequence after the nfs_client is found and referenced returns a non-NULL cb_process_state nfs_client and the matching nfs_put_client in nfs4_callback_compound dereferences the client. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Andy Adamson	778be232a2	NFS do not find client in NFSv4 pg_authenticate The information required to find the nfs_client corresponding to the incoming back channel request is contained in the NFS layer. Perform minimal checking in the RPC layer pg_authenticate method, and push more detailed checking into the NFS layer where the nfs_client can be found. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:26:51 -05:00
Chuck Lever	f61f6da0d5	NFS: Prevent memory allocation failure in nfsacl_encode() nfsacl_encode() allocates memory in certain cases. This of course is not guaranteed to work. Since commit `9f06c719` "SUNRPC: New xdr_streams XDR encoder API", the kernel's XDR encoders can't return a result indicating possibly a failure, so a memory allocation failure in nfsacl_encode() has become fatal (ie, the XDR code Oopses) in some cases. However, the allocated memory is a tiny fixed amount, on the order of 40-50 bytes. We can easily use a stack-allocated buffer for this, with only a wee bit of nose-holding. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	ee5dc7732b	NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" Milan Broz <mbroz@redhat.com> reports: > on today Linus' tree I get OOps if using nfs. > > server (2.6.36) exports dir: > /dir 172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500) > > on client it is mounted in fstab > server:/dir /mnt/tst nfs rw,soft 0 0 > > and these commands OOpses it (simplified from a configure script): > > cd /dir > touch x > install x y > > [ 105.327701] ------------[ cut here ]------------ > [ 105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338! > [ 105.328075] invalid opcode: 0000 [#1] PREEMPT SMP > [ 105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent > [ 105.328349] Modules linked in: usbcore dm_mod > [ 105.328553] > [ 105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform > [ 105.328853] EIP: 0060:[<c116c06c>] EFLAGS: 00010282 CPU: 0 > [ 105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98 > [ 105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004 > [ 105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4 > [ 105.329431] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 > [ 105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000) > [ 105.336600] Stack: > [ 105.336693] ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158 > [ 105.336982] ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000 > [ 105.337182] ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000 > [ 105.337405] Call Trace: > [ 105.337695] [<c12ecd02>] rpcauth_wrap_req+0x75/0x7f > [ 105.337806] [<c12f43e0>] ? xdr_encode_opaque+0x12/0x15 > [ 105.337898] [<c116c00b>] ? 
nfs3_xdr_enc_setacl3args+0x0/0x98 > [ 105.337988] [<c12e698d>] call_transmit+0x17e/0x1e8 > [ 105.338072] [<c12ec307>] __rpc_execute+0x6d/0x1a6 > [ 105.338155] [<c12ec474>] rpc_execute+0x34/0x37 > [ 105.338235] [<c12e738d>] rpc_run_task+0xb5/0xbd > [ 105.338316] [<c12e7474>] rpc_call_sync+0x3d/0x58 > [ 105.338402] [<c116d0c6>] nfs3_proc_setacls+0x18e/0x24f > [ 105.338493] [<c10b3f76>] ? __kmalloc+0x148/0x1c4 > [ 105.338579] [<c10ecd01>] ? posix_acl_alloc+0x12/0x22 > [ 105.338665] [<c116d5c8>] nfs3_proc_setacl+0xa0/0xca > [ 105.338748] [<c116d69c>] nfs3_setxattr+0x62/0x88 > [ 105.338834] [<c1317042>] ? sub_preempt_count+0x7c/0x89 > [ 105.338926] [<c116d63a>] ? nfs3_setxattr+0x0/0x88 > [ 105.339026] [<c10cfa79>] __vfs_setxattr_noperm+0x26/0x95 > [ 105.339114] [<c10cfb43>] vfs_setxattr+0x5b/0x76 > [ 105.339211] [<c10cfbfb>] setxattr+0x9d/0xc3 > [ 105.339298] [<c10a2ea8>] ? handle_pte_fault+0x258/0x5cb > [ 105.339428] [<c1091ff6>] ? __free_pages+0x1a/0x23 > [ 105.339517] [<c10498ea>] ? up_read+0x16/0x2c > [ 105.339599] [<c10b8365>] ? fget+0x0/0xa3 > [ 105.339677] [<c10b8365>] ? fget+0x0/0xa3 > [ 105.339760] [<c1025d23>] ? get_parent_ip+0xb/0x31 > [ 105.339843] [<c1317042>] ? sub_preempt_count+0x7c/0x89 > [ 105.339931] [<c10cfc72>] sys_fsetxattr+0x51/0x79 > [ 105.340014] [<c1002853>] sysenter_do_call+0x12/0x32 > [ 105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 <0f> 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d > [ 105.350321] EIP: [<c116c06c>] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4 > [ 105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]--- nfs3_xdr_enc_setacl3args() is not properly setting up the target buffer before nfsacl_encode() attempts to encode the ACL. Introduced by commit `d9c407b1` "NFS: Introduce new-style XDR encoding functions for NFSv3." 
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
Chuck Lever	839f7ad693	NFS: Fix "kernel BUG at fs/aio.c:554!" Nick Piggin reports: > I'm getting use after frees in aio code in NFS > > [ 2703.396766] Call Trace: > [ 2703.396858] [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80 > [ 2703.396959] [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40 > [ 2703.397058] [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140 > [ 2703.397159] [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0 > [ 2703.397260] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60 > [ 2703.397361] [<ffffffff81039701>] ? get_parent_ip+0x11/0x50 > [ 2703.397464] [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80 > [ 2703.397564] [<ffffffff811627db>] ? aio_put_req+0x2b/0x60 > [ 2703.397662] [<ffffffff811627db>] aio_put_req+0x2b/0x60 > [ 2703.397761] [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0 > [ 2703.397895] [<ffffffff81164d0b>] sys_io_submit+0xb/0x10 > [ 2703.397995] [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b > > Adding some tracing, it is due to nfs completing the request then > returning something other than -EIOCBQUEUED, so aio.c > also completes the request. To address this, prevent the NFS direct I/O engine from completing async iocbs when the forward path returns an error without starting any I/O. This fix appears to survive ^C during both "xfstest no. 208" and "fsx -Z." It's likely this bug has existed for a very long while, as we are seeing very similar symptoms in OEL 5. Copying stable. Cc: Stable <stable@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:47 -05:00
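The ownership rule behind the fix above can be sketched in plain C. This is a minimal user-space illustration, not the kernel's direct I/O code: `sketch_req`, `sketch_engine_submit`, `sketch_caller` and `SKETCH_QUEUED` (standing in for -EIOCBQUEUED) are hypothetical names. The point it shows is that a request is completed exactly once: by the engine when it took ownership and reported "queued", otherwise by the caller.

```c
#include <errno.h>

/* Hypothetical stand-in for -EIOCBQUEUED: "the engine owns completion". */
#define SKETCH_QUEUED 1

/* Hypothetical request; counts how many times it has been completed. */
struct sketch_req {
    int completions;
};

static void sketch_complete(struct sketch_req *r)
{
    r->completions++;
}

/*
 * Engine side: when no I/O could be started, return an error WITHOUT
 * completing the request, so the caller remains the only completer.
 */
static int sketch_engine_submit(struct sketch_req *r, int can_start)
{
    if (!can_start)
        return -EIO;

    /* I/O started; a real engine would complete asynchronously later. */
    sketch_complete(r);
    return SKETCH_QUEUED;
}

/* Caller side: complete only if the engine did not take ownership. */
static int sketch_caller(struct sketch_req *r, int can_start)
{
    int ret = sketch_engine_submit(r, can_start);

    if (ret != SKETCH_QUEUED)
        sketch_complete(r);
    return ret;
}
```

If the engine completed the request and also returned an error, `sketch_caller()` would complete it a second time, which is the double completion described in the report.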
Jesper Juhl	ad3d2eedf0	NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). On Mon, 17 Jan 2011, Mi Jinlong wrote: > > > Jesper Juhl: > > strrchr() can return NULL if nothing is found. If this happens we'll > > dereference a NULL pointer in > > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds(). > > > > I tried to find some other code that guarantees that this can never > > happen but I was unsuccessful. So, unless someone else can point to some > > code that ensures this can never be a problem, I believe this patch is > > needed. > > > > While I was changing this code I also noticed that all the dprintk() > > statements, except one, start with "%s:". The one missing the ":" I added > > it to. > > Maybe another one also should be changed at decode_and_add_ds() at line 243: > > 243 printk("%s Decoded address and port %s\n", __func__, buf); > Missed that one. Thanks. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-25 15:24:46 -05:00
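The NULL check described above follows a common pattern, sketched here in user-space C. `split_at_last_dot` is a hypothetical helper written for illustration; it is not the code in decode_and_add_ds(), and the dotted input format is only an assumption.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a string in place at its last '.'.
 * Returns 0 and sets *tail on success, or -1 if the string has no '.'.
 */
static int split_at_last_dot(char *buf, char **tail)
{
    char *dot = strrchr(buf, '.');

    if (dot == NULL)    /* strrchr() found nothing: must not dereference */
        return -1;

    *dot = '\0';        /* terminate the head where the separator was */
    *tail = dot + 1;    /* the tail begins just after the separator */
    return 0;
}
```

Returning an error instead of writing through the result of strrchr() keeps a malformed string from turning into a NULL pointer dereference.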
Tejun Heo	ada609ee2a	workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER WQ_RESCUER is now an internal flag and should only be used in the workqueue implementation proper. Use WQ_MEM_RECLAIM instead. This doesn't introduce any functional difference. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: dm-devel@redhat.com Cc: Neil Brown <neilb@suse.de>	2011-01-25 14:35:54 +01:00
Fred Isaman	0da2a4ac33	NFS: fix handling of malloc failure during nfs_flush_multi() Cleanup of the allocated list entries should not call put_nfs_open_context() on each entry, as the context will always be NULL, causing an oops. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-01-19 15:37:49 -05:00
David Howells	ea5b778a8b	Unexport do_add_mount() and add in follow_automount(), not ->d_automount() Unexport do_add_mount() and make ->d_automount() return the vfsmount to be added rather than calling do_add_mount() itself. follow_automount() will then do the addition. This slightly complicates things as ->d_automount() normally wants to add the new vfsmount to an expiration list and start an expiration timer. The problem with that is that the vfsmount will be deleted if it has a refcount of 1 and the timer will not repeat if the expiration list is empty. To this end, we require the vfsmount to be returned from d_automount() with a refcount of (at least) 2. One of these refs will be dropped unconditionally. In addition, follow_automount() must get a 3rd ref around the call to do_add_mount() lest it eat a ref and return an error, leaving the mount we have open to being expired as we would otherwise have only 1 ref on it. d_automount() should also add the vfsmount to the expiration list (by calling mnt_set_expiry()) and start the expiration timer before returning, if this mechanism is to be used. The vfsmount will be unlinked from the expiration list by follow_automount() if do_add_mount() fails. This patch also fixes the call to do_add_mount() for AFS to propagate the mount flags from the parent vfsmount. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:48 -05:00
David Howells	36d43a4376	NFS: Use d_automount() rather than abusing follow_link() Make NFS use the new d_automount() dentry operation rather than abusing follow_link() on directories. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com> Acked-by: Ian Kent <raven@themaw.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-01-15 20:07:34 -05:00

2075 Commits