Commit Graph

38468 Commits

Author SHA1 Message Date
Linus Torvalds
e20db597b6 NFS client updates for Linux 3.19
Highlights include:
 
 Features:
 - NFSv4.2 client support for hole punching and preallocation.
 - Further RPC/RDMA client improvements.
 - Add more RPC transport debugging tracepoints.
 - Add RPC debugging tools in debugfs.
 
 Bugfixes:
 - Stable fix for layoutget error handling
 - Fix a change in COMMIT behaviour resulting from the recent io code updates
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUhRVTAAoJEGcL54qWCgDyfeUP/RoFo3ImTMbGxfcPJqoELjcO
 lZbQ+27pOE/whFDkWgiOVTwlgGct5a0WRo7GCZmpYJA4q1kmSv4ngTb3nMTCUztt
 xMJ0mBr0BqttVs+ouKiVPm3cejQXedEhttwWcloIXS8lNenlpL29Zlrx2NHdU8UU
 13+souocj0dwIyTYYS/4Lm9KpuCYnpDBpP5ShvQjVaMe/GxJo6GyZu70c7FgwGNz
 Nh9onzZV3mz1elhfizlV38aVA7KWVXtLWIqOFIKlT2fa4nWB8Hc07miR5UeOK0/h
 r+icnF2qCQe83MbjOxYNxIKB6uiA/4xwVc90X4AQ7F0RX8XPWHIQWG5tlkC9jrCQ
 3RGzYshWDc9Ud2mXtLMyVQxHVVYlFAe1WtdP8ZWb1oxDInmhrarnWeNyECz9xGKu
 VzIDZzeq9G8slJXATWGRfPsYr+Ihpzcen4QQw58cakUBcqEJrYEhlEOfLovM71k3
 /S/jSHBAbQqiw4LPMw87bA5A6+ZKcVSsNE0XCtNnhmqFpLc1kKRrl5vaN+QMk5tJ
 v4/zR0fPqH7SGAJWYs4brdfahyejEo0TwgpDs7KHmu1W9zQ0LCVTaYnQuUmQjta6
 WyYwIy3TTibdfR191O0E3NOW82Q/k/NBD6ySvabN9HqQ9eSk6+rzrWAslXCbYohb
 BJfzcQfDdx+lsyhjeTx9
 =wOP3
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
 "Highlights include:

  Features:
   - NFSv4.2 client support for hole punching and preallocation.
   - Further RPC/RDMA client improvements.
   - Add more RPC transport debugging tracepoints.
   - Add RPC debugging tools in debugfs.

  Bugfixes:
   - Stable fix for layoutget error handling
   - Fix a change in COMMIT behaviour resulting from the recent io code
     updates"

* tag 'nfs-for-3.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
  sunrpc: add a debugfs rpc_xprt directory with an info file in it
  sunrpc: add debugfs file for displaying client rpc_task queue
  nfs: Add DEALLOCATE support
  nfs: Add ALLOCATE support
  NFS: Clean up nfs4_init_callback()
  NFS: SETCLIENTID XDR buffer sizes are incorrect
  SUNRPC: serialize iostats updates
  xprtrdma: Display async errors
  xprtrdma: Enable pad optimization
  xprtrdma: Re-write rpcrdma_flush_cqs()
  xprtrdma: Refactor tasklet scheduling
  xprtrdma: unmap all FMRs during transport disconnect
  xprtrdma: Cap req_cqinit
  xprtrdma: Return an errno from rpcrdma_register_external()
  nfs: define nfs_inc_fscache_stats and using it as possible
  nfs: replace nfs_add_stats with nfs_inc_stats when add one
  NFS: Deletion of unnecessary checks before the function call "nfs_put_client"
  sunrpc: eliminate RPC_TRACEPOINTS
  sunrpc: eliminate RPC_DEBUG
  lockd: eliminate LOCKD_DEBUG
  ...
2014-12-10 15:13:13 -08:00
Linus Torvalds
8139548136 Merge branch 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI updates from Ingo Molnar:
 "Changes in this cycle are:

   - support module unload for efivarfs (Mathias Krause)

   - another attempt at moving x86 to libstub taking advantage of the
     __pure attribute (Ard Biesheuvel)

   - add EFI runtime services section to ptdump (Mathias Krause)"

* 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, ptdump: Add section for EFI runtime services
  efi/x86: Move x86 back to libstub
  efivarfs: Allow unloading when build as module
2014-12-10 12:42:16 -08:00
Linus Torvalds
3eb5b893eb Merge branch 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 MPX support from Thomas Gleixner:
 "This enables support for x86 MPX.

  MPX is a new debug feature for bound checking in user space.  It
  requires kernel support to handle the bound tables and decode the
  bound violating instruction in the trap handler"

* 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  asm-generic: Remove asm-generic arch_bprm_mm_init()
  mm: Make arch_unmap()/bprm_mm_init() available to all architectures
  x86: Cleanly separate use of asm-generic/mm_hooks.h
  x86 mpx: Change return type of get_reg_offset()
  fs: Do not include mpx.h in exec.c
  x86, mpx: Add documentation on Intel MPX
  x86, mpx: Cleanup unused bound tables
  x86, mpx: On-demand kernel allocation of bounds tables
  x86, mpx: Decode MPX instruction to get bound violation information
  x86, mpx: Add MPX-specific mmap interface
  x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
  x86, mpx: Add MPX to disabled features
  ia64: Sync struct siginfo with general version
  mips: Sync struct siginfo with general version
  mpx: Extend siginfo structure to include bound violation information
  x86, mpx: Rename cfg_reg_u and status_reg
  x86: mpx: Give bndX registers actual names
  x86: Remove arbitrary instruction size limit in instruction decoder
2014-12-10 09:34:43 -08:00
Linus Torvalds
86c6a2fddf Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle are:

   - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

     This instruments might_sleep() checks to catch places that nest
     blocking primitives - such as mutex usage in a wait loop.  Such
     bugs can result in hard to debug races/hangs.

     Another category of invalid nesting that this facility will detect
     is the calling of blocking functions from within schedule() ->
     sched_submit_work() -> blk_schedule_flush_plug().

     There's some potential for false positives (if secondary blocking
     primitives themselves are not ready yet for this facility), but the
     kernel will warn once about such bugs per bootup, so the warning
     isn't much of a nuisance.

     This feature comes with a number of fixes, for problems uncovered
     with it, so no messages are expected normally.

   - Another round of sched/numa optimizations and refinements, for
     CONFIG_NUMA_BALANCING=y.

   - Another round of sched/dl fixes and refinements.

  Plus various smaller fixes and cleanups"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  sched: Add missing rcu protection to wake_up_all_idle_cpus
  sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
  sched/numa: Init numa balancing fields of init_task
  sched/deadline: Remove unnecessary definitions in cpudeadline.h
  sched/cpupri: Remove unnecessary definitions in cpupri.h
  sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
  sched/fair: Fix stale overloaded status in the busiest group finding logic
  sched: Move p->nr_cpus_allowed check to select_task_rq()
  sched/completion: Document when to use wait_for_completion_io_*()
  sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
  sched/fair: Kill task_struct::numa_entry and numa_group::task_list
  sched: Refactor task_struct to use numa_faults instead of numa_* pointers
  sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
  sched/deadline: Reschedule from switched_from_dl() after a successful pull
  sched/deadline: Push task away if the deadline is equal to curr during wakeup
  sched/deadline: Add deadline rq status print
  sched/deadline: Fix artificial overrun introduced by yield_task_dl()
  sched/rt: Clean up check_preempt_equal_prio()
  sched/core: Use dl_bw_of() under rcu_read_lock_sched()
  sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
  ...
2014-12-09 21:21:34 -08:00
Al Viro
1ead0e79bf fat: fix oops on corrupted vfat fs
a) don't bother with ->d_time for positives - we only check it for
   negatives anyway.

b) make sure to set it at unlink and rmdir time - at *that* point
   soon-to-be negative dentry matches then-current directory contents

c) don't go into renaming of old alias in vfat_lookup() unless it
   has the same parent (which it will, unless we are seeing corrupted
   image)

[hirofumi@mail.parknet.co.jp: make change minimum, don't call d_move() for dir]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: <stable@vger.kernel.org>	[3.17.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-03 09:36:03 -08:00
Linus Torvalds
3a18ca0613 Fix an ext4 metadata checksum regression introduced in v3.18-rc3.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUfTKdAAoJENNvdpvBGATwRhsQAJOWfDD1K57CFQQ0MbLdRfeA
 YzdTHlKIYpt4Dm3+Bc/LJskLrA0sn2vmxF/v0Jxr3F/agwfCODLMO8dIsB0nFX0E
 eC8fCx05titBMQLN4adWr59qQTeE1nssQdpA5SomPrJZr5pSabtj3ekFiHQJ+bWb
 9WNY737TxSPLK0ex9iDlAAp/AoxgF4K6zj/azsRY+mmlfM9+dFoprZWqWwgwl99m
 4LVx1waAnLQdU2Yj7ZYGReweFFTTOqGz4ds1GggymB3Z8Q873dVYO7vdbQWDFJgC
 TcAp8YbfrQC6/M/IaASZKVj6hwEPVMTgOs7dUeyfPtSaXBrW0WBGAhM5gFnQ6J+T
 DO4YHC+tH26GLsfBs9IZnHjAoeVZ93JFDKmxfclDs0AGY+0WgSyY8Bt6VJyoWR60
 RPbW15i/0oMSWEPxbuqQmFIqlcj5n9D80SEmvhpn7oJwkrrUMprcxcWTQN+Ca73e
 2EIOW0SHaLrkM7wpYjwlO4dgCxSZWg6QfHyznbuJcVKOw8KnDMuTEOjP2vBvwHwu
 Wax4EIGZw9XqZVI7a9Z1nd+mpUYi5KDgpS8Uo08Qz5QapWEYla3JPt76q3TwSCIz
 ExMwoRBUSMrpSoDMbyjmk4sh5ABTTWOf9SPmCdnVfzZ36EY0dJckeXj0jFqtyVdq
 p1bxjWPARBm1LfVBcQek
 =yIOT
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfix from Ted Ts'o:
 "Fix an ext4 metadata checksum regression introduced in v3.18-rc3"

* tag 'ext4_for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  jbd2: fix regression where we fail to initialize checksum seed when loading
2014-12-01 20:11:49 -08:00
Darrick J. Wong
32f3869184 jbd2: fix regression where we fail to initialize checksum seed when loading
When we're enabling journal features, we cannot use the predicate
jbd2_journal_has_csum_v2or3() because we haven't yet set the sb
feature flag fields!  Moreover, we just finished loading the shash
driver, so the test is unnecessary; calculate the seed always.

Without this patch, we fail to initialize the checksum seed the first
time we turn on journal_checksum, which means that all journal blocks
written during that first mount are corrupt.  Transactions written
after the second mount will be fine, since the feature flag will be
set in the journal superblock.  xfstests generic/{034,321,322} are the
regression tests.

(This is important for 3.18.)

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.coM>
Reported-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2014-12-01 21:57:06 -05:00
Chris Mason
2f19cad94c btrfs: zero out left over bytes after processing compression streams
Don Bailey noticed that our page zeroing for compression at end-io time
isn't complete.  This reworks a patch from Linus to push the zeroing
into the zlib and lzo specific functions instead of trying to handle the
corners inside btrfs_decompress_buf2page

Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reported-by: Don A. Bailey <donb@securitymouse.com>
cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-30 09:33:51 -08:00
Trond Myklebust
1702562db4 NFS: Generic client side changes from Chuck
These patches fixes for iostats and SETCLIENTID in addition to cleaning
 up the nfs4_init_callback() function.
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJUdfhQAAoJENfLVL+wpUDr9x4QANWmG6xjEU7RuIBLalOoxit6
 eXnEDNZlwYp6NCkYktHVWaTqXdwq7fdGX+3p4eiwNg3C3SrJHkpWvjeFd4KT/SyP
 B/w3vYG3o5H01i3Mb5kCD2uW0gbS9soZXpIg+uHHOl43yZzneC0vPTLhQ/h+9zPc
 B32XcgQhvLIHR3LWrC/+uolsa31lcyya0W45PX+iCpzHF7i9qRBrNLODkTlw1hNQ
 eEnYIvVy5oW00zlHJUYiTHP3e+0EJn5PAngdYbqiboJ9mK7DbB0QDwqyvJbIT7ql
 WAip6cNcJnSv1eiVYqDwlR1ok8drK5X7yQCT3lcLzAMDznLsSAL1Itu0h2Ay3z61
 f8XCyTwI0izq0DbdrMcPoqPSitqyM8nkPElnOuitXwzEroPaG40OF67yss3+ixbl
 JeQZ+u35pnpCkKUaZdCK3Pn83StxmUaBcFx8eg30NBc0SN13Eiz6aZcGEperClrR
 RwMLDUhUtAMcMRunRRxiN9lHafPqqeDeJre7uky0p0sU9CsH+1n5qIKLmk+Ber2d
 ZS29TobdR7ktfjQ52XazMAIFzI7r1v4Zn7ziH/WRbvKoKw9ICcyoUGNy9CS5LyWu
 BxWCZAJmby5H1i7V7ituJK8TImb81L4aPW06hHX9k+0SoTrgFjas//4jZYfysJ9N
 PvR0MRNfMFhuBlsTj7Gy
 =PMdS
 -----END PGP SIGNATURE-----

Merge tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma into linux-next

Pull pull additional NFS client changes for 3.19 from Anna Schumaker:
  "NFS: Generic client side changes from Chuck

  These patches fixes for iostats and SETCLIENTID in addition to cleaning
  up the nfs4_init_callback() function.

  Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>"

* tag 'nfs-cel-for-3.19' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  NFS: Clean up nfs4_init_callback()
  NFS: SETCLIENTID XDR buffer sizes are incorrect
  SUNRPC: serialize iostats updates
2014-11-26 17:34:14 -05:00
Linus Torvalds
b914c5b213 Merge branch 'for-3.18' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "These fix one mishandling of the case when security labels are
  configured out, and two races in the 4.1 backchannel code"

* 'for-3.18' of git://linux-nfs.org/~bfields/linux:
  nfsd: Fix slot wake up race in the nfsv4.1 callback code
  SUNRPC: Fix locking around callback channel reply receive
  nfsd: correctly define v4.2 support attributes
2014-11-25 19:05:41 -08:00
Linus Torvalds
277f850fbc Merge git://git.kvack.org/~bcrl/aio-fixes
Pull aio fix from Ben LaHaise:
 "Dirty page accounting fix for aio"

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: fix uncorrent dirty pages accouting when truncating AIO ring buffer
2014-11-25 18:55:44 -08:00
Anna Schumaker
624bd5b7b6 nfs: Add DEALLOCATE support
This patch adds support for using the NFS v4.2 operation DEALLOCATE to
punch holes in a file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-25 16:38:32 -05:00
Anna Schumaker
f4ac1674f5 nfs: Add ALLOCATE support
This patch adds support for using the NFS v4.2 operation ALLOCATE to
preallocate data in a file.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-25 16:38:32 -05:00
Chuck Lever
c2ef47b7f5 NFS: Clean up nfs4_init_callback()
nfs4_init_callback() is never invoked for NFS versions other than 4.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 16:22:16 -05:00
Chuck Lever
6dd3436b9d NFS: SETCLIENTID XDR buffer sizes are incorrect
Use the correct calculation of the maximum size of a clientaddr4
when encoding and decoding SETCLIENTID operations. clientaddr4 is
defined in section 2.2.10 of RFC3530bis-31.

The usage in encode_setclientid_maxsz is missing the 4-byte length
in both strings, but is otherwise correct. decode_setclientid_maxsz
simply asks for a page of receive buffer space, which is
unnecessarily large (more than 4KB).

Note that a SETCLIENTID reply is either clientid+verifier, or
clientaddr4, depending on the returned NFS status. It doesn't
hurt to allocate enough space for both.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
2014-11-25 16:22:16 -05:00
Li RongQing
e9f456ca50 nfs: define nfs_inc_fscache_stats and using it as possible
Define and use nfs_inc_fscache_stats when plus one, which can save to
pass one parameter.

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 20:08:47 -05:00
Li RongQing
5a254d08b0 nfs: replace nfs_add_stats with nfs_inc_stats when add one
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 20:08:47 -05:00
Markus Elfring
fe0bf1185d NFS: Deletion of unnecessary checks before the function call "nfs_put_client"
The nfs_put_client() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 20:07:27 -05:00
Jeff Layton
10b89567db lockd: eliminate LOCKD_DEBUG
LOCKD_DEBUG is always the same value as CONFIG_SUNRPC_DEBUG, so we can
just use it instead.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 17:24:08 -05:00
Peng Tao
4bd5a980de nfs41: fix nfs4_proc_layoutget error handling
nfs4_layoutget_release() drops layout hdr refcnt. Grab the refcnt
early so that it is safe to call .release in case nfs4_alloc_pages
fails.

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Fixes: a47970ff78 ("NFSv4.1: Hold reference to layout hdr in layoutget")
Cc: stable@vger.kernel.org # 3.9+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 17:14:54 -05:00
Weston Andros Adamson
cb1410c71e NFS: fix subtle change in COMMIT behavior
Recent work in the pgio layer made it possible for there to be more than one
request per page. This caused a subtle change in commit behavior, because
write.c:nfs_commit_unstable_pages compares the number of *pages* waiting for
writeback against the number of requests on a commit list to choose when to
send a COMMIT in a non-blocking flush.

This is probably hard to hit in normal operation - you have to be using
rsize/wsize < PAGE_SIZE, or pnfs with lots of boundaries that are not page
aligned to have a noticeable change in behavior.

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 17:00:42 -05:00
Christoph Hellwig
6a74c0c940 pnfs/blocklayout: fix end calculation in pnfs_num_cont_bytes
Use the number of pages in the pagecache mapping instead of the
number of pnfs requests which is only slightly related.

Reported-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 17:00:41 -05:00
Anna Schumaker
878ffa9f85 NFS: Use nfs_server_capable() for checknig NFS_CAP_SEEK
This should make the code easier to maintain in the future.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 12:49:13 -05:00
Jan Kara
c1b69b1ca1 nfs: Remove dead case from nfs4_map_errors()
NFS4ERR_ACCESS has number 13 and thus is matched and returned
immediately at the beginning of nfs4_map_errors() and there's no point
in checking it later.

Coverity-id: 733891
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-24 12:48:14 -05:00
Linus Torvalds
d038a63ace Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs deadlock fix from Chris Mason:
 "This has a fix for a long standing deadlock that we've been trying to
  nail down for a while.  It ended up being a bad interaction with the
  fair reader/writer locks and the order btrfs reacquires locks in the
  btree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: fix lockups from btrfs_clear_path_blocking
2014-11-23 11:16:36 -08:00
Al Viro
3035b675ad Merge branch 'overlayfs-current' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-linus
"The biggest change is to rename the filesystem from "overlayfs" to "overlay".
This will allow legacy overlayfs to be easily carried by distros alongside the
new mainline one.  Also fix a couple of copy-up races and allow escaping comma
character in filenames."

The last bit is about commas in pathname mount options...
2014-11-21 11:51:08 -05:00
Miklos Szeredi
7676895f47 ovl: ovl_dir_fsync() cleanup
Check against !OVL_PATH_LOWER instead of OVL_PATH_MERGE.  For a copied up
directory the two are currently equivalent.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:02 +01:00
Miklos Szeredi
c9f00fdb9a ovl: pass dentry into ovl_dir_read_merged()
Pass dentry into ovl_dir_read_merged() insted of upperpath and lowerpath.
This cleans up callers and paves the way for multi-layer directory reads.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:01 +01:00
Miklos Szeredi
71d509280f ovl: use lockless_dereference() for upperdentry
Don't open code lockless_dereference() in ovl_upperdentry_dereference().

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:01 +01:00
Miklos Szeredi
91c7794713 ovl: allow filenames with comma
Allow option separator (comma) to be escaped with backslash.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:00 +01:00
Miklos Szeredi
521484639e ovl: fix race in private xattr checks
Xattr operations can race with copy up.  This does not matter as long as
we consistently fiter out "trunsted.overlay.opaque" attribute on upper
directories.

Previously we checked parent against OVL_PATH_MERGE.  This is too general,
and prone to race with copy-up.  I.e. we found the parent to be on the
lower layer but ovl_dentry_real() would return the copied-up dentry,
possibly with the "opaque" attribute.

So instead use ovl_path_real() and decide to filter the attributes based on
the actual type of the dentry we'll use.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:40:00 +01:00
Miklos Szeredi
a105d685a8 ovl: fix remove/copy-up race
ovl_remove_and_whiteout() needs to check if upper dentry exists or not
after having locked upper parent directory.

Previously we used a "type" value computed before locking the upper parent
directory, which is susceptible to racing with copy-up.

There's a similar check in ovl_check_empty_and_clear().  This one is not
actually racy, since copy-up doesn't change the "emptyness" property of a
directory.  Add a comment to this effect, and check the existence of upper
dentry locally to make the code cleaner.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
2014-11-20 16:39:59 +01:00
Miklos Szeredi
ef94b1864d ovl: rename filesystem type to "overlay"
Some distributions carry an "old" format of overlayfs while mainline has a
"new" format.

The distros will possibly want to keep the old overlayfs alongside the new
for compatibility reasons.

To make it possible to differentiate the two versions change the name of
the new one from "overlayfs" to "overlay".

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Cc: Andy Whitcroft <apw@canonical.com>
2014-11-20 16:39:59 +01:00
Trond Myklebust
c6c15e1ed3 nfsd: Fix slot wake up race in the nfsv4.1 callback code
The currect code for nfsd41_cb_get_slot() and nfsd4_cb_done() has no
locking in order to guarantee atomicity, and so allows for races of
the form.

Task 1                                  Task 2
======                                  ======
if (test_and_set_bit(0) != 0) {
                                        clear_bit(0)
                                        rpc_wake_up_next(queue)
        rpc_sleep_on(queue)
        return false;
}

This patch breaks the race condition by adding a retest of the bit
after the call to rpc_sleep_on().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-11-19 15:45:44 -05:00
Chris Mason
f82c458a2c btrfs: fix lockups from btrfs_clear_path_blocking
The fair reader/writer locks mean that btrfs_clear_path_blocking needs
to strictly follow lock ordering rules even when we already have
blocking locks on a given path.

Before we can clear a blocking lock on the path, we need to make sure
all of the locks have been converted to blocking.  This will remove lock
inversions against anyone spinning in write_lock() against the buffers
we're trying to get read locks on.  These inversions didn't exist before
the fair read/writer locks, but now we need to be more careful.

We papered over this deadlock in the past by changing
btrfs_try_read_lock() to be a true trylock against both the spinlock and
the blocking lock.  This was slower, and not sufficient to fix all the
deadlocks.  This patch adds a btrfs_tree_read_lock_atomic(), which
basically means get the spinlock but trylock on the blocking lock.

Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reported-by: Patrick Schmid <schmid@phys.ethz.ch>
cc: stable@vger.kernel.org #v3.15+
2014-11-19 10:34:35 -08:00
Arnd Bergmann
7ca2f23440 isofs: avoid unused function warning
With the isofs_hash() function removed, isofs_hash_ms() is the only user
of isofs_hash_common(), but it's defined inside of an #ifdef, which triggers
this gcc warning in ARM axm55xx_defconfig starting with v3.18-rc3:

fs/isofs/inode.c:177:1: warning: 'isofs_hash_common' defined but not used [-Wunused-function]

This patch moves the function inside of the same #ifdef section to avoid that
warning, which seems the best compromise of a relatively harmless patch for
a late -rc.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: b0afd8e5db ("isofs: don't bother with ->d_op for normal case")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:09:37 -05:00
Yan, Zheng
4a7795d35e vfs: fix reference leak in d_prune_aliases()
In "d_prune_alias(): just lock the parent and call __dentry_kill()" the old
dget + d_drop + dput has been replaced with lock_parent + __dentry_kill;
unfortunately, dput() does more than just killing dentry - it also drops the
reference to parent.  New variant leaks that reference and needs dput(parent)
after killing the child off.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:07:20 -05:00
Christoph Hellwig
6d0ba0432a nfsd: correctly define v4.2 support attributes
Even when security labels are disabled we support at least the same
attributes as v4.1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2014-11-19 12:03:19 -05:00
Dave Hansen
abe1e395f6 fs: Do not include mpx.h in exec.c
We no longer need mpx.h in exec.c.  This will obviously also
break the build for non-x86 builds.  We get the MPX includes that
we need from mmu_context.h now.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141118003608.837015B3@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 02:01:40 +01:00
Dave Hansen
fe3d197f84 x86, mpx: On-demand kernel allocation of bounds tables
This is really the meat of the MPX patch set.  If there is one patch to
review in the entire series, this is the one.  There is a new ABI here
and this kernel code also interacts with userspace memory in a
relatively unusual manner.  (small FAQ below).

Long Description:

This patch adds two prctl() commands to provide enable or disable the
management of bounds tables in kernel, including on-demand kernel
allocation (See the patch "on-demand kernel allocation of bounds tables")
and cleanup (See the patch "cleanup unused bound tables"). Applications
do not strictly need the kernel to manage bounds tables and we expect
some applications to use MPX without taking advantage of this kernel
support. This means the kernel can not simply infer whether an application
needs bounds table management from the MPX registers.  The prctl() is an
explicit signal from userspace.

PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
require kernel's help in managing bounds tables.

PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace don't
want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
won't allocate and free bounds tables even if the CPU supports MPX.

PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
directory out of a userspace register (bndcfgu) and then cache it into
a new field (->bd_addr) in  the 'mm_struct'.  PR_MPX_DISABLE_MANAGEMENT
will set "bd_addr" to an invalid address.  Using this scheme, we can
use "bd_addr" to determine whether the management of bounds tables in
kernel is enabled.

Also, the only way to access that bndcfgu register is via an xsaves,
which can be expensive.  Caching "bd_addr" like this also helps reduce
the cost of those xsaves when doing table cleanup at munmap() time.
Unfortunately, we can not apply this optimization to #BR fault time
because we need an xsave to get the value of BNDSTATUS.

==== Why does the hardware even have these Bounds Tables? ====

MPX only has 4 hardware registers for storing bounds information.
If MPX-enabled code needs more than these 4 registers, it needs to
spill them somewhere. It has two special instructions for this
which allow the bounds to be moved between the bounds registers
and some new "bounds tables".

They are similar conceptually to a page fault and will be raised by
the MPX hardware during both bounds violations or when the tables
are not present. This patch handles those #BR exceptions for
not-present tables by carving the space out of the normal processes
address space (essentially calling the new mmap() interface indroduced
earlier in this patch set.) and then pointing the bounds-directory
over to it.

The tables *need* to be accessed and controlled by userspace because
the instructions for moving bounds in and out of them are extremely
frequent. They potentially happen every time a register pointing to
memory is dereferenced. Any direct kernel involvement (like a syscall)
to access the tables would obviously destroy performance.

==== Why not do this in userspace? ====

This patch is obviously doing this allocation in the kernel.
However, MPX does not strictly *require* anything in the kernel.
It can theoretically be done completely from userspace. Here are
a few ways this *could* be done. I don't think any of them are
practical in the real-world, but here they are.

Q: Can virtual space simply be reserved for the bounds tables so
   that we never have to allocate them?
A: As noted earlier, these tables are *HUGE*. An X-GB virtual
   area needs 4*X GB of virtual space, plus 2GB for the bounds
   directory. If we were to preallocate them for the 128TB of
   user virtual address space, we would need to reserve 512TB+2GB,
   which is larger than the entire virtual address space today.
   This means they can not be reserved ahead of time. Also, a
   single process's pre-popualated bounds directory consumes 2GB
   of virtual *AND* physical memory. IOW, it's completely
   infeasible to prepopulate bounds directories.

Q: Can we preallocate bounds table space at the same time memory
   is allocated which might contain pointers that might eventually
   need bounds tables?
A: This would work if we could hook the site of each and every
   memory allocation syscall. This can be done for small,
   constrained applications. But, it isn't practical at a larger
   scale since a given app has no way of controlling how all the
   parts of the app might allocate memory (think libraries). The
   kernel is really the only place to intercept these calls.

Q: Could a bounds fault be handed to userspace and the tables
   allocated there in a signal handler instead of in the kernel?
A: (thanks to tglx) mmap() is not on the list of safe async
   handler functions and even if mmap() would work it still
   requires locking or nasty tricks to keep track of the
   allocation state there.

Having ruled out all of the userspace-only approaches for managing
bounds tables that we could think of, we create them on demand in
the kernel.

Based-on-patch-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Qiaowei Ren
4aae7e436f x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
MPX-enabled applications using large swaths of memory can
potentially have large numbers of bounds tables in process
address space to save bounds information. These tables can take
up huge swaths of memory (as much as 80% of the memory on the
system) even if we clean them up aggressively. In the worst-case
scenario, the tables can be 4x the size of the data structure
being tracked. IOW, a 1-page structure can require 4 bounds-table
pages.

Being this huge, our expectation is that folks using MPX are
going to be keen on figuring out how much memory is being
dedicated to it. So we need a way to track memory use for MPX.

If we want to specifically track MPX VMAs we need to be able to
distinguish them from normal VMAs, and keep them from getting
merged with normal VMAs. A new VM_ flag set only on MPX VMAs does
both of those things. With this flag, MPX bounds-table VMAs can
be distinguished from other VMAs, and userspace can also walk
/proc/$pid/smaps to get memory usage for MPX.

In addition to this flag, we also introduce a special ->vm_ops
specific to MPX VMAs (see the patch "add MPX specific mmap
interface"), but currently different ->vm_ops do not by
themselves prevent VMA merging, so we still need this flag.

We understand that VM_ flags are scarce and are open to other
options.

Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: linux-mm@kvack.org
Cc: linux-mips@linux-mips.org
Cc: Dave Hansen <dave@sr71.net>
Link: http://lkml.kernel.org/r/20141114151825.565625B3@viggo.jf.intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-11-18 00:58:53 +01:00
Ingo Molnar
e9ac5f0fa8 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying more changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:50:25 +01:00
Ingo Molnar
595247f61f * Support module unload for efivarfs - Mathias Krause
* Another attempt at moving x86 to libstub taking advantage of the
    __pure attribute - Ard Biesheuvel
 
  * Add EFI runtime services section to ptdump - Mathias Krause
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUZnQGAAoJEC84WcCNIz1VfyUP/1MCVt4vepl7+0JzdUP/eVPs
 CwPM6gBOvgx1PviWrtvSU8UyjtYDqZx7jnCvyvbmlgixAqqIoFm80x5sd9DfyJBj
 vSrmavXaJgQomJN3N+fvaIpGJXp8NQmeNT87++UMb6VE5nYvx7suDcwfTqOaxcYt
 yDwKatTXTvQxDLlGgtymp2UhgVKBICs9WVo8weevB5LPmpt4TFCi1GDSimJkfg+0
 JTvkKF+QxPmVqgwY7bgdFFcfhsYCux5VgtbD4DJKS3LgfLJLAMKPOt83DbeOSmIa
 8zqtlF3eMwHgecKrMLSfIZH3XIl2gsIdPdvT6iBQkwwGZuSzG93JgwJ90HYiKoDm
 yNlffnhmdgI2RXO97UZJpzqGor+eNc0auuS4485PcE8NtZ1tbo20A/OCpfIzK8j8
 Vk7sfZxaaHKF5PUtRe6vo3myRlUCofMIuSEWSF8d709R6AEEuia6RZ3Y45EJPROn
 fKOiLsf7Og1Mk43Iy2lb7kFT766OsUnZZHU/xiIZj/v94HPWFWoFPtxPC0IURvPx
 24oiJxCnyXWGtoyn+SSprl+NAPuPsxVFYriTwaq2RBuoY0NAdy7NIXKe2HTp/WI0
 oSTtYkCRcwiHv0aSrg+yQHmwH7y7m39S3yIS4t5LXenn2G4ObUUjhcAQdF2Ft0Pr
 MT/l+stTt390cpfpcQbE
 =he2O
 -----END PGP SIGNATURE-----

Merge tag 'efi-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/efi

Pull EFI updates for v3.19 from Matt Fleming:

 - Support module unload for efivarfs - Mathias Krause

 - Another attempt at moving x86 to libstub taking advantage of the
   __pure attribute - Ard Biesheuvel

 - Add EFI runtime services section to ptdump - Mathias Krause

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-11-16 10:48:53 +01:00
Linus Torvalds
1afcb6ed0d NFS client bugfixes for Linux 3.18
Highlights include:
 
 - Stable patches to fix NFSv4.x delegation reclaim error paths
 - Fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2 features
 - Fix a use-after-free problem with pNFS block layouts
 - Fix a memory leak in the pNFS files O_DIRECT code
 - Replace an intrusive and Oops-prone performance fix in the NFSv4 atomic
   open code with a safer one-line version and revert the two original patches.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUZol9AAoJEGcL54qWCgDyNQQQALnngvpPR51BoO/iTz9ruXol
 fGZy0SRIlTUKm1ArsQsQ+HGbV5K0hgP3Tg+z2AtEEZ8u/2Fi2Bqdl6+eNY12tKHd
 uUctDdM5TXLrETAn1UULrnd2eX1cvPMBfOlXlAdNHHsGEgC7w7YQ+rzGwnls+HDy
 LYXzY7Y3jYGdTMaRgZc5YRdtd8JBpCxciRvPEQLDIobwP0JnZC1afTLe1XInqB2I
 TZ4NTHT+DEWA+Ou1P2deL7+RuJNEAeWWBvULJy76n4BqKvN/HNedOO5HyBYXrwSd
 3UX3wbx9CWRxN1F0EqNKxjxZ/597JwqBeNoTDRcofLsqumUfAOtlbym1EahcD3Ls
 pykopNfgUhGuhxolStmuHdS6CnyQPERpR5lFZcDp7XtcwSq4FcwD8DRzLJMZW5dg
 N1lkfFlwQN3rqdk/NEHL+IxS41Hlk4HXjMoP6MNbRtqzIN6tW9tvC4MtAWd1aYxO
 YuUW281pbWxXQ731s0kTIrMUdQ9vGSRBMcbnO9rL3o+xkh8y5SPVkx9lhdhJN0UD
 VbQ5Ws/xZ54bD1PfyYb+Yx659lI8MSFOsDuMuLmDtfYnVicHwCA3H63StvQ3ihf/
 q0gu8Iex9YbNNjf7IfYGuWPmPn3gwPBoURPC0bcZvMPdY6DXodU6Oj4BRTQ5VCie
 9N0pt2wp2eRjaSzD7r5A
 =8YN6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

   - stable patches to fix NFSv4.x delegation reclaim error paths
   - fix a bug whereby we were advertising NFSv4.1 but using NFSv4.2
     features
   - fix a use-after-free problem with pNFS block layouts
   - fix a memory leak in the pNFS files O_DIRECT code
   - replace an intrusive and Oops-prone performance fix in the NFSv4
     atomic open code with a safer one-line version and revert the two
     original patches"

* tag 'nfs-for-3.18-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor
  NFS: Don't try to reclaim delegation open state if recovery failed
  NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
  NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
  NFSv4.1: nfs41_clear_delegation_stateid shouldn't trust NFS_DELEGATED_STATE
  NFSv4: Ensure that we remove NFSv4.0 delegations when state has expired
  NFS: SEEK is an NFS v4.2 feature
  nfs: Fix use of uninitialized variable in nfs_getattr()
  nfs: Remove bogus assignment
  nfs: remove spurious WARN_ON_ONCE in write path
  pnfs/blocklayout: serialize GETDEVICEINFO calls
  nfs: fix pnfs direct write memory leak
  Revert "NFS: nfs4_do_open should add negative results to the dcache."
  Revert "NFS: remove BUG possibility in nfs4_open_and_get_state"
  NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT
2014-11-15 14:15:16 -08:00
Linus Torvalds
971ad4e4d6 Merge branch 'akpm' (fixes from Andrew Morton)
Merge misc fixes from Andrew Morton:
 "15 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  MAINTAINERS: add IIO include files
  kernel/panic.c: update comments for print_tainted
  mem-hotplug: reset node present pages when hot-adding a new pgdat
  mem-hotplug: reset node managed pages when hot-adding a new pgdat
  mm/debug-pagealloc: correct freepage accounting and order resetting
  fanotify: fix notification of groups with inode & mount marks
  mm, compaction: prevent infinite loop in compact_zone
  mm: alloc_contig_range: demote pages busy message from warn to info
  mm/slab: fix unalignment problem on Malta with EVA due to slab merge
  mm/page_alloc: restrict max order of merging on isolated pageblock
  mm/page_alloc: move freepage counting logic to __free_one_page()
  mm/page_alloc: add freepage on isolate pageblock to correct buddy list
  mm/page_alloc: fix incorrect isolation behavior by rechecking migratetype
  mm/compaction: skip the range until proper target pageblock is met
  zram: avoid kunmap_atomic() of a NULL pointer
2014-11-13 16:57:25 -08:00
Jan Kara
8edc6e1688 fanotify: fix notification of groups with inode & mount marks
fsnotify() needs to merge inode and mount marks lists when notifying
groups about events so that ignore masks from inode marks are reflected
in mount mark notifications and groups are notified in proper order
(according to priorities).

Currently the sorting of the lists done by fsnotify_add_inode_mark() /
fsnotify_add_vfsmount_mark() and fsnotify() differed which resulted
ignore masks not being used in some cases.

Fix the problem by always using the same comparison function when
sorting / merging the mark lists.

Thanks to Heinrich Schuchardt for improvements of my patch.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=87721
Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Tested-by: Heinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-11-13 16:17:06 -08:00
Yan, Zheng
3231300bb9 ceph: fix flush tid comparision
TID of cap flush ack is 64 bits, but ceph_inode_info::flushing_cap_tid
is only 16 bits. 16 bits should be plenty to let the cap flush updates
pipeline appropriately, but we need to cast in the proper direction when
comparing these differently-sized versions. So downcast the 64-bits one
to 16 bits.

Reflects ceph.git commit a5184cf46a6e867287e24aeb731634828467cd98.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
2014-11-13 22:19:05 +03:00
Trond Myklebust
f8ebf7a8ca NFS: Don't try to reclaim delegation open state if recovery failed
If state recovery failed, then we should not attempt to reclaim delegated
state.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12 17:19:04 -05:00
Trond Myklebust
c606bb8857 NFSv4: Ensure that we call FREE_STATEID when NFSv4.x stateids are revoked
NFSv4.x (x>0) requires us to call TEST_STATEID+FREE_STATEID if a stateid is
revoked. We will currently fail to do this if the stateid is a delegation.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12 17:19:04 -05:00
Trond Myklebust
869f9dfa4d NFSv4: Fix races between nfs_remove_bad_delegation() and delegation return
Any attempt to call nfs_remove_bad_delegation() while a delegation is being
returned is currently a no-op. This means that we can end up looping
forever in nfs_end_delegation_return() if something causes the delegation
to be revoked.
This patch adds a mechanism whereby the state recovery code can communicate
to the delegation return code that the delegation is no longer valid and
that it should not be used when reclaiming state.
It also changes the return value for nfs4_handle_delegation_recall_error()
to ensure that nfs_end_delegation_return() does not reattempt the lock
reclaim before state recovery is done.

http://lkml.kernel.org/r/CAN-5tyHwG=Cn2Q9KsHWadewjpTTy_K26ee+UnSvHvG4192p-Xw@mail.gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-11-12 17:19:04 -05:00