Commit Graph

48768 Commits

Author SHA1 Message Date
David Disseldorp
6026685de3 cifs: fix CIFS_ENUMERATE_SNAPSHOTS oops
As with 618763958b, an open directory may have a NULL private_data
pointer prior to readdir. CIFS_ENUMERATE_SNAPSHOTS must check for this
before dereference.

Fixes: 834170c859 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-03 09:59:20 -05:00
David Disseldorp
0e5c795592 cifs: fix leak in FSCTL_ENUM_SNAPS response handling
The server may respond with success, and an output buffer less than
sizeof(struct smb_snapshot_array) in length. Do not leak the output
buffer in this case.

Fixes: 834170c859 ("Enable previous version support")
Signed-off-by: David Disseldorp <ddiss@suse.de>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-03 09:54:12 -05:00
Linus Torvalds
89c9fea3c8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  tty: fix comment for __tty_alloc_driver()
  init/main: properly align the multi-line comment
  init/main: Fix double "the" in comment
  Fix dead URLs to ftp.kernel.org
  drivers: Clean up duplicated email address
  treewide: Fix typo in xml/driver-api/basics.xml
  tools/testing/selftests/powerpc: remove redundant CFLAGS in Makefile: "-Wall -O2 -Wall" -> "-O2 -Wall"
  selftests/timers: Spelling s/privledges/privileges/
  HID: picoLCD: Spelling s/REPORT_WRTIE_MEMORY/REPORT_WRITE_MEMORY/
  net: phy: dp83848: Fix Typo
  UBI: Fix typos
  Documentation: ftrace.txt: Correct nice value of 120 priority
  net: fec: Fix typo in error msg and comment
  treewide: Fix typos in printk
2017-05-02 19:09:35 -07:00
Linus Torvalds
76f1948a79 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatch updates from Jiri Kosina:

 - a per-task consistency model is being added for architectures that
   support reliable stack dumping (extending this, currently rather
   trivial set, is currently in the works).

   This extends the nature of the types of patches that can be applied
   by live patching infrastructure. The code stems from the design
   proposal made [1] back in November 2014. It's a hybrid of SUSE's
   kGraft and RH's kpatch, combining advantages of both: it uses
   kGraft's per-task consistency and syscall barrier switching combined
   with kpatch's stack trace switching. There are also a number of
   fallback options which make it quite flexible.

   Most of the heavy lifting done by Josh Poimboeuf with help from
   Miroslav Benes and Petr Mladek

   [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz

 - module load time patch optimization from Zhou Chengming

 - a few assorted small fixes

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
  livepatch: add missing printk newlines
  livepatch: Cancel transition a safe way for immediate patches
  livepatch: Reduce the time of finding module symbols
  livepatch: make klp_mutex proper part of API
  livepatch: allow removal of a disabled patch
  livepatch: add /proc/<pid>/patch_state
  livepatch: change to a per-task consistency model
  livepatch: store function sizes
  livepatch: use kstrtobool() in enabled_store()
  livepatch: move patching functions into patch.c
  livepatch: remove unnecessary object loaded check
  livepatch: separate enabled and patched states
  livepatch/s390: add TIF_PATCH_PENDING thread flag
  livepatch/s390: reorganize TIF thread flag bits
  livepatch/powerpc: add TIF_PATCH_PENDING thread flag
  livepatch/x86: add TIF_PATCH_PENDING thread flag
  livepatch: create temporary klp_update_patch_state() stub
  x86/entry: define _TIF_ALLWORK_MASK flags explicitly
  stacktrace/x86: add function for detecting reliable stack traces
2017-05-02 18:24:16 -07:00
Linus Torvalds
8d65b08deb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Millar:
 "Here are some highlights from the 2065 networking commits that
  happened this development cycle:

   1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

   2) Add a generic XDP driver, so that anyone can test XDP even if they
      lack a networking device whose driver has explicit XDP support
      (me).

   3) Sparc64 now has an eBPF JIT too (me)

   4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
      Starovoitov)

   5) Make netfitler network namespace teardown less expensive (Florian
      Westphal)

   6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

   7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

   8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

   9) Multiqueue support in stmmac driver (Joao Pinto)

  10) Remove TCP timewait recycling, it never really could possibly work
      well in the real world and timestamp randomization really zaps any
      hint of usability this feature had (Soheil Hassas Yeganeh)

  11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
      Aleksandrov)

  12) Add socket busy poll support to epoll (Sridhar Samudrala)

  13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
      and several others)

  14) IPSEC hw offload infrastructure (Steffen Klassert)"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
  tipc: refactor function tipc_sk_recv_stream()
  tipc: refactor function tipc_sk_recvmsg()
  net: thunderx: Optimize page recycling for XDP
  net: thunderx: Support for XDP header adjustment
  net: thunderx: Add support for XDP_TX
  net: thunderx: Add support for XDP_DROP
  net: thunderx: Add basic XDP support
  net: thunderx: Cleanup receive buffer allocation
  net: thunderx: Optimize CQE_TX handling
  net: thunderx: Optimize RBDR descriptor handling
  net: thunderx: Support for page recycling
  ipx: call ipxitf_put() in ioctl error path
  net: sched: add helpers to handle extended actions
  qed*: Fix issues in the ptp filter config implementation.
  qede: Fix concurrency issue in PTP Tx path processing.
  stmmac: Add support for SIMATIC IOT2000 platform
  net: hns: fix ethtool_get_strings overflow in hns driver
  tcp: fix wraparound issue in tcp_lp
  bpf, arm64: fix jit branch offset related to ldimm64
  bpf, arm64: implement jiting of BPF_XADD
  ...
2017-05-02 16:40:27 -07:00
Steve French
26c9cb668c Set unicode flag on cifs echo request to avoid Mac error
Mac requires the unicode flag to be set for cifs, even for the smb
echo request (which doesn't have strings).

Without this Mac rejects the periodic echo requests (when mounting
with cifs) that we use to check if server is down

Signed-off-by: Steve French <smfrench@gmail.com>
CC: Stable <stable@vger.kernel.org>
2017-05-02 14:57:34 -05:00
Pavel Shilovsky
c610c4b619 CIFS: Add asynchronous write support through kernel AIO
This patch adds support to process write calls passed by io_submit()
asynchronously. It based on the previously introduced async context
that allows to process i/o responses in a separate thread and
return the caller immediately for asynchronous calls.

This improves writing performance of single threaded applications
with increasing of i/o queue depth size.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02 14:57:34 -05:00
Pavel Shilovsky
6685c5e2d1 CIFS: Add asynchronous read support through kernel AIO
This patch adds support to process read calls passed by io_submit()
asynchronously. It based on the previously introduced async context
that allows to process i/o responses in a separate thread and
return the caller immediately for asynchronous calls.

This improves reading performance of single threaded applications
with increasing of i/o queue depth size.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02 14:57:34 -05:00
Pavel Shilovsky
ccf7f4088a CIFS: Add asynchronous context to support kernel AIO
Currently the code doesn't recognize asynchronous calls passed
by io_submit() and processes all calls synchronously. This is not
what kernel AIO expects. This patch introduces a new async context
that keeps track of all issued i/o requests and moves a response
collecting procedure to a separate thread. This allows to return
to a caller immediately for async calls and call iocb->ki_complete()
once all requests are completed. For sync calls the current thread
simply waits until all requests are completed.

Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02 14:57:34 -05:00
Daniel N Pettersson
29bb3158cf cifs: fix IPv6 link local, with scope id, address parsing
When the IP address is gotten from the UNC, use only the address part
of the UNC. Else all after the percent sign in an IPv6 link local
address is interpreted as a scope id. This includes the slash and
share name. A scope id is expected to be an integer and any trailing
characters makes the conversion to integer fail.
Example of mount command that fails:
mount -i -t cifs //fe80::6a05:caff:fe3e:8ffc%2/test /mnt/t -o sec=none

Signed-off-by: Daniel N Pettersson <danielnp@axis.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02 14:57:34 -05:00
Dan Carpenter
564277ecee cifs: small underflow in cnvrtDosUnixTm()
January is month 1.  There is no zero-th month.  If someone passes a
zero month then it means we read from one space before the start of the
total_days_of_prev_months[] array.

We may as well also be strict about days as well.

Fixes: 1bd5bbcb65 ("[CIFS] Legacy time handling for Win9x and OS/2 part 1")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-05-02 14:57:34 -05:00
Linus Torvalds
204f144c9f Merge branch 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fs/compat.c cleanups from Al Viro:
 "More moving of compat syscalls from fs/compat.c to fs/*.c where the
  native counterparts live.

  And death to compat_sys_getdents64() - the only architecture that used
  to need it was ia64, and _that_ has lost biarch support quite a few
  years ago"

* 'work.compat' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs/compat.c: trim unused includes
  move compat_rw_copy_check_uvector() over to fs/read_write.c
  fhandle: move compat syscalls from compat.c
  open: move compat syscalls from compat.c
  stat: move compat syscalls from compat.c
  fcntl: move compat syscalls from compat.c
  readdir: move compat syscalls from compat.c
  statfs: move compat syscalls from compat.c
  utimes: move compat syscalls from compat.c
  move compat select-related syscalls to fs/select.c
  Remove compat_sys_getdents64()
2017-05-02 11:54:26 -07:00
Linus Torvalds
da7b66ffb2 Merge branch 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull splice updates from Al Viro:
 "These actually missed the last cycle; the branch itself is from last
  December"

* 'work.splice' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  make nr_pages calculation in default_file_splice_read() a bit less ugly
  splice/tee/vmsplice: validate flags
  splice_pipe_desc: kill ->flags
  remove spd_release_page()
2017-05-02 11:38:06 -07:00
Linus Torvalds
5b13475a5e Merge branch 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:
 "Cleanups that sat in -next + -stable fodder that has just missed 4.11.

  There's more iov_iter work in my local tree, but I'd prefer to push
  the stuff that had been in -next first"

* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  iov_iter: don't revert iov buffer if csum error
  generic_file_read_iter(): make use of iov_iter_revert()
  generic_file_direct_write(): make use of iov_iter_revert()
  orangefs: use iov_iter_revert()
  sctp: switch to copy_from_iter_full()
  net/9p: switch to copy_from_iter_full()
  switch memcpy_from_msg() to copy_from_iter_full()
  rds: make use of iov_iter_revert()
2017-05-02 11:18:50 -07:00
Linus Torvalds
6fd4e7f774 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Three cifs/smb3 fixes - including two for stable"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  cifs: don't check for failure from mempool_alloc()
  Do not return number of bytes written for ioctl CIFS_IOC_COPYCHUNK_FILE
  Fix match_prepath()
2017-05-02 11:16:29 -07:00
Linus Torvalds
2575be8ad3 - constify compression structures; Bhumika Goyal
- restore powerpc dumping; Ankit Kumar
 - fix more bugs in the rarely exercises module unloading logic
 - reorganize filesystem locking to fix problems noticed by lockdep
 - refactor internal pstore APIs to make development and review easier:
   - improve error reporting
   - add kernel-doc structure and function comments
   - avoid insane argument passing by using a common record structure
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 Comment: Kees Cook <kees@outflux.net>
 
 iQIcBAABCgAGBQJZB3wzAAoJEIly9N/cbcAmVQAP/3+GzoxUcL43ypsDa1CmCsFN
 l2roQjzWLNGfHgq5qkS/mNtrUdEvBMUBd2oyhHcaiqM0DsuuO3rKTp6dZ8oczYjN
 6GoTmU8ZwIPze3VEadNPjCIdpsfTMNKtZvVJCTrWnsgXxTawDS89qqr7SCs3qhBS
 Dm1E8oX77YyhOKoGA6O3CJxpdm/Ge4+KpPR6Uwj90eVro04vYoiwnBjyLUzE7w1l
 JcXEGEh1t5NjUxHeMwW7HZwJYZfA3DQ6I3MOOzGhf9tsKp6J0LQTTV8PMSEo1mif
 mLZhDBy8BBlunL+b2Tp3+c4QItGSHkBCWASI2RLa2TM7xvL67oC+qm/WaUyoRovy
 hllEG96rsCs3Zx7fFFsfQCwURcTWfJQMrD+0d/fM+P2ylWvgp+KU6PeLTS9IHu6M
 3n6i5i6A6OY/QvmZr1tN/06kUBjtQmo8EgQ0jxoxAlWyNcJqi93hmJyaRW28KxjS
 tjFTNLZMrslj0UDmjiD6fIuaT6gsGDB+3wAMPVAf+iV/k/2GUlj3ZILe4RaABAe9
 8xaUu11tZ5sTniayZ+10bA+6+K5n7uTlgU8RfFgaUZoRAzHgtyijOmdo6N+HILfK
 klv59B1Fmf6JpDlq7L9vurOqE82FAWFn4DruFM2bAaky2meFUNbYFiNfwK4l6lPI
 pmAgpdgRRvNMBCEmbVfv
 =S14G
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore updates from Kees Cook:
 "This has a large internal refactoring along with several smaller
  fixes.

   - constify compression structures; Bhumika Goyal

   - restore powerpc dumping; Ankit Kumar

   - fix more bugs in the rarely exercises module unloading logic

   - reorganize filesystem locking to fix problems noticed by lockdep

   - refactor internal pstore APIs to make development and review
     easier:
      - improve error reporting
      - add kernel-doc structure and function comments
      - avoid insane argument passing by using a common record
        structure"

* tag 'pstore-v4.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (23 commits)
  pstore: Solve lockdep warning by moving inode locks
  pstore: Fix flags to enable dumps on powerpc
  pstore: Remove unused vmalloc.h in pmsg
  pstore: simplify write_user_compat()
  pstore: Remove write_buf() callback
  pstore: Replace arguments for write_buf_user() API
  pstore: Replace arguments for write_buf() API
  pstore: Replace arguments for erase() API
  pstore: Do not duplicate record metadata
  pstore: Allocate records on heap instead of stack
  pstore: Pass record contents instead of copying
  pstore: Always allocate buffer for decompression
  pstore: Replace arguments for write() API
  pstore: Replace arguments for read() API
  pstore: Switch pstore_mkfile to pass record
  pstore: Move record decompression to function
  pstore: Extract common arguments into structure
  pstore: Add kernel-doc for struct pstore_info
  pstore: Improve register_pstore() error reporting
  pstore: Avoid race in module unloading
  ...
2017-05-02 10:35:45 -07:00
Linus Torvalds
6dc2cce932 Merge branch 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pul x86/process updates from Ingo Molnar:
 "The main change in this cycle was to add the ARCH_[GET|SET]_CPUID
  prctl() ABI extension to control the availability of the CPUID
  instruction, analogously to the existing PR_GET|SET_TSC ABI that
  controls RDTSC.

  Motivation: the 'rr' user-space record-and-replay execution debugger
  would like to trap and emulate the CPUID instruction - which
  instruction is normally unprivileged.

  Trapping CPUID is possible on IvyBridge and later Intel CPUs - expose
  this hardware capability"

* 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/syscalls/32: Ignore arch_prctl for other architectures
  um/arch_prctl: Fix fallout from x86 arch_prctl() rework
  x86/arch_prctl: Add ARCH_[GET|SET]_CPUID
  x86/cpufeature: Detect CPUID faulting support
  x86/syscalls/32: Wire up arch_prctl on x86-32
  x86/arch_prctl: Add do_arch_prctl_common()
  x86/arch_prctl/64: Rename do_arch_prctl() to do_arch_prctl_64()
  x86/arch_prctl/64: Use SYSCALL_DEFINE2 to define sys_arch_prctl()
  x86/arch_prctl: Rename 'code' argument to 'option'
  x86/msr: Rename MISC_FEATURE_ENABLES to MISC_FEATURES_ENABLES
  x86/process: Optimize TIF_NOTSC switch
  x86/process: Correct and optimize TIF_BLOCKSTEP switch
  x86/process: Optimize TIF checks in __switch_to_xtra()
2017-05-01 19:57:58 -07:00
Linus Torvalds
3527d3e951 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes in this cycle were:

   - another round of rq-clock handling debugging, robustization and
     fixes

   - PELT accounting improvements

   - CPU hotplug related ->cpus_allowed affinity handling fixes all
     around the tree

   - ... plus misc fixes, cleanups and updates"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
  sched/x86: Update reschedule warning text
  crypto: N2 - Replace racy task affinity logic
  cpufreq/sparc-us2e: Replace racy task affinity logic
  cpufreq/sparc-us3: Replace racy task affinity logic
  cpufreq/sh: Replace racy task affinity logic
  cpufreq/ia64: Replace racy task affinity logic
  ACPI/processor: Replace racy task affinity logic
  ACPI/processor: Fix error handling in __acpi_processor_start()
  sparc/sysfs: Replace racy task affinity logic
  powerpc/smp: Replace open coded task affinity logic
  ia64/sn/hwperf: Replace racy task affinity logic
  ia64/salinfo: Replace racy task affinity logic
  workqueue: Provide work_on_cpu_safe()
  ia64/topology: Remove cpus_allowed manipulation
  sched/fair: Move the PELT constants into a generated header
  sched/fair: Increase PELT accuracy for small tasks
  sched/fair: Fix comments
  sched/Documentation: Add 'sched-pelt' tool
  sched/fair: Fix corner case in __accumulate_sum()
  sched/core: Remove 'task' parameter and rename tsk_restore_flags() to current_restore_flags()
  ...
2017-05-01 19:12:53 -07:00
Linus Torvalds
5db6db0d40 Merge branch 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess unification updates from Al Viro:
 "This is the uaccess unification pile. It's _not_ the end of uaccess
  work, but the next batch of that will go into the next cycle. This one
  mostly takes copy_from_user() and friends out of arch/* and gets the
  zero-padding behaviour in sync for all architectures.

  Dealing with the nocache/writethrough mess is for the next cycle;
  fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
  sold on access_ok() in there, BTW; just not in this pile), same for
  reducing __copy_... callsites, strn*... stuff, etc. - there will be a
  pile about as large as this one in the next merge window.

  This one sat in -next for weeks. -3KLoC"

* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
  HAVE_ARCH_HARDENED_USERCOPY is unconditional now
  CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
  m32r: switch to RAW_COPY_USER
  hexagon: switch to RAW_COPY_USER
  microblaze: switch to RAW_COPY_USER
  get rid of padding, switch to RAW_COPY_USER
  ia64: get rid of copy_in_user()
  ia64: sanitize __access_ok()
  ia64: get rid of 'segment' argument of __do_{get,put}_user()
  ia64: get rid of 'segment' argument of __{get,put}_user_check()
  ia64: add extable.h
  powerpc: get rid of zeroing, switch to RAW_COPY_USER
  esas2r: don't open-code memdup_user()
  alpha: fix stack smashing in old_adjtimex(2)
  don't open-code kernel_setsockopt()
  mips: switch to RAW_COPY_USER
  mips: get rid of tail-zeroing in primitives
  mips: make copy_from_user() zero tail explicitly
  mips: clean and reorder the forest of macros...
  mips: consolidate __invoke_... wrappers
  ...
2017-05-01 14:41:04 -07:00
Arnd Bergmann
67fd389735 block, dax: use correct format string in bdev_dax_supported
The new message has an incorrect format string, causing a warning in some
configurations:

fs/block_dev.c: In function 'bdev_dax_supported':
fs/block_dev.c:779:5: error: format '%d' expects argument of type 'int', but argument 2 has type 'long int' [-Werror=format=]
     "error: dax access failed (%d)", len);

This changes it to use the correct %ld instead of %d.

Fixes: 2093f2e9df ("block, dax: convert bdev_dax_supported() to dax_direct_access()")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-05-01 13:16:29 -07:00
Linus Torvalds
694752922b Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:

 - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

 - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similary to blk-wbt. From Omar.

 - A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device life
   times, solving various problems with hot removal.

 - A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

 - A series of fixes and improvements for nbd from Josef.

 - A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

 - Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

 - blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable priotization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

 - Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size
   IO.

 - A few fixes for opal, from Scott.

 - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

 - A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

 - A series from Christoph, cleaning up how handle WRITE_ZEROES.

 - A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

 - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived it's usefulness.

 - Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..
2017-05-01 10:39:57 -07:00
Takashi Iwai
d66bb1607e proc: Fix unbalanced hard link numbers
proc_create_mount_point() forgot to increase the parent's nlink, and
it resulted in unbalanced hard link numbers, e.g. /proc/fs shows one
less than expected.

Fixes: eb6d38d542 ("proc: Allow creating permanently empty directories...")
Cc: stable@vger.kernel.org
Reported-by: Tristan Ye <tristan.ye@suse.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2017-04-28 21:05:26 -05:00
Linus Torvalds
28b2013587 Merge branch 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fix from Chris Mason:
 "We have one more fix for btrfs.

  This gets rid of a new WARN_ON from rc1 that ended up making more
  noise than we really want. The larger fix for the underflow got
  delayed a bit and it's better for now to put it under
  CONFIG_BTRFS_DEBUG"

* 'for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  btrfs: qgroup: move noisy underflow warning to debugging build
2017-04-28 10:13:17 -07:00
Brian Foster
e20c8a517f xfs: wait on new inodes during quotaoff dquot release
The quotaoff operation has a race with inode allocation that results
in a livelock. An inode allocation that occurs before the quota
status flags are updated acquires the appropriate dquots for the
inode via xfs_qm_vop_dqalloc(). It then inserts the XFS_INEW inode
into the perag radix tree, sometime later attaches the dquots to the
inode and finally clears the XFS_INEW flag. Quotaoff expects to
release the dquots from all inodes in the filesystem via
xfs_qm_dqrele_all_inodes(). This invokes the AG inode iterator,
which skips inodes in the XFS_INEW state because they are not fully
constructed. If the scan occurs after dquots have been attached to
an inode, but before XFS_INEW is cleared, the newly allocated inode
will continue to hold a reference to the applicable dquots. When
quotaoff invokes xfs_qm_dqpurge_all(), the reference count of those
dquot(s) remain elevated and the dqpurge scan spins indefinitely.

To address this problem, update the xfs_qm_dqrele_all_inodes() scan
to wait on inodes marked on the XFS_INEW state. We wait on the
inodes explicitly rather than skip and retry to avoid continuous
retry loops due to a parallel inode allocation workload. Since
quotaoff updates the quota state flags and uses a synchronous
transaction before the dqrele scan, and dquots are attached to
inodes after radix tree insertion iff quota is enabled, one INEW
waiting pass through the AG guarantees that the scan has processed
all inodes that could possibly hold dquot references.

Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:11:08 -07:00
Brian Foster
ae2c4ac2dd xfs: update ag iterator to support wait on new inodes
The AG inode iterator currently skips new inodes as such inodes are
inserted into the inode radix tree before they are fully
constructed. Certain contexts require the ability to wait on the
construction of new inodes, however. The fs-wide dquot release from
the quotaoff sequence is an example of this.

Update the AG inode iterator to support the ability to wait on
inodes flagged with XFS_INEW upon request. Create a new
xfs_inode_ag_iterator_flags() interface and support a set of
iteration flags to modify the iteration behavior. When the
XFS_AGITER_INEW_WAIT flag is set, include XFS_INEW flags in the
radix tree inode lookup and wait on them before the callback is
executed.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:11:08 -07:00
Brian Foster
756baca27f xfs: support ability to wait on new inodes
Inodes that are inserted into the perag tree but still under
construction are flagged with the XFS_INEW bit. Most contexts either
skip such inodes when they are encountered or have the ability to
handle them.

The runtime quotaoff sequence introduces a context that must wait
for construction of such inodes to correctly ensure that all dquots
in the fs are released. In anticipation of this, support the ability
to wait on new inodes. Wake the appropriate bit when XFS_INEW is
cleared.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:11:08 -07:00
Amir Goldstein
8f720d9f89 xfs: publish UUID in struct super_block
Copy the uuid of the filesystem to struct super_block s_uuid field,
as several other filesystems already do.  Copy regardless of the nouuid
mount option, because other filesystems also do not guaranty uniqueness
of the s_uuid field in super_block struct.

Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-28 08:10:53 -07:00
NeilBrown
a6f74e80f2 cifs: don't check for failure from mempool_alloc()
mempool_alloc() cannot fail if the gfp flags allow it to
sleep, and both GFP_FS allows for sleeping.

So these tests of the return value from mempool_alloc()
cannot be needed.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28 07:56:33 -05:00
Sachin Prabhu
7d0c234fd2 Do not return number of bytes written for ioctl CIFS_IOC_COPYCHUNK_FILE
commit 620d8745b3 ("Introduce cifs_copy_file_range()") changes the
behaviour of the cifs ioctl call CIFS_IOC_COPYCHUNK_FILE. In case of
successful writes, it now returns the number of bytes written. This
return value is treated as an error by the xfstest cifs/001. Depending
on the errno set at that time, this may or may not result in the test
failing.

The patch fixes this by setting the return value to 0 in case of
successful writes.

Fixes: commit 620d8745b3 ("Introduce cifs_copy_file_range()")
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28 07:56:33 -05:00
Sachin Prabhu
cd8c42968e Fix match_prepath()
Incorrect return value for shares not using the prefix path means that
we will never match superblocks for these shares.

Fixes: commit c1d8b24d18 ("Compare prepaths when comparing superblocks")
Cc: stable@vger.kernel.org
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-04-28 07:54:54 -05:00
Kees Cook
3a7d2fd16c pstore: Solve lockdep warning by moving inode locks
Lockdep complains about a possible deadlock between mount and unlink
(which is technically impossible), but fixing this improves possible
future multiple-backend support, and keeps locking in the right order.

The lockdep warning could be triggered by unlinking a file in the
pstore filesystem:

  -> #1 (&sb->s_type->i_mutex_key#14){++++++}:
         lock_acquire+0xc9/0x220
         down_write+0x3f/0x70
         pstore_mkfile+0x1f4/0x460
         pstore_get_records+0x17a/0x320
         pstore_fill_super+0xa4/0xc0
         mount_single+0x89/0xb0
         pstore_mount+0x13/0x20
         mount_fs+0xf/0x90
         vfs_kern_mount+0x66/0x170
         do_mount+0x190/0xd50
         SyS_mount+0x90/0xd0
         entry_SYSCALL_64_fastpath+0x1c/0xb1

  -> #0 (&psinfo->read_mutex){+.+.+.}:
         __lock_acquire+0x1ac0/0x1bb0
         lock_acquire+0xc9/0x220
         __mutex_lock+0x6e/0x990
         mutex_lock_nested+0x16/0x20
         pstore_unlink+0x3f/0xa0
         vfs_unlink+0xb5/0x190
         do_unlinkat+0x24c/0x2a0
         SyS_unlinkat+0x16/0x30
         entry_SYSCALL_64_fastpath+0x1c/0xb1

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&sb->s_type->i_mutex_key#14);
                                lock(&psinfo->read_mutex);
                                lock(&sb->s_type->i_mutex_key#14);
   lock(&psinfo->read_mutex);

Reported-by: Marta Lofstedt <marta.lofstedt@intel.com>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
2017-04-27 20:35:34 -07:00
Geliang Tang
3509d048c8 pstore: Remove unused vmalloc.h in pmsg
Since the vmalloc code has been removed from write_pmsg() in the commit
"5bf6d1b pstore/pmsg: drop bounce buffer", remove the unused header
vmalloc.h.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-04-27 14:48:59 -07:00
Linus Torvalds
8b5d11e4b0 Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs in
our NFSv2/v3 xdr code that could crash the server or leak memory.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJZAlLrAAoJECebzXlCjuG+lb8P/idTu9rLGaU1VYPInrdoXru0
 iPY+p5inmGSYW2MfCGlS7disaCACgzPBKVqKjeNB1hfHn2JZCrfeBd0XvBYlc7TH
 JYmlHKSBjN/ZfnYMl0WrlVUCZstt4JVmxWjO3sZTCL3nImEbGA7d13yBDagxISWh
 gM9wOOiJwR5lzT/W3MezNCYj4n27/vVhODMP+Qhy0rpK08dBORIH0Bi7hwtVgdD9
 cMVIXMbujmJrCz0Uhbo/DhoItcePRBrCLTXdY6WEeMmHxnXlyaA2XuA0EjjZHuMs
 +BsbAOsNy1BxY1c8Z2ignYgRymUvUBHeiJeIGZLbyKKM2OJMtE4BoqlSWjx08P3Y
 hTwTuaw8u7uxfAsvxepukZoonWj/uVY5tP2Hyq+K6CIKKJpB7vj+c3QGYD3dNnUu
 zDl/LG3ayZgpPXqfjKRnSzZE+St1/IwDnvaM2WN2B1mkuerVr8qDGq6xd4kB0QKX
 BcEKiwcb2ewfPaLlVnSXz6Wbuh2pB42BObJjC3qbOgMvQ7SBUM0UcZmFpJJ20uCR
 BX20aFzB/GHcd6fTRHpDrAxB4XGdVG/8Da5Ki2WhRnmmaeSXushhiKWImujY9B6i
 s4mZHu4gGGJdLzOT5u2HZl93STevriL70SA9nZPhPLQycnJMUAO0buU9UcNI7wXR
 GVu2F3IHd+anxDtDwOv4
 =Glhl
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux

Pull nfsd fixes from Bruce Fields:
 "Thanks to Ari Kauppi and Tuomas Haanpää at Synopsis for spotting bugs
  in our NFSv2/v3 xdr code that could crash the server or leak memory"

* tag 'nfsd-4.11-3' of git://linux-nfs.org/~bfields/linux:
  nfsd: stricter decoding of write-like NFSv2/v3 ops
  nfsd4: minor NFSv2/v3 write decoding cleanup
  nfsd: check for oversized NFSv2/v3 arguments
2017-04-27 13:39:19 -07:00
Linus Torvalds
19ac447420 A fix for a kernel stack overflow bug in ceph setattr code, marked for
stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJZAhJLAAoJEEp/3jgCEfOLw4gH/ia+bMzmsnkYtjMfxQfCh0ia
 MHi7JS/YcAej/o71c/tvWlTU7mRbmvUCVSAcishRNytEBNGL8YzkP12vMOp/5Vdx
 kKk6yDWn9z0mR5/YdBKaE8ziM5Umdy+zLqeL4yuxyhtbxKFGUPG4txJKS5WD80yU
 Ld/toF2fL3y/JEs+s1pd5G+DPhEhEm2hFf56/VI6N7y08CHJgTqHB3GJ3ZnuUbnU
 UhSvNR9skdVirObI8jt3oWIix8uAGq5+6MjVeTqXo75Qng5sdBGZ8S2agxXbM3j7
 Hu8h/1bhKyPCUzAXnOyGcZeR+5DQolKmlKLhogbT4I9X4YC2ie4Djg0bmFHscWI=
 =8aUa
 -----END PGP SIGNATURE-----

Merge tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
 "A fix for a kernel stack overflow bug in ceph setattr code, marked for
  stable"

* tag 'ceph-for-4.11-rc9' of git://github.com/ceph/ceph-client:
  ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
2017-04-27 11:38:05 -07:00
Linus Torvalds
f56fc7bdaa Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:

 - fix orangefs handling of faults on write() - I'd missed that one back
   when orangefs was going through review.

 - readdir counterpart of "9p: cope with bogus responses from server in
   p9_client_{read,write}" - server might be lying or broken, and we'd
   better not overrun the kmalloc'ed buffer we are copying the results
   into.

 - NFS O_DIRECT read/write can leave iov_iter advanced by too much;
   that's what had been causing iov_iter_pipe() warnings davej had been
   seeing.

 - statx_timestamp.tv_nsec type fix (s32 -> u32). That one really should
   go in before 4.11.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  uapi: change the type of struct statx_timestamp.tv_nsec to unsigned
  fix nfs O_DIRECT advancing iov_iter too much
  p9_client_readdir() fix
  orangefs_bufmap_copy_from_iovec(): fix EFAULT handling
2017-04-27 11:09:37 -07:00
Lukas Czerner
3c3781951c xfs: Allow user to kill fstrim process
fstrim can take really long time on big, slow device or on file system
with a lots of allocation groups. Currently there is no way for the user
to cancell the operation. This patch makes it possible for the user to
kill fstrim pocess by adding the check for fatal_signal_pending() in
xfs_trim_extents().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reported-by: Zdenek Kabelac <zkabelac@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-27 10:45:34 -07:00
Michael Kerrisk (man-pages)
59372bbf3a statx: correct error handling of NULL pathname
The change in commit 1e2f82d1e9 ("statx: Kill fd-with-NULL-path
support in favour of AT_EMPTY_PATH") to error on a NULL pathname to
statx() is inconsistent.

It results in the error EINVAL for a NULL pathname.  Other system calls
with similar APIs (fchownat(), fstatat(), linkat()), return EFAULT.

The solution is simply to remove the EINVAL check.  As I already pointed
out in [1], user_path_at*() and filename_lookup() will handle the NULL
pathname as per the other APIs, to correctly produce the error EFAULT.

[1] https://lkml.org/lkml/2017/4/26/561

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Sandeen <sandeen@sandeen.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-27 10:45:09 -07:00
David S. Miller
b1513c3531 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-26 22:39:08 -04:00
David Howells
1e2f82d1e9 statx: Kill fd-with-NULL-path support in favour of AT_EMPTY_PATH
With the new statx() syscall, the following both allow the attributes of
the file attached to a file descriptor to be retrieved:

	statx(dfd, NULL, 0, ...);

and:

	statx(dfd, "", AT_EMPTY_PATH, ...);

Change the code to reject the first option, though this means copying
the path and engaging pathwalk for the fstat() equivalent.  dfd can be a
non-directory provided path is "".

[ The timing of this isn't wonderful, but applying this now before we
  have statx() in any released kernel, before anybody starts using the
  NULL special case.    - Linus ]

Fixes: a528d35e8b ("statx: Add a system call to make enhanced file info available")
Reported-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Sandeen <sandeen@sandeen.net>
cc: fstests@vger.kernel.org
cc: linux-api@vger.kernel.org
cc: linux-man@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-26 15:05:47 -07:00
Dan Carpenter
907bfcd8d8 orangefs: handle zero size write in debugfs
If we write zero bytes to this debugfs file, then it will cause an
underflow when we do copy_from_user(buf, ubuf, count - 1).  Debugfs can
normally only be written to by root so the impact of this is low.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:01 -04:00
Martin Brandenburg
b5a9d61eeb orangefs: do not wait for timeout if umounting
When the computer is turned off, all the processes are killed and then
all the filesystems are umounted.  OrangeFS should not wait for the
userspace daemon to come back in that case.

This only works for plain umount(2).  To actually take advantage of this
interactively, `umount -f' is needed; otherwise umount will issue a
statfs first, which will wait for the userspace daemon to come back.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:01 -04:00
Martin Brandenburg
b7a57ccab8 orangefs: return from orangefs_devreq_read quickly if possible
It is not necessary to take the lock and search through the request list
if the list is empty.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
9d286b0d82 orangefs: ensure the userspace component is unmounted if mount fails
If the mount is aborted after userspace has been asked to mount,
userspace must be told to unmount.

Ordinarily orangefs_kill_sb does the unmount.  However it cannot be
called if the superblock has not been set up.  This is a very narrow
window.

The NULL fs_id is not unmounted.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
53950ef541 orangefs: do not check possibly stale size on truncate
Let the server figure this out because our size might be out of date or
not present.

The bug was that

	xfs_io -f -t -c "pread -v 0 100" /mnt/foo
	echo "Test" > /mnt/foo
	xfs_io -f -t -c "pread -v 0 100" /mnt/foo

fails because the second truncate did not happen if nothing had
requested the size after the write in echo.  Thus i_size was zero (not
present) and the orangefs_setattr though i_size was zero and there was
nothing to do.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
68a24a6cc4 orangefs: implement statx
Fortunately OrangeFS has had a getattr request mask for a long time.

The server basically has two difficulty levels for attributes.  Fetching
any attribute except size requires communicating with the metadata
server for that handle.  Since all the attributes are right there, it
makes sense to return them all.  Fetching the size requires
communicating with every I/O server (that the file is distributed
across).  Therefore if asked for anything except size, get everything
except size, and if asked for size, get everything.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
7b796ae370 orangefs: remove ORANGEFS_READDIR macros
They are clones of the ORANGEFS_ITERATE macros in use elsewhere.  Delete
ORANGEFS_ITERATE_NEXT which is a hack previously used by readdir.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
480e3e532e orangefs: support very large directories
This works by maintaining a linked list of pages which the directory
has been read into rather than one giant fixed-size buffer.

This replaces code which limits the total directory size to the total
amount that could be returned in one server request.  Since filenames
are usually considerably shorter than the maximum, the old code could
usually handle several server requests before running out of space.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
72f66b8329 orangefs: support llseek on directories
This and the previous commit fix xfstests generic/257.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
382f4581e6 orangefs: rewrite readdir to fix several bugs
In the past, readdir assumed that the user buffer will be large enough
that all entries from the server will fit.  If this was not true,
entries would be skipped.

Since it works now, request 512 entries rather than 96 per server
operation.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
17930b252c orangefs: do not set getattr_time on orangefs_lookup
Since orangefs_lookup calls orangefs_iget which calls
orangefs_inode_getattr, getattr_time will get set.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
e675c5ec51 orangefs: clean up oversize xattr validation
Also don't check flags as this has been validated by the VFS already.

Fix an off-by-one error in the max size checking.

Stop logging just because userspace wants to write attributes which do
not fit.

This and the previous commit fix xfstests generic/020.

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
a956af337b orangefs: fix bounds check for listxattr
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Martin Brandenburg
418ce3eb66 orangefs: remove unused get_fsid_from_ino
Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2017-04-26 14:33:00 -04:00
Al Viro
eea86b637a Merge branches 'uaccess.alpha', 'uaccess.arc', 'uaccess.arm', 'uaccess.arm64', 'uaccess.avr32', 'uaccess.bfin', 'uaccess.c6x', 'uaccess.cris', 'uaccess.frv', 'uaccess.h8300', 'uaccess.hexagon', 'uaccess.ia64', 'uaccess.m32r', 'uaccess.m68k', 'uaccess.metag', 'uaccess.microblaze', 'uaccess.mips', 'uaccess.mn10300', 'uaccess.nios2', 'uaccess.openrisc', 'uaccess.parisc', 'uaccess.powerpc', 'uaccess.s390', 'uaccess.score', 'uaccess.sh', 'uaccess.sparc', 'uaccess.tile', 'uaccess.um', 'uaccess.unicore32', 'uaccess.x86' and 'uaccess.xtensa' into work.uaccess 2017-04-26 12:06:59 -04:00
J. Bruce Fields
13bf9fbff0 nfsd: stricter decoding of write-like NFSv2/v3 ops
The NFSv2/v3 code does not systematically check whether we decode past
the end of the buffer.  This generally appears to be harmless, but there
are a few places where we do arithmetic on the pointers involved and
don't account for the possibility that a length could be negative.  Add
checks to catch these.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25 16:36:23 -04:00
J. Bruce Fields
db44bac41b nfsd4: minor NFSv2/v3 write decoding cleanup
Use a couple shortcuts that will simplify a following bugfix.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25 16:36:16 -04:00
J. Bruce Fields
e6838a29ec nfsd: check for oversized NFSv2/v3 arguments
A client can append random data to the end of an NFSv2 or NFSv3 RPC call
without our complaining; we'll just stop parsing at the end of the
expected data and ignore the rest.

Encoded arguments and replies are stored together in an array of pages,
and if a call is too large it could leave inadequate space for the
reply.  This is normally OK because NFS RPC's typically have either
short arguments and long replies (like READ) or long arguments and short
replies (like WRITE).  But a client that sends an incorrectly long reply
can violate those assumptions.  This was observed to cause crashes.

Also, several operations increment rq_next_page in the decode routine
before checking the argument size, which can leave rq_next_page pointing
well past the end of the page array, causing trouble later in
svc_free_pages.

So, following a suggestion from Neil Brown, add a central check to
enforce our expectation that no NFSv2/v3 call has both a large call and
a large reply.

As followup we may also want to rewrite the encoding routines to check
more carefully that they aren't running off the end of the page array.

We may also consider rejecting calls that have any extra garbage
appended.  That would be safer, and within our rights by spec, but given
the age of our server and the NFS protocol, and the fact that we've
never enforced this before, we may need to balance that against the
possibility of breaking some oddball client.

Reported-by: Tuomas Haanpää <thaan@synopsys.com>
Reported-by: Ari Kauppi <ari@synopsys.com>
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2017-04-25 16:34:37 -04:00
Dan Williams
d4b29fd78e block: remove block_device_operations ->direct_access()
Now that all the producers and consumers of dax interfaces have been
converted to using dax_operations on a dax_device, remove the block
device direct_access enabling.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25 13:20:46 -07:00
Dan Williams
2093f2e9df block, dax: convert bdev_dax_supported() to dax_direct_access()
Kill of the final user of bdev_direct_access() and struct blk_dax_ctl.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25 13:20:46 -07:00
Dan Williams
cccbce6715 filesystem-dax: convert to dax_direct_access()
Now that a dax_device is plumbed through all dax-capable drivers we can
switch from block_device_operations to dax_operations for invoking
->direct_access.

This also lets us kill off some usages of struct blk_dax_ctl on the way
to its eventual removal.

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25 13:20:46 -07:00
Dan Williams
a41fe02b6b Revert "block: use DAX for partition table reads"
commit d1a5f2b4d8 ("block: use DAX for partition table reads") was
part of a stalled effort to allow dax mappings of block devices. Since
then the device-dax mechanism has filled the role of dax-mapping static
device ranges.

Now that we are moving ->direct_access() from a block_device operation
to a dax_inode operation we would need block devices to map and carry
their own dax_inode reference.

Unless / until we decide to revive dax mapping of raw block devices
through the dax_inode scheme, there is no need to carry
read_dax_sector(). Its removal in turn allows for the removal of
bdev_direct_access() and should have been included in commit
2237570168 ("block_dev: remove DAX leftovers").

Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25 13:20:46 -07:00
Dan Williams
fa5d932c32 ext2, ext4, xfs: retrieve dax_device for iomap operations
In preparation for converting fs/dax.c to use dax_direct_access()
instead of bdev_direct_access(), add the plumbing to retrieve the
dax_device associated with a given block_device.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-25 13:20:46 -07:00
Yan, Zheng
8179a101eb ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
then calls posix_acl_chmod().

The problem is that __ceph_setattr() calls posix_acl_chmod() before
sending the setattr request. The get_acl() call in posix_acl_chmod()
can trigger a getxattr request. The reply of the getxattr request
can restore inode's i_mode to its old value. The set_acl() call in
posix_acl_chmod() sees old value of inode's i_mode, so it calls
__ceph_setattr() again.

Cc: stable@vger.kernel.org # needs backporting for < 4.9
Link: http://tracker.ceph.com/issues/19688
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2017-04-25 21:08:26 +02:00
Darrick J. Wong
c4cf1acdb1 xfs: better log intent item refcount checking
Use ASSERTs on the log intent item refcounts so that we fail noisily if
anyone tries to double-free the item.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-25 09:40:42 -07:00
Brian Foster
20e8a06378 xfs: fix up quotacheck buffer list error handling
The quotacheck error handling of the delwri buffer list assumes the
resident buffers are locked and doesn't clear the _XBF_DELWRI_Q flag
on the buffers that are dequeued. This can lead to assert failures
on buffer release and possibly other locking problems.

Move this code to a delwri queue cancel helper function to
encapsulate the logic required to properly release buffers from a
delwri queue. Update the helper to clear the delwri queue flag and
call it from quotacheck.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:42 -07:00
Christoph Hellwig
27af1bbf52 xfs: remove xfs_trans_ail_delete_bulk
xfs_iflush_done uses an on-stack variable length array to pass the log
items to be deleted to xfs_trans_ail_delete_bulk.  On-stack VLAs are a
nasty gcc extension that can lead to unbounded stack allocations, but
fortunately we can easily avoid them by simply open coding
xfs_trans_ail_delete_bulk in xfs_iflush_done, which is the only caller
of it except for the single-item xfs_trans_ail_delete.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:42 -07:00
Christoph Hellwig
3f88a15ae0 xfs: don't use bool values in trace buffers
Using bool values produces sparse warnings of this form:

fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2252:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2278:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)
fs/xfs/./xfs_trace.h:2307:1: warning: odd constant _Bool cast (ffffffffffffffff becomes 1)

Just use a char instead to fix those up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:42 -07:00
Darrick J. Wong
12e4a381c5 xfs: fix getfsmap userspace memory corruption while setting OF_LAST
At the end of a getfsmap call, we will set FMR_OF_LAST in the last
struct fsmap that was handed in by userspace if we've truly run out of
space mapping record (as opposed to simply running out of space in the
user array).  Unfortunately, fmh_entries is the wrong check for whether
or not we've filled out anything in the user array because the ioctl
provides that fmh_count==0 sets fmh_entries without filling out the user
array.  Therefore we end up writing things into user memory areas that we
weren't given, and kaboom.

Since Christoph amended the getfsmap structure to track the number of
fsmap entries we've actually filled out, use that as part of deciding if
we have to set the OF_LAST flag.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2017-04-25 09:40:42 -07:00
Christoph Hellwig
9d17e14cc0 xfs: fix __user annotations for xfs_ioc_getfsmap
By passing the whole fsmap_head structure and an index we can get the
user point annotations right for the embedded variable sized array
in struct fsmap_head.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: change idx to unsigned int]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:42 -07:00
Christoph Hellwig
e2a641922a xfs: corruption needs to respect endianess too!
At least if we want to be able to recognize the pattern.  Add a missing
byte swap to the corruption injection case in xlog_sync.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:42 -07:00
Christoph Hellwig
ef2b67ecf8 xfs: use NULL instead of 0 to initialize a pointer in xfs_ioc_getfsmap
Found by sparse.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Christoph Hellwig
fad5656b22 xfs: use NULL instead of 0 to initialize a pointer in xfs_getfsmap
Found by sparse.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Christoph Hellwig
0c1d9e4a61 xfs: simplify validation of the unwritten extent bit
XFS only supports the unwritten extent bit in the data fork, and only if
the file system has a version 5 superblock or the unwritten extent
feature bit.

We currently have two routines that validate the invariant:
xfs_check_nostate_extents which return -EFSCORRUPTED when it's not met,
and xfs_validate_extent that triggers and assert in debug build.

Both of them iterate over all extents of an inode fork when called,
which isn't very efficient.

This patch instead adds a new helper that verifies the invariant one
extent at a time, and calls it from the places where we iterate over
all extents to converted them from or two the in-memory format.  The
callers then return -EFSCORRUPTED when reading invalid extents from
disk, or trigger an assert when writing them to disk.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Christoph Hellwig
37f7f9bbf3 xfs: remove unused values from xfs_exntst_t
We only ever use the normal and unwritten states.  And the actual
ondisk format (this enum isn't despite being in xfs_format.h) only
has space for the unwritten bit anyway.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Christoph Hellwig
895e9bfc9e xfs: remove the unused XFS_MAXLINK_1 define
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Eric Sandeen
4f1adf3373 xfs: more do_div cleanups
On some architectures do_div does the pointer compare
trick to make sure that we've sent it an unsigned 64-bit
number.  (Why unsigned?  I don't know.)

Fix up the few places that squawk about this; in
xfs_bmap_wants_extents() we just used a bare int64_t so change
that to unsigned.

In xfs_adjust_extent_unmap_boundaries() all we wanted was the
mod, and we have an xfs-specific function to handle that w/o
side effects, which includes proper casting for do_div.

In xfs_daddr_to_ag[b]no, we were using the wrong type anyway;
XFS_BB_TO_FSBT returns a block in the filesystem, so use
xfs_rfsblock_t not xfs_daddr_t, and gain the unsignedness
from that type as a bonus.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Eric Sandeen
90115407c5 xfs: remove use of do_div with 32-bit dividend in quota
The kbuild test robot caught this; in debug code we have another
caller of do_div with a 32-bit dividend (j) which is caught now
that we are using the kernel-supplied do_div.

None of the values used here are 64-bit; just use simple division.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:41 -07:00
Hou Tao
42bf9dba40 xfs: remove the trailing newline used in the fmt parameter of TP_printk
The trailing newlines wil lead to extra newlines in the trace file
which looks like the following output, so remove them.
>kworker/4:1H-1508  [004] .... 47879.101608: xfs_discard_extent: dev 8:0
>
>kworker/u16:2-238  [004] .... 47879.101725: xfs_extent_busy_clear: dev 8:0

Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
[darrick: fix the getfsmap tracepoints too]
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:40 -07:00
Brian Foster
cb52ee334a xfs: prevent multi-fsb dir readahead from reading random blocks
Directory block readahead uses a complex iteration mechanism to map
between high-level directory blocks and underlying physical extents.
This mechanism attempts to traverse the higher-level dir blocks in a
manner that handles multi-fsb directory blocks and simultaneously
maintains a reference to the corresponding physical blocks.

This logic doesn't handle certain (discontiguous) physical extent
layouts correctly with multi-fsb directory blocks. For example,
consider the case of a 4k FSB filesystem with a 2 FSB (8k) directory
block size and a directory with the following extent layout:

 EXT: FILE-OFFSET      BLOCK-RANGE      AG AG-OFFSET        TOTAL
   0: [0..7]:          88..95            0 (88..95)             8
   1: [8..15]:         80..87            0 (80..87)             8
   2: [16..39]:        168..191          0 (168..191)          24
   3: [40..63]:        5242952..5242975  1 (72..95)            24

Directory block 0 spans physical extents 0 and 1, dirblk 1 lies
entirely within extent 2 and dirblk 2 spans extents 2 and 3. Because
extent 2 is larger than the directory block size, the readahead code
erroneously assumes the block is contiguous and issues a readahead
based on the physical mapping of the first fsb of the dirblk. This
results in read verifier failure and a spurious corruption or crc
failure, depending on the filesystem format.

Further, the subsequent readahead code responsible for walking
through the physical table doesn't correctly advance the physical
block reference for dirblk 2. Instead of advancing two physical
filesystem blocks, the first iteration of the loop advances 1 block
(correctly), but the subsequent iteration advances 2 more physical
blocks because the next physical extent (extent 3, above) happens to
cover more than dirblk 2. At this point, the higher-level directory
block walking is completely off the rails of the actual physical
layout of the directory for the respective mapping table.

Update the contiguous dirblock logic to consider the current offset
in the physical extent to avoid issuing directory readahead to
unrelated blocks. Also, update the mapping table advancing code to
consider the current offset within the current dirblock to avoid
advancing the mapping reference too far beyond the dirblock.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:40 -07:00
Eric Sandeen
023cc840b4 xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
Carlos had a case where "find" seemed to start spinning
forever and never return.

This was on a filesystem with non-default multi-fsb (8k)
directory blocks, and a fragmented directory with extents
like this:

0:[0,133646,2,0]
1:[2,195888,1,0]
2:[3,195890,1,0]
3:[4,195892,1,0]
4:[5,195894,1,0]
5:[6,195896,1,0]
6:[7,195898,1,0]
7:[8,195900,1,0]
8:[9,195902,1,0]
9:[10,195908,1,0]
10:[11,195910,1,0]
11:[12,195912,1,0]
12:[13,195914,1,0]
...

i.e. the first extent is a contiguous 2-fsb dir block, but
after that it is fragmented into 1 block extents.

At the top of the readdir path, we allocate a mapping array
which (for this filesystem geometry) can hold 10 extents; see
the assignment to map_info->map_size.  During readdir, we are
therefore able to map extents 0 through 9 above into the array
for readahead purposes.  If we count by 2, we see that the last
mapped index (9) is the first block of a 2-fsb directory block.

At the end of xfs_dir2_leaf_readbuf() we have 2 loops to fill
more readahead; the outer loop assumes one full dir block is
processed each loop iteration, and an inner loop that ensures
that this is so by advancing to the next extent until a full
directory block is mapped.

The problem is that this inner loop may step past the last
extent in the mapping array as it tries to reach the end of
the directory block.  This will read garbage for the extent
length, and as a result the loop control variable 'j' may
become corrupted and never fail the loop conditional.

The number of valid mappings we have in our array is stored
in map->map_valid, so stop this inner loop based on that limit.

There is an ASSERT at the top of the outer loop for this
same condition, but we never made it out of the inner loop,
so the ASSERT never fired.

Huge appreciation for Carlos for debugging and isolating
the problem.

Debugged-and-analyzed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Tested-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Reviewed-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:40 -07:00
Chandan Rajendra
a008c31c7e iomap_dio_rw: Prevent reading file data beyond iomap_dio->i_size
On a ppc64 machine executing overlayfs/019 with xfs as the lower and
upper filesystem causes the following call trace,

WARNING: CPU: 2 PID: 8034 at /root/repos/linux/fs/iomap.c:765 .iomap_dio_actor+0xcc/0x420
Modules linked in:
CPU: 2 PID: 8034 Comm: fsstress Tainted: G             L  4.11.0-rc5-next-20170405 #100
task: c000000631314880 task.stack: c0000003915d4000
NIP: c00000000035a72c LR: c00000000035a6f4 CTR: c00000000035a660
REGS: c0000003915d7570 TRAP: 0700   Tainted: G             L   (4.11.0-rc5-next-20170405)
MSR: 800000000282b032 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI>
  CR: 24004284  XER: 00000000
CFAR: c0000000006f7190 SOFTE: 1
GPR00: c00000000035a6f4 c0000003915d77f0 c0000000015a3f00 000000007c22f600
GPR04: 000000000022d000 0000000000002600 c0000003b2d56360 c0000003915d7960
GPR08: c0000003915d7cd0 0000000000000002 0000000000002600 c000000000521cc0
GPR12: 0000000024004284 c00000000fd80a00 000000004b04ae64 ffffffffffffffff
GPR16: 000000001000ca70 0000000000000000 c0000003b2d56380 c00000000153d2b8
GPR20: 0000000000000010 c0000003bc87bac8 0000000000223000 000000000022f5ff
GPR24: c0000003b2d56360 000000000000000c 0000000000002600 000000000022d000
GPR28: 0000000000000000 c0000003915d7960 c0000003b2d56360 00000000000001ff
NIP [c00000000035a72c] .iomap_dio_actor+0xcc/0x420
LR [c00000000035a6f4] .iomap_dio_actor+0x94/0x420
Call Trace:
[c0000003915d77f0] [c00000000035a6f4] .iomap_dio_actor+0x94/0x420 (unreliable)
[c0000003915d78f0] [c00000000035b9f4] .iomap_apply+0xf4/0x1f0
[c0000003915d79d0] [c00000000035c320] .iomap_dio_rw+0x230/0x420
[c0000003915d7ae0] [c000000000512a14] .xfs_file_dio_aio_read+0x84/0x160
[c0000003915d7b80] [c000000000512d24] .xfs_file_read_iter+0x104/0x130
[c0000003915d7c10] [c0000000002d6234] .__vfs_read+0x114/0x1a0
[c0000003915d7cf0] [c0000000002d7a8c] .vfs_read+0xac/0x1a0
[c0000003915d7d90] [c0000000002d96b8] .SyS_read+0x58/0x100
[c0000003915d7e30] [c00000000000b8e0] system_call+0x38/0xfc
Instruction dump:
78630020 7f831b78 7ffc07b4 7c7ce039 40820360 a13d0018 2f890003 419e0288
2f890004 419e00a0 2f890001 419e02a8 <0fe00000> 3b80fffb 38210100 7f83e378

The above problem can also be recreated on a regular xfs filesystem
using the command,

$ fsstress -d /mnt -l 1000 -n 1000 -p 1000

The reason for the call trace is,
1. When 'reserving' blocks for delayed allocation , XFS reserves more
   blocks (i.e. past file's current EOF) than required. This is done
   because XFS assumes that userspace might write more data and hence
   'reserving' more blocks might lead to the file's new data being
   stored contiguously on disk.
2. The in-memory 'struct xfs_bmbt_irec' mapping the file's last extent would
   then cover the prealloc-ed EOF blocks in addition to the regular blocks.
3. When flushing the dirty blocks to disk, we only flush data till the
   file's EOF. But before writing out the dirty data, we allocate blocks
   on the disk for holding the file's new data. This allocation includes
   the blocks that are part of the 'prealloc EOF blocks'.
4. Later, when the last reference to the inode is being closed, XFS frees the
   unused 'prealloc EOF blocks' in xfs_inactive().

In step 3 above, When allocating space on disk for the delayed allocation
range, the space allocator might sometimes allocate less blocks than
required. If such an allocation ends right at the current EOF of the
file, We will not be able to clear the "delayed allocation" flag for the
'prealloc EOF blocks', since we won't have dirty buffer heads associated
with that range of the file.

In such a situation if a Direct I/O read operation is performed on file
range [X, Y] (where X < EOF and Y > EOF), we flush dirty data in the
range [X, Y] and invalidate page cache for that range (Refer to
iomap_dio_rw()). Later for performing the Direct I/O read, XFS obtains
the extent items (which are still cached in memory) for the file
range. When doing so we are not supposed to get an extent item with
IOMAP_DELALLOC flag set, since the previous "flush" operation should
have converted any delayed allocation data in the range [X, Y]. Hence we
end up hitting a WARN_ON_ONCE(1) statement in iomap_dio_actor().

This commit fixes the bug by preventing the read operation from going
beyond iomap_dio->i_size.

Reported-by: Santhosh G <santhog4@linux.vnet.ibm.com>
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:40 -07:00
Christoph Hellwig
7590632a33 xfs: remove bmap block allocation retries
Now that reflink operations don't set the firstblock value we don't
need the workarounds for non-NULL firstblock values without a prior
allocation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:40 -07:00
Christoph Hellwig
bf8eadbacb xfs: remove xfs_bmap_remap_alloc
The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar
to what we do in the space allocator.  But the reflink code doesn't touch
the allocation btree unlike the normal space allocator, so we couldn't
care less about the state of the AGFL.

So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in
the caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:40 -07:00
Christoph Hellwig
6ebd5a4413 xfs: introduce xfs_bmapi_remap
Add a new helper to be used for reflink extent list additions instead of
funneling them through xfs_bmapi_write and overloading the firstblock
member in struct xfs_bmalloca and struct xfs_alloc_args.

With some small changes to xfs_bmap_remap_alloc this also means we do
not need a xfs_bmalloca structure for this case at all.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:39 -07:00
Christoph Hellwig
6d04558f9f xfs: pass individual arguments to xfs_bmap_add_extent_hole_real
For the reflink case we'd much rather pass the required arguments than
faking up a struct xfs_bmalloca.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:39 -07:00
Christoph Hellwig
39e07daa46 xfs: remove attr fork handling in xfs_bmap_finish_one
We never do COW operations for the attr fork, so don't pretend we handle
them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:39 -07:00
Christoph Hellwig
52813fb13f xfs: fix integer truncation in xfs_bmap_remap_alloc
bno should be a xfs_fsblock_t, which is 64-bit wides instead of a
xfs_aglock_t, which truncates the value to 32 bits.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2017-04-25 09:40:39 -07:00
Amir Goldstein
4ff33aafd3 fanotify: don't expose EOPENSTALE to userspace
When delivering an event to userspace for a file on an NFS share,
if the file is deleted on server side before user reads the event,
user will not get the event.

If the event queue contained several events, the stale event is
quietly dropped and read() returns to user with events read so far
in the buffer.

If the event queue contains a single stale event or if the stale
event is a permission event, read() returns to user with the kernel
internal error code 518 (EOPENSTALE), which is not a POSIX error code.

Check the internal return value -EOPENSTALE in fanotify_read(), just
the same as it is checked in path_openat() and drop the event in the
cases that it is not already dropped.

This is a reproducer from Marko Rauhamaa:

Just take the example program listed under "man fanotify" ("fantest")
and follow these steps:

    ==============================================================
    NFS Server    NFS Client(1)     NFS Client(2)
    ==============================================================
    # echo foo >/nfsshare/bar.txt
                  # cat /nfsshare/bar.txt
                  foo
                                    # ./fantest /nfsshare
                                    Press enter key to terminate.
                                    Listening for events.
    # rm -f /nfsshare/bar.txt
                  # cat /nfsshare/bar.txt
                                    read: Unknown error 518
                  cat: /nfsshare/bar.txt: Operation not permitted
    ==============================================================

where NFS Client (1) and (2) are two terminal sessions on a single NFS
Client machine.

Reported-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Tested-by: Marko Rauhamaa <marko.rauhamaa@f-secure.com>
Cc: <linux-api@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-25 15:48:06 +02:00
Jan Kara
61a929870d ext4: Improve comments in ext4_quota_{on|off}()
Improve comments in ext4_quota_{on|off}() to explain that returning
success despite ext4_journal_start() failing is deliberate.

Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24 16:49:16 +02:00
Dan Carpenter
f4edce1afd fsnotify: remove a stray unlock
We recently shifted this code around, so we're no longer holding the
lock on this path.

Fixes: 755b5bc681 ("fsnotify: Remove indirection from mark list addition")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24 16:41:28 +02:00
Fabian Frederick
5c26eac43a udf: use kmap_atomic for memcpy copying
Use temporary mapping for memory copying operations.

To avoid any sleeping problem,

mark_inode_dirty(inode) was moved after kunmap() in
udf_adinicb_readpage()

down_write(&iinfo->i_data_sem) set before kmap_atomic()
in udf_expand_file_adinicb()

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24 16:28:02 +02:00
Fabian Frederick
6ff6b2b329 udf: use octal for permissions
According to commit f90774e1fd ("checkpatch: look for symbolic
permissions and suggest octal instead")

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Jan Kara <jack@suse.cz>
2017-04-24 16:27:52 +02:00
Linus Torvalds
9ea33c44fb This pull request contains fixes for issues in both UBI and UBIFS:
- More O_TMPFILE fallout
 - RENAME_WHITEOUT regression due to a mis-merge
 - Memory leak in ubifs_mknod()
 - Power-cut problem in UBI's update volume feature
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABAgAGBQJY/HTyAAoJEEtJtSqsAOnW7bYP/jYNCRKUls/FhWcF02h/tuPz
 gYemObWxoEvPqy8laNwyYEkzct4ADsz4C2RCGxygJks2qYCzuXy+IQd/gQlJNIhM
 B9sNa1se1ntt71vwdwgRUgVRMGjknJ6rY8ZP2actuuzztdJE9fGNKhEarMVyQQ61
 f7+AvmFc0CHTEBi5hcHX/Rtdqvzt4OOJl5pLo+VT2RRegxb3oHgl1UTD+gqkdkMg
 WOxvdriSQPi8HHz1clCN36lG2H05sKLd7YEhSWKf6Pt0YYJmF8ckxLERzRclpV4D
 XXqUEwhrovJC+oRQbXS+DaJfjsML9y8GB28izC5c+H8/teGKv2n5NXtjsyWNCfnx
 gmZT1S24TnTsofbmd1GoemMMD4Vlt9EDq2+QVFCaPgK9eWkDQ/xlhjrfPMsuz+YY
 uCWgyzsJyvRewiOlkN1jcTua6f1nHt0SP7alNy0XUQF08jwYMaaf5IJAbEHP8Ink
 seIXNObiGgHl+bBsTQ4SU9PUIJIKIar0TSQf8HXV+/Go3XDz4ZTSjc1KVXs5Hg97
 irUPllIKmjb5Doiln12A5zePmgT2uaChFG21+6SwEWkFYrD3XIxBORxkEF6E7FJU
 asRk2oBzNYezz+qpjn3NeZnIV7Uc9B8bBZpqFrsys2KRAJIZRtdhpAF+jIbrS+7N
 JrakPr2I/FKnsNU/0mZh
 =ER2q
 -----END PGP SIGNATURE-----

Merge tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs

Pull UBI/UBIFS fixes from Richard Weinberger:
 "This contains fixes for issues in both UBI and UBIFS:

   - more O_TMPFILE fallout

   - RENAME_WHITEOUT regression due to a mis-merge

   - memory leak in ubifs_mknod()

   - power-cut problem in UBI's update volume feature"

* tag 'upstream-4.11-rc7' of git://git.infradead.org/linux-ubifs:
  ubifs: Fix O_TMPFILE corner case in ubifs_link()
  ubifs: Fix RENAME_WHITEOUT support
  ubifs: Fix debug messages for an invalid filename in ubifs_dump_inode
  ubifs: Fix debug messages for an invalid filename in ubifs_dump_node
  ubifs: Remove filename from debug messages in ubifs_readdir
  ubifs: Fix memory leak in error path in ubifs_mknod
  ubi/upd: Always flush after prepared for an update
2017-04-23 16:49:16 -07:00
David S. Miller
fb796707d7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Both conflict were simple overlapping changes.

In the kaweth case, Eric Dumazet's skb_cow() bug fix overlapped the
conversion of the driver in net-next to use in-netdev stats.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-04-21 20:23:53 -07:00
Linus Torvalds
94836ecf1e Fix a 4.11 regression that triggers a BUG() on an attempt to use an
unsupported NFSv4 compound op.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJY+mlFAAoJECebzXlCjuG+eCMP/2r/lXdx91Kibg4/D/9OB+LW
 YmcyUH83iRq/F/V88yEMz5XMUTZ/UkZ3iSGSDObjhBK7oswAtZhC1KiWOiuI9ry4
 tcqZM3se7FQNpbwm2wvhtRmbjd2YOZfS3o32Uorhc/beMZzIzZhUb15pp5o3hOIy
 4aoxeUs6//B51fil4Ur0oeLI1EqUd3t+1z0ytyRRzSfORHF/QEiTTXSmmKK+K2XA
 t35XXYJE3SCLxQpWW/CHT/IODv6nsaM0rgN/kkmZUxAyaw2GXT7U6+mzkz3tSL8s
 AEqNUptCn2M3iAv/e6wgpO3BS49HR+Gdg+JxPilCcs+GniWu4zH3sE0HFjTuqQ23
 7u/NB4lkDqFs9pO1x4D5Y/69yXbs6jUHn+NMnhTZS85pTwU5o3RNhzwegoOfIp/3
 CY22R92ZLHEqb1/VTc7rpeuj3d9r7Swg3U8X7kDnn7XtAj2HlMqaHZfw2zjTIYBO
 wqKnikTDMHuzZ7MtegLld4oayNZO4hN7Vi01Npzgr3nkwvpFpz8Jbvsn8mMHvUq8
 ZYWoMfz5rbH20N7cnFUAtB5fm67U9smjb01uzx3m6ylh1SqTWxAoZPWluGJTv0IA
 JKEM4g4PSNOizhib6KI2PW0SngLkCWKLy463fDS2Mx/M7DBZf0j4697TDK21R5NB
 Pny/pwJT4mjLwePc/qt2
 =mSBF
 -----END PGP SIGNATURE-----

Merge tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux

Pull nfsd bugfix from Bruce Fields:
 "Fix a 4.11 regression that triggers a BUG() on an attempt to use an
  unsupported NFSv4 compound op"

* tag 'nfsd-4.11-2' of git://linux-nfs.org/~bfields/linux:
  nfsd: fix oops on unsupported operation
2017-04-21 16:37:48 -07:00
Ilya Dryomov
19b7ccf865 block: get rid of blk_integrity_revalidate()
Commit 25520d55cd ("block: Inline blk_integrity in struct gendisk")
introduced blk_integrity_revalidate(), which seems to assume ownership
of the stable pages flag and unilaterally clears it if no blk_integrity
profile is registered:

    if (bi->profile)
            disk->queue->backing_dev_info->capabilities |=
                    BDI_CAP_STABLE_WRITES;
    else
            disk->queue->backing_dev_info->capabilities &=
                    ~BDI_CAP_STABLE_WRITES;

It's called from revalidate_disk() and rescan_partitions(), making it
impossible to enable stable pages for drivers that support partitions
and don't use blk_integrity: while the call in revalidate_disk() can be
trivially worked around (see zram, which doesn't support partitions and
hence gets away with zram_revalidate_disk()), rescan_partitions() can
be triggered from userspace at any time.  This breaks rbd, where the
ceph messenger is responsible for generating/verifying CRCs.

Since blk_integrity_{un,}register() "must" be used for (un)registering
the integrity profile with the block layer, move BDI_CAP_STABLE_WRITES
setting there.  This way drivers that call blk_integrity_register() and
use integrity infrastructure won't interfere with drivers that don't
but still want stable pages.

Fixes: 25520d55cd ("block: Inline blk_integrity in struct gendisk")
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.4+, needs backporting
Tested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-21 14:17:27 -06:00
Al Viro
c63ed807d1 orangefs: use iov_iter_revert()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-04-21 13:57:32 -04:00
Dan Williams
b0686260fe dax: introduce dax_direct_access()
Replace bdev_direct_access() with dax_direct_access() that uses
dax_device and dax_operations instead of a block_device and
block_device_operations for dax. Once all consumers of the old api have
been converted bdev_direct_access() will be deleted.

Given that block device partitioning decisions can cause dax page
alignment constraints to be violated this also introduces the
bdev_dax_pgoff() helper. It handles calculating a logical pgoff relative
to the dax_device and also checks for page alignment.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-20 11:57:52 -07:00
Dan Williams
d8f07aee3f block: kill bdev_dax_capable()
This is leftover dead code that has since been replaced by
bdev_dax_supported().

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2017-04-20 11:57:52 -07:00
Christoph Hellwig
17d5363b83 scsi: introduce a result field in struct scsi_request
This passes on the scsi_cmnd result field to users of passthrough
requests.  Currently we abuse req->errors for this purpose, but that
field will go away in its current form.

Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it.  I didn't dare
to touch this magic, so the abuses are brought forward 1:1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:16:10 -06:00