663222 Commits

Author SHA1 Message Date
Eric Biggers
7b4cc9787f ext4: evict inline data when writing to memory map
Currently the case of writing via mmap to a file with inline data is not
handled.  This is maybe a rare case since it requires a writable memory
map of a very small file, but it is trivial to trigger on an
inline_data filesystem, and it causes the
'BUG_ON(ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA));' in
ext4_writepages() to be hit:

    mkfs.ext4 -O inline_data /dev/vdb
    mount /dev/vdb /mnt
    xfs_io -f /mnt/file \
	-c 'pwrite 0 1' \
	-c 'mmap -w 0 1m' \
	-c 'mwrite 0 1' \
	-c 'fsync'

	kernel BUG at fs/ext4/inode.c:2723!
	invalid opcode: 0000 [#1] SMP
	CPU: 1 PID: 2532 Comm: xfs_io Not tainted 4.11.0-rc1-xfstests-00301-g071d9acf3d1f #633
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
	task: ffff88003d3a8040 task.stack: ffffc90000300000
	RIP: 0010:ext4_writepages+0xc89/0xf8a
	RSP: 0018:ffffc90000303ca0 EFLAGS: 00010283
	RAX: 0000028410000000 RBX: ffff8800383fa3b0 RCX: ffffffff812afcdc
	RDX: 00000a9d00000246 RSI: ffffffff81e660e0 RDI: 0000000000000246
	RBP: ffffc90000303dc0 R08: 0000000000000002 R09: 869618e8f99b4fa5
	R10: 00000000852287a2 R11: 00000000a03b49f4 R12: ffff88003808e698
	R13: 0000000000000000 R14: 7fffffffffffffff R15: 7fffffffffffffff
	FS:  00007fd3e53094c0(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
	CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
	CR2: 00007fd3e4c51000 CR3: 000000003d554000 CR4: 00000000003406e0
	Call Trace:
	 ? _raw_spin_unlock+0x27/0x2a
	 ? kvm_clock_read+0x1e/0x20
	 do_writepages+0x23/0x2c
	 ? do_writepages+0x23/0x2c
	 __filemap_fdatawrite_range+0x80/0x87
	 filemap_write_and_wait_range+0x67/0x8c
	 ext4_sync_file+0x20e/0x472
	 vfs_fsync_range+0x8e/0x9f
	 ? syscall_trace_enter+0x25b/0x2d0
	 vfs_fsync+0x1c/0x1e
	 do_fsync+0x31/0x4a
	 SyS_fsync+0x10/0x14
	 do_syscall_64+0x69/0x131
	 entry_SYSCALL64_slow_path+0x25/0x25

We could try to be smart and keep the inline data in this case, or at
least support delayed allocation when allocating the block, but these
solutions would be more complicated and don't seem worthwhile given how
rare this case seems to be.  So just fix the bug by calling
ext4_convert_inline_data() when we're asked to make a page writable, so
that any inline data gets evicted, with the block allocated immediately.
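
The xfs_io reproducer above can also be written as a small C program. This is
a minimal sketch using only standard POSIX calls; the function name
mmap_write_small_file() is made up for illustration, and it exercises the ext4
path discussed here only when the given path lies on a filesystem created with
inline_data.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Userspace equivalent of the xfs_io reproducer: write one byte, map the
 * file writable, store through the mapping, then fsync.
 * Returns 0 on success, -1 if any step fails.
 */
int mmap_write_small_file(const char *path)
{
	const size_t map_len = 1 << 20;	/* same 1 MiB window as 'mmap -w 0 1m' */
	char *map;
	int ret = -1;
	int fd = open(path, O_RDWR | O_CREAT, 0644);

	if (fd < 0)
		return -1;

	/* 'pwrite 0 1': a 1-byte file is small enough to be stored inline */
	if (pwrite(fd, "X", 1, 0) != 1)
		goto out_close;

	map = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* 'mwrite 0 1': the store asks the kernel to make the page writable */
	map[0] = 'Y';

	/* 'fsync': forces writeback of the dirtied page */
	if (fsync(fd) == 0)
		ret = 0;

	munmap(map, map_len);
out_close:
	close(fd);
	return ret;
}
```

On a kernel with the fix, or on any filesystem without inline data, the
function is expected to simply return 0.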

Reported-by: Nick Alcock <nick.alcock@oracle.com>
Cc: stable@vger.kernel.org
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30 00:10:50 -04:00
Eric Biggers
6ba644b9fd ext4: remove ext4_xattr_check_entry()
ext4_xattr_check_entry() was redundant with validation of the full xattr
entries list in ext4_xattr_check_entries(), which all callers also did.
ext4_xattr_check_entry() also didn't actually do correct validation;
specifically, it never checked that the value doesn't overlap the xattr
names, nor did it account for padding when checking whether the xattr
value overflows the available space.  So remove it to eliminate any
potential confusion.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-30 00:01:02 -04:00
Eric Biggers
2c4f992337 ext4: rename ext4_xattr_check_names() to ext4_xattr_check_entries()
ext4_xattr_check_names() actually validates both the xattr names and
values, not just the names.  So rename it to ext4_xattr_check_entries()
to avoid confusion.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 23:56:52 -04:00
Eric Biggers
ba7ea1d8f4 ext4: merge ext4_xattr_list() into ext4_listxattr()
There's no difference between ext4_xattr_list() and ext4_listxattr(), so
merge them together and just have ext4_listxattr().  Some years ago they
took different arguments, but that's no longer the case.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 23:53:17 -04:00
Eric Biggers
d600618673 ext4: constify static data that is never modified
Constify static data in ext4 that is never (intentionally) modified so
that it is placed in .rodata and benefits from memory protection.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 23:47:50 -04:00
Eric Biggers
1bc0af600b ext4: trim return value and 'dir' argument from ext4_insert_dentry()
In the initial implementation of ext4 encryption, the filename was
encrypted in ext4_insert_dentry(), which could fail and also required
access to the 'dir' inode.  Since then ext4 filename encryption has been
changed to encrypt the filename earlier, so we can revert the additions
to ext4_insert_dentry().

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 23:27:26 -04:00
Jan Kara
5052b069ac jbd2: fix dbench4 performance regression for 'nobarrier' mounts
Commit b685d3d65a "block: treat REQ_FUA and REQ_PREFLUSH as
synchronous" removed REQ_SYNC flag from WRITE_FUA implementation. Since
JBD2 strips REQ_FUA and REQ_FLUSH flags from submitted IO when the
filesystem is mounted with nobarrier mount option, journal superblock
writes ended up being async writes after this patch and that caused a
heavy performance regression for the dbench4 benchmark with a high number
of processes. In my test setup with HP RAID array with non-volatile write
cache and 32 GB ram, dbench4 runs with 8 processes regressed by ~25%.

Fix the problem by making sure journal superblock writes are always
treated as synchronous since they generally block progress of the
journalling machinery and thus the whole filesystem.

Fixes: b685d3d65a
CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 21:07:30 -04:00
Jan Kara
c52c47e4b4 jbd2: Fix lockdep splat with generic/270 test
I've hit a lockdep splat with generic/270 test complaining that:

3216.fsstress.b/3533 is trying to acquire lock:
 (jbd2_handle){++++..}, at: [<ffffffff813152e0>] jbd2_log_wait_commit+0x0/0x150

but task is already holding lock:
 (jbd2_handle){++++..}, at: [<ffffffff8130bd3b>] start_this_handle+0x35b/0x850

The underlying problem is that jbd2_journal_force_commit_nested()
(called from ext4_should_retry_alloc()) may get called while a
transaction handle is started. In such case it takes care to not wait
for commit of the running transaction (which would deadlock) but only
for a commit of a transaction that is already committing (which is safe
as that doesn't wait for any filesystem locks).

In fact there are also other callers of jbd2_log_wait_commit() that take
care to pass tid of a transaction that is already committing and for
those cases, the lockdep instrumentation is too restrictive and leads
to false positive reports. Fix the problem by calling
jbd2_might_wait_for_commit() from jbd2_log_wait_commit() only if the
transaction isn't already committing.

Fixes: 1eaa566d36
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-29 20:12:16 -04:00
Theodore Ts'o
80a2ea9f85 mm: retry writepages() on ENOMEM when doing an data integrity writeback
Currently, a file system's writepages() function must not fail with
ENOMEM, since if it does, it's possible for buffered data to be lost.
This is because on a data integrity writeback writepages() gets called
only once, and if it returns ENOMEM, if you're lucky the error will get
reflected back to the userspace process calling fsync().  If you
aren't lucky, the user is unmounting the file system, and the dirty
pages will simply be lost.

For this reason, file system code generally will use GFP_NOFS, and in
some cases, will retry the allocation in a loop, on the theory that
"kernel livelocks are temporary; data loss is forever".
Unfortunately, this can indeed cause livelocks, since inside the
writepages() call, the file system is holding various mutexes, and
these mutexes may prevent the OOM killer from killing its targeted
victim if it is also holding on to those mutexes.

A better solution would be to allow writepages() to call the memory
allocator with flags that give greater latitude to the allocator to
fail, and then release its locks and return ENOMEM, and in the case of
background writeback, the writes can be retried at a later time.  In
the case of data-integrity writeback, retry after waiting a brief
amount of time.
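
The retry policy described above can be sketched in userspace C. This is an
illustrative model only, not the kernel change itself: the names
do_writepages_retry(), enum sync_mode, writepages_fn and fail_n_times() are
hypothetical, and the 20 ms pause is an arbitrary stand-in for "a brief
amount of time".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-ins for background vs. data-integrity writeback. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

/* Hypothetical callback type: returns 0 or a negative errno. */
typedef int (*writepages_fn)(void *ctx);

int do_writepages_retry(writepages_fn writepages, void *ctx,
			enum sync_mode mode)
{
	const struct timespec pause = { .tv_sec = 0, .tv_nsec = 20 * 1000 * 1000 };
	int ret;

	for (;;) {
		ret = writepages(ctx);
		/* Only data-integrity writeback retries, and only on ENOMEM. */
		if (ret != -ENOMEM || mode != SYNC_ALL)
			break;
		nanosleep(&pause, NULL);	/* wait briefly before retrying */
	}
	return ret;
}

/* Demo callback: fails with -ENOMEM until the counter in ctx reaches zero. */
int fail_n_times(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return -ENOMEM;
	}
	return 0;
}
```

With a counter of 2, a SYNC_ALL call keeps retrying until the callback
succeeds, while a SYNC_NONE call returns -ENOMEM after the first attempt and
leaves the retry to a later writeback pass.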

Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-04-28 09:51:54 -04:00
Linus Torvalds
39da7c509a Linux 4.11-rc6 2017-04-09 09:49:44 -07:00
Linus Torvalds
84ced7fd06 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "This is a set of CIFS/SMB3 fixes for stable.

  There is another set of four SMB3 reconnect fixes for stable in
  progress but they are still being reviewed/tested, so didn't want to
  wait any longer to send these five below"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
  Reset TreeId to zero on SMB2 TREE_CONNECT
  CIFS: Fix build failure with smb2
  Introduce cifs_copy_file_range()
  SMB3: Rename clone_range to copychunk_range
  Handle mismatched open calls
2017-04-09 09:10:02 -07:00
Linus Torvalds
462e9a355e Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
 "A number of ARM fixes:

   - prevent oopses caused by dma_get_sgtable() and declared DMA
     coherent memory

   - fix boot failure on nommu caused by ID_PFR1 access

   - a number of kprobes fixes from Jon Medhurst and Masami Hiramatsu"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8665/1: nommu: access ID_PFR1 only if CPUID scheme
  ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
  arm: kprobes: Align stack to 8-bytes in test code
  arm: kprobes: Fix the return address of multiple kretprobes
  arm: kprobes: Skip single-stepping in recursing path if possible
  arm: kprobes: Allow to handle reentered kprobe on single-stepping
2017-04-09 09:05:25 -07:00
Linus Torvalds
5b50be743f Driver core fixes for 4.11-rc6
Here are 3 small fixes for 4.11-rc6.  One resolves a reported issue with
 sysfs files that NeilBrown found, one is a documentation fix for the
 stable kernel rules, and the last is a small MAINTAINERS file update for
 kernfs.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are 3 small fixes for 4.11-rc6.

  One resolves a reported issue with sysfs files that NeilBrown found,
  one is a documentation fix for the stable kernel rules, and the last
  is a small MAINTAINERS file update for kernfs"

* tag 'driver-core-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  MAINTAINERS: separate out kernfs maintainership
  sysfs: be careful of error returns from ops->show()
  Documentation: stable-kernel-rules: fix stable-tag format
2017-04-09 09:03:51 -07:00
Linus Torvalds
62e1fd08ed Staging/IIO fixes for 4.11-rc6
Here are a number of small IIO and staging driver fixes for 4.11-rc6.
 Nothing big here, just iio fixes for reported issues, and an ashmem fix
 for a very old bug that has been reported by a number of Android
 vendors.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging/IIO driver fixes from Greg KH:
 "Here are a number of small IIO and staging driver fixes for 4.11-rc6.
  Nothing big here, just iio fixes for reported issues, and an ashmem
  fix for a very old bug that has been reported by a number of Android
  vendors"

* tag 'staging-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
  iio: hid-sensor-attributes: Fix sensor property setting failure.
  iio: accel: hid-sensor-accel-3d: Fix duplicate scan index error
  iio: core: Fix IIO_VAL_FRACTIONAL_LOG2 for negative values
  iio: st_pressure: initialize lps22hb bootime
  iio: bmg160: reset chip when probing
  iio: cros_ec_sensors: Fix return value to get raw and calibbias data.
2017-04-09 09:02:31 -07:00
Linus Torvalds
2a610b8aa8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS fixes from Al Viro:
 "statx followup fixes and a fix for stack-smashing on alpha"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  alpha: fix stack smashing in old_adjtimex(2)
  statx: Include a mask for stx_attributes in struct statx
  statx: Reserve the top bit of the mask for future struct expansion
  xfs: report crtime and attribute flags to statx
  ext4: Add statx support
  statx: optimize copy of struct statx to userspace
  statx: remove incorrect part of vfs_statx() comment
  statx: reject unknown flags when using NULL path
  Documentation/filesystems: fix documentation for ->getattr()
2017-04-09 08:26:21 -07:00
Linus Torvalds
78d91a75b4 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Here's a pull request for 4.11-rc, fixing a set of issues mostly
  centered around the new scheduling framework. These have been brewing
  for a while, but split up into what we absolutely need in 4.11, and
  what we can defer until 4.12. These are well tested, on both single
  queue and multiqueue setups, and with and without shared tags. They
  fix several hangs that have happened in testing.

  This is obviously larger than I would have preferred at this point in
  time, but I don't think we can shave much off this and still get the
  desired results.

  In detail, this pull request contains:

   - a set of five fixes for NVMe, mostly from Christoph and one from
     Roland.

   - a series from Bart, fixing issues with dm-mq and SCSI shared tags
     and scheduling. Note that one of those patches' commit messages may
     read like an optimization, but it is in fact an important fix for
     queue restarts in particular.

   - a series from Omar, most importantly fixing a hang with multiple
     hardware queues when we fail to get a driver tag. Another important
     fix in there is for resizing hardware queues, which nbd does when
     handling multiple sockets for one connection.

   - fixing an imbalance in putting the ctx for hctx request allocations
     from Minchan"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq: Restart a single queue if tag sets are shared
  dm rq: Avoid that request processing stalls sporadically
  scsi: Avoid that SCSI queues get stuck
  blk-mq: Introduce blk_mq_delay_run_hw_queue()
  blk-mq: remap queues when adding/removing hardware queues
  blk-mq-sched: fix crash in switch error path
  blk-mq-sched: set up scheduler tags when bringing up new queues
  blk-mq-sched: refactor scheduler initialization
  blk-mq: use the right hctx when getting a driver tag fails
  nvmet: fix byte swap in nvmet_parse_io_cmd
  nvmet: fix byte swap in nvmet_execute_write_zeroes
  nvmet: add missing byte swap in nvmet_get_smart_log
  nvme: add missing byte swap in nvme_setup_discard
  nvme: Correct NVMF enum values to match NVMe-oF rev 1.0
  block: do not put mq context in blk_mq_alloc_request_hctx
2017-04-08 11:56:58 -07:00
Linus Torvalds
c3df1c7c36 Late pin control fix for v4.11:
An issue was detected with pin control hos on the Freescale i.MX after
 the refactorings for more general group and function handling. We now
 have the proper fix for this.

Merge tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fix from Linus Walleij:
 "This late fix for pin control is hopefully the last I send this cycle.

  The problem was detected early in the v4.11 release cycle and there
  has been some back and forth on how to solve it. Sadly the proper fix
  arrives late, but at least not too late.

  An issue was detected with pin control on the Freescale i.MX after the
  refactorings for more general group and function handling.

  We now have the proper fix for this"

* tag 'pinctrl-v4.11-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: core: Fix pinctrl_register_and_init() with pinctrl_enable()
2017-04-08 11:43:38 -07:00
Linus Torvalds
894ca30cf6 powerpc fixes for 4.11 #7
Headed to stable:
  - disable HFSCR[TM] if TM is not supported, fixes a potential host kernel crash
    triggered by a hostile guest, but only in configurations that no one uses
  - don't try to fix up misaligned load-with-reservation instructions
  - fix flush_(d|i)cache_range() called from modules on little endian kernels
  - add missing global TLB invalidate if cxl is active
  - fix missing preempt_disable() in crc32c-vpmsum
 
 And a fix for selftests build changes that went in this release:
  - selftests/powerpc: Fix standalone powerpc build
 
 Thanks to:
   Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran, Paul Mackerras.

Merge tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 4.11:

  Headed to stable:

   - disable HFSCR[TM] if TM is not supported, fixes a potential host
     kernel crash triggered by a hostile guest, but only in
     configurations that no one uses

   - don't try to fix up misaligned load-with-reservation instructions

   - fix flush_(d|i)cache_range() called from modules on little endian
     kernels

   - add missing global TLB invalidate if cxl is active

   - fix missing preempt_disable() in crc32c-vpmsum

  And a fix for selftests build changes that went in this release:

   - selftests/powerpc: Fix standalone powerpc build

  Thanks to: Benjamin Herrenschmidt, Frederic Barrat, Oliver O'Halloran,
  Paul Mackerras"

* tag 'powerpc-4.11-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/crypto/crc32c-vpmsum: Fix missing preempt_disable()
  powerpc/mm: Add missing global TLB invalidate if cxl is active
  powerpc/64: Fix flush_(d|i)cache_range() called from modules
  powerpc: Don't try to fix up misaligned load-with-reservation instructions
  powerpc: Disable HFSCR[TM] if TM is not supported
  selftests/powerpc: Fix standalone powerpc build
2017-04-08 11:06:12 -07:00
Chris Salls
cf01fb9985 mm/mempolicy.c: fix error handling in set_mempolicy and mbind.
In the case that compat_get_bitmap fails we do not want to copy the
bitmap to the user as it will contain uninitialized stack data and leak
sensitive data.
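
The shape of the fix can be sketched in userspace C: check the result of the
fetch before anything is copied out of the stack buffer. This is a model
under stated assumptions, not the kernel code; fetch_bitmap() is a
hypothetical stand-in for compat_get_bitmap() that fails when its source is
NULL, and stage_nodemask() and MASK_WORDS are likewise made up.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MASK_WORDS 4	/* illustrative bitmap size */

/* Hypothetical stand-in for compat_get_bitmap(): fails when src is NULL. */
static int fetch_bitmap(unsigned long *dst, const unsigned long *src,
			size_t words)
{
	if (!src)
		return -EFAULT;
	memcpy(dst, src, words * sizeof(*dst));
	return 0;
}

int stage_nodemask(unsigned long *user_dst, const unsigned long *user_src)
{
	unsigned long bm[MASK_WORDS];	/* uninitialized, like the kernel stack buffer */

	/* Bail out before bm can reach the caller if the fetch failed. */
	if (fetch_bitmap(bm, user_src, MASK_WORDS))
		return -EFAULT;

	memcpy(user_dst, bm, sizeof(bm));
	return 0;
}
```

When the fetch fails, the destination buffer is left untouched, so no stale
stack contents are exposed.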

Signed-off-by: Chris Salls <salls@cs.ucsb.edu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 10:57:55 -07:00
Liping Zhang
425fffd886 sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
Currently, inputting the following command will succeed but actually the
value will be truncated:

  # echo 0x12ffffffff > /proc/sys/net/ipv4/tcp_notsent_lowat

This is not friendly to the user, so instead, we should report error
when the value is larger than UINT_MAX.
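
The check can be illustrated with a small userspace parser. This is a sketch,
not the proc_douintvec() implementation: parse_uint_strict() is a hypothetical
helper built on the standard strtoull(), and it assumes a 32-bit unsigned int,
in which case 0x12ffffffff would otherwise be silently truncated to
0xffffffff.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse an unsigned integer, returning -EINVAL instead of truncating when
 * the value does not fit in an unsigned int. Trailing characters after the
 * number are ignored in this sketch.
 */
int parse_uint_strict(const char *s, unsigned int *out)
{
	char *end;
	unsigned long long val;

	errno = 0;
	val = strtoull(s, &end, 0);	/* base 0 accepts the 0x prefix */
	if (errno == ERANGE || end == s)
		return -EINVAL;
	if (val > UINT_MAX)
		return -EINVAL;		/* refuse instead of truncating */
	*out = (unsigned int)val;
	return 0;
}
```

Here "0x12ffffffff" is rejected with -EINVAL, while "4294967295" is accepted
and stored unchanged.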

Fixes: e7d316a02f ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 10:27:40 -07:00
Tejun Heo
27f395b857 MAINTAINERS: separate out kernfs maintainership
Separate out kernfs from driver core and add myself as a
co-maintainer.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 18:15:32 +02:00
NeilBrown
c8a139d001 sysfs: be careful of error returns from ops->show()
ops->show() can return a negative error code.
Commit 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
(in v4.4) caused this to be stored in an unsigned 'size_t' variable, so errors
would look like large numbers.
As a result, if an error is returned, sysfs_kf_read() will return the
value of 'count', typically 4096.

Commit 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
(in v4.8) extended this error to use the unsigned large 'len' as a size for
memmove().
Consequently, if ->show returns an error, then the first read() on the
sysfs file will return 4096 and could return uninitialized memory to
user-space.
If the application performs a subsequent read, this will trigger a memmove()
with an extremely large count, and is likely to crash the machine in bizarre ways.
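
The signedness problem can be modelled in a few lines of userspace C. This is
an illustrative sketch rather than the sysfs code: read_buggy() and
read_fixed() are hypothetical names, show_ret stands for the value returned
by ops->show(), and the memmove() path is left out.

```c
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Model of the pre-fix path: the result of show() lands in a size_t. */
ssize_t read_buggy(ssize_t show_ret, size_t count)
{
	size_t len = (size_t)show_ret;	/* a negative errno wraps to a huge value */

	return (ssize_t)(len < count ? len : count);
}

/* Model of the fixed path: keep the value signed and check it first. */
ssize_t read_fixed(ssize_t show_ret, size_t count)
{
	ssize_t len = show_ret;

	if (len < 0)
		return len;	/* propagate the error to the caller */
	return (size_t)len < count ? len : (ssize_t)count;
}
```

With an error return of -5 and a count of 4096, the buggy model reports 4096
bytes read, while the fixed model returns -5.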

This bug can currently only be triggered by reading from an md
sysfs attribute declared with __ATTR_PREALLOC() during the
brief period between when mddev_put() deletes an mddev from
the ->all_mddevs list, and when mddev_delayed_delete() - which is
scheduled on a workqueue - completes.
Before this, an error won't be returned by ->show().
After this, the ->show() won't be called.

I can reproduce it reliably only by putting a delay like
	usleep_range(500000,700000);
early in mddev_delayed_delete(). Then after creating an
md device md0 run
  echo clear > /sys/block/md0/md/array_state; cat /sys/block/md0/md/array_state

The bug can be triggered without the usleep.

Fixes: 65da3484d9 ("sysfs: correctly handle short reads on PREALLOC attrs.")
Fixes: 17d0774f80 ("sysfs: correctly handle read offset on PREALLOC attrs")
Cc: stable@vger.kernel.org
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 17:33:32 +02:00
Johan Hovold
cf903e9d3a Documentation: stable-kernel-rules: fix stable-tag format
A patch documenting how to specify which kernels a particular fix should
be backported to (seemingly) inadvertently added a minus sign after the
kernel version. This particular stable-tag format had never been used
prior to this patch, nor was it present when the patch in question
was first submitted (it was added in v2 without any comment).

Drop the minus sign to avoid any confusion.

Fixes: fdc81b7910 ("stable_kernel_rules: Add clause about specification of kernel versions to patch.")
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 17:33:31 +02:00
Shuxiao Zhang
97fbfef6bd staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
vfs_llseek() checks whether the file mode has FMODE_LSEEK and
returns failure if it does not. But ashmem files can be
seeked, so add FMODE_LSEEK to the ashmem file.

Comment From Greg Hackmann:
	ashmem_llseek() passes the llseek() call through to the backing
	shmem file.  91360b02ab ("ashmem: use vfs_llseek()") changed
	this from directly calling the file's llseek() op into a VFS
	layer call.  This also adds a check for the FMODE_LSEEK bit, so
	without that bit ashmem_llseek() now always fails with -ESPIPE.
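
The mode check described in the comment can be modelled as follows. This is a
userspace sketch only: struct file_model, llseek_model() and the bit value of
FMODE_LSEEK_MODEL are assumptions made for illustration and are not taken
from the kernel headers.

```c
#include <errno.h>

#define FMODE_LSEEK_MODEL 0x4u	/* illustrative bit value */

/* Hypothetical, heavily reduced model of an open file. */
struct file_model {
	unsigned int f_mode;
	long pos;
};

/* Seeking is refused with -ESPIPE unless the file is marked seekable. */
long llseek_model(struct file_model *f, long offset)
{
	if (!(f->f_mode & FMODE_LSEEK_MODEL))
		return -ESPIPE;
	f->pos = offset;
	return f->pos;
}
```

A file created without the bit fails every seek with -ESPIPE; setting the bit
when the file is set up is what makes the same call succeed.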

Fixes: 91360b02ab ("ashmem: use vfs_llseek()")
Signed-off-by: Shuxiao Zhang <zhangshuxiao@xiaomi.com>
Tested-by: Greg Hackmann <ghackmann@google.com>
Cc: stable <stable@vger.kernel.org> # 3.18+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-08 12:13:11 +02:00
Linus Torvalds
8b65bb57d8 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc
Pull sparc fixes from David Miller:
 "Several fixes here, mostly having to do with either build errors or
  memory corruptions depending upon whether you have THP enabled or not"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
  sparc: remove unused wp_works_ok macro
  sparc32: Export vac_cache_size to fix build error
  sparc64: Fix memory corruption when THP is enabled
  sparc64: Fix kernel panic due to erroneous #ifdef surrounding pmd_write()
  arch/sparc: Avoid DCTI Couples
  sparc64: kern_addr_valid regression
  sparc64: Add support for 2G hugepages
  sparc64: Fix size check in huge_pte_alloc
2017-04-08 01:42:05 -07:00
Linus Torvalds
542380a208 KVM fixes for v4.11-rc6
ARM:
  - Fix a problem with GICv3 userspace save/restore
  - Clarify GICv2 userspace save/restore ABI
  - Be more careful in clearing GIC LRs
  - Add missing synchronization primitive to our MMU handling code
 
 PPC:
  - Check for a NULL return from kzalloc
 
 s390:
  - Prevent translation exception errors on valid page tables for the
    instruction-execution-protection support
 
 x86:
  - Fix Page-Modification Logging when running a nested guest

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Radim Krčmář:
 "ARM:
   - Fix a problem with GICv3 userspace save/restore
   - Clarify GICv2 userspace save/restore ABI
   - Be more careful in clearing GIC LRs
   - Add missing synchronization primitive to our MMU handling code

  PPC:
   - Check for a NULL return from kzalloc

  s390:
   - Prevent translation exception errors on valid page tables for the
     instruction-execution-protection support

  x86:
   - Fix Page-Modification Logging when running a nested guest"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PPC: Book3S HV: Check for kmalloc errors in ioctl
  KVM: nVMX: initialize PML fields in vmcs02
  KVM: nVMX: do not leak PML full vmexit to L1
  KVM: arm/arm64: vgic: Fix GICC_PMR uaccess on GICv3 and clarify ABI
  KVM: arm64: Ensure LRs are clear when they should be
  kvm: arm/arm64: Fix locking for kvm_free_stage2_pgd
  KVM: s390: remove change-recording override support
  arm/arm64: KVM: Take mmap_sem in kvm_arch_prepare_memory_region
  arm/arm64: KVM: Take mmap_sem in stage2_unmap_vm
2017-04-08 01:39:43 -07:00
Linus Torvalds
62fedca5ce Merge branch 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit
Pull audit cleanup from Paul Moore:
 "A week later than I had hoped, but as promised, here is the audit
  uninline-fix we talked about during the last audit pull request.

  The patch is slightly different than what we originally discussed as
  it made more sense to keep the audit_signal_info() function in
  auditsc.c rather than move it and a bunch of other related
  variables/definitions into audit.c/audit.h.

  At some point in the future I need to look at how the audit code is
  organized across kernel/audit*, I suspect we could do things a bit
  better, but it doesn't seem like a -rc release is a good place for
  that ;)

  Regardless, this patch passes our tests without problem and looks good
  for v4.11"

* 'stable-4.11' of git://git.infradead.org/users/pcmoore/audit:
  audit: move audit_signal_info() into kernel/auditsc.c
2017-04-08 01:37:25 -07:00
Linus Torvalds
56c2997965 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "10 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: move pcp and lru-pcp draining into single wq
  mailmap: update Yakir Yang email address
  mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
  dax: fix radix tree insertion race
  mm, thp: fix setting of defer+madvise thp defrag mode
  ptrace: fix PTRACE_LISTEN race corrupting task->state
  vmlinux.lds: add missing VMLINUX_SYMBOL macros
  mm/page_alloc.c: fix print order in show_free_areas()
  userfaultfd: report actual registered features in fdinfo
  mm: fix page_vma_mapped_walk() for ksm pages
2017-04-08 01:35:32 -07:00
Michal Hocko
ce612879dd mm: move pcp and lru-pcp draining into single wq
We currently have 2 specific WQ_RECLAIM workqueues in the mm code.
vmstat_wq for updating pcp stats and lru_add_drain_wq dedicated to drain
per cpu lru caches.  This seems more than necessary because both can run
on a single WQ.  Both do not block on locks requiring a memory
allocation nor perform any allocations themselves.  We will save one
rescuer thread this way.

On the other hand drain_all_pages() queues work on the system wq which
doesn't have a rescuer and so this depends on memory allocation (when all
workers are stuck allocating and new ones cannot be created).

Initially we thought this would be more of a theoretical problem but
Hugh Dickins has reported:

: 4.11-rc has been giving me hangs after hours of swapping load.  At
: first they looked like memory leaks ("fork: Cannot allocate memory");
: but for no good reason I happened to do "cat /proc/sys/vm/stat_refresh"
: before looking at /proc/meminfo one time, and the stat_refresh stuck
: in D state, waiting for completion of flush_work like many kworkers.
: kthreadd waiting for completion of flush_work in drain_all_pages().

This worker should be using WQ_RECLAIM as well in order to guarantee
forward progress.  We can reuse the same one as for lru draining and
vmstat.

Link: http://lkml.kernel.org/r/20170307131751.24936-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Yang Li <pku.leo@gmail.com>
Tested-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:49 -07:00
Jeffy Chen
cdcf4330d5 mailmap: update Yakir Yang email address
Set current email address to replace previous employers email addresses.

Link: http://lkml.kernel.org/r/1491450722-6633-1-git-send-email-jeffy.chen@rock-chips.com
Signed-off-by: Jeffy Chen <jeffy.chen@rock-chips.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:49 -07:00
David Rientjes
460bcec84e mm, swap_cgroup: reschedule when needed in swap_cgroup_swapoff()
We got need_resched() warnings in swap_cgroup_swapoff() because
swap_cgroup_ctrl[type].length is particularly large.

Reschedule when needed.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:49 -07:00
Ross Zwisler
e11f8b7b6c dax: fix radix tree insertion race
While running generic/340 in my test setup I hit the following race.  It
can happen with kernels that support FS DAX PMDs, so v4.10 thru
v4.11-rc5.

Thread 1				Thread 2
--------				--------
dax_iomap_pmd_fault()
  grab_mapping_entry()
    spin_lock_irq()
    get_unlocked_mapping_entry()
    'entry' is NULL, can't call lock_slot()
    spin_unlock_irq()
    radix_tree_preload()
					dax_iomap_pmd_fault()
					  grab_mapping_entry()
					    spin_lock_irq()
					    get_unlocked_mapping_entry()
					    ...
					    lock_slot()
					    spin_unlock_irq()
					  dax_pmd_insert_mapping()
					    <inserts a PMD mapping>
    spin_lock_irq()
    __radix_tree_insert() fails with -EEXIST
    <fall back to 4k fault, and die horribly
     when inserting a 4k entry where a PMD exists>

The issue is that we have to drop mapping->tree_lock while calling
radix_tree_preload(), but since we didn't have a radix tree entry to
lock (unlike in the pmd_downgrade case) we have no protection against
Thread 2 coming along and inserting a PMD at the same index.  For 4k
entries we handled this with a special-case response to -EEXIST coming
from the __radix_tree_insert(), but this doesn't save us for PMDs
because the -EEXIST case can also mean that we collided with a 4k entry
in the radix tree at a different index, but one that is covered by our
PMD range.

So, correctly handle both the 4k and 2M collision cases by explicitly
re-checking the radix tree for an entry at our index once we reacquire
mapping->tree_lock.
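
The re-check described above can be modelled in a few lines of
single-threaded C.  This is only a sketch under stated assumptions: the
struct, the function names and the callback that stands in for "what
another thread does while the lock is dropped" are all hypothetical, and
the locking itself appears only as comments.

```c
#include <stddef.h>

#define SLOTS 8

/* Hypothetical stand-in for the mapping's tree; NULL means empty. */
struct tree {
	void *slot[SLOTS];
};

/* Entry inserted by the example competitor below. */
int competitor_entry;

/* Example competitor: fills slot 3 during the unlocked window. */
void racing_insert(struct tree *t)
{
	t->slot[3] = &competitor_entry;
}

/*
 * Insert 'entry' at 'index', tolerating a window in which the lock is
 * dropped.  'unlocked_window' models whatever other threads do while
 * the lock is not held (it may be NULL).  Returns the entry that ends
 * up in the slot: ours, or the one that beat us to it.
 */
void *insert_with_recheck(struct tree *t, size_t index, void *entry,
			  void (*unlocked_window)(struct tree *))
{
	/* lock held: the slot looks empty */
	if (t->slot[index])
		return t->slot[index];

	/* lock dropped here, e.g. to preallocate memory */
	if (unlocked_window)
		unlocked_window(t);

	/* lock retaken: look again instead of trusting the old result */
	if (t->slot[index])
		return t->slot[index];

	t->slot[index] = entry;
	return entry;
}
```

Passing racing_insert as the callback makes the second lookup find the
competitor's entry, so the caller never overwrites it.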

This patch has made it through a clean xfstests run with the current
v4.11-rc5 based linux/master, and it also ran generic/340 500 times in a
loop.  It used to fail within the first 10 iterations.

Link: http://lkml.kernel.org/r/20170406212944.2866-1-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org>    [4.10+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:49 -07:00
David Rientjes
4fad7fb6b0 mm, thp: fix setting of defer+madvise thp defrag mode
Setting thp defrag mode of "defer+madvise" actually sets "defer" in the
kernel due to the name similarity and the out-of-order way the string is
checked in defrag_store().
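
The effect of the check order can be sketched in userspace.  The sketch
assumes, for illustration, that each keyword is matched as a prefix of
the written string; the enum, has_prefix() and both parser names are
hypothetical and are not the kernel's defrag_store() code.

```c
#include <string.h>

/* Hypothetical modes, standing in for the kernel's defrag flags. */
enum defrag_mode {
	MODE_UNKNOWN,
	MODE_DEFER,
	MODE_DEFER_MADVISE,
	MODE_MADVISE,
};

/* Return nonzero if buf starts with keyword kw. */
int has_prefix(const char *buf, const char *kw)
{
	return strncmp(buf, kw, strlen(kw)) == 0;
}

/* Out-of-order check: "defer" also matches "defer+madvise". */
enum defrag_mode parse_short_first(const char *buf)
{
	if (has_prefix(buf, "defer"))
		return MODE_DEFER;
	if (has_prefix(buf, "defer+madvise"))
		return MODE_DEFER_MADVISE;	/* never reached */
	if (has_prefix(buf, "madvise"))
		return MODE_MADVISE;
	return MODE_UNKNOWN;
}

/* Correct order: test the longer keyword before its prefix. */
enum defrag_mode parse_long_first(const char *buf)
{
	if (has_prefix(buf, "defer+madvise"))
		return MODE_DEFER_MADVISE;
	if (has_prefix(buf, "defer"))
		return MODE_DEFER;
	if (has_prefix(buf, "madvise"))
		return MODE_MADVISE;
	return MODE_UNKNOWN;
}
```

With the input "defer+madvise", parse_short_first() stops at the "defer"
test, while parse_long_first() reaches the intended mode.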

Check the string in the correct order so that
TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG is set appropriately for
"defer+madvise".

Fixes: 21440d7eb9 ("mm, thp: add new defer+madvise defrag option")
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704051814420.137626@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:48 -07:00
bsegall@google.com
5402e97af6 ptrace: fix PTRACE_LISTEN race corrupting task->state
In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
__TASK_TRACED.  If this races with the ptrace_unfreeze_traced at the end
of a PTRACE_LISTEN, this can wake the task /after/ the check against
__TASK_TRACED, but before the reset of state to TASK_TRACED.  This
causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
against TRACED while the task is still on the rq wake_list, corrupting
it.

Oleg said:
 "The kernel can crash or this can lead to other hard-to-debug problems.
  In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
  assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
  contract. Obviously it is very wrong to manipulate task->state if this
  task is already running, or WAKING, or it sleeps again"

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
Signed-off-by: Ben Segall <bsegall@google.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:48 -07:00
Jessica Yu
d79bf21e0e vmlinux.lds: add missing VMLINUX_SYMBOL macros
When __{start,end}_ro_after_init is referenced from C code, we run into
the following build errors on blackfin:

  kernel/extable.c:169: undefined reference to `__start_ro_after_init'
  kernel/extable.c:169: undefined reference to `__end_ro_after_init'

The build error is due to the fact that blackfin is one of the few
arches that prepends an underscore '_' to all symbols defined in C.

Fix this by wrapping __{start,end}_ro_after_init in vmlinux.lds.h with
VMLINUX_SYMBOL(), which adds the necessary prefix for arches that have
HAVE_UNDERSCORE_SYMBOL_PREFIX.
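
What such a wrapper does to a symbol name can be shown with plain
preprocessor macros.  This is a minimal sketch: SYM_PLAIN, SYM_PREFIXED
and the two helper functions are hypothetical names that model the two
possible behaviours side by side; they are not the kernel's definitions.

```c
/* Hypothetical stand-ins for the two behaviours of VMLINUX_SYMBOL(). */
#define SYM_PLAIN(x)	x	/* most arches: name used as written */
#define SYM_PREFIXED(x)	_##x	/* underscore-prefix arches */

/* Two-level stringify so the argument is macro-expanded first. */
#define STR_(x)	#x
#define STR(x)	STR_(x)

/* Name the linker script would define on an ordinary arch. */
const char *plain_name(void)
{
	return STR(SYM_PLAIN(__start_ro_after_init));
}

/* Name it must define where C symbols get a leading '_'. */
const char *prefixed_name(void)
{
	return STR(SYM_PREFIXED(__start_ro_after_init));
}
```

On a prefixing arch the C compiler emits references to the name with the
extra underscore, so a linker script that defines only the plain name
leaves those references unresolved.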

Link: http://lkml.kernel.org/r/1491259387-15869-1-git-send-email-jeyu@redhat.com
Signed-off-by: Jessica Yu <jeyu@redhat.com>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Eddie Kovsky <ewk@edkovsky.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:48 -07:00
Alexander Polakov
1f06b81aea mm/page_alloc.c: fix print order in show_free_areas()
Fixes: 11fb998986 ("mm: move most file-based accounting to the node")
Link: http://lkml.kernel.org/r/1490377730.30219.2.camel@beget.ru
Signed-off-by: Alexander Polyakov <apolyakov@beget.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:48 -07:00
Mike Rapoport
045098e944 userfaultfd: report actual registered features in fdinfo
fdinfo for a userfault file descriptor reports UFFD_API_FEATURES.  Up
until recently, UFFD_API_FEATURES was defined as 0, therefore the
corresponding field in fdinfo always contained zero.  Now, with the
introduction of several additional features, UFFD_API_FEATURES is no
longer 0 and it seems better to report the actual features requested for
the userfaultfd object described by the fdinfo.

First, applications that were already using userfault will still see
zero in the features field in fdinfo.  Second, reporting actual features
rather than available features gives a clear indication of which
userfault features an application uses.

Link: http://lkml.kernel.org/r/1491140181-22121-1-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Pavel Emelyanov <xemul@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:48 -07:00
Hugh Dickins
d75450ff40 mm: fix page_vma_mapped_walk() for ksm pages
Doug Smythies reports an oops with KSM, with the backtrace below; I've
been seeing the same:

  page_vma_mapped_walk+0xe6/0x5b0
  page_referenced_one+0x91/0x1a0
  rmap_walk_ksm+0x100/0x190
  rmap_walk+0x4f/0x60
  page_referenced+0x149/0x170
  shrink_active_list+0x1c2/0x430
  shrink_node_memcg+0x67a/0x7a0
  shrink_node+0xe1/0x320
  kswapd+0x34b/0x720

Just as observed in commit 4b0ece6fa0 ("mm: migrate: fix
remove_migration_pte() for ksm pages"), you cannot use page->index
calculations on ksm pages.

page_vma_mapped_walk() is relying on __vma_address(), where a ksm page
can lead it off the end of the page table, and into whatever nonsense is
in the next page, ending as an oops inside check_pte()'s pte_page().

KSM tells page_vma_mapped_walk() exactly where to look for the page, it
does not need any page->index calculation: and that's so also for all
the normal and file and anon pages - just not for THPs and their
subpages.  Get out early in most cases: instead of a PageKsm test, move
down the earlier not-THP-page test, as suggested by Kirill.

I'm also slightly worried that this loop can stray into other vmas, so
added a vm_end test to prevent surprises; though I have not imagined
anything worse than a very contrived case, in which a page mlocked in
the next vma might be reclaimed because it is not mlocked in this vma.

Fixes: ace71a19ce ("mm: introduce page_vma_mapped_walk()")
Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1704031104400.1118@eggly.anvils
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-08 00:47:48 -07:00
Martin Brandenburg
cefdc26e86 orangefs: move features validation to fix filesystem hang
Without this fix (and another to the userspace component itself
described later), the kernel will be unable to process any OrangeFS
requests after the userspace component is restarted (due to a crash or
at the administrator's behest).

The bug here is that inside orangefs_remount, the orangefs_request_mutex
is locked.  When the userspace component restarts while the filesystem
is mounted, it sends an ORANGEFS_DEV_REMOUNT_ALL ioctl to the device,
which causes the kernel to send it a few requests aimed at synchronizing
the state between the two.  While this is happening the
orangefs_request_mutex is locked to prevent any other requests going
through.

This is only half of the bugfix.  The other half is in the userspace
component which outright ignores(!) requests made before it considers
the filesystem remounted, which is after the ioctl returns.  Of course
the ioctl doesn't return until after the userspace component responds to
the request it ignores.  The userspace component has been changed to
allow ORANGEFS_VFS_OP_FEATURES regardless of the mount status.

Mike Marshall says:
 "I've tested this patch against the fixed userspace part. This patch is
  real important, I hope it can make it into 4.11...

  Here's what happens when the userspace daemon is restarted, without
  the patch:

    =============================================
    [ INFO: possible recursive locking detected ]
    [   4.10.0-00007-ge98bdb3 #1 Not tainted    ]
    ---------------------------------------------
    pvfs2-client-co/29032 is trying to acquire lock:
     (orangefs_request_mutex){+.+.+.}, at: service_operation+0x3c7/0x7b0 [orangefs]
                  but task is already holding lock:
     (orangefs_request_mutex){+.+.+.}, at: dispatch_ioctl_command+0x1bf/0x330 [orangefs]

    CPU: 0 PID: 29032 Comm: pvfs2-client-co Not tainted 4.10.0-00007-ge98bdb3 #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.fc25 04/01/2014
    Call Trace:
     __lock_acquire+0x7eb/0x1290
     lock_acquire+0xe8/0x1d0
     mutex_lock_killable_nested+0x6f/0x6e0
     service_operation+0x3c7/0x7b0 [orangefs]
     orangefs_remount+0xea/0x150 [orangefs]
     dispatch_ioctl_command+0x227/0x330 [orangefs]
     orangefs_devreq_ioctl+0x29/0x70 [orangefs]
     do_vfs_ioctl+0xa3/0x6e0
     SyS_ioctl+0x79/0x90"

Signed-off-by: Martin Brandenburg <martin@omnibond.com>
Acked-by: Mike Marshall <hubcap@omnibond.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-07 13:41:22 -07:00
Linus Torvalds
c2eb7beac7 pci-v4.11-fixes-4

Merge tag 'pci-v4.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - fix ThunderX legacy firmware resources

 - fix ARTPEC-6 and DesignWare platform driver NULL pointer dereferences

 - fix HiSilicon link error

* tag 'pci-v4.11-fixes-4' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: dwc: Fix dw_pcie_ops NULL pointer dereference
  PCI: dwc: Select PCI_HOST_COMMON for hisi
  PCI: thunder-pem: Fix legacy firmware PEM-specific resources
2017-04-07 12:26:36 -07:00
Bart Van Assche
6d8c6c0f97 blk-mq: Restart a single queue if tag sets are shared
To improve scalability, if hardware queues are shared, restart
a single hardware queue in round-robin fashion. Rename
blk_mq_sched_restart_queues() to reflect the new semantics.
Remove blk_mq_sched_mark_restart_queue() because this function
has no callers. Remove flag QUEUE_FLAG_RESTART because this
patch removes the code that uses this flag.
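
The round-robin selection can be sketched as follows.  This is an
illustration only: restart_one(), the flag array and the "last" index
are hypothetical and do not reproduce the blk-mq data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: starting just after queue 'last', find the
 * first queue whose restart flag is set, clear the flag and return
 * its index.  Returns -1 if no queue needs a restart.
 */
int restart_one(bool *needs_restart, size_t nr_queues, size_t last)
{
	size_t i;

	for (i = 1; i <= nr_queues; i++) {
		size_t q = (last + i) % nr_queues;

		if (needs_restart[q]) {
			needs_restart[q] = false;
			return (int)q;
		}
	}
	return -1;
}
```

Starting the scan after the most recently served queue is what keeps the
choice fair: each call restarts at most one queue, and a queue near the
start of the array cannot be picked again before the others have had
their turn.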

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:40:09 -06:00
Bart Van Assche
6077c2d706 dm rq: Avoid that request processing stalls sporadically
While running the srp-test software I noticed that request
processing stalls sporadically at the beginning of a test, namely
when mkfs is run against a dm-mpath device. Every time that
happened, the following command was sufficient to resume request
processing:

    echo run >/sys/kernel/debug/block/dm-0/state

This patch prevents such request processing stalls. The
test I ran is as follows:

    while srp-test/run_tests -d -r 30 -t 02-mq; do :; done

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: dm-devel@redhat.com
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:27:10 -06:00
Bart Van Assche
36e3cf2739 scsi: Avoid that SCSI queues get stuck
If a .queue_rq() function returns BLK_MQ_RQ_QUEUE_BUSY then the block
driver that implements that function is responsible for rerunning the
hardware queue once requests can be queued again successfully.

Commit 52d7f1b5c2 ("blk-mq: Avoid that requeueing starts stopped
queues") removed the blk_mq_stop_hw_queue() call from scsi_queue_rq()
for the BLK_MQ_RQ_QUEUE_BUSY case. Hence change all calls to functions
that are intended to rerun a busy queue such that these examine all
hardware queues instead of only stopped queues.

Since no functions other than scsi_internal_device_block() and
scsi_internal_device_unblock() should ever stop or restart a SCSI
queue, change the blk_mq_delay_queue() call into a
blk_mq_delay_run_hw_queue() call.

Fixes: 52d7f1b5c2 ("blk-mq: Avoid that requeueing starts stopped queues")
Fixes: 7e79dadce2 ("blk-mq: stop hardware queue in blk_mq_delay_queue()")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:27:08 -06:00
Bart Van Assche
7587a5ae7e blk-mq: Introduce blk_mq_delay_run_hw_queue()
Introduce a function that runs a hardware queue unconditionally
after a delay. Note: there is already a function that stops and
restarts a hardware queue after a delay, namely blk_mq_delay_queue().

This function will be used in the next patch in this series.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Long Li <longli@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-07 12:27:06 -06:00
Linus Torvalds
81d4bab4ce - Two stable@ fixes for the verity target's FEC support
- A stable@ fix for raid target's raid1 support (when no bitmap is used)
 
 - A 4.11 cache metadata v2 format fix to properly test blocks are clean

Merge tag 'dm-4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - two stable fixes for the verity target's FEC support

 - a stable fix for raid target's raid1 support (when no bitmap is used)

 - a 4.11 cache metadata v2 format fix to properly test blocks are clean

* tag 'dm-4.11-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm verity fec: fix bufio leaks
  dm raid: fix NULL pointer dereference for raid1 without bitmap
  dm cache metadata: fix metadata2 format's blocks_are_clean_separate_dirty
  dm verity fec: limit error correction recursion
2017-04-07 10:47:20 -07:00
Linus Torvalds
dc25ad3fe1 arm64 fixes:
- Restore previous SIGBUS behaviour for unhandled unaligned user accesses
 
 - Revert broken support for the contiguous bit in hugetlb (again...)

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "We've got a regression fix for the signal raised when userspace makes
  an unsupported unaligned access and a revert of the contiguous
  (hugepte) support for hugetlb, which has once again been found to be
  broken. One day, maybe, we'll get it right.

  Summary:

   - restore previous SIGBUS behaviour for unhandled unaligned user
     accesses

   - revert broken support for the contiguous bit in hugetlb (again...)"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  Revert "Revert "arm64: hugetlb: partial revert of 66b3923a1a0f""
  arm64: mm: unaligned access by user-land should be received as SIGBUS
2017-04-07 10:43:22 -07:00
Linus Torvalds
4f0d14b0c9 metag/usercopy: Fault handling fixes
These patches fix a bunch of longstanding (some over a decade old) metag
 user copy fault handling bugs. Thanks go to Al Viro for spotting some of
 the questionable code in the first place.

Merge tag 'metag-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag

Pull metag usercopy fixes from James Hogan:
 "Metag usercopy fault handling fixes

  These patches fix a bunch of longstanding (some over a decade old)
  metag user copy fault handling bugs. Thanks go to Al Viro for spotting
  some of the questionable code in the first place"

* tag 'metag-for-v4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/metag:
  metag/usercopy: Add missing fixups
  metag/usercopy: Fix src fixup in from user rapf loops
  metag/usercopy: Set flags before ADDZ
  metag/usercopy: Zero rest of buffer from copy_from_user
  metag/usercopy: Add early abort to copy_to_user
  metag/usercopy: Fix alignment error checking
  metag/usercopy: Drop unused macros
2017-04-07 10:11:53 -07:00
Linus Torvalds
7ab661856b ACPI fix for v4.11-rc6
- Refine the check for the existence of _HID in find_child_checks()
    so that it doesn't trigger for device objects with device IDs
    made up by the kernel (Rafael Wysocki).

Merge tag 'acpi-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
 "This fixes a core device enumeration code change made in 4.10, in
  order to address a reported issue, that went too far.

  Specifics:

   - Refine the check for the existence of _HID in find_child_checks()
     so that it doesn't trigger for device objects with device IDs made
     up by the kernel (Rafael Wysocki)"

* tag 'acpi-4.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / scan: Prefer devices without _HID for _ADR matching
2017-04-07 10:01:45 -07:00
Linus Torvalds
50bdd7a0c9 fix for 4.11 rc6

Merge tag 'for-linus-4.11b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fix from Juergen Gross:
 "A fix for error path cleanup in the xenbus handler"

* tag 'for-linus-4.11b-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xenbus: remove transaction holder from list before freeing
2017-04-07 09:58:01 -07:00
Liping Zhang
5380e5644a sysctl: don't print negative flag for proc_douintvec
I saw some very confusing sysctl output on my system:
  # cat /proc/sys/net/core/xfrm_aevent_rseqth
  -2
  # cat /proc/sys/net/core/xfrm_aevent_etime
  -10
  # cat /proc/sys/net/ipv4/tcp_notsent_lowat
  -4294967295

This happens because we forget to set the *negp flag in
proc_douintvec, so it holds a garbage value.

Since the value handled by proc_douintvec is always an unsigned integer,
we can set *negp to false explicitly to fix this issue.
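
A simplified userspace model of the idea follows.  The function names
and signatures here are hypothetical and only imitate the shape of a
read-side conversion helper plus a formatter; they are not the kernel's
sysctl code.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical read-side conversion: copy an unsigned value out and
 * report its sign.  An unsigned value is never negative, so *negp is
 * always set to false instead of being left untouched.
 */
void uint_conv_read(const unsigned int *valp, bool *negp,
		    unsigned long *lvalp)
{
	*negp = false;
	*lvalp = (unsigned long)*valp;
}

/* Hypothetical formatter: prefix '-' only when the caller says so. */
int format_value(char *out, size_t len, unsigned long lval, bool neg)
{
	return snprintf(out, len, "%s%lu", neg ? "-" : "", lval);
}
```

If uint_conv_read() left *negp alone, whatever stale value the caller's
variable held would decide whether format_value() prints a minus sign,
which matches the confusing output shown earlier.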

Fixes: e7d316a02f ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-04-07 09:46:44 -07:00