Commit Graph

827076 Commits

Author SHA1 Message Date
Matthew Wilcox
15fab63e1e fs: prevent page refcount overflow in pipe_buf_get
Change pipe_buf_get() to return a bool indicating whether it succeeded
in raising the refcount of the page (if the thing in the pipe is a page).
This removes another mechanism for overflowing the page refcount.  All
callers converted to handle a failure.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14 10:00:04 -07:00
Linus Torvalds
8fde12ca79 mm: prevent get_user_pages() from overflowing page refcount
If the page refcount wraps around past zero, it will be freed while
there are still four billion references to it.  One of the possible
avenues for an attacker to try to make this happen is by doing direct IO
on a page multiple times.  This patch makes get_user_pages() refuse to
take a new page reference if there are already more than two billion
references to the page.

Reported-by: Jann Horn <jannh@google.com>
Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14 10:00:04 -07:00
Linus Torvalds
88b1a17dfc mm: add 'try_get_page()' helper function
This is the same as the traditional 'get_page()' function, but instead
of unconditionally incrementing the reference count of the page, it only
does so if the count was "safe".  It returns whether the reference count
was incremented (and is marked __must_check, since the caller obviously
has to be aware of it).

Also like 'get_page()', you can't use this function unless you already
had a reference to the page.  The intent is that you can use this
exactly like get_page(), but in situations where you want to limit the
maximum reference count.

The code currently does an unconditional WARN_ON_ONCE() if we ever hit
the reference count issues (either zero or negative), as a notification
that the conditional non-increment actually happened.

NOTE! The count access for the "safety" check is inherently racy, but
that doesn't matter since the buffer we use is basically half the range
of the reference count (ie we look at the sign of the count).

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14 10:00:04 -07:00
Linus Torvalds
f958d7b528 mm: make page ref count overflow check tighter and more explicit
We have a VM_BUG_ON() to check that the page reference count doesn't
underflow (or get close to overflow) by checking the sign of the count.

That's all fine, but we actually want to allow people to use a "get page
ref unless it's already very high" helper function, and we want that one
to use the sign of the page ref (without triggering this VM_BUG_ON).

Change the VM_BUG_ON to only check for small underflows (or _very_ close
to overflowing), and ignore overflows which have strayed into negative
territory.

Acked-by: Matthew Wilcox <willy@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-04-14 10:00:04 -07:00
Linus Torvalds
4443f8e6ac for-linus-20190412
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlyw354QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpiMyEAC4THUReCrTuv9oFRNg5uILVYIq51nP8dw7
 XamC7A92jPXd6vl/QVjmvLwT34/Y2XvX0t62RBsk849CEjgGYTeF1/qI3tMkpN7c
 huupab3aYM/Rrv4i1KSPQu6iIto3DYqfmREaGJJ1Ikbu/CKDuUGyEo+Z4wrKUPon
 GWnE8QMS2fdc764eVzKKqB+GryaEiHmeD1N4NnPs+nla14ysueUvJUikkTt/Laef
 h7nOmz9mrqE6u1xVHNpo0TlW0oJdLfaDIL9ghwHFJXqvriTh8Tg2tEHpXI6vSTTt
 StnPbTA1s1uhHs4rWYl8J0UXSZnRRp0Ep8jCvqEb9CJ23uHCNyGEoy/R7q+x2quf
 T+ruolMXY7IIJP30ZMHar374YfajJdw7EH/565nlbLnjSBXhqjmc07kQ7mIYvpg6
 JgureSdDwOOHpfrJgVq5es48ndt5HBYUBPzkvVGTgkeSJkMydkkM1qZeYEnai105
 8EnUFusRUnYZtb73HBPjKS7i0BZZvZlI1oKYHabiMtajqcKyvwDP2tTmhqXYLDLY
 9uloW0u2B0lddfzCb9hTYZOroNWfifo4vuSU5DHvnJoKvf4z3auDxaFD9N8fGn6S
 aZsRjMCpFqFd0YEnZPbsctgPg2Licrs02uPntlzBTJ0ByH20pX4OepYrvgQk3vao
 tOQ1jRYMKw==
 =cISy
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190412' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
 "Set of fixes that should go into this round. This pull is larger than
  I'd like at this time, but there's really no specific reason for that.
  Some are fixes for issues that went into this merge window, others are
  not. Anyway, this contains:

   - Hardware queue limiting for virtio-blk/scsi (Dongli)

   - Multi-page bvec fixes for lightnvm pblk

   - Multi-bio dio error fix (Jason)

   - Remove the cache hint from the io_uring tool side, since we didn't
     move forward with that (me)

   - Make io_uring SETUP_SQPOLL root restricted (me)

   - Fix leak of page in error handling for pc requests (Jérôme)

   - Fix BFQ regression introduced in this merge window (Paolo)

   - Fix break logic for bio segment iteration (Ming)

   - Fix NVMe cancel request error handling (Ming)

   - NVMe pull request with two fixes (Christoph):
       - fix the initial CSN for nvme-fc (James)
       - handle log page offsets properly in the target (Keith)"

* tag 'for-linus-20190412' of git://git.kernel.dk/linux-block:
  block: fix the return errno for direct IO
  nvmet: fix discover log page when offsets are used
  nvme-fc: correct csn initialization and increments on error
  block: do not leak memory in bio_copy_user_iov()
  lightnvm: pblk: fix crash in pblk_end_partial_read due to multipage bvecs
  nvme: cancel request synchronously
  blk-mq: introduce blk_mq_complete_request_sync()
  scsi: virtio_scsi: limit number of hw queues by nr_cpu_ids
  virtio-blk: limit number of hw queues by nr_cpu_ids
  block, bfq: fix use after free in bfq_bfqq_expire
  io_uring: restrict IORING_SETUP_SQPOLL to root
  tools/io_uring: remove IOCQE_FLAG_CACHEHIT
  block: don't use for-inside-for in bio_for_each_segment_all
2019-04-13 16:23:16 -07:00
Linus Torvalds
b60bc0665e NFS client bugfixes for Linux 5.1
Highlights include:
 
 Stable fixes:
 - Fix a deadlock in close() due to incorrect draining of RDMA queues
 
 Bugfixes:
 - Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
   as it is causing stack overflows
 - Fix a regression where NFSv4 getacl and fs_locations stopped working
 - Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
 - Fix xfstests failures due to incorrect copy_file_range() return values
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcsfeVAAoJEA4mA3inWBJcPjAQAIPERRVWjg7xRz6CJzt2yoM1
 ApPj965DCnC9bGcGAH2U+TbCWJOi3lJwaZOPTL0ut/Tcv9PpKETRqk+rrjUcFRy1
 1b1HH16GivprOmHgCRyqo5Qj2ZiaGNpY3tJfxl/6eIiSpHKPZLa4zY+q2KfK/YNI
 SOVyNU0Gq08p4AiKr3CG5VVZGdNgRMrnzBYJqeTh1zZ7erWE2nJoE+pmvcLhZR0w
 uxshbTWbJT21KLEI+PXTyGtFkz5jNaKy4Ts07MRBJdQjDv73MUW8CcqFZicSjtqx
 zdKYa1VH9pEOjFOs57xGELSnYRdB00Vgd9/b6MqKyWH8iJzXFbgjEusMWiU45aeF
 NLg9ySSU8LeY93SxV66CHG57NIgHqwZu6P+lO3efRzuHgEGceDsz0WwDF2KNIZlm
 /vOmbk0I+woneFUeNDWAXD9/ETUJ8RCNk1/b1UlbkUL7aD5WSLDp1bKPifk/WA6E
 Mtgwmqz1Vso3cIPglWcAgsfEAYJZSJVDMfRIhm2dy7vVU0nfW12I00G8BShgr8f7
 mxAxd/V+1/Q9ftPENgC9z5LWKYQjfjksnYRHXW1m5c92Yoe9TF0yiNyDmT5hBR6w
 MvUN2j3yeQBqk6JHZxtH/mmdSRD0o5kxvFrEqMj1PpP8X8DpWupQA8SZKnHq0wlj
 8Q7LRum+wmhbiKCmZ+1F
 =vRPB
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "Highlights include:

  Stable fix:

   - Fix a deadlock in close() due to incorrect draining of RDMA queues

  Bugfixes:

   - Revert "SUNRPC: Micro-optimise when the task is known not to be
     sleeping" as it is causing stack overflows

   - Fix a regression where NFSv4 getacl and fs_locations stopped
     working

   - Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.

   - Fix xfstests failures due to incorrect copy_file_range() return
     values"

* tag 'nfs-for-5.1-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  Revert "SUNRPC: Micro-optimise when the task is known not to be sleeping"
  NFSv4.1 fix incorrect return value in copy_file_range
  xprtrdma: Fix helper that drains the transport
  NFS: Fix handling of reply page vector
  NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
2019-04-13 14:47:06 -07:00
Linus Torvalds
87af0c3813 SCSI fixes on 20190413
One obvious fix for a ciostor data corruption on error bug.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXLGx4yYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXlDAQD41knG
 TLx+E1FCgYEMuq7SdQx6D1Z7l6ZSwBh1hntHdQD+KHAVafU6Kx2lTzfNw7FlCZZ5
 LBwX/4AxmatTzQI4jFg=
 =Fxkf
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
 "One obvious fix for a ciostor data corruption on error bug"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
2019-04-13 14:37:49 -07:00
Linus Torvalds
09bad0df39 Here's more than a handful of clk driver fixes for changes that came in
during the merge window:
 
  - Fix the AT91 sama5d2 programmable clk prescaler formula
 
  - A bunch of Amlogic meson clk driver fixes for the VPU clks
 
  - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark
    pmc clks as critical only when really needed
 
  - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
    implementation
 
  - Use the right structure to test for a frequency table in i.MX's
    PLL_1416x driver
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCAAvFiEE9L57QeeUxqYDyoaDrQKIl8bklSUFAlyxC/IRHHNib3lkQGtl
 cm5lbC5vcmcACgkQrQKIl8bklSWPTg//Q9CXbOYC64u2LEMtMKFtxS0UobjFKyMg
 EfRnHM3EuRKHCSPLtcr5bKQkFQYJ7Qx9A8oQm4v1d0KlQ2HyrOuAjfAkCaKweKSK
 iXpvWQMHcyRNPmPhzaDnuGBVXptOQ+kfwjWT4/nbkjW0bnFTwpvx9I5pdUd3UOJv
 IdnYOLKAF8Uwt2nyJd++Bh0UeBhQ1XIl9P46iZGa43nQsQhgSaru3oBnhVOzEti/
 k9Di3H1k1wIKR+xDujl/S3vIIEUcx0eGkL86sFdVq6nYwdQQZKusESC0vh5QJ/Ax
 LLSJcdoM8B84zStkYgIskdltdMZmsUUjLjjEbF5iq1my+LwQZ3JLWkY/gXMeF2Mu
 t5S/TVe5GwqKw2tmoQYkR2Qz76x7/DauZEdUcYtu+K9D2ye5aNDsNNCHlFkamN2N
 EJkBXDqpKGHkyOdUGmL+B0W6D1KxwJEREkCh0aIpbVci1PjfxvI6PLJBF907RkLx
 UNDF/flLoOMy+iUl0ZC05Ie06CkzJMf1e7mMaIIS/FfC7UJ4yNVEHyCADzyrCLOB
 XWwmwCea5NnIi3EQP91a7WO/Gr+yUWxfrQ3viNqM3KbPKOurofMp/JvDnu8bX31O
 l+yiRfpdjIaKUdyDLnTaq3UGBlBlFnqFOWkjRmmMzRZoBmwZhCN7H30LIlqnqnpQ
 wsvhawe24UY=
 =JS0o
 -----END PGP SIGNATURE-----

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk fixes from Stephen Boyd:
 "Here's more than a handful of clk driver fixes for changes that came
  in during the merge window:

   - Fix the AT91 sama5d2 programmable clk prescaler formula

   - A bunch of Amlogic meson clk driver fixes for the VPU clks

   - A DMI quirk for Intel's Bay Trail SoC's driver to properly mark pmc
     clks as critical only when really needed

   - Stop overwriting CLK_SET_RATE_PARENT flag in mediatek's clk gate
     implementation

   - Use the right structure to test for a frequency table in i.MX's
     PLL_1416x driver"

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: imx: Fix PLL_1416X not rounding rates
  clk: mediatek: fix clk-gate flag setting
  platform/x86: pmc_atom: Drop __initconst on dmi table
  clk: x86: Add system specific quirk to mark clocks as critical
  clk: meson: vid-pll-div: remove warning and return 0 on invalid config
  clk: meson: pll: fix rounding and setting a rate that matches precisely
  clk: meson-g12a: fix VPU clock parents
  clk: meson: g12a: fix VPU clock muxes mask
  clk: meson-gxbb: round the vdec dividers to closest
  clk: at91: fix programmable clock for sama5d2
2019-04-13 14:33:56 -07:00
Linus Torvalds
a3b8424862 pci-v5.1-fixes-2
-----BEGIN PGP SIGNATURE-----
 
 iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlywr7oUHGJoZWxnYWFz
 QGdvb2dsZS5jb20ACgkQWYigwDrT+vzpOQ//YQN/ml2NmNhFRXOlA1ZneWLZ7+Fn
 8lBaTJVnfahLCGQKoV95F0jmzoI2nvJljKjIJ5+6ttTJrZ22Pq+OSzi/n831nIXL
 l9rc2nEjTjSPna7XfxArE+pfo+yXUgBLEH9rcyIMLsDoIXqFkHY73YCBTIZdvEci
 J6dSHd7bCDKDbl/i5VYl8zRQZ9maeLE4Zz6azOHQZ2RjYJKQt/HlPzTEu2SHdVb3
 wRx0wiTiMZENZrg9uo+y5Jcap5YOFpWhmPMGhcB3HLAsQ1+y3Ln4hxJmakvtUHxA
 H1zAFx1PtA+KsBB0k4vPR/+YGIRuZ8eI3RtXRjZ+ZYCZK5jTMltPcvS2DON6du3N
 i3H+Aqnj/iB6Iyg6Z6iJfJqgz5EdBdIRV7s54RbWZN0UEeUS8/dUdygjZ+fQDgYQ
 O8htquxhq2BkdU5rGLSbb4MQICPINfysqjRd012ur2HgY6AuMkomZzZcJT5HG6pV
 OLmTl41IPhklrQ+HpUGUnECHz8OCy3Ospz5eHbnSrQ+fJ1MW3NaF3C4/JOYdCvl/
 5N6kZLZc8Hn75ZqisL/WYr2e0CXGF2NigIlI3tBlg5J8GtLxA1FVCcMPy5LE6bNn
 TtPbea8mupzmMr8wZ/qNxn0tAgdx/Q02ihEWF3ZIUuwFM6TayfGbW+TP6C7Sn7Oh
 eu7neYGdKts7Xr4=
 =DdoZ
 -----END PGP SIGNATURE-----

Merge tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:

 - Add a DMA alias quirk for another Marvell SATA device (Andre
   Przywara)

 - Fix a pciehp regression that broke safe removal of devices (Sergey
   Miroshnichenko)

* tag 'pci-v5.1-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  PCI: pciehp: Ignore Link State Changes after powering off a slot
  PCI: Add function 1 DMA alias quirk for Marvell 9170 SATA controller
2019-04-13 14:29:21 -07:00
Linus Torvalds
cf60528f8a powerpc fixes for 5.1 #5
A minor build fix for 64-bit FLATMEM configs.
 
 A fix for a boot failure on 32-bit powermacs.
 
 My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on 64-bit
 kernels, ie. compat mode, which is only used on big endian.
 
 The rewrite of the SLB code we merged in 4.20 missed the fact that the 0x380
 exception is also used with the Radix MMU to report out of range accesses. This
 could lead to an oops if userspace tried to read from addresses outside the user
 or kernel range.
 
 Thanks to:
   Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQIcBAABAgAGBQJcsVzhAAoJEFHr6jzI4aWAJuAP/2oLukNIIiF2UW/18xIXfvxR
 ZA9JljVqcKUHEUR4W+Y673xL4ZKtGGF79P+bzSvh8fUTMJ9cIN9mLO7eGGoDNqTn
 XhZX/jxJOh34tbHPYYbi9kYqWpZQKN4WuCjMQSPBCHOHMdx/0yn0wKgriOW1cuzG
 AQqDRHcRX4h1QT9o/hnsCAsdcnLEntdBBCTTHL1dZ8BucuUopjL+7cV0wf4qFIui
 e9SXOEl7yV03JGurmWcipE4mj9SrUioZJyHg6rJs70tlCUHFM24LQEFNIM4WczuF
 GoPfzXi5nNPrOzC3aF/v77hT5t4zD2sPRV2DuKABGsS+gfPoK8sIZC3mo7Vk5y+j
 gsbmkQSZt8/wVhRuAA0m0N6Aqg1J8NjhxoDfyM8kj0FzPe75D662VIgGSx15oMkl
 3olt/9uDyPetxuZ7tmmnFC8wkcmyaGpVurVz9xnqpt6c2r0KI+16R6Mk4OiT/e2p
 KNVBFkqRTp23ETpI8J9HUk9OtFIHqE9Zwzk2YOrX5yuLHByEwMq1T4qn2RuQsJqx
 RWPJagSalGLmM6dqDGe08gQl9rovkYKleGxNIAJuJB9rIxZQke86d2+S0eSUQbAW
 WWhP8SU0LJ5gmhEeZi5MntcuG+gcENwkz2UBK5nVDBVLFxGuBTPQATavW+w1bSi+
 SSEMXx8dNAOvsrqrZ97I
 =pfZc
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "A minor build fix for 64-bit FLATMEM configs.

  A fix for a boot failure on 32-bit powermacs.

  My commit to fix CLOCK_MONOTONIC across Y2038 broke the 32-bit VDSO on
  64-bit kernels, ie. compat mode, which is only used on big endian.

  The rewrite of the SLB code we merged in 4.20 missed the fact that the
  0x380 exception is also used with the Radix MMU to report out of range
  accesses. This could lead to an oops if userspace tried to read from
  addresses outside the user or kernel range.

  Thanks to: Aneesh Kumar K.V, Christophe Leroy, Larry Finger, Nicholas
  Piggin"

* tag 'powerpc-5.1-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/mm: Define MAX_PHYSMEM_BITS for all 64-bit configs
  powerpc/64s/radix: Fix radix segment exception handling
  powerpc/vdso32: fix CLOCK_MONOTONIC on PPC64
  powerpc/32: Fix early boot failure with RTAS built-in
2019-04-13 09:03:09 -07:00
Linus Torvalds
5ded88718a arm64 fixes for -rc5
- Fix stack unwinding so we ignore user stacks
 
 - Fix ftrace module PLT trampoline initialisation checks
 
 - Fix terminally broken implementation of FUTEX_WAKE_OP atomics
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAlywnLUACgkQt6xw3ITB
 YzSWLAgAtcvWXLbKCGgTsgFwkW0at9j1kwC0eyaLKXY1RQXCA+s2nYaaK1p8vXr0
 qhnKI2do2Jwef0kGEX2iS5PMZaGZv32woWNFd+VLzUimAMNAsSBBKpc7S76tovjo
 5UtFa5SlePy946hV8vAYdyfOemW+5+VfZ7Z5IqQyrF77SL+5Z4CmQxxsrRCpBKMy
 HvNlEzp+opnF0zLBSfcw3YMzN5iYpSK3yqQ2NzR5KjfEKuf9vwePMkgLik1AlT9b
 24ba/Q1g3QB58OqUiRbepR1yxK8sPBtsCaabdMFYeU/b6PZtvnvVnvpNS8a54/SG
 sTnosSSdPnRZT5HIJYcYwbWS11xaNg==
 =EZmf
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Will Deacon:
 "The main thing is a fix to our FUTEX_WAKE_OP implementation which was
  unbelievably broken, but did actually work for the one scenario that
  GLIBC used to use.

  Summary:

   - Fix stack unwinding so we ignore user stacks

   - Fix ftrace module PLT trampoline initialisation checks

   - Fix terminally broken implementation of FUTEX_WAKE_OP atomics"

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: futex: Fix FUTEX_WAKE_OP atomic ops with non-zero result value
  arm64: backtrace: Don't bother trying to unwind the userspace stack
  arm64/ftrace: fix inadvertent BUG() in trampoline check
2019-04-13 08:57:00 -07:00
Linus Torvalds
6d0a598489 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Fix typos in user-visible resctrl parameters, and also fix assembly
  constraint bugs that might result in miscompilation"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm: Use stricter assembly constraints in bitops
  x86/resctrl: Fix typos in the mba_sc mount option
2019-04-12 20:54:40 -07:00
Linus Torvalds
122c215bfa Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fix from Ingo Molnar:
 "Fix the alarm_timer_remaining() return value"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Return correct remaining time
2019-04-12 20:52:28 -07:00
Linus Torvalds
5e6f1fee60 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Ingo Molnar:
 "Fix a NULL pointer dereference crash in certain environments"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Do not re-read ->h_load_next during hierarchical load calculation
2019-04-12 20:50:43 -07:00
Linus Torvalds
73fdb2c908 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Six kernel side fixes: three related to NMI handling on AMD systems, a
  race fix, a kexec initialization fix and a PEBS sampling fix"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix perf_event_disable_inatomic() race
  x86/perf/amd: Remove need to check "running" bit in NMI handler
  x86/perf/amd: Resolve NMI latency issues for active PMCs
  x86/perf/amd: Resolve race condition when disabling PMC
  perf/x86/intel: Initialize TFA MSR
  perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
2019-04-12 20:42:30 -07:00
Linus Torvalds
26e2b81977 Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Ingo Molnar:
 "Fixes a crash when accessing /proc/lockdep"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Zap lock classes even with lock debugging disabled
2019-04-12 20:31:08 -07:00
Linus Torvalds
6a022984c3 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
 "Two genirq fixes, plus an irqchip driver error handling fix"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq: Respect IRQCHIP_SKIP_SET_WAKE in irq_chip_set_wake_parent()
  genirq: Initialize request_mutex if CONFIG_SPARSE_IRQ=n
  irqchip/irq-ls1x: Missing error code in ls1x_intc_of_init()
2019-04-12 20:21:59 -07:00
Linus Torvalds
54c63a7558 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core fixes from Ingo Molnar:
 "Fix an objtool warning plus fix a u64_to_user_ptr() macro expansion
  bug"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  objtool: Add rewind_stack_do_exit() to the noreturn list
  linux/kernel.h: Use parentheses around argument in u64_to_user_ptr()
2019-04-12 20:13:13 -07:00
David S. Miller
5fa7d3f9d3 Merge branch 'rhashtable-bit-locking-m68k'
NeilBrown says:

====================
Fix rhashtable bit-locking for m68k

As reported by Guenter Roeck, the new rhashtable bit-locking
doesn't work on m68k as it only requires 2-byte alignment, so BIT(1)
is addresses is not unused.

We current use BIT(0) to identify a NULLS marker, but that is only
needed in ->next pointers.  The bucket head does not need a NULLS
marker, so the lsb there can be used for locking.

the first 4 patches make some small improvements and re-arrange some
code.  The final patch converts to using only BIT(0) for these two
different special purposes.

I had previously suggested dropping the series until I fix it.  Given
that this was fairly easy, I retract that I think it best simply to
add these patches to fix the code.
====================

Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:34:46 -07:00
NeilBrown
ca0b709d1a rhashtable: use BIT(0) for locking.
As reported by Guenter Roeck, the new bit-locking using
BIT(1) doesn't work on the m68k architecture.  m68k only requires
2-byte alignment for words and longwords, so there is only one
unused bit in pointers to structs - We current use two, one for the
NULLS marker at the end of the linked list, and one for the bit-lock
in the head of the list.

The two uses don't need to conflict as we never need the head of the
list to be a NULLS marker - the marker is only needed to check if an
object has moved to a different table, and the bucket head cannot
move.  The NULLS marker is only needed in a ->next pointer.

As we already have different types for the bucket head pointer (struct
rhash_lock_head) and the ->next pointers (struct rhash_head), it is
fairly easy to treat the lsb differently in each.

So: Initialize buckets heads to NULL, and use the lsb for locking.
When loading the pointer from the bucket head, if it is NULL (ignoring
the lock big), report as being the expected NULLS marker.
When storing a value into a bucket head, if it is a NULLS marker,
store NULL instead.

And convert all places that used bit 1 for locking, to use bit 0.

Fixes: 8f0db01800 ("rhashtable: use bit_spin_locks to protect hash bucket.")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:34:45 -07:00
NeilBrown
f4712b46a5 rhashtable: replace rht_ptr_locked() with rht_assign_locked()
The only times rht_ptr_locked() is used, it is to store a new
value in a bucket-head.  This is the only time it makes sense
to use it too.  So replace it by a function which does the
whole task:  Sets the lock bit and assigns to a bucket head.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:34:45 -07:00
NeilBrown
adc6a3ab19 rhashtable: move dereference inside rht_ptr()
Rather than dereferencing a pointer to a bucket and then passing the
result to rht_ptr(), we now pass in the pointer and do the dereference
in rht_ptr().

This requires that we pass in the tbl and hash as well to support RCU
checks, and means that the various rht_for_each functions can expect a
pointer that can be dereferenced without further care.

There are two places where we dereference a bucket pointer
where there is no testable protection - in each case we know
that we much have exclusive access without having taken a lock.
The previous code used rht_dereference() to pretend that holding
the mutex provided protects, but holding the mutex never provides
protection for accessing buckets.

So instead introduce rht_ptr_exclusive() that can be used when
there is known to be exclusive access without holding any locks.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:34:45 -07:00
NeilBrown
c5783311a1 rhashtable: reorder some inline functions and macros.
This patch only moves some code around, it doesn't
change the code at all.
A subsequent patch will benefit from this as it needs
to add calls to functions which are now defined before the
call-site, but weren't before.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:34:45 -07:00
NeilBrown
e4edbe3c1f rhashtable: fix some __rcu annotation errors
With these annotations, the rhashtable now gets no
warnings when compiled with "C=1" for sparse checking.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:34:45 -07:00
Gustavo A. R. Silva
c252aa3e8e rhashtable: use struct_size() in kvzalloc()
One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along with
memory for some number of elements for that array.  For example:

struct foo {
    int stuff;
    struct boo entry[];
};

size = sizeof(struct foo) + count * sizeof(struct boo);
instance = kvzalloc(size, GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:31:33 -07:00
David S. Miller
9d60f0ea1c Merge branch 'nfp-update-to-control-structures'
Jakub Kicinski says:

====================
nfp: update to control structures

This series prepares NFP control structures for crypto offloads.
So far we mostly dealt with configuration requests under rtnl lock.
This will no longer be the case with crypto.  Additionally we will
try to reuse the BPF control message format, so we move common code
out of BPF.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:29:15 -07:00
Jakub Kicinski
bcf0cafab4 nfp: split out common control message handling code
BPF's control message handler seems like a good base to built
on for request-reply control messages.  Split it out to allow
for reuse.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:29:15 -07:00
Jakub Kicinski
0a72d8332c nfp: move vNIC reset before netdev init
During probe we clear vNIC configuration in case the device
wasn't closed cleanly by previous driver.  Move that code
before netdev init, so netdev init can already try to apply
its config parameters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:29:15 -07:00
Jakub Kicinski
dd5b2498d8 nfp: add a mutex lock for the vNIC ctrl BAR
Soon we will try to write to the vNIC mailbox without RTNL held.
Add a new mutex to protect access to specific parts of the PCI
control BAR.

Move the mailbox size checking to the mailbox lock() helper, where
it can be more effective (happen prior to potential overwrite of
other data).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:29:15 -07:00
Dirk van der Merwe
e64718282c nfp: opportunistically poll for reconfig result
If the reconfig was a quick update, we could have results available from
firmware within 200us.

Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:29:15 -07:00
Stephen Suryaputra
ed0de45a10 ipv4: recompile ip options in ipv4_link_failure
Recompile IP options since IPCB may not be valid anymore when
ipv4_link_failure is called from arp_error_report.

Refer to the commit 3da1ed7ac3 ("net: avoid use IPCB in cipso_v4_error")
and the commit before that (9ef6b42ad6) for a similar issue.

Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:23:46 -07:00
David Ahern
1deeb6408c ipv6: Remove flowi6_oif compare from __ip6_route_redirect
In the review of 0b34eb0043 ("ipv6: Refactor __ip6_route_redirect"),
Martin noted that the flowi6_oif compare is moved to the new helper and
should be removed from __ip6_route_redirect. Fix the oversight.

Fixes: 0b34eb0043 ("ipv6: Refactor __ip6_route_redirect")
Reported-by: Martin Lau <kafai@fb.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 17:05:38 -07:00
David S. Miller
9e550f0153 Merge branch 'rxrpc-fixes'
David Howells says:

====================
rxrpc: Fixes

Here is a collection of fixes for rxrpc:

 (1) rxrpc_error_report() needs to call sock_error() to clear the error
     code from the UDP transport socket, lest it be unexpectedly revisited
     on the next kernel_sendmsg() call.  This has been causing all sorts of
     weird effects in AFS as the effects have typically been felt by the
     wrong RxRPC call.

 (2) Allow a kernel user of AF_RXRPC to easily detect if an rxrpc call has
     completed.

 (3) Allow errors incurred by attempting to transmit data through the UDP
     socket to get back up the stack to AFS.

 (4) Make AFS use (2) to abort the synchronous-mode call waiting loop if
     the rxrpc-level call completed.

 (5) Add a missing tracepoint case for tracing abort reception.

 (6) Fix detection and handling of out-of-order ACKs.

====================

Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Jeffrey Altman
1a2391c30c rxrpc: Fix detection of out of order acks
The rxrpc packet serial number cannot be safely used to compute out of
order ack packets for several reasons:

 1. The allocation of serial numbers cannot be assumed to imply the order
    by which acks are populated and transmitted.  In some rxrpc
    implementations, delayed acks and ping acks are transmitted
    asynchronously to the receipt of data packets and so may be transmitted
    out of order.  As a result, they can race with idle acks.

 2. Serial numbers are allocated by the rxrpc connection and not the call
    and as such may wrap independently if multiple channels are in use.

In any case, what matters is whether the ack packet provides new
information relating to the bounds of the window (the firstPacket and
previousPacket in the ACK data).

Fix this by discarding packets that appear to wind back the window bounds
rather than on serial number procession.

Fixes: 298bc15b20 ("rxrpc: Only take the rwind and mtu values from latest ACK")
Signed-off-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
David Howells
39ce675575 rxrpc: Trace received connection aborts
Trace received calls that are aborted due to a connection abort, typically
because of authentication failure.  Without this, connection aborts don't
show up in the trace log.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
f7f1dd3162 afs: Check for rxrpc call completion in wait loop
Check the state of the rxrpc call backing an afs call in each iteration of
the call wait loop in case the rxrpc call has already been terminated at
the rxrpc layer.

Interrupt the wait loop and mark the afs call as complete if the rxrpc
layer call is complete.

There were cases where rxrpc errors were not passed up to afs, which could
result in this loop waiting forever for an afs call to transition to
AFS_CALL_COMPLETE while the rx call was already complete.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
8e8715aaa9 rxrpc: Allow errors to be returned from rxrpc_queue_packet()
Change rxrpc_queue_packet()'s signature so that it can return any error
code it may encounter when trying to send the packet.

This allows the caller to eventually do something in case of error - though
it should be noted that the packet has been queued and a resend is
scheduled.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
4611da30d6 rxrpc: Make rxrpc_kernel_check_life() indicate if call completed
Make rxrpc_kernel_check_life() pass back the life counter through the
argument list and return true if the call has not yet completed.

Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Marc Dionne
56d282d9db rxrpc: Clear socket error
When an ICMP or ICMPV6 error is received, the error will be attached
to the socket (sk_err) and the report function will get called.
Clear any pending error here by calling sock_error().

This would cause the following attempt to use the socket to fail with
the error code stored by the ICMP error, resulting in unexpected errors
with various side effects depending on the context.

Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Jonathan Billings <jsbillin@umich.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:57:23 -07:00
Colin Ian King
1dc2b3d655 qede: fix write to free'd pointer error and double free of ptp
The err2 error return path calls qede_ptp_disable that cleans up
on an error and frees ptp. After this, the free'd ptp is dereferenced
when ptp->clock is set to NULL and the code falls-through to error
path err1 that frees ptp again.

Fix this by calling qede_ptp_disable and exiting via an error
return path that does not set ptp->clock or kfree ptp.

Addresses-Coverity: ("Write to pointer after free")
Fixes: 035744975a ("qede: Add support for PTP resource locking.")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:55:47 -07:00
Colin Ian King
0a2c34f18c vxge: fix return of a free'd memblock on a failed dma mapping
Currently if a pci dma mapping failure is detected a free'd
memblock address is returned rather than a NULL (that indicates
an error). Fix this by ensuring NULL is returned on this error case.

Addresses-Coverity: ("Use after free")
Fixes: 528f727279 ("vxge: code cleanup and reorganization")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:54:41 -07:00
David S. Miller
8c5a3ca306 Merge branch 'netdevsim-Mostly-cleanup-in-sdev-bpf-iface-area'
Jiri Pirko says:

====================
netdevsim: Mostly cleanup in sdev/bpf iface area

This patches does mainly internal netdevsim code shuffle. Nothing
serious, just small changes to help readability and preparations for
future work.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:49:54 -07:00
Jiri Pirko
4b3a84bce4 netdevsim: move sdev-specific init/uninit code into separate functions
In order to improve readability and prepare for future code changes,
move sdev specific init/uninit code into separate functions.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:49:54 -07:00
Jiri Pirko
b26b6946a6 netdevsim: make bpf_offload_dev_create() per-sdev instead of first ns
offload dev is stored in sdev struct. However, first netdevsim instance
is used as a priv. Change this to be sdev to as it is shared among
multiple netdevsim instances.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:49:54 -07:00
Jiri Pirko
38f58c9723 netdevsim: move sdev specific bpf debugfs files to sdev dir
Some netdevsim bpf debugfs files are per-sdev, yet they are defined per
netdevsim instance. Move them under sdev directory.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:49:54 -07:00
Jiri Pirko
af9095f00d netdevsim: move shared dev creation and destruction into separate file
To make code easier to read, move shared dev bits into a separate file.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:49:54 -07:00
Ioana Ciornei
3c91d11483 Documentation: net: dsa: transition to the rst format
This patch also performs some minor adjustments such as numbering for
the receive path sequence, conversion of keywords to inline literals and
adding an index page so it looks better in the output of 'make htmldocs'.

Signed-off-by: Ioana Ciornei <ciorneiioana@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:48:35 -07:00
Julian Wiedmann
056b21fbe6 net: veth: use generic helper to report timestamping info
For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:26:37 -07:00
Julian Wiedmann
af730342ec net: loopback: use generic helper to report timestamping info
For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:26:37 -07:00
Julian Wiedmann
abe9fd5726 net: dummy: use generic helper to report timestamping info
For reporting the common set of SW timestamping capabilities, use
ethtool_op_get_ts_info() instead of re-implementing it.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-04-12 16:26:37 -07:00