Commit Graph

58355 Commits

Author SHA1 Message Date
Andreas Gruenbacher
ce895cf15a gfs2: Remove misleading comments in gfs2_evict_inode
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:14 +02:00
Bob Peterson
73118ca8ba gfs2: Replace gl_revokes with a GLF flag
The gl_revokes value determines how many outstanding revokes a glock has
on the superblock revokes list; this is used to avoid unnecessary log
flushes.  However, gl_revokes is only ever tested for being zero, and it's
only decremented in revoke_lo_after_commit, which removes all revokes
from the list, so we know that the gl_revoke values of all the glocks on
the list will reach zero.  Therefore, we can replace gl_revokes with a
bit flag. This saves an atomic counter in struct gfs2_glock.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:14 +02:00
Andreas Gruenbacher
9287c6452d gfs2: Fix occasional glock use-after-free
This patch has to do with the life cycle of glocks and buffers.  When
gfs2 metadata or journaled data is queued to be written, a gfs2_bufdata
object is assigned to track the buffer, and that is queued to various
lists, including the glock's gl_ail_list to indicate it's on the active
items list.  Once the page associated with the buffer has been written,
it is removed from the ail list, but its life isn't over until a revoke
has been successfully written.

So after the block is written, its bufdata object is moved from the
glock's gl_ail_list to a file-system-wide list of pending revokes,
sd_log_le_revoke.  At that point the glock still needs to track how many
revokes it contributed to that list (in gl_revokes) so that things like
glock go_sync can ensure all the metadata has been not only written, but
also revoked before the glock is granted to a different node.  This is
to guarantee journal replay doesn't replay the block once the glock has
been granted to another node.

Ross Lagerwall recently discovered a race in which an inode could be
evicted, and its glock freed after its ail list had been synced, but
while it still had unwritten revokes on the sd_log_le_revoke list.  The
evict decremented the glock reference count to zero, which allowed the
glock to be freed.  After the revoke was written, function
revoke_lo_after_commit tried to adjust the glock's gl_revokes counter
and clear its GLF_LFLUSH flag, at which time it referenced the freed
glock.

This patch fixes the problem by incrementing the glock reference count
in gfs2_add_revoke when the glock's first bufdata object is moved from
the glock to the global revokes list. Later, when the glock's last such
bufdata object is freed, the reference count is decremented. This
guarantees that whichever process finishes last (the revoke writing or
the evict) will properly free the glock, and neither will reference the
glock after it has been freed.

Reported-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
2019-05-07 23:39:14 +02:00
Bob Peterson
7c70b89695 gfs2: clean_journal improperly set sd_log_flush_head
This patch fixes regressions in 588bff95c9.
Due to that patch, function clean_journal was setting the value of
sd_log_flush_head, but that's only valid if it is replaying the node's
own journal. If it's replaying another node's journal, that's completely
wrong and will lead to multiple problems. This patch tries to clean up
the mess by passing the value of the logical journal block number into
gfs2_write_log_header so the function can treat non-owned journals
generically. For the local journal, the journal extent map is used for
best performance. For other nodes from other journals, new function
gfs2_lblk_to_dblk is called to figure it out using gfs2_iomap_get.

This patch also tries to establish more consistency when passing journal
block parameters by changing several unsigned int types to a consistent
u32.

Fixes: 588bff95c9 ("GFS2: Reduce code redundancy writing log headers")
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 23:39:04 +02:00
Ross Lagerwall
7881ef3f33 gfs2: Fix lru_count going negative
Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:

vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
    negative objects to delete nr=-1

This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
   decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
   lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
   incrementing lru_count.
5) The glock is moved to lru_list.
5) The original thread doesn't dispose it because it has been re-added
   to the lru list but the lru_count has still decreased by one.

Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 22:33:53 +02:00
Andreas Gruenbacher
71921ef859 gfs2: Fix loop in gfs2_rbm_find (v2)
Fix the resource group wrap-around logic in gfs2_rbm_find that commit
e579ed4f44 broke.  The bug can lead to unnecessary repeated scanning of the
same bitmaps; there is a risk that future changes will turn this into an
endless loop.

This is an updated version of commit 2d29f6b96d ("gfs2: Fix loop in
gfs2_rbm_find") which ended up being reverted because it introduced a
performance regression in iozone (see commit e74c98ca2d).  Changes since v1:

 - Simplify the wrap-around logic.

 - Handle the case where each resource group only has a single bitmap block
   (small filesystem).

 - Update rd_extfail_pt whenever we scan the entire bitmap, even when we don't
   start the scan at the very beginning of the bitmap.

Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-05-07 22:33:44 +02:00
Linus Torvalds
b4b52b881c Wimplicit-fallthrough patches for 5.2-rc1
Hi Linus,
 
 This is my very first pull-request.  I've been working full-time as
 a kernel developer for more than two years now. During this time I've
 been fixing bugs reported by Coverity all over the tree and, as part
 of my work, I'm also contributing to the KSPP. My work in the kernel
 community has been supervised by Greg KH and Kees Cook.
 
 OK. So, after the quick introduction above, please, pull the following
 patches that mark switch cases where we are expecting to fall through.
 These patches are part of the ongoing efforts to enable -Wimplicit-fallthrough.
 They have been ignored for a long time (most of them more than 3 months,
 even after pinging multiple times), which is the reason why I've created
 this tree. Most of them have been baking in linux-next for a whole development
 cycle. And with Stephen Rothwell's help, we've had linux-next nag-emails
 going out for newly introduced code that triggers -Wimplicit-fallthrough
 to avoid gaining more of these cases while we work to remove the ones
 that are already present.
 
 I'm happy to let you know that we are getting close to completing this
 work.  Currently, there are only 32 of 2311 of these cases left to be
 addressed in linux-next.  I'm auditing every case; I take a look into
 the code and analyze it in order to determine if I'm dealing with an
 actual bug or a false positive, as explained here:
 
 https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/
 
 While working on this, I've found and fixed the following missing
 break/return bugs, some of them introduced more than 5 years ago:
 
 84242b82d8
 7850b51b6c
 5e420fe635
 09186e5034
 b5be853181
 7264235ee7
 cc5034a5d2
 479826cc86
 5340f23df8
 df997abeeb
 2f10d82373
 307b00c5e6
 5d25ff7a54
 a7ed5b3e7d
 c24bfa8f21
 ad0eaee619
 9ba8376ce1
 dc586a60a1
 a8e9b186f1
 4e57562b48
 60747828ea
 c5b974bee9
 cc44ba9116
 2c930e3d0a
 
 Once this work is finish, we'll be able to universally enable
 "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
 entering the kernel again.
 
 Thanks
 
 Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAlzQR2IACgkQRwW0y0cG
 2zEJbQ//X930OcBtT/9DRW4XL1Jeq0Mjssz/GLX2Vpup5CwwcTROG65no80Zezf/
 yQRWnUjGX0OBv/fmUK32/nTxI/7k7NkmIXJHe0HiEF069GEENB7FT6tfDzIPjU8M
 qQkB8NsSUWJs3IH6BVynb/9MGE1VpGBDbYk7CBZRtRJT1RMM+3kQPucgiZMgUBPo
 Yd9zKwn4i/8tcOCli++EUdQ29ukMoY2R3qpK4LftdX9sXLKZBWNwQbiCwSkjnvJK
 I6FDiA7RaWH2wWGlL7BpN5RrvAXp3z8QN/JZnivIGt4ijtAyxFUL/9KOEgQpBQN2
 6TBRhfTQFM73NCyzLgGLNzvd8awem1rKGSBBUvevaPbgesgM+Of65wmmTQRhFNCt
 A7+e286X1GiK3aNcjUKrByKWm7x590EWmDzmpmICxNPdt5DHQ6EkmvBdNjnxCMrO
 aGA24l78tBN09qN45LR7wtHYuuyR0Jt9bCmeQZmz7+x3ICDHi/+Gw7XPN/eM9+T6
 lZbbINiYUyZVxOqwzkYDCsdv9+kUvu3e4rPs20NERWRpV8FEvBIyMjXAg6NAMTue
 K+ikkyMBxCvyw+NMimHJwtD7ho4FkLPcoeXb2ZGJTRHixiZAEtF1RaQ7dA05Q/SL
 gbSc0DgLZeHlLBT+BSWC2Z8SDnoIhQFXW49OmuACwCUC68NHKps=
 =k30z
 -----END PGP SIGNATURE-----

Merge tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux

Pull Wimplicit-fallthrough updates from Gustavo A. R. Silva:
 "Mark switch cases where we are expecting to fall through.

  This is part of the ongoing efforts to enable -Wimplicit-fallthrough.

  Most of them have been baking in linux-next for a whole development
  cycle. And with Stephen Rothwell's help, we've had linux-next
  nag-emails going out for newly introduced code that triggers
  -Wimplicit-fallthrough to avoid gaining more of these cases while we
  work to remove the ones that are already present.

  We are getting close to completing this work. Currently, there are
  only 32 of 2311 of these cases left to be addressed in linux-next. I'm
  auditing every case; I take a look into the code and analyze it in
  order to determine if I'm dealing with an actual bug or a false
  positive, as explained here:

      https://lore.kernel.org/lkml/c2fad584-1705-a5f2-d63c-824e9b96cf50@embeddedor.com/

  While working on this, I've found and fixed the several missing
  break/return bugs, some of them introduced more than 5 years ago.

  Once this work is finished, we'll be able to universally enable
  "-Wimplicit-fallthrough" to avoid any of these kinds of bugs from
  entering the kernel again"

* tag 'Wimplicit-fallthrough-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (27 commits)
  memstick: mark expected switch fall-throughs
  drm/nouveau/nvkm: mark expected switch fall-throughs
  NFC: st21nfca: Fix fall-through warnings
  NFC: pn533: mark expected switch fall-throughs
  block: Mark expected switch fall-throughs
  ASN.1: mark expected switch fall-through
  lib/cmdline.c: mark expected switch fall-throughs
  lib: zstd: Mark expected switch fall-throughs
  scsi: sym53c8xx_2: sym_nvram: Mark expected switch fall-through
  scsi: sym53c8xx_2: sym_hipd: mark expected switch fall-throughs
  scsi: ppa: mark expected switch fall-through
  scsi: osst: mark expected switch fall-throughs
  scsi: lpfc: lpfc_scsi: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_nvme: Mark expected switch fall-through
  scsi: lpfc: lpfc_nportdisc: Mark expected switch fall-through
  scsi: lpfc: lpfc_hbadisc: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_els: Mark expected switch fall-throughs
  scsi: lpfc: lpfc_ct: Mark expected switch fall-throughs
  scsi: imm: mark expected switch fall-throughs
  scsi: csiostor: csio_wr: mark expected switch fall-through
  ...
2019-05-07 12:48:10 -07:00
Linus Torvalds
eac7078a0f pidfd patches for v5.2-rc1
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE7btrcuORLb1XUhEwjrBW1T7ssS0FAlzReuoACgkQjrBW1T7s
 sS1uvBAA16pgnhRNxNTrp3LYft6lUWmF4n0baOTVtQNLhPjpwaOxHIrCBugkQCJB
 QcQ9IQSOvIkaEW0XAQoPBaeLviiKhHOFw1Fv89OtW6xUidSfSV15lcI9f1F2pCm2
 4yCL/8XvL6M0NhxiwftJAkWOXeDNLfjFnLwyLxBfgg3EeyqMgUB8raeosEID0ORR
 gm2/g8DYS2r+KNqM/F4xvMSgabfi2bGk+8BtAaVnftJfstpRNrqKwWnSK3Wspj1l
 5gkb8gSsiY6ns3V6RgNHrFlhevFg8V+VjcJt7FR+aUEjOkcoiXas/PhvamMzdsn/
 FM1F/A0pM8FSybIUClhnnnxNPc+p8ZN/71YQAPs+Mnh3xvbtKea2lkhC+Xv4OpK3
 edutSZWFaiIery82Rk00H3vqiSF1+kRIXSpZSS4mElk4FsVljkyH+nSP7rbmE2MR
 EQe+kKnZl8QzWrVbnODC+EVvvVpA2bXDvENJmvKqus+t2G0OdV7Iku3F5E3KjF8k
 S5RRV1zuBF3ugqnjmYrVmJtpEA8mxClmqvg6okru+qW6ngO5oOgVpPLjWn1CXcdj
 wcuQ6Pe1QwAHS54e9WSWgCHVssLvm9nCdCqypdNaoyGWmbTWntwlrY7Y0JUQnAbB
 6/G/DQQiCWY9y8bMZlTEydhIpgcsdROuPYv+oHF5+eQQthsWwHc=
 =LH11
 -----END PGP SIGNATURE-----

Merge tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux

Pull pidfd updates from Christian Brauner:
 "This patchset makes it possible to retrieve pidfds at process creation
  time by introducing the new flag CLONE_PIDFD to the clone() system
  call. Linus originally suggested to implement this as a new flag to
  clone() instead of making it a separate system call.

  After a thorough review from Oleg CLONE_PIDFD returns pidfds in the
  parent_tidptr argument. This means we can give back the associated pid
  and the pidfd at the same time. Access to process metadata information
  thus becomes rather trivial.

  As has been agreed, CLONE_PIDFD creates file descriptors based on
  anonymous inodes similar to the new mount api. They are made
  unconditional by this patchset as they are now needed by core kernel
  code (vfs, pidfd) even more than they already were before (timerfd,
  signalfd, io_uring, epoll etc.). The core patchset is rather small.
  The bulky looking changelist is caused by David's very simple changes
  to Kconfig to make anon inodes unconditional.

  A pidfd comes with additional information in fdinfo if the kernel
  supports procfs. The fdinfo file contains the pid of the process in
  the callers pid namespace in the same format as the procfs status
  file, i.e. "Pid:\t%d".

  To remove worries about missing metadata access this patchset comes
  with a sample/test program that illustrates how a combination of
  CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free
  access to process metadata through /proc/<pid>.

  Further work based on this patchset has been done by Joel. His work
  makes pidfds pollable. It finished too late for this merge window. I
  would prefer to have it sitting in linux-next for a while and send it
  for inclusion during the 5.3 merge window"

* tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
  samples: show race-free pidfd metadata access
  signal: support CLONE_PIDFD with pidfd_send_signal
  clone: add CLONE_PIDFD
  Make anon_inodes unconditional
2019-05-07 12:30:24 -07:00
Linus Torvalds
41bc10cabe stream_open related patches for Linux 5.2
https://lore.kernel.org/linux-fsdevel/CAHk-=wg1tFzcaX2v9Z91vPJiBR486ddW5MtgDL02-fOen2F0Aw@mail.gmail.com/T/#m5b2d9ad3aeacea4bd6aa1964468ac074bf3aa5bf
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEECVWwJCUO7/z+QjZbZsp4hBP2dUkFAlzR1UgQHGtpcnJAbmV4
 ZWRpLmNvbQAKCRBmyniEE/Z1SZBiEACGw1LzUmjV9eBYFjqaUkgX/Zfcu42D4Ek2
 8MuWnNdRabtpGQq0LccYlfoL3yH5xECp14IkCgJvkjqoZ3CcqWcv6uDxf0WtnUqZ
 wPx1RYZykb4RZj2A6/ndhInReP4AlXICyTVulKb+BquVkemMvmXX8k+bkr/msKfT
 9jdKWFIn+ANNABt3y2D7ywZvs9mkxIx+Fti+tVV4BFBeGfUuj4ArZBOHnngRnIk/
 XYlQ7FVzENSPSB+3GvL34jTGEzo8suPHKhHQlIhtcd5hwzVRZKE2sdVXsCc6/WbY
 YnT32gmT1/+cUuDl1mZSiQY5R4Xkb07k6/jNrdmjQpwmWbZu90cuRhb+JBXwnmjZ
 2Wgy3sfwYISDxtePukg1iYePlHlVlGTYqMo3AQrTBs/gEwCKWrsKQb98mRxlf1YK
 e2mdtmq6upYoorLFQesfRgrCg4GTBiPkrR3amXsFgJ2O5fhV6R98ZdGSv4kip19f
 ZNoc/t1EtKGwyAJwjINduv36E3RSHODWwSPtSnmSS1ieCGToY1SI3bVUkFM4C0tO
 5GMdSugHgXRGGVbTd/VftndJm6Wtj8b1j8c/1Vh04Q8qbKKJDRTDzAbK1v8oLaDh
 UXAKMIc8uY4caZy3/bTAB2Ou9dibrSi8Oc+LwZqJlwIcbkwn/IGNvmwtWv4ehorE
 N7EhCFZsFQ==
 =Mavg
 -----END PGP SIGNATURE-----

Merge tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux

Pull stream_open conversion from Kirill Smelkov:

 - remove unnecessary double nonseekable_open from drivers/char/dtlk.c
   as noticed by Pavel Machek while reviewing nonseekable_open ->
   stream_open mass conversion.

 - the mass conversion patch promised in commit 10dce8af34 ("fs:
   stream_open - opener for stream-like files so that read and write can
   run simultaneously without deadlock") and is automatically generated
   by running

        $ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci

   I've verified each generated change manually - that it is correct to
   convert - and each other nonseekable_open instance left - that it is
   either not correct to convert there, or that it is not converted due
   to current stream_open.cocci limitations. More details on this in the
   patch.

 - finally, change VFS to pass ppos=NULL into .read/.write for files
   that declare themselves streams. It was suggested by Rasmus Villemoes
   and makes sure that if ppos starts to be erroneously used in a stream
   file, such bug won't go unnoticed and will produce an oops instead of
   creating illusion of position change being taken into account.

   Note: this patch does not conflict with "fuse: Add FOPEN_STREAM to
   use stream_open()" that will be hopefully coming via FUSE tree,
   because fs/fuse/ uses new-style .read_iter/.write_iter, and for these
   accessors position is still passed as non-pointer kiocb.ki_pos .

* tag 'stream_open-5.2' of https://lab.nexedi.com/kirr/linux:
  vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
  *: convert stream-like files from nonseekable_open -> stream_open
  dtlk: remove double call to nonseekable_open
2019-05-07 12:15:13 -07:00
Linus Torvalds
aa26690fab Changes for Linux 5.2:
- Fix some more buffer deadlocks when performing an unmount after a hard
   shutdown.
 - Fix some minor space accounting issues.
 - Fix some use after free problems.
 - Make the (undocumented) FITRIM behavior consistent with other filesystems.
 - Embiggen the xfs geometry ioctl's data structure.
 - Introduce a new AG geometry ioctl.
 - Introduce a new online health reporting infrastructure and ioctl for
   userspace to query a filesystem's health status.
 - Enhance online scrub and repair to update the health reports.
 - Reduce thundering herd problems when writeback io completes.
 - Fix some transaction reservation type errors.
 - Fix integer overflow problems with delayed alloc reservation counters.
 - Fix some problems where we would exit to userspace without unlocking.
 - Fix inconsistent behavior when finishing deferred ops fails.
 - Strengthen scrub to check incore data against ondisk metadata.
 - Remove long-broken mntpt mount option.
 - Add an online scrub function for the filesystem summary counters,
   which should make online metadata scrub more or less feature complete
   for now.
 - Various cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlzMUEwACgkQ+H93GTRK
 tOvvHw//bou5YL/gMrsxCbg2b7rpsNG3TIOz5Kq52V3JMtyqFWzArpCBEskVZXD0
 J4ZXZMv/VSvI2DgV22w8/vlsjPJiODPu5mIqmcyQmZK8eDg4sL7EKVa601F57kyj
 QPrdT1AnxUl+n1gM4XrV57xmsNwLYMVHKQC9e5MS6LSu7+1sarw7HFxSY1AMG9Ys
 skEvwe762LCbMsnBBLCs/ZeqHWlqDok9HiKZNCj35aRrLV9dA97mjznlBFUJbhbw
 kAG2jeAaG0LQnlnCzPRd3HJqQXlGL4044gx73RRY3+/POVYupKiC9KSlImq/cbLd
 n7UWHVDieWoOuLKverUICw4UuqtkAXurUCW7w91ipEmZUlYKNrMNNXiEm7pfgJU3
 A2KK1R14UYKJ3zX6xPz4mdlYhh0KB/xlN01Rdzhrhk9XKfL92/YyjpyjcTIeUZm8
 RNKLAoWRpJZPou3RPpfZLFTSmtYIcTB92kYV6XpQ3DRrJLjlaHbu9VJaipbZGhxY
 rdF+Rtk8EjKMFP0bixDHePWCu7317vMy1lbpO5UipxyC9eTwry54EaCxP03CI7YO
 OAsqCdf8HYlGqEjWKprkCczMYkDRDT0p4bS27Rdzc1D5lUj6/g5hF7RD0MYc1/eA
 ZDQUqVgBTmAQp+tKPTHhuSTWyZ8IIt0kdg5Z8IRVWxd+SmwwGoo=
 =d2sO
 -----END PGP SIGNATURE-----

Merge tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs updates from Darrick Wong:
 "Here's a big pile of new stuff for XFS for 5.2. XFS has grown the
  ability to report metadata health status to userspace after online
  fsck checks the filesystem. The online metadata checking code is (I
  really hope) feature complete with the addition of checks for the
  global fs counters, though it'll remain EXPERIMENTAL for now.

  There are also fixes for thundering herds of writeback completions and
  some other deadlocks, fixes for theoretical integer overflow attacks
  on space accounting, and removal of the long-defunct 'mntpt' option
  which was deprecated in the mid-2000s and (it turns out) totally
  broken since 2011 (and nobody complained...).

  Summary:

   - Fix some more buffer deadlocks when performing an unmount after a
     hard shutdown.

   - Fix some minor space accounting issues.

   - Fix some use after free problems.

   - Make the (undocumented) FITRIM behavior consistent with other
     filesystems.

   - Embiggen the xfs geometry ioctl's data structure.

   - Introduce a new AG geometry ioctl.

   - Introduce a new online health reporting infrastructure and ioctl
     for userspace to query a filesystem's health status.

   - Enhance online scrub and repair to update the health reports.

   - Reduce thundering herd problems when writeback io completes.

   - Fix some transaction reservation type errors.

   - Fix integer overflow problems with delayed alloc reservation
     counters.

   - Fix some problems where we would exit to userspace without
     unlocking.

   - Fix inconsistent behavior when finishing deferred ops fails.

   - Strengthen scrub to check incore data against ondisk metadata.

   - Remove long-broken mntpt mount option.

   - Add an online scrub function for the filesystem summary counters,
     which should make online metadata scrub more or less feature
     complete for now.

   - Various cleanups"

* tag 'xfs-5.2-merge-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (38 commits)
  xfs: change some error-less functions to void types
  xfs: add online scrub for superblock counters
  xfs: don't parse the mtpt mount option
  xfs: always rejoin held resources during defer roll
  xfs: add missing error check in xfs_prepare_shift()
  xfs: scrub should check incore counters against ondisk headers
  xfs: allow scrubbers to pause background reclaim
  xfs: rename the speculative block allocation reclaim toggle functions
  xfs: track delayed allocation reservations across the filesystem
  xfs: fix broken bhold behavior in xrep_roll_ag_trans
  xfs: unlock inode when xfs_ioctl_setattr_get_trans can't get transaction
  xfs: kill the xfs_dqtrx_t typedef
  xfs: widen inode delalloc block counter to 64-bits
  xfs: widen quota block counters to 64-bit integers
  xfs: abort unaligned nowait directio early
  xfs: assert that we don't enter agfl freeing with a non-permanent transaction
  xfs: make tr_growdata a permanent transaction
  xfs: merge adjacent io completions of the same type
  xfs: remove unused m_data_workqueue
  xfs: implement per-inode writeback completion queues
  ...
2019-05-07 11:46:56 -07:00
Linus Torvalds
d8456eaf31 Changes for Linux 5.2:
- Add some extra hooks to the iomap buffered write path to enable gfs2
   journalled writes.
 - SPDX conversion
 - Various refactoring.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEUzaAxoMeQq6m2jMV+H93GTRKtOsFAlzMUJsACgkQ+H93GTRK
 tOsyHxAAjnAAO2ABOt2x9fdsZbuc/3Ox1C0388J21uUOm6lgtKCFm/snVmvC7BMa
 t9bFOS8Y7RLgHclCkEHy0irsHVVuQl+6XyYrjFaPzkoRnVgViZM5aZGSNkBRiBEM
 xVAog5IFLTx59NT41B4pn9y361BFwfHiFRsDgtSVNlv8UsbKdpAMBMX9ezjNLgWI
 H5qJZXfzk5LyNG/jsOe+srwVXsboILvPAiDNP95g2KzrXZMvnf8MsMvAe9cSO9SD
 ERHn9nX5b4hiwiL12lCl10QOsROmElzP82GHJBctFDzdfOfSRuRZw69lFSzf/2CT
 xVypJBm7xVBJ7K50x8KlF1aSLqnxHi/wszS6BaowoMtkPbJRx+FC7M8FCnNr5WtF
 DxJduFBUchbNKt1o2x98Evoqjx6eVp92XsCdjsJ05LQo7cxlwECfYhjruwyg/h16
 qdE+6KmUwOOiqMQ6Z8kvrejpuIq2rcHlJydDojN+lIbbzmtge8ob/q/A1J8FT6k9
 pzVW3y1h7yvgi0ClaQu2DCfx2is2Bd4y0w1b/Y/0jkV9aVbtPqt0akqcaLtAUgc9
 25CkJL0sc7QB88Kd3sP9k4aGlQ2TAx52+3TWDo+CBbfPHBTMDWnGB1nE742WLO4v
 neH+wSzLP/6U9JkxyRpiYHD+6zLAzq2xTZeiRSXYuzrRqVxaurY=
 =Qumg
 -----END PGP SIGNATURE-----

Merge tag 'iomap-5.2-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull iomap updates from Darrick Wong:
 "Nothing particularly exciting here, just adding some callouts for gfs2
  and cleaning a few things.

  Summary:

   - Add some extra hooks to the iomap buffered write path to enable
     gfs2 journalled writes

   - SPDX conversion

   - Various refactoring"

* tag 'iomap-5.2-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: move iomap_read_inline_data around
  iomap: Add a page_prepare callback
  iomap: Fix use-after-free error in page_done callback
  fs: Turn __generic_write_end into a void function
  iomap: Clean up __generic_write_end calling
  iomap: convert to SPDX identifier
2019-05-07 11:43:32 -07:00
Linus Torvalds
b8cac3cd24 Several minor jfs fixes
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEIodevzQLVs53l6BhNqiEXrVAjGQFAlzQUCsACgkQNqiEXrVA
 jGTXtA/9GJHwg22NvyVxGWF4GLLamFPqgiyaj/XhNw2+2BkK4I60uIPI9QucKG8C
 RWMhMI3ZZH6dxiGitPQ4hLncEVcTTcHNEwhGcgT85M4tOs+g7mVf03+X0xrveXVB
 GMFdf2ETWE80KkIUHaITAHBm/WU7FZG81RQg8IYr8aIqxg3Ey5sowU0vVg54sn61
 jNV2h/UFa4VnPX4o+5GbKZ8gZylBoYDLV9WPlD38BRa10eZDjVcDeASBefvTqQO0
 n/jBzqkWGMPqj2juKXL1MX2Zr+LnUL9An63Ak7EX95slVjiMmncffVJfyY/Sewoa
 hTGIck19NABJsyO83BJqrF0C/c6QbgeRkZS8yZjIVZYF7RbyjQD18TqCZFJMoVpV
 xyZ2dUg5DE2nFSLzIXE1JprPuOgNkY87vwO/PAO1jka7LPk2qQZUKGL3+Xw4At6v
 XNohTbQKhfqYgeMjr2H9aAw53SnAXiHwCaje+mK26WZsT3KqgN4BbtWKEBcYrSYz
 J1ivB7mpNhYwbizUWFZT5znA/ItqwN8YbCazGlxYFOhXgFWUbUXFds250w8F6Iy7
 jp3RkEB8HXuFoFtyVjLTYnoM87l37fTQSTrkl5k5CsQ3kges7v6nxGSaZUHE1Iho
 MQZr151zKQrcmKwjKGxVFMKFTU94x8TYALqYO6iDbmu8rra7nWw=
 =M5Gm
 -----END PGP SIGNATURE-----

Merge tag 'jfs-5.2' of git://github.com/kleikamp/linux-shaggy

Pull jfs updates from Dave Kleikamp:
 "Several minor jfs fixes"

* tag 'jfs-5.2' of git://github.com/kleikamp/linux-shaggy:
  jfs: fix bogus variable self-initialization
  fs/jfs: Switch to use new generic UUID API
  jfs: compare old and new mode before setting update_mode flag
  jfs: remove incorrect comment in jfs_superblock
  jfs: fix spelling mistake, EACCESS -> EACCES
2019-05-07 11:37:27 -07:00
Linus Torvalds
9f2e3a53f7 for-5.2-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlzQM7MACgkQxWXV+ddt
 WDvrVw/+K0AElSuEfDFWd9HBqRAPlGaEP71xCGGle1tkzuY0DJVIBRZ72q8UR0YP
 7yke7DU0oqXekGype83eTJUjDSLoOXrlVoQ+VqBdFteDk0W4BCG6Nw+N+wYBF7An
 gXRXlGFaYzb2CqqjG92FbtkfxBzISR0XBCQBUN9CBqHNDu1EUQSbnTBkmTMN8MYh
 PCoo37S6e5fR36uB/rOKbGNBJjsZEEg/2G6DprP52+eiQWV2h0avEUJrvv6xC4so
 97QNgUNuuiUmyurqcYHdlaflZwIhuf5nQeNeu/UvMZmmRnBHPhSP7YPM7f7FftwA
 y0d0p+AiEAO0he8nGFb5C6Avs4vuv1u65o1NbF5fqnmAyt+KXWem3LeG6etsXgU8
 +eITgprJD3sNBMDLbLoA+wlhTps+w9tukVF5Zp2a8KgQLMMEyAYqUDWmSHvnO2Me
 RCNPZLzeGXETgKun0WuMtl/CX2iBDnc0Kq5O6ks2ORl2TH6bg5lgEIwr6HP/Ewoy
 w8twsmCOltrxiIptqyQHYD+kvNwqMVV9LSOQ8+EjbYd6BHsfjHjKObOBkhmJ7iqz
 4MAIcZU++F9DLRv92H1kUYVNhAMCdXkEIWyxhZPwN1lUi5k9AhknY3FbheNc7ldl
 LNPIgRxamWCq9oBmzfOcJ3eFOBtNN02fgA1GTXGd1/AgAilEep8=
 =fEkD
 -----END PGP SIGNATURE-----

Merge tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs updates from David Sterba:
 "This time the majority of changes are cleanups, though there's still a
  number of changes of user interest.

  User visible changes:

   - better read time and write checks to catch errors early and before
     writing data to disk (to catch potential memory corruption on data
     that get checksummed)

   - qgroups + metadata relocation: last speed up patch int the series
     to address the slowness, there should be no overhead comparing
     balance with and without qgroups

   - FIEMAP ioctl does not start a transaction unnecessarily, this can
     result in a speed up and less blocking due to IO

   - LOGICAL_INO (v1, v2) does not start transaction unnecessarily, this
     can speed up the mentioned ioctl and scrub as well

   - fsync on files with many (but not too many) hardlinks is faster,
     finer decision if the links should be fsynced individually or
     completely

   - send tries harder to find ranges to clone

   - trim/discard will skip unallocated chunks that haven't been touched
     since the last mount

  Fixes:

   - send flushes delayed allocation before start, otherwise it could
     miss some changes in case of a very recent rw->ro switch of a
     subvolume

   - fix fallocate with qgroups that could lead to space accounting
     underflow, reported as a warning

   - trim/discard ioctl honours the requested range

   - starting send and dedupe on a subvolume at the same time will let
     only one of them succeed, this is to prevent changes that send
     could miss due to dedupe; both operations are restartable

  Core changes:

   - more tree-checker validations, errors reported by fuzzing tools:
      - device item
      - inode item
      - block group profiles

   - tracepoints for extent buffer locking

   - async cow preallocates memory to avoid errors happening too deep in
     the call chain

   - metadata reservations for delalloc reworked to better adapt in
     many-writers/low-space scenarios

   - improved space flushing logic for intense DIO vs buffered workloads

   - lots of cleanups
      - removed unused struct members
      - redundant argument removal
      - properties and xattrs
      - extent buffer locking
      - selftests
      - use common file type conversions
      - many-argument functions reduction"

* tag 'for-5.2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (227 commits)
  btrfs: Use kvmalloc for allocating compressed path context
  btrfs: Factor out common extent locking code in submit_compressed_extents
  btrfs: Set io_tree only once in submit_compressed_extents
  btrfs: Replace clear_extent_bit with unlock_extent
  btrfs: Make compress_file_range take only struct async_chunk
  btrfs: Remove fs_info from struct async_chunk
  btrfs: Rename async_cow to async_chunk
  btrfs: Preallocate chunks in cow_file_range_async
  btrfs: reserve delalloc metadata differently
  btrfs: track DIO bytes in flight
  btrfs: merge calls of btrfs_setxattr and btrfs_setxattr_trans in btrfs_set_prop
  btrfs: delete unused function btrfs_set_prop_trans
  btrfs: start transaction in xattr_handler_set_prop
  btrfs: drop local copy of inode i_mode
  btrfs: drop old_fsflags in btrfs_ioctl_setflags
  btrfs: modify local copy of btrfs_inode flags
  btrfs: drop useless inode i_flags copy and restore
  btrfs: start transaction in btrfs_ioctl_setflags()
  btrfs: export btrfs_set_prop
  btrfs: refactor btrfs_set_props to validate externally
  ...
2019-05-07 11:34:19 -07:00
Linus Torvalds
78438ce18f Merge branch 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs stable fodder fixes from Al Viro:

 - acct_on() fix for deadlock caught by overlayfs folks

 - autofs RCU use-after-free SNAFU (->d_manage() can be called
   locklessly, so we need to RCU-delay freeing the objects it looks at)

 - (hopefully) the end of "do we need freeing this dentry RCU-delayed"
   whack-a-mole.

* 'stable-fodder' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  autofs: fix use-after-free in lockless ->d_manage()
  dcache: sort the freeing-without-RCU-delay mess for good.
  acct_on(): don't mess with freeze protection
2019-05-07 11:17:26 -07:00
Linus Torvalds
168e153d5e Merge branch 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs inode freeing updates from Al Viro:
 "Introduction of separate method for RCU-delayed part of
  ->destroy_inode() (if any).

  Pretty much as posted, except that destroy_inode() stashes
  ->free_inode into the victim (anon-unioned with ->i_fops) before
  scheduling i_callback() and the last two patches (sockfs conversion
  and folding struct socket_wq into struct socket) are excluded - that
  pair should go through netdev once davem reopens his tree"

* 'work.icache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (58 commits)
  orangefs: make use of ->free_inode()
  shmem: make use of ->free_inode()
  hugetlb: make use of ->free_inode()
  overlayfs: make use of ->free_inode()
  jfs: switch to ->free_inode()
  fuse: switch to ->free_inode()
  ext4: make use of ->free_inode()
  ecryptfs: make use of ->free_inode()
  ceph: use ->free_inode()
  btrfs: use ->free_inode()
  afs: switch to use of ->free_inode()
  dax: make use of ->free_inode()
  ntfs: switch to ->free_inode()
  securityfs: switch to ->free_inode()
  apparmor: switch to ->free_inode()
  rpcpipe: switch to ->free_inode()
  bpf: switch to ->free_inode()
  mqueue: switch to ->free_inode()
  ufs: switch to ->free_inode()
  coda: switch to ->free_inode()
  ...
2019-05-07 10:57:05 -07:00
Linus Torvalds
0968621917 Printk changes for 5.2
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEESH4wyp42V4tXvYsjUqAMR0iAlPIFAlzP8nQACgkQUqAMR0iA
 lPK79A/+NkRouqA9ihAZhUbgW0DHzOAFvUJSBgX11HQAZbGjngakuoyYFvwUx0T0
 m80SUTCysxQrWl+xLdccPZ9ZrhP2KFQrEBEdeYHZ6ymcYcl83+3bOIBS7VwdZAbO
 EzB8u/58uU/sI6ABL4lF7ZF/+R+U4CXveEUoVUF04bxdPOxZkRX4PT8u3DzCc+RK
 r4yhwQUXGcKrHa2GrRL3GXKsDxcnRdFef/nzq4RFSZsi0bpskzEj34WrvctV6j+k
 FH/R3kEcZrtKIMPOCoDMMWq07yNqK/QKj0MJlGoAlwfK4INgcrSXLOx+pAmr6BNq
 uMKpkxCFhnkZVKgA/GbKEGzFf+ZGz9+2trSFka9LD2Ig6DIstwXqpAgiUK8JFQYj
 lq1mTaJZD3DfF2vnGHGeAfBFG3XETv+mIT/ow6BcZi3NyNSVIaqa5GAR+lMc6xkR
 waNkcMDkzLFuP1r0p7ZizXOksk9dFkMP3M6KqJomRtApwbSNmtt+O2jvyLPvB3+w
 wRyN9WT7IJZYo4v0rrD5Bl6BjV15ZeCPRSFZRYofX+vhcqJQsFX1M9DeoNqokh55
 Cri8f6MxGzBVjE1G70y2/cAFFvKEKJud0NUIMEuIbcy+xNrEAWPF8JhiwpKKnU10
 c0u674iqHJ2HeVsYWZF0zqzqQ6E1Idhg/PrXfuVuhAaL5jIOnYY=
 =WZfC
 -----END PGP SIGNATURE-----

Merge tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk

Pull printk updates from Petr Mladek:

 - Allow state reset of printk_once() calls.

 - Prevent crashes when dereferencing invalid pointers in vsprintf().
   Only the first byte is checked for simplicity.

 - Make vsprintf warnings consistent and inlined.

 - Treewide conversion of obsolete %pf, %pF to %ps, %pF printf
   modifiers.

 - Some clean up of vsprintf and test_printf code.

* tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
  lib/vsprintf: Make function pointer_string static
  vsprintf: Limit the length of inlined error messages
  vsprintf: Avoid confusion between invalid address and value
  vsprintf: Prevent crash when dereferencing invalid pointers
  vsprintf: Consolidate handling of unknown pointer specifiers
  vsprintf: Factor out %pO handler as kobject_string()
  vsprintf: Factor out %pV handler as va_format()
  vsprintf: Factor out %p[iI] handler as ip_addr_string()
  vsprintf: Do not check address of well-known strings
  vsprintf: Consistent %pK handling for kptr_restrict == 0
  vsprintf: Shuffle restricted_pointer()
  printk: Tie printk_once / printk_deferred_once into .data.once for reset
  treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
  lib/test_printf: Switch to bitmap_zalloc()
2019-05-07 09:18:12 -07:00
Linus Torvalds
81ff5d2cba Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
 "API:
   - Add support for AEAD in simd
   - Add fuzz testing to testmgr
   - Add panic_on_fail module parameter to testmgr
   - Use per-CPU struct instead multiple variables in scompress
   - Change verify API for akcipher

  Algorithms:
   - Convert x86 AEAD algorithms over to simd
   - Forbid 2-key 3DES in FIPS mode
   - Add EC-RDSA (GOST 34.10) algorithm

  Drivers:
   - Set output IV with ctr-aes in crypto4xx
   - Set output IV in rockchip
   - Fix potential length overflow with hashing in sun4i-ss
   - Fix computation error with ctr in vmx
   - Add SM4 protected keys support in ccree
   - Remove long-broken mxc-scc driver
   - Add rfc4106(gcm(aes)) cipher support in cavium/nitrox"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (179 commits)
  crypto: ccree - use a proper le32 type for le32 val
  crypto: ccree - remove set but not used variable 'du_size'
  crypto: ccree - Make cc_sec_disable static
  crypto: ccree - fix spelling mistake "protedcted" -> "protected"
  crypto: caam/qi2 - generate hash keys in-place
  crypto: caam/qi2 - fix DMA mapping of stack memory
  crypto: caam/qi2 - fix zero-length buffer DMA mapping
  crypto: stm32/cryp - update to return iv_out
  crypto: stm32/cryp - remove request mutex protection
  crypto: stm32/cryp - add weak key check for DES
  crypto: atmel - remove set but not used variable 'alg_name'
  crypto: picoxcell - Use dev_get_drvdata()
  crypto: crypto4xx - get rid of redundant using_sd variable
  crypto: crypto4xx - use sync skcipher for fallback
  crypto: crypto4xx - fix cfb and ofb "overran dst buffer" issues
  crypto: crypto4xx - fix ctr-aes missing output IV
  crypto: ecrdsa - select ASN1 and OID_REGISTRY for EC-RDSA
  crypto: ux500 - use ccflags-y instead of CFLAGS_<basename>.o
  crypto: ccree - handle tee fips error during power management resume
  crypto: ccree - add function to handle cryptocell tee fips error
  ...
2019-05-06 20:15:06 -07:00
Linus Torvalds
2c6a392cdd Merge branch 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull stack trace updates from Ingo Molnar:
 "So Thomas looked at the stacktrace code recently and noticed a few
  weirdnesses, and we all know how such stories of crummy kernel code
  meeting German engineering perfection end: a 45-patch series to clean
  it all up! :-)

  Here's the changes in Thomas's words:

   'Struct stack_trace is a sinkhole for input and output parameters
    which is largely pointless for most usage sites. In fact if embedded
    into other data structures it creates indirections and extra storage
    overhead for no benefit.

    Looking at all usage sites makes it clear that they just require an
    interface which is based on a storage array. That array is either on
    stack, global or embedded into some other data structure.

    Some of the stack depot usage sites are outright wrong, but
    fortunately the wrongness just causes more stack being used for
    nothing and does not have functional impact.

    Another oddity is the inconsistent termination of the stack trace
    with ULONG_MAX. It's pointless as the number of entries is what
    determines the length of the stored trace. In fact quite some call
    sites remove the ULONG_MAX marker afterwards with or without nasty
    comments about it. Not all architectures do that and those which do,
    do it inconsistenly either conditional on nr_entries == 0 or
    unconditionally.

    The following series cleans that up by:

      1) Removing the ULONG_MAX termination in the architecture code

      2) Removing the ULONG_MAX fixups at the call sites

      3) Providing plain storage array based interfaces for stacktrace
         and stackdepot.

      4) Cleaning up the mess at the callsites including some related
         cleanups.

      5) Removing the struct stack_trace based interfaces

    This is not changing the struct stack_trace interfaces at the
    architecture level, but it removes the exposure to the generic
    code'"

* 'core-stacktrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
  x86/stacktrace: Use common infrastructure
  stacktrace: Provide common infrastructure
  lib/stackdepot: Remove obsolete functions
  stacktrace: Remove obsolete functions
  livepatch: Simplify stack trace retrieval
  tracing: Remove the last struct stack_trace usage
  tracing: Simplify stack trace retrieval
  tracing: Make ftrace_trace_userstack() static and conditional
  tracing: Use percpu stack trace buffer more intelligently
  tracing: Simplify stacktrace retrieval in histograms
  lockdep: Simplify stack trace handling
  lockdep: Remove save argument from check_prev_add()
  lockdep: Remove unused trace argument from print_circular_bug()
  drm: Simplify stacktrace handling
  dm persistent data: Simplify stack trace handling
  dm bufio: Simplify stack trace retrieval
  btrfs: ref-verify: Simplify stack trace retrieval
  dma/debug: Simplify stracktrace retrieval
  fault-inject: Simplify stacktrace retrieval
  mm/page_owner: Simplify stack trace handling
  ...
2019-05-06 13:11:48 -07:00
Kirill Smelkov
438ab720c6 vfs: pass ppos=NULL to .read()/.write() of FMODE_STREAM files
This amends commit 10dce8af34 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") in how position is passed into .read()/.write() handler for
stream-like files:

Rasmus noticed that we currently pass 0 as position and ignore any position
change if that is done by a file implementation. This papers over bugs if ppos
is used in files that declare themselves as being stream-like as such bugs will
go unnoticed. Even if a file implementation is correctly converted into using
stream_open, its read/write later could be changed to use ppos and even though
that won't be working correctly, that bug might go unnoticed without someone
doing wrong behaviour analysis. It is thus better to pass ppos=NULL into
read/write for stream-like files as that don't give any chance for ppos usage
bugs because it will oops if ppos is ever used inside .read() or .write().

Note 1: rw_verify_area, new_sync_{read,write} needs to be updated
because they are called by vfs_read/vfs_write & friends before
file_operations .read/.write .

Note 2: if file backend uses new-style .read_iter/.write_iter, position
is still passed into there as non-pointer kiocb.ki_pos . Currently
stream_open.cocci (semantic patch added by 10dce8af34) ignores files
whose file_operations has *_iter methods.

Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
2019-05-06 17:46:52 +03:00
Linus Torvalds
51987affd6 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:

 - a couple of ->i_link use-after-free fixes

 - regression fix for wrong errno on absent device name in mount(2)
   (this cycle stuff)

 - ancient UFS braino in large GID handling on Solaris UFS images (bogus
   cut'n'paste from large UID handling; wrong field checked to decide
   whether we should look at old (16bit) or new (32bit) field)

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
  Abort file_remove_privs() for non-reg. files
  [fix] get rid of checking for absent device name in vfs_get_tree()
  apparmorfs: fix use-after-free on symlink traversal
  securityfs: fix use-after-free on symlink traversal
2019-05-05 09:28:45 -07:00
Linus Torvalds
5ce3307b6d for-linus-20190502
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAlzLBg4QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpsG8EACHxeAgDY3j+ZSLqn9sCFzvDo+hc9q2x64o
 ydscoxcdRSZYAU1dP6bnjQBFZbwVHaDzYG11L9aPwlvbsxis9/XWASc/gCo8sumN
 itmk3jKW5zYXlLw57ojh2+/UbPKnO5fKJcA66AakDGTopp3nYZQvov1kUJKOwTiz
 Ir48vEwST4DU9szkfGGeKOGG735veU/yarUNyADwc+CIigBcIa/A/bZtY8P4xV5d
 6bSLnMayehntZaHpVUPwDqOgP7+nSqIzQ88BPAKQnSzDat/SK4C0pgIX9L2nS1/4
 7fBpg3/HN0sFgUuhAG/3ILD82yD3CWDGbt2xif2CP0ylvVOMB6d/ObDAIxwTBKVd
 MwJ3U0RK+Ph2rvaglp6ifk3YgS0Nzt4N5F9AVmp32RR3QpqzJ2pLN4PUjbdWaEbr
 lugu6uengn0flu90mesBazrVNrZkrZ2w859tq9TNWPYpdIthKrvNn44J1bXx+yUj
 F1d+Bf0Sxxnc1b8hEKn9KOXPRoIY07hd/wUi2a/F4RVGSFa2b9CnaengqXIDzQNC
 duEErt+Pd2xlBA4WhPlKSI2PmrWJvpDr7iuAiqdC8cC69hpF4F2/yWCiL+JYI4pF
 v0zPn214KZ2rg9XZfb0Me8ff1BNvmXJMKzvecX2mOcclcO97PFVkqJhcx/75JE/6
 ooQiTofl+w==
 =KjZ4
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20190502' of git://git.kernel.dk/linux-block

Pull io_uring fixes from Jens Axboe:
 "This is mostly io_uring fixes/tweaks. Most of these were actually done
  in time for the last -rc, but I wanted to ensure that everything
  tested out great before including them. The code delta looks larger
  than it really is, as it's mostly just comment additions/changes.

  Outside of the comment additions/changes, this is mostly removal of
  unnecessary barriers. In all, this pull request contains:

   - Tweak to how we handle errors at submission time. We now post a
     completion event if the error occurs on behalf of an sqe, instead
     of returning it through the system call. If the error happens
     outside of a specific sqe, we return the error through the system
     call. This makes it nicer to use and makes the "normal" use case
     behave the same as the offload cases. (me)

   - Fix for a missing req reference drop from async context (me)

   - If an sqe is submitted with RWF_NOWAIT, don't punt it to async
     context. Return -EAGAIN directly, instead of using it as a hint to
     do async punt. (Stefan)

   - Fix notes on barriers (Stefan)

   - Remove unnecessary barriers (Stefan)

   - Fix potential double free of memory in setup error (Mark)

   - Further improve sq poll CPU validation (Mark)

   - Fix page allocation warning and leak on buffer registration error
     (Mark)

   - Fix iov_iter_type() for new no-ref flag (Ming)

   - Fix a case where dio doesn't honor bio no-page-ref (Ming)"

* tag 'for-linus-20190502' of git://git.kernel.dk/linux-block:
  io_uring: avoid page allocation warnings
  iov_iter: fix iov_iter_type
  block: fix handling for BIO_NO_PAGE_REF
  io_uring: drop req submit reference always in async punt
  io_uring: free allocated io_memory once
  io_uring: fix SQPOLL cpu validation
  io_uring: have submission side sqe errors post a cqe
  io_uring: remove unnecessary barrier after unsetting IORING_SQ_NEED_WAKEUP
  io_uring: remove unnecessary barrier after incrementing dropped counter
  io_uring: remove unnecessary barrier before reading SQ tail
  io_uring: remove unnecessary barrier after updating SQ head
  io_uring: remove unnecessary barrier before reading cq head
  io_uring: remove unnecessary barrier before wq_has_sleeper
  io_uring: fix notes on barriers
  io_uring: fix handling SQEs requesting NOWAIT
2019-05-02 09:55:04 -07:00
Nikolay Borisov
b1c16ac978 btrfs: Use kvmalloc for allocating compressed path context
Recent refactoring of cow_file_range_async means it's now possible to
request a rather large physically contiguous memory via kmalloc. The
size is dependent on the number of 512k chunks that the compressed range
consists of. David reported multiple OOM messages on such large
allocations. Fix it by switching to using kvmalloc.

Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:19 +02:00
Nikolay Borisov
7447555fe7 btrfs: Factor out common extent locking code in submit_compressed_extents
Irrespective of whether the compress code fell back to uncompressed or
a compressed extent has to be submitted, the extent range is always
locked. So factor out the common lock_extent call at the beginning of
the loop. No functional changes just removes one duplicate lock_extent
call.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:19 +02:00
Nikolay Borisov
4336650aff btrfs: Set io_tree only once in submit_compressed_extents
The inode never changes so it's sufficient to dereference it and get
the iotree only once, before the execution of the main loop. No
functional changes, only the size of the function is decreased:

add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-44 (-44)
Function                                     old     new   delta
submit_compressed_extents                   1240    1196     -44
Total: Before=88476, After=88432, chg -0.05%

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:19 +02:00
Nikolay Borisov
69684c5a88 btrfs: Replace clear_extent_bit with unlock_extent
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:19 +02:00
Nikolay Borisov
1368c6dac7 btrfs: Make compress_file_range take only struct async_chunk
All context this function needs is held within struct async_chunk.
Currently we not only pass the struct but also every individual member.
This is redundant, simplify it by only passing struct async_chunk and
leaving it to compress_file_range to extract the values it requires.
No functional changes.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:19 +02:00
Nikolay Borisov
c5a68aec4e btrfs: Remove fs_info from struct async_chunk
The associated btrfs_work already contains a reference to the fs_info so
use that instead of passing it via async_chunk. No functional changes.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:19 +02:00
Nikolay Borisov
b5326271e7 btrfs: Rename async_cow to async_chunk
Now that we have an explicit async_chunk struct rename references to
variables of this type to async_chunk. No functional changes.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:18 +02:00
Nikolay Borisov
97db120451 btrfs: Preallocate chunks in cow_file_range_async
This commit changes the implementation of cow_file_range_async in order
to get rid of the BUG_ON in the middle of the loop. Additionally it
reworks the inner loop in the hopes of making it more understandable.

The idea is to make async_cow be a top-level structured, shared amongst
all chunks being sent for compression. This allows to perform one memory
allocation at the beginning and gracefully fail the IO if there isn't
enough memory. Now, each chunk is going to be described by an
async_chunk struct. It's the responsibility of the final chunk
to actually free the memory.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:48:18 +02:00
Josef Bacik
c8eaeac7b7 btrfs: reserve delalloc metadata differently
With the per-inode block reserves we started refilling the reserve based
on the calculated size of the outstanding csum bytes and extents for the
inode, including the amount we were adding with the new operation.

However, generic/224 exposed a problem with this approach.  With 1000
files all writing at the same time we ended up with a bunch of bytes
being reserved but unusable.

When you write to a file we reserve space for the csum leaves for those
bytes, the number of extent items required to cover those bytes, and a
single transaction item for updating the inode at ordered extent finish
for that range of bytes.  This is held until the ordered extent finishes
and we release all of the reserved space.

If a second write comes in at this point we would add a single
reservation for the new outstanding extent and however many reservations
for the csum leaves.  At this point we find the delta of how much we
have reserved and how much outstanding size this is and attempt to
reserve this delta.  If the first write finishes it will not release any
space, because the space it had reserved for the initial write is still
needed for the second write.  However some space would have been used,
as we have added csums, extent items, and dirtied the inode.  Our
reserved space would be > 0 but less than the total needed reserved
space.

This is just for a single inode, now consider generic/224.  This has
1000 inodes writing in parallel to a very small file system, 1GiB.  In
my testing this usually means we get about a 120MiB metadata area to
work with, more than enough to allow the writes to continue, but not
enough if all of the inodes are stuck trying to reserve the slack space
while continuing to hold their leftovers from their initial writes.

Fix this by pre-reserved _only_ for the space we are currently trying to
add.  Then once that is successful modify our inodes csum count and
outstanding extents, and then add the newly reserved space to the inodes
block_rsv.  This allows us to actually pass generic/224 without running
out of metadata space.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-05-02 13:47:12 +02:00
Al Viro
4e9036042f ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
To choose whether to pick the GID from the old (16bit) or new (32bit)
field, we should check if the old gid field is set to 0xffff.  Mainline
checks the old *UID* field instead - cut'n'paste from the corresponding
code in ufs_get_inode_uid().

Fixes: 252e211e90
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-02 02:24:50 -04:00
Eric Sandeen
910832697c xfs: change some error-less functions to void types
There are several functions which have no opportunity to return
an error, and don't contain any ASSERTs which could be argued
to be better constructed as error cases.  So, make them voids
to simplify the callers.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
2019-05-01 20:26:30 -07:00
Christoph Hellwig
cbbf4c0be8 iomap: move iomap_read_inline_data around
iomap_read_inline_data ended up being placed in the middle of the bio
based read I/O completion handling, which tends to confuse the heck out
of me whenever I follow the code.  Move it to a more suitable place.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
2019-05-01 20:16:40 -07:00
Al Viro
f276ae0dd6 orangefs: make use of ->free_inode()
Acked-by: Mike Marshall <hubcap@omnibond.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:27 -04:00
Al Viro
b62de32257 hugetlb: make use of ->free_inode()
moving synchronous parts of ->destroy_inode() to ->evict_inode() is
not possible here - they are balancing the stuff done in ->alloc_inode(),
not the things acquired while using it or sanity checks.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:27 -04:00
Al Viro
0b269ded4e overlayfs: make use of ->free_inode()
synchronous parts are left in ->destroy_inode()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:27 -04:00
Al Viro
b3b4a6e356 jfs: switch to ->free_inode()
synchronous part can be moved to ->evict_inode(), the rest -
->free_inode() fodder

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
9baf28bbfe fuse: switch to ->free_inode()
fuse_destroy_inode() is gone - sanity checks that need the stack
trace of the caller get moved into ->evict_inode(), the rest joins
the RCU-delayed part which becomes ->free_inode().

While we are at it, don't just pass the address of what happens
to be the first member of structure to kmem_cache_free() -
get_fuse_inode() is there for purpose and it gives the proper
container_of() use.  No behaviour change, but verifying correctness
is easier that way.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
94053139d4 ext4: make use of ->free_inode()
the rest of this ->destroy_inode() instance could probably be folded
into ext4_evict_inode()

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
586a94fdc9 ecryptfs: make use of ->free_inode()
no idea if crypto destruction could be moved there as well

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
cfa6d41263 ceph: use ->free_inode()
a lot of non-delayed work in this case; all of that is left in
->destroy_inode()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
26602cab41 btrfs: use ->free_inode()
a lot of stuff remains in ->destroy_inode()

Acked-by: David Sterba <dsterba@suse.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
51b9fe48c4 afs: switch to use of ->free_inode()
debugging printks left in ->destroy_inode() and so's the
update of inode count; we could take the latter to RCU-delayed
part (would take only moving the check on module exit past
rcu_barrier() there), but debugging output ought to either
stay where it is or go into ->evict_inode()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
a2b757fe0f ntfs: switch to ->free_inode()
move the synchronous stuff from ->destroy_inode() to ->evict_inode(),
turn the RCU-delayed part into ->free_inode()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
98835e884c ufs: switch to ->free_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
d984892bd7 coda: switch to ->free_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:26 -04:00
Al Viro
6becf8edf1 sysv: switch to ->free_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:25 -04:00
Al Viro
a78bb3838d udf: switch to ->free_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:25 -04:00
Al Viro
dc43175996 ubifs: switch to ->free_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:25 -04:00
Al Viro
56b5af1931 squashfs: switch to ->free_inode()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-05-01 22:43:25 -04:00