Commit Graph

60436 Commits

Author SHA1 Message Date
Linus Torvalds
e170eb2771 Merge branch 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs mount API infrastructure updates from Al Viro:
 "Infrastructure bits of mount API conversions.

  The rest is more of per-filesystem updates and that will happen
  in separate pull requests"

* 'work.mount-base' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mtd: Provide fs_context-aware mount_mtd() replacement
  vfs: Create fs_context-aware mount_bdev() replacement
  new helper: get_tree_keyed()
  vfs: set fs_context::user_ns for reconfigure
2019-09-18 13:15:58 -07:00
Linus Torvalds
b30d87cf96 Merge branch 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull d_path fix from Al Viro:
 "Fix d_absolute_path() regression in the last cycle (felt by tomoyo,
  mostly)"

* 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  [PATCH] fix d_absolute_path() interplay with fsmount()
2019-09-18 13:14:14 -07:00
Linus Torvalds
53e5e7a7a7 Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs namei updates from Al Viro:
 "Pathwalk-related stuff"

[ Audit-related cleanups, misc simplifications, and easier to follow
  nd->root refcounts     - Linus ]

* 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  devpts_pty_kill(): don't bother with d_delete()
  infiniband: don't bother with d_delete()
  hypfs: don't bother with d_delete()
  fs/namei.c: keep track of nd->root refcount status
  fs/namei.c: new helper - legitimize_root()
  kill the last users of user_{path,lpath,path_dir}()
  namei.h: get the comments on LOOKUP_... in sync with reality
  kill LOOKUP_NO_EVAL, don't bother including namei.h from audit.h
  audit_inode(): switch to passing AUDIT_INODE_...
  filename_mountpoint(): make LOOKUP_NO_EVAL unconditional there
  filename_lookup(): audit_inode() argument is always 0
2019-09-18 13:03:01 -07:00
Linus Torvalds
8b53c76533 Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add the ability to abort a skcipher walk.

  Algorithms:
   - Fix XTS to actually do the stealing.
   - Add library helpers for AES and DES for single-block users.
   - Add library helpers for SHA256.
   - Add new DES key verification helper.
   - Add surrounding bits for ESSIV generator.
   - Add accelerations for aegis128.
   - Add test vectors for lzo-rle.

  Drivers:
   - Add i.MX8MQ support to caam.
   - Add gcm/ccm/cfb/ofb aes support in inside-secure.
   - Add ofb/cfb aes support in media-tek.
   - Add HiSilicon ZIP accelerator support.

  Others:
   - Fix potential race condition in padata.
   - Use unbound workqueues in padata"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (311 commits)
  crypto: caam - Cast to long first before pointer conversion
  crypto: ccree - enable CTS support in AES-XTS
  crypto: inside-secure - Probe transform record cache RAM sizes
  crypto: inside-secure - Base RD fetchcount on actual RD FIFO size
  crypto: inside-secure - Base CD fetchcount on actual CD FIFO size
  crypto: inside-secure - Enable extended algorithms on newer HW
  crypto: inside-secure: Corrected configuration of EIP96_TOKEN_CTRL
  crypto: inside-secure - Add EIP97/EIP197 and endianness detection
  padata: remove cpu_index from the parallel_queue
  padata: unbind parallel jobs from specific CPUs
  padata: use separate workqueues for parallel and serial work
  padata, pcrypt: take CPU hotplug lock internally in padata_alloc_possible
  crypto: pcrypt - remove padata cpumask notifier
  padata: make padata_do_parallel find alternate callback CPU
  workqueue: require CPU hotplug read exclusion for apply_workqueue_attrs
  workqueue: unconfine alloc/apply/free_workqueue_attrs()
  padata: allocate workqueue internally
  arm64: dts: imx8mq: Add CAAM node
  random: Use wait_event_freezable() in add_hwgenerator_randomness()
  crypto: ux500 - Fix COMPILE_TEST warnings
  ...
2019-09-18 12:11:14 -07:00
Linus Torvalds
e6874fc294 Staging/IIO driver patches for 5.4-rc1
Here is the big staging/iio driver update for 5.4-rc1.
 
 Lots of churn here, with a few driver/filesystems moving out of staging
 finally:
 	- erofs moved out of staging
 	- greybus core code moved out of staging
 
 Along with that, a new filesytem has been added:
 	- extfat
 to provide support for those devices requiring that filesystem (i.e.
 transfer devices to/from windows systems or printers.)
 
 Other than that, there a number of new IIO drivers, and lots and lots
 and lots of staging driver cleanups and minor fixes as people continue
 to dig into those for easy changes.
 
 All of these have been in linux-next for a while with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXYIW0g8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynJgwCgt22YQdsWRrOuJnZp3xpFq/ZSMWAAn0dAgFf5
 SlSTI2nQhbW7jjdShrPR
 =Ra09
 -----END PGP SIGNATURE-----

Merge tag 'staging-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging and IIO driver updates from Greg KH:
 "Here is the big staging/iio driver update for 5.4-rc1.

  Lots of churn here, with a few driver/filesystems moving out of
  staging finally:

     - erofs moved out of staging

     - greybus core code moved out of staging

  Along with that, a new filesytem has been added:

     - extfat

  to provide support for those devices requiring that filesystem (i.e.
  transfer devices to/from windows systems or printers)

  Other than that, there a number of new IIO drivers, and lots and lots
  and lots of staging driver cleanups and minor fixes as people continue
  to dig into those for easy changes.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (453 commits)
  Staging: gasket: Use temporaries to reduce line length.
  Staging: octeon: Avoid several usecases of strcpy
  staging: vhciq_core: replace snprintf with scnprintf
  staging: wilc1000: avoid twice IRQ handler execution for each single interrupt
  staging: wilc1000: remove unused interrupt status handling code
  staging: fbtft: make several arrays static const, makes object smaller
  staging: rtl8188eu: make two arrays static const, makes object smaller
  staging: rtl8723bs: core: Remove Macro "IS_MAC_ADDRESS_BROADCAST"
  dt-bindings: anybus-controller: move to staging/ tree
  staging: emxx_udc: remove local TRUE/FALSE definition
  staging: wilc1000: look for rtc_clk clock
  staging: dt-bindings: wilc1000: add optional rtc_clk property
  staging: nvec: make use of devm_platform_ioremap_resource
  staging: exfat: drop unused function parameter
  Staging: exfat: Avoid use of strcpy
  staging: exfat: use integer constants
  staging: exfat: cleanup spacing for casts
  staging: exfat: cleanup spacing for operators
  staging: rtl8723bs: hal: remove redundant variable n
  staging: pi433: Fix typo in documentation
  ...
2019-09-18 11:05:34 -07:00
Linus Torvalds
1f7d290a72 Driver core patches for 5.4-rc1
Here is the big driver core update for 5.4-rc1.
 
 There was a bit of a churn in here, with a number of core and OF
 platform patches being added to the tree, and then after much discussion
 and review and a day-long in-person meeting, they were decided to be
 reverted and a new set of patches is currently being reviewed on the
 mailing list.
 
 Other than that churn, there are two "persistent" branches in here that
 other trees will be pulling in as well during the merge window.  One
 branch to add support for drivers to have the driver core automatically
 add sysfs attribute files when a driver is bound to a device so that the
 driver doesn't have to manually do it (and then clean it up, as it
 always gets it wrong).
 
 There's another branch in here for generic lookup helpers for the driver
 core that lots of busses are starting to use.  That's the majority of
 the non-driver-core changes in this patch series.
 
 There's also some on-going debugfs file creation cleanup that has been
 slowly happening over the past few releases, with the goal to hopefully
 get that done sometime next year.
 
 All of these have been in linux-next for a while now with no reported
 issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXYIVHA8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ymEVwCfRPxQHQplI6ZR6h0jPscLSaZnaFIAn1a+rjO2
 EFuuXJ5Ip72F5Ch9AW3G
 =r8lH
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg Kroah-Hartman:
 "Here is the big driver core update for 5.4-rc1.

  There was a bit of a churn in here, with a number of core and OF
  platform patches being added to the tree, and then after much
  discussion and review and a day-long in-person meeting, they were
  decided to be reverted and a new set of patches is currently being
  reviewed on the mailing list.

  Other than that churn, there are two "persistent" branches in here
  that other trees will be pulling in as well during the merge window.
  One branch to add support for drivers to have the driver core
  automatically add sysfs attribute files when a driver is bound to a
  device so that the driver doesn't have to manually do it (and then
  clean it up, as it always gets it wrong).

  There's another branch in here for generic lookup helpers for the
  driver core that lots of busses are starting to use. That's the
  majority of the non-driver-core changes in this patch series.

  There's also some on-going debugfs file creation cleanup that has been
  slowly happening over the past few releases, with the goal to
  hopefully get that done sometime next year.

  All of these have been in linux-next for a while now with no reported
  issues"

[ Note that the above-mentioned generic lookup helpers branch was
  already brought in by the LED merge (commit 4feaab05dc) that had
  shared it.

  Also note that that common branch introduced an i2c bug due to a bad
  conversion, which got fixed here. - Linus ]

* tag 'driver-core-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (49 commits)
  coccinelle: platform_get_irq: Fix parse error
  driver-core: add include guard to linux/container.h
  sysfs: add BIN_ATTR_WO() macro
  driver core: platform: Export platform_get_irq_optional()
  hwmon: pwm-fan: Use platform_get_irq_optional()
  driver core: platform: Introduce platform_get_irq_optional()
  Revert "driver core: Add support for linking devices during device addition"
  Revert "driver core: Add edit_links() callback for drivers"
  Revert "of/platform: Add functional dependency link from DT bindings"
  Revert "driver core: Add sync_state driver/bus callback"
  Revert "of/platform: Pause/resume sync state during init and of_platform_populate()"
  Revert "of/platform: Create device links for all child-supplier depencencies"
  Revert "of/platform: Don't create device links for default busses"
  Revert "of/platform: Fix fn definitons for of_link_is_valid() and of_link_property()"
  Revert "of/platform: Fix device_links_supplier_sync_state_resume() warning"
  Revert "of/platform: Disable generic device linking code for PowerPC"
  devcoredump: fix typo in comment
  devcoredump: use memory_read_from_buffer
  of/platform: Disable generic device linking code for PowerPC
  device.h: Fix warnings for mismatched parameter names in comments
  ...
2019-09-18 10:04:39 -07:00
Jens Axboe
5262f56798 io_uring: IORING_OP_TIMEOUT support
There's been a few requests for functionality similar to io_getevents()
and epoll_wait(), where the user can specify a timeout for waiting on
events. I deliberately did not add support for this through the system
call initially to avoid overloading the args, but I can see that the use
cases for this are valid.

This adds support for IORING_OP_TIMEOUT. If a user wants to get woken
when waiting for events, simply submit one of these timeout commands
with your wait call (or before). This ensures that the application
sleeping on the CQ ring waiting for events will get woken. The timeout
command is passed in as a pointer to a struct timespec. Timeouts are
relative. The timeout command also includes a way to auto-cancel after
N events has passed.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-18 10:43:22 -06:00
Jens Axboe
9831a90ce6 io_uring: use cond_resched() in sqthread
If preempt isn't enabled in the kernel, we can run into hang issues with
sqthread submissions. Use cond_resched() to play nice instead of
cpu_relax(), if we end up starting the loop and not having any events
pending for submissions.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-19 09:49:26 -06:00
Jackie Liu
a1041c27b6 io_uring: fix potential crash issue due to io_get_req failure
Sometimes io_get_req will return a NUL, then we need to do the
correct error handling, otherwise it will cause the kernel null
pointer exception.

Fixes: 4fe2c96315 ("io_uring: add support for link with drain")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-18 11:20:04 -06:00
Jens Axboe
6cc47d1d2a io_uring: ensure poll commands clear ->sqe
If we end up getting woken in poll (due to a signal), then we may need
to punt the poll request to an async worker. When we do that, we look up
the list to queue at, deferefencing req->submit.sqe, however that is
only set for requests we initially decided to queue async.

This fixes a crash with poll command usage and wakeups that need to punt
to async context.

Fixes: 54a91f3bb9 ("io_uring: limit parallelism of buffered writes")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-18 11:19:26 -06:00
Jackie Liu
5f5ad9ced3 io_uring: fix use-after-free of shadow_req
There is a potential dangling pointer problem. we never clean
shadow_req, if there are multiple link lists in this series of
sqes, then the shadow_req will not reallocate, and continue to
use the last one. but in the previous, his memory has been
released, thus forming a dangling pointer. let's clean up him
and make sure that every new link list can reapply for a new
shadow_req.

Fixes: 4fe2c96315 ("io_uring: add support for link with drain")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-18 11:19:06 -06:00
Jackie Liu
954dab193d io_uring: use kmemdup instead of kmalloc and memcpy
Just clean up the code, no function changes.

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-18 11:19:06 -06:00
Theodore Ts'o
040823b537 fs/unicode patches for 5.4-rc1
This includes two fixes for the unicode system for inclusion into Linux
 v5.4.
 
   - A patch from Krzysztof Wilczynski solving a build time warning.
 
   - A patch from Colin King making a parsing format static, to reduce
     stack size.
 
 Build validated and run time tested using xfstests casefold testcase.
 
 Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8jAUPq50yNjPBCi4QEuZqsMcppQFAl2BGNEACgkQQEuZqsMc
 ppQufA//SbjnBvMtlPco6Cxz8Qn7IukVUEXDdir8imMP8XWTyHkVMK0C7t0Vd/UX
 ql6wtQDDxFQCrYegw8F1fTd32xI7DaYtHthCy0UdLNoQ08FLGtdA1+zXkM9IdRsy
 /3mWR0jW8Bqfsta/XQffWLdNHvJ9Ny82iWonaW6/ujyxz1/Ch7LbVsLvsMQXVwti
 8dpvT4MFBoODxqKOjZoa3aiqwsljCY+0Dy5uyW5A8r/eBxaymiKlrYkuaqFtM+er
 TQ341/fpGyYdtOYLSLSLPnZmmGdypSE6e6TwT1Q9BO8szNYqF/HeUew3yQ63DVSZ
 lpR02FKKyZLdxGb9CSmJ0eXNJVF8t5zc6Nh+JvnH9e73x916tEnrZivh/UBvzTdD
 yF5uKDWogiKtThGIwHF454nSMpAUKO/orCkhzpMgCsweyC/uRMXZk1SVzL+L4Vdq
 jsVUb7DJSDdfxacc7A7qHg4Yj9y1LAkWh9K5TzKaWtk10LoiL4yx8bAh6lSuoUP9
 MxSAInHLFgYAgwai1KhxaYJnlxgCA7C0viD1dfQtWWFVeX22P8J6qaKN8nE/F8Qr
 9ujwQ3IEtYkiIX2CoBqX5L23yDVcppG3oK68M41Cb9gZHlFiTrEIGwFZBWo7GDa/
 fj7LL9vKy62uubqL8o96YWzzAD48aZ3/HiqIBumKFDNLo5qOll8=
 =fhC0
 -----END PGP SIGNATURE-----

Merge tag 'unicode-next-v5.4' of https://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode into dev

fs/unicode patches for 5.4-rc1

This includes two fixes for the unicode system for inclusion into Linux
v5.4.

  - A patch from Krzysztof Wilczynski solving a build time warning.

  - A patch from Colin King making a parsing format static, to reduce
    stack size.

Build validated and run time tested using xfstests casefold testcase.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2019-09-18 10:36:24 -04:00
Linus Torvalds
77dcfe2b9e Power management updates for 5.4-rc1
- Rework the main suspend-to-idle control flow to avoid repeating
    "noirq" device resume and suspend operations in case of spurious
    wakeups from the ACPI EC and decouple the ACPI EC wakeups support
    from the LPS0 _DSM support (Rafael Wysocki).
 
  - Extend the wakeup sources framework to expose wakeup sources as
    device objects in sysfs (Tri Vo, Stephen Boyd).
 
  - Expose system suspend statistics in sysfs (Kalesh Singh).
 
  - Introduce a new haltpoll cpuidle driver and a new matching
    governor for virtualized guests wanting to do guest-side polling
    in the idle loop (Marcelo Tosatti, Joao Martins, Wanpeng Li,
    Stephen Rothwell).
 
  - Fix the menu and teo cpuidle governors to allow the scheduler tick
    to be stopped if PM QoS is used to limit the CPU idle state exit
    latency in some cases (Rafael Wysocki).
 
  - Increase the resolution of the play_idle() argument to microseconds
    for more fine-grained injection of CPU idle cycles (Daniel Lezcano).
 
  - Switch over some users of cpuidle notifiers to the new QoS-based
    frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
    policy notifier events (Viresh Kumar).
 
  - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).
 
  - Add support for MT8183 and MT8516 to the mediatek cpufreq driver
    (Andrew-sh.Cheng, Fabien Parent).
 
  - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
    Huang).
 
  - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).
 
  - Update the qcom cpufreq driver (among other things, to make it
    easier to extend and to use kryo cpufreq for other nvmem-based
    SoCs) and add qcs404 support to it  (Niklas Cassel, Douglas
    RAILLARD, Sibi Sankar, Sricharan R).
 
  - Fix assorted issues and make assorted minor improvements in the
    cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
    Gustavo Silva, Hariprasad Kelam).
 
  - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
    Bergmann).
 
  - Add new Exynos PPMU events to devfreq events and extend that
    mechanism (Lukasz Luba).
 
  - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).
 
  - Improve devfreq documentation and governor code, fix spelling
    typos in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard
    Crestez, MyungJoo Ham, Gaël PORTAY).
 
  - Add regulators enable and disable to the OPP (operating performance
    points) framework (Kamil Konieczny).
 
  - Update the OPP framework to support multiple opp-suspend properties
    (Anson Huang).
 
  - Fix assorted issues and make assorted minor improvements in the OPP
    code (Niklas Cassel, Viresh Kumar, Yue Hu).
 
  - Clean up the generic power domains (genpd) framework (Ulf Hansson).
 
  - Clean up assorted pieces of power management code and documentation
    (Akinobu Mita, Amit Kucheria, Chuhong Yuan).
 
  - Update the pm-graph tool to version 5.5 including multiple fixes
    and improvements (Todd Brandt).
 
  - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
    Sébastien Szymanski).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAl2ArZ4SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxgfYQAK80hs43vWQDmp7XKrN4pQe8+qYULAGO
 fBfrFl+NG9y/cnuqnt3NtA8MoyNsMMkMLkpkEDMfSbYqqH5ehEzX5+uGJWiWx8+Y
 oH5KU8MH7Tj/utYaalGzDt0AHfHZDIGC0NCUNQJVtE/4mOANFabwsCwscp4MrD5Q
 WjFN8U4BrsmWgJdZ/U9QIWcDZ0I+1etCF+rZG2yxSv31FMq2Zk/Qm4YyobqCvQFl
 TR9rxl08wqUmIYIz5cDjt/3AKH7NLLDqOTstbCL7cmufM5XPFc1yox69xc89UrIa
 4AMgmDp7SMwFG/gdUPof0WQNmx7qxmiRAPleAOYBOZW/8jPNZk2y+RhM5NeF72m7
 AFqYiuxqatkSb4IsT8fLzH9IUZOdYr8uSmoMQECw+MHdApaKFjFV8Lb/qx5+AwkD
 y7pwys8dZSamAjAf62eUzJDWcEwkNrujIisGrIXrVHb7ISbweskMOmdAYn9p4KgP
 dfRzpJBJ45IaMIdbaVXNpg3rP7Apfs7X1X+/ZhG6f+zHH3zYwr8Y81WPqX8WaZJ4
 qoVCyxiVWzMYjY2/1lzjaAdqWojPWHQ3or3eBaK52DouyG3jY6hCDTLwU7iuqcCX
 jzAtrnqrNIKufvaObEmqcmYlIIOFT7QaJCtGUSRFQLfSon8fsVSR7LLeXoAMUJKT
 JWQenuNaJngK
 =TBDQ
 -----END PGP SIGNATURE-----

Merge tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management updates from Rafael Wysocki:
 "These include a rework of the main suspend-to-idle code flow (related
  to the handling of spurious wakeups), a switch over of several users
  of cpufreq notifiers to QoS-based limits, a new devfreq driver for
  Tegra20, a new cpuidle driver and governor for virtualized guests, an
  extension of the wakeup sources framework to expose wakeup sources as
  device objects in sysfs, and more.

  Specifics:

   - Rework the main suspend-to-idle control flow to avoid repeating
     "noirq" device resume and suspend operations in case of spurious
     wakeups from the ACPI EC and decouple the ACPI EC wakeups support
     from the LPS0 _DSM support (Rafael Wysocki).

   - Extend the wakeup sources framework to expose wakeup sources as
     device objects in sysfs (Tri Vo, Stephen Boyd).

   - Expose system suspend statistics in sysfs (Kalesh Singh).

   - Introduce a new haltpoll cpuidle driver and a new matching governor
     for virtualized guests wanting to do guest-side polling in the idle
     loop (Marcelo Tosatti, Joao Martins, Wanpeng Li, Stephen Rothwell).

   - Fix the menu and teo cpuidle governors to allow the scheduler tick
     to be stopped if PM QoS is used to limit the CPU idle state exit
     latency in some cases (Rafael Wysocki).

   - Increase the resolution of the play_idle() argument to microseconds
     for more fine-grained injection of CPU idle cycles (Daniel
     Lezcano).

   - Switch over some users of cpuidle notifiers to the new QoS-based
     frequency limits and drop the CPUFREQ_ADJUST and CPUFREQ_NOTIFY
     policy notifier events (Viresh Kumar).

   - Add new cpufreq driver based on nvmem for sun50i (Yangtao Li).

   - Add support for MT8183 and MT8516 to the mediatek cpufreq driver
     (Andrew-sh.Cheng, Fabien Parent).

   - Add i.MX8MN support to the imx-cpufreq-dt cpufreq driver (Anson
     Huang).

   - Add qcs404 to cpufreq-dt-platdev blacklist (Jorge Ramirez-Ortiz).

   - Update the qcom cpufreq driver (among other things, to make it
     easier to extend and to use kryo cpufreq for other nvmem-based
     SoCs) and add qcs404 support to it (Niklas Cassel, Douglas
     RAILLARD, Sibi Sankar, Sricharan R).

   - Fix assorted issues and make assorted minor improvements in the
     cpufreq code (Colin Ian King, Douglas RAILLARD, Florian Fainelli,
     Gustavo Silva, Hariprasad Kelam).

   - Add new devfreq driver for NVidia Tegra20 (Dmitry Osipenko, Arnd
     Bergmann).

   - Add new Exynos PPMU events to devfreq events and extend that
     mechanism (Lukasz Luba).

   - Fix and clean up the exynos-bus devfreq driver (Kamil Konieczny).

   - Improve devfreq documentation and governor code, fix spelling typos
     in devfreq (Ezequiel Garcia, Krzysztof Kozlowski, Leonard Crestez,
     MyungJoo Ham, Gaël PORTAY).

   - Add regulators enable and disable to the OPP (operating performance
     points) framework (Kamil Konieczny).

   - Update the OPP framework to support multiple opp-suspend properties
     (Anson Huang).

   - Fix assorted issues and make assorted minor improvements in the OPP
     code (Niklas Cassel, Viresh Kumar, Yue Hu).

   - Clean up the generic power domains (genpd) framework (Ulf Hansson).

   - Clean up assorted pieces of power management code and documentation
     (Akinobu Mita, Amit Kucheria, Chuhong Yuan).

   - Update the pm-graph tool to version 5.5 including multiple fixes
     and improvements (Todd Brandt).

   - Update the cpupower utility (Benjamin Weis, Geert Uytterhoeven,
     Sébastien Szymanski)"

* tag 'pm-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (126 commits)
  cpuidle-haltpoll: Enable kvm guest polling when dedicated physical CPUs are available
  cpuidle-haltpoll: do not set an owner to allow modunload
  cpuidle-haltpoll: return -ENODEV on modinit failure
  cpuidle-haltpoll: set haltpoll as preferred governor
  cpuidle: allow governor switch on cpuidle_register_driver()
  PM: runtime: Documentation: add runtime_status ABI document
  pm-graph: make setVal unbuffered again for python2 and python3
  powercap: idle_inject: Use higher resolution for idle injection
  cpuidle: play_idle: Increase the resolution to usec
  cpuidle-haltpoll: vcpu hotplug support
  cpufreq: Add qcs404 to cpufreq-dt-platdev blacklist
  cpufreq: qcom: Add support for qcs404 on nvmem driver
  cpufreq: qcom: Refactor the driver to make it easier to extend
  cpufreq: qcom: Re-organise kryo cpufreq to use it for other nvmem based qcom socs
  dt-bindings: opp: Add qcom-opp bindings with properties needed for CPR
  dt-bindings: opp: qcom-nvmem: Support pstates provided by a power domain
  Documentation: cpufreq: Update policy notifier documentation
  cpufreq: Remove CPUFREQ_ADJUST and CPUFREQ_NOTIFY policy notifier events
  PM / Domains: Verify PM domain type in dev_pm_genpd_set_performance_state()
  PM / Domains: Simplify genpd_lookup_dev()
  ...
2019-09-17 19:15:14 -07:00
Linus Torvalds
7ad67ca553 for-5.4/block-2019-09-16
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1/no0QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpmo9EACFXMbdNmEEUMyRSdOkVLlr7ZlTyQi1tLpB
 YESDPxdBfybzpi0qa8JSaysGIfvSkSjmSAqBqrWPmASOSOL6CK4bbA4fTYbgPplk
 XeHUdgGiG34oCQUn8Xil5reYaTm7I6LQWnWTpVa5fIhAyUYaGJL+987ykoGmpQmB
 Dvf3YSc+8H0RTp9PCMVd6UCGPkZbVlLImGad3PF5ULvTEaE4RCXC2aiAgh0p1l5A
 J2CkRZ+/mio3zN2O4YN7VdPGfr1Wo1iZ834xbIGLegv1miHXagFk7jwTcC7zIt5t
 oSnJnqIg3iCe7SpWt4Bkzw/zy/2UqaspifbCMgw8vychlViVRUHFO5h85Yboo7kQ
 OMLEQPcwjm6dTHv5h1iXF9LW1O7NoiYmmgvApU9uOo1HUrl1X7PZ3JEfUsVHxkOO
 T4D5igf0Krsl1eAbiwEUQzy7vFZ8PlRHqrHgK+fkyotzHu1BJR7OQkYygEfGFOB/
 EfMxplGDpmibYGuWCwDX2bPAmLV3SPUQENReHrfPJRDt5TD1UkFpVGv/PLLhbr0p
 cLYI78DKpDSigBpVMmwq5nTYpnex33eyDTTA8C0sakcsdzdmU5qv30y3wm4nTiep
 f6gZo6IMXwRg/rCgVVrd9SKQAr/8wEzVlsDW3qyi2pVT8sHIgm0tFv7paihXGdDV
 xsKgmTrQQQ==
 =Qt+h
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block

Pull block updates from Jens Axboe:

 - Two NVMe pull requests:
     - ana log parse fix from Anton
     - nvme quirks support for Apple devices from Ben
     - fix missing bio completion tracing for multipath stack devices
       from Hannes and Mikhail
     - IP TOS settings for nvme rdma and tcp transports from Israel
     - rq_dma_dir cleanups from Israel
     - tracing for Get LBA Status command from Minwoo
     - Some nvme-tcp cleanups from Minwoo, Potnuri and Myself
     - Some consolidation between the fabrics transports for handling
       the CAP register
     - reset race with ns scanning fix for fabrics (move fabrics
       commands to a dedicated request queue with a different lifetime
       from the admin request queue)."
     - controller reset and namespace scan races fixes
     - nvme discovery log change uevent support
     - naming improvements from Keith
     - multiple discovery controllers reject fix from James
     - some regular cleanups from various people

 - Series fixing (and re-fixing) null_blk debug printing and nr_devices
   checks (André)

 - A few pull requests from Song, with fixes from Andy, Guoqing,
   Guilherme, Neil, Nigel, and Yufen.

 - REQ_OP_ZONE_RESET_ALL support (Chaitanya)

 - Bio merge handling unification (Christoph)

 - Pick default elevator correctly for devices with special needs
   (Damien)

 - Block stats fixes (Hou)

 - Timeout and support devices nbd fixes (Mike)

 - Series fixing races around elevator switching and device add/remove
   (Ming)

 - sed-opal cleanups (Revanth)

 - Per device weight support for BFQ (Fam)

 - Support for blk-iocost, a new model that can properly account cost of
   IO workloads. (Tejun)

 - blk-cgroup writeback fixes (Tejun)

 - paride queue init fixes (zhengbin)

 - blk_set_runtime_active() cleanup (Stanley)

 - Block segment mapping optimizations (Bart)

 - lightnvm fixes (Hans/Minwoo/YueHaibing)

 - Various little fixes and cleanups

* tag 'for-5.4/block-2019-09-16' of git://git.kernel.dk/linux-block: (186 commits)
  null_blk: format pr_* logs with pr_fmt
  null_blk: match the type of parameter nr_devices
  null_blk: do not fail the module load with zero devices
  block: also check RQF_STATS in blk_mq_need_time_stamp()
  block: make rq sector size accessible for block stats
  bfq: Fix bfq linkage error
  raid5: use bio_end_sector in r5_next_bio
  raid5: remove STRIPE_OPS_REQ_PENDING
  md: add feature flag MD_FEATURE_RAID0_LAYOUT
  md/raid0: avoid RAID0 data corruption due to layout confusion.
  raid5: don't set STRIPE_HANDLE to stripe which is in batch list
  raid5: don't increment read_errors on EILSEQ return
  nvmet: fix a wrong error status returned in error log page
  nvme: send discovery log page change events to userspace
  nvme: add uevent variables for controller devices
  nvme: enable aen regardless of the presence of I/O queues
  nvme-fabrics: allow discovery subsystems accept a kato
  nvmet: Use PTR_ERR_OR_ZERO() in nvmet_init_discovery()
  nvme: Remove redundant assignment of cq vector
  nvme: Assign subsys instance from first ctrl
  ...
2019-09-17 16:57:47 -07:00
Linus Torvalds
1e6fa3a33e for-5.4/io_uring-2019-09-15
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl1+zs8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpvIOD/9P4MhnjTc9sFz978YGw2kP/yUjW0Vs194B
 nV5wmM+J3I6VUBgurNQeB7f14b/1CZnnPjisLKGzz1SKrhRQT/HuXzItsYpzRql3
 3GvD4aRNJd52pfkX3p7tA8uMr5Hh2gnNLhJtlPTaaAiIc8XCme+hHMnlmhwqMZ0T
 FKTc0Pw0yxvYIbxmt+pfWxoKhHrq12dYLG61Iv8XeSCb3k+EzdZV8jQdnN/TVhsF
 ngzxRMvbPjJ/TvDcsdV0qNqmLTojgmuiAvDs/3YPCreGVhMBkm4w/tlZGqTJHSdb
 Ljxzwp0Ungu94O2Ct3gN+pIxTJ5hgfnArfT40fbWczZrlXzUKw4d3mW/QG3g/Elf
 aUfCY+3WWR/Fs+XT5ZDasJDewwO4j5wYgO8VYeyTX08jobcxC/osqoMtGUqsljwJ
 CTWMYSD8MTXEnOIZFK9sgjg0AoJYVDE8SXN/zx+7K7iVk9+5xQTkuL/cCd+3/Nih
 73QqvwpRm0Tik4yLGcw+EJc2ttvNUsI5EEETloiGNaMeK1oOJvzSu8qhh0G9VdYo
 DWka5Yo+kLzwuUMbtzVSq288Un+RjbUIoqrqMFqm5QLqUYYg735IH2McIZvIEHEj
 o0vjudw/yxcpURGqV5cokeusIHex/LC8hItCwHY4U9959A2yOtiCs/Bi7LIj3lAu
 FO5gTlGDHA==
 =KSaR
 -----END PGP SIGNATURE-----

Merge tag 'for-5.4/io_uring-2019-09-15' of git://git.kernel.dk/linux-block

Pull io_uring updates from Jens Axboe:

 - Allocate SQ/CQ ring together, more efficient. Expose this through a
   feature flag as well, so we can reduce the number of mmaps by 1
   (Hristo and me)

 - Fix for sequence logic with SQ thread (Jackie).

 - Add support for links with drain commands (Jackie).

 - Improved async merging (me)

 - Improved buffered async write performance (me)

 - Support SQ poll wakeup + event get in single io_uring_enter() (me)

 - Support larger SQ ring size. For epoll conversions, the 4k limit was
   too small for some prod workloads (Daniel).

 - put_user_page() usage (John)

* tag 'for-5.4/io_uring-2019-09-15' of git://git.kernel.dk/linux-block:
  io_uring: increase IORING_MAX_ENTRIES to 32K
  io_uring: make sqpoll wakeup possible with getevents
  io_uring: extend async work merging
  io_uring: limit parallelism of buffered writes
  io_uring: add io_queue_async_work() helper
  io_uring: optimize submit_and_wait API
  io_uring: add support for link with drain
  io_uring: fix wrong sequence setting logic
  io_uring: expose single mmap capability
  io_uring: allocate the two rings together
  fs/io_uring.c: convert put_page() to put_user_page*()
2019-09-17 16:50:19 -07:00
Linus Torvalds
7c672abc12 It's a somewhat calmer cycle for docs this time, as the churn of the mass
RST conversion is happily mostly behind us.
 
  - A new document on reproducible builds.
 
  - We finally got around to zapping the documentation for hardware support
    that was removed in 2004; one doesn't want to rush these things.
 
  - The usual assortment of fixes, typo corrections, etc.
 
 You'll still find a handful of annoying conflicts against other trees,
 mostly tied to the last RST conversions; resolutions are straightforward
 and the linux-next ones are good.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAl1/J4IACgkQF0NaE2wM
 flhYogf9EgYozCe8RocSq+JjJpZOSFjIGDQv+GwTjOBIdqgO9tSIaY/p0wSkYKil
 jYXyMDF+Xwr8podsUep2F7akBM7j9XJ+XBGJcfOna0ypC9xoejMgWt9fU3YvaWge
 dQJxIQ/iwkDlKNx6uOYgKysLUWFS0EP/nzPhqBo4bZZzhugvrR46D/nQqFNmGihd
 l9yLalJtP5mC0XRUv3hpdAFFFKxdC0R3BGOel2V+slSClp0LEgpdMAuMaKydEDI3
 Ch9ZpIp8fB8kqONCs9/X6083WRsDOMe28KgeGrGHo4Jla6u51QBLQjSVKttFv7xk
 051yNJwDWMxgl+A4gyNLDPXM7Gd7HQ==
 =v4dp
 -----END PGP SIGNATURE-----

Merge tag 'docs-5.4' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
 "It's a somewhat calmer cycle for docs this time, as the churn of the
  mass RST conversion is happily mostly behind us.

   - A new document on reproducible builds.

   - We finally got around to zapping the documentation for hardware
     support that was removed in 2004; one doesn't want to rush these
     things.

   - The usual assortment of fixes, typo corrections, etc"

* tag 'docs-5.4' of git://git.lwn.net/linux: (67 commits)
  Documentation: kbuild: Add document about reproducible builds
  docs: printk-formats: Stop encouraging use of unnecessary %h[xudi] and %hh[xudi]
  Documentation: Add "earlycon=sbi" to the admin guide
  doc🔒 remove reference to clever use of read-write lock
  devices.txt: improve entry for comedi (char major 98)
  docs: mtd: Update spi nor reference driver
  doc: arm64: fix grammar dtb placed in no attributes region
  Documentation: sysrq: don't recommend 'S' 'U' before 'B'
  mailmap: Update email address for Quentin Perret
  docs: ftrace: clarify when tracing is disabled by the trace file
  docs: process: fix broken link
  Documentation/arm/samsung-s3c24xx: Remove stray U+FEFF character to fix title
  Documentation/arm/sa1100/assabet: Fix 'make assabet_defconfig' command
  Documentation/arm/sa1100: Remove some obsolete documentation
  docs/zh_CN: update Chinese howto.rst for latexdocs making
  Documentation: virt: Fix broken reference to virt tree's index
  docs: Fix typo on pull requests guide
  kernel-doc: Allow anonymous enum
  Documentation: sphinx: Don't parse socket() as identifier reference
  Documentation: sphinx: Add missing comma to list of strings
  ...
2019-09-17 16:22:26 -07:00
Sahitya Tummala
fbbf779989 f2fs: add a condition to detect overflow in f2fs_ioc_gc_range()
end = range.start + range.len;

If the range.start/range.len is a very large value, then end can overflow
in this operation. It results into a crash in get_valid_blocks() when
accessing the invalid range.start segno.

This issue is reported in ioctl fuzz testing.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-17 13:56:15 -07:00
Linus Torvalds
7f2444d38f Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core timer updates from Thomas Gleixner:
 "Timers and timekeeping updates:

   - A large overhaul of the posix CPU timer code which is a preparation
     for moving the CPU timer expiry out into task work so it can be
     properly accounted on the task/process.

     An update to the bogus permission checks will come later during the
     merge window as feedback was not complete before heading of for
     travel.

   - Switch the timerqueue code to use cached rbtrees and get rid of the
     homebrewn caching of the leftmost node.

   - Consolidate hrtimer_init() + hrtimer_init_sleeper() calls into a
     single function

   - Implement the separation of hrtimers to be forced to expire in hard
     interrupt context even when PREEMPT_RT is enabled and mark the
     affected timers accordingly.

   - Implement a mechanism for hrtimers and the timer wheel to protect
     RT against priority inversion and live lock issues when a (hr)timer
     which should be canceled is currently executing the callback.
     Instead of infinitely spinning, the task which tries to cancel the
     timer blocks on a per cpu base expiry lock which is held and
     released by the (hr)timer expiry code.

   - Enable the Hyper-V TSC page based sched_clock for Hyper-V guests
     resulting in faster access to timekeeping functions.

   - Updates to various clocksource/clockevent drivers and their device
     tree bindings.

   - The usual small improvements all over the place"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (101 commits)
  posix-cpu-timers: Fix permission check regression
  posix-cpu-timers: Always clear head pointer on dequeue
  hrtimer: Add a missing bracket and hide `migration_base' on !SMP
  posix-cpu-timers: Make expiry_active check actually work correctly
  posix-timers: Unbreak CONFIG_POSIX_TIMERS=n build
  tick: Mark sched_timer to expire in hard interrupt context
  hrtimer: Add kernel doc annotation for HRTIMER_MODE_HARD
  x86/hyperv: Hide pv_ops access for CONFIG_PARAVIRT=n
  posix-cpu-timers: Utilize timerqueue for storage
  posix-cpu-timers: Move state tracking to struct posix_cputimers
  posix-cpu-timers: Deduplicate rlimit handling
  posix-cpu-timers: Remove pointless comparisons
  posix-cpu-timers: Get rid of 64bit divisions
  posix-cpu-timers: Consolidate timer expiry further
  posix-cpu-timers: Get rid of zero checks
  rlimit: Rewrite non-sensical RLIMIT_CPU comment
  posix-cpu-timers: Respect INFINITY for hard RTTIME limit
  posix-cpu-timers: Switch thread group sampling to array
  posix-cpu-timers: Restructure expiry array
  posix-cpu-timers: Remove cputime_expires
  ...
2019-09-17 12:35:15 -07:00
Colin Ian King
aa28b98d6d unicode: make array 'token' static const, makes object smaller
Don't populate the array 'token' on the stack but instead make it
static const. Makes the object code smaller by 234 bytes.

Before:
   text	   data	    bss	    dec	    hex	filename
   5371	    272	      0	   5643	   160b	fs/unicode/utf8-core.o

After:
   text	   data	    bss	    dec	    hex	filename
   5041	    368	      0	   5409	   1521	fs/unicode/utf8-core.o

(gcc version 9.2.1, amd64)

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2019-09-17 11:48:24 -04:00
Krzysztof Wilczynski
334b427e96 unicode: Move static keyword to the front of declarations
Move the static keyword to the front of declarations of nfdi_test_data
and nfdicf_test_data, and resolve the following compiler warnings that
can be seen when building with warnings enabled (W=1):

fs/unicode/utf8-selftest.c:38:1: warning:
  ‘static’ is not at beginning of declaration [-Wold-style-declaration]

fs/unicode/utf8-selftest.c:92:1: warning:
  ‘static’ is not at beginning of declaration [-Wold-style-declaration]

Signed-off-by: Krzysztof Wilczynski <kw@linux.com>
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
2019-09-17 11:48:11 -04:00
Bob Peterson
f0b444b349 gfs2: clear buf_in_tr when ending a transaction in sweep_bh_for_rgrps
In function sweep_bh_for_rgrps, which is a helper for punch_hole,
it uses variable buf_in_tr to keep track of when it needs to commit
pending block frees on a partial delete that overflows the
transaction created for the delete. The problem is that the
variable was initialized at the start of function sweep_bh_for_rgrps
but it was never cleared, even when starting a new transaction.

This patch reinitializes the variable when the transaction is
ended, so the next transaction starts out with it cleared.

Fixes: d552a2b9b3 ("GFS2: Non-recursive delete")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2019-09-17 16:50:50 +02:00
Rafael J. Wysocki
d281706369 Merge branch 'pm-sleep'
* pm-sleep: (29 commits)
  ACPI: PM: s2idle: Always set up EC GPE for system wakeup
  ACPI: PM: s2idle: Avoid rearming SCI for wakeup unnecessarily
  PM / wakeup: Unexport wakeup_source_sysfs_{add,remove}()
  PM / wakeup: Register wakeup class kobj after device is added
  PM / wakeup: Fix sysfs registration error path
  PM / wakeup: Show wakeup sources stats in sysfs
  PM / wakeup: Use wakeup_source_register() in wakelock.c
  PM / wakeup: Drop wakeup_source_init(), wakeup_source_prepare()
  PM: sleep: Replace strncmp() with str_has_prefix()
  PM: suspend: Fix platform_suspend_prepare_noirq()
  intel-hid: Disable button array during suspend-to-idle
  intel-hid: intel-vbtn: Avoid leaking wakeup_mode set
  ACPI: PM: s2idle: Execute LPS0 _DSM functions with suspended devices
  ACPI: EC: PM: Make acpi_ec_dispatch_gpe() print debug message
  ACPI: EC: PM: Consolidate some code depending on PM_SLEEP
  ACPI: PM: s2idle: Eliminate acpi_sleep_no_ec_events()
  ACPI: PM: s2idle: Switch EC over to polling during "noirq" suspend
  ACPI: PM: s2idle: Add acpi.sleep_no_lps0 module parameter
  ACPI: PM: s2idle: Rearrange lps0_device_attach()
  PM/sleep: Expose suspend stats in sysfs
  ...
2019-09-17 09:36:34 +02:00
Steve French
4d6bcba70a cifs: update internal module version number
To 2.23

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 19:18:39 -05:00
Aurelien Aptel
e37a02c7eb cifs: modefromsid: write mode ACE first
DACL should start with mode ACE first but we are putting it at the
end. reorder them to put it first.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 18:49:11 -05:00
Aurelien Aptel
352f2c9a57 cifs: cifsroot: add more err checking
make cifs more verbose about buffer size errors
and add some comments

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:39 -05:00
Steve French
c3498185b7 smb3: add missing worker function for SMB3 change notify
SMB3 change notify is important to allow applications to wait
on directory change events of different types (e.g. adding
and deleting files from others systems). Add worker functions
for this.

Acked-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:39 -05:00
Paulo Alcantara (SUSE)
8eecd1c2e5 cifs: Add support for root file systems
Introduce a new CONFIG_CIFS_ROOT option to handle root file systems
over a SMB share.

In order to mount the root file system during the init process, make
cifs.ko perform non-blocking socket operations while mounting and
accessing it.

Cc: Steve French <smfrench@gmail.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:38 -05:00
Aurelien Aptel
0892ba693f cifs: modefromsid: make room for 4 ACE
when mounting with modefromsid, we end up writing 4 ACE in a security
descriptor that only has room for 3, thus triggering an out-of-bounds
write. fix this by changing the min size of a security descriptor.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:38 -05:00
Steve French
2255397c33 smb3: fix potential null dereference in decrypt offload
commit a091c5f67c99 ("smb3: allow parallelizing decryption of reads")
had a potential null dereference

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:38 -05:00
Steve French
96d9f7ed00 smb3: fix unmount hang in open_shroot
An earlier patch "CIFS: fix deadlock in cached root handling"
did not completely address the deadlock in open_shroot. This
patch addresses the deadlock.

In testing the recent patch:
  smb3: improve handling of share deleted (and share recreated)
we were able to reproduce the open_shroot deadlock to one
of the target servers in unmount in a delete share scenario.

Fixes: 7e5a70ad88 ("CIFS: fix deadlock in cached root handling")

This is version 2 of this patch. An earlier version of this
patch "smb3: fix unmount hang in open_shroot" had a problem
found by Dan.

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>

Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
2019-09-16 11:43:38 -05:00
Steve French
3e7a02d478 smb3: allow disabling requesting leases
In some cases to work around server bugs or performance
problems it can be helpful to be able to disable requesting
SMB2.1/SMB3 leases on a particular mount (not to all servers
and all shares we are mounted to). Add new mount parm
"nolease" which turns off requesting leases on directory
or file opens.  Currently the only way to disable leases is
globally through a module load parameter. This is more
granular.

Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
2019-09-16 11:43:38 -05:00
Steve French
7dcc82c2df smb3: improve handling of share deleted (and share recreated)
When a share is deleted, returning EIO is confusing and no useful
information is logged.  Improve the handling of this case by
at least logging a better error for this (and also mapping the error
differently to EREMCHG).  See e.g. the new messages that would be logged:

[55243.639530] server share \\192.168.1.219\scratch deleted
[55243.642568] CIFS VFS: \\192.168.1.219\scratch BAD_NETWORK_NAME: \\192.168.1.219\scratch

In addition for the case where a share is deleted and then recreated
with the same name, have now fixed that so it works. This is sometimes
done for example, because the admin had to move a share to a different,
bigger local drive when a share is running low on space.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Steve French
1b63f1840e smb3: display max smb3 requests in flight at any one time
Displayed in /proc/fs/cifs/Stats once for each
socket we are connected to.

This allows us to find out what the maximum number of
requests that had been in flight (at any one time). Note that
/proc/fs/cifs/Stats can be reset if you want to look for
maximum over a small period of time.

Sample output (immediately after mount):

Resources in use
CIFS Session: 1
Share (unique mount targets): 2
SMB Request/Response Buffer: 1 Pool size: 5
SMB Small Req/Resp Buffer: 1 Pool size: 30
Operations (MIDs): 0

0 session 0 share reconnects
Total vfs operations: 5 maximum at one time: 2

Max requests in flight: 2
1) \\localhost\scratch
SMBs: 18
Bytes read: 0  Bytes written: 0
...

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-09-16 11:43:38 -05:00
Steve French
10328c44cc smb3: only offload decryption of read responses if multiple requests
No point in offloading read decryption if no other requests on the
wire

Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Ronnie Sahlberg
496902dc17 cifs: add a helper to find an existing readable handle to a file
and convert smb2_query_path_info() to use it.
This will eliminate the need for a SMB2_Create when we already have an
open handle that can be used. This will also prevent a oplock break
in case the other handle holds a lease.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:38 -05:00
Steve French
563317ec30 smb3: enable offload of decryption of large reads via mount option
Disable offload of the decryption of encrypted read responses
by default (equivalent to setting this new mount option "esize=0").

Allow setting the minimum encrypted read response size that we
will choose to offload to a worker thread - it is now configurable
via on a new mount option "esize="

Depending on which encryption mechanism (GCM vs. CCM) and
the number of reads that will be issued in parallel and the
performance of the network and CPU on the client, it may make
sense to enable this since it can provide substantial benefit when
multiple large reads are in flight at the same time.

Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Steve French
35cf94a397 smb3: allow parallelizing decryption of reads
decrypting large reads on encrypted shares can be slow (e.g. adding
multiple milliseconds per-read on non-GCM capable servers or
when mounting with dialects prior to SMB3.1.1) - allow parallelizing
of read decryption by launching worker threads.

Testing to Samba on localhost showed 25% improvement.
Testing to remote server showed very large improvement when
doing more than one 'cp' command was called at one time.

Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Ronnie Sahlberg
3175eb9b57 cifs: add a debug macro that prints \\server\share for errors
Where we have a tcon available we can log \\server\share as part
of the message. Only do this for the VFS log level.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:38 -05:00
Steve French
46f17d1768 smb3: fix signing verification of large reads
Code cleanup in the 5.1 kernel changed the array
passed into signing verification on large reads leading
to warning messages being logged when copying files to local
systems from remote.

   SMB signature verification returned error = -5

This changeset fixes verification of SMB3 signatures of large
reads.

Suggested-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Steve French
4f5c10f1ad smb3: allow skipping signature verification for perf sensitive configurations
Add new mount option "signloosely" which enables signing but skips the
sometimes expensive signing checks in the responses (signatures are
calculated and sent correctly in the SMB2/SMB3 requests even with this
mount option but skipped in the responses).  Although weaker for security
(and also data integrity in case a packet were corrupted), this can provide
enough of a performance benefit (calculating the signature to verify a
packet can be expensive especially for large packets) to be useful in
some cases.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Steve French
f90f979726 smb3: add dynamic tracepoints for flush and close
We only had dynamic tracepoints on errors in flush
and close, but may be helpful to trace enter
and non-error exits for those.  Sample trace examples
(excerpts) from "cp" and "dd" show two of the new
tracepoints.

  cp-22823 [002] .... 123439.179701: smb3_enter: _cifsFileInfo_put: xid=10
  cp-22823 [002] .... 123439.179705: smb3_close_enter: xid=10 sid=0x98871327 tid=0xfcd585ff fid=0xc7f84682
  cp-22823 [002] .... 123439.179711: smb3_cmd_enter: sid=0x98871327 tid=0xfcd585ff cmd=6 mid=43
  cp-22823 [002] .... 123439.180175: smb3_cmd_done: sid=0x98871327 tid=0xfcd585ff cmd=6 mid=43
  cp-22823 [002] .... 123439.180179: smb3_close_done: xid=10 sid=0x98871327 tid=0xfcd585ff fid=0xc7f84682

  dd-22981 [003] .... 123696.946011: smb3_flush_enter: xid=24 sid=0x98871327 tid=0xfcd585ff fid=0x1917736f
  dd-22981 [003] .... 123696.946013: smb3_cmd_enter: sid=0x98871327 tid=0xfcd585ff cmd=7 mid=123
  dd-22981 [003] .... 123696.956639: smb3_cmd_done: sid=0x98871327 tid=0x0 cmd=7 mid=123
  dd-22981 [003] .... 123696.956644: smb3_flush_done: xid=24 sid=0x98871327 tid=0xfcd585ff fid=0x1917736f

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Steve French
cae53f70f8 smb3: log warning if CSC policy conflicts with cache mount option
If the server config (e.g. Samba smb.conf "csc policy = disable)
for the share indicates that the share should not be cached, log
a warning message if forced client side caching ("cache=ro" or
"cache=singleclient") is requested on mount.

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:38 -05:00
Steve French
41e033fecd smb3: add mount option to allow RW caching of share accessed by only 1 client
If a share is known to be only to be accessed by one client, we
can aggressively cache writes not just reads to it.

Add "cache=" option (cache=singleclient) for mounting read write shares
(that will not be read or written to from other clients while we have
it mounted) in order to improve performance.

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:38 -05:00
Steve French
1981ebaabd smb3: add some more descriptive messages about share when mounting cache=ro
Add some additional logging so the user can see if the share they
mounted with cache=ro is considered read only by the server

CIFS: Attempting to mount //localhost/test
CIFS VFS: mounting share with read only caching. Ensure that the share will not be modified while in use.
CIFS VFS: read only mount of RW share

CIFS: Attempting to mount //localhost/test-ro
CIFS VFS: mounting share with read only caching. Ensure that the share will not be modified while in use.
CIFS VFS: mounted to read only share

Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
2019-09-16 11:43:37 -05:00
Steve French
83bbfa706d smb3: add mount option to allow forced caching of read only share
If a share is immutable (at least for the period that it will
be mounted) it would be helpful to not have to revalidate
dentries repeatedly that we know can not be changed remotely.

Add "cache=" option (cache=ro) for mounting read only shares
in order to improve performance in cases in which we know that
the share will not be changing while it is in use.

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Colin Ian King
ac6ad7a8c9 cifs: fix dereference on ses before it is null checked
The assignment of pointer server dereferences pointer ses, however,
this dereference occurs before ses is null checked and hence we
have a potential null pointer dereference.  Fix this by only
dereferencing ses after it has been null checked.

Addresses-Coverity: ("Dereference before null check")
Fixes: 2808c6639104 ("cifs: add new debugging macro cifs_server_dbg")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Ronnie Sahlberg
afe6f65353 cifs: add new debugging macro cifs_server_dbg
which can be used from contexts where we have a TCP_Server_Info *server.
This new macro will prepend the debugging string with "Server:<servername> "
which will help when debugging issues on hosts with many cifs connections
to several different servers.

Convert a bunch of cifs_dbg(VFS) calls to cifs_server_dbg(VFS)

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Ronnie Sahlberg
dc9300a670 cifs: use existing handle for compound_op(OP_SET_INFO) when possible
If we already have a writable handle for a path we want to set the
attributes for then use that instead of a create/set-info/close compound.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Ronnie Sahlberg
8de9e86c67 cifs: create a helper to find a writeable handle by path name
rename() takes a path for old_file and in SMB2 we used to just create
a compound for create(old_path)/rename/close().
If we already have a writable handle we can avoid the create() and close()
altogether and just use the existing handle.

For this situation, as we avoid doing the create()
we also avoid triggering an oplock break for the existing handle.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
YueHaibing
31ebdc1134 cifs: remove set but not used variables
Fixes gcc '-Wunused-but-set-variable' warning:

fs/cifs/file.c: In function cifs_lock:
fs/cifs/file.c:1696:24: warning: variable cinode set but not used [-Wunused-but-set-variable]
fs/cifs/file.c: In function cifs_write:
fs/cifs/file.c:1765:23: warning: variable cifs_sb set but not used [-Wunused-but-set-variable]
fs/cifs/file.c: In function collect_uncached_read_data:
fs/cifs/file.c:3578:20: warning: variable tcon set but not used [-Wunused-but-set-variable]

'cinode' is never used since introduced by
commit 03776f4516 ("CIFS: Simplify byte range locking code")
'cifs_sb' is not used since commit cb7e9eabb2 ("CIFS: Use
multicredits for SMB 2.1/3 writes").
'tcon' is not used since commit d26e2903fc ("smb3: fix bytes_read statistics")

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Steve French
df58fae724 smb3: Incorrect size for netname negotiate context
It is not null terminated (length was off by two).

Also see similar change to Samba:

https://gitlab.com/samba-team/samba/merge_requests/666

Reported-by: Stefan Metzmacher <metze@samba.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
zhengbin
2617474bfa cifs: remove unused variable
In smb3_punch_hole, variable cifsi set but not used, remove it.
In cifs_lock, variable netfid set but not used, remove it.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Colin Ian King
1efd4fc72e cifs: remove redundant assignment to variable rc
Variable rc is being initialized with a value that is never read
and rc is being re-assigned a little later on. The assignment is
redundant and hence can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Steve French
59519803a9 smb3: add missing flag definitions
SMB3 and 3.1.1 added two additional flags including
the priority mask.  Add them to our protocol definitions
in smb2pdu.h.  See MS-SMB2 2.2.1.2

Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
2019-09-16 11:43:37 -05:00
Ronnie Sahlberg
0e90696dc2 cifs: add passthrough for smb2 setinfo
Add support to send smb2 set-info commands from userspace.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Paulo Alcantara <palcantara@suse.de>
2019-09-16 11:43:37 -05:00
Ronnie Sahlberg
86e14e1205 cifs: prepare SMB2_Flush to be usable in compounds
Create smb2_flush_init() and smb2_flush_free() so we can use the flush command
in compounds.

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Steve French
22442179a5 cifs: allow chmod to set mode bits using special sid
When mounting with "modefromsid" set mode bits (chmod) by
    adding ACE with special SID (S-1-5-88-3-<mode>) to the ACL.
    Subsequent patch will fix setting default mode on file
    create and mkdir.

    See See e.g.
        https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/hh509017(v=ws.10)

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Steve French
e2f8fbfb8d cifs: get mode bits from special sid on stat
When mounting with "modefromsid" retrieve mode bits from
special SID (S-1-5-88-3) on stat.  Subsequent patch will fix
setattr (chmod) to save mode bits in S-1-5-88-3-<mode>

Note that when an ACE matching S-1-5-88-3 is not found, we
default the mode to an approximation based on the owner, group
and everyone permissions (as with the "cifsacl" mount option).

See See e.g.
    https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2008-R2-and-2008/hh509017(v=ws.10)

Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Colin Ian King
1afdea4f19 fs: cifs: cifsssmb: remove redundant assignment to variable ret
The variable ret is being initialized however this is never read
and later it is being reassigned to a new value. The initialization
is redundant and hence can be removed.

Addresses-Coverity: ("Unused Value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Ronnie Sahlberg
becc2ba26a cifs: fix a comment for the timeouts when sending echos
Clarify a trivial comment

Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
2019-09-16 11:43:37 -05:00
Chao Yu
8223ecc456 f2fs: fix to add missing F2FS_IO_ALIGNED() condition
In f2fs_allocate_data_block(), we will reset fio.retry for IO
alignment feature instead of IO serialization feature.

In addition, spread F2FS_IO_ALIGNED() to check IO alignment
feature status explicitly.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:49 -07:00
Chao Yu
9720ee80aa f2fs: fix to fallback to buffered IO in IO aligned mode
In LFS mode, we allow OPU for direct IO, however, we didn't consider
IO alignment feature, so direct IO can trigger unaligned IO, let's
just fallback to buffered IO to keep correct IO alignment semantics
in all places.

Fixes: f847c699cf ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:49 -07:00
Chao Yu
05e360061c f2fs: fix to handle error path correctly in f2fs_map_blocks
In f2fs_map_blocks(), we should bail out once __allocate_data_block()
failed.

Fixes: f847c699cf ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:49 -07:00
Chao Yu
86f35dc39e f2fs: fix extent corrupotion during directIO in LFS mode
In LFS mode, por_fsstress testcase reports a bug as below:

[ASSERT] (fsck_chk_inode_blk: 931)  --> ino: 0x12fe has wrong ext: [pgofs:142, blk:215424, len:16]

Since commit f847c699cf ("f2fs: allow out-place-update for direct
IO in LFS mode"), we start to allow OPU mode for direct IO, however,
we missed to update extent cache in __allocate_data_block(), finally,
it cause extent field being inconsistent with physical block address,
fix it.

Fixes: f847c699cf ("f2fs: allow out-place-update for direct IO in LFS mode")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:49 -07:00
Surbhi Palande
1166c1f2f6 f2fs: check all the data segments against all node ones
As a part of the sanity checking while mounting, distinct segment number
assignment to data and node segments is verified. Fixing a small bug in
this verification between node and data segments. We need to check all
the data segments with all the node segments.

Fixes: 042be0f849 ("f2fs: fix to do sanity check with current segment number")
Signed-off-by: Surbhi Palande <csurbhi@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:48 -07:00
Lockywolf
bd7253bc5e f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY
Signed-off-by: Lockywolf <lockywolf@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:48 -07:00
Goldwyn Rodrigues
cb8434f164 f2fs: fix inode rwsem regression
This is similar to 942491c9e6 ("xfs: fix AIM7 regression")
Apparently our current rwsem code doesn't like doing the trylock, then
lock for real scheme.  So change our read/write methods to just do the
trylock for the RWF_NOWAIT case.

We don't need a check for IOCB_NOWAIT and !direct-IO because it
is checked in generic_write_checks().

Fixes: b91050a80c ("f2fs: add nowait aio support")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:45 -07:00
Chao Yu
9819403055 f2fs: fix to avoid accessing uninitialized field of inode page in is_alive()
If inode is newly created, inode page may not synchronize with inode cache,
so fields like .i_inline or .i_extra_isize could be wrong, in below call
path, we may access such wrong fields, result in failing to migrate valid
target block.

Thread A				Thread B
- f2fs_create
 - f2fs_add_link
  - f2fs_add_dentry
   - f2fs_init_inode_metadata
    - f2fs_add_inline_entry
     - f2fs_new_inode_page
     - f2fs_put_page
     : inode page wasn't updated with inode cache
					- gc_data_segment
					 - is_alive
					  - f2fs_get_node_page
					  - datablock_addr
					   - offset_in_addr
					   : access uninitialized fields

Fixes: 7a2af766af ("f2fs: enhance on-disk inode structure scalability")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:26 -07:00
Jaegeuk Kim
743b620cb0 f2fs: avoid infinite GC loop due to stale atomic files
If committing atomic pages is failed when doing f2fs_do_sync_file(), we can
get commited pages but atomic_file being still set like:

- inmem:    0, atomic IO:    4 (Max.   10), volatile IO:    0 (Max.    0)

If GC selects this block, we can get an infinite loop like this:

f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c
f2fs_submit_page_bio: dev = (253,7), ino = 2, page_index = 0x2359a8, oldaddr = 0x2359a8, newaddr = 0x2359a8, rw = READ(), type = COLD_DATA
f2fs_submit_read_bio: dev = (253,7)/(253,7), rw = READ(), DATA, sector = 18533696, size = 4096
f2fs_get_victim: dev = (253,7), type = No TYPE, policy = (Foreground GC, LFS-mode, Greedy), victim = 4355, cost = 1, ofs_unit = 1, pre_victim_secno = 4355, prefree = 0, free = 234
f2fs_iget: dev = (253,7), ino = 6247, pino = 5845, i_mode = 0x81b0, i_size = 319488, i_nlink = 1, i_blocks = 624, i_advise = 0x2c

In that moment, we can observe:

[Before]
Try to move 5084219 blocks (BG: 384508)
  - data blocks : 4962373 (274483)
  - node blocks : 121846 (110025)
Skipped : atomic write 4534686 (10)

[After]
Try to move 5088973 blocks (BG: 384508)
  - data blocks : 4967127 (274483)
  - node blocks : 121846 (110025)
Skipped : atomic write 4539440 (10)

So, refactor atomic_write flow like this:
1. start_atomic_write
 - add inmem_list and set atomic_file

2. write()
 - register it in inmem_pages

3. commit_atomic_write
 - if no error, f2fs_drop_inmem_pages()
 - f2fs_commit_inmme_pages() failed
   : __revoked_inmem_pages() was done
 - f2fs_do_sync_file failed
   : abort_atomic_write later

4. abort_atomic_write
 - f2fs_drop_inmem_pages

5. f2fs_drop_inmem_pages
 - clear atomic_file
 - remove inmem_list

Based on this change, when GC fails to move block in atomic_file,
f2fs_drop_inmem_pages_all() can call f2fs_drop_inmem_pages().

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-09-16 08:38:20 -07:00
Wenwen Wang
6a379f6745 jffs2: Fix memory leak in jffs2_scan_eraseblock() error path
In jffs2_scan_eraseblock(), 'sumptr' is allocated through kmalloc() if
'sumlen' is larger than 'buf_size'. However, it is not deallocated in the
following execution if jffs2_fill_scan_buf() fails, leading to a memory
leak bug. To fix this issue, free 'sumptr' before returning the error.

Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 22:42:41 +02:00
Christoph Hellwig
61b875e88a jffs2: Remove jffs2_gc_fetch_page and jffs2_gc_release_page
Merge these two helpers into the only callers to get rid of some
amazingly bad calling conventions.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 22:42:33 +02:00
Jia-Ju Bai
f2538f9993 jffs2: Fix possible null-pointer dereferences in jffs2_add_frag_to_fragtree()
In jffs2_add_frag_to_fragtree(), there is an if statement on line 223 to
check whether "this" is NULL:
    if (this)

When "this" is NULL, it is used at several places, such as on line 249:
    if (this->node)
and on line 260:
    if (newfrag->ofs > this->ofs)

Thus possible null-pointer dereferences may occur.

To fix these bugs, -EINVAL is returned when "this" is NULL.

These bugs are found by a static analysis tool STCheck written by us.

Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 22:42:10 +02:00
Wenwen Wang
9163e0184b ubifs: Fix memory leak bug in alloc_ubifs_info() error path
In ubifs_mount(), 'c' is allocated through kzalloc() in alloc_ubifs_info().
However, it is not deallocated in the following execution if
ubifs_fill_super() fails, leading to a memory leak bug. To fix this issue,
free 'c' before going to the 'out_deact' label.

Fixes: 1e51764a3c ("UBIFS: add new flash file system")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 22:12:20 +02:00
Wenwen Wang
7992e00469 ubifs: Fix memory leak in __ubifs_node_verify_hmac error path
In __ubifs_node_verify_hmac(), 'hmac' is allocated through kmalloc().
However, it is not deallocated in the following execution if
ubifs_node_calc_hmac() fails, leading to a memory leak bug. To fix this
issue, free 'hmac' before returning the error.

Fixes: 49525e5eec ("ubifs: Add helper functions for authentication support")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 22:11:58 +02:00
Wenwen Wang
ce4d8b16e6 ubifs: Fix memory leak in read_znode() error path
In read_znode(), the indexing node 'idx' is allocated by kmalloc().
However, it is not deallocated in the following execution if
ubifs_node_check_hash() fails, leading to a memory leak bug. To fix this
issue, free 'idx' before returning the error.

Fixes: 16a26b20d2 ("ubifs: authentication: Add hashes to index nodes")
Signed-off-by: Wenwen Wang <wenwen@cs.uga.edu>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 22:11:18 +02:00
Colin Ian King
cbc898d52c ubifs: Remove redundant assignment to pointer fname
The pointer fname is being assigned with a value that is never
read because the function returns after the assignment. The assignment
is redundant and can be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
2019-09-15 21:55:12 +02:00
Linus Torvalds
72dbcf7215 Revert "ext4: make __ext4_get_inode_loc plug"
This reverts commit b03755ad6f.

This is sad, and done for all the wrong reasons.  Because that commit is
good, and does exactly what it says: avoids a lot of small disk requests
for the inode table read-ahead.

However, it turns out that it causes an entirely unrelated problem: the
getrandom() system call was introduced back in 2014 by commit
c6e9d6f388 ("random: introduce getrandom(2) system call"), and people
use it as a convenient source of good random numbers.

But part of the current semantics for getrandom() is that it waits for
the entropy pool to fill at least partially (unlike /dev/urandom).  And
at least ArchLinux apparently has a systemd that uses getrandom() at
boot time, and the improvements in IO patterns means that existing
installations suddenly start hanging, waiting for entropy that will
never happen.

It seems to be an unlucky combination of not _quite_ enough entropy,
together with a particular systemd version and configuration.  Lennart
says that the systemd-random-seed process (which is what does this early
access) is supposed to not block any other boot activity, but sadly that
doesn't actually seem to be the case (possibly due bogus dependencies on
cryptsetup for encrypted swapspace).

The correct fix is to fix getrandom() to not block when it's not
appropriate, but that fix is going to take a lot more discussion.  Do we
just make it act like /dev/urandom by default, and add a new flag for
"wait for entropy"? Do we add a boot-time option? Or do we just limit
the amount of time it will wait for entropy?

So in the meantime, we do the revert to give us time to discuss the
eventual fix for the fundamental problem, at which point we can re-apply
the ext4 inode table access optimization.

Reported-by: Ahmed S. Darwish <darwish.07@gmail.com>
Cc: Ted Ts'o <tytso@mit.edu>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Alexander E. Patrakov <patrakov@gmail.com>
Cc: Lennart Poettering <mzxreary@0pointer.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-09-15 12:32:03 -07:00
Daniel Xu
5277deaab9 io_uring: increase IORING_MAX_ENTRIES to 32K
Some workloads can require far more than 4K oustanding entries. For
example memcached can have ~300K sockets over ~40 cores. Bumping the max
to 32K seems to work pretty well.

Reported-by: Dan Melnic <dmm@fb.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-14 17:06:22 -06:00
Jason Gunthorpe
75c66515e4 Merge tag 'v5.3-rc8' into rdma.git for-next
To resolve dependencies in following patches

mlx5_ib.h conflict resolved by keeing both hunks

Linux 5.3-rc8

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2019-09-13 16:59:51 -03:00
Linus Torvalds
1b304a1ae4 for-5.3-rc8-tag
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAl16aTEACgkQxWXV+ddt
 WDvICg//cSn5w+g6EnxbrAZ6IYQJ4GA7lZSk2i6Dc/lI3BTfs7Wj0SPRKd01pBjT
 N+wbqoOgubsS1jkNfJsGCN80XzSa0tvyQdbezj5ncgSPXp4FRlT0K24EUQNPaqbg
 SsvvxAOCerVN3Yj2qrHNWIS5qZ5/8/NjLXca1DJ/OYmrkKfhe+Z6/b9EuKffPnco
 erMnaeSvQ27hYkkcdM0DGcWDoHHAQrefGNjQzp5vncJNN1F7+EGLbcH31UwApk1K
 /hvOQ6Q6SoR/NKbQu3AitrR9u7v9uhWP9jHJZT46q1m89CzI4S5FjK2wKZFjPE6r
 0PGRqnpdaGAERaTo3s6jIqv/X2gzJkhhhzGMiPgPJCQbAH39f/fFGEX22TjG33Yq
 2CiGSIPnmKQ7HE494YLuSyHD/89SutMMCkbF0sFBoKuTnu2HQMn9r5Pk6bEKtvIY
 iTk75/WTXR02qWCVhTyNDa9QnxewQGJC1d1KNQ6MwbzBiYyG9S/DDZnjLJPNx7DF
 KAAANCDdyPpraLcmw2sD/qd1o10HfQmn9z1L2v3YvJBfjMe76SQFCP5WwaJRcjOm
 c3ScAX9bXeXJgH+E98kWc7T6p49IPdMDGAtArQmtjO4V8pFRuqG+2Ibg6Za/y5XZ
 fkaS5UY+XIk3TUpEqkWKMPMigM9a3jgHskyMgdRLQfVnoOc6Z+k=
 =KXB8
 -----END PGP SIGNATURE-----

Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
 "Here are two fixes, one of them urgent fixing a bug introduced in 5.2
  and reported by many users. It took time to identify the root cause,
  catching the 5.3 release is higly desired also to push the fix to 5.2
  stable tree.

  The bug is a mess up of return values after adding proper error
  handling and honestly the kind of bug that can cause sleeping
  disorders until it's caught. My appologies to everybody who was
  affected.

  Summary of what could happen:

  1) either a hang when committing a transaction, if this happens
     there's no risk of corruption, still the hang is very inconvenient
     and can't be resolved without a reboot

  2) writeback for some btree nodes may never be started and we end up
     committing a transaction without noticing that, this is really
     serious and that will lead to the "parent transid verify failed"
     messages"

* tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
  Btrfs: fix assertion failure during fsync and use of stale transaction
2019-09-13 09:48:47 +01:00
David Howells
74983ac20a vfs: Make fs_parse() handle fs_param_is_fd-type params better
Make fs_parse() handle fs_param_is_fd-type parameters that are passed a
string by converting it to an integer (in addition to handling direct fd
specification).

Also range check the integer.

[fix from  Yin Fengwei folded]

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-12 21:06:14 -04:00
David Howells
f32356261d vfs: Convert ramfs, shmem, tmpfs, devtmpfs, rootfs to use the new mount API
Convert the ramfs, shmem, tmpfs, devtmpfs and rootfs filesystems to the new
internal mount API as the old one will be obsoleted and removed.  This
allows greater flexibility in communication of mount parameters between
userspace, the VFS and the filesystem.

See Documentation/filesystems/mount_api.txt for more information.

Note that tmpfs is slightly tricky as it can contain embedded commas, so it
can't be trivially split up using strsep() to break on commas in
generic_parse_monolithic().  Instead, tmpfs has to supply its own generic
parser.

However, if tmpfs changes, then devtmpfs and rootfs, which are wrappers
around tmpfs or ramfs, must change too - and thus so must ramfs, so these
had to be converted also.

[AV: rewritten]

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hugh Dickins <hughd@google.com>
cc: linux-mm@kvack.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2019-09-12 21:05:34 -04:00
Jens Axboe
b2a9eadab8 io_uring: make sqpoll wakeup possible with getevents
The way the logic is setup in io_uring_enter() means that you can't wake
up the SQ poller thread while at the same time waiting (or polling) for
completions afterwards. There's no reason for that to be the case.

Reported-by: Lewis Baker <lbaker@fb.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12 14:19:16 -06:00
Jens Axboe
6d5d5ac522 io_uring: extend async work merging
We currently merge async work items if we see a strict sequential hit.
This helps avoid unnecessary workqueue switches when we don't need
them. We can extend this merging to cover cases where it's not a strict
sequential hit, but the IO still fits within the same page. If an
application is doing multiple requests within the same page, we don't
want separate workers waiting on the same page to complete IO. It's much
faster to let the first worker bring in the page, then operate on that
page from the same worker to complete the next request(s).

Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-12 14:18:48 -06:00
Colin Ian King
e6b998ab62 orangefs: remove redundant assignment to err
Variable err is initialized to a value that is never read and it
is re-assigned later.  The initialization is redundant and can
be removed.

Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-09-12 14:17:16 -04:00
Artur Świgoń
c42293a951 orangefs: Add octal zero prefix
This patch adds a missing zero to mode 755 specification required to
express it in octal numeral system.

Reported-by: Łukasz Wrochna <l.wrochna@samsung.com>
Signed-off-by: Artur Świgoń <a.swigon@partner.samsung.com>
Signed-off-by: Mike Marshall <hubcap@omnibond.com>
2019-09-12 14:17:16 -04:00
Filipe Manana
18dfa7117a Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
The lock_extent_buffer_io() returns 1 to the caller to tell it everything
went fine and the callers needs to start writeback for the extent buffer
(submit a bio, etc), 0 to tell the caller everything went fine but it does
not need to start writeback for the extent buffer, and a negative value if
some error happened.

When it's about to return 1 it tries to lock all pages, and if a try lock
on a page fails, and we didn't flush any existing bio in our "epd", it
calls flush_write_bio(epd) and overwrites the return value of 1 to 0 or
an error. The page might have been locked elsewhere, not with the goal
of starting writeback of the extent buffer, and even by some code other
than btrfs, like page migration for example, so it does not mean the
writeback of the extent buffer was already started by some other task,
so returning a 0 tells the caller (btree_write_cache_pages()) to not
start writeback for the extent buffer. Note that epd might currently have
either no bio, so flush_write_bio() returns 0 (success) or it might have
a bio for another extent buffer with a lower index (logical address).

Since we return 0 with the EXTENT_BUFFER_WRITEBACK bit set on the
extent buffer and writeback is never started for the extent buffer,
future attempts to writeback the extent buffer will hang forever waiting
on that bit to be cleared, since it can only be cleared after writeback
completes. Such hang is reported with a trace like the following:

  [49887.347053] INFO: task btrfs-transacti:1752 blocked for more than 122 seconds.
  [49887.347059]       Not tainted 5.2.13-gentoo #2
  [49887.347060] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [49887.347062] btrfs-transacti D    0  1752      2 0x80004000
  [49887.347064] Call Trace:
  [49887.347069]  ? __schedule+0x265/0x830
  [49887.347071]  ? bit_wait+0x50/0x50
  [49887.347072]  ? bit_wait+0x50/0x50
  [49887.347074]  schedule+0x24/0x90
  [49887.347075]  io_schedule+0x3c/0x60
  [49887.347077]  bit_wait_io+0x8/0x50
  [49887.347079]  __wait_on_bit+0x6c/0x80
  [49887.347081]  ? __lock_release.isra.29+0x155/0x2d0
  [49887.347083]  out_of_line_wait_on_bit+0x7b/0x80
  [49887.347084]  ? var_wake_function+0x20/0x20
  [49887.347087]  lock_extent_buffer_for_io+0x28c/0x390
  [49887.347089]  btree_write_cache_pages+0x18e/0x340
  [49887.347091]  do_writepages+0x29/0xb0
  [49887.347093]  ? kmem_cache_free+0x132/0x160
  [49887.347095]  ? convert_extent_bit+0x544/0x680
  [49887.347097]  filemap_fdatawrite_range+0x70/0x90
  [49887.347099]  btrfs_write_marked_extents+0x53/0x120
  [49887.347100]  btrfs_write_and_wait_transaction.isra.4+0x38/0xa0
  [49887.347102]  btrfs_commit_transaction+0x6bb/0x990
  [49887.347103]  ? start_transaction+0x33e/0x500
  [49887.347105]  transaction_kthread+0x139/0x15c

So fix this by not overwriting the return value (ret) with the result
from flush_write_bio(). We also need to clear the EXTENT_BUFFER_WRITEBACK
bit in case flush_write_bio() returns an error, otherwise it will hang
any future attempts to writeback the extent buffer, and undo all work
done before (set back EXTENT_BUFFER_DIRTY, etc).

This is a regression introduced in the 5.2 kernel.

Fixes: 2e3c25136a ("btrfs: extent_io: add proper error handling to lock_extent_buffer_for_io()")
Fixes: f4340622e0 ("btrfs: extent_io: Move the BUG_ON() in flush_write_bio() one level up")
Reported-by: Zdenek Sojka <zsojka@seznam.cz>
Link: https://lore.kernel.org/linux-btrfs/GpO.2yos.3WGDOLpx6t%7D.1TUDYM@seznam.cz/T/#u
Reported-by: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
Link: https://lore.kernel.org/linux-btrfs/5c4688ac-10a7-fb07-70e8-c5d31a3fbb38@profihost.ag/T/#t
Reported-by: Drazen Kacar <drazen.kacar@oradian.com>
Link: https://lore.kernel.org/linux-btrfs/DB8PR03MB562876ECE2319B3E579590F799C80@DB8PR03MB5628.eurprd03.prod.outlook.com/
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204377
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-12 13:37:25 +02:00
Filipe Manana
410f954cb1 Btrfs: fix assertion failure during fsync and use of stale transaction
Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.

That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.

In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:

1) evict_refill_and_join() will often change the transaction handle's
   block reserve (->block_rsv) and set its ->bytes_reserved field to a
   value greater than 0. If evict_refill_and_join() never commits the
   transaction, the eviction handler ends up decreasing the reference
   count (->use_count) of the transaction handle through the call to
   btrfs_end_transaction(), and after that point we have a transaction
   handle with a NULL ->block_rsv (which is the value prior to the
   transaction join from evict_refill_and_join()) and a ->bytes_reserved
   value greater than 0. If after the eviction/iput completes the inode
   logging path hits an error or it decides that it must fallback to a
   transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
   non-zero value from btrfs_log_dentry_safe(), and because of that
   non-zero value it tries to commit the transaction using a handle with
   a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
   the transaction commit hit an assertion failure at
   btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
   the ->block_rsv is NULL. The produced stack trace for that is like the
   following:

   [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
   [192922.917553] ------------[ cut here ]------------
   [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
   [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
   [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
   [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
   [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
   (...)
   [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
   [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
   [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
   [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
   [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
   [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
   [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
   [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
   [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [192922.925105] Call Trace:
   [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
   [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
   [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
   [192922.926731]  do_fsync+0x38/0x60
   [192922.927138]  __x64_sys_fdatasync+0x13/0x20
   [192922.927543]  do_syscall_64+0x60/0x1c0
   [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   (...)
   [192922.934077] ---[ end trace f00808b12068168f ]---

2) If evict_refill_and_join() decides to commit the transaction, it will
   be able to do it, since the nested transaction join only increments the
   transaction handle's ->use_count reference counter and it does not
   prevent the transaction from getting committed. This means that after
   eviction completes, the fsync logging path will be using a transaction
   handle that refers to an already committed transaction. What happens
   when using such a stale transaction can be unpredictable, we are at
   least having a use-after-free on the transaction handle itself, since
   the transaction commit will call kmem_cache_free() against the handle
   regardless of its ->use_count value, or we can end up silently losing
   all the updates to the log tree after that iput() in the logging path,
   or using a transaction handle that in the meanwhile was allocated to
   another task for a new transaction, etc, pretty much unpredictable
   what can happen.

In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).

The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
2019-09-12 13:37:19 +02:00
Mark Salyzyn
5c2e9f346b ovl: filter of trusted xattr results in audit
When filtering xattr list for reading, presence of trusted xattr
results in a security audit log.  However, if there is other content
no errno will be set, and if there isn't, the errno will be -ENODATA
and not -EPERM as is usually associated with a lack of capability.
The check does not block the request to list the xattrs present.

Switch to ns_capable_noaudit to reflect a more appropriate check.

Signed-off-by: Mark Salyzyn <salyzyn@android.com>
Cc: linux-security-module@vger.kernel.org
Cc: kernel-team@android.com
Cc: stable@vger.kernel.org # v3.18+
Fixes: a082c6f680 ("ovl: filter trusted xattr for non-admin")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-11 16:11:45 +02:00
Ding Xiang
97f024b917 ovl: Fix dereferencing possible ERR_PTR()
if ovl_encode_real_fh() fails, no memory was allocated
and the error in the error-valued pointer should be returned.

Fixes: 9b6faee074 ("ovl: check ERR_PTR() return value from ovl_encode_fh()")
Signed-off-by: Ding Xiang <dingxiang@cmss.chinamobile.com>
Cc: <stable@vger.kernel.org> # v4.16+
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2019-09-11 16:11:45 +02:00
Al Viro
e9c03af21c configfs: calculate the symlink target only once
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-11 12:46:14 +02:00
Al Viro
2743c515a1 configfs: make configfs_create() return inode
Get rid of the callback, deal with that and dentry in callers

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-11 12:46:10 +02:00
Christoph Hellwig
1cf7a003b0 configfs: factor dirent removal into helpers
Lots of duplicated code that benefits from a little consolidation.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-11 12:45:57 +02:00
Al Viro
351e5d869e configfs: fix a deadlock in configfs_symlink()
Configfs abuses symlink(2).  Unlike the normal filesystems, it
wants the target resolved at symlink(2) time, like link(2) would've
done.  The problem is that ->symlink() is called with the parent
directory locked exclusive, so resolving the target inside the
->symlink() is easily deadlocked.

Short of really ugly games in sys_symlink() itself, all we can
do is to unlock the parent before resolving the target and
relock it after.  However, that invalidates the checks done
by the caller of ->symlink(), so we have to
	* check that dentry is still where it used to be
(it couldn't have been moved, but it could've been unhashed)
	* recheck that it's still negative (somebody else
might've successfully created a symlink with the same name
while we were looking the target up)
	* recheck the permissions on the parent directory.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2019-09-11 12:45:49 +02:00
Jens Axboe
54a91f3bb9 io_uring: limit parallelism of buffered writes
All the popular filesystems need to grab the inode lock for buffered
writes. With io_uring punting buffered writes to async context, we
observe a lot of contention with all workers hamming this mutex.

For buffered writes, we generally don't need a lot of parallelism on
the submission side, as the flushing will take care of that for us.
Hence we don't need a deep queue on the write side, as long as we
can safely punt from the original submission context.

Add a workqueue with a limit of 2 that we can use for buffered writes.
This greatly improves the performance and efficiency of higher queue
depth buffered async writes with io_uring.

Reported-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 09:49:35 -06:00
Jens Axboe
18d9be1a97 io_uring: add io_queue_async_work() helper
Add a helper for queueing a request for async execution, in preparation
for optimizing it.

No functional change in this patch.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 09:13:05 -06:00
Jens Axboe
c576666863 io_uring: optimize submit_and_wait API
For some applications that end up using a submit-and-wait type of
approach for certain batches of IO, we can make that a bit more
efficient by allowing the application to block for the last IO
submission. This prevents an async when we don't need it, as the
application will be blocking for the completion event(s) anyway.

Typical use cases are using the liburing
io_uring_submit_and_wait() API, or just using io_uring_enter()
doing both submissions and completions. As a specific example,
RocksDB doing MultiGet() is sped up quite a bit with this
change.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-10 08:21:03 -06:00
Jackie Liu
4fe2c96315 io_uring: add support for link with drain
To support the link with drain, we need to do two parts.

There is an sqes:

    0     1     2     3     4     5     6
 +-----+-----+-----+-----+-----+-----+-----+
 |  N  |  L  |  L  | L+D |  N  |  N  |  N  |
 +-----+-----+-----+-----+-----+-----+-----+

First, we need to ensure that the io before the link is completed,
there is a easy way is set drain flag to the link list's head, so
all subsequent io will be inserted into the defer_list.

	+-----+
    (0) |  N  |
	+-----+
           |          (2)         (3)         (4)
	+-----+     +-----+     +-----+     +-----+
    (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
	+-----+     +-----+     +-----+     +-----+
           |
	+-----+
    (5) |  N  |
	+-----+
           |
	+-----+
    (6) |  N  |
	+-----+

Second, ensure that the following IO will not be completed first,
an easy way is to create a mirror of drain io and insert it into
defer_list, in this way, as long as drain io is not processed, the
following io in the defer_list will not be actively process.

	+-----+
    (0) |  N  |
	+-----+
           |          (2)         (3)         (4)
	+-----+     +-----+     +-----+     +-----+
    (1) | L+D | --> |  L  | --> | L+D | --> |  N  |
	+-----+     +-----+     +-----+     +-----+
           |
	+-----+
   ('3) |  D  |   <== This is a shadow of (3)
	+-----+
           |
	+-----+
    (5) |  N  |
	+-----+
           |
	+-----+
    (6) |  N  |
	+-----+

Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-09 16:15:00 -06:00
Jackie Liu
8776f3fa15 io_uring: fix wrong sequence setting logic
Sqo_thread will get sqring in batches, which will cause
ctx->cached_sq_head to be added in batches. if one of these
sqes is set with the DRAIN flag, then he will never get a
chance to process, and finally sqo_thread will not exit.

Fixes: de0617e467 ("io_uring: add support for marking commands as draining")
Signed-off-by: Jackie Liu <liuyun01@kylinos.cn>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2019-09-09 16:14:47 -06:00