Commit Graph

661346 Commits

Author SHA1 Message Date
Linus Torvalds
69fd110eb6 Merge branch 'work.sendmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs sendmsg updates from Al Viro:
 "More sendmsg work.

  This is fairly separate, isolated stuff (there's a continuation
  around lustre, but that one was too late to soak in -next), thus the
  separate pull request"

* 'work.sendmsg' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ncpfs: switch to sock_sendmsg()
  ncpfs: don't mess with manually advancing iovec on send
  ncpfs: sendmsg does *not* bugger iovec these days
  ceph_tcp_sendpage(): use ITER_BVEC sendmsg
  afs_send_pages(): use ITER_BVEC
  rds: remove dead code
  ceph: switch to sock_recvmsg()
  usbip_recv(): switch to sock_recvmsg()
  iscsi_target: deal with short writes on the tx side
  [nbd] pass iov_iter to nbd_xmit()
  [nbd] switch sock_xmit() to sock_{send,recv}msg()
  [drbd] use sock_sendmsg()
2017-03-02 15:16:38 -08:00
Jan Kara
165a5e22fa block: Move bdi_unregister() to del_gendisk()
Commit 6cd18e711d "block: destroy bdi before blockdev is
unregistered." moved bdi unregistration (at that time through
bdi_destroy()) from blk_release_queue() to blk_cleanup_queue() because
it needs to happen before blk_unregister_region() call in del_gendisk()
for MD. SCSI though will free up the device number from sd_remove()
called through a maze of callbacks from device_del() in
__scsi_remove_device() before blk_cleanup_queue() and thus similar races
as described in 6cd18e711d can happen for SCSI as well as reported by
Omar [1].

Moving bdi_unregister() to del_gendisk() works for MD and fixes the
problem for SCSI since del_gendisk() gets called from sd_remove() before
freeing the device number.

This also makes device_add_disk() (calling bdi_register_owner()) more
symmetric with del_gendisk().

[1] http://marc.info/?l=linux-block&m=148554717109098&w=2

Tested-by: Lekshmi Pillai <lekshmicpillai@in.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jan Kara <jack@suse.cz>
Tested-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 16:08:35 -07:00
Aurelien Aptel
9d49640a21 CIFS: implement get_dfs_refer for SMB2+
In SMB2+ the get_dfs_refer operation uses an FSCTL. The request can be
made on any Tree Connection according to the specs. Since Samba only
accepted it on an IPC connection until recently, try that first.

https://lists.samba.org/archive/samba-technical/2017-February/118859.html

3.2.4.20.3 Application Requests DFS Referral Information:
> The client MUST search for an existing Session and TreeConnect to any
> share on the server identified by ServerName for the user identified by
> UserCredentials. If no Session and TreeConnect are found, the client
> MUST establish a new Session and TreeConnect to IPC$ on the target
> server as described in section 3.2.4.2 using the supplied ServerName and
> UserCredentials.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-03-02 17:05:31 -06:00
Aurelien Aptel
f0712928be CIFS: use DFS pathnames in SMB2+ Create requests
When connected to a DFS capable share, the client must set the
SMB2_FLAGS_DFS_OPERATIONS flag in the SMB2 header and use
DFS path names: "<server>\<share>\<path>" *without* leading \\.
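
A minimal sketch of the path form described above, assuming the caller
starts from a UNC-style string with leading backslashes. The helper name
toy_dfs_path() is hypothetical and is not taken from the CIFS code; it only
shows what "without leading \\" means for the name sent in the Create
request.

```c
/*
 * Hypothetical helper (not the kernel implementation): skip the leading
 * backslashes of a UNC-style path so that "\\server\share\path" becomes
 * "server\share\path", the DFS path name form described above.
 */
static const char *toy_dfs_path(const char *unc_path)
{
	while (*unc_path == '\\')
		unc_path++;
	return unc_path;
}
```

The returned pointer aliases the input string, so no allocation is needed;
a path that already lacks the leading backslashes is returned unchanged.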

Sources:

[MS-SMB2] 3.2.5.5 Receiving an SMB2 TREE_CONNECT Response
> TreeConnect.IsDfsShare MUST be set to TRUE, if the SMB2_SHARE_CAP_DFS
> bit is set in the Capabilities field of the response.

[MS-SMB2] 3.2.4.3 Application Requests Opening a File
> If TreeConnect.IsDfsShare is TRUE, the SMB2_FLAGS_DFS_OPERATIONS flag
> is set in the Flags field.

[MS-SMB2] 2.2.13 SMB2 CREATE Request, NameOffset:
> If SMB2_FLAGS_DFS_OPERATIONS is set in the Flags field of the SMB2
> header, the file name includes a prefix that will be processed during
> DFS name normalization as specified in section 3.3.5.9. Otherwise, the
> file name is relative to the share that is identified by the TreeId in
> the SMB2 header.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2017-03-02 17:04:58 -06:00
Linus Torvalds
821fd6f6cb Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

   - enable dual mode (initiator + target) qla2xxx operation. (Quinn +
     Himanshu)

   - add a framework for qla2xxx async fabric discovery. (Quinn +
     Himanshu)

   - enable iscsi PDU DDP completion offload in cxgbit/T6 NICs. (Varun)

   - fix target-core handling of aborted failed commands. (Bart)

   - fix a long-standing target-core NULL pointer dereference issue with
     active I/O LUN shutdown. (Rob Millner + Bryant + nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits)
  target: Add counters for ABORT_TASK success + failure
  iscsi-target: Fix early login failure statistics misses
  target: Fix NULL dereference during LUN lookup + active I/O shutdown
  target: Delete tmr from list before processing
  target: Fix handling of aborted failed commands
  uapi: fix linux/target_core_user.h userspace compilation errors
  target: export protocol identifier
  qla2xxx: Fix a warning reported by the "smatch" static checker
  target/iscsi: Fix unsolicited data seq_end_offset calculation
  target/cxgbit: add T6 iSCSI DDP completion feature
  target/cxgbit: Enable DDP for T6 only if data sequence and pdu are in order
  target/cxgbit: Use T6 specific macros to get ETH/IP hdr len
  target/cxgbit: use cxgb4_tp_smt_idx() to get smt idx
  target/iscsi: split iscsit_check_dataout_hdr()
  target: Remove command flag CMD_T_DEV_ACTIVE
  target: Remove command flag CMD_T_BUSY
  target: Move session check from target_put_sess_cmd() into target_release_cmd_kref()
  target: Inline transport_cmd_check_stop()
  target: Remove an overly chatty debug message
  target: Stop execution if CMD_T_STOP has been set
  ...
2017-03-02 14:52:05 -08:00
Linus Torvalds
ca4c7d7c2b - A dm-raid stable@ fix for possible corruption when triggering a raid
   reshape via lvm2; and an additional small patch on top to bump version
   of the dm-raid target outside of the stable@ fix
 
 - A dm-raid fix for a 'dm-4.11-changes' regression introduced by a
   commit that was meant to only clean up confusing branching.

Merge tag 'dm-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper fixes from Mike Snitzer:

 - a dm-raid stable@ fix for possible corruption when triggering a raid
   reshape via lvm2; and an additional small patch on top to bump version
   of the dm-raid target outside of the stable@ fix

 - a dm-raid fix for a 'dm-4.11-changes' regression introduced by a
   commit that was meant to only clean up confusing branching.

* tag 'dm-4.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm raid: bump the target version
  dm raid: fix data corruption on reshape request
  dm raid: fix raid "check" regression due to improper cleanup in raid_message()
2017-03-02 14:36:00 -08:00
Arnd Bergmann
ca2dea434d Merge tag 'juno-fixes-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into next/late
Merge "ARMv8 Juno DT fix for v4.11" from Sudeep Holla:

Just a single patch to fix the replicator in order to prevent overflows at
the source and reduce the back pressure by splitting the trace output
to TPIU and ETR.

* tag 'juno-fixes-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  arm64: dts: juno: update definition for programmable replicator
2017-03-02 23:08:31 +01:00
Linus Walleij
332524eaf7 ARM: defconfig: fix the moxart defconfig
The moxart defconfig wasn't even building a kernel for Moxart,
it was building a kernel for V4T on the nothing platform. Switch
to MULTI_V4 and keep the right drivers, update a few selections.
Now it (presumably) builds a minimalist Moxart kernel again.

Cc: Jonas Jensen <jonas.jensen@gmail.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-03-02 23:08:28 +01:00
Linus Torvalds
54d7989f47 virtio, vhost: optimizations, fixes
Looks like a quiet cycle for vhost/virtio, just a couple of minor
 tweaks. Most notable is automatic interrupt affinity for blk and scsi.
 Hopefully other devices are not far behind.
 
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull vhost updates from Michael Tsirkin:
 "virtio, vhost: optimizations, fixes

  Looks like a quiet cycle for vhost/virtio, just a couple of minor
  tweaks. Most notable is automatic interrupt affinity for blk and scsi.
  Hopefully other devices are not far behind"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  virtio-console: avoid DMA from stack
  vhost: introduce O(1) vq metadata cache
  virtio_scsi: use virtio IRQ affinity
  virtio_blk: use virtio IRQ affinity
  blk-mq: provide a default queue mapping for virtio device
  virtio: provide a method to get the IRQ affinity mask for a virtqueue
  virtio: allow drivers to request IRQ affinity when creating VQs
  virtio_pci: simplify MSI-X setup
  virtio_pci: don't duplicate the msix_enable flag in struct pci_dev
  virtio_pci: use shared interrupts for virtqueues
  virtio_pci: remove struct virtio_pci_vq_info
  vhost: try avoiding avail index access when getting descriptor
  virtio_mmio: expose header to userspace
2017-03-02 13:53:13 -08:00
Jens Axboe
113285b473 blk-mq: ensure that bd->last is always set correctly
When drivers are called with a request in blk-mq, blk-mq flags the
state such that the driver knows if this is the last request in
this call chain or not. The driver can then use that information
to defer kicking off IO until bd->last is true. However, with blk-mq
and scheduling, we need to allocate a driver tag for a request before
it can be issued. If we fail to allocate such a tag, we could end up
in the situation where the last request issued did not have
bd->last == true set. This can then cause a driver hang.

This fixes a hang with virtio-blk, which uses bd->last as a hint
on whether to kick the queue or not.
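
A toy model of the rule described above, under the assumption that dispatch
stops at the first request for which no driver tag is available. The names
struct toy_req and toy_dispatch() are hypothetical and do not correspond to
blk-mq data structures; the sketch only shows that the final request
actually issued must carry the "last" flag, even when later requests remain
queued.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued request; not a blk-mq structure. */
struct toy_req {
	bool has_tag;	/* a driver tag could be allocated for this request */
	bool issued;	/* the request was handed to the driver */
	bool last;	/* the driver was told this ends the call chain */
};

/*
 * Issue requests in order until one cannot get a tag. The final request
 * that is really issued gets last == true, so a driver that defers
 * kicking the hardware until it sees "last" is never left waiting.
 * Returns the number of requests issued.
 */
static size_t toy_dispatch(struct toy_req *reqs, size_t n)
{
	size_t issued = 0;

	/* Count how many leading requests can be issued. */
	while (issued < n && reqs[issued].has_tag)
		issued++;

	for (size_t i = 0; i < issued; i++) {
		reqs[i].issued = true;
		reqs[i].last = (i + 1 == issued);
	}
	return issued;
}
```

In this model the failure mode from the commit message corresponds to
setting "last" only on index n - 1: if a tag allocation fails earlier, no
issued request would ever carry the flag.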

Reported-by: Chris Mason <clm@fb.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 14:30:51 -07:00
Linus Torvalds
0f221a3102 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem fixes from James Morris:
 "Two fixes for the security subsystem:

   - keys: split both rcu_dereference_key() and user_key_payload() into
     versions which can be called with or without holding the key
     semaphore.

   - SELinux: fix Android init(8) breakage due to new cgroup security
     labeling support when using older policy"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  selinux: wrap cgroup seclabel support with its own policy capability
  KEYS: Differentiate uses of rcu_dereference_key() and user_key_payload()
2017-03-02 13:22:18 -08:00
Jens Axboe
7b36a7189f block: don't call ioc_exit_icq() with the queue lock held for blk-mq
For legacy scheduling, we always call ioc_exit_icq() with both the
ioc and queue lock held. This poses a problem for blk-mq with
scheduling, since the queue lock isn't what we use in the scheduler.
And since we don't need the queue lock held for ioc exit there,
don't grab it and leave any extra locking up to the blk-mq scheduler.

Reported-by: Paolo Valente <paolo.valente@linaro.org>
Tested-by: Paolo Valente <paolo.valente@linaro.org>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 13:59:08 -07:00
Linus Torvalds
4f1f2b8f08 watchdog updates for v4.11-rc1, take 2
Fix fallout from enabling COMPILE_TEST
 Fix gcc-4.3 build of kempld watchdog driver
 Use hrtimer in softdog

Merge tag 'watchdog-for-linus-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull more watchdog updates from Guenter Roeck:

 - fix fallout from enabling COMPILE_TEST

 - fix gcc-4.3 build of kempld watchdog driver

 - use hrtimer in softdog

* tag 'watchdog-for-linus-v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  watchdog: retu: restore MFD dependency
  watchdog: db8500: add back prmcu dependency
  watchdog: kempld: fix gcc-4.3 build
  watchdog: softdog: fire watchdog even if softirqs do not get to run
  watchdog: kempld: revert to full dependency
  watchdog: bcm2835: add CONFIG_OF dependency
  watchdog: sp805: add back AMBA dependency
  watchdog: menf21bmc: add I2C dependency
  watchdog: geode: restore hard CS5535_MFGPT dependency
  watchdog: wm831x watchdog really needs mfd
2017-03-02 12:45:46 -08:00
Linus Torvalds
474c90156c give up on gcc ilog2() constant optimizations
gcc-7 has an "optimization" pass that completely screws up, and
generates the code expansion for the (impossible) case of calling
ilog2() with a zero constant, even when the code gcc compiles does not
actually have a zero constant.

And we try to generate a compile-time error for anybody doing ilog2() on
a constant where that doesn't make sense (be it zero or negative).  So
now gcc7 will fail the build due to our sanity checking, because it
created that constant-zero case that didn't actually exist in the source
code.

There's a whole long discussion on the kernel mailing list about how to
work around this gcc bug.  The gcc people themselves have discussed their
"feature" in

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

but it's all water under the bridge, because while it looked at one
point like it would be solved by the time gcc7 was released, that was
not to be.

So now we have to deal with this compiler braindamage.

And the only simple approach seems to be to just delete the code that
tries to warn about bad uses of ilog2().

So now "ilog2()" will just return 0 not just for the value 1, but for
any non-positive value too.
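
A minimal userspace sketch of the resulting behaviour, assuming a signed
64-bit argument. The function toy_ilog2() is a hypothetical stand-in written
for illustration; it is not the kernel's ilog2() macro, which is evaluated
differently for compile-time constants.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the behaviour described above: floor(log2(n))
 * for positive n, and 0 for 1 as well as for any non-positive value,
 * with no error raised for the invalid inputs.
 */
static int toy_ilog2(int64_t n)
{
	int log = 0;

	if (n <= 0)
		return 0;	/* invalid input: silently return 0 */

	while (n >>= 1)
		log++;
	return log;
}
```

The point of the sketch is the first branch: zero and negative arguments are
no longer rejected, so they are indistinguishable from an argument of 1.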

It's not like I can recall anybody having ever actually tried to use
this function on any invalid value, but maybe the sanity check just
meant that such code never made it out in public.

Reported-by: Laura Abbott <labbott@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-03-02 12:17:22 -08:00
Linus Walleij
3131d970f0 ARM: ux500: resume the second core properly
The pen hold/release scheme was copied over to Ux500 from the ARM
reference designs like most of these at the time. It is not needed
at all, and was mostly removed in commit c00def71ef
"ARM: ux500: simplify secondary CPU boot".

However on the suspend/resume path and hot plug/unplug of CPUs,
the .cpu_die() callback was still waiting for the pen to be
released, which made it spin forever, and the second core never came
back online after suspend/resume.

Fix this by simply replacing the strange custom .cpu_die() with
a oneline wfi() just like e.g. the qcom platform does. This fixes
the issue and makes the second core come up properly after
suspend/resume.

As a side effect, this rids us of the completely surplus local
setup.h and hotplug.c files, and we just compile this into platsmp.c
with everything else SMP.

Cc: stable@vger.kernel.org
Fixes: c00def71ef ("ARM: ux500: simplify secondary CPU boot")
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-03-02 17:53:17 +01:00
Arnd Bergmann
d4b80d9aac Merge branch 'next/late' with mainline
* next/late: (25 commits)
  arm64: dts: exynos: Add regulators for Vbus and Vbus-Boost
  arm64: dts: exynos: Add USB 3.0 controller node for Exynos7
  arm64: dts: exynos: Use macros for pinctrl configuration on Exynos7
  pinctrl: dt-bindings: samsung: Add Exynos7 specific pinctrl macro definitions
  arm64: dts: exynos: Add initial configuration for DISP clocks for TM2/TM2e
  ARM64: dts: meson-gxbb-p200: add ADC laddered keys
  ARM64: dts: meson: meson-gx: add the SAR ADC
  ARM64: dts: meson-gxl: add the pwm_ao_b pin
  ARM64: dts: meson-gx: add the missing pwm_AO_ab node
  clk: gxbb: fix CLKID_ETH defined twice
  clk: samsung: exynos5433: Add data for 250MHz and 278MHz PLL rates
  clk: samsung: exynos5433: Add IDs for PHYCLK_MIPIDPHY0_* clocks
  ARM64: dts: meson-gxl: rename Nexbox A95x for consistency
  clk: gxbb: add the SAR ADC clocks and expose them
  dt-bindings: amlogic: Add WeTek boards
  ARM64: dts: meson-gxbb: Add support for WeTek Hub and Play
  dt-bindings: vendor-prefix: Add wetek vendor prefix
  ARM64: dts: meson-gxm: Rename q200 and q201 DT files for consistency
  ARM64: dts: meson-gx: Add HDMI HPD/DDC pinctrl nodes
  ARM64: dts: meson-gxbb-vega-s95: Add LED
  ...

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2017-03-02 17:52:44 +01:00
Jan Kara
a5a79d0001 block: Initialize bd_bdi on inode initialization
So far we initialized bd_bdi only in bdget(). That is fine for normal
bdev inodes; however, for the special case of the root inode of
blockdev_superblock that function is never called and thus bd_bdi is
left uninitialized. As a result bdev_evict_inode() may oops doing
bdi_put(root->bd_bdi) on that inode, as can be seen when doing:

mount -t bdev none /mnt

Fix the problem by initializing bd_bdi when first allocating the inode
and then reinitializing bd_bdi in bdev_evict_inode().

Thanks to syzkaller team for finding the problem.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: b1d2dc5659 ("block: Make blk_get_backing_dev_info() safe without open bdev")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:59 -07:00
Omar Sandoval
e02898b423 loop: fix LO_FLAGS_PARTSCAN hang
loop_reread_partitions() needs to do I/O, but we just froze the queue,
so we end up waiting forever. This can easily be reproduced with losetup
-P. Fix it by moving the reread to after we unfreeze the queue.

Fixes: ecdd09597a ("block/loop: fix race between I/O and set_status")
Reported-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:59 -07:00
Keith Busch
302ad8cc09 nvme: Complete all stuck requests
If the nvme driver is shutting down its controller, the driver will not
start the queues up again, preventing blk-mq's hot CPU notifier from
making forward progress.

To fix that, this patch starts a request_queue freeze when the driver
resets a controller so no new requests may enter. The driver will wait
for the freeze to complete after IO queues are restarted, to ensure the
queue reference can be reinitialized when nvme requests to unfreeze the
queues.

If the driver is doing a safe shutdown, the driver will wait for the
controller to successfully complete all inflight requests so that we
don't unnecessarily fail them. Once the controller has been disabled,
the queues will be restarted to force remaining entered requests to end
in failure so that blk-mq's hot cpu notifier may progress.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:59 -07:00
Keith Busch
f91328c40a blk-mq: Provide freeze queue timeout
A driver may wish to take corrective action if queued requests do not
complete within a set time.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Keith Busch
6bae363ee3 blk-mq: Export blk_mq_freeze_queue_wait
Drivers can start a freeze, so this provides a way to wait for the freeze
to complete.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Josef Bacik
6a8a215465 nbd: stop leaking sockets
This was introduced in the multi-connection patch; we've been leaking
sockets ever since.

Fixes: 9561a7a ("nbd: add multi-connection support")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Omar Sandoval
562bef4259 blk-mq: move update of tags->rqs to __blk_mq_alloc_request()
No functional difference, it just makes a little more sense to update
the tag map where we actually allocate the tag.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
2017-03-02 08:56:04 -07:00
Omar Sandoval
5974839899 blk-mq: kill blk_mq_set_alloc_data()
Nothing is using it anymore.

Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
2017-03-02 08:56:04 -07:00
Omar Sandoval
6d2809d51a blk-mq: make blk_mq_alloc_request_hctx() allocate a scheduler request
blk_mq_alloc_request_hctx() allocates a driver request directly, unlike
its blk_mq_alloc_request() counterpart. It also crashes because it
doesn't update the tags->rqs map.

Fix it by making it allocate a scheduler request.

Reported-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Tested-by: Sagi Grimberg <sagi@grimberg.me>
2017-03-02 08:56:04 -07:00
Sagi Grimberg
415b806de5 blk-mq-sched: Allocate sched reserved tags as specified in the original queue tagset
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>

Modified by me to also check at driver tag allocation time if the
original request was reserved, so we can be sure to allocate a
properly reserved tag at that point in time, too.

Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shaohua Li
d3af3ecdc6 nvme: allocate nvme_queue in correct node
nvme_queue is a per-cpu queue (mostly). Allocate it in the node where
blk-mq will use it.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shaohua Li
27ddb68990 PCI: add an API to get node from vector
The next patch will use the API to get the node from a vector for nvme
devices.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shaohua Li
59f082e464 blk-mq: allocate blk_mq_tags and requests in correct node
The blk_mq_tags/requests of a specific hardware queue are mostly used on
specific cpus, which might not be in the same numa node as the disk. For
example, if an nvme card is in node 0, half of the hardware queues will be
used by node 0 and the other half by node 1.

Signed-off-by: Shaohua Li <shli@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-03-02 08:56:04 -07:00
Shuah Khan
e53aff45c4 selftests: lib.mk Fix individual test builds
Commit a8ba798bc8 ("selftests: enable O and KBUILD_OUTPUT") added
support for generating compile targets in a user-specified directory. The
OUTPUT variable controls the location, but it is undefined when tests are
built in the test directory or with "make -C tools/testing/selftests/x86".

make -C tools/testing/selftests/x86/
make: Entering directory '/lkml/linux_4.11/tools/testing/selftests/x86'
Makefile:44: warning: overriding recipe for target 'clean'
../lib.mk:51: warning: ignoring old recipe for target 'clean'
gcc -m64 -o /single_step_syscall_64 -O2 -g -std=gnu99 -pthread -Wall  single_step_syscall.c -lrt -ldl
/usr/bin/ld: cannot open output file /single_step_syscall_64: Permission denied
collect2: error: ld returned 1 exit status
Makefile:50: recipe for target '/single_step_syscall_64' failed
make: *** [/single_step_syscall_64] Error 1
make: Leaving directory '/lkml/linux_4.11/tools/testing/selftests/x86'

Same failure with "cd tools/testing/selftests/x86/;make" run.

Fix this with a change to lib.mk to define OUTPUT to be the pwd when
MAKELEVEL is 0. This covers both cases mentioned above.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
2017-03-02 07:53:01 -07:00
Al Viro
653a7746fa Merge remote-tracking branch 'ovl/for-viro' into for-linus
Overlayfs-related series from Miklos and Amir
2017-03-02 06:41:22 -05:00
Al Viro
f6c99aad4d Merge branch 'work.namei' into for-linus
2017-03-02 06:41:12 -05:00
Peter Zijlstra
0695d7dc1d orangefs: Use RCU for destroy_inode
Freeing of inodes must be RCU-delayed on all filesystems.

Cc: stable@vger.kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-03-02 06:40:36 -05:00
Ingo Molnar
de8f1c7731 sched/headers: Move autogroup APIs into <linux/sched/autogroup.h>
Further reduce the size of sched.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:43 +01:00
Ingo Molnar
dea38c74cb sched/headers: Move loadavg related definitions from <linux/sched.h> to <linux/sched/loadavg.h>
Move these bits to <linux/sched/loadavg.h>, to reduce the size and
complexity of <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:43 +01:00
Ingo Molnar
e2d1e2aec5 sched/headers: Move various ABI definitions to <uapi/linux/sched/types.h>
Move scheduler ABI types (struct sched_attr, struct sched_param, etc.) into
the new UAPI header.

This further reduces the size and complexity of <linux/sched.h>.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
47913d4ebd sched/headers, delayacct: Move the 'struct task_delay_info' definition from <linux/sched.h> to <linux/delayacct.h>
The 'struct task_delay_info' definition does not have to be in sched.h,
because task_struct only has a pointer to it.

So move it to <linux/delayacct.h> to reduce the size of <linux/sched.h>.
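
The reasoning above rests on a general C rule: a struct that only stores a
pointer to another struct needs a forward declaration, not the full
definition. A minimal sketch follows; the toy_* names and the single field
are hypothetical illustrations, not the kernel's definitions.

```c
/*
 * Forward declaration: sufficient for any code that only stores or
 * passes around a pointer to the type.
 */
struct toy_delay_info;

/* Stand-in for a large core structure that holds just a pointer. */
struct toy_task {
	int pid;
	struct toy_delay_info *delays;	/* pointer to an incomplete type */
};

/*
 * The full definition can live in a separate, smaller header that is
 * included only by the code that dereferences the pointer.
 */
struct toy_delay_info {
	unsigned long long blkio_delay;
};

/* Only this accessor needs the complete type. */
static unsigned long long toy_blkio_delay(const struct toy_task *t)
{
	return t->delays ? t->delays->blkio_delay : 0;
}
```

Everything up to and including struct toy_task compiles without the full
definition, which is why the definition can move out of the widely included
header without breaking its users.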

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
5689810360 sched/headers: Move scheduler clock interfaces to <linux/sched/clock.h>
Move the sched_clock interfaces into a separate header file, to reduce
the size of sched.h.

Include <linux/sched/clock.h> in all files that made use of one of the
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
eb61baf698 sched/headers: Move the wake-queue types and interfaces from sched.h into <linux/sched/wake_q.h>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:42 +01:00
Ingo Molnar
5dbe91de59 sched/headers: Move idle polling methods to <linux/sched/idle.h>
Further reduce the size of <linux/sched.h> by moving these APIs:

	tsk_is_polling()
	__current_set_polling()
	current_set_polling_and_test()
	__current_clr_polling()
	current_clr_polling_and_test()
	current_clr_polling()

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
4437722b04 sched/headers: Move the wake_up_if_idle() prototype to <linux/sched/idle.h>
No need to clutter <linux/sched.h> with this rarely used prototype.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
b768917d2c sched/headers: Move the 'cpu_idle_type' enum from <linux/sched.h> to <linux/sched/idle.h>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
a60b9eda67 sched/headers: Move scheduler topology interfaces to <linux/sched/topology.h>
The vast majority of sched.h users do not require the topology types and
interfaces, so split them out into <linux/sched/topology.h>.

This reduces the size of linux/sched.h by ~6%.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
b69339ba10 sched/headers: Prepare to remove spurious <linux/sched.h> inclusion dependencies
In the following patches we are going to remove various headers
from sched.h and other headers that sched.h includes.

To make those patches build cleanly prepare the scene by adding
dependencies to various files that learned to rely on those
to-be-removed dependencies.

These changes all make sense standalone: they add a header for
a data type that a particular .c or .h file is using.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:41 +01:00
Ingo Molnar
50d34394ce sched/headers: Prepare to remove the <linux/magic.h> include from <linux/sched/task_stack.h>
Update files that depend on the magic.h inclusion.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar
0881e7bd34 sched/headers: Prepare to move the get_task_struct()/put_task_struct() and related APIs from <linux/sched.h> to <linux/sched/task.h>
But first update usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar
1777e46355 sched/headers: Prepare to move _init() prototypes from <linux/sched.h> to <linux/sched/init.h>
But first introduce a trivial header and update usage sites.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar
61855b6b03 sched/headers: Prepare to move exit_files() and exit_itimers() from <linux/sched.h> to <linux/sched/task.h>
But first update the usage site.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:40 +01:00
Ingo Molnar
3f8c24529b sched/headers: Prepare to move kstack_end() from <linux/sched.h> to <linux/sched/task_stack.h>
But first update the usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:39 +01:00
Ingo Molnar
5c2c5c5514 sched/headers, vfs/execve: Prepare to move the do_execve*() prototypes from <linux/sched.h> to <linux/binfmts.h>
But first update the usage sites with the new header dependency.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-03-02 08:42:39 +01:00