Commit Graph

2240 Commits

Author SHA1 Message Date
Linus Torvalds
9820b4dca0 for-5.12/drivers-2021-02-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtnZ8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpoOhD/9TZJN6mgvlmO2zuNlZwko0jD+HUYNRHdfa
 UiZhKs55ShlT/Wd8MMLcmMU2/+iztq4c/ZLK9eS7NHgKTu3GbgsICEZK+XLSTVJh
 gCwwEnY2dnBAIwBWxLeDG04DQvcwnOhVN1OSmhFbXG3dpElXSyfEjx00niGtl0tE
 5YtvmStpqkC0tHlxq8CMyyfaL1ODGmBK2uhDeQCO12SXIgKondJUaI3/H1l1dC5t
 +yg4PsSLqezo0oWmqdTEE7lcEJs4GK1ZOhIBLtWe6tl/zaVD6DuzJL83pChm8vF+
 qV4LCpJL0wUL7IG601AemFcUmEg34oC0FD6GYYhXxVOrlk43V6AfkycZ2rljNRop
 /+Ff+CmXfWPAwSfJi/vDlveCvgyAJNMpv4/GwynUM1563v3TYy0YjT6Jlz6M3TUn
 pS0MW7iUHj3t36U3JvcYnITqTPSfTMtYMOsWDx+V4+E9iGsQF1d7KZlCPMvjWAI2
 c3QuWjitXd10BZ2qUvSzAg6piv1taBKJxg1PsGlu707mHZp5J6VPAkYQn4rFgjua
 uCBbmRQCDF/wJ02IBmBUiMPP64UGbkhr+O3MILPqki967BdDDrLzjTs4e5zbMeu/
 qB9XRr1Yd95GCyS8I42OCC906NXrJ2R2E8dtV1XoASWGusL/wFZrLd1th8Uq8ibb
 Os+G4t1Qug==
 =vx6U
 -----END PGP SIGNATURE-----

Merge tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block

Pull block driver updates from Jens Axboe:

 - Remove the skd driver. It's been EOL for a long time (Damien)

 - NVMe pull requests
      - fix multipath handling of ->queue_rq errors (Chao Leng)
      - nvmet cleanups (Chaitanya Kulkarni)
      - add a quirk for buggy Amazon controller (Filippo Sironi)
      - avoid devm allocations in nvme-hwmon that don't interact well
        with fabrics (Hannes Reinecke)
      - sysfs cleanups (Jiapeng Chong)
      - fix nr_zones for multipath (Keith Busch)
      - nvme-tcp crash fix for no-data commands (Sagi Grimberg)
      - nvmet-tcp fixes (Sagi Grimberg)
      - add a missing __rcu annotation (Christoph)
      - failed reconnect fixes (Chao Leng)
      - various tracing improvements (Michal Krakowiak, Johannes
        Thumshirn)
      - switch the nvmet-fc assoc_list to use RCU protection (Leonid
        Ravich)
      - resync the status codes with the latest spec (Max Gurtovoy)
      - minor nvme-tcp improvements (Sagi Grimberg)
      - various cleanups (Rikard Falkeborn, Minwoo Im, Chaitanya
        Kulkarni, Israel Rukshin)

 - Floppy O_NDELAY fix (Denis)

 - MD pull request
      - raid5 chunk_sectors fix (Guoqing)

 - Use lore links (Kees)

 - Use DEFINE_SHOW_ATTRIBUTE for nbd (Liao)

 - loop lock scaling (Pavel)

 - mtip32xx PCI fixes (Bjorn)

 - bcache fixes (Kai, Dongdong)

 - Misc fixes (Tian, Yang, Guoqing, Joe, Andy)

* tag 'for-5.12/drivers-2021-02-17' of git://git.kernel.dk/linux-block: (64 commits)
  lightnvm: pblk: Replace guid_copy() with export_guid()/import_guid()
  lightnvm: fix unnecessary NULL check warnings
  nvme-tcp: fix crash triggered with a dataless request submission
  block: Replace lkml.org links with lore
  nbd: Convert to DEFINE_SHOW_ATTRIBUTE
  nvme: add 48-bit DMA address quirk for Amazon NVMe controllers
  nvme-hwmon: rework to avoid devm allocation
  nvmet: remove else at the end of the function
  nvmet: add nvmet_req_subsys() helper
  nvmet: use min of device_path and disk len
  nvmet: use invalid cmd opcode helper
  nvmet: use invalid cmd opcode helper
  nvmet: add helper to report invalid opcode
  nvmet: remove extra variable in id-ns handler
  nvmet: make nvmet_find_namespace() req based
  nvmet: return uniform error for invalid ns
  nvmet: set status to 0 in case for invalid nsid
  nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues
  nvme-multipath: set nr_zones for zoned namespaces
  nvmet-tcp: fix potential race of tcp socket closing accept_work
  ...
2021-02-21 11:06:54 -08:00
Linus Torvalds
582cd91f69 for-5.12/block-2021-02-17
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmAtmIwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgplzLEAC5O+3rBM8QuiJdo39Yppmuw4hDJ6hOKynP
 EJQLKQQi0VfXgU+MprGvcbpFYmNbgICvUICQkEzJuk++kPCu/BJtJz0yErQeLgS+
 RdXiPV6enbF7iRML5TVRTr1q/z7sJMXcIIJ8Pz/rU/JNfGYExVd0WfnEY9mp1jOt
 Bl9V+qyTazdP+Ma4+uEPatSayqcdi1rxB5I+7v/sLiOvKZZWkaRZjUZ/mxAjUfvK
 dBOOPjMygEo3tCLkIyyA6lpLvr1r+SUZhLuebRLEKa3To3TW6RtoG0qwpKmI2iKw
 ylLeVLB60nM9RUxjflVOfBsHxz1bDg5Ve86y5nCjQd4Jo8x1c4DnecyGE5/Tu8Rg
 rgbsfD6nFWzhDCvcZT0XrfQ4ZAjIL2IfT+ypQiQ6UlRd3hvIKRmzWMkjuH2svr0u
 ey9Kq+lYerI4cM0F3W73gzUKdIQOuCzBCYxQuSQQomscBa7FCInyU192dAI9Aj6l
 Yd06mgKu6qCx6zLv6JfpBqaBHZMwyGE4dmZgPQFuuwO+b4N+Ck3Jm5fzEzw/xIxQ
 wdo/DlsAl60BXentB6FByGBJaCjVdSymRqN/xNCAbFKCjmr6TLBuXPfg1gYYO7xC
 VOcVjWe8iN3wWHZab3t2mxMKH9B9B/KKzIhu6TNHSmgtQ5paZPRCBx995pDyRw26
 WC22RGC2MA==
 =os1E
 -----END PGP SIGNATURE-----

Merge tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block

Pull core block updates from Jens Axboe:
 "Another nice round of removing more code than what is added, mostly
  due to Christoph's relentless pursuit of tech debt removal/cleanups.
  This pull request contains:

   - Two series of BFQ improvements (Paolo, Jan, Jia)

   - Block iov_iter improvements (Pavel)

   - bsg error path fix (Pan)

   - blk-mq scheduler improvements (Jan)

   - -EBUSY discard fix (Jan)

   - bvec allocation improvements (Ming, Christoph)

   - bio allocation and init improvements (Christoph)

   - Store bdev pointer in bio instead of gendisk + partno (Christoph)

   - Block trace point cleanups (Christoph)

   - hard read-only vs read-only split (Christoph)

   - Block based swap cleanups (Christoph)

   - Zoned write granularity support (Damien)

   - Various fixes/tweaks (Chunguang, Guoqing, Lei, Lukas, Huhai)"

* tag 'for-5.12/block-2021-02-17' of git://git.kernel.dk/linux-block: (104 commits)
  mm: simplify swapdev_block
  sd_zbc: clear zone resources for non-zoned case
  block: introduce blk_queue_clear_zone_settings()
  zonefs: use zone write granularity as block size
  block: introduce zone_write_granularity limit
  block: use blk_queue_set_zoned in add_partition()
  nullb: use blk_queue_set_zoned() to setup zoned devices
  nvme: cleanup zone information initialization
  block: document zone_append_max_bytes attribute
  block: use bi_max_vecs to find the bvec pool
  md/raid10: remove dead code in reshape_request
  block: mark the bio as cloned in bio_iov_bvec_set
  block: set BIO_NO_PAGE_REF in bio_iov_bvec_set
  block: remove a layer of indentation in bio_iov_iter_get_pages
  block: turn the nr_iovecs argument to bio_alloc* into an unsigned short
  block: remove the 1 and 4 vec bvec_slabs entries
  block: streamline bvec_alloc
  block: factor out a bvec_alloc_gfp helper
  block: move struct biovec_slab to bio.c
  block: reuse BIO_INLINE_VECS for integrity bvecs
  ...
2021-02-21 11:02:48 -08:00
Sagi Grimberg
e11e511617 nvme-tcp: fix crash triggered with a dataless request submission
write-zeros has a bio, but does not have any data buffers associated
with it. Hence we should not initialize the request iter for it, which
attempts to reference the bi_io_vec and crashes.
--
 run blktests nvme/012 at 2021-02-05 21:53:34
 BUG: kernel NULL pointer dereference, address: 0000000000000008
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] SMP NOPTI
 CPU: 15 PID: 12069 Comm: kworker/15:2H Tainted: G S        I       5.11.0-rc6+ #1
 Hardware name: Dell Inc. PowerEdge R640/06NR82, BIOS 2.10.0 11/12/2020
 Workqueue: kblockd blk_mq_run_work_fn
 RIP: 0010:nvme_tcp_init_iter+0x7d/0xd0 [nvme_tcp]
 RSP: 0018:ffffbd084447bd18 EFLAGS: 00010246
 RAX: 0000000000000000 RBX: ffffa0bba9f3ce80 RCX: 0000000000000000
 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000002000000
 RBP: ffffa0ba8ac6fec0 R08: 0000000002000000 R09: 0000000000000000
 R10: 0000000002800809 R11: 0000000000000000 R12: 0000000000000000
 R13: ffffa0bba9f3cf90 R14: 0000000000000000 R15: 0000000000000000
 FS:  0000000000000000(0000) GS:ffffa0c9ff9c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000008 CR3: 00000001c9c6c005 CR4: 00000000007706e0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  nvme_tcp_queue_rq+0xef/0x330 [nvme_tcp]
  blk_mq_dispatch_rq_list+0x11c/0x7c0
  ? blk_mq_flush_busy_ctxs+0xf6/0x110
  __blk_mq_sched_dispatch_requests+0x12b/0x170
  blk_mq_sched_dispatch_requests+0x30/0x60
  __blk_mq_run_hw_queue+0x2b/0x60
  process_one_work+0x1cb/0x360
  ? process_one_work+0x360/0x360
  worker_thread+0x30/0x370
  ? process_one_work+0x360/0x360
  kthread+0x116/0x130
  ? kthread_park+0x80/0x80
  ret_from_fork+0x1f/0x30
--

Fixes: cb9b870fba ("nvme-tcp: fix wrong setting of request iov_iter")
Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-11 08:04:51 +01:00
Filippo Sironi
4bdf260362 nvme: add 48-bit DMA address quirk for Amazon NVMe controllers
Some Amazon NVMe controllers do not follow the NVMe specification
and are limited to 48-bit DMA addresses.  Add a quirk to force
bounce buffering if needed and limit the IOVA allocation for these
devices.

This affects all current Amazon NVMe controllers that expose EBS
volumes (0x0061, 0x0065, 0x8061) and local instance storage
(0xcd00, 0xcd01, 0xcd02).
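
To make the 48-bit limit concrete, here is a minimal userspace sketch, not the driver code itself. The device IDs are the ones listed above; the helper names needs_48bit_dma_quirk() and dma_addr_fits() are hypothetical, the vendor ID check is left out, and DMA_BIT_MASK() is redefined locally with the same shape as the kernel macro.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the kernel's DMA_BIT_MASK(); redefined here for userspace. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Device IDs quoted in the commit message. */
static const uint16_t amazon_nvme_ids[] = {
	0x0061, 0x0065, 0x8061,	/* EBS volumes */
	0xcd00, 0xcd01, 0xcd02,	/* local instance storage */
};

/* Hypothetical helper: does this device ID need the 48-bit quirk? */
static bool needs_48bit_dma_quirk(uint16_t device_id)
{
	size_t n = sizeof(amazon_nvme_ids) / sizeof(amazon_nvme_ids[0]);

	for (size_t i = 0; i < n; i++)
		if (amazon_nvme_ids[i] == device_id)
			return true;
	return false;
}

/*
 * Hypothetical helper: true if [addr, addr + len) is addressable with a
 * DMA mask of the given width; otherwise the buffer would have to bounce.
 */
static bool dma_addr_fits(uint64_t addr, uint64_t len, unsigned int bits)
{
	uint64_t mask = DMA_BIT_MASK(bits);

	return len != 0 && addr <= mask && len - 1 <= mask - addr;
}
```

With these helpers, a buffer whose last byte sits at 2^48 - 1 still fits a 48-bit mask, while one that extends a single byte further does not.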

Signed-off-by: Filippo Sironi <sironi@amazon.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:06 +01:00
Hannes Reinecke
ed7770f662 nvme-hwmon: rework to avoid devm allocation
The original design to use device-managed resource allocation
doesn't really work as the NVMe controller has a vastly different
lifetime than the hwmon sysfs attributes, causing warnings about
duplicate sysfs entries upon reconnection.
This patch reworks the hwmon allocation to avoid device-managed
resource allocation, and uses the NVMe controller as parent for
the sysfs attributes.

Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Enzo Matsumiya <ematsumiya@suse.de>
Tested-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:06 +01:00
Chaitanya Kulkarni
295a39f5a5 nvmet: remove else at the end of the function
The function nvmet_parse_io_cmd() returns value from
nvmet_file_parse_io_cmd() or nvmet_bdev_parse_io_cmd() based on which
backend is set for the request. Remove the else and just return the
value from nvmet_bdev_parse_io_cmd().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:06 +01:00
Chaitanya Kulkarni
20c2c3bb83 nvmet: add nvmet_req_subsys() helper
Just like what we have to get the passthru ctrl from the req, add a
helper to get the subsystem associated with the nvmet_req() instead
of open coding the chain of structures.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
d86481e924 nvmet: use min of device_path and disk len
In function __assign_req_name(), instead of using DISK_NAME_LEN in
strncpy(), use the min of DISK_NAME_LEN and strlen(req->ns->device_path).

This is needed to turn off the following warnings:

In file included from drivers/nvme/target/core.c:14:
In function ‘__assign_req_name’,
    inlined from ‘trace_event_raw_event_nvmet_req_init’ at drivers/nvme/target/./trace.h:58:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘__assign_req_name’,
    inlined from ‘perf_trace_nvmet_req_complete’ at drivers/nvme/target/./trace.h:100:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘__assign_req_name’,
    inlined from ‘perf_trace_nvmet_req_init’ at drivers/nvme/target/./trace.h:58:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In function ‘__assign_req_name’,
    inlined from ‘trace_event_raw_event_nvmet_req_complete’ at drivers/nvme/target/./trace.h:100:1:
drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]
   strncpy(name, req->ns->device_path, DISK_NAME_LEN);
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
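
The bounded copy can be sketched in userspace roughly as follows. This is an illustration only: assign_name() is a hypothetical stand-in for __assign_req_name(), DISK_NAME_LEN is set to 32 to match the bound reported in the warnings, and zeroing the destination first is an assumption made here so that short paths end up NUL-terminated.

```c
#include <stddef.h>
#include <string.h>

#define DISK_NAME_LEN 32	/* matches the "bound 32" in the warnings */

/* Hypothetical stand-in for __assign_req_name(); name holds DISK_NAME_LEN bytes. */
static void assign_name(char *name, const char *device_path)
{
	size_t len = strlen(device_path);

	if (len > DISK_NAME_LEN)
		len = DISK_NAME_LEN;

	/* Assumption: zero first so that short paths are NUL-terminated. */
	memset(name, 0, DISK_NAME_LEN);
	/* Bound is min(DISK_NAME_LEN, strlen(path)), not the destination size. */
	strncpy(name, device_path, len);
}
```

A path of DISK_NAME_LEN bytes or more fills the buffer without a terminator, so a consumer has to treat it as a fixed-width field.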

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
07116ea50f nvmet: use invalid cmd opcode helper
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.

Use the previously introduced helper in the passthru backend to make the
error message uniform.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
1c2c761368 nvmet: use invalid cmd opcode helper
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.

Use the previously introduced helper in file backend to reduce the
duplicate code and make the error message uniform.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
d81d57cf1b nvmet: add helper to report invalid opcode
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.

Add a helper and use it in the block device backend to keep the code
and error message uniform.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
3999434b6c nvmet: remove extra variable in id-ns handler
In nvmet_execute_identify_ns() local variable ctrl is accessed only in
one place, remove that and directly use it from nvmet_req->sq->ctrl.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Chaitanya Kulkarni
3a1f7c79ae nvmet: make nvmet_find_namespace() req based
The six callers of nvmet_find_namespace() duplicate the error log page
update and status setting code for each call on failure.

All callers are nvmet request based functions, so we can pass req
to nvmet_find_namespace() and derive ctrl from req, which allows us
to update the error log page in nvmet_find_namespace(). Now that we
pass the request we can also get rid of the local variable in
nvmet_find_namespace(), use req->ns, and return the error code.

Replace the ctrl parameter with nvmet_req for nvmet_find_namespace(),
centralize the error log page update for non-allocated namespaces, and
return a uniform error for a non-allocated namespace.

nvmet_find_namespace() takes an nsid parameter which comes from NVMe
command structures such as get_log_page, identify, rw and common. All
these commands have the same offset for the nsid field.

Derive nsid from req->cmd->common.nsid and remove the extra parameter
from nvmet_find_namespace().

Lastly, now that we associate the ns with the req parameter that we pass to
nvmet_find_namespace(), rename nvmet_find_namespace() to
nvmet_req_find_ns().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Chaitanya Kulkarni
aa0aff604a nvmet: return uniform error for invalid ns
For the nvmet_find_namespace() error case we have inconsistent error code
mapping in the functions nvmet_get_smart_log_nsid() and
nvmet_set_feat_write_protect().

There is no point in retrying for an invalid namespace from the host
side. Set the error code to NVME_SC_INVALID_NS | NVME_SC_DNR, which
matches what we have in nvmet_execute_identify_desclist().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Chaitanya Kulkarni
40244ad36b nvmet: set status to 0 in case for invalid nsid
For unallocated namespace in nvmet_execute_identify_ns() don't set the
status to NVME_SC_INVALID_NS, set it to zero.

Fixes: bffcd50778 ("nvmet: set right status on error in id-ns handler")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Christoph Hellwig
b5df8e79a2 nvmet-fc: add a missing __rcu annotation to nvmet_fc_tgt_assoc.queues
Make sparse happy after the recent conversion to RCU lookups.

Fixes: 4e2f02bf77 ("nvmet-fc: use RCU proctection for assoc_list")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
2021-02-10 16:38:04 +01:00
Keith Busch
73a1a2298f nvme-multipath: set nr_zones for zoned namespaces
The bio based drivers only require that the request_queue's nr_zones is set,
so set this field in the head if the namespace path is zoned.

Fixes: 240e6ee272 ("nvme: support for zoned namespaces")
Reported-by: Minwoo Im <minwoo.im.dev@gmail.com>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Sagi Grimberg
0fbcfb089a nvmet-tcp: fix potential race of tcp socket closing accept_work
When we accept a TCP connection and allocate an nvmet-tcp queue we should
make sure not to fully establish it or reference it as the connection may
be already closing, which triggers queue release work, which does not
fence against queue establishment.

In order to address such a race, we check the sk_state and take the
queue reference underneath the sk_callback_lock, such that the queue
release work correctly fences against it.

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Elad Grupi <elad.grupi@dell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Sagi Grimberg
fda871c0ba nvmet-tcp: fix receive data digest calculation for multiple h2cdata PDUs
When a host sends multiple h2cdata PDUs for a single command, we
should verify the data digest calculation per PDU and not
per command.
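
As an illustration of verifying the digest per PDU, here is a small userspace sketch. NVMe/TCP data digests use CRC32C; the bitwise implementation below is chosen for brevity and is not the kernel's crypto path. struct pdu and verify_pdus() are hypothetical names introduced only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78. */
static uint32_t crc32c(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical view of one received h2cdata PDU: payload plus its digest. */
struct pdu {
	const void *data;
	size_t len;
	uint32_t ddgst;
};

/*
 * Check each PDU against its own digest. The CRC state starts fresh for
 * every PDU rather than being carried across the whole command.
 */
static int verify_pdus(const struct pdu *pdus, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (crc32c(pdus[i].data, pdus[i].len) != pdus[i].ddgst)
			return -1;
	return 0;
}
```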

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Chao Leng
62eca39722 nvme-rdma: handle nvme_rdma_post_send failures better
nvme_rdma_post_send failing is a path related error and should bounce
to another path when using nvme-multipath.  Call nvme_host_path_error
when nvme_rdma_post_send returns -EIO to ensure nvme_complete_rq gets
invoked to fail over to another path if there is one.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Chao Leng
ea5e5f42cd nvme-fabrics: avoid double completions in nvmf_fail_nonready_command
When reconnecting, the request may be completed with
NVME_SC_HOST_PATH_ERROR in nvmf_fail_nonready_command, which currently
sets the state of the request to MQ_RQ_IN_FLIGHT before calling
nvme_complete_rq.  When this happens for a request that is freed by
the caller, such as nvme_submit_user_cmd, in the worst case the request
could be completed again in the tear down process.

Instead of calling blk_mq_start_request from nvmf_fail_nonready_command,
just use the new nvme_host_path_error helper to complete the command
without starting it.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Chao Leng
dda3248e7f nvme: introduce a nvme_host_path_error helper
When using nvme native multipathing, if a path related error occurs
during ->queue_rq, the request needs to be completed with
NVME_SC_HOST_PATH_ERROR so that the request can be failed over.

Introduce a helper to complete the command from ->queue_rq in a way
that invokes nvme_complete_rq.

Signed-off-by: Chao Leng <lengchao@huawei.com>
[hch: renamed, added a return value to clean up the callers a bit]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:03 +01:00
Jiapeng Chong
f720a8edbc nvme: convert sysfs sprintf/snprintf family to sysfs_emit
Fix the following coccicheck warning:

./drivers/nvme/host/core.c:3580:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3570:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3560:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:3526:8-16: WARNING: use scnprintf or sprintf.
./drivers/nvme/host/core.c:2833:8-16: WARNING: use scnprintf or sprintf.

Reported-by: Abaci Robot<abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:02 +01:00
Damien Le Moal
73d90386b5 nvme: cleanup zone information initialization
For a zoned namespace, in nvme_update_ns_info(), call
nvme_update_zone_info() after executing nvme_update_disk_info() so that
the namespace queue logical and physical block size limits are set.
This allows setting the namespace queue max_zone_append_sectors limit
in nvme_update_zone_info() instead of nvme_revalidate_zones(),
simplifying this function. Also use blk_queue_set_zoned() to set the
namespace zoned model.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@edc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2021-02-10 07:44:40 -07:00
Sagi Grimberg
cb8563f5c7 nvmet-tcp: fix out-of-bounds access when receiving multiple h2cdata PDUs
When the host sends multiple h2cdata PDUs, we keep track of
the receive progress and calculate the scatterlist index and
offsets.

The issue is that sg_offset should only be kept for the first
iov entry we map in the iovec as this is the difference between
our cursor and the sg entry offset itself.

In addition, the sg index was calculated incorrectly because we should
not round up when dividing the command byte offset by PAGE_SIZE.
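
The index and offset arithmetic described here can be sketched as below. This is a userspace illustration under stated assumptions: one page per scatterlist entry, a 4096-byte PAGE_SIZE, and hypothetical helper names (sg_locate(), sg_idx_round_up(), iov_len()) that do not appear in the driver.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u	/* assumed page size for this sketch */

struct sg_cursor {
	size_t sg_idx;		/* which page-sized sg entry the cursor is in */
	size_t sg_offset;	/* byte offset inside that entry */
};

/* Locate a command byte offset in a page-per-entry scatterlist. */
static struct sg_cursor sg_locate(size_t byte_offset)
{
	struct sg_cursor c;

	c.sg_idx = byte_offset / PAGE_SIZE;	/* round down, not up */
	c.sg_offset = byte_offset % PAGE_SIZE;
	return c;
}

/* The rounding-up variant, shown only to contrast with sg_locate(). */
static size_t sg_idx_round_up(size_t byte_offset)
{
	return (byte_offset + PAGE_SIZE - 1) / PAGE_SIZE;
}

/* Length of the i-th mapped iov entry: only the first starts at sg_offset. */
static size_t iov_len(size_t i, size_t sg_offset, size_t remaining)
{
	size_t off = (i == 0) ? sg_offset : 0;
	size_t room = PAGE_SIZE - off;

	return remaining < room ? remaining : room;
}
```

For a byte offset of 6144, rounding down lands in entry 1 at offset 2048, whereas rounding up would skip ahead to entry 2.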

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reported-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Tested-by: Narayan Ayalasomayajula <Narayan.Ayalasomayajula@wdc.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-03 16:57:36 +01:00
Claus Stovgaard
c9e95c3928 nvme-pci: ignore the subsystem NQN on Phison E16
Tested with both Corsair firmware 11.3 and 13.0 for the Corsair MP600;
both have the issue as reported by the kernel.

nvme nvme0: missing or invalid SUBNQN field.

Signed-off-by: Claus Stovgaard <claus.stovgaard@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:48:37 +01:00
Thorsten Leemhuis
538e4a8c57 nvme-pci: avoid the deepest sleep state on Kingston A2000 SSDs
Some Kingston A2000 NVMe SSDs sooner or later get confused and stop
working when they use the deepest APST sleep while running Linux. The
system then crashes and one has to cold boot it to get the SSD working
again.

Kingston seems to have known about this since at least mid-September 2020:
https://bbs.archlinux.org/viewtopic.php?pid=1926994#p1926994

Someone working for a German company representing Kingston to the German
press confirmed to me Kingston engineering is aware of the issue and
investigating; the person stated that to their current knowledge only
the deepest APST sleep state causes trouble. Therefore, make Linux avoid
it for now by applying the NVME_QUIRK_NO_DEEPEST_PS to this SSD.

I have two such SSDs, but it seems the problem doesn't occur with them.
I hence couldn't verify if this patch really fixes the problem, but all
the data in front of me suggests it should.

This patch can easily be reverted or improved upon if a better solution
surfaces.

FWIW, there are many reports about the issue scattered around the web;
most of the users disabled APST completely to make things work, some
just made Linux avoid the deepest sleep state:

https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c73
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c74
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c78
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c79
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c80
https://askubuntu.com/questions/1222049/nvmekingston-a2000-sometimes-stops-giving-response-in-ubuntu-18-04dell-inspir
https://community.acer.com/en/discussion/604326/m-2-nvme-ssd-aspire-517-51g-issue-compatibility-kingston-a2000-linux-ubuntu

For the record, some data from 'nvme id-ctrl /dev/nvme0'

NVME Identify Controller:
vid       : 0x2646
ssvid     : 0x2646
mn        : KINGSTON SA2000M81000G
fr        : S5Z42105
[...]
ps    0 : mp:9.00W operational enlat:0 exlat:0 rrt:0 rrl:0
          rwt:0 rwl:0 idle_power:- active_power:-
ps    1 : mp:4.60W operational enlat:0 exlat:0 rrt:1 rrl:1
          rwt:1 rwl:1 idle_power:- active_power:-
ps    2 : mp:3.80W operational enlat:0 exlat:0 rrt:2 rrl:2
          rwt:2 rwl:2 idle_power:- active_power:-
ps    3 : mp:0.0450W non-operational enlat:2000 exlat:2000 rrt:3 rrl:3
          rwt:3 rwl:3 idle_power:- active_power:-
ps    4 : mp:0.0040W non-operational enlat:15000 exlat:15000 rrt:4 rrl:4
          rwt:4 rwl:4 idle_power:- active_power:-

Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: Thorsten Leemhuis <linux@leemhuis.info>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:46:17 +01:00
Chao Leng
563c81586d nvme-tcp: use cancel tagset helper for tear down
Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean up the code
for the tear down process.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:13 +01:00
Chao Leng
c4189d680e nvme-rdma: use cancel tagset helper for tear down
Use nvme_cancel_tagset and nvme_cancel_admin_tagset to clean up the code
for the tear down process.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chao Leng
70a99574a7 nvme-tcp: add clean action for failed reconnection
If reconnect fails after starting the I/O queues, the queues are unquiesced
and new requests continue to be delivered. The reconnection error handling
path frees the queues directly without cancelling the suspended requests.
A suspended request then times out and crashes due to using the queue
after free.

Sync the queues and cancel the suspended requests in the reconnection error
handling path.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chao Leng
958dc1d32c nvme-rdma: add clean action for failed reconnection
A crash happens when injecting a failed reconnection.
If reconnect fails after starting the I/O queues, the queues are unquiesced
and new requests continue to be delivered. The reconnection error handling
path frees the queues directly without cancelling the suspended requests.
A suspended request then times out and crashes due to using the queue
after free.

Sync the queues and cancel the suspended requests in the reconnection error
handling path.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chao Leng
2547906982 nvme-core: add cancel tagset helpers
Add nvme_cancel_tagset and nvme_cancel_admin_tagset for tear down and
reconnection error handling.

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chaitanya Kulkarni
8f8ea928fd nvme-core: get rid of the extra space
Remove the extra space in nvme_free_cels() when calling the
xa_for_each loop, which is not common practice
(except in drivers/infiniband/core/, not sure why).

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Johannes Thumshirn
4a407d5ebc nvme: add tracing of zns commands
When support for the NVMe ZNS commands was merged, tracing of these was
omitted.

Add nvme_cmd_zone_mgmt_send, nvme_cmd_zone_mgmt_recv as well as
nvme_cmd_zone_append to the nvme driver's tracing facility.

Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Michal Krakowiak
3a98c51a24 nvme: parse format nvm command details when tracing
Add detailed parsing of format nvm admin command to make the
trace log more consistent and human-readable.

Signed-off-by: Michal Krakowiak <michal.krakowiak@intel.com>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:12 +01:00
Chaitanya Kulkarni
193fcf371f nvmet: add lba to sect conversion helpers
In this preparation patch, we add helpers to convert lbas to sectors &
sectors to lba. This is needed to eliminate code duplication in the ZBD
backend.

Use these helpers in the block device backend.
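
A minimal sketch of what such conversion helpers might look like, assuming 512-byte sectors (SECTOR_SHIFT of 9) and a blksize_shift that is log2 of the namespace block size. The names lba_to_sect() and sect_to_lba() are hypothetical, and the sketch takes host byte order values, leaving out any little-endian conversion of on-wire LBAs.

```c
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors */

/* blksize_shift is log2 of the namespace block size, e.g. 12 for 4096. */
static uint64_t lba_to_sect(uint64_t lba, unsigned int blksize_shift)
{
	return lba << (blksize_shift - SECTOR_SHIFT);
}

static uint64_t sect_to_lba(uint64_t sect, unsigned int blksize_shift)
{
	return sect >> (blksize_shift - SECTOR_SHIFT);
}
```

With 4096-byte blocks each LBA covers eight sectors, so LBA 10 starts at sector 80.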

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Chaitanya Kulkarni
3c7b224f19 nvmet: remove extra variable in identify ns
Remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_ns() since req already has an ns member that can
be reused. This also eliminates the explicit call to nvmet_put_namespace(),
which is already present in the request completion path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Chaitanya Kulkarni
3631c7f4a2 nvmet: remove extra variable in id-desclist
Remove the extra local variable struct nvmet_ns in
nvmet_execute_identify_desclist() since req already has an ns member that
can be reused. This also eliminates the explicit call to
nvmet_put_namespace(), which is already present in the request
completion path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Chaitanya Kulkarni
624e67fdf9 nvmet: remove extra variable in smart log nsid
Remove the extra local variable struct nvmet_ns in
nvmet_get_smart_log_nsid() since req already has an ns member that can
be reused. This also eliminates the explicit call to nvmet_put_namespace(),
which is already present in the request completion path.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Minwoo Im
fc97e942d9 nvme: refactor ns->ctrl by request
The current code in nvme_cleanup_cmd() does not need the namespace
instance, only the controller instance.

The controller instance can be retrieved from the namespace instance, but
it can also be accessed directly through the nvme_request instance of the
request:

	ctrl = nvme_req(req)->ctrl;

This avoids going from the request to the namespace instance through the
gendisk.

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Sagi Grimberg
0dc9edaf80 nvme-tcp: pass multipage bvec to request iov_iter
iov_iter uses the right helpers, so we should be able
to pass in a multipage bvec. Right now the iov_iter is
initialized with more segments than it needs, which doesn't
fail because the iov_iter is capped by byte count, but it
is better to use a full multipage bvec iter.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Sagi Grimberg
60141aa08c nvme-tcp: get rid of unused helper function
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Sagi Grimberg
cb9b870fba nvme-tcp: fix wrong setting of request iov_iter
We might set the iov_iter direction wrong, which is harmless for this
use-case, but get it right anyway. This also makes the code slightly
cleaner.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Minwoo Im
f9063a5327 nvme: support command retry delay for admin command
The controller can request a delay retrying a failed command by setting
the Command Retry Delay (CRD) field in the Completion Queue Entry.

Currently this feature is only applied to commands on the I/O queue, but
not to commands on the admin queue.  Retrieve the nvme_ctrl from the
request so that no namespace is required and apply the feature to all
commands.
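
As a rough illustration of how a CRD value could be turned into a delay,
here is a minimal standalone sketch. It is not the driver code. The names
prefixed with sketch_/SKETCH_ are hypothetical, and the bit layout (CRD in
bits 12:11 of the status value) and the 100 ms unit of the controller's
retry delay times are assumptions taken from my reading of the NVMe
specification, not from this commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: CRD occupies bits 12:11 of the status value. */
#define SKETCH_SC_CRD_MASK  0x1800u
#define SKETCH_SC_CRD_SHIFT 11
/* Assumption: retry delay times are expressed in units of 100 ms. */
#define SKETCH_CRDT_UNIT_MS 100u

/*
 * Hypothetical helper: map a completion status to a retry delay in
 * milliseconds. crdt[] stands in for the three retry delay times the
 * controller advertises; CRD values 1..3 select crdt[0..2].
 */
unsigned long sketch_retry_delay_ms(uint16_t status, const uint16_t crdt[3])
{
	unsigned int crd;

	assert(crdt);

	crd = (status & SKETCH_SC_CRD_MASK) >> SKETCH_SC_CRD_SHIFT;
	if (crd == 0)
		return 0;	/* no delay requested, retry immediately */

	return (unsigned long)crdt[crd - 1] * SKETCH_CRDT_UNIT_MS;
}
```

Nothing in this helper depends on a namespace, which matches the point
above: the delay only needs per-controller data, so it can apply to admin
commands as well as I/O commands.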

Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Rikard Falkeborn
60b152a508 nvme: constify static attribute_group structs
The only usage of these is to put their addresses in arrays of pointers
to const attribute_groups. Make them const to allow the compiler to put
them in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Leonid Ravich
4e2f02bf77 nvmet-fc: use RCU protection for assoc_list
Searches of assoc_list that do not modify the list are protected by
rcu_read_lock(), following the RCU list rules.

The queue array embedded in nvmet_fc_tgt_assoc is protected by
rcu_read_lock(), following the RCU dereference/assign rules.

Queue and assoc objects are freed after a grace period via call_rcu().

The tgtport lock is taken when changing assoc_list.

Reviewed-by: Eldad Zinger <Eldad.Zinger@dell.com>
Reviewed-by: Elad Grupi <Elad.Grupi@dell.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Leonid Ravich <Leonid.Ravich@emc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Israel Rukshin
36ca03c830 nvmet: Fix nvmet_is_port_enabled indentation
Remove extra tab.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Israel Rukshin
cc34562261 nvmet: Use nvmet_is_port_enabled helper for pi_enable
Remove code duplication.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:10 +01:00
Chao Leng
772ea326a4 nvme-core: use list_add_tail_rcu instead of list_add_tail for nvme_init_ns_head
The "list" member of nvme_ns_head is used as an RCU list, but
nvme_init_ns_head() adds ns->siblings to it with list_add_tail(), which
is not safe. Use list_add_tail_rcu() instead of list_add_tail().

Signed-off-by: Chao Leng <lengchao@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-28 19:25:07 +01:00
Daniel Wagner
d1bcf006a9 nvme-multipath: Early exit if no path is available
nvme_round_robin_path() should test whether the returned ns pointer is
valid. nvme_next_ns() will return a NULL pointer if there is no path left.
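
The shape of the check can be sketched with a simplified userspace model.
This is not the multipath code: the sketch_* types and functions are
hypothetical stand-ins, the list is a plain NULL-terminated singly linked
list with no RCU, and "usable" stands in for whatever path state the real
code inspects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a path and the head that owns the paths. */
struct sketch_path {
	int id;
	int usable;
	struct sketch_path *next;	/* NULL-terminated list */
};

struct sketch_head {
	struct sketch_path *first;	/* NULL when no path is left */
};

/* Next path after cur, wrapping to the first one; NULL on an empty list. */
struct sketch_path *sketch_next_path(const struct sketch_head *h,
				     const struct sketch_path *cur)
{
	assert(h);
	if (cur && cur->next)
		return cur->next;
	return h->first;
}

/*
 * Round-robin selection starting after old. Precondition for this sketch:
 * old is either still on the list, or the list is empty. The "p &&" test
 * is the early exit: without it, a NULL from sketch_next_path() would be
 * dereferenced in the loop body.
 */
struct sketch_path *sketch_round_robin(const struct sketch_head *h,
				       struct sketch_path *old)
{
	struct sketch_path *p;

	for (p = sketch_next_path(h, old); p && p != old;
	     p = sketch_next_path(h, p)) {
		if (p->usable)
			return p;
	}

	/* Fall back to the old path only if it is still usable. */
	return (old && old->usable) ? old : NULL;
}
```

With an empty list the first call to sketch_next_path() already returns
NULL, so the loop body never runs and the caller gets NULL back.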

Fixes: 75c10e7327 ("nvme-multipath: round-robin I/O policy")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-01-28 19:25:07 +01:00