linux

mirror of https://github.com/torvalds/linux.git synced 2024-11-14 08:02:07 +00:00

Author	SHA1	Message	Date
Steve Wise	f361e5a01e	nvme-rdma: destroy nvme queue rdma resources on connect failure After address resolution, the nvme_rdma_queue rdma resources are allocated. If rdma route resolution or the connect fails, or the controller reconnect times out and gives up, then the rdma resources need to be freed. Otherwise, rdma resources are leaked. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-09-04 10:00:54 +03:00
Steve Wise	cdbecc8d24	nvme_rdma: keep a ref on the ctrl during delete/flush Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagi@grimbrg.me> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-09-04 10:00:53 +03:00
Sagi Grimberg	4d8c6a7946	nvme-rdma: Get rid of redundant defines Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-28 15:41:08 +03:00
Sagi Grimberg	f5b7b559e1	nvme-rdma: Get rid of duplicate variable We already have need_inval in ib_mr, lets use that instead. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-28 15:40:54 +03:00
Christoph Hellwig	aa71987472	nvme: fabrics drivers don't need the nvme-pci driver So select the NVME_CORE symbol instead of depending on BLK_DEV_NVME. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-19 14:22:28 +03:00
Christoph Hellwig	98096d8a78	nvme-fabrics: get a reference when reusing a nvme_host structure Without this we'll get a use after free after connecting two controller using the same hostnqn and then disconnecting one of them. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-19 14:22:12 +03:00
Daniel Verkamp	7a665d2f60	nvme-fabrics: change NQN UUID to big-endian format NVM Express 1.2.1 section 7.9, NVMe Qualified Names, specifies that the UUID format of NQN uses a UUID based on RFC 4122. RFC 4122 specifies that the UUID is encoded in big-endian byte order. Switch the NVMe over Fabrics host ID field from little-endian UUID to big-endian UUID to match the specification. Signed-off-by: Daniel Verkamp <daniel.verkamp@intel.com> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-19 12:00:44 +03:00
Jay Freyensee	eadb7cf441	nvme-loop: set sqsize to 0-based value, per spec Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-18 09:58:06 +03:00
Jay Freyensee	c5af8654c4	nvme-rdma: fix sqsize/hsqsize per spec Per NVMe-over-Fabrics 1.0 spec, sqsize is represented as a 0-based value. Also per spec, the RDMA binding values shall be set to sqsize, which makes hsqsize 0-based values. Thus, the sqsize during NVMf connect() is now: [root@fedora23-fabrics-host1 for-48]# dmesg [ 318.720645] nvme_fabrics: nvmf_connect_admin_queue(): sqsize for admin queue: 31 [ 318.720884] nvme nvme0: creating 16 I/O queues. [ 318.810114] nvme_fabrics: nvmf_connect_io_queue(): sqsize for i/o queue: 127 Finally, current interpretation implies hrqsize is 1's based so set it appropriately. Reported-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-18 09:58:06 +03:00
Jay Freyensee	f994d9dc28	fabrics: define admin sqsize min default, per spec Upon admin queue connect(), the rdma qp was being set based on NVMF_AQ_DEPTH. However, the fabrics layer was using the sqsize field value set for I/O queues for the admin queue, which threw the nvme layer and rdma layer off-whack: root@fedora23-fabrics-host1 nvmf]# dmesg [ 3507.798642] nvme_fabrics: nvmf_connect_admin_queue():admin sqsize being sent is: 128 [ 3507.798858] nvme nvme0: creating 16 I/O queues. [ 3507.896407] nvme nvme0: new ctrl: NQN "nullside-nqn", addr 192.168.1.3:4420 Thus, to have a different admin queue value, we use NVMF_AQ_DEPTH for connect() and RDMA private data as the minimum depth specified in the NVMe-over-Fabrics 1.0 spec (and in that RDMA private data we treat hrqsize as 1's-based value, per current understanding of the fabrics spec). Reported-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Daniel Verkamp <daniel.verkamp@intel.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-18 09:58:05 +03:00
Jay Freyensee	b825b44c4e	nvmet-rdma: +1 to *queue_size from hsqsize/hrqsize The host will be sending sqsize 0-based hsqsize value, the target need to be adjusted as well. Signed-off-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-18 09:57:37 +03:00
Vincent Stehlé	3256aaef5e	nvmet-rdma: Fix use after free Avoid dereferencing the queue pointer in nvmet_rdma_release_queue_work() after it has been freed by nvmet_rdma_free_queue(). Fixes: `d8f7750a08` ("nvmet-rdma: Correctly handle RDMA device hot removal") Signed-off-by: Vincent Stehlé <vincent.stehle@intel.com> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-16 16:16:31 +03:00
Colin Ian King	39bbee4e54	nvme-rdma: initialize ret to zero to avoid returning garbage ret is not initialized so it contains garbage. Ensure garbage is not returned by initializing rc to 0. Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-16 11:42:29 +03:00
Sagi Grimberg	e3266378bd	nvme-rdma: Remove unused includes Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Steve Wise <swise@opengridcomputing.com>	2016-08-04 17:47:54 +03:00
Sagi Grimberg	3ef1b4b298	nvme-rdma: start async event handler after reconnecting to a controller When we reset or reconnect to a controller, we are cancelling the async event handler so we can safely re-establish resources, but we need to remember to start it again when we successfully reconnect. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-04 17:45:35 +03:00
Sagi Grimberg	28b8911853	nvmet: Fix controller serial number inconsistency The host is allowed to issue identify as many times as it wants, we need to stay consistent when reporting the serial number for a given controller. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-04 17:45:10 +03:00
Sagi Grimberg	40e64e0721	nvmet-rdma: Don't use the inline buffer in order to avoid allocation for small reads Under extreme conditions this might cause data corruptions. By doing that we we repost the buffer and then post this buffer for the device to send. If we happen to use shared receive queues the device might write to the buffer before it sends it (there is no ordering between send and recv queues). Without SRQs we probably won't get that if the host doesn't mis-behave and send more than we allowed it, but relying on that is not really a good idea. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-04 17:44:40 +03:00
Sagi Grimberg	d8f7750a08	nvmet-rdma: Correctly handle RDMA device hot removal When configuring a device attached listener, we may see device removal events. In this case we return a non-zero return code from the cm event handler which implicitly destroys the cm_id. It is possible that in the future the user will remove this listener and by that trigger a second call to rdma_destroy_id on an already destroyed cm_id -> BUG. In addition, when a queue bound (active session) cm_id generates a DEVICE_REMOVAL event we must guarantee all resources are cleaned up by the time we return from the event handler. Introduce nvmet_rdma_device_removal which addresses (or at least attempts to) both scenarios. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-04 17:43:06 +03:00
Sagi Grimberg	45862ebcc4	nvme-rdma: Make sure to shutdown the controller if we can Relying on ctrl state in nvme_rdma_shutdown_ctrl is wrong because it will never be NVME_CTRL_LIVE (delete_ctrl or reset_ctrl invoked it). Instead, check that the admin queue is connected. Note that it is safe because we can never see a copmeting thread trying to destroy the admin queue (reset or delete controller). Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-03 16:25:23 +03:00
Sagi Grimberg	a159c64d93	nvme-loop: Remove duplicate call to nvme_remove_namespaces nvme_uninit_ctrl already does that for us. Note that we reordered nvme_loop_shutdown_ctrl with nvme_uninit_ctrl but its safe because we want controller uninit to happen before we shutdown the transport resources. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-03 16:25:19 +03:00
Sagi Grimberg	a34ca17a97	nvme-rdma: Free the I/O tags when we delete the controller If we wait until we free the controller (free_ctrl) we might lose our rdma device without any notification while we still have open resources (tags mrs and dma mappings). Instead, destroy the tags with their rdma resources once we delete the device and not when freeing it. Note that we don't do that in nvme_rdma_shutdown_ctrl because controller reset uses it as well and we want to give active I/O a chance to complete successfully. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-03 16:25:16 +03:00
Sagi Grimberg	2461a8dd38	nvme-rdma: Remove duplicate call to nvme_remove_namespaces nvme_uninit_ctrl already does that for us. Note that we reordered nvme_rdma_shutdown_ctrl and nvme_uninit_ctrl, this is perfectly fine because we actually want ctrl uninit (aen, scan cancellation and namespaces removal) to happen before we shutdown the rdma resources. Also, centralize the deletion work and the dead controller removal work code duplication into __nvme_rdma_shutdown_ctrl that accepts a shutdown boolean. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-03 16:25:11 +03:00
Sagi Grimberg	57de5a0a40	nvme-rdma: Fix device removal handling Device removal sequence may have crashed because the controller (and admin queue space) was freed before we destroyed the admin queue resources. Thus we want to destroy the admin queue and only then queue controller deletion and wait for it to complete. More specifically we: 1. own the controller deletion (make sure we are not competing with another deletion). 2. get rid of inflight reconnects if exists (which also destroy and create queues). 3. destroy the queue. 4. safely queue controller deletion (and wait for it to complete). Reported-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-03 16:25:07 +03:00
Sagi Grimberg	5f372eb3e7	nvme-rdma: Queue ns scanning after a sucessful reconnection On an ordered target shutdown, the target can send a AEN on a namespace removal, this will trigger the host to queue ns-list query. The shutdown will trigger error recovery which will attepmt periodic reconnect. We can hit a race where the ns rescanning fails (error recovery kicked in and we're not connected) causing removing all the namespaces and when we reconnect we won't see any namespaces for this controller. So, queue a namespace rescan after we successfully reconnected to the target. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de>	2016-08-03 16:25:03 +03:00
Roland Dreier	0b857b44b5	nvme-rdma: Don't leak uninitialized memory in connect request private data Zero out the full nvme_rdma_cm_req structure before sending it. Otherwise we end up leaking kernel memory in the reserved field, which might break forward compatibility in the future. Signed-off-by: Roland Dreier <roland@purestorage.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sagi Grimberg <sagi@grimberg.me>	2016-08-03 16:24:57 +03:00
Linus Torvalds	3fc9d69093	Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...	2016-07-26 15:37:51 -07:00
Linus Torvalds	d05d7f4079	Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...	2016-07-26 15:03:07 -07:00
Keith Busch	13880f5b57	nvme/pci: Provide SR-IOV support This registers an sr-iov callback for nvme. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-20 21:29:59 -06:00
Jay Freyensee	fa9a89fc66	nvme: initialize variable before logical OR'ing it It is typically not good coding or secure coding practice to logical OR a variable without an initialization value first. Here on this line: integrity.flags \|= BLK_INTEGRITY_DEVICE_CAPABLE; BLK_INTEGRITY_DEVICE_CAPABLE is being OR'ed to a member variable never set to an initial value. This patch fixes that. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Ming Lin <ming.l@samsung.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-20 21:26:16 -06:00
Christoph Hellwig	0c4de0f33b	block: ensure bios return from blk_get_request are properly initialized blk_get_request is used for BLOCK_PC and similar passthrough requests. Currently we always need to call blk_rq_set_block_pc or an open coded version of it to allow appending bios using the request mapping helpers later on, which is a somewhat awkward API. Instead move the initialization part of blk_rq_set_block_pc into blk_get_request, so that we always have a safe to use request. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-20 17:38:30 -06:00
Christoph Hellwig	70246286e9	block: get rid of bio_rw and READA These two are confusing leftover of the old world order, combining values of the REQ_OP_ and REQ_ namespaces. For callers that don't special case we mostly just replace bi_rw with bio_data_dir or op_is_write, except for the few cases where a switch over the REQ_OP_ values makes more sense. Any check for READA is replaced with an explicit check for REQ_RAHEAD. Also remove the READA alias for REQ_RAHEAD. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-20 17:37:01 -06:00
NeilBrown	b09dcf585d	NVMe: don't allocate unused nvme_major When alloc_disk(0) is used, the ->major number is ignored. All device numbers are allocated with a major of BLOCK_EXT_MAJOR. So remove all references to nvme_major. [akpm@linux-foundation.org: one unregister_blkdev() was missed] Link: http://lkml.kernel.org/r/20160602064318.4403.93301.stgit@noble Signed-off-by: NeilBrown <neilb@suse.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-14 08:52:56 -07:00
Keith Busch	32f0c4afb4	nvme: Remove RCU namespace protection We can't sleep with RCU read lock held, but we need to do potentially blocking stuff to namespace queues when iterating the list. This patch removes the RCU locking and holds a mutex instead. To prevent deadlocks, this patch removes holding the mutex during namespace scanning and removal. The unlocked namespace scanning is made safe by holding a reference to the namespace being scanned. List iteration that does IO has to be unlocked to allow error recovery. The caller must ensure the list can not be manipulated during such an event, so this patch adds a comment explaining this requirement to the only function that iterates an unlocked list. All callers currently meet this requirement, so no further changes required. List iterations that do not do IO can safely use the lock since it couldn't block recovery from missing forced IO completions. Reported-by: Ming Lin <mlin at kernel.org> [fixes `0bf77e9` nvme: switch to RCU freeing the namespace] Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-14 08:48:08 -07:00
Masayoshi Mizuma	2fa843512b	nvme: avoid crashes when node 0 is memoryless node. When CONFIG_NUMA is enabled and node 0 is memoryless, the system crashes because nvme_probe() sets the device->numa_node to 0 by set_dev_node(&pdev->dev, 0), so it tries to allocate memory from node 0. To avoid the crash, we should change the 0 to first_memory_node. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-13 09:17:55 -07:00
Keith Busch	f80ec966c1	nvme: Limit command retries Many controller implementations will return errors to commands that will not succeed, but without the DNR bit set. The driver previously retried these commands an unlimited number of times until the command timeout has exceeded, which takes an unnecessarilly long period of time. This patch limits the number of retries a command can have, defaulting to 5, but is user tunable at load or runtime. The struct request's 'retries' field is used to track the number of retries attempted. This is in contrast with scsi's use of this field, which indicates how many retries are allowed. Signed-off-by: Keith Busch <keith.busch@intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 16:20:31 -07:00
Arnd Bergmann	6eae8c4520	nvme-loop: fix nvme-loop Kconfig dependencies I ran into the same problem on NVME_TARGET_RDMA now, which otherwise needs dependencies on both CONFIG_BLOCK and CONFIGFS_FS: warning: (NVME_TARGET_LOOP && NVME_TARGET_RDMA) selects NVME_TARGET which has unmet direct dependencies (BLOCK && CONFIGFS_FS) 0xA002B368 Mon Jul 11 18:00:45 CEST 2016 failed In file included from ../drivers/nvme/target/core.c:16:0: drivers/nvme/target/nvmet.h:222:14: error: field 'inline_bio' has incomplete type struct bio inline_bio; ^~~~~~~~~~ drivers/nvme/target/core.c: In function 'nvmet_async_event_work': drivers/nvme/target/core.c:98:3: error: implicit declaration of function 'kfree' [-Werror=implicit-function-declaration] kfree(aen); ^~~~~ ../drivers/nvme/target/core.c: In function 'nvmet_ns_enable': ../drivers/nvme/target/core.c:269:13: error: implicit declaration of function 'blkdev_get_by_path' [-Werror=implicit-function-declaration] ns->bdev = blkdev_get_by_path(ns->device_path, FMODE_READ \| FMODE_WRITE, Folding in my patch below should address that too. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:36:40 -07:00
Wei Yongjun	69555af2ce	nvmet: fix return value check in nvmet_subsys_alloc() In case of error, the function kstrndup() returns NULL pointer not ERR_PTR(). The IS_ERR() test in the return value check should be replaced with NULL test. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:33:43 -07:00
Ming Lin	e76debd996	nvme-fabrics: add-remove ctrl repeat fix Repeatedly adding then removing the same NVMe-over-Fabrics controller over and over again (shown below) can cause a kernel crash (also shown below). This patch fixes that. [nvmf]# ./setup_nvme_connections.sh traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=lightside -nqn,hostnqn=good-wins-nqn > /dev/nvme-fabrics [nvmf]# ./remove_nvme_connections.sh 2 echo 1 > /sys/class/nvme/nvme0/delete_controller echo 1 > /sys/class/nvme/nvme1/delete_controller [nvmf]# ./setup_nvme_connections.sh traddr=192.168.1.100,transport=rdma,trsvcid=4420,nqn=darkside -nqn,hostnqn=evil-wins-nqn,nr_io_queues=16 > /dev/nvme-fabrics Killed [nvmf]# dmesg [ 313.416908] nvme nvme0: creating 16 I/O queues. [ 313.523908] nvme nvme0: new ctrl: NQN "darkside-nqn", addr 192.168.1.100:4420 [ 313.524857] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 [ 313.525262] IP: [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.525490] PGD 0 [ 313.525726] Oops: 0000 [#1] SMP [ 313.525900] Modules linked in: nvme_rdma nvme_fabrics nvme_core ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_en mlx4_ib ib_core mlx4_core [ 313.527085] CPU: 15 PID: 5856 Comm: setup_nvme_conn Not tainted 4.7.0-rc2+ #2 [ 313.527259] Hardware name: Supermicro X9DRT-F/IBQF/IBFF/X9DRT -F/IBQF/IBFF, BIOS 1.0a 10/09/2012 [ 313.527551] task: ffff88027646cd40 ti: ffff88025b980000 task.ti: ffff88025b980000 [ 313.527879] RIP: 0010:[<ffffffff8136c60e>] [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.528232] RSP: 0018:ffff88025b983db0 EFLAGS: 00010206 [ 313.528403] RAX: 0000000000000000 RBX: ffff880471879880 RCX: fffffffffffffff1 [ 313.528594] RDX: 0000000000000000 RSI: ffff880474afa860 RDI: 0000000000000011 [ 313.528778] RBP: ffff88025b983db0 R08: ffff880474afa860 R09: ffff880471879058 [ 313.528956] R10: 000000000000002c R11: ffff88047f415000 R12: ffff880471879800 [ 313.529129] R13: ffff880471879000 R14: ffff880474afa860 R15: fffffffffffffff8 [ 313.529303] FS: 00007f778f510700(0000) GS:ffff88047fbc0000(0000) knlGS:0000000000000000 [ 313.529629] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 313.529817] CR2: 0000000000000010 CR3: 0000000274174000 CR4: 00000000000406e0 [ 313.529989] Stack: [ 313.530154] ffff88025b983e48 ffffffffa0171c74 0000000000000001 0000000000000059 [ 313.530621] ffff880476f32400 ffff88047e8add80 0000010074b33aa0 ffff880471879059 [ 313.531162] ffff88047187904b ffff880471879058 0000000000000000 ffff88047736e000 [ 313.531629] Call Trace: [ 313.531797] [<ffffffffa0171c74>] nvmf_dev_write+0x674/0x840 [nvme_fabrics] [ 313.531974] [<ffffffff81180b53>] __vfs_write+0x23/0x120 [ 313.532146] [<ffffffff8119daff>] ? __fd_install+0x1f/0xc0 [ 313.532316] [<ffffffff8119d97a>] ? __alloc_fd+0x3a/0x170 [ 313.532487] [<ffffffff811811f3>] vfs_write+0xb3/0x1b0 [ 313.532658] [<ffffffff8117e321>] ? filp_close+0x51/0x70 [ 313.532845] [<ffffffff811824e1>] SyS_write+0x41/0xa0 [ 313.533016] [<ffffffff8183055b>] entry_SYSCALL_64_fastpath+0x13/0x8f [ 313.533188] Code: 80 3a 00 75 f7 48 83 c6 01 0f b6 4e ff 48 83 c2 01 84 c9 88 4a ff 75 ed 5d c3 0f 1f 00 55 48 89 e5 eb 04 84 c0 74 18 48 83 c7 01 <0f> b6 47 ff 48 83 c6 01 3a 46 ff 74 eb 19 c0 83 c8 01 5d c3 31 [ 313.536563] RIP [<ffffffff8136c60e>] strcmp+0xe/0x30 [ 313.536815] RSP <ffff88025b983db0> [ 313.536981] CR2: 0000000000000010 [ 313.537151] ---[ end trace 3d952e590e7bc2d5 ]--- Reported-and-tested-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <mlin@kernel.org> Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:32:19 -07:00
Sagi Grimberg	6a92967ccb	nvme-fabrics: Remove tl_retry_count The timeout before error recovery logic kicks in is dictated by the nvme keep-alive, so we don't really need a transport layer retry count. transports can retry for as much as they like. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:31:11 -07:00
Sagi Grimberg	2ac17c283a	nvme-rdma: Don't use tl_retry_count Always use the maximum qp retry count as the error recovery timeout is dictated from the nvme keep-alive. Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:31:10 -07:00
Wei Yongjun	458a9632ad	nvme-rdma: fix the return value of nvme_rdma_reinit_request() PTR_ERR should be applied before its argument is reassigned, otherwise the return value will be set to 0, not error code. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:27:03 -07:00
Guilherme G. Piccoli	54adc01055	nvme/quirk: Add a delay before checking for adapter readiness When disabling the controller, the specification says the register NVME_REG_CC should be written and then driver needs to wait the adapter to be ready, which is checked by reading another register bit (NVME_CSTS_RDY). There's a timeout validation in this checking, so in case this timeout is reached the driver gives up and removes the adapter from the system. After a firmware activation procedure, the PCI_DEVICE(0x1c58, 0x0003) (HGST adapter) end up being removed if we issue a reset_controller, because driver keeps verifying the NVME_REG_CSTS until the timeout is reached. This patch adds a necessary quirk for this adapter, by introducing a delay before nvme_wait_ready(), so the reset procedure is able to be completed. This quirk is needed because just increasing the timeout is not enough in case of this adapter - the driver must wait before start reading NVME_REG_CSTS register on this specific device. Signed-off-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-12 08:23:00 -07:00
Jens Axboe	41d512e51b	Merge branch 'for-4.8/block' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm into for-4.8/drivers Dan writes: "The removal of ->driverfs_dev in favor of just passing the parent device in as a parameter to add_disk(). See below, it has received a "Reviewed-by" from Christoph, Bart, and Johannes. It is also a pre-requisite for Fam Zheng's work to cleanup gendisk uevents vs attribute visibility [1]. We would extend device_add_disk() to take an attribute_group list. This is based off a branch of block.git/for-4.8/drivers and has received a positive build success notification from the kbuild robot across several configs. [1]: "gendisk: Generate uevent after attribute available" http://marc.info/?l=linux-virtualization&m=146725201522201&w=2"	2016-07-08 16:04:11 -06:00
Christoph Hellwig	7110230719	nvme-rdma: add a NVMe over Fabrics RDMA host driver This patch implements the RDMA host (initiator in SCSI speak) driver. It can be used to connect to remote NVMe over Fabrics controllers over Infiniband, RoCE or iWarp, and uses the existing NVMe core driver as well a the new fabrics library. To connect to all NVMe over Fabrics controller reachable on a given taget port using RDMA/CM use the following command: nvme connect-all -t rdma -a $IPADDR This requires the latest version of nvme-cli with Fabrics support. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-08 08:38:49 -06:00
Christoph Hellwig	8f000cac6e	nvmet-rdma: add a NVMe over Fabrics RDMA target driver This patch implements the RDMA transport for the NVMe over Fabrics target, which allows exporting NVMe over Fabrics functionality over RDMA fabrics (Infiniband, RoCE, iWARP). All NVMe logic is in the generic target and this module just provides a small glue between it and the generic code in the RDMA subsystem. Signed-off-by: Armen Baloyan <armenx.baloyan@intel.com>, Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-08 08:38:49 -06:00
Christoph Hellwig	def61eca96	nvme: add new reconnecting controller state The nvme fabric (RDMA, FC, etc...) can introduce port, link or node failures that may require a reconnect to re-establish the connection. Add a new reconnecting state that will initially be used by the RDMA driver. Reviewed-by: Jay Freyensee <james.p.freyensee@intel.com> Reviewed-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-08 08:38:49 -06:00
Johannes Thumshirn	6f92970219	nvme: lightnvm: make MLC num_pairs little endian According to the OpenChannel SSD interface specification the NAND flash MLC page pairing information's number of page page pairings field is the first two bytes in the MLC Page Pairing data structure. The hardware's data structure itself is little endian so annotate it as such, like the rest of lighnvm's data structures. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-07 08:51:52 -06:00
Dan Carpenter	f98d9ca17f	nvmet: fix an error code We accidentally return zero here when ERR_PTR(-ENOMEM) is intended. Fixes: `a07b4970f4` ('nvmet: add a generic NVMe target') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-07 08:37:36 -06:00
Arnd Bergmann	1fb4704084	nvme-loop: add configfs dependency CONFIG_NVME_TARGET has a correct CONFIG_CONFIGFS_FS dependency, but the newly added NVME_TARGET_LOOP is missing this, resulting in a link failure: drivers/nvme/built-in.o: In function `nvmet_init_configfs': loop.c:(.init.text+0x2a0): undefined reference to `config_group_init' loop.c:(.init.text+0x2c0): undefined reference to `config_group_init_type_name' loop.c:(.init.text+0x318): undefined reference to `configfs_register_subsystem' drivers/nvme/built-in.o: In function `nvmet_exit_configfs': loop.c:(.exit.text+0x9c): undefined reference to `configfs_unregister_subsystem' This adds the same dependency here. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `3a85a5de29` ("nvme-loop: add a NVMe loopback host driver") Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-07 08:34:45 -06:00
Christoph Hellwig	3a85a5de29	nvme-loop: add a NVMe loopback host driver This patch implements adds nvme-loop which allows to access local devices exported as NVMe over Fabrics namespaces. This module can be useful for easy evaluation, testing and also feature experimentation. To createa nvme-loop device you need to configure the NVMe target to export a loop port (see the nvmetcli documentaton for that) and then connect to it using nvme connect-all -t loop which requires the very latest nvme-cli version with Fabrics support. Signed-off-by: Jay Freyensee <james.p.freyensee@intel.com> Signed-off-by: Ming Lin <ming.l@ssi.samsung.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-05 11:30:36 -06:00

1 2 3 4 5 ...

260 Commits