linux

Author	SHA1	Message	Date
Dan Williams	c969e24c1b	libnvdimm, namespace: filter out of range labels in scan_labels() Short-circuit doomed-to-fail label validation attempts by skipping labels that are outside the given region. For example a DIMM that has multiple PMEM regions will waste time attempting to create namespaces only to find that the interleave-set-cookie does not validate, e.g.: nd_region region6: invalid cookie in label: 73e608dc-47b9-4b2a-b5c7-2d55a32e0c2 Similar to how we skip BLK labels when performing PMEM validation we can skip out-of-range labels early. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-07 09:22:53 -07:00
Dan Williams	762d067dba	libnvdimm, namespace: enable allocation of multiple pmem namespaces Now that we have nd_region_available_dpa() able to handle the presence of multiple PMEM allocations in aliased PMEM regions, reuse that same infrastructure to track allocations from free space. In particular handle allocating from an aliased PMEM region in the case where there are dis-contiguous holes. The allocation for BLK and PMEM are documented in the space_valid() helper: BLK-space is valid as long as it does not precede a PMEM allocation in a given region. PMEM-space must be contiguous and adjacent to an existing existing allocation (if one exists). Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-07 09:22:53 -07:00
Dan Williams	16660eaea0	libnvdimm, namespace: update label implementation for multi-pmem Instead of assuming that there will only ever be one allocated range at the start of the region, account for additional namespaces that might start at an offset from the region base. After this change pmem namespaces now have a reason to carry an array of resources similar to blk. Unifying the resource tracking infrastructure in nd_namespace_common is a future cleanup candidate. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-07 09:22:53 -07:00
Dan Williams	012207334a	libnvdimm, namespace: expand pmem device naming scheme for multi-pmem pmem devices are currently named /dev/pmem<region-index>. Preserve the naming of the 0th device, but add a ".<namespace-index>" for other devices. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-07 09:22:53 -07:00
Dan Williams	a1f3e4d6a0	libnvdimm, region: update nd_region_available_dpa() for multi-pmem support The free dpa (dimm-physical-address) space calculation reports how much free space is available with consideration for aliased BLK + PMEM regions. Recall that BLK capacity is allocated from high addresses and PMEM is allocated from low addresses in their respective regions. nd_region_available_dpa() accounts for the fact that the largest encroachment (lowest starting address) into PMEM capacity by a BLK allocation limits the available capacity to that point, regardless if there is BLK allocation hole at a higher address. Similarly, for the multi-pmem case we need to track the largest encroachment (highest ending address) of a PMEM allocation in BLK capacity regardless of whether there is an allocation hole that a BLK allocation could fill at a lower address. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-07 09:20:53 -07:00
Dan Williams	6ff3e912d3	libnvdimm, namespace: sort namespaces by dpa at init Add more determinism to initial namespace device-name assignments by sorting the namespaces by starting dpa. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-07 09:20:53 -07:00
Dan Williams	0e3b0d123c	libnvdimm, namespace: allow multiple pmem-namespaces per region at scan time If label scanning finds multiple valid pmem namespaces allow them to be surfaced rather than fail namespace scanning. Support for creating multiple namespaces per region is saved for a later patch. Note that this adds some new error messages to clarify which of the pmem namespaces in the set are potentially impacted by invalid labels. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-07 09:20:53 -07:00
Dan Williams	8a5f50d3b7	libnvdimm, namespace: unify blk and pmem label scanning In preparation for allowing multiple namespace per pmem region, unify blk and pmem label scanning. Given that blk regions already support multiple namespaces, teaching that path how to do pmem namespace scanning is an incremental step towards multiple pmem namespace support. This should be functionally equivalent to the previous state in that stops after finding the first valid pmem label set. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-05 20:24:18 -07:00
Dan Williams	f95b4bca9e	libnvdimm, namespace: refactor uuid_show() into a namespace_to_uuid() helper The ability to translate a generic struct device pointer into a namespace uuid is a useful utility as we go to unify the blk and pmem label scanning paths. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-10-05 20:24:18 -07:00
Dan Williams	ae8219f186	libnvdimm, label: convert label tracking to a linked list In preparation for enabling multiple namespaces per pmem region, convert the label tracking to use a linked list. In particular this will allow select_pmem_id() to move labels from the unvalidated state to the validated state. Currently we only track one validated set per-region. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-30 19:13:42 -07:00
Dan Williams	44c462eb9e	libnvdimm, region: move region-mapping input-paramters to nd_mapping_desc Before we add more libnvdimm-private fields to nd_mapping make it clear which parameters are input vs libnvdimm internals. Use struct nd_mapping_desc instead of struct nd_mapping in nd_region_desc and make struct nd_mapping private to libnvdimm. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-30 19:13:42 -07:00
Dave Jiang	db58028ee4	nvdimm: reduce duplicated wpq flushes Existing implemenetation writes to all the flush hint addresses for a given ND region. This is not necessary as the flushes are per imc and not per DIMM. Search the mappings and clear out the duplicates at init to avoid multiple flush to the same imc. Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-30 19:12:52 -07:00
Vishal Verma	e046114af5	libnvdimm: clear the internal poison_list when clearing badblocks nvdimm_clear_poison cleared the user-visible badblocks, and sent commands to the NVDIMM to clear the areas marked as 'poison', but it neglected to clear the same areas from the internal poison_list which is used to marshal ARS results before sorting them by namespace. As a result, once on-demand ARS functionality was added: `37b137f` nfit, libnvdimm: allow an ARS scrub to be triggered on demand A scrub triggered from either sysfs or an MCE was found to be adding stale entries that had been cleared from gendisk->badblocks, but were still present in nvdimm_bus->poison_list. Additionally, the stale entries could be triggered into producing stale disk->badblocks by simply disabling and re-enabling the namespace or region. This adds the missing step of clearing poison_list entries when clearing poison, so that it is always in sync with badblocks. Fixes: `37b137f` ("nfit, libnvdimm: allow an ARS scrub to be triggered on demand") Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-30 17:03:45 -07:00
Vishal Verma	bd697a80c3	pmem: reduce kmap_atomic sections to the memcpys only pmem_do_bvec used to kmap_atomic at the begin, and only unmap at the end. Things like nvdimm_clear_poison may want to do nvdimm subsystem bookkeeping operations that may involve taking locks or doing memory allocations, and we can't do that from the atomic context. Reduce the atomic context to just what needs it - the memcpy to/from pmem. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-30 17:03:45 -07:00
Dan Williams	595c73071e	libnvdimm, region: fix flush hint table thinko The definition of the flush hint table as: void __iomem *flush_wpq[0][0]; ...passed the unit test, but is broken as flush_wpq[0][1] and flush_wpq[1][0] refer to the same entry. Fix this to use a helper that calculates a slot in the table based on the geometry of flush hints in the region. This is important to get right since virtualization solutions use this mechanism to trigger hypervisor flushes to platform persistence. Reported-by: Dave Jiang <dave.jiang@intel.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-24 11:45:38 -07:00
Dave Jiang	a0056afe21	nvdimm: remove duplicate nd_mapping declaration Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-21 15:28:29 -07:00
Dan Williams	4765218db7	libnvdimm, namespace: debug invalid interleave-set-cookie values If platform firmware fails to populate unique / non-zero serial number data for each nvdimm in an interleave-set it may cause pmem region initialization to fail. Add a debug message for this case. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-21 09:36:36 -07:00
Dan Williams	ecfb6d8a04	libnvdimm: fix devm_nvdimm_memremap() error path The internal alloc_nvdimm_map() helper might fail, particularly if the memory region is already busy. Report request_mem_region() failures and check for the failure. Reported-by: Ryan Chen <ryan.chan105@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-21 09:35:15 -07:00
Oliver O'Halloran	480b6837aa	nvdimm: fix PHYS_PFN/PFN_PHYS mixup nd_activate_region() iomaps any hint addresses required when activating a region. To prevent duplicate mappings it checks the PFN of the hint to be mapped against the PFNs of the already mapped hints. Unfortunately it doesn't convert the PFN back into a physical address before passing it to devm_nvdimm_ioremap(). Instead it applies PHYS_PFN a second time which ends about as well as you would imagine. Signed-off-by: Oliver O'Halloran <oohall@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-19 08:54:27 -07:00
Dave Jiang	1e8b8d9619	libnvdimm: allow legacy (e820) pmem region to clear bad blocks Bad blocks can be injected via /sys/block/pmemN/badblocks. In a situation where legacy pmem is being used or a pmem region created by using memmap kernel parameter, the injected bad blocks are not cleared due to nvdimm_clear_poison() failing from lack of ndctl function pointer. In this case we need to just return as handled and allow the bad blocks to be cleared rather than fail. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-09 17:34:46 -07:00
Toshi Kani	aee6598748	libnvdimm: Fix nvdimm_probe error on NVDIMM-N 'ndctl list --buses --dimms' does not list any NVDIMM-Ns since they are considered as idle. ndctl checks if any driver is attached to nmem device. nvdimm_probe() always fails in nvdimm_init_nsarea() since NVDIMM-Ns do not implement optinal ND_CMD_GET_CONFIG_DATA command. Change nvdimm_probe() to accept the case that the CONFIG_DATA command is not implemented for NVDIMM-Ns. The driver attaches without ndd, which keeps it no-op to the device. Reported-by: Brian Boylston <brian.boylston@hpe.com> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Dan Williams <dan.j.williams@intel.com> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-01 18:20:39 -07:00
Geert Uytterhoeven	ae551e9ca2	nvdimm: Spelling s/unacknoweldged/unacknowledged/ Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-09-01 18:20:39 -07:00
Dan Williams	ba9c8dd3c2	acpi, nfit: add dimm device notification support Per "ACPI 6.1 Section 9.20.3" NVDIMM devices, children of the ACPI0012 NVDIMM Root device, can receive health event notifications. Given that these devices are precluded from registering a notification handler via acpi_driver.acpi_device_ops (due to no _HID), we use acpi_install_notify_handler() directly. The registered handler, acpi_nvdimm_notify(), triggers a poll(2) event on the nmemX/nfit/flags sysfs attribute when a health event notification is received. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Toshi Kani <toshi.kani@hpe.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Acked-by: Rafael J. Wysocki <rafael@kernel.org> Reviewed-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-08-29 14:55:17 -07:00
Vishal Verma	abe8b4e3ce	nvdimm, btt: add a size attribute for BTTs To be consistent with other namespaces, expose a 'size' attribute for BTT devices also. Cc: Dan Williams <dan.j.williams@intel.com> Reported-by: Linda Knippers <linda.knippers@hpe.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-08-08 09:26:14 -07:00
Jens Axboe	1eff9d322a	block: rename bio bi_rw to bi_opf Since commit `63a4cc2486`, bio->bi_rw contains flags in the lower portion and the op code in the higher portions. This means that old code that relies on manually setting bi_rw is most likely going to be broken. Instead of letting that brokeness linger, rename the member, to force old and out-of-tree code to break at compile time instead of at runtime. No intended functional changes in this commit. Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-07 14:41:02 -06:00
Jens Axboe	c11f0c0b5b	block/mm: make bdev_ops->rw_page() take a bool for read/write Commit `abf545484d` changed it from an 'rw' flags type to the newer ops based interface, but now we're effectively leaking some bdev internals to the rest of the kernel. Since we only care about whether it's a read or a write at that level, just pass in a bool 'is_write' parameter instead. Then we can also move op_is_write() and friends back under CONFIG_BLOCK protection. Reviewed-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-07 14:41:02 -06:00
Mike Christie	abf545484d	mm/block: convert rw_page users to bio op use The rw_page users were not converted to use bio/req ops. As a result bdev_write_page is not passing down REQ_OP_WRITE and the IOs will be sent down as reads. Signed-off-by: Mike Christie <mchristi@redhat.com> Fixes: `4e1b2d52a8` ("block, fs, drivers: remove REQ_OP compat defs and related code") Modified by me to: 1) Drop op_flags passing into ->rw_page(), as we don't use it. 2) Make op_is_write() and friends safe to use for !CONFIG_BLOCK Signed-off-by: Jens Axboe <axboe@fb.com>	2016-08-04 14:25:33 -06:00
Linus Torvalds	f0c98ebc57	libnvdimm for 4.8 1/ Replace pcommit with ADR / directed-flushing: The pcommit instruction, which has not shipped on any product, is deprecated. Instead, the requirement is that platforms implement either ADR, or provide one or more flush addresses per nvdimm. ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to the memory controller on a power-fail event. Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint is an mmio address that when written and fenced assures that all previous posted writes targeting a given dimm have been flushed to media. 2/ On-demand ARS (address range scrub): Linux uses the results of the ACPI ARS commands to track bad blocks in pmem devices. When latent errors are detected we re-scrub the media to refresh the bad block list, userspace can also request a re-scrub at any time. 3/ Support for the Microsoft DSM (device specific method) command format. 4/ Support for EDK2/OVMF virtual disk device memory ranges. 5/ Various fixes and cleanups across the subsystem. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXmXBsAAoJEB7SkWpmfYgCEwwP/1IOt9ocP+iHLMDH9KE7VaTZ NmUDR+Zy6g5cRQM7SgcuU5BXUcx+OsSrSrUTVF1cW994o9Gbz1mFotkv0ZAsPcYY ZVRQxo2oqHrssyOcg+PsgKWiXn68rJOCgmpEyzaJywl5qTMst7pzsT1s1f7rSh6h trCf4VaJJwxZR8fARGtlHUnnhPe2Orp99EZRKEWprAsIv2kPuWpPHSjRjuEgN1JG KW8AYwWqFTtiLRUk86I4KBB0wcDrfctsjgN9Ogd6+aHyQBRnVSr2U+vDCFkC8KLu qiDCpYp+yyxBjclnljz7tRRT3GtzfCUWd4v2KVWqgg2IaobUc0Lbukp/rmikUXQP WLikT2OCQ994eFK5OX3Q3cIU/4j459TQnof8q14yVSpjAKrNUXVSR5puN7Hxa+V7 41wKrAsnsyY1oq+Yd/rMR8VfH7PHx3bFkrmRCGZCufLX1UQm4aYj+sWagDKiV3yA DiudghbOnhfurfGsnXUVw7y7GKs+gNWNBmB6ndAD6ZEHmKoGUhAEbJDLCc3DnANl b/2mv1MIdIcC1DlCmnbbcn6fv6bICe/r8poK3VrCK3UgOq/EOvKIWl7giP+k1JuC 6DdVYhlNYIVFXUNSLFAwz8OkLu8byx7WDm36iEqrKHtPw+8qa/2bWVgOU6OBgpjV cN3edFVIdxvZeMgM5Ubq =xCBG -----END PGP SIGNATURE----- Merge tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm updates from Dan Williams: - Replace pcommit with ADR / directed-flushing. The pcommit instruction, which has not shipped on any product, is deprecated. Instead, the requirement is that platforms implement either ADR, or provide one or more flush addresses per nvdimm. ADR (Asynchronous DRAM Refresh) flushes data in posted write buffers to the memory controller on a power-fail event. Flush addresses are defined in ACPI 6.x as an NVDIMM Firmware Interface Table (NFIT) sub-structure: "Flush Hint Address Structure". A flush hint is an mmio address that when written and fenced assures that all previous posted writes targeting a given dimm have been flushed to media. - On-demand ARS (address range scrub). Linux uses the results of the ACPI ARS commands to track bad blocks in pmem devices. When latent errors are detected we re-scrub the media to refresh the bad block list, userspace can also request a re-scrub at any time. - Support for the Microsoft DSM (device specific method) command format. - Support for EDK2/OVMF virtual disk device memory ranges. - Various fixes and cleanups across the subsystem. * tag 'libnvdimm-for-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (41 commits) libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register" nfit: do an ARS scrub on hitting a latent media error nfit: move to nfit/ sub-directory nfit, libnvdimm: allow an ARS scrub to be triggered on demand libnvdimm: register nvdimm_bus devices with an nd_bus driver pmem: clarify a debug print in pmem_clear_poison x86/insn: remove pcommit Revert "KVM: x86: add pcommit support" nfit, tools/testing/nvdimm/: unify shutdown paths libnvdimm: move ->module to struct nvdimm_bus_descriptor nfit: cleanup acpi_nfit_init calling convention nfit: fix _FIT evaluation memory leak + use after free tools/testing/nvdimm: add manufacturing_{date\|location} dimm properties tools/testing/nvdimm: add virtual ramdisk range acpi, nfit: treat virtual ramdisk SPA as pmem region pmem: kill __pmem address space pmem: kill wmb_pmem() libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes fs/dax: remove wmb_pmem() libnvdimm, pmem: flush posted-write queues on shutdown ...	2016-07-28 17:38:16 -07:00
Linus Torvalds	3fc9d69093	Merge branch 'for-4.8/drivers' of git://git.kernel.dk/linux-block Pull block driver updates from Jens Axboe: "This branch also contains core changes. I've come to the conclusion that from 4.9 and forward, I'll be doing just a single branch. We often have dependencies between core and drivers, and it's hard to always split them up appropriately without pulling core into drivers when that happens. That said, this contains: - separate secure erase type for the core block layer, from Christoph. - set of discard fixes, from Christoph. - bio shrinking fixes from Christoph, as a followup up to the op/flags change in the core branch. - map and append request fixes from Christoph. - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty exciting! - nvme-loop fixes from Arnd. - removal of ->driverfs_dev from Dan, after providing a device_add_disk() helper. - bcache fixes from Bhaktipriya and Yijing. - cdrom subchannel read fix from Vchannaiah. - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier. - set of drbd updates and fixes from Fabian, Lars, and Philipp. - mg_disk error path fix from Bart. - user notification for failed device add for loop, from Minfei. - NVMe in general: + NVMe delay quirk from Guilherme. + SR-IOV support and command retry limits from Keith. + fix for memory-less NUMA node from Masayoshi. + use UINT_MAX for discard sectors, from Minfei. + cancel IO fixes from Ming. + don't allocate unused major, from Neil. + error code fixup from Dan. + use constants for PSDT/FUSE from James. + variable init fix from Jay. + fabrics fixes from Ming, Sagi, and Wei. + various fixes" * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits) nvme/pci: Provide SR-IOV support nvme: initialize variable before logical OR'ing it block: unexport various bio mapping helpers scsi/osd: open code blk_make_request target: stop using blk_make_request block: simplify and export blk_rq_append_bio block: ensure bios return from blk_get_request are properly initialized virtio_blk: use blk_rq_map_kern memstick: don't allow REQ_TYPE_BLOCK_PC requests block: shrink bio size again block: simplify and cleanup bvec pool handling block: get rid of bio_rw and READA block: don't ignore -EOPNOTSUPP blkdev_issue_write_same block: introduce BLKDEV_DISCARD_ZERO to fix zeroout NVMe: don't allocate unused nvme_major nvme: avoid crashes when node 0 is memoryless node. nvme: Limit command retries loop: Make user notify for adding loop device failed nvme-loop: fix nvme-loop Kconfig dependencies nvmet: fix return value check in nvmet_subsys_alloc() ...	2016-07-26 15:37:51 -07:00
Linus Torvalds	d05d7f4079	Merge branch 'for-4.8/core' of git://git.kernel.dk/linux-block Pull core block updates from Jens Axboe: - the big change is the cleanup from Mike Christie, cleaning up our uses of command types and modified flags. This is what will throw some merge conflicts - regression fix for the above for btrfs, from Vincent - following up to the above, better packing of struct request from Christoph - a 2038 fix for blktrace from Arnd - a few trivial/spelling fixes from Bart Van Assche - a front merge check fix from Damien, which could cause issues on SMR drives - Atari partition fix from Gabriel - convert cfq to highres timers, since jiffies isn't granular enough for some devices these days. From Jan and Jeff - CFQ priority boost fix idle classes, from me - cleanup series from Ming, improving our bio/bvec iteration - a direct issue fix for blk-mq from Omar - fix for plug merging not involving the IO scheduler, like we do for other types of merges. From Tahsin - expose DAX type internally and through sysfs. From Toshi and Yigal * 'for-4.8/core' of git://git.kernel.dk/linux-block: (76 commits) block: Fix front merge check block: do not merge requests without consulting with io scheduler block: Fix spelling in a source code comment block: expose QUEUE_FLAG_DAX in sysfs block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Btrfs: fix comparison in __btrfs_map_block() block: atari: Return early for unsupported sector size Doc: block: Fix a typo in queue-sysfs.txt cfq-iosched: Charge at least 1 jiffie instead of 1 ns cfq-iosched: Fix regression in bonnie++ rewrite performance cfq-iosched: Convert slice_resid from u64 to s64 block: Convert fifo_time from ulong to u64 blktrace: avoid using timespec block/blk-cgroup.c: Declare local symbols static block/bio-integrity.c: Add #include "blk.h" block/partition-generic.c: Remove a set-but-not-used variable block: bio: kill BIO_MAX_SIZE cfq-iosched: temporarily boost queue priority for idle classes block: drbd: avoid to use BIO_MAX_SIZE block: bio: remove BIO_MAX_SECTORS ...	2016-07-26 15:03:07 -07:00
Dan Williams	0606263f24	Merge branch 'for-4.8/libnvdimm' into libnvdimm-for-next	2016-07-24 08:05:44 -07:00
Markus Elfring	d4c5725d57	libnvdimm-btt: Delete an unnecessary check before the function call "__nd_device_register" The __nd_device_register() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-24 08:04:55 -07:00
Vishal Verma	37b137ff8c	nfit, libnvdimm: allow an ARS scrub to be triggered on demand Normally, an ARS (Address Range Scrub) only happens at boot/initialization time. There can however arise situations where a bus-wide rescan is needed - notably, in the case of discovering a latent media error, we should do a full rescan to figure out what other sectors are bad, and thus potentially avoid triggering an mce on them in the future. Also provide a sysfs trigger to start a bus-wide scrub. Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-23 21:51:42 -07:00
Dan Williams	18515942d6	libnvdimm: register nvdimm_bus devices with an nd_bus driver A recent effort to add a new nvdimm bus provider attribute highlighted a race between interrogating nvdimm_bus->nd_desc and nvdimm_bus tear down. The typical way to handle these races is to take the device_lock() in the attribute method and validate that the device is still active. In order for a device to be 'active' it needs to be associated with a driver. So, we create the small boilerplate for a driver and register nvdimm_bus devices on the 'nvdimm_bus_type' bus. A result of this change is that ndbusX devices now appear under /sys/bus/nd/devices. In fact this makes /sys/class/nd somewhat redundant, but removing that will need to take a long deprecation period given its use by ndctl binaries in the field. This change naturally pulls code from drivers/nvdimm/core.c to drivers/nvdimm/bus.c, so it is a nice code organization clean-up as well. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-23 11:06:33 -07:00
Vishal Verma	5bf0b6e1af	pmem: clarify a debug print in pmem_clear_poison Prefix the sector number being cleared with a '0x' to make it clear that this is a hex value. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-23 11:06:33 -07:00
Dan Williams	bc9775d869	libnvdimm: move ->module to struct nvdimm_bus_descriptor Let the provider module be explicitly passed in rather than implicitly assumed by the module that calls nvdimm_bus_register(). This is in preparation for unifying the nfit and nfit_test driver teardown paths. Reviewed-by: Lee, Chun-Yi <jlee@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-21 20:03:19 -07:00
Toshi Kani	163d4baaeb	block: add QUEUE_FLAG_DAX for devices to advertise their DAX support Currently, presence of direct_access() in block_device_operations indicates support of DAX on its block device. Because block_device_operations is instantiated with 'const', this DAX capablity may not be enabled conditinally. In preparation for supporting DAX to device-mapper devices, add QUEUE_FLAG_DAX to request_queue flags to advertise their DAX support. This will allow to set the DAX capability based on how mapped device is composed. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <linux-s390@vger.kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>	2016-07-20 21:01:01 -06:00
Dan Williams	7a9eb20666	pmem: kill __pmem address space The __pmem address space was meant to annotate codepaths that touch persistent memory and need to coordinate a call to wmb_pmem(). Now that wmb_pmem() is gone, there is little need to keep this annotation. Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-12 19:25:38 -07:00
Dan Williams	91131dbd1d	libnvdimm, pmem: use nvdimm_flush() for namespace I/O writes nsio_rw_bytes() is used to write info block metadata to the namespace, so it should trigger a flush after every write. Replace wmb_pmem() with nvdimm_flush() in this path. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-12 15:13:48 -07:00
Dan Williams	476f848aae	libnvdimm, pmem: flush posted-write queues on shutdown Commit writes to media on system shutdown or pmem driver unload. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-12 15:13:48 -07:00
Dan Williams	7e267a8c79	libnvdimm, pmem: use REQ_FUA, REQ_FLUSH for nvdimm_flush() Given that nvdimm_flush() has higher overhead than wmb_pmem() (pointer chasing through nd_region), and that we otherwise assume a platform has ADR capability when flush hints are not present, move nvdimm_flush() to REQ_FLUSH context. Note that we still arrange for nvdimm_flush() to be called even in the ADR case. We need at least once wmb() fence to push buffered writes in the cpu out to the ADR protected domain. Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-11 16:16:03 -07:00
Dan Williams	0c27af60d1	libnvdimm: cycle flush hints When the NFIT provides multiple flush hint addresses per-dimm it is expressing that the platform is capable of processing multiple flush requests in parallel. There is some fixed cost per flush request, let the cost be shared in parallel on multiple cpus. Since there may not be enough flush hint addresses for each cpu to have one, keep a per-cpu index of the last used hint, hash it with current pid, and assume that access pattern and scheduler randomness will keep the flush-hint usage somewhat staggered across cpus. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-11 16:16:03 -07:00
Dan Williams	f284a4f237	libnvdimm: introduce nvdimm_flush() and nvdimm_has_flush() nvdimm_flush() is a replacement for the x86 'pcommit' instruction. It is an optional write flushing mechanism that an nvdimm bus can provide for the pmem driver to consume. In the case of the NFIT nvdimm-bus-provider nvdimm_flush() is implemented as a series of flush-hint-address [1] writes to each dimm in the interleave set (region) that backs the namespace. The nvdimm_has_flush() routine relies on platform firmware to describe the flushing capabilities of a platform. It uses the heuristic of whether an nvdimm bus provider provides flush address data to return a ternary result: 1: flush addresses defined 0: dimm topology described without flush addresses (assume ADR) -errno: no topology information, unable to determine flush mechanism The pmem driver is expected to take the following actions on this ternary result: 1: nvdimm_flush() in response to REQ_FUA / REQ_FLUSH and shutdown 0: do not set, WC or FUA on the queue, take no further action -errno: warn and then operate as if nvdimm_has_flush() returned '0' The caveat of this heuristic is that it can not distinguish the "dimm does not have flush address" case from the "platform firmware is broken and failed to describe a flush address". Given we are already explicitly trusting the NFIT there's not much more we can do beyond blacklisting broken firmwares if they are ever encountered. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-11 16:13:42 -07:00
Dan Williams	a8f720224e	libnvdimm: keep region data alive over namespace removal nd_region device driver data will be used in the namespace i/o path. Re-order nd_region_remove() to ensure this data stays live across namespace device removal Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-11 16:13:41 -07:00
Dan Williams	e5ae3b252c	libnvdimm, nfit: move flush hint mapping to region-device driver-data In preparation for triggering flushes of a DIMM's writes-posted-queue (WPQ) via the pmem driver move mapping of flush hint addresses to the region driver. Since this uses devm_nvdimm_memremap() the flush addresses will remain mapped while any region to which the dimm belongs is active. We need to communicate more information to the nvdimm core to facilitate this mapping, namely each dimm object now carries an array of flush hint address resources. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-11 15:09:26 -07:00
Dan Williams	a8a6d2e04c	libnvdimm, nfit: remove nfit_spa_map() infrastructure Now that all shared mappings are handled by devm_nvdimm_memremap() we no longer need nfit_spa_map() nor do we need to trigger a callback to the bus provider at region disable time. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-11 15:09:26 -07:00
Dan Williams	29b9aa0aa3	libnvdimm: introduce devm_nvdimm_memremap(), convert nfit_spa_map() users In preparation for generically mapping flush hint addresses for both the BLK and PMEM use case, provide a generic / reference counted mapping api. Given the fact that a dimm may belong to multiple regions (PMEM and BLK), the flush hint addresses need to be held valid as long as any region associated with the dimm is active. This is similar to the existing BLK-region case where multiple BLK-regions may share an aperture mapping. Up-level this shared / reference-counted mapping capability from the nfit driver to a core nvdimm capability. This eliminates the need for the nd_blk_region.disable() callback. Note that the removal of nfit_spa_map() and related infrastructure is deferred to a later patch. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-07 17:11:09 -07:00
Johannes Thumshirn	8729bdea82	libnvdimm: initialize struct blk_integrity with 0 Initialize struct blk_integrity with 0 as blk_integrity_register() takes the then unitialized struct blk_integrity::flags and ORs it to the resulting block integrity structure. Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-07-06 10:34:42 -07:00
Dan Williams	52c44d93c2	block: remove ->driverfs_dev Now that all drivers that specify a ->driverfs_dev have been converted to device_add_disk(), the pointer can be removed from struct gendisk. Cc: Jens Axboe <axboe@fb.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-06-27 12:26:08 -07:00
Dan Williams	0d52c756a6	block: convert to device_add_disk() For block drivers that specify a parent device, convert them to use device_add_disk(). This conversion was done with the following semantic patch: @@ struct gendisk disk; expression E; @@ - disk->driverfs_dev = E; ... - add_disk(disk); + device_add_disk(E, disk); @@ struct gendisk disk; expression E1, E2; @@ - disk->driverfs_dev = E1; ... E2 = disk; ... - add_disk(E2); + device_add_disk(E1, E2); ...plus some manual fixups for a few missed conversions. Cc: Jens Axboe <axboe@fb.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-06-27 12:26:08 -07:00
Dan Williams	f295e53b60	libnvdimm, pmem: allow nfit_test to override pmem_direct_access() Currently phys_to_pfn_t() is an exported symbol to allow nfit_test to override it and indicate that nfit_test-pmem is not device-mapped. Now, we want to enable nfit_test to operate without DMA_CMA and the pmem it provides will no longer be physically contiguous, i.e. won't be capable of supporting direct_access requests larger than a page. Make pmem_direct_access() a weak symbol so that it can be replaced by the tools/testing/nvdimm/ version, and move phys_to_pfn_t() to a static inline now that it no longer needs to be overridden. Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-06-24 11:39:29 -07:00
Dan Williams	1ee6667cd8	libnvdimm, pfn, dax: fix initialization vs autodetect for mode + alignment The updated ndctl unit tests discovered that if a pfn configuration with a 4K alignment is read from the namespace, that alignment will be ignored in favor of the default 2M alignment. The result is that the configuration will fail initialization with a message like: dax6.1: bad offset: 0x22000 dax disabled align: 0x200000 Fix this by allowing the alignment read from the info block to override the default which is 2M not 0 in the autodetect path. This also fixes a similar problem with the mode and alignment settings silently being overwritten by the kernel when userspace has changed it. We now will either overwrite the info block if userspace changes the uuid or fail and warn if a live setting disagrees with the info block. Cc: <stable@vger.kernel.org> Cc: Micah Parrish <micah.parrish@hpe.com> Cc: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-06-23 17:50:39 -07:00
Dan Williams	4258895814	libnvdimm: IS_ERR() usage cleanup Prompted by commit `287980e49f` "remove lots of IS_ERR_VALUE abuses", I ran make coccicheck against drivers/nvdimm/ and found that: if (IS_ERR(x)) return PTR_ERR(x); return 0; ...can be replaced with PTR_ERR_OR_ZERO(). Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-06-17 16:23:23 -07:00
Dan Williams	f02716db95	libnvdimm: use devm_add_action_or_reset() Clean up needless calls to the action routine by letting devm_add_action_or_reset() call it automatically. This does cause the disk to registered and immediately unregistered when a memory allocation fails, but the block layer should be prepared for such an event. Reported-by: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-06-15 14:59:17 -07:00
Linus Torvalds	315227f6da	DAX error handling for 4.7 - Until now, dax has been disabled if media errors were found on any device. This enables the use of DAX in the presence of these errors by making all sector-aligned zeroing go through the driver. - The driver (already) has the ability to clear errors on writes that are sent through the block layer using 'DSMs' defined in ACPI 6.1. Other misc changes: - When mounting DAX filesystems, check to make sure the partition is page aligned. This is a requirement for DAX, and previously, we allowed such unaligned mounts to succeed, but subsequent reads/writes would fail. - Misc/cleanup fixes from Jan that remove unused code from DAX related to zeroing, writeback, and some size checks. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJXQ4GKAAoJEHr6Yb6juE3/zowP/iclIhgXXXMQJRUHJlePMXC8 15sGZ32JS1ak9g7vrsmNVEDNynfNtiMYdBxtUyRuj6xqgwdZvFk3F55KOCPtaeA1 +yADkgeRkTAcwzmHw9WQVEzBCqyzSisdrwtEfH817qdq9FJdH66x2Kos6i+HeAVr 5Q/e4gs7lKrjf384/QBl+wxNZOndJaQAPd2VRHQqx2A9F33v0ljdwRaUG1r4fjK2 dtmhcZCqdQyuAGXW3piTnZc5ZFc3DPqO4FkEfqkEK3lFOflK0fd8wMsAZRp/Jd0j GJsgnVSWSqG0Dz476djlG0w8t2p5Jv1g9cKChV+ZZEdFLKWHCOUFqXNj8uI8I4k5 cOEKCHyJ3IwfSHhNQqktEWrQN4T8ZXhWtuc9GuV4UZYuqJqHci6EdR/YsWsJjV+L lm/qvK4ipDS1pivxOy8KX/iN0z7Io8J9GXpStDx3g8iWjLlh4YYlbJLWeeRepo/z aPlV/QAKcHiGY6jzLExrZIyCWkzwo6O+0p1Kxerv9/7K/32HWbOodZ+tC8eD+N25 pV69nCGf+u50T2TtIx1+iann4NC1r7zg5yqnT9AgpyZpiwR5joCDzI5sXW+D0rcS vPtfM84Ccdeq/e6mvfIpZgR0/npQapKnrmUest0J7P2BFPHiFPji1KzZ7M+1aFOo 9R6JdrAj0Sc+FBa+cGzH =v6Of -----END PGP SIGNATURE----- Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull misc DAX updates from Vishal Verma: "DAX error handling for 4.7 - Until now, dax has been disabled if media errors were found on any device. This enables the use of DAX in the presence of these errors by making all sector-aligned zeroing go through the driver. - The driver (already) has the ability to clear errors on writes that are sent through the block layer using 'DSMs' defined in ACPI 6.1. Other misc changes: - When mounting DAX filesystems, check to make sure the partition is page aligned. This is a requirement for DAX, and previously, we allowed such unaligned mounts to succeed, but subsequent reads/writes would fail. - Misc/cleanup fixes from Jan that remove unused code from DAX related to zeroing, writeback, and some size checks" * tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: fix a comment in dax_zero_page_range and dax_truncate_page dax: for truncate/hole-punch, do zeroing through the driver if possible dax: export a low-level __dax_zero_page_range helper dax: use sb_issue_zerout instead of calling dax_clear_sectors dax: enable dax in the presence of known media errors (badblocks) dax: fallback from pmd to pte on error block: Update blkdev_dax_capable() for consistency xfs: Add alignment check for DAX mount ext2: Add alignment check for DAX mount ext4: Add alignment check for DAX mount block: Add bdev_dax_supported() for dax mount checks block: Add vfs_msg() interface dax: Remove redundant inode size checks dax: Remove pointless writeback from dax_do_io() dax: Remove zeroing from dax_io() dax: Remove dead zeroing code from fault handlers ext2: Avoid DAX zeroing to corrupt data ext2: Fix block zeroing in ext2_get_blocks() for DAX dax: Remove complete_unwritten argument DAX: move RADIX_DAX_ definitions to dax.c	2016-05-26 19:34:26 -07:00
Dan Williams	36092ee8ba	Merge branch 'for-4.7/dax' into libnvdimm-for-next	2016-05-21 12:33:04 -07:00
Dan Williams	03dca343af	libnvdimm, dax: fix deletion The ndctl unit tests discovered that the dax enabling omitted updates to nd_detach_and_reset(). This routine clears device the configuration when the namespace is detached. Without this clearing userspace may assume that the device is in the process of being configured by another agent in the system. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-21 12:22:41 -07:00
Dan Williams	5e24c9fd36	libnvdimm, dax: fix alignment validation Testing the dax-device autodetect support revealed a probe failure with the following result: dax0.1: bad offset: 0x8200000 dax disabled The original pfn-device implementation inferred the alignment from ilog2(offset), now that the alignment is explicit the is_power_of_2() needs replacing with a real sanity check against the recorded alignment. Otherwise the alignment check is useless in the implicit case and only the minimum size of the offset matters. This self-consistency check is further validated by the probe path that will re-check that the offset is large enough to contain all the metadata required to enable the device. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-21 11:01:41 -07:00
Dan Williams	c5ed926864	libnvdimm, dax: autodetect support For autodetecting a previously established dax configuration we need the info block to indicate block-device vs device-dax mode, and we need to have the default namespace probe hand-off the configuration to the dax_pmem driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-20 22:02:57 -07:00
Dan Williams	b354aba016	libnvdimm: release ida resources ida instances allocate some internal memory for ->free_bitmap in addition to the base 'struct ida'. Use ida_destroy() to release that memory at module_exit(). Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-20 22:02:56 -07:00
Dan Williams	0a70bd4305	dax: enable dax in the presence of known media errors (badblocks) 1/ If a mapping overlaps a bad sector fail the request. 2/ Do not opportunistically report more dax-capable capacity than is requested when errors present. Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com> [vishal: fix a conflict with system RAM collision patches] [vishal: add a 'size' parameter to ->direct_access] [vishal: fix a conflict with DAX alignment check patches] Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>	2016-05-18 12:16:56 -06:00
Dan Williams	1f716d05f8	Merge branch 'for-4.7/dsm' into libnvdimm-for-next	2016-05-18 10:06:59 -07:00
Dan Williams	2159669f58	Merge branch 'for-4.7/libnvdimm' into libnvdimm-for-next	2016-05-18 10:06:48 -07:00
Dan Williams	594d6d96ea	Merge branch 'for-4.7/dax' into libnvdimm-for-next	2016-05-18 09:59:34 -07:00
Dan Williams	6cf9c5babd	libnvdimm: stop requiring a driver ->remove() method The dax_pmem driver was implementing an empty ->remove() method to satisfy the nvdimm bus driver that unconditionally calls ->remove(). Teach the core bus driver to check if ->remove() is NULL to remove that requirement. Reported-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-18 09:13:13 -07:00
Dan Williams	45a0dac045	libnvdimm, dax: record the specified alignment of a dax-device instance We want to use the alignment as the allocation and mapping unit. Previously this information was only useful for establishing the data offset, but now it is important to remember the granularity for the later use. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-09 15:35:42 -07:00
Dan Williams	52ac23b25e	libnvdimm, dax: reserve space to store labels for device-dax We may want to subdivide a device-dax range into multiple devices so that each can have separate permissions or naming. Reserve 128K of label space by default so we have the capability of making allocation decisions persistent. This reservation is not something we can add later since it would result in the default size of a device-dax range changing between kernel versions. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-09 15:35:42 -07:00
Dan Williams	cd03412a51	libnvdimm, dax: introduce device-dax infrastructure Device DAX is the device-centric analogue of Filesystem DAX (CONFIG_FS_DAX). It allows persistent memory ranges to be allocated and mapped without need of an intervening file system. This initial infrastructure arranges for a libnvdimm pfn-device to be represented as a different device-type so that it can be attached to a driver other than the pmem driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-09 15:35:42 -07:00
Dan Williams	1b8d2afde5	libnvdimm, pfn: fix ARCH=alpha allmodconfig build failure I had relied on the kbuild robot for cross build coverage, however it only builds alpha_defconfig. Switch from HPAGE_SIZE to PMD_SIZE, which is more widely defined. Fixes: `658922e57b` ("libnvdimm, pfn: fix memmap reservation sizing") Cc: <stable@vger.kernel.org> Reported-by: Guenter Roeck <guenter@roeck-us.net> Tested-by: Guenter Roeck <guenter@roeck-us.net> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-05-06 10:20:10 -07:00
Dan Williams	658922e57b	libnvdimm, pfn: fix memmap reservation sizing When configuring a pfn-device instance to allocate the memmap array it needs to account for the fact that vmemmap_populate_hugepages() allocates struct page blocks in HPAGE_SIZE chunks. We need to align the reserved area size to 2MB otherwise arch_add_memory() runs out of memory while establishing the memmap: WARNING: CPU: 0 PID: 496 at arch/x86/mm/init_64.c:704 arch_add_memory+0xe7/0xf0 [..] Call Trace: [<ffffffff8148bdb3>] dump_stack+0x85/0xc2 [<ffffffff810a749b>] __warn+0xcb/0xf0 [<ffffffff810a75cd>] warn_slowpath_null+0x1d/0x20 [<ffffffff8106a497>] arch_add_memory+0xe7/0xf0 [<ffffffff811d2097>] devm_memremap_pages+0x287/0x450 [<ffffffff811d1ffa>] ? devm_memremap_pages+0x1ea/0x450 [<ffffffffa0000298>] __wrap_devm_memremap_pages+0x58/0x70 [nfit_test_iomap] [<ffffffffa0047a58>] pmem_attach_disk+0x318/0x420 [nd_pmem] [<ffffffffa0047bcf>] nd_pmem_probe+0x6f/0x90 [nd_pmem] [<ffffffffa0009469>] nvdimm_bus_probe+0x69/0x110 [libnvdimm] [..] ndbus0: nd_pmem.probe(pfn3.0) = -12 nd_pmem: probe of pfn3.0 failed with error -12 libndctl: ndctl_pfn_enable: pfn3.0: failed to enable Reported-by: Namratha Kothapalli <namratha.n.kothapalli@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-30 13:07:06 -07:00
Dan Williams	31eca76ba2	nfit, libnvdimm: limited/whitelisted dimm command marshaling mechanism There are currently 4 known similar but incompatible definitions of the command sets that can be sent to an NVDIMM through ACPI. It is also clear that future platform generations (ACPI or not) will continue to revise and extend the DIMM command set as new devices and use cases arrive. It is obviously untenable to continue to proliferate divergence of these command definitions, and to that end a standardization process has begun to provide for a unified specification. However, that leaves a problem about what to do with this first generation where vendors are already shipping divergence. The Linux kernel can support these initial diverged platforms without giving platform-firmware free reign to continue to diverge and compound kernel maintenance overhead. The kernel implementation can encourage standardization in two ways: 1/ Require that any function code that userspace wants to send be explicitly white-listed in the implementation. For ACPI this means function codes marked as supported by acpi_check_dsm() may only be invoked if they appear in the white-list. A function must be publicly documented before it is added to the white-list. 2/ The above restrictions can be trivially bypassed by using the "vendor-specific" payload command. However, since vendor-specific commands are by definition not publicly documented and have the potential to corrupt the kernel's view of the dimm state, we provide a toggle to disable vendor-specific operations. Enabling undefined behavior is a policy decision that can be made by the platform owner and encourages firmware implementations to choose public over private command implementations. Based on an initial patch from Jerry Hoemann Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-28 16:59:06 -07:00
Dan Williams	e3654eca70	nfit, libnvdimm: clarify "commands" vs "_DSMs" Clarify the distinction between "commands", the ioctls userspace calls to request the kernel take some action on a given dimm device, and "_DSMs", the actual function numbers used in the firmware interface to the DIMM. _DSMs are ACPI specific whereas commands are Linux kernel generic. This is in preparation for breaking the 1:1 implicit relationship between the kernel ioctl number space and the firmware specific function numbers. Cc: Jerry Hoemann <jerry.hoemann@hpe.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-28 16:23:16 -07:00
Dan Williams	0bfb8dd3ed	libnvdimm: cleanup nvdimm_namespace_common_probe(), kill 'host' The 'host' variable can be killed as it is always the same as the passed in device. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:24 -07:00
Dan Williams	5a92289f41	libnvdimm, pmem: kill ->pmem_queue and ->pmem_disk The devm conversion obviates the need to continue to remember the queue and disk locally in the driver. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:24 -07:00
Dan Williams	ac515c084b	libnvdimm, pmem, pfn: move pfn setup to the core Now that pmem internals have been disentangled from pfn setup, that code can move to the core. This is in preparation for adding another user of the pfn-device capabilities. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:23 -07:00
Dan Williams	200c79da82	libnvdimm, pmem, pfn: make pmem_rw_bytes generic and refactor pfn setup In preparation for providing an alternative (to block device) access mechanism to persistent memory, convert pmem_rw_bytes() to nsio_rw_bytes(). This allows ->rw_bytes() functionality without requiring a 'struct pmem_device' to be instantiated. In other words, when ->rw_bytes() is in use i/o is driven through 'struct nd_namespace_io', otherwise it is driven through 'struct pmem_device' and the block layer. This consolidates the disjoint calls to devm_exit_badblocks() and devm_memunmap() into a common devm_nsio_disable() and cleans up the init path to use a unified pmem_attach_disk() implementation. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:23 -07:00
Dan Williams	947df02d25	libnvdimm, pmem: clean up resource print / request The leading '0x' in front of %pa is redundant, also we can just use %pR to simplify the print statement. The request parameters can be directly taken from the resource as well. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:23 -07:00
Dan Williams	030b99e39c	libnvdimm, pmem: use devm_add_action to release bdev resources Register a callback to clean up the request_queue and put the gendisk at driver disable time. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:23 -07:00
Dan Williams	9d90725ddc	libnvdimm, blk: move i/o infrastructure to nd_namespace_blk Consolidate the information for issuing i/o to a blk-namespace, and eliminate some pointer chasing. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:23 -07:00
Dan Williams	8378af17a4	libnvdimm, blk: quiet i/o error reporting I/O errors events have the potential to be a high frequency and a log message for each event can swamp the system. This message is also redundant with upper layer error reporting. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:23 -07:00
Dan Williams	bd842b8ca7	libnvdimm, pmem: use ->queuedata for driver private data Save a pointer chase by storing the driver private data in the request_queue rather than the gendisk. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:23 -07:00
Dan Williams	d44077a7cd	libnvdimm, blk: use ->queuedata for driver private data Save a pointer chase by storing the driver private data in the request_queue rather than the gendisk. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:22 -07:00
Dan Williams	d29cee120e	libnvdimm, blk: use devm_add_action to release bdev resources Register a callback to clean up the request_queue and put the gendisk at driver disable time. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:22 -07:00
Dan Williams	9dec4892ca	libnvdimm, btt: add btt startup debug Report the reason for btt probe failures when debug is enabled. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 12:26:05 -07:00
Dan Williams	e32bc729a3	libnvdimm, btt, convert nd_btt_probe() to devm Pass the device performing the probe so we can use a devm allocation for the btt superblock. Cc: Vishal Verma <vishal.l.verma@intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 10:59:54 -07:00
Dan Williams	bd032943b5	libnvdimm, pfn, convert nd_pfn_probe() to devm Pass the device performing the probe so we can use a devm allocation for the pfn superblock. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 10:59:54 -07:00
Dan Williams	298f2bc5db	libnvdimm, pmem: kill pmem->ndns We can derive the common namespace from other information. We also do not need to cache it because all the usages are in slow paths. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-22 10:59:54 -07:00
Dan Williams	0a370d261c	libnvdimm, pmem: clarify the write+clear_poison+write flow The ACPI specification does not specify the state of data after a clear poison operation. Potential future libnvdimm bus implementations for other architectures also might not specify or disagree on the state of data after clear poison. Clarify why we write twice. Reported-by: Jeff Moyer <jmoyer@redhat.com> Reported-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com>	2016-04-15 14:59:41 -06:00
Dan Williams	baa51277cf	libnvdimm, test: add mock SMART data payload Provide simulated SMART data to enable the ndctl implementation of SMART data retrieval and parsing. The payload is defined here, "Section 4.1 SMART and Health Info (Function Index 1)": http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-11 11:11:14 -07:00
Linus Torvalds	239467e852	Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm fixes from Dan Williams: "Three fixes, the first two are tagged for -stable: - The ndctl utility/library gained expanded unit tests illuminating a long standing bug in the libnvdimm SMART data retrieval implementation. It has been broken since its initial implementation, now fixed. - Another one line fix for the detection of stale info blocks. Without this change userspace can get into a situation where it is unable to reconfigure a namespace. - Fix the badblock initialization path in the presence of the new (in v4.6-rc1) section alignment workarounds. Without this change badblocks will be reported at the wrong offset. These have received a build success report from the kbuild robot and have appeared in -next with no reported issues" * 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment libnvdimm, pfn: fix uuid validation libnvdimm: fix smart data retrieval	2016-04-09 14:05:45 -07:00
Dan Williams	a390180291	libnvdimm, pfn: fix nvdimm_namespace_add_poison() vs section alignment When section alignment padding is in effect we need to shift / truncate the range that is queried for poison by the 'start_pad' or 'end_trunc' reservations. It's easiest if we just pass in an adjusted resource range rather than deriving it from the passed in namespace. With the resource range resolution pushed out to the caller we can also push the namespace-to-region lookup to the caller and drop the implicit pmem-type assumption about the passed in namespace object. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-07 20:02:06 -07:00
Dan Williams	e5670563f5	libnvdimm, pfn: fix uuid validation If we detect a namespace has a stale info block in the init path, we should overwrite with the latest configuration. In fact, we already return -ENODEV when the parent uuid is invalid, the same should be done for the 'self' uuid. Otherwise we can get into a condition where userspace is unable to reconfigure the pfn-device without directly / manually invalidating the info block. Cc: <stable@vger.kernel.org> Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-07 19:59:27 -07:00
Dan Williams	2112911266	libnvdimm: fix smart data retrieval It appears that smart data retrieval has been broken the since the initial implementation. Fix the payload size to be 128-bytes per the specification. Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-04-07 19:58:44 -07:00
Kirill A. Shutemov	09cbfeaf1a	mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced long time ago with promise that one day it will be possible to implement page cache with bigger chunks than PAGE_SIZE. This promise never materialized. And unlikely will. We have many places where PAGE_CACHE_SIZE assumed to be equal to PAGE_SIZE. And it's constant source of confusion on whether PAGE_CACHE_* or PAGE_* constant should be used in a particular case, especially on the border between fs and mm. Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much breakage to be doable. Let's stop pretending that pages in page cache are special. They are not. The changes are pretty straight-forward: - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>; - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN}; - page_cache_get() -> get_page(); - page_cache_release() -> put_page(); This patch contains automated changes generated with coccinelle using script below. For some reason, coccinelle doesn't patch header files. I've called spatch for them manually. The only adjustment after coccinelle is revert of changes to PAGE_CAHCE_ALIGN definition: we are going to drop it later. There are few places in the code where coccinelle didn't reach. I'll fix them manually in a separate patch. Comments and documentation also will be addressed with the separate patch. virtual patch @@ expression E; @@ - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ expression E; @@ - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) + E @@ @@ - PAGE_CACHE_SHIFT + PAGE_SHIFT @@ @@ - PAGE_CACHE_SIZE + PAGE_SIZE @@ @@ - PAGE_CACHE_MASK + PAGE_MASK @@ expression E; @@ - PAGE_CACHE_ALIGN(E) + PAGE_ALIGN(E) @@ expression E; @@ - page_cache_get(E) + get_page(E) @@ expression E; @@ - page_cache_release(E) + put_page(E) Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-04-04 10:41:08 -07:00
Dan Williams	fc0c202813	x86, pmem: use memcpy_mcsafe() for memcpy_from_pmem() Update the definition of memcpy_from_pmem() to return 0 or a negative error code. Implement x86/arch_memcpy_from_pmem() with memcpy_mcsafe(). Cc: Borislav Petkov <bp@alien8.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-28 17:19:31 -07:00
Linus Torvalds	de06dbfa78	Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm Pull ARM updates from Russell King: "Another mixture of changes this time around: - Split XIP linker file from main linker file to make it more maintainable, and various XIP fixes, and clean up a resulting macro. - Decompressor cleanups from Masahiro Yamada - Avoid printing an error for a missing L2 cache - Remove some duplicated symbols in System.map, and move vectors/stubs back into kernel VMA - Various low priority fixes from Arnd - Updates to allow bus match functions to return negative errno values, touching some drivers and the driver core. Greg has acked these changes. - Virtualisation platform udpates form Jean-Philippe Brucker. - Security enhancements from Kees Cook - Rework some Kconfig dependencies and move PSCI idle management code out of arch/arm into drivers/firmware/psci.c - ARM DMA mapping updates, touching media, acked by Mauro. - Fix places in ARM code which should be using virt_to_idmap() so that Keystone2 can work. - Fix Marvell Tauros2 to work again with non-DT boots. - Provide a delay timer for ARM Orion platforms" * 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (45 commits) ARM: 8546/1: dma-mapping: refactor to fix coherent+cma+gfp=0 ARM: 8547/1: dma-mapping: store buffer information ARM: 8543/1: decompressor: rename suffix_y to compress-y ARM: 8542/1: decompressor: merge piggy.*.S and simplify Makefile ARM: 8541/1: decompressor: drop redundant FORCE in Makefile ARM: 8540/1: decompressor: use clean-files instead of extra-y to clean files ARM: 8539/1: decompressor: drop more unneeded assignments to "targets" ARM: 8538/1: decompressor: drop unneeded assignments to "targets" ARM: 8532/1: uncompress: mark putc as inline ARM: 8531/1: turn init_new_context into an inline function ARM: 8530/1: remove VIRT_TO_BUS ARM: 8537/1: drop unused DEBUG_RODATA from XIP_KERNEL ARM: 8536/1: mm: hide __start_rodata_section_aligned for non-debug builds ARM: 8535/1: mm: DEBUG_RODATA makes no sense with XIP_KERNEL ARM: 8534/1: virt: fix hyp-stub build for pre-ARMv7 CPUs ARM: make the physical-relative calculation more obvious ARM: 8512/1: proc-v7.S: Adjust stack address when XIP_KERNEL ARM: 8411/1: Add default SPARSEMEM settings ARM: 8503/1: clk_register_clkdev: remove format string interface ARM: 8529/1: remove 'i' and 'zi' targets ...	2016-03-19 16:31:54 -07:00
Dan Williams	489011652a	Merge branch 'for-4.6/pfn' into libnvdimm-for-next	2016-03-09 17:15:43 -08:00
Dan Williams	59e6473980	libnvdimm, pmem: clear poison on write If a write is directed at a known bad block perform the following: 1/ write the data 2/ send a clear poison command 3/ invalidate the poison out of the cache hierarchy Cc: <x86@kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-09 15:15:32 -08:00
Dan Williams	b5ebc8ec69	libnvdimm, pmem: fix kmap_atomic() leak in error path When we enounter a bad block we need to kunmap_atomic() before returning. Cc: <stable@vger.kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-09 15:12:41 -08:00
NeilBrown	ff8e92d5d9	nvdimm/btt: don't allocate unused major device number alloc_disk(0) does not require or use a ->major number, all devices are allocated with a major of BLOCK_EXT_MAJOR. So don't allocate btt_major. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-09 15:00:24 -08:00
NeilBrown	ec56151d38	nvdimm/blk: don't allocate unused major device number When alloc_disk(0) is used ->major is completely ignored, all devices are allocated with a "major" of BLOCK_EXT_MAJOR. So don't allocate nd_blk_major Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-09 14:59:41 -08:00
NeilBrown	55155291b3	pmem: don't allocate unused major device number When alloc_disk(0) or alloc_disk-node(0, XX) is used, the ->major number is completely ignored: all devices are allocated with a major of BLOCK_EXT_MAJOR. So there is no point allocating pmem_major. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-09 11:31:34 -08:00
Dan Williams	45f68802f2	libnvdimm, pmem: fix ia64 build, use PHYS_PFN drivers/nvdimm/pmem.c: In function 'nvdimm_namespace_attach_pfn': drivers/nvdimm/pmem.c:367:3: error: implicit declaration of function '__phys_to_pfn' [-Werror=implicit-function-declaration] .base_pfn = __phys_to_pfn(nsio->res.start), ia64 does not provide __phys_to_pfn(), just use the PHYS_PFN() alias. Cc: Guenter Roeck <linux@roeck-us.net> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-06 08:04:12 -08:00
Dan Williams	d4f323672a	nfit, libnvdimm: clear poison command support Add the boiler-plate for a 'clear error' command based on section 9.20.7.6 "Function Index 4 - Clear Uncorrectable Error" from the ACPI 6.1 specification, and add a reference implementation in nfit_test. Reviewed-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 18:06:14 -08:00
Dan Williams	f6ed58c70d	libnvdimm, pfn: 'resource'-address and 'size' attributes for pfn devices Currenty with a raw mode pmem namespace the physical memory address range for the device can be obtained via /sys/block/pmemX/device/{resource\|size}. Add similar attributes for pfn instances that takes the struct page memmap and section padding into account. Reported-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:25:45 -08:00
Dan Williams	cfe30b8720	libnvdimm, pmem: adjust for section collisions with 'System RAM' On a platform where 'Persistent Memory' and 'System RAM' are mixed within a given sparsemem section, trim the namespace and notify about the sub-optimal alignment. Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:25:45 -08:00
Dan Williams	d9cbe09d39	libnvdimm, pmem: fix 'pfn' support for section-misaligned namespaces The altmap for a section-misaligned namespace needs to arrange for the base_pfn to be section-aligned. As a result the 'reserve' region (pfns from base that do not have a struct page) must be increased. Otherwise we trip the altmap validation check in __add_pages: if (altmap->base_pfn != phys_start_pfn \|\| vmem_altmap_offset(altmap) > nr_pages) { pr_warn_once("memory add fail, invalid altmap\n"); return -EINVAL; } Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:25:44 -08:00
Jerry Hoemann	07accfa9d1	libnvdimm: Fix security issue with DSM IOCTL. Code attempts to prevent certain IOCTL DSM from being called when device is opened read only. This security feature can be trivially overcome by changing the size portion of the ioctl_command which isn't used. Check only the _IOC_NR (i.e. the command). Cc: <stable@vger.kernel.org> Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:24:06 -08:00
Jerry Hoemann	4dc0e7be88	libnvdimm: Clean-up access mode check. Change nd_ioctl and nvdimm_ioctl access mode check to use O_RDONLY. Signed-off-by: Jerry Hoemann <jerry.hoemann@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:24:06 -08:00
Dan Williams	87bf572e19	nfit: disable userspace initiated ars during scrub While the nfit driver is issuing address range scrub commands and reaping the results do not permit an ars_start command issued from userspace. The scrub thread assumes that all ars completions are for scrubs initiated by platform firmware at boot, or by the nfit driver. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:24:06 -08:00
Dan Williams	7ae0fa439f	nfit, libnvdimm: async region scrub workqueue Introduce a workqueue that will be used to run address range scrub asynchronously with the rest of nvdimm device probing. Userspace still wants notification when probing operations complete, so introduce a new callback to flush this workqueue when userspace is awaiting probe completion. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:24:06 -08:00
Dan Williams	719994660c	libnvdimm: async notification support In preparation for asynchronous address range scrub support add an ability for the pmem driver to dynamically consume address range scrub results. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:24:06 -08:00
Dan Williams	5faecf4eb0	libnvdimm: protect nvdimm_{bus\|namespace}_add_poison() with nvdimm_bus_lock() In preparation for making poison list retrieval asynchronus to region registration, add protection for walking and mutating the bus-level poison list. Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:24:06 -08:00
Dan Williams	aef2533822	libnvdimm, nfit: centralize command status translation The return value from an 'ndctl_fn' reports the command execution status, i.e. was the command properly formatted and was it successfully submitted to the bus provider. The new 'cmd_rc' parameter allows the bus provider to communicate command specific results, translated into common error codes. Convert the ARS commands to this scheme to: 1/ Consolidate status reporting 2/ Prepare for for expanding ars unit test cases 3/ Make the implementation more generic Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-03-05 12:24:06 -08:00
Russell King	1b3bf84797	Merge branches 'amba', 'fixes', 'misc' and 'tauros2' into for-next	2016-03-04 23:36:02 +00:00
Ingo Molnar	bc94b99636	Linux 4.5-rc6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQEcBAABAgAGBQJW0yM6AAoJEHm+PkMAQRiGeUwIAJRTHFPJTFpJcJjeZEV4/EL1 7Pl0WSHs/CWBkXIevAg2HgkECSQ9NI9FAUFvoGxCldDpFAnL1U2QV8+Ur2qhiXMG 5v0jILJuiw57qT/NfhEudZolerlRoHILmB3JRTb+DUV4GHZuWpTkJfUSI9j5aTEl w83XUgtK4bKeIyFbHdWQk6xqfzfFBSuEITuSXreOMwkFfMmeScE0WXOPLBZWyhPa v0rARJLYgM+vmRAnJjnG8unH+SgnqiNcn2oOFpevKwmpVcOjcEmeuxh/HdeZf7HM /R8F86OwdmXsO+z8dQxfcucLg+I9YmKfFr8b6hopu1sRztss2+Uk6H1j2J7IFIg= =tvkh -----END PGP SIGNATURE----- Merge tag 'v4.5-rc6' into core/resources, to resolve conflict Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-03-04 12:12:08 +01:00
Arnd Bergmann	c45442055d	nvdimm: use 'u64' for pfn flags A recent bugfix changed pfn_t to always be 64-bit wide, but did not change the code in pmem.c, which is now broken on 32-bit architectures as reported by gcc: In file included from ../drivers/nvdimm/pmem.c:28:0: drivers/nvdimm/pmem.c: In function 'pmem_alloc': include/linux/pfn_t.h:15:17: error: large integer implicitly truncated to unsigned type [-Werror=overflow] #define PFN_DEV (1ULL << (BITS_PER_LONG_LONG - 3)) This changes the intermediate pfn_flags in struct pmem_device to be 64 bit wide as well, so they can store the flags correctly. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Fixes: `db78c22230` ("mm: fix pfn_t vs highmem") Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-02-23 17:17:20 -08:00
Dan Williams	4577b06655	nfit: update address range scrub commands to the acpi 6.1 format The original format of these commands from the "NVDIMM DSM Interface Example" [1] are superseded by the ACPI 6.1 definition of the "NVDIMM Root Device _DSMs" [2]. [1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf [2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf "9.20.7 NVDIMM Root Device _DSMs" Changes include: 1/ New 'restart' fields in ars_status, unfortunately these are implemented in the middle of the existing definition so this change is not backwards compatible. The expectation is that shipping platforms will only ever support the ACPI 6.1 definition. 2/ New status values for ars_start ('busy') and ars_status ('overflow'). Cc: Vishal Verma <vishal.l.verma@intel.com> Cc: Linda Knippers <linda.knippers@hpe.com> Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-02-23 17:17:20 -08:00
Dan Williams	747ffe11b4	libnvdimm, tools/testing/nvdimm: fix 'ars_status' output buffer sizing Use the output length specified in the command to size the receive buffer rather than the arbitrary 4K limit. This bug was hiding the fact that the ndctl implementation of ndctl_bus_cmd_new_ars_status() was not specifying an output buffer size. Cc: <stable@vger.kernel.org> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-02-19 15:21:52 -08:00
Dan Williams	82ec2ba2b1	ARM: 8522/1: drivers: nvdimm: ensure no negative value gets returned on positive match This patch ensures that existing bus match callbacks don't return negative values (which might be interpreted as potential errors in the future) in case of positive match. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>	2016-02-16 16:28:50 +00:00
Toshi Kani	f0f4711aa1	x86, kexec, nvdimm: Use walk_iomem_res_desc() for iomem search Change the callers of walk_iomem_res() scanning for the following resources by name to use walk_iomem_res_desc() instead. "ACPI Tables" "ACPI Non-volatile Storage" "Persistent Memory (legacy)" "Crash kernel" Note, the caller of walk_iomem_res() with "GART" will be removed in a later patch. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Dave Young <dyoung@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Chun-Yi <joeyli.kernel@gmail.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Don Zickus <dzickus@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Lee, Chun-Yi <joeyli.kernel@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Minfei Huang <mnfhuang@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Takao Indoh <indou.takao@jp.fujitsu.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: kexec@lists.infradead.org Cc: linux-arch@vger.kernel.org Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Link: http://lkml.kernel.org/r/1453841853-11383-15-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>	2016-01-30 09:49:59 +01:00
Dan Williams	45eb570a0d	libnvdimm, pfn: fix restoring memmap location This path was missed when turning on the memmap in pmem support. Permit 'pmem' as a valid location for the map. Reported-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-29 17:43:16 -08:00
Dan Williams	9c41242817	libnvdimm: fix mode determination for e820 devices Correctly display "safe" mode when a btt is established on a e820/memmap defined pmem namespace. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-26 09:40:32 -08:00
Dan Williams	5c2c2587b1	mm, dax, pmem: introduce {get\|put}_dev_pagemap() for dax-gup get_dev_page() enables paths like get_user_pages() to pin a dynamically mapped pfn-range (devm_memremap_pages()) while the resulting struct page objects are in use. Unlike get_page() it may fail if the device is, or is in the process of being, disabled. While the initial lookup of the range may be an expensive list walk, the result is cached to speed up subsequent lookups which are likely to be in the same mapped range. devm_memremap_pages() now requires a reference counter to be specified at init time. For pmem this means moving request_queue allocation into pmem_alloc() so the existing queue usage counter can track "device pages". ZONE_DEVICE pages always have an elevated count and will never be on an lru reclaim list. That space in 'struct page' can be redirected for other uses, but for safety introduce a poison value that will always trip __list_add() to assert. This allows half of the struct list_head storage to be reclaimed with some assurance to back up the assumption that the page count never goes to zero and a list_add() is never attempted. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Logan Gunthorpe <logang@deltatee.com> Cc: Dave Hansen <dave@sr71.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	468ded03c0	libnvdimm, pmem: move request_queue allocation earlier in probe Before the dynamically allocated struct pages from devm_memremap_pages() can be put to use outside the driver, we need a mechanism to track whether they are still in use at teardown. Towards that goal reorder the initialization sequence to allow the 'q_usage_counter' from the request_queue to be used by the devm_memremap_pages() implementation (in subsequent patches). Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	d2c0f041e1	libnvdimm, pfn, pmem: allocate memmap array in persistent memory Use the new vmem_altmap capability to enable the pmem driver to arrange for a struct page memmap to be established in persistent memory. [linux@roeck-us.net: mn10300: declare __pfn_to_phys() to fix build error] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	4b94ffdc41	x86, mm: introduce vmem_altmap to augment vmemmap_populate() In support of providing struct page for large persistent memory capacities, use struct vmem_altmap to change the default policy for allocating memory for the memmap array. The default vmemmap_populate() allocates page table storage area from the page allocator. Given persistent memory capacities relative to DRAM it may not be feasible to store the memmap in 'System Memory'. Instead vmem_altmap represents pre-allocated "device pages" to satisfy vmemmap_alloc_block_buf() requests. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Reported-by: kbuild test robot <lkp@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	9476df7d80	mm: introduce find_dev_pagemap() There are several scenarios where we need to retrieve and update metadata associated with a given devm_memremap_pages() mapping, and the only lookup key available is a pfn in the range: 1/ We want to augment vmemmap_populate() (called via arch_add_memory()) to allocate memmap storage from pre-allocated pages reserved by the device driver. At vmemmap_alloc_block_buf() time it grabs device pages rather than page allocator pages. This is in support of devm_memremap_pages() mappings where the memmap is too large to fit in main memory (i.e. large persistent memory devices). 2/ Taking a reference against the mapping when inserting device pages into the address_space radix of a given inode. This facilitates unmap_mapping_range() and truncate_inode_pages() operations when the driver is tearing down the mapping. 3/ get_user_pages() operations on ZONE_DEVICE memory require taking a reference against the mapping so that the driver teardown path can revoke and drain usage of device pages. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Logan Gunthorpe <logang@deltatee.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Chinner <david@fromorbit.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	34c0fd540e	mm, dax, pmem: introduce pfn_t For the purpose of communicating the optional presence of a 'struct page' for the pfn returned from ->direct_access(), introduce a type that encapsulates a page-frame-number plus flags. These flags contain the historical "page_link" encoding for a scatterlist entry, but can also denote "device memory". Where "device memory" is a set of pfns that are not part of the kernel's linear mapping by default, but are accessed via the same memory controller as ram. The motivation for this new type is large capacity persistent memory that needs struct page entries in the 'memmap' to support 3rd party DMA (i.e. O_DIRECT I/O with a persistent memory source/target). However, we also need it in support of maintaining a list of mapped inodes which need to be unmapped at driver teardown or freeze_bdev() time. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dave Hansen <dave@sr71.net> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2016-01-15 17:56:32 -08:00
Dan Williams	8b63b6bfc1	Merge branch 'for-4.5/block-dax' into for-4.5/libnvdimm	2016-01-10 07:53:55 -08:00
Dan Williams	710d69cc99	libnvdimm, pmem: nvdimm_read_bytes() badblocks support Support badblock checking in all the pmem read paths that do not go through the block layer. This protects info block reads (btt or pfn) as well as data reads to a pmem namespace via a btt instance. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-09 22:42:31 -08:00
Dan Williams	57f7f317ab	pmem, dax: disable dax in the presence of bad blocks Longer term teach dax to punch "error" holes in mapping requests and deliver SIGBUS to applications that consume a bad pmem page. For now, simply disable the dax performance optimization in the presence of known errors. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-09 22:42:31 -08:00
Dan Williams	e10624f8c0	pmem: fail io-requests to known bad blocks Check the sectors specified in a read bio to see if they hit a known bad block, and return an error code pmem_do_bvec(). Note that the ->rw_page() is not in a position to return errors. For now, copy the same layering violation present in zram_rw_page() to avoid crashes of the form: kernel BUG at mm/filemap.c:822! [..] Call Trace: [<ffffffff811c540e>] page_endio+0x1e/0x60 [<ffffffff81290d29>] mpage_end_io+0x39/0x60 [<ffffffff8141c4ef>] bio_endio+0x3f/0x60 [<ffffffffa005c491>] pmem_make_request+0x111/0x230 [nd_pmem] ...i.e. unlock a page that was already unlocked via pmem_rw_page() => page_endio(). Reported-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-09 08:39:04 -08:00
Dan Williams	b95f5f4391	libnvdimm: convert to statically allocated badblocks If a device will ever have badblocks it should always have a badblocks instance available. So, similar to md, embed a badblocks instance in pmem_device. This reduces pointer chasing in the i/o fast path, and simplifies the init path. Reported-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-09 08:39:04 -08:00
Dan Williams	87ba05dff3	libnvdimm: don't fail init for full badblocks list If the badblocks list runs out of space it simply means that software is unable to intercept all errors. This is no different than the latent discovery of new badblocks case and should not be an initialization failure condition. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-09 08:39:04 -08:00
Dan Williams	ad9a8bde2c	libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h nd-core.h is private to the libnvdimm core internals and should not be used by drivers. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-09 08:39:03 -08:00
Vishal Verma	0caeef63e6	libnvdimm: Add a poison list and export badblocks During region creation, perform Address Range Scrubs (ARS) for the SPA (System Physical Address) ranges to retrieve known poison locations from firmware. Add a new data structure 'nd_poison' which is used as a list in nvdimm_bus to store these poison locations. When creating a pmem namespace, if there is any known poison associated with its physical address space, convert the poison ranges to bad sectors that are exposed using the badblocks interface. Signed-off-by: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-09 08:39:03 -08:00
Dan Williams	e07ecd76d4	libnvdimm: fix namespace object confusion in is_uuid_busy() When btt devices were re-worked to be child devices of regions this routine was overlooked. It mistakenly attempts to_nd_namespace_pmem() or to_nd_namespace_blk() conversions on btt and pfn devices. By luck to date we have happened to be hitting valid memory leading to a uuid miscompare, but a recent change to struct nd_namespace_common causes: BUG: unable to handle kernel NULL pointer dereference at 0000000000000001 IP: [<ffffffff814610dc>] memcmp+0xc/0x40 [..] Call Trace: [<ffffffffa0028631>] is_uuid_busy+0xc1/0x2a0 [libnvdimm] [<ffffffffa0028570>] ? to_nd_blk_region+0x50/0x50 [libnvdimm] [<ffffffff8158c9c0>] device_for_each_child+0x50/0x90 Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2016-01-05 18:37:23 -08:00
Dan Williams	0731de0dd9	libnvdimm, pfn: move 'memory mode' indication to sysfs 'Memory mode' is defined as the capability of a DAX mapping to be the source/target of DMA and other "direct I/O" scenarios. While it currently requires allocating 'struct page' for each page frame of persistent memory in the namespace it will not always be the case. Work continues on reducing the kernel's dependency on 'struct page'. Let's not maintain a suffix that is expected to lose meaning over time. In other words a future 'raw mode' pmem namespace may be as capable as today's 'memory mode' namespace. Undo the encoding of the mode in the device name and leave it to other tooling to determine the mode of the namespace from its attributes. Reported-by: Matthew Wilcox <willy@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-24 12:20:20 -08:00
Dan Williams	3fa9626865	libnvdimm, pfn: fix nd_pfn_validate() return value handling The -ENODEV case indicates that the info-block needs to established. All other return codes cause nd_pfn_init() to abort. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-24 12:20:19 -08:00
Dan Williams	2dc43331e3	libnvdimm, pfn: fix pfn seed creation Similar to btt, plant a new pfn seed when the existing one is activated. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-13 11:41:36 -08:00
Dan Williams	a34d5e8a6a	libnvdimm, pfn: add parent uuid validation Track and check the uuid of the namespace hosting a pfn instance. This forces the pfn info block to be invalidated if the namespace is re-configured with a different uuid. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-13 11:40:20 -08:00
Dan Williams	315c562536	libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE When setting aside capacity for struct page it must be aligned to the largest mapping size that is to be made available via DAX. Make the alignment configurable to enable support for 1GiB page-size mappings. The offset for PFN_MODE_RAM may now be larger than SZ_8K, so fixup the offset check in nvdimm_namespace_attach_pfn(). Reported-by: Toshi Kani <toshi.kani@hpe.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-12 15:04:26 -08:00
Dan Williams	f7c6ab80fa	libnvdimm, pfn: clean up pfn create parameters In all cases __nd_pfn_create is called with default parameters which are then overridden by values in the info block. Clean up pfn creation by dropping the parameters and setting default values internal to __nd_pfn_create. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-10 16:11:59 -08:00
Dan Williams	9f1e8cee77	libnvdimm, pfn: kill ND_PFN_ALIGN The alignment constraint isn't necessary now that devm_memremap_pages() allows for unaligned mappings. Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-10 15:14:20 -08:00
Dmitry Krivenok	6bb691ac08	nvdimm: do not show pfn_seed for non pmem regions This simple change hides pfn_seed attribute for non pmem regions because they don't support pfn anyway. Signed-off-by: Dmitry V. Krivenok <krivenok.dmitry@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-08 16:27:30 -08:00
Dmitry Krivenok	bd26d0d0ce	nvdimm: improve diagnosibility of namespaces In order to bind namespace to the driver user must first set all mandatory attributes in the following order: - uuid - size - sector_size (for blk namespace only) If the order is wrong, then user either won't be able to set the attribute or bind the namespace. This simple patch improves diagnosibility of common operations with namespaces by printing some details about the error instead of failing silently. Below are examples of error messages (assuming dyndbg is enabled for nvdimms): [/]# echo 4194304 > /sys/bus/nd/devices/region5/namespace5.0/size [ 288.372612] nd namespace5.0: __size_store: uuid not set [ 288.374839] nd namespace5.0: size_store: 400000 fail (-6) sh: write error: No such device or address [/]# [/]# echo namespace5.0 > /sys/bus/nd/drivers/nd_blk/bind [ 554.671648] nd_blk namespace5.0: nvdimm_namespace_common_probe: sector size not set [ 554.674688] ndbus1: nd_blk.probe(namespace5.0) = -19 sh: write error: No such device [/]# Signed-off-by: Dmitry V. Krivenok <krivenok.dmitry@gmail.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-12-08 16:27:30 -08:00
Dan Williams	589e75d157	libnvdimm, pmem: fix size trim in pmem_direct_access() This masking prevents access to the end of the device via dax_do_io(), and is unnecessary as arch_add_memory() would have rejected an unaligned allocation. Cc: <stable@vger.kernel.org> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-11-12 09:55:23 -08:00
Dan Williams	f7256dc0cd	libnvdimm, e820: fix numa node for e820-type-12 pmem ranges Rather than punt on the numa node for these e820 ranges try to find a better answer with memory_add_physaddr_to_nid() when it is available. Cc: <stable@vger.kernel.org> Reported-by: Boaz Harrosh <boaz@plexistor.com> Tested-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Dan Williams <dan.j.williams@intel.com>	2015-11-12 09:21:18 -08:00
Linus Torvalds	3419b45039	Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block Pull block IO poll support from Jens Axboe: "Various groups have been doing experimentation around IO polling for (really) fast devices. The code has been reviewed and has been sitting on the side for a few releases, but this is now good enough for coordinated benchmarking and further experimentation. Currently O_DIRECT sync read/write are supported. A framework is in the works that allows scalable stats tracking so we can auto-tune this. And we'll add libaio support as well soon. Fow now, it's an opt-in feature for test purposes" * 'for-4.4/io-poll' of git://git.kernel.dk/linux-block: direct-io: be sure to assign dio->bio_bdev for both paths directio: add block polling support NVMe: add blk polling support block: add block polling support blk-mq: return tag/queue combo in the make_request_fn handlers block: change ->make_request_fn() and users to return a queue cookie	2015-11-10 17:23:49 -08:00

1 2 3 4 5 ...

312 Commits