Merge misc updates from Andrew Morton:
"146 patches.
Subsystems affected by this patch series: kthread, ia64, scripts,
ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
damon)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
mm/damon: hide kernel pointer from tracepoint event
mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
mm/damon/dbgfs: remove an unnecessary variable
mm/damon: move the implementation of damon_insert_region to damon.h
mm/damon: add access checking for hugetlb pages
Docs/admin-guide/mm/damon/usage: update for schemes statistics
mm/damon/dbgfs: support all DAMOS stats
Docs/admin-guide/mm/damon/reclaim: document statistics parameters
mm/damon/reclaim: provide reclamation statistics
mm/damon/schemes: account how many times quota limit has exceeded
mm/damon/schemes: account scheme actions that successfully applied
mm/damon: remove a mistakenly added comment for a future feature
Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
Docs/admin-guide/mm/damon/usage: remove redundant information
Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
mm/damon: convert macro functions to static inline functions
mm/damon: modify damon_rand() macro to static inline function
mm/damon: move damon_rand() definition into damon.h
...
Use the newly added compound devmap facility which maps the assigned dax
ranges as compound pages at a page size of @align.
dax devices are created with a fixed @align (huge page size) which is
enforced through as well at mmap() of the device. Faults, consequently
happen too at the specified @align specified at the creation, and those
don't change throughout dax device lifetime. MCEs unmap a whole dax
huge page, as well as splits occurring at the configured page size.
Performance measured by gup_test improves considerably for
unpin_user_pages() and altmap with NVDIMMs:
$ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
[altmap]
(pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
$ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
(pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
[altmap with -m 127004]
(pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
.. as well as unpin_user_page_range_dirty_lock() being just as effective
as THP/hugetlb[0] pages.
[0] https://lore.kernel.org/linux-mm/20210212130843.13865-5-joao.m.martins@oracle.com/
Link: https://lkml.kernel.org/r/20211202204422.26777-12-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, only static dax regions have a valid @pgmap pointer in its
struct dev_dax. Dynamic dax case however, do not.
In preparation for device-dax compound devmap support, make sure that
dev_dax pgmap field is set after it has been allocated and initialized.
dynamic dax device have the @pgmap is allocated at probe() and it's
managed by devm (contrast to static dax region which a pgmap is provided
and dax core kfrees it). So in addition to ensure a valid @pgmap, clear
the pgmap when the dynamic dax device is released to avoid the same
pgmap ranges to be re-requested across multiple region device reconfigs.
Add a static_dev_dax() and use that helper in dev_dax_probe() to ensure
the initialization differences between dynamic and static regions are
more explicit. While at it, consolidate the ranges initialization when
we allocate the @pgmap for the dynamic dax region case. Also take the
opportunity to document the differences between static and dynamic da
regions.
Link: https://lkml.kernel.org/r/20211202204422.26777-8-joao.m.martins@oracle.com
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These methods indirect the actual DAX read/write path. In the end pmem
uses magic flush and mc safe variants and fuse and dcssblk use plain ones
while device mapper picks redirects to the underlying device.
Add set_dax_nocache() and set_dax_nomc() APIs to control which copy
routines are used to remove indirect call from the read/write fast path
as well as a lot of boilerplate code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com> [virtiofs]
Link: https://lore.kernel.org/r/20211215084508.435401-5-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The device mapper DAX support is all hanging off a block device and thus
can't be used with device dax. Make it depend on CONFIG_FS_DAX instead
of CONFIG_DAX_DRIVER. This also means that bdev_dax_pgoff only needs to
be built under CONFIG_FS_DAX now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Link: https://lore.kernel.org/r/20211129102203.2243509-3-hch@lst.de
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The /sys/class/dax compatibility option has shipped in the kernel for 4
years now which should be sufficient time for tools to abandon the old
ABI in favor of the /sys/bus/dax device-model. Delete it now and see if
anyone screams.
Since this compatibility option shipped there has been more reports of
users being surprised by the compat ABI than surprised by the "new", so
the compat infrastructure has outlived its usefulness. Recall that
/sys/bus/dax device-model is required for the dax kmem driver which
allows PMEM to be used as "System RAM".
The following projects were known to have a dependency on /sys/class/dax
and have dropped their dependency as of the listed version:
- ndctl (including libndctl, daxctl, and libdaxctl): v64+
- fio: v3.13+
- pmdk: v1.5.2+
As further evidence this option is no longer needed some distributions
have already stopped enabling CONFIG_DEV_DAX_PMEM_COMPAT.
Cc: Ira Weiny <ira.weiny@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Reported-by: Vishal Verma <vishal.l.verma@intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Jane Chu <jane.chu@oracle.com>
Link: https://lore.kernel.org/r/163701116195.3784476.726128179293466337.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull libnvdimm updates from Dan Williams:
- Fix a race condition in the teardown path of raw mode pmem
namespaces.
- Cleanup the code that filesystems use to detect filesystem-dax
capabilities of their underlying block device.
* tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: remove bdev_dax_supported
xfs: factor out a xfs_buftarg_is_dax helper
dax: stub out dax_supported for !CONFIG_FS_DAX
dax: remove __generic_fsdax_supported
dax: move the dax_read_lock() locking into dax_supported
dax: mark dax_get_by_host static
dm: use fs_dax_get_by_bdev instead of dax_get_by_host
dax: stop using bdevname
fsdax: improve the FS_DAX Kconfig description and help text
libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
Merge more updates from Andrew Morton:
"147 patches, based on 7d2a07b769.
Subsystems affected by this patch series: mm (memory-hotplug, rmap,
ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
selftests, ipc, and scripts"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
Pull driver core updates from Greg KH:
"Here is the big set of driver core patches for 5.15-rc1.
These do change a number of different things across different
subsystems, and because of that, there were 2 stable tags created that
might have already come into your tree from different pulls that did
the following
- changed the bus remove callback to return void
- sysfs iomem_get_mapping rework
Other than those two things, there's only a few small things in here:
- kernfs performance improvements for huge numbers of sysfs users at
once
- tiny api cleanups
- other minor changes
All of these have been in linux-next for a while with no reported
problems, other than the before-mentioned merge issue"
* tag 'driver-core-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (33 commits)
MAINTAINERS: Add dri-devel for component.[hc]
driver core: platform: Remove platform_device_add_properties()
ARM: tegra: paz00: Handle device properties with software node API
bitmap: extend comment to bitmap_print_bitmask/list_to_buf
drivers/base/node.c: use bin_attribute to break the size limitation of cpumap ABI
topology: use bin_attribute to break the size limitation of cpumap ABI
lib: test_bitmap: add bitmap_print_bitmask/list_to_buf test cases
cpumask: introduce cpumap_print_list/bitmask_to_buf to support large bitmask and list
sysfs: Rename struct bin_attribute member to f_mapping
sysfs: Invoke iomem_get_mapping() from the sysfs open callback
debugfs: Return error during {full/open}_proxy_open() on rmmod
zorro: Drop useless (and hardly used) .driver member in struct zorro_dev
zorro: Simplify remove callback
sh: superhyway: Simplify check in remove callback
nubus: Simplify check in remove callback
nubus: Make struct nubus_driver::remove return void
kernfs: dont call d_splice_alias() under kernfs node lock
kernfs: use i_lock to protect concurrent inode updates
kernfs: switch kernfs to use an rwsem
kernfs: use VFS negative dentry caching
...
gcc warns about an empty body in an else statement:
drivers/dax/bus.c: In function 'do_id_store':
drivers/dax/bus.c:94:48: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
94 | /* nothing to remove */;
| ^
drivers/dax/bus.c:99:43: error: suggest braces around empty body in an 'else' statement [-Werror=empty-body]
99 | /* dax_id already added */;
| ^
In both of these cases, the 'else' exists only to have a place to
add a comment, but that comment doesn't really explain that much
either, so the easiest way to shut up that warning is to just
remove the else.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20210322114514.3490752-1-arnd@kernel.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull misc vfs updates from Al Viro:
"Assorted stuff pile - no common topic here"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
whack-a-mole: don't open-code iminor/imajor
9p: fix misuse of sscanf() in v9fs_stat2inode()
audit_alloc_mark(): don't open-code ERR_CAST()
fs/inode.c: make inode_init_always() initialize i_ino to 0
vfs: don't unnecessarily clone write access for writable fds
Pick up device-dax updates to merge with libnvdimm device updates for
5.12.
* Fix the polarity of EINVAL in a sysfs return code
* Drop the unused return code for driver remove() callbacks
The driver core ignores the return value of struct bus_type::remove()
because there is only little that can be done. To simplify the quest to
make this function return void, let struct dax_device_driver::remove()
return void, too. All users already unconditionally return 0, this commit
makes it obvious that returning an error code isn't intended.
Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
Link: https://lore.kernel.org/r/20210205222842.34896-6-uwe@kleine-koenig.org
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull libnvdimm updates from Dan Williams:
"Twas the day before Christmas and the only thing stirring in libnvdimm
/ device-dax land is a pile of miscellaneous fixups and cleanups.
The bulk of it has appeared in -next save the last two patches to
device-dax that have passed my build and unit tests.
- Fix a long standing block-window-namespace issue surfaced by the
ndctl change to attempt to preserve the kernel device name over
a 'reconfigure'
- Fix a few error path memory leaks in nfit and device-dax
- Silence a smatch warning in the ioctl path
- Miscellaneous cleanups"
* tag 'libnvdimm-for-5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
device-dax: Avoid an unnecessary check in alloc_dev_dax_range()
device-dax: Fix range release
device-dax: delete a redundancy check in dev_dax_validate_align()
libnvdimm/label: Return -ENXIO for no slot in __blk_label_update
device-dax/core: Fix memory leak when rmmod dax.ko
device-dax/pmem: Convert comma to semicolon
libnvdimm: Cleanup include of badblocks.h
ACPI: NFIT: Fix input validation of bus-family
libnvdimm/namespace: Fix reaping of invalidated block-window-namespace labels
ACPI/nfit: avoid accessing uninitialized memory in acpi_nfit_ctl()