Commit Graph

18403 Commits

Author SHA1 Message Date
Joao Martins
5b24eeef06 mm/page_alloc: split prep_compound_page into head and tail subparts
Patch series "mm, device-dax: Introduce compound pages in devmap", v7.

This series converts device-dax to use compound pages, and moves away
from the 'struct page per basepage on PMD/PUD' that is done today.

Doing so
 1) unlocks a few noticeable improvements on unpin_user_pages() and
    makes device-dax+altmap case 4x times faster in pinning (numbers
    below and in last patch)
 2) as mentioned in various other threads it's one important step
    towards cleaning up ZONE_DEVICE refcounting.

I've split the compound pages on devmap part from the rest based on
recent discussions on devmap pending and future work planned[5][6].
There is consensus that device-dax should be using compound pages to
represent its PMD/PUDs just like HugeTLB and THP, and that leads to less
specialization of the dax parts.  I will pursue the rest of the work in
parallel once this part is merged, particular the GUP-{slow,fast}
improvements [7] and the tail struct page deduplication memory savings
part[8].

To summarize what the series does:

Patch 1: Prepare hwpoisoning to work with dax compound pages.

Patches 2-3: Split the current utility function of prep_compound_page()
into head and tail and use those two helpers where appropriate to take
advantage of caches being warm after __init_single_page().  This is used
when initializing zone device when we bring up device-dax namespaces.

Patches 4-10: Add devmap support for compound pages in device-dax.
memmap_init_zone_device() initialize its metadata as compound pages, and
it introduces a new devmap property known as vmemmap_shift which
outlines how the vmemmap is structured (defaults to base pages as done
today).  The property describe the page order of the metadata
essentially.  While at it do a few cleanups in device-dax in patches
5-9.  Finally enable device-dax usage of devmap @vmemmap_shift to a
value based on its own @align property.  @vmemmap_shift returns 0 by
default (which is today's case of base pages in devmap, like fsdax or
the others) and the usage of compound devmap is optional.  Starting with
device-dax (*not* fsdax) we enable it by default.  There are a few
pinning improvements particular on the unpinning case and altmap, as
well as unpin_user_page_range_dirty_lock() being just as effective as
THP/hugetlb[0] pages.

    $ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
    (pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
    [altmap]
    (pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms

     $ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
    (pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
    [altmap with -m 127004]
    (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms

Tested on x86 with 1Tb+ of pmem (alongside registering it with RDMA with
and without altmap), alongside gup_test selftests with dynamic dax
regions and static dax regions.  Coupled with ndctl unit tests for
dynamic dax devices that exercise all of this.  Note, for dynamic dax
regions I had to revert commit 8aa83e6395 ("x86/setup: Call
early_reserve_memory() earlier"), it is a known issue that this commit
broke efi_fake_mem=.

This patch (of 11):

Split the utility function prep_compound_page() into head and tail
counterparts, and use them accordingly.

This is in preparation for sharing the storage for compound page
metadata.

Link: https://lkml.kernel.org/r/20211202204422.26777-1-joao.m.martins@oracle.com
Link: https://lkml.kernel.org/r/20211202204422.26777-3-joao.m.martins@oracle.com
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Kefeng Wang
60115fa54a mm: defer kmemleak object creation of module_alloc()
Yongqiang reports a kmemleak panic when module insmod/rmmod with KASAN
enabled(without KASAN_VMALLOC) on x86[1].

When the module area allocates memory, it's kmemleak_object is created
successfully, but the KASAN shadow memory of module allocation is not
ready, so when kmemleak scan the module's pointer, it will panic due to
no shadow memory with KASAN check.

  module_alloc
    __vmalloc_node_range
      kmemleak_vmalloc
				kmemleak_scan
				  update_checksum
    kasan_module_alloc
      kmemleak_ignore

Note, there is no problem if KASAN_VMALLOC enabled, the modules area
entire shadow memory is preallocated.  Thus, the bug only exits on ARCH
which supports dynamic allocation of module area per module load, for
now, only x86/arm64/s390 are involved.

Add a VM_DEFER_KMEMLEAK flags, defer vmalloc'ed object register of
kmemleak in module_alloc() to fix this issue.

[1] https://lore.kernel.org/all/6d41e2b9-4692-5ec4-b1cd-cbe29ae89739@huawei.com/

[wangkefeng.wang@huawei.com: fix build]
  Link: https://lkml.kernel.org/r/20211125080307.27225-1-wangkefeng.wang@huawei.com
[akpm@linux-foundation.org: simplify ifdefs, per Andrey]
  Link: https://lkml.kernel.org/r/CA+fCnZcnwJHUQq34VuRxpdoY6_XbJCDJ-jopksS5Eia4PijPzw@mail.gmail.com

Link: https://lkml.kernel.org/r/20211124142034.192078-1-wangkefeng.wang@huawei.com
Fixes: 793213a82d ("s390/kasan: dynamic shadow mem allocation for modules")
Fixes: 39d114ddc6 ("arm64: add KASAN support")
Fixes: bebf56a1b1 ("kasan: enable instrumentation of global variables")
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reported-by: Yongqiang Liu <liuyongqiang13@huawei.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Kuan-Ying Lee
ad1a3e15fc kmemleak: fix kmemleak false positive report with HW tag-based kasan enable
With HW tag-based kasan enable, We will get the warning when we free
object whose address starts with 0xFF.

It is because kmemleak rbtree stores tagged object and this freeing
object's tag does not match with rbtree object.

In the example below, kmemleak rbtree stores the tagged object in the
kmalloc(), and kfree() gets the pointer with 0xFF tag.

Call sequence:
    ptr = kmalloc(size, GFP_KERNEL);
    page = virt_to_page(ptr);
    offset = offset_in_page(ptr);
    kfree(page_address(page) + offset);
    ptr = kmalloc(size, GFP_KERNEL);

A sequence like that may cause the warning as following:

 1) Freeing unknown object:

    In kfree(), we will get free unknown object warning in
    kmemleak_free(). Because object(0xFx) in kmemleak rbtree and
    pointer(0xFF) in kfree() have different tag.

 2) Overlap existing:

    When we allocate that object with the same hw-tag again, we will
    find the overlap in the kmemleak rbtree and kmemleak thread will be
    killed.

	kmemleak: Freeing unknown object at 0xffff000003f88000
	CPU: 5 PID: 177 Comm: cat Not tainted 5.16.0-rc1-dirty #21
	Hardware name: linux,dummy-virt (DT)
	Call trace:
	 dump_backtrace+0x0/0x1ac
	 show_stack+0x1c/0x30
	 dump_stack_lvl+0x68/0x84
	 dump_stack+0x1c/0x38
	 kmemleak_free+0x6c/0x70
	 slab_free_freelist_hook+0x104/0x200
	 kmem_cache_free+0xa8/0x3d4
	 test_version_show+0x270/0x3a0
	 module_attr_show+0x28/0x40
	 sysfs_kf_seq_show+0xb0/0x130
	 kernfs_seq_show+0x30/0x40
	 seq_read_iter+0x1bc/0x4b0
	 seq_read_iter+0x1bc/0x4b0
	 kernfs_fop_read_iter+0x144/0x1c0
	 generic_file_splice_read+0xd0/0x184
	 do_splice_to+0x90/0xe0
	 splice_direct_to_actor+0xb8/0x250
	 do_splice_direct+0x88/0xd4
	 do_sendfile+0x2b0/0x344
	 __arm64_sys_sendfile64+0x164/0x16c
	 invoke_syscall+0x48/0x114
	 el0_svc_common.constprop.0+0x44/0xec
	 do_el0_svc+0x74/0x90
	 el0_svc+0x20/0x80
	 el0t_64_sync_handler+0x1a8/0x1b0
	 el0t_64_sync+0x1ac/0x1b0
	...
	kmemleak: Cannot insert 0xf2ff000003f88000 into the object search tree (overlaps existing)
	CPU: 5 PID: 178 Comm: cat Not tainted 5.16.0-rc1-dirty #21
	Hardware name: linux,dummy-virt (DT)
	Call trace:
	 dump_backtrace+0x0/0x1ac
	 show_stack+0x1c/0x30
	 dump_stack_lvl+0x68/0x84
	 dump_stack+0x1c/0x38
	 create_object.isra.0+0x2d8/0x2fc
	 kmemleak_alloc+0x34/0x40
	 kmem_cache_alloc+0x23c/0x2f0
	 test_version_show+0x1fc/0x3a0
	 module_attr_show+0x28/0x40
	 sysfs_kf_seq_show+0xb0/0x130
	 kernfs_seq_show+0x30/0x40
	 seq_read_iter+0x1bc/0x4b0
	 kernfs_fop_read_iter+0x144/0x1c0
	 generic_file_splice_read+0xd0/0x184
	 do_splice_to+0x90/0xe0
	 splice_direct_to_actor+0xb8/0x250
	 do_splice_direct+0x88/0xd4
	 do_sendfile+0x2b0/0x344
	 __arm64_sys_sendfile64+0x164/0x16c
	 invoke_syscall+0x48/0x114
	 el0_svc_common.constprop.0+0x44/0xec
	 do_el0_svc+0x74/0x90
	 el0_svc+0x20/0x80
	 el0t_64_sync_handler+0x1a8/0x1b0
	 el0t_64_sync+0x1ac/0x1b0
	kmemleak: Kernel memory leak detector disabled
	kmemleak: Object 0xf2ff000003f88000 (size 128):
	kmemleak:   comm "cat", pid 177, jiffies 4294921177
	kmemleak:   min_count = 1
	kmemleak:   count = 0
	kmemleak:   flags = 0x1
	kmemleak:   checksum = 0
	kmemleak:   backtrace:
	     kmem_cache_alloc+0x23c/0x2f0
	     test_version_show+0x1fc/0x3a0
	     module_attr_show+0x28/0x40
	     sysfs_kf_seq_show+0xb0/0x130
	     kernfs_seq_show+0x30/0x40
	     seq_read_iter+0x1bc/0x4b0
	     kernfs_fop_read_iter+0x144/0x1c0
	     generic_file_splice_read+0xd0/0x184
	     do_splice_to+0x90/0xe0
	     splice_direct_to_actor+0xb8/0x250
	     do_splice_direct+0x88/0xd4
	     do_sendfile+0x2b0/0x344
	     __arm64_sys_sendfile64+0x164/0x16c
	     invoke_syscall+0x48/0x114
	     el0_svc_common.constprop.0+0x44/0xec
	     do_el0_svc+0x74/0x90
	kmemleak: Automatic memory scanning thread ended

[akpm@linux-foundation.org: whitespace tweak]

Link: https://lkml.kernel.org/r/20211118054426.4123-1-Kuan-Ying.Lee@mediatek.com
Signed-off-by: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Doug Berger <opendmb@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Muchun Song
c29b5b3d33 mm: slab: make slab iterator functions static
There is no external users of slab_start/next/stop(), so make them
static.  And the memory.kmem.slabinfo is deprecated, which outputs
nothing now, so move memcg_slab_show() into mm/memcontrol.c and rename
it to mem_cgroup_slab_show to be consistent with other function names.

Link: https://lkml.kernel.org/r/20211109133359.32881-1-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Marco Elver
7302e91f39 mm/slab_common: use WARN() if cache still has objects on destroy
Calling kmem_cache_destroy() while the cache still has objects allocated
is a kernel bug, and will usually result in the entire cache being
leaked.  While the message in kmem_cache_destroy() resembles a warning,
it is currently not implemented using a real WARN().

This is problematic for infrastructure testing the kernel, all of which
rely on the specific format of WARN()s to pick up on bugs.

Some 13 years ago this used to be a simple WARN_ON() in slub, but commit
d629d81957 ("slub: improve kmem_cache_destroy() error message")
changed it into an open-coded warning to avoid confusion with a bug in
slub itself.

Instead, turn the open-coded warning into a real WARN() with the message
preserved, so that test systems can actually identify these issues, and
we get all the other benefits of using a normal WARN().  The warning
message is extended with "when called from <caller-ip>" to make it even
clearer where the fault lies.

For most configurations this is only a cosmetic change, however, note
that WARN() here will now also respect panic_on_warn.

Link: https://lkml.kernel.org/r/20211102170733.648216-1-elver@google.com
Signed-off-by: Marco Elver <elver@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:25 +02:00
Linus Torvalds
3acbdbf42e Merge tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull dax and libnvdimm updates from Dan Williams:
 "The bulk of this is a rework of the dax_operations API after
  discovering the obstacles it posed to the work-in-progress DAX+reflink
  support for XFS and other copy-on-write filesystem mechanics.

  Primarily the need to plumb a block_device through the API to handle
  partition offsets was a sticking point and Christoph untangled that
  dependency in addition to other cleanups to make landing the
  DAX+reflink support easier.

  The DAX_PMEM_COMPAT option has been around for 4 years and not only
  are distributions shipping userspace that understand the current
  configuration API, but some are not even bothering to turn this option
  on anymore, so it seems a good time to remove it per the deprecation
  schedule. Recall that this was added after the device-dax subsystem
  moved from /sys/class/dax to /sys/bus/dax for its sysfs organization.
  All recent functionality depends on /sys/bus/dax.

  Some other miscellaneous cleanups and reflink prep patches are
  included as well.

  Summary:

   - Simplify the dax_operations API:

      - Eliminate bdev_dax_pgoff() in favor of the filesystem
        maintaining and applying a partition offset to all its DAX iomap
        operations.

      - Remove wrappers and device-mapper stacked callbacks for
        ->copy_from_iter() and ->copy_to_iter() in favor of moving
        block_device relative offset responsibility to the
        dax_direct_access() caller.

      - Remove the need for an @bdev in filesystem-DAX infrastructure

      - Remove unused uio helpers copy_from_iter_flushcache() and
        copy_mc_to_iter() as only the non-check_copy_size() versions are
        used for DAX.

   - Prepare XFS for the pending (next merge window) DAX+reflink support

   - Remove deprecated DEV_DAX_PMEM_COMPAT support

   - Cleanup a straggling misuse of the GUID api"

* tag 'libnvdimm-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (38 commits)
  iomap: Fix error handling in iomap_zero_iter()
  ACPI: NFIT: Import GUID before use
  dax: remove the copy_from_iter and copy_to_iter methods
  dax: remove the DAXDEV_F_SYNC flag
  dax: simplify dax_synchronous and set_dax_synchronous
  uio: remove copy_from_iter_flushcache() and copy_mc_to_iter()
  iomap: turn the byte variable in iomap_zero_iter into a ssize_t
  memremap: remove support for external pgmap refcounts
  fsdax: don't require CONFIG_BLOCK
  iomap: build the block based code conditionally
  dax: fix up some of the block device related ifdefs
  fsdax: shift partition offset handling into the file systems
  dax: return the partition offset from fs_dax_get_by_bdev
  iomap: add a IOMAP_DAX flag
  xfs: pass the mapping flags to xfs_bmbt_to_iomap
  xfs: use xfs_direct_write_iomap_ops for DAX zeroing
  xfs: move dax device handling into xfs_{alloc,free}_buftarg
  ext4: cleanup the dax handling in ext4_fill_super
  ext2: cleanup the dax handling in ext2_fill_super
  fsdax: decouple zeroing from the iomap buffered I/O code
  ...
2022-01-12 15:46:11 -08:00
Linus Torvalds
6020c204be Merge tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache
Pull folio conversion updates from Matthew Wilcox:
 "Convert much of the page cache to use folios

  This stops just short of actually enabling large folios. It converts
  everything that I noticed needs to be converted, but there may still
  be places I've overlooked which still have page size assumptions.

  The big change here is using large entries in the page cache XArray
  instead of many small entries. That only affects shmem for now, but
  it's a pretty big change for shmem since it changes where memory needs
  to be allocated (at split time instead of insertion)"

* tag 'folio-5.17' of git://git.infradead.org/users/willy/pagecache: (49 commits)
  mm: Use multi-index entries in the page cache
  XArray: Add xas_advance()
  truncate,shmem: Handle truncates that split large folios
  truncate: Convert invalidate_inode_pages2_range to folios
  fs: Convert vfs_dedupe_file_range_compare to folios
  mm: Remove pagevec_remove_exceptionals()
  mm: Convert find_lock_entries() to use a folio_batch
  filemap: Return only folios from find_get_entries()
  filemap: Convert filemap_get_read_batch() to use a folio_batch
  filemap: Convert filemap_read() to use a folio
  truncate: Add invalidate_complete_folio2()
  truncate: Convert invalidate_inode_pages2_range() to use a folio
  truncate: Skip known-truncated indices
  truncate,shmem: Add truncate_inode_folio()
  shmem: Convert part of shmem_undo_range() to use a folio
  mm: Add unmap_mapping_folio()
  truncate: Add truncate_cleanup_folio()
  filemap: Add filemap_release_folio()
  filemap: Use a folio in filemap_page_mkwrite
  filemap: Use a folio in filemap_map_pages
  ...
2022-01-12 12:37:02 -08:00
Linus Torvalds
d3c8108035 Merge tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:

 - Unify where the struct request handling code is located in the blk-mq
   code (Christoph)

 - Header cleanups (Christoph)

 - Clean up the io_context handling code (Christoph, me)

 - Get rid of ->rq_disk in struct request (Christoph)

 - Error handling fix for add_disk() (Christoph)

 - request allocation cleanusp (Christoph)

 - Documentation updates (Eric, Matthew)

 - Remove trivial crypto unregister helper (Eric)

 - Reduce shared tag overhead (John)

 - Reduce poll_stats memory overhead (me)

 - Known indirect function call for dio (me)

 - Use atomic references for struct request (me)

 - Support request list issue for block and NVMe (me)

 - Improve queue dispatch pinning (Ming)

 - Improve the direct list issue code (Keith)

 - BFQ improvements (Jan)

 - Direct completion helper and use it in mmc block (Sebastian)

 - Use raw spinlock for the blktrace code (Wander)

 - fsync error handling fix (Ye)

 - Various fixes and cleanups (Lukas, Randy, Yang, Tetsuo, Ming, me)

* tag 'for-5.17/block-2022-01-11' of git://git.kernel.dk/linux-block: (132 commits)
  MAINTAINERS: add entries for block layer documentation
  docs: block: remove queue-sysfs.rst
  docs: sysfs-block: document virt_boundary_mask
  docs: sysfs-block: document stable_writes
  docs: sysfs-block: fill in missing documentation from queue-sysfs.rst
  docs: sysfs-block: add contact for nomerges
  docs: sysfs-block: sort alphabetically
  docs: sysfs-block: move to stable directory
  block: don't protect submit_bio_checks by q_usage_counter
  block: fix old-style declaration
  nvme-pci: fix queue_rqs list splitting
  block: introduce rq_list_move
  block: introduce rq_list_for_each_safe macro
  block: move rq_list macros to blk-mq.h
  block: drop needless assignment in set_task_ioprio()
  block: remove unnecessary trailing '\'
  bio.h: fix kernel-doc warnings
  block: check minor range in device_add_disk()
  block: use "unsigned long" for blk_validate_block_size().
  block: fix error unwinding in device_add_disk
  ...
2022-01-12 10:26:52 -08:00
Linus Torvalds
1be5bdf8cd Merge tag 'kcsan.2022.01.09a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull KCSAN updates from Paul McKenney:
 "This provides KCSAN fixes and also the ability to take memory barriers
  into account for weakly-ordered systems. This last can increase the
  probability of detecting certain types of data races"

* tag 'kcsan.2022.01.09a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (29 commits)
  kcsan: Only test clear_bit_unlock_is_negative_byte if arch defines it
  kcsan: Avoid nested contexts reading inconsistent reorder_access
  kcsan: Turn barrier instrumentation into macros
  kcsan: Make barrier tests compatible with lockdep
  kcsan: Support WEAK_MEMORY with Clang where no objtool support exists
  compiler_attributes.h: Add __disable_sanitizer_instrumentation
  objtool, kcsan: Remove memory barrier instrumentation from noinstr
  objtool, kcsan: Add memory barrier instrumentation to whitelist
  sched, kcsan: Enable memory barrier instrumentation
  mm, kcsan: Enable barrier instrumentation
  x86/qspinlock, kcsan: Instrument barrier of pv_queued_spin_unlock()
  x86/barriers, kcsan: Use generic instrumentation for non-smp barriers
  asm-generic/bitops, kcsan: Add instrumentation for barriers
  locking/atomics, kcsan: Add instrumentation for barriers
  locking/barriers, kcsan: Support generic instrumentation
  locking/barriers, kcsan: Add instrumentation for barriers
  kcsan: selftest: Add test case to check memory barrier instrumentation
  kcsan: Ignore GCC 11+ warnings about TSan runtime support
  kcsan: test: Add test cases for memory barrier instrumentation
  kcsan: test: Match reordered or normal accesses
  ...
2022-01-11 09:51:26 -08:00
Linus Torvalds
ca1a46d6f5 Merge tag 'slab-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab updates from Vlastimil Babka:

 - Separate struct slab from struct page - an offshot of the page folio
   work.

   Struct page fields used by slab allocators are moved from struct page
   to a new struct slab, that uses the same physical storage. Similar to
   struct folio, it always is a head page. This brings better type
   safety, separation of large kmalloc allocations from true slabs, and
   cleanup of related objcg code.

 - A SLAB_MERGE_DEFAULT config optimization.

* tag 'slab-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: (33 commits)
  mm/slob: Remove unnecessary page_mapcount_reset() function call
  bootmem: Use page->index instead of page->freelist
  zsmalloc: Stop using slab fields in struct page
  mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when enabled
  mm/slub: Simplify struct slab slabs field definition
  mm/sl*b: Differentiate struct slab fields by sl*b implementations
  mm/kfence: Convert kfence_guarded_alloc() to struct slab
  mm/kasan: Convert to struct folio and struct slab
  mm/slob: Convert SLOB to use struct slab and struct folio
  mm/memcg: Convert slab objcgs from struct page to struct slab
  mm: Convert struct page to struct slab in functions used by other subsystems
  mm/slab: Finish struct page to struct slab conversion
  mm/slab: Convert most struct page to struct slab by spatch
  mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
  mm/slub: Finish struct page to struct slab conversion
  mm/slub: Convert most struct page to struct slab by spatch
  mm/slub: Convert pfmemalloc_match() to take a struct slab
  mm/slub: Convert __free_slab() to use struct slab
  mm/slub: Convert alloc_slab_page() to return a struct slab
  mm/slub: Convert print_page_info() to print_slab_info()
  ...
2022-01-10 11:58:12 -08:00
Linus Torvalds
bfed6efb8e Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SGX updates from Borislav Petkov:

 - Add support for handling hw errors in SGX pages: poisoning,
   recovering from poison memory and error injection into SGX pages

 - A bunch of changes to the SGX selftests to simplify and allow of SGX
   features testing without the need of a whole SGX software stack

 - Add a sysfs attribute which is supposed to show the amount of SGX
   memory in a NUMA node, similar to what /proc/meminfo is to normal
   memory

 - The usual bunch of fixes and cleanups too

* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/sgx: Fix NULL pointer dereference on non-SGX systems
  selftests/sgx: Fix corrupted cpuid macro invocation
  x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
  x86/sgx: Fix minor documentation issues
  selftests/sgx: Add test for multiple TCS entry
  selftests/sgx: Enable multiple thread support
  selftests/sgx: Add page permission and exception test
  selftests/sgx: Rename test properties in preparation for more enclave tests
  selftests/sgx: Provide per-op parameter structs for the test enclave
  selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
  selftests/sgx: Move setup_test_encl() to each TEST_F()
  selftests/sgx: Encpsulate the test enclave creation
  selftests/sgx: Dump segments and /proc/self/maps only on failure
  selftests/sgx: Create a heap for the test enclave
  selftests/sgx: Make data measurement for an enclave segment optional
  selftests/sgx: Assign source for each segment
  selftests/sgx: Fix a benign linker warning
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Hook arch_memory_failure() into mainline code
  ...
2022-01-10 09:44:09 -08:00
Eric W. Biederman
2d4bcf886e exit: Remove profile_task_exit & profile_munmap
When I say remove I mean remove.  All profile_task_exit and
profile_munmap do is call a blocking notifier chain.  The helpers
profile_task_register and profile_task_unregister are not called
anywhere in the tree.  Which means this is all dead code.

So remove the dead code and make it easier to read do_exit.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lkml.kernel.org/r/20220103213312.9144-1-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 12:43:57 -06:00
Eric W. Biederman
98b24b16b2 signal: Have the oom killer detect coredumps using signal->core_state
In preparation for removing the flag SIGNAL_GROUP_COREDUMP, change
__task_will_free_mem to test signal->core_state instead of the flag
SIGNAL_GROUP_COREDUMP.

Both fields are protected by siglock and both live in signal_struct so
there are no real tradeoffs here, just a change to which field is
being tested.

Link: https://lkml.kernel.org/r/20211213225350.27481-3-ebiederm@xmission.com
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2022-01-08 11:36:24 -06:00
Matthew Wilcox (Oracle)
6b24ca4a1a mm: Use multi-index entries in the page cache
We currently store large folios as 2^N consecutive entries.  While this
consumes rather more memory than necessary, it also turns out to be buggy.
A writeback operation which starts within a tail page of a dirty folio will
not write back the folio as the xarray's dirty bit is only set on the
head index.  With multi-index entries, the dirty bit will be found no
matter where in the folio the operation starts.

This does end up simplifying the page cache slightly, although not as
much as I had hoped.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
b9a8a4195c truncate,shmem: Handle truncates that split large folios
Handle folio splitting in the parts of the truncation functions which
already handle partial pages.  Factor all that code out into a new
function called truncate_inode_partial_folio().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
f6357c3a9d truncate: Convert invalidate_inode_pages2_range to folios
If we're going to unmap a folio, we have to be sure to unmap the entire
folio, not just the part of it which lies after the search index.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
1613fac9aa mm: Remove pagevec_remove_exceptionals()
All of its callers now call folio_batch_remove_exceptionals().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
51dcbdac28 mm: Convert find_lock_entries() to use a folio_batch
find_lock_entries() already only returned the head page of folios, so
convert it to return a folio_batch instead of a pagevec.  That cascades
through converting truncate_inode_pages_range() to
delete_from_page_cache_batch() and page_cache_delete_batch().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
0e499ed3d7 filemap: Return only folios from find_get_entries()
The callers have all been converted to work on folios, so convert
find_get_entries() to return a batch of folios instead of pages.
We also now return multiple large folios in a single call.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
25d6a23e8d filemap: Convert filemap_get_read_batch() to use a folio_batch
This change ripples all the way through the filemap_read() call chain and
removes a lot of messing about converting folios to pages and back again.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
d996fc7f61 filemap: Convert filemap_read() to use a folio
We know the pagevec always contains folios, but use page_folio() anyway
instead of casting.  Removes a few calls to legacy functions.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
78f426608f truncate: Add invalidate_complete_folio2()
Convert invalidate_complete_page2() to invalidate_complete_folio2().
Use filemap_free_folio() to free the page instead of calling ->freepage
manually.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
fae9bc4a90 truncate: Convert invalidate_inode_pages2_range() to use a folio
If we're going to unmap a folio, we have to be sure to unmap the entire
folio, not just the part of it which lies after the search index.

We cannot yet remove the struct page from invalidate_inode_pages2_range()
because the page pointer in the pvec might be a shadow/dax/swap entry
instead of actually a page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
ccbbf761d4 truncate: Skip known-truncated indices
If we've truncated an entire folio, we can skip over all the indices
covered by this folio.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
1e84a3d997 truncate,shmem: Add truncate_inode_folio()
Convert all callers of truncate_inode_page() to call
truncate_inode_folio() instead, and move the declaration to mm/internal.h.
Move the assertion that the caller is not passing in a tail page to
generic_error_remove_page().  We can't entirely remove the struct page
from the callers yet because the page pointer in the pvec might be a
shadow/dax/swap entry instead of actually a page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
7b774aab79 shmem: Convert part of shmem_undo_range() to use a folio
find_lock_entries() never returns tail pages.  We cannot use page_folio()
here as the pagevec may also contain swap entries, so simply cast for
now.  This is an intermediate step which will be fully removed by the
end of this series.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:41 -05:00
Matthew Wilcox (Oracle)
3506659e18 mm: Add unmap_mapping_folio()
Convert both callers of unmap_mapping_page() to call unmap_mapping_folio()
instead.  Also move zap_details from linux/mm.h to mm/memory.c

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
2022-01-08 00:28:32 -05:00
Matthew Wilcox (Oracle)
07f910f9b7 mm: Remove slab from struct page
All members of struct slab can now be removed from struct page.
This shrinks the definition of struct page by 30 LOC, making
it easier to understand.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2022-01-06 18:06:58 +01:00
Hyeonggon Yoo
b01af5c0b0 mm/slob: Remove unnecessary page_mapcount_reset() function call
After commit 401fb12c68 ("mm/sl*b: Differentiate struct slab fields by
sl*b implementations"), we can reorder fields of struct slab depending
on slab allocator.

For now, page_mapcount_reset() is called because page->_mapcount and
slab->units have same offset. But this is not necessary for struct slab.
Use unused field for units instead.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Link: https://lore.kernel.org/r/20211212065241.GA886691@odroid
2022-01-06 12:27:28 +01:00
Matthew Wilcox (Oracle)
c5e97ed154 bootmem: Use page->index instead of page->freelist
page->freelist is for the use of slab.  Using page->index is the same
set of bits as page->freelist, and by using an integer instead of a
pointer, we can avoid casts.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: <x86@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
2022-01-06 12:27:03 +01:00
Matthew Wilcox (Oracle)
ffedd09fa9 zsmalloc: Stop using slab fields in struct page
The ->freelist and ->units members of struct page are for the use of
slab only.  I'm not particularly familiar with zsmalloc, so generate the
same code by using page->index to store 'page' (page->index and
page->freelist are at the same offset in struct page).  This should be
cleaned up properly at some point by somebody who is familiar with
zsmalloc.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
2022-01-06 12:27:03 +01:00
Vlastimil Babka
9c01e9af17 mm/slub: Define struct slab fields for CONFIG_SLUB_CPU_PARTIAL only when enabled
The fields 'next' and 'slabs' are only used when CONFIG_SLUB_CPU_PARTIAL
is enabled. We can put their definition to #ifdef to prevent accidental
use when disabled.

Currenlty show_slab_objects() and slabs_cpu_partial_show() contain code
accessing the slabs field that's effectively dead with
CONFIG_SLUB_CPU_PARTIAL=n through the wrappers slub_percpu_partial() and
slub_percpu_partial_read_once(), but to prevent a compile error, we need
to hide all this code behind #ifdef.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06 12:26:53 +01:00
Vlastimil Babka
662188c3a2 mm/slub: Simplify struct slab slabs field definition
Before commit b47291ef02 ("mm, slub: change percpu partial accounting
from objects to pages") we had to fit two integer fields into a native
word size, so we used short int on 32-bit and int on 64-bit via #ifdef.
After that commit there is only one integer field, so we can simply
define it as int everywhere.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06 12:26:45 +01:00
Vlastimil Babka
401fb12c68 mm/sl*b: Differentiate struct slab fields by sl*b implementations
With a struct slab definition separate from struct page, we can go
further and define only fields that the chosen sl*b implementation uses.
This means everything between __page_flags and __page_refcount
placeholders now depends on the chosen CONFIG_SL*B. Some fields exist in
all implementations (slab_list) but can be part of a union in some, so
it's simpler to repeat them than complicate the definition with ifdefs
even more.

The patch doesn't change physical offsets of the fields, although it
could be done later - for example it's now clear that tighter packing in
SLOB could be possible.

This should also prevent accidental use of fields that don't exist in
given implementation. Before this patch virt_to_cache() and
cache_from_obj() were visible for SLOB (albeit not used), although they
rely on the slab_cache field that isn't set by SLOB. With this patch
it's now a compile error, so these functions are now hidden behind
an #ifndef CONFIG_SLOB.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Marco Elver <elver@google.com> # kfence
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <kasan-dev@googlegroups.com>
2022-01-06 12:26:29 +01:00
Vlastimil Babka
8dae0cfed5 mm/kfence: Convert kfence_guarded_alloc() to struct slab
The function sets some fields that are being moved from struct page to
struct slab so it needs to be converted.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Tested-by: Marco Elver <elver@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <kasan-dev@googlegroups.com>
2022-01-06 12:26:29 +01:00
Matthew Wilcox (Oracle)
6e48a966df mm/kasan: Convert to struct folio and struct slab
KASAN accesses some slab related struct page fields so we need to
convert it to struct slab. Some places are a bit simplified thanks to
kasan_addr_to_slab() encapsulating the PageSlab flag check through
virt_to_slab().  When resolving object address to either a real slab or
a large kmalloc, use struct folio as the intermediate type for testing
the slab flag to avoid unnecessary implicit compound_head().

[ vbabka@suse.cz: use struct folio, adjust to differences in previous
  patches ]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Hyeongogn Yoo <42.hyeyoo@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: <kasan-dev@googlegroups.com>
2022-01-06 12:26:14 +01:00
Matthew Wilcox (Oracle)
50757018b4 mm/slob: Convert SLOB to use struct slab and struct folio
Use struct slab throughout the slob allocator. Where non-slab page can
appear use struct folio instead of struct page.

[ vbabka@suse.cz: don't introduce wrappers for PageSlobFree in mm/slab.h
  just for the single callers being wrappers in mm/slob.c ]

[ Hyeonggon Yoo <42.hyeyoo@gmail.com>: fix NULL pointer deference ]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06 12:26:14 +01:00
Vlastimil Babka
4b5f8d9a89 mm/memcg: Convert slab objcgs from struct page to struct slab
page->memcg_data is used with MEMCG_DATA_OBJCGS flag only for slab pages
so convert all the related infrastructure to struct slab. Also use
struct folio instead of struct page when resolving object pointers.

This is not just mechanistic changing of types and names. Now in
mem_cgroup_from_obj() we use folio_test_slab() to decide if we interpret
the folio as a real slab instead of a large kmalloc, instead of relying
on MEMCG_DATA_OBJCGS bit that used to be checked in page_objcgs_check().
Similarly in memcg_slab_free_hook() where we can encounter
kmalloc_large() pages (here the folio slab flag check is implied by
virt_to_slab()). As a result, page_objcgs_check() can be dropped instead
of converted.

To avoid include cycles, move the inline definition of slab_objcgs()
from memcontrol.h to mm/slab.h.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <cgroups@vger.kernel.org>
2022-01-06 12:26:14 +01:00
Vlastimil Babka
40f3bf0cb0 mm: Convert struct page to struct slab in functions used by other subsystems
KASAN, KFENCE and memcg interact with SLAB or SLUB internals through
functions nearest_obj(), obj_to_index() and objs_per_slab() that use
struct page as parameter. This patch converts it to struct slab
including all callers, through a coccinelle semantic patch.

// Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c
// Note: needs coccinelle 1.1.1 to avoid breaking whitespace

@@
@@

-objs_per_slab_page(
+objs_per_slab(
 ...
 )
 { ... }

@@
@@

-objs_per_slab_page(
+objs_per_slab(
 ...
 )

@@
identifier fn =~ "obj_to_index|objs_per_slab";
@@

 fn(...,
-   const struct page *page
+   const struct slab *slab
    ,...)
 {
<...
(
- page_address(page)
+ slab_address(slab)
|
- page
+ slab
)
...>
 }

@@
identifier fn =~ "nearest_obj";
@@

 fn(...,
-   struct page *page
+   const struct slab *slab
    ,...)
 {
<...
(
- page_address(page)
+ slab_address(slab)
|
- page
+ slab
)
...>
 }

@@
identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab";
expression E;
@@

 fn(...,
(
- slab_page(E)
+ E
|
- virt_to_page(E)
+ virt_to_slab(E)
|
- virt_to_head_page(E)
+ virt_to_slab(E)
|
- page
+ page_slab(page)
)
  ,...)

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <kasan-dev@googlegroups.com>
Cc: <cgroups@vger.kernel.org>
2022-01-06 12:26:13 +01:00
Vlastimil Babka
dd35f71a1d mm/slab: Finish struct page to struct slab conversion
Change cache_free_alien() to use slab_nid(virt_to_slab()). Otherwise
just update of comments and some remaining variable names.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06 12:26:10 +01:00
Vlastimil Babka
7981e67efb mm/slab: Convert most struct page to struct slab by spatch
The majority of conversion from struct page to struct slab in SLAB
internals can be delegated to a coccinelle semantic patch. This includes
renaming of variables with 'page' in name to 'slab', and similar.

Big thanks to Julia Lawall and Luis Chamberlain for help with
coccinelle.

// Options: --include-headers --no-includes --smpl-spacing mm/slab.c
// Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the
// embedded script

// build list of functions for applying the next rule
@initialize:ocaml@
@@

let ok_function p =
  not (List.mem (List.hd p).current_element ["kmem_getpages";"kmem_freepages"])

// convert the type in selected functions
@@
position p : script:ocaml() { ok_function p };
@@

- struct page@p
+ struct slab

@@
@@

-PageSlabPfmemalloc(page)
+slab_test_pfmemalloc(slab)

@@
@@

-ClearPageSlabPfmemalloc(page)
+slab_clear_pfmemalloc(slab)

@@
@@

obj_to_index(
 ...,
- page
+ slab_page(slab)
,...)

// for all functions, change any "struct slab *page" parameter to "struct slab
// *slab" in the signature, and generally all occurences of "page" to "slab" in
// the body - with some special cases.
@@
identifier fn;
expression E;
@@

 fn(...,
-   struct slab *page
+   struct slab *slab
    ,...)
 {
<...
(
- int page_node;
+ int slab_node;
|
- page_node
+ slab_node
|
- page_slab(page)
+ slab
|
- page_address(page)
+ slab_address(slab)
|
- page_size(page)
+ slab_size(slab)
|
- page_to_nid(page)
+ slab_nid(slab)
|
- virt_to_head_page(E)
+ virt_to_slab(E)
|
- page
+ slab
)
...>
 }

// rename a function parameter
@@
identifier fn;
expression E;
@@

 fn(...,
-   int page_node
+   int slab_node
    ,...)
 {
<...
- page_node
+ slab_node
...>
 }

// functions converted by previous rules that were temporarily called using
// slab_page(E) so we want to remove the wrapper now that they accept struct
// slab ptr directly
@@
identifier fn =~ "index_to_obj";
expression E;
@@

 fn(...,
- slab_page(E)
+ E
 ,...)

// functions that were returning struct page ptr and now will return struct
// slab ptr, including slab_page() wrapper removal
@@
identifier fn =~ "cache_grow_begin|get_valid_first_slab|get_first_slab";
expression E;
@@

 fn(...)
 {
<...
- slab_page(E)
+ E
...>
 }

// rename any former struct page * declarations
@@
@@

struct slab *
-page
+slab
;

// all functions (with exceptions) with a local "struct slab *page" variable
// that will be renamed to "struct slab *slab"
@@
identifier fn !~ "kmem_getpages|kmem_freepages";
expression E;
@@

 fn(...)
 {
<...
(
- page_slab(page)
+ slab
|
- page_to_nid(page)
+ slab_nid(slab)
|
- kasan_poison_slab(page)
+ kasan_poison_slab(slab_page(slab))
|
- page_address(page)
+ slab_address(slab)
|
- page_size(page)
+ slab_size(slab)
|
- page->pages
+ slab->slabs
|
- page = virt_to_head_page(E)
+ slab = virt_to_slab(E)
|
- virt_to_head_page(E)
+ virt_to_slab(E)
|
- page
+ slab
)
...>
 }

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Luis Chamberlain <mcgrof@kernel.org>
2022-01-06 12:26:02 +01:00
Vlastimil Babka
42c0faac31 mm/slab: Convert kmem_getpages() and kmem_freepages() to struct slab
These functions sit at the boundary to page allocator. Also use folio
internally to avoid extra compound_head() when dealing with page flags.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06 12:26:02 +01:00
Vlastimil Babka
c2092c1206 mm/slub: Finish struct page to struct slab conversion
Update comments mentioning pages to mention slabs where appropriate.
Also some goto labels.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06 12:26:02 +01:00
Vlastimil Babka
bb192ed9aa mm/slub: Convert most struct page to struct slab by spatch
The majority of conversion from struct page to struct slab in SLUB
internals can be delegated to a coccinelle semantic patch. This includes
renaming of variables with 'page' in name to 'slab', and similar.

Big thanks to Julia Lawall and Luis Chamberlain for help with
coccinelle.

// Options: --include-headers --no-includes --smpl-spacing include/linux/slub_def.h mm/slub.c
// Note: needs coccinelle 1.1.1 to avoid breaking whitespace, and ocaml for the
// embedded script

// build list of functions to exclude from applying the next rule
@initialize:ocaml@
@@

let ok_function p =
  not (List.mem (List.hd p).current_element ["nearest_obj";"obj_to_index";"objs_per_slab_page";"__slab_lock";"__slab_unlock";"free_nonslab_page";"kmalloc_large_node"])

// convert the type from struct page to struct page in all functions except the
// list from previous rule
// this also affects struct kmem_cache_cpu, but that's ok
@@
position p : script:ocaml() { ok_function p };
@@

- struct page@p
+ struct slab

// in struct kmem_cache_cpu, change the name from page to slab
// the type was already converted by the previous rule
@@
@@

struct kmem_cache_cpu {
...
-struct slab *page;
+struct slab *slab;
...
}

// there are many places that use c->page which is now c->slab after the
// previous rule
@@
struct kmem_cache_cpu *c;
@@

-c->page
+c->slab

@@
@@

struct kmem_cache {
...
- unsigned int cpu_partial_pages;
+ unsigned int cpu_partial_slabs;
...
}

@@
struct kmem_cache *s;
@@

- s->cpu_partial_pages
+ s->cpu_partial_slabs

@@
@@

static void
- setup_page_debug(
+ setup_slab_debug(
 ...)
 {...}

@@
@@

- setup_page_debug(
+ setup_slab_debug(
 ...);

// for all functions (with exceptions), change any "struct slab *page"
// parameter to "struct slab *slab" in the signature, and generally all
// occurences of "page" to "slab" in the body - with some special cases.

@@
identifier fn !~ "free_nonslab_page|obj_to_index|objs_per_slab_page|nearest_obj";
@@
 fn(...,
-   struct slab *page
+   struct slab *slab
    ,...)
 {
<...
- page
+ slab
...>
 }

// similar to previous but the param is called partial_page
@@
identifier fn;
@@

 fn(...,
-   struct slab *partial_page
+   struct slab *partial_slab
    ,...)
 {
<...
- partial_page
+ partial_slab
...>
 }

// similar to previous but for functions that take pointer to struct page ptr
@@
identifier fn;
@@

 fn(...,
-   struct slab **ret_page
+   struct slab **ret_slab
    ,...)
 {
<...
- ret_page
+ ret_slab
...>
 }

// functions converted by previous rules that were temporarily called using
// slab_page(E) so we want to remove the wrapper now that they accept struct
// slab ptr directly
@@
identifier fn =~ "slab_free|do_slab_free";
expression E;
@@

 fn(...,
- slab_page(E)
+ E
  ,...)

// similar to previous but for another pattern
@@
identifier fn =~ "slab_pad_check|check_object";
@@

 fn(...,
- folio_page(folio, 0)
+ slab
  ,...)

// functions that were returning struct page ptr and now will return struct
// slab ptr, including slab_page() wrapper removal
@@
identifier fn =~ "allocate_slab|new_slab";
expression E;
@@

 static
-struct slab *
+struct slab *
 fn(...)
 {
<...
- slab_page(E)
+ E
...>
 }

// rename any former struct page * declarations
@@
@@

struct slab *
(
- page
+ slab
|
- partial_page
+ partial_slab
|
- oldpage
+ oldslab
)
;

// this has to be separate from previous rule as page and page2 appear at the
// same line
@@
@@

struct slab *
-page2
+slab2
;

// similar but with initial assignment
@@
expression E;
@@

struct slab *
(
- page
+ slab
|
- flush_page
+ flush_slab
|
- discard_page
+ slab_to_discard
|
- page_to_unfreeze
+ slab_to_unfreeze
)
= E;

// convert most of struct page to struct slab usage inside functions (with
// exceptions), including specific variable renames
@@
identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
expression E;
@@

 fn(...)
 {
<...
(
- int pages;
+ int slabs;
|
- int pages = E;
+ int slabs = E;
|
- page
+ slab
|
- flush_page
+ flush_slab
|
- partial_page
+ partial_slab
|
- oldpage->pages
+ oldslab->slabs
|
- oldpage
+ oldslab
|
- unsigned int nr_pages;
+ unsigned int nr_slabs;
|
- nr_pages
+ nr_slabs
|
- unsigned int partial_pages = E;
+ unsigned int partial_slabs = E;
|
- partial_pages
+ partial_slabs
)
...>
 }

// this has to be split out from the previous rule so that lines containing
// multiple matching changes will be fully converted
@@
identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
@@

 fn(...)
 {
<...
(
- slab->pages
+ slab->slabs
|
- pages
+ slabs
|
- page2
+ slab2
|
- discard_page
+ slab_to_discard
|
- page_to_unfreeze
+ slab_to_unfreeze
)
...>
 }

// after we simply changed all occurences of page to slab, some usages need
// adjustment for slab-specific functions, or use slab_page() wrapper
@@
identifier fn !~ "nearest_obj|obj_to_index|objs_per_slab_page|__slab_(un)*lock|__free_slab|free_nonslab_page|kmalloc_large_node";
@@

 fn(...)
 {
<...
(
- page_slab(slab)
+ slab
|
- kasan_poison_slab(slab)
+ kasan_poison_slab(slab_page(slab))
|
- page_address(slab)
+ slab_address(slab)
|
- page_size(slab)
+ slab_size(slab)
|
- PageSlab(slab)
+ folio_test_slab(slab_folio(slab))
|
- page_to_nid(slab)
+ slab_nid(slab)
|
- compound_order(slab)
+ slab_order(slab)
)
...>
 }

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Julia Lawall <julia.lawall@inria.fr>
Cc: Luis Chamberlain <mcgrof@kernel.org>
2022-01-06 12:26:02 +01:00
Matthew Wilcox (Oracle)
01b34d1631 mm/slub: Convert pfmemalloc_match() to take a struct slab
Preparatory for mass conversion. Use the new slab_test_pfmemalloc()
helper.  As it doesn't do VM_BUG_ON(!PageSlab()) we no longer need the
pfmemalloc_match_unsafe() variant.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06 12:26:02 +01:00
Vlastimil Babka
4020b4a226 mm/slub: Convert __free_slab() to use struct slab
__free_slab() is on the boundary of distinguishing struct slab and
struct page so start with struct slab but convert to folio for working
with flags and folio_page() to call functions that require struct page.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06 12:26:01 +01:00
Vlastimil Babka
45387b8c14 mm/slub: Convert alloc_slab_page() to return a struct slab
Preparatory, callers convert back to struct page for now.

Also move setting page flags to alloc_slab_page() where we still operate
on a struct page. This means the page->slab_cache pointer is now set
later than the PageSlab flag, which could theoretically confuse some pfn
walker assuming PageSlab means there would be a valid cache pointer. But
as the code had no barriers and used __set_bit() anyway, it could have
happened already, so there shouldn't be such a walker.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06 12:26:01 +01:00
Matthew Wilcox (Oracle)
fb012e278d mm/slub: Convert print_page_info() to print_slab_info()
Improve the type safety and prepare for further conversion. For flags
access, convert to folio internally.

[ vbabka@suse.cz: access flags via folio_flags() ]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
2022-01-06 12:26:01 +01:00
Vlastimil Babka
0393895b09 mm/slub: Convert __slab_lock() and __slab_unlock() to struct slab
These functions operate on the PG_locked page flag, but make them accept
struct slab to encapsulate this implementation detail.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06 12:26:01 +01:00
Matthew Wilcox (Oracle)
d835eef4fc mm/slub: Convert kfree() to use a struct slab
Convert kfree(), kmem_cache_free() and ___cache_free() to resolve object
addresses to struct slab, using folio as intermediate step where needed.
Keep passing the result as struct page for now in preparation for mass
conversion of internal functions.

[ vbabka@suse.cz: Use folio as intermediate step when checking for
  large kmalloc pages, and when freeing them - rename
  free_nonslab_page() to free_large_kmalloc() that takes struct folio ]

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Roman Gushchin <guro@fb.com>
2022-01-06 12:25:57 +01:00