linux/arch/arm64/mm
Pingfan Liu c4885bbb3a arm64/mm: save memory access in check_and_switch_context() fast switch path
On arm64, smp_processor_id() reads a per-cpu `cpu_number` variable,
using the per-cpu offset stored in the tpidr_el1 system register. In
some cases we generate a per-cpu address with a sequence like:

  cpu_ptr = &per_cpu(ptr, smp_processor_id());

This potentially incurs a cache miss for both `cpu_number` and the
in-memory `__per_cpu_offset` array. The access can be written more
optimally as:

  cpu_ptr = this_cpu_ptr(ptr);

This only needs the per-cpu offset already held in tpidr_el1, and does
not need any load from memory.
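
For illustration, a minimal sketch of the access-pattern change (not the
actual patch hunk), using a hypothetical per-cpu counter `active_ctx`:

  #include <linux/percpu.h>
  #include <linux/atomic.h>

  /* Hypothetical per-cpu variable, for illustration only. */
  static DEFINE_PER_CPU(atomic64_t, active_ctx);

  /*
   * Before: forming the per-cpu address loads both `cpu_number` (via
   * smp_processor_id()) and __per_cpu_offset[cpu] from memory.
   */
  static inline u64 read_active_ctx_slow(void)
  {
          unsigned int cpu = smp_processor_id();

          return atomic64_read(&per_cpu(active_ctx, cpu));
  }

  /*
   * After: this_cpu_ptr() derives the address from tpidr_el1 directly,
   * avoiding both memory accesses on the fast path.
   */
  static inline u64 read_active_ctx_fast(void)
  {
          return atomic64_read(this_cpu_ptr(&active_ctx));
  }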

The following two test cases show a small performance improvement measured
on a 46-CPU Qualcomm machine running a 5.8.0-rc4 kernel.

Test 1: (about 0.3% improvement)
    #cat b.sh
    make clean && make all -j138
    #perf stat --repeat 10 --null --sync sh b.sh

    - before this patch
     Performance counter stats for 'sh b.sh' (10 runs):

                298.62 +- 1.86 seconds time elapsed  ( +-  0.62% )

    - after this patch
     Performance counter stats for 'sh b.sh' (10 runs):

               297.734 +- 0.954 seconds time elapsed  ( +-  0.32% )

Test 2: (about 1.69% improvement)
     'perf stat -r 10 perf bench sched messaging'
        Then sum the total time of 'sched/messaging' manually.

    - before this patch
      total 0.707 sec for 10 times
    - after this patch
      total 0.695 sec for 10 times

Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Steve Capper <steve.capper@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Vladimir Murzin <vladimir.murzin@arm.com>
Cc: Jean-Philippe Brucker <jean-philippe@linaro.org>
Link: https://lore.kernel.org/r/1594389852-19949-1-git-send-email-kernelfans@gmail.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2020-07-30 12:58:40 +01:00
cache.S arm64: mm: Use modern annotations for assembly functions 2020-01-08 12:23:38 +00:00
context.c arm64/mm: save memory access in check_and_switch_context() fast switch path 2020-07-30 12:58:40 +01:00
copypage.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 234 2019-06-19 17:09:07 +02:00
dma-mapping.c dma-mapping: drop the dev argument to arch_sync_dma_for_* 2019-11-20 20:31:38 +01:00
dump.c mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00
extable.c
fault.c mmap locking API: use coccinelle to convert mmap_sem rwsem call sites 2020-06-09 09:39:14 -07:00
flush.c mm: introduce page_size() 2019-09-24 15:54:08 -07:00
hugetlbpage.c arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs 2020-07-15 13:38:03 +01:00
init.c arm64/hugetlb: Reserve CMA areas for gigantic pages on 16K and 64K configs 2020-07-15 13:38:03 +01:00
ioremap.c arm64: remove __iounmap 2019-09-04 13:12:26 +01:00
kasan_init.c mm: consolidate pte_index() and pte_offset_*() definitions 2020-06-09 09:39:14 -07:00
Makefile arm64: mm: convert mm/dump.c to use walk_page_range() 2020-02-04 03:05:25 +00:00
mmap.c arm64, mm: move generic mmap layout functions to mm 2019-09-24 15:54:11 -07:00
mmu.c arm64: mm: reset address tag set by kasan sw tagging 2020-06-15 16:58:13 +01:00
numa.c mm: memblock: replace dereferences of memblock_region.nid with API calls 2020-06-03 20:09:43 -07:00
pageattr.c mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00
pgd.c mm: consolidate pgtable_cache_init() and pgd_cache_init() 2019-09-24 15:54:09 -07:00
physaddr.c
proc.S mm: reorder includes after introduction of linux/pgtable.h 2020-06-09 09:39:13 -07:00
ptdump_debugfs.c arm64/mm: Hold memory hotplug lock while walking for kernel page table dump 2020-03-04 15:35:22 +00:00