linux/arch/s390/include/asm
Minchan Kim 99baac21e4 mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem
Nadav reported that parallel MADV_DONTNEED calls on the same range have a
stale TLB problem; Mel fixed it[1] and found the same problem with
MADV_FREE[2].

Quote from Mel Gorman:
 "The race in question is CPU 0 running madv_free and updating some PTEs
  while CPU 1 is also running madv_free and looking at the same PTEs.
  CPU 1 may have writable TLB entries for a page but fail the pte_dirty
  check (because CPU 0 has updated it already) and potentially fail to
  flush.

  Hence, when madv_free on CPU 1 returns, there are still potentially
  writable TLB entries and the underlying PTE is still present so that a
  subsequent write does not necessarily propagate the dirty bit to the
  underlying PTE any more. Reclaim at some unknown time at the future
  may then see that the PTE is still clean and discard the page even
  though a write has happened in the meantime. I think this is possible
  but I could have missed some protection in madv_free that prevents it
  happening."

This patch aims to solve both problems at once and also prepares for
another problem involving KSM, MADV_FREE and soft-dirty[3].

The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that, when tlb_finish_mmu is called, we can
detect that parallel threads are operating on the same mm.  In that case,
forcefully flush the TLB to prevent the user from accessing memory via a
stale TLB entry, even if this thread failed to gather any page table entry.
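
The counting scheme described above can be sketched in user-space C. This
is only a toy model, not the kernel code: every toy_* name is hypothetical,
the struct layouts are invented for illustration, and the "flush" is just a
counter. It assumes that "parallel threads going on" means the pending count
is greater than one when the batch finishes.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the mm: only the pending counter and a flush tally. */
struct toy_mm {
    atomic_int tlb_flush_pending;  /* batches currently open on this mm */
    atomic_int flushes;            /* TLB flushes issued (illustration only) */
};

/* Toy stand-in for the gather structure. */
struct toy_gather {
    struct toy_mm *mm;
    bool need_flush;               /* true if this batch changed a PTE itself */
};

/* Models tlb_gather_mmu: open a batch and bump the pending count. */
static void toy_gather_mmu(struct toy_gather *tlb, struct toy_mm *mm)
{
    tlb->mm = mm;
    tlb->need_flush = false;
    atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* True if another batch is open on the same mm besides the caller's. */
static bool toy_flush_nested(struct toy_mm *mm)
{
    return atomic_load(&mm->tlb_flush_pending) > 1;
}

/* Models tlb_finish_mmu: flush if we gathered something OR someone else
 * is batching in parallel, then drop our pending count. */
static void toy_finish_mmu(struct toy_gather *tlb)
{
    bool force = toy_flush_nested(tlb->mm);

    if (tlb->need_flush || force)
        atomic_fetch_add(&tlb->mm->flushes, 1);

    assert(atomic_load(&tlb->mm->tlb_flush_pending) > 0);
    atomic_fetch_sub(&tlb->mm->tlb_flush_pending, 1);
}
```

In this model a batch that gathered nothing still flushes when it finishes
while a second batch is open, which is the forced-flush case the paragraph
describes; a batch that finishes alone with nothing gathered does not flush.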

I confirmed that this patch works with the test program[4] Nadav provided,
so this patch supersedes "mm: Always flush VMA ranges affected by
zap_page_range v2" in current mmotm.

NOTE:

This patch modifies the arch-specific TLB gathering interface (x86, ia64,
s390, sh, um).  Most architectures seem straightforward, but s390 needs
care because tlb_flush_mmu works only if mm->context.flush_mm is set to
non-zero, which happens only when a pte entry is really cleared by
ptep_get_and_clear and friends.  However, in this problem the affected
thread never changes the pte entries but still needs to flush to prevent
memory access through a stale TLB entry.
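
The s390 concern can be illustrated with a small user-space model. This is
a sketch under stated assumptions, not the s390 code from this patch: the
toy_s390_* names are hypothetical, and setting the flag on a forced finish
is only one possible way to make the flush take effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the s390 per-mm state relevant to the note above. */
struct toy_s390_mm {
    bool flush_mm;  /* models mm->context.flush_mm */
    int flushes;    /* TLB flushes actually performed (illustration only) */
};

/* Models a lazy flush: it does nothing unless flush_mm was set, which
 * normally happens only when a pte entry was really cleared. */
static void toy_s390_flush_lazy(struct toy_s390_mm *mm)
{
    if (!mm->flush_mm)
        return;
    mm->flushes++;
    mm->flush_mm = false;
}

/* Models finishing a batch. With force set and no pte cleared by this
 * thread, the lazy flush alone would be a no-op, so the flag is raised
 * first (assumption: hypothetical handling, chosen for illustration). */
static void toy_s390_finish(struct toy_s390_mm *mm, bool force)
{
    assert(mm != NULL);
    if (force)
        mm->flush_mm = true;
    toy_s390_flush_lazy(mm);
}
```

Without the force branch, a thread that cleared no pte entries would leave
flushes unchanged, which is the stale-TLB window the note warns about.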

[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/

[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
  Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-08-10 15:54:07 -07:00
..
fpu s390/fpu: improve kernel_fpu_[begin|end] 2016-08-29 11:05:01 +02:00
trace s390/zcrypt: tracepoint definitions for zcrypt device driver. 2016-12-14 16:33:40 +01:00
airq.h
appldata.h s390/diag: add a statistic for diagnose calls 2015-10-14 14:32:06 +02:00
archrandom.h s390/crypto: Provide s390 specific arch random functionality. 2017-04-26 13:41:35 +02:00
asm-prototypes.h s390/kbuild: enable modversions for symbols exported from asm 2016-12-20 15:22:56 +01:00
atomic_ops.h s390/spinlock: use atomic primitives for spinlocks 2017-04-12 08:43:33 +02:00
atomic.h s390/atomic: refactor atomic primitives 2016-11-11 16:37:33 +01:00
barrier.h s390: more efficient smp barriers 2016-01-12 20:47:05 +02:00
bitops.h s390/bitops: remove outdated comment 2017-03-22 08:29:05 +01:00
bug.h debug: Fix WARN_ON_ONCE() for modules 2017-07-20 12:31:04 +02:00
bugs.h
cache.h s390: use __section macro everywhere 2016-06-13 15:58:23 +02:00
ccwdev.h
ccwgroup.h
checksum.h Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
chpid.h
cio.h vfio: ccw: introduce support for ccw0 2017-03-31 12:55:12 +02:00
clp.h s390/pci: add ioctl interface for CLP 2016-03-07 16:54:32 +01:00
cmb.h s390/cio: use device_lock during cmb activation 2015-10-14 14:32:02 +02:00
cmpxchg.h s390/cmpxchg: remove dead code 2015-10-14 14:32:15 +02:00
compat.h take compat_sys_old_getrlimit() to native syscall 2017-05-27 15:38:06 -04:00
cpacf.h Merge branch 's390forkvm' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into features 2017-04-27 07:34:07 +02:00
cpcmd.h
cpu_mf.h s390/cpu_mf: remove register variable in __ecctr() 2017-03-31 07:53:34 +02:00
cpu.h s390/smp: cleanup core vs. cpu in the SCLP interface 2015-06-25 09:39:24 +02:00
cpufeature.h s390/module: enable generic CPU feature modalias using s390 ELF hwcaps 2015-07-22 09:58:02 +02:00
cputime.h s390/cputime: provide archicture specific cputime_to_nsecs 2017-03-01 09:59:27 +01:00
crw.h s390/cio: Consolidate inline assemblies and related data definitions 2015-12-18 14:59:34 +01:00
css_chars.h
ctl_reg.h KVM: s390: implement instruction execution protection for emulated 2017-06-22 12:41:06 +02:00
current.h
debug.h s390: convert debug_info.ref_count from atomic_t to refcount_t 2017-05-11 16:35:32 +02:00
delay.h
diag.h s390/diag: add diag26c support 2017-06-20 15:44:15 -04:00
dis.h s390/uprobes: fix compile for !KPROBES 2017-05-03 09:08:57 +02:00
dma-mapping.h s390: implement ->mapping_error 2017-06-28 06:54:31 -07:00
dma.h
eadm.h block: introduce new block status code type 2017-06-09 09:27:32 -06:00
ebcdic.h
elf.h s390: reduce ELF_ET_DYN_BASE 2017-07-10 16:32:36 -07:00
exec.h
extable.h s390: switch to extable.h 2017-03-28 18:23:55 -04:00
extmem.h
facility.h s390/facilities: get rid of __ASSEMBLY__ in facility header file 2017-03-22 08:29:18 +01:00
fcx.h s390: use canonical include guard style 2016-06-13 15:58:17 +02:00
ftrace.h s390/dumpstack: get rid of return_address again 2016-10-17 14:44:33 +02:00
futex.h
gmap.h KVM: s390: backup the currently enabled gmap when scheduled out 2016-06-20 09:55:24 +02:00
hardirq.h
hugetlb.h mm/hugetlb: allow architectures to override huge_pte_clear() 2017-07-06 16:24:34 -07:00
hw_irq.h
idals.h Replace <asm/uaccess.h> with <linux/uaccess.h> globally 2016-12-24 11:46:01 -08:00
idle.h s390/udelay: make udelay have busy loop semantics 2015-10-14 14:32:13 +02:00
io.h s390: provide default ioremap and iounmap declaration 2017-06-12 16:26:00 +02:00
ipl.h s390: fix initrd corruptions with gcov/kcov instrumented kernels 2016-12-12 12:11:20 +01:00
irq.h s390: use SPARSE_IRQ 2016-06-13 15:58:24 +02:00
irqflags.h s390/irqflags: optimize irq restore 2016-01-19 12:14:01 +01:00
isc.h vfio: ccw: basic implementation for vfio_ccw driver 2017-03-31 12:55:04 +02:00
itcw.h
jump_label.h s390: add explicit <linux/stringify.h> for jump label 2016-06-13 15:58:16 +02:00
Kbuild s390: use two more generic header files 2017-06-12 16:25:57 +02:00
kdebug.h
kexec.h s390/crash: Remove unused KEXEC_NOTE_BYTES 2017-07-05 07:35:29 +02:00
kprobes.h s390/uprobes: fix compile for !KPROBES 2017-05-03 09:08:57 +02:00
kvm_host.h PPC: 2017-07-06 18:38:31 -07:00
kvm_para.h s390/diag: add a statistic for diagnose calls 2015-10-14 14:32:06 +02:00
linkage.h s390/kernel: move EX_TABLE macros to linkage.h header file 2015-07-22 09:57:59 +02:00
livepatch.h s390: Audit and remove any remaining unnecessary uses of module.h 2017-02-17 07:40:41 +01:00
lowcore.h s390: add a system call for guarded storage 2017-03-22 08:14:25 +01:00
mman.h s390/mm: make TASK_SIZE independent from the number of page table levels 2017-04-25 07:47:32 +02:00
mmu_context.h s390/kvm: avoid global config of vm.alloc_pgste=1 2017-06-13 13:03:41 +02:00
mmu.h s390/kvm: Add use_cmma field to mm_context_t 2017-04-20 13:33:09 +02:00
mmzone.h s390/numa: add core infrastructure 2015-08-03 18:40:25 +02:00
module.h
nmi.h KVM: s390: Inject machine check into the guest 2017-06-28 12:42:32 +02:00
numa.h s390/numa: use correct type for node_to_cpumask_map 2015-09-23 09:18:56 +02:00
os_info.h s390/dump: streamline oldmem copy functions 2015-11-27 09:24:12 +01:00
page-states.h s390/kvm: Add PGSTE manipulation functions 2017-04-20 13:33:08 +02:00
page.h s390/mm: implement 5 level pages tables 2017-06-12 16:25:54 +02:00
pci_clp.h s390/pci: use proper endianness annotations 2017-01-16 07:27:53 +01:00
pci_debug.h
pci_dma.h s390/pci_dma: fix DMA table corruption with > 4 TB main memory 2015-11-27 09:24:15 +01:00
pci_insn.h s390/pci: improve error handling during interrupt deregistration 2017-06-28 07:32:08 +02:00
pci_io.h s390/pci: improve ZPCI_* macros 2016-01-26 12:45:49 +01:00
pci.h s390/pci: fix handling of PEC 306 2017-06-28 07:32:13 +02:00
percpu.h s390/percpu: remove this_cpu_cmpxchg_double_4 2016-03-02 06:44:30 -06:00
perf_event.h s390/cpum_cf: update counter numbers to ecctr limits 2017-03-31 07:53:26 +02:00
pgalloc.h s390/mm: implement 5 level pages tables 2017-06-12 16:25:54 +02:00
pgtable.h s390/mm: add p?d_folded() helper functions 2017-06-12 16:26:00 +02:00
pkey.h s390/pkey: Introduce new API for secure key verification 2017-03-22 08:29:13 +01:00
preempt.h s390/preempt: move preempt_count to the lowcore 2016-11-11 16:37:40 +01:00
processor.h Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2017-07-03 15:39:36 -07:00
ptrace.h s390/kvm: avoid global config of vm.alloc_pgste=1 2017-06-13 13:03:41 +02:00
qdio.h s390: remove 31 bit support 2015-03-25 11:49:33 +01:00
reset.h s390/dump: rework CPU register dump code 2015-11-27 09:24:14 +01:00
runtime_instr.h s390: remove 31 bit support 2015-03-25 11:49:33 +01:00
rwsem.h locking/rwsem: Remove rwsem_atomic_add() and rwsem_atomic_update() 2016-06-08 15:16:59 +02:00
schid.h
sclp.h s390/sclp: Detect KSS facility 2017-04-21 11:08:04 +02:00
scsw.h s390/dasd: channel path aware error recovery 2016-12-12 12:05:03 +01:00
seccomp.h s390/seccomp: include generic seccomp header file 2016-04-01 17:20:55 +02:00
sections.h mm: fix section name for .data..ro_after_init 2017-03-31 17:13:30 -07:00
segment.h
serial.h
set_memory.h treewide: move set_memory_* functions away from cacheflush.h 2017-05-08 17:15:13 -07:00
setup.h s390/spinlock: remove compare and delay instruction 2017-04-12 08:43:33 +02:00
shmparam.h
signal.h
sigp.h s390/smp: use sigp condition code define 2017-06-12 16:25:58 +02:00
smp.h s390/smp: initialize cpu_present_mask in setup_arch 2016-12-07 07:23:07 +01:00
sparsemem.h s390: make MAX_PHYSMEM_BITS configurable 2017-03-28 16:55:10 +02:00
spinlock_types.h s390/spinlock: use atomic primitives for spinlocks 2017-04-12 08:43:33 +02:00
spinlock.h s390/spinlock: use atomic primitives for spinlocks 2017-04-12 08:43:33 +02:00
stp.h s390/time: remove ETR support 2016-06-13 15:58:21 +02:00
string.h s390/lib: add missing memory barriers to string inline assemblies 2016-12-14 16:33:41 +01:00
switch_to.h s390: add a system call for guarded storage 2017-03-22 08:14:25 +01:00
syscall.h s390/syscalls: Fix out of bounds arguments access 2017-07-05 07:35:30 +02:00
sysinfo.h S390/sysinfo: use uuid_is_null instead of opencoding it 2017-06-05 16:59:06 +02:00
termios.h
thread_info.h s390/kvm: avoid global config of vm.alloc_pgste=1 2017-06-13 13:03:41 +02:00
timex.h s390/timex: micro optimization for tod_to_ns 2017-03-01 09:59:28 +01:00
tlb.h mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem 2017-08-10 15:54:07 -07:00
tlbflush.h s390/mm,kvm: flush gmap address space with IDTE 2016-08-24 09:23:55 +02:00
topology.h s390/numa: establish cpu to node mapping early 2016-12-07 07:23:25 +01:00
types.h s390: remove 31 bit support 2015-03-25 11:49:33 +01:00
uaccess.h Merge branch 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2017-07-15 11:17:52 -07:00
unaligned.h
unistd.h s390: ignore pkey system calls 2016-10-17 11:25:25 +02:00
uprobes.h uprobes: remove function declarations from arch/{mips,s390} 2016-10-07 18:46:30 -07:00
user.h
vdso.h s390/time: steer clocksource on STP sync events 2016-10-28 10:09:02 +02:00
vga.h
vtime.h
vtimer.h s390/idle: consolidate idle functions and definitions 2014-10-09 09:14:03 +02:00
vx-insn.h RAID/s390: add SIMD implementation for raid6 gen/xor 2016-08-29 11:05:04 +02:00
xor.h s390/xor: optimized xor routing using the XC instruction 2016-02-23 08:56:17 +01:00