Commit Graph

363 Commits

Author SHA1 Message Date
Joerg Roedel
86cf69f1d8 x86/mm/32: implement arch_sync_kernel_mappings()
Implement the function to sync changes in vmalloc and ioremap ranges to
all page-tables.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vlastimil Babka <vbabka@suse.cz>
Link: http://lkml.kernel.org/r/20200515140023.25469-6-joro@8bytes.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-06-02 10:59:11 -07:00
Andy Lutomirski
ef68017eb5 x86/kvm: Handle async page faults directly through do_page_fault()
KVM overloads #PF to indicate two types of not-actually-page-fault
events.  Right now, the KVM guest code intercepts them by modifying
the IDT and hooking the #PF vector.  This makes the already fragile
fault code even harder to understand, and it also pollutes call
traces with async_page_fault and do_async_page_fault for normal page
faults.

Clean it up by moving the logic into do_page_fault() using a static
branch.  This gets rid of the platform trap_init override mechanism
completely.

[ tglx: Fixed up 32bit, removed error code from the async functions and
  	massaged coding style ]

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200505134059.169270470@linutronix.de
2020-05-19 15:53:57 +02:00
Anshuman Khandual
3122e80efc mm/vma: make vma_is_accessible() available for general use
Lets move vma_is_accessible() helper to include/linux/mm.h which makes it
available for general use.  While here, this replaces all remaining open
encodings for VMA access check with vma_is_accessible().

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Guo Ren <guoren@kernel.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Burton <paulburton@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Will Deacon <will@kernel.org>
Link: http://lkml.kernel.org/r/1582520593-30704-3-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-07 10:43:37 -07:00
Peter Xu
4064b98270 mm: allow VM_FAULT_RETRY for multiple times
The idea comes from a discussion between Linus and Andrea [1].

Before this patch we only allow a page fault to retry once.  We achieved
this by clearing the FAULT_FLAG_ALLOW_RETRY flag when doing
handle_mm_fault() the second time.  This was majorly used to avoid
unexpected starvation of the system by looping over forever to handle the
page fault on a single page.  However that should hardly happen, and after
all for each code path to return a VM_FAULT_RETRY we'll first wait for a
condition (during which time we should possibly yield the cpu) to happen
before VM_FAULT_RETRY is really returned.

This patch removes the restriction by keeping the FAULT_FLAG_ALLOW_RETRY
flag when we receive VM_FAULT_RETRY.  It means that the page fault handler
now can retry the page fault for multiple times if necessary without the
need to generate another page fault event.  Meanwhile we still keep the
FAULT_FLAG_TRIED flag so page fault handler can still identify whether a
page fault is the first attempt or not.

Then we'll have these combinations of fault flags (only considering
ALLOW_RETRY flag and TRIED flag):

  - ALLOW_RETRY and !TRIED:  this means the page fault allows to
                             retry, and this is the first try

  - ALLOW_RETRY and TRIED:   this means the page fault allows to
                             retry, and this is not the first try

  - !ALLOW_RETRY and !TRIED: this means the page fault does not allow
                             to retry at all

  - !ALLOW_RETRY and TRIED:  this is forbidden and should never be used

In existing code we have multiple places that has taken special care of
the first condition above by checking against (fault_flags &
FAULT_FLAG_ALLOW_RETRY).  This patch introduces a simple helper to detect
the first retry of a page fault by checking against both (fault_flags &
FAULT_FLAG_ALLOW_RETRY) and !(fault_flag & FAULT_FLAG_TRIED) because now
even the 2nd try will have the ALLOW_RETRY set, then use that helper in
all existing special paths.  One example is in __lock_page_or_retry(), now
we'll drop the mmap_sem only in the first attempt of page fault and we'll
keep it in follow up retries, so old locking behavior will be retained.

This will be a nice enhancement for current code [2] at the same time a
supporting material for the future userfaultfd-writeprotect work, since in
that work there will always be an explicit userfault writeprotect retry
for protected pages, and if that cannot resolve the page fault (e.g., when
userfaultfd-writeprotect is used in conjunction with swapped pages) then
we'll possibly need a 3rd retry of the page fault.  It might also benefit
other potential users who will have similar requirement like userfault
write-protection.

GUP code is not touched yet and will be covered in follow up patch.

Please read the thread below for more information.

[1] https://lore.kernel.org/lkml/20171102193644.GB22686@redhat.com/
[2] https://lore.kernel.org/lkml/20181230154648.GB9832@redhat.com/

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160246.9790-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:30 -07:00
Peter Xu
dde1607248 mm: introduce FAULT_FLAG_DEFAULT
Although there're tons of arch-specific page fault handlers, most of them
are still sharing the same initial value of the page fault flags.  Say,
merely all of the page fault handlers would allow the fault to be retried,
and they also allow the fault to respond to SIGKILL.

Let's define a default value for the fault flags to replace those initial
page fault flags that were copied over.  With this, it'll be far easier to
introduce new fault flag that can be used by all the architectures instead
of touching all the archs.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220160238.9694-1-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Peter Xu
39678191cd x86/mm: use helper fault_signal_pending()
Let's move the fatal signal check even earlier so that we can directly use
the new fault_signal_pending() in x86 mm code.

Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Brian Geffon <bgeffon@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220155353.8676-5-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-04-02 09:35:29 -07:00
Joerg Roedel
763802b53a x86/mm: split vmalloc_sync_all()
Commit 3f8fd02b1b ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path.  While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.

Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap().  But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.

To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:

	* vmalloc_sync_mappings(), and
	* vmalloc_sync_unmappings()

Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized.  The only exception is the new call-site added in the
above mentioned commit.

Shile Zhang directed us to a report of an 80% regression in reaim
throughput.

Fixes: 3f8fd02b1b ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	[GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-03-21 18:56:06 -07:00
Linus Torvalds
bcc8aff6af Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 asm updates from Ingo Molnar:
 "Misc updates:

   - Remove last remaining calls to exception_enter/exception_exit() and
     simplify the entry code some more.

   - Remove force_iret()

   - Add support for "Fast Short Rep Mov", which is available starting
     with Ice Lake Intel CPUs - and make the x86 assembly version of
     memmove() use REP MOV for all sizes when FSRM is available.

   - Micro-optimize/simplify the 32-bit boot code a bit.

   - Use a more future-proof SYSRET instruction mnemonic"

* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot: Simplify calculation of output address
  x86/entry/64: Add instruction suffix to SYSRET
  x86: Remove force_iret()
  x86/cpufeatures: Add support for fast short REP; MOVSB
  x86/context-tracking: Remove exception_enter/exit() from KVM_PV_REASON_PAGE_NOT_PRESENT async page fault
  x86/context-tracking: Remove exception_enter/exit() from do_page_fault()
2020-01-28 11:08:13 -08:00
Frederic Weisbecker
ee6352b2c4 x86/context-tracking: Remove exception_enter/exit() from do_page_fault()
do_page_fault(), like other exceptions, is already covered by
user_enter() and user_exit() when the exception triggers in userspace.

As explained in:

  8c84014f3b ("x86/entry: Remove exception_enter() from most trap handlers")

exception_enter/exit() only remained to handle possible page fault from
kernel mode while context tracking is in CONTEXT_USER mode, ie: on
kernel entry before we manage to call user_exit(). The only known
offender was do_fast_syscall_32() fetching EBP register from where
vDSO stashed it.

Meanwhile this got fixed in:

  9999c8c01f ("x86/entry: Call enter_from_user_mode() with IRQs off")

that moved enter_from_user_mode() before the call to get_user().

So we can safely remove it now.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jim Mattson <jmattson@google.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Link: https://lkml.kernel.org/r/20191227163612.10039-2-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-07 08:11:23 +01:00
Ingo Molnar
186525bd6b mm, x86/mm: Untangle address space layout definitions from basic pgtable type definitions
- Untangle the somewhat incestous way of how VMALLOC_START is used all across the
  kernel, but is, on x86, defined deep inside one of the lowest level page table headers.
  It doesn't help that vmalloc.h only includes a single asm header:

     #include <asm/page.h>           /* pgprot_t */

  So there was no existing cross-arch way to decouple address layout
  definitions from page.h details. I used this:

   #ifndef VMALLOC_START
   # include <asm/vmalloc.h>
   #endif

  This way every architecture that wants to simplify page.h can do so.

- Also on x86 we had a couple of LDT related inline functions that used
  the late-stage address space layout positions - but these could be
  uninlined without real trouble - the end result is cleaner this way as
  well.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-12-10 10:12:55 +01:00
Joerg Roedel
9a62d20027 x86/mm/32: Sync only to VMALLOC_END in vmalloc_sync_all()
The job of vmalloc_sync_all() is to help the lazy freeing of vmalloc()
ranges: before such vmap ranges are reused we make sure that they are
unmapped from every task's page tables.

This is really easy on pagetable setups where the kernel page tables
are shared between all tasks - this is the case on 32-bit kernels
with SHARED_KERNEL_PMD = 1.

But on !SHARED_KERNEL_PMD 32-bit kernels this involves iterating
over the pgd_list and clearing all pmd entries in the pgds that
are cleared in the init_mm.pgd, which is the reference pagetable
that the vmalloc() code uses.

In that context the current practice of vmalloc_sync_all() iterating
until FIX_ADDR_TOP is buggy:

        for (address = VMALLOC_START & PMD_MASK;
             address >= TASK_SIZE_MAX && address < FIXADDR_TOP;
             address += PMD_SIZE) {
                struct page *page;

Because iterating up to FIXADDR_TOP will involve a lot of non-vmalloc
address ranges:

	VMALLOC -> PKMAP -> LDT -> CPU_ENTRY_AREA -> FIX_ADDR

This is mostly harmless for the FIX_ADDR and CPU_ENTRY_AREA ranges
that don't clear their pmds, but it's lethal for the LDT range,
which relies on having different mappings in different processes,
and 'synchronizing' them in the vmalloc sense corrupts those
pagetable entries (clearing them).

This got particularly prominent with PTI, which turns SHARED_KERNEL_PMD
off and makes this the dominant mapping mode on 32-bit.

To make LDT working again vmalloc_sync_all() must only iterate over
the volatile parts of the kernel address range that are identical
between all processes.

So the correct check in vmalloc_sync_all() is "address < VMALLOC_END"
to make sure the VMALLOC areas are synchronized and the LDT
mapping is not falsely overwritten.

The CPU_ENTRY_AREA and the FIXMAP area are no longer synced either,
but this is not really a proplem since their PMDs get established
during bootup and never change.

This change fixes the ldt_gdt selftest in my setup.

[ mingo: Fixed up the changelog to explain the logic and modified the
         copying to only happen up until VMALLOC_END. ]

Reported-by: Borislav Petkov <bp@suse.de>
Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hpa@zytor.com
Fixes: 7757d607c6: ("x86/pti: Allow CONFIG_PAGE_TABLE_ISOLATION for x86_32")
Link: https://lkml.kernel.org/r/20191126111119.GA110513@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-11-26 21:53:34 +01:00
Joerg Roedel
8e998fc24d x86/mm: Sync also unmappings in vmalloc_sync_all()
With huge-page ioremap areas the unmappings also need to be synced between
all page-tables. Otherwise it can cause data corruption when a region is
unmapped and later re-used.

Make the vmalloc_sync_one() function ready to sync unmappings and make sure
vmalloc_sync_all() iterates over all page-tables even when an unmapped PMD
is found.

Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-3-joro@8bytes.org
2019-07-22 10:18:30 +02:00
Joerg Roedel
51b75b5b56 x86/mm: Check for pfn instead of page in vmalloc_sync_one()
Do not require a struct page for the mapped memory location because it
might not exist. This can happen when an ioremapped region is mapped with
2MB pages.

Fixes: 5d72b4fba4 ('x86, mm: support huge I/O mapping capability I/F')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20190719184652.11391-2-joro@8bytes.org
2019-07-22 10:18:30 +02:00
Linus Torvalds
c6dd78fcb8 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
 "A set of x86 specific fixes and updates:

   - The CR2 corruption fixes which store CR2 early in the entry code
     and hand the stored address to the fault handlers.

   - Revert a forgotten leftover of the dropped FSGSBASE series.

   - Plug a memory leak in the boot code.

   - Make the Hyper-V assist functionality robust by zeroing the shadow
     page.

   - Remove a useless check for dead processes with LDT

   - Update paravirt and VMware maintainers entries.

   - A few cleanup patches addressing various compiler warnings"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/entry/64: Prevent clobbering of saved CR2 value
  x86/hyper-v: Zero out the VP ASSIST PAGE on allocation
  x86, boot: Remove multiple copy of static function sanitize_boot_params()
  x86/boot/compressed/64: Remove unused variable
  x86/boot/efi: Remove unused variables
  x86/mm, tracing: Fix CR2 corruption
  x86/entry/64: Update comments and sanity tests for create_gap
  x86/entry/64: Simplify idtentry a little
  x86/entry/32: Simplify common_exception
  x86/paravirt: Make read_cr2() CALLEE_SAVE
  MAINTAINERS: Update PARAVIRT_OPS_INTERFACE and VMWARE_HYPERVISOR_INTERFACE
  x86/process: Delete useless check for dead process with LDT
  x86: math-emu: Hide clang warnings for 16-bit overflow
  x86/e820: Use proper booleans instead of 0/1
  x86/apic: Silence -Wtype-limits compiler warnings
  x86/mm: Free sme_early_buffer after init
  x86/boot: Fix memory leak in default_get_smp_config()
  Revert "x86/ptrace: Prevent ptrace from clearing the FS/GS selector" and fix the test
2019-07-20 11:24:49 -07:00
Peter Zijlstra
a0d14b8909 x86/mm, tracing: Fix CR2 corruption
Despite the current efforts to read CR2 before tracing happens there still
exist a number of possible holes:

  idtentry page_fault             do_page_fault           has_error_code=1
    call error_entry
      TRACE_IRQS_OFF
        call trace_hardirqs_off*
          #PF // modifies CR2

      CALL_enter_from_user_mode
        __context_tracking_exit()
          trace_user_exit(0)
            #PF // modifies CR2

    call do_page_fault
      address = read_cr2(); /* whoopsie */

And similar for i386.

Fix it by pulling the CR2 read into the entry code, before any of that
stuff gets a chance to run and ruin things.

Reported-by: He Zhe <zhe.he@windriver.com>
Reported-by: Eiichi Tsukata <devel@etsukata.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: bp@alien8.de
Cc: rostedt@goodmis.org
Cc: torvalds@linux-foundation.org
Cc: hpa@zytor.com
Cc: dave.hansen@linux.intel.com
Cc: jgross@suse.com
Cc: joel@joelfernandes.org
Link: https://lkml.kernel.org/r/20190711114336.116812491@infradead.org

Debugged-by: Steven Rostedt <rostedt@goodmis.org>
2019-07-17 23:17:38 +02:00
Anshuman Khandual
b98cca444d mm, kprobes: generalize and rename notify_page_fault() as kprobe_page_fault()
Architectures which support kprobes have very similar boilerplate around
calling kprobe_fault_handler().  Use a helper function in kprobes.h to
unify them, based on the x86 code.

This changes the behaviour for other architectures when preemption is
enabled.  Previously, they would have disabled preemption while calling
the kprobe handler.  However, preemption would be disabled if this fault
was due to a kprobe, so we know the fault was not due to a kprobe
handler and can simply return failure.

This behaviour was introduced in commit a980c0ef9f ("x86/kprobes:
Refactor kprobes_fault() like kprobe_exceptions_notify()")

[anshuman.khandual@arm.com: export kprobe_fault_handler()]
  Link: http://lkml.kernel.org/r/1561133358-8876-1-git-send-email-anshuman.khandual@arm.com
Link: http://lkml.kernel.org/r/1560420444-25737-1-git-send-email-anshuman.khandual@arm.com
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: James Hogan <jhogan@kernel.org>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-07-16 19:23:22 -07:00
Linus Torvalds
5ad18b2e60 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull force_sig() argument change from Eric Biederman:
 "A source of error over the years has been that force_sig has taken a
  task parameter when it is only safe to use force_sig with the current
  task.

  The force_sig function is built for delivering synchronous signals
  such as SIGSEGV where the userspace application caused a synchronous
  fault (such as a page fault) and the kernel responded with a signal.

  Because the name force_sig does not make this clear, and because the
  force_sig takes a task parameter the function force_sig has been
  abused for sending other kinds of signals over the years. Slowly those
  have been fixed when the oopses have been tracked down.

  This set of changes fixes the remaining abusers of force_sig and
  carefully rips out the task parameter from force_sig and friends
  making this kind of error almost impossible in the future"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (27 commits)
  signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
  signal: Remove the signal number and task parameters from force_sig_info
  signal: Factor force_sig_info_to_task out of force_sig_info
  signal: Generate the siginfo in force_sig
  signal: Move the computation of force into send_signal and correct it.
  signal: Properly set TRACE_SIGNAL_LOSE_INFO in __send_signal
  signal: Remove the task parameter from force_sig_fault
  signal: Use force_sig_fault_to_task for the two calls that don't deliver to current
  signal: Explicitly call force_sig_fault on current
  signal/unicore32: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from __do_user_fault
  signal/arm: Remove tsk parameter from ptrace_break
  signal/nds32: Remove tsk parameter from send_sigtrap
  signal/riscv: Remove tsk parameter from do_trap
  signal/sh: Remove tsk parameter from force_sig_info_fault
  signal/um: Remove task parameter from send_sigtrap
  signal/x86: Remove task parameter from send_sigtrap
  signal: Remove task parameter from force_sig_mceerr
  signal: Remove task parameter from force_sig
  signal: Remove task parameter from force_sigsegv
  ...
2019-07-08 21:48:15 -07:00
Andy Lutomirski
e0a446ce39 x86/vsyscall: Document odd SIGSEGV error code for vsyscalls
Even if vsyscall=none, user page faults on the vsyscall page are reported
as though the PROT bit in the error code was set.  Add a comment explaining
why this is probably okay and display the value in the test case.

While at it, explain why the behavior is correct with respect to PKRU.

Modify also the selftest to print the odd error code so that there is a
way to demonstrate the odd behaviour.

If anyone really cares about more accurate emulation, the behaviour could
be changed. But that needs a real good justification.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Kernel Hardening <kernel-hardening@lists.openwall.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/75c91855fd850649ace162eec5495a1354221aaa.1561610354.git.luto@kernel.org
2019-06-28 00:04:39 +02:00
Andy Lutomirski
918ce32509 x86/vsyscall: Show something useful on a read fault
Just segfaulting the application when it tries to read the vsyscall page in
xonly mode is not helpful for those who need to debug it.

Emit a hint.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Jann Horn <jannh@google.com>
Link: https://lkml.kernel.org/r/8016afffe0eab497be32017ad7f6f7030dc3ba66.1561610354.git.luto@kernel.org
2019-06-28 00:04:39 +02:00
Eric W. Biederman
318759b473 signal/x86: Move tsk inside of CONFIG_MEMORY_FAILURE in do_sigbus
Stephen Rothwell <sfr@canb.auug.org.au> reported:
> After merging the userns tree, today's linux-next build (i386 defconfig)
> produced this warning:
>
> arch/x86/mm/fault.c: In function 'do_sigbus':
> arch/x86/mm/fault.c:1017:22: warning: unused variable 'tsk' [-Wunused-variable]
>   struct task_struct *tsk = current;
>                       ^~~
>
> Introduced by commit
>
>   351b6825b3 ("signal: Explicitly call force_sig_fault on current")
>
> The remaining used of "tsk" are protected by CONFIG_MEMORY_FAILURE.

So do the obvious thing and move tsk inside of CONFIG_MEMORY_FAILURE
to prevent introducing new warnings into the build.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-06-03 10:23:58 -05:00
Eric W. Biederman
2e1661d267 signal: Remove the task parameter from force_sig_fault
As synchronous exceptions really only make sense against the current
task (otherwise how are you synchronous) remove the task parameter
from from force_sig_fault to make it explicit that is what is going
on.

The two known exceptions that deliver a synchronous exception to a
stopped ptraced task have already been changed to
force_sig_fault_to_task.

The callers have been changed with the following emacs regular expression
(with obvious variations on the architectures that take more arguments)
to avoid typos:

force_sig_fault[(]\([^,]+\)[,]\([^,]+\)[,]\([^,]+\)[,]\W+current[)]
->
force_sig_fault(\1,\2,\3)

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-05-29 09:31:43 -05:00
Eric W. Biederman
351b6825b3 signal: Explicitly call force_sig_fault on current
Update the calls of force_sig_fault that pass in a variable that is
set to current earlier to explicitly use current.

This is to make the next change that removes the task parameter
from force_sig_fault easier to verify.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-05-29 09:31:43 -05:00
Eric W. Biederman
f8eac9011b signal: Remove task parameter from force_sig_mceerr
All of the callers pass current into force_sig_mceer so remove the
task parameter to make this obvious.

This also makes it clear that force_sig_mceerr passes current
into force_sig_info.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2019-05-27 09:36:28 -05:00
Linus Torvalds
0bc40e549a Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "The changes in here are:

   - text_poke() fixes and an extensive set of executability lockdowns,
     to (hopefully) eliminate the last residual circumstances under
     which we are using W|X mappings even temporarily on x86 kernels.
     This required a broad range of surgery in text patching facilities,
     module loading, trampoline handling and other bits.

   - tweak page fault messages to be more informative and more
     structured.

   - remove DISCONTIGMEM support on x86-32 and make SPARSEMEM the
     default.

   - reduce KASLR granularity on 5-level paging kernels from 512 GB to
     1 GB.

   - misc other changes and updates"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
  x86/mm: Initialize PGD cache during mm initialization
  x86/alternatives: Add comment about module removal races
  x86/kprobes: Use vmalloc special flag
  x86/ftrace: Use vmalloc special flag
  bpf: Use vmalloc special flag
  modules: Use vmalloc special flag
  mm/vmalloc: Add flag for freeing of special permsissions
  mm/hibernation: Make hibernation handle unmapped pages
  x86/mm/cpa: Add set_direct_map_*() functions
  x86/alternatives: Remove the return value of text_poke_*()
  x86/jump-label: Remove support for custom text poker
  x86/modules: Avoid breaking W^X while loading modules
  x86/kprobes: Set instruction page as executable
  x86/ftrace: Set trampoline pages as executable
  x86/kgdb: Avoid redundant comparison of patched code
  x86/alternatives: Use temporary mm for text poking
  x86/alternatives: Initialize temporary mm for patching
  fork: Provide a function for copying init_mm
  uprobes: Initialize uprobes earlier
  x86/mm: Save debug registers when loading a temporary mm
  ...
2019-05-06 16:13:31 -07:00
Jiri Kosina
a65c88e16f x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()
In-NMI warnings have been added to vmalloc_fault() via:

  ebc8827f75 ("x86: Barf when vmalloc and kmemcheck faults happen in NMI")

back in the time when our NMI entry code could not cope with nested NMIs.

These days, it's perfectly fine to take a fault in NMI context and we
don't have to care about the fact that IRET from the fault handler might
cause NMI nesting.

This warning has already been removed from 32-bit implementation of
vmalloc_fault() in:

  6863ea0cda ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")

but the 64-bit version was omitted.

Remove the bogus warning also from 64-bit implementation of vmalloc_fault().

Reported-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 6863ea0cda ("x86/mm: Remove in_nmi() warning from vmalloc_fault()")
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1904240902280.9803@cbobk.fhfr.pm
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-24 12:21:35 +02:00
Borislav Petkov
ea2f8d6060 x86/fault: Make fault messages more succinct
So we are going to be staring at those in the next years, let's make
them more succinct. In particular:

 - change "address = " to "address: "

 - "-privileged" reads funny. It should be simply "kernel" or "user"

 - "from kernel code" reads funny too. "kernel mode" or "user mode" is
   more natural.

An actual example says more than 1000 words, of course:

  [    0.248370] BUG: kernel NULL pointer dereference, address: 00000000000005b8
  [    0.249120] #PF: supervisor write access in kernel mode
  [    0.249717] #PF: error_code(0x0002) - not-present page

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dave.hansen@linux.intel.com
Cc: luto@kernel.org
Cc: riel@surriel.com
Cc: sean.j.christopherson@intel.com
Cc: yu-cheng.yu@intel.com
Link: http://lkml.kernel.org/r/20190421183524.GC6048@zn.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-21 20:48:51 +02:00
Sean Christopherson
18ea35c5ed x86/fault: Decode and print #PF oops in human readable form
Linus pointed out that deciphering the raw #PF error code and printing
a more human readable message are two different things, and also that
printing the negative cases is mostly just noise[1].  For example, the
USER bit doesn't mean the fault originated in user code and stating
that an oops wasn't due to a protection keys violation isn't interesting
since an oops on a keys violation is a one-in-a-million scenario.

Remove the per-bit decoding of the error code and instead print:
  - the raw error code
  - why the fault occurred
  - the effective privilege level of the access
  - the type of access
  - whether the fault originated in user code or kernel code

This provides the user with the information needed to triage 99.9% of
oopses without polluting the log with useless information or conflating
the error_code with the CPL.

Sample output:

    BUG: kernel NULL pointer dereference, address = 0000000000000008
    #PF: supervisor-privileged instruction fetch from kernel code
    #PF: error_code(0x0010) - not-present page

    BUG: unable to handle page fault for address = ffffbeef00000000
    #PF: supervisor-privileged instruction fetch from kernel code
    #PF: error_code(0x0010) - not-present page

    BUG: unable to handle page fault for address = ffffc90000230000
    #PF: supervisor-privileged write access from kernel code
    #PF: error_code(0x000b) - reserved bit violation

[1] https://lkml.kernel.org/r/CAHk-=whk_fsnxVMvF1T2fFCaP2WrvSybABrLQCWLJyCvHw6NKA@mail.gmail.com

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20181221213657.27628-3-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19 19:31:16 +02:00
Sean Christopherson
f28b11a2ab x86/fault: Reword initial BUG message for unhandled page faults
Reword the NULL pointer dereference case to simply state that a NULL
pointer was dereferenced, i.e. drop "unable to handle" as that implies
that there are instances where the kernel actual does handle NULL
pointer dereferences, which is not true barring funky exception fixup.

For the non-NULL case, replace "kernel paging request" with "page fault"
as the kernel can technically oops on faults that originated in user
code.  Dropping "kernel" also allows future patches to provide detailed
information on where the fault occurred, e.g. user vs. kernel, without
conflicting with the initial BUG message.

In both cases, replace "at address=" with wording more appropriate to
the oops, as "at" may be interpreted as stating that the address is the
RIP of the instruction that faulted.

Last, and probably least, further qualify the NULL-pointer path by
checking that the fault actually originated in kernel code.  It's
technically possible for userspace to map address 0, and not printing
a super specific message is the least of our worries if the kernel does
manage to oops on an actual NULL pointer dereference from userspace.

Before:
    BUG: unable to handle kernel NULL pointer dereference at ffffbeef00000000
    BUG: unable to handle kernel paging request at ffffbeef00000000

After:
    BUG: kernel NULL pointer dereference, address = 0000000000000008
    BUG: unable to handle page fault for address = ffffbeef00000000

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/20181221213657.27628-2-sean.j.christopherson@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2019-04-19 19:31:15 +02:00
Thomas Gleixner
d876b67343 x86/traps: Use cpu_entry_area instead of orig_ist
The orig_ist[] array is a shadow copy of the IST array in the TSS. The
reason why it exists is that older kernels used two TSS variants with
different pointers into the debug stack. orig_ist[] contains the real
starting points.

There is no point anymore to do so because the same information can be
retrieved using the base address of the cpu entry area mapping and the
offsets of the various exception stacks.

No functional change. Preparation for removing orig_ist.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.784487230@linutronix.de
2019-04-17 13:01:59 +02:00
Thomas Gleixner
8f34c5b5af x86/exceptions: Make IST index zero based
The defines for the exception stack (IST) array in the TSS are using the
SDM convention IST1 - IST7. That causes all sorts of code to subtract 1 for
array indices related to IST. That's confusing at best and does not provide
any value.

Make the indices zero based and fixup the usage sites. The only code which
needs to adjust the 0 based index is the interrupt descriptor setup which
needs to add 1 now.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: "Chang S. Bae" <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: linux-doc@vger.kernel.org
Cc: Nicolai Stange <nstange@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190414160144.331772825@linutronix.de
2019-04-17 12:48:00 +02:00
Souptick Joarder
3d3539018d mm: create the new vm_fault_t type
Page fault handlers are supposed to return VM_FAULT codes, but some
drivers/file systems mistakenly return error numbers.  Now that all
drivers/file systems have been converted to use the vm_fault_t return
type, change the type definition to no longer be compatible with 'int'.
By making it an unsigned int, the function prototype becomes
incompatible with a function which returns int.  Sparse will detect any
attempts to return a value which is not a VM_FAULT code.

VM_FAULT_SET_HINDEX and VM_FAULT_GET_HINDEX values are changed to avoid
conflict with other VM_FAULT codes.

[jrdr.linux@gmail.com: fix warnings]
  Link: http://lkml.kernel.org/r/20190109183742.GA24326@jordon-HP-15-Notebook-PC
Link: http://lkml.kernel.org/r/20190108183041.GA12137@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2019-03-07 18:32:03 -08:00
Colin Ian King
5ccd35287e x86/fault: Fix sign-extend unintended sign extension
show_ldttss() shifts desc.base2 by 24 bit, but base2 is 8 bits of a
bitfield in a u16.

Due to the really great idea of integer promotion in C99 base2 is promoted
to an int, because that's the standard defined behaviour when all values
which can be represented by base2 fit into an int.

Now if bit 7 is set in desc.base2 the result of the shift left by 24 makes
the resulting integer negative and the following conversion to unsigned
long legitmately sign extends first causing the upper bits 32 bits to be
set in the result.

Fix this by casting desc.base2 to unsigned long before the shift.

Detected by CoverityScan, CID#1475635 ("Unintended sign extension")

[ tglx: Reworded the changelog a bit as I actually had to lookup
  	the standard (again) to decode the original one. ]

Fixes: a1a371c468 ("x86/fault: Decode page fault OOPSes better")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: kernel-janitors@vger.kernel.org
Link: https://lkml.kernel.org/r/20181222191116.21831-1-colin.king@canonical.com
2019-01-29 21:58:59 +01:00
Ingo Molnar
a2aa52ab16 x86/fault: Clean up the page fault oops decoder a bit
- Make the oops messages a bit less scary (don't mention 'HW errors')

 - Turn 'PROT USER' (which is visually easily confused with PROT_USER)
   into individual bit descriptors: "[PROT] [USER]".
   This also makes "[normal kernel read fault]" more apparent.

 - De-abbreviate variables to make the code easier to read

 - Use vertical alignment where appropriate.

 - Add comment about string size limits and the helper function.

 - Remove unnecessary line breaks.

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22 09:38:13 +01:00
Andy Lutomirski
a1a371c468 x86/fault: Decode page fault OOPSes better
One of Linus' favorite hobbies seems to be looking at OOPSes and
decoding the error code in his head.  This is not one of my favorite
hobbies :)

Teach the page fault OOPS hander to decode the error code.  If it's
a !USER fault from user mode, print an explicit note to that effect
and print out the addresses of various tables that might cause such
an error.

With this patch applied, if I intentionally point the LDT at 0x0 and
run the x86 selftests, I get:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
  HW error: normal kernel read fault
  This was a system access from user code
  IDT: 0xfffffe0000000000 (limit=0xfff) GDT: 0xfffffe0000001000 (limit=0x7f)
  LDTR: 0x50 -- base=0x0 limit=0xfff7
  TR: 0x40 -- base=0xfffffe0000003000 limit=0x206f
  PGD 800000000456e067 P4D 800000000456e067 PUD 4623067 PMD 0
  SMP PTI
  CPU: 0 PID: 153 Comm: ldt_gdt_64 Not tainted 4.19.0+ #1317
  Hardware name: ...
  RIP: 0033:0x401454

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/11212acb25980cd1b3030875cd9502414fbb214d.1542841400.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22 09:24:28 +01:00
Andy Lutomirski
ebb53e2597 x86/fault: Don't try to recover from an implicit supervisor access
This avoids a situation in which we attempt to apply various fixups
that are not intended to handle implicit supervisor accesses from
user mode if we screw up in a way that causes this type of fault.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/9999f151d72ff352265f3274c5ab3a4105090f49.1542841400.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22 09:23:00 +01:00
Andy Lutomirski
0ed32f1aa6 x86/fault: Remove sw_error_code
All of the fault handling code now corrently checks user_mode(regs)
as needed, and nothing depends on the X86_PF_USER bit being munged.
Get rid of the sw_error code and use hw_error_code everywhere.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/078f5b8ae6e8c79ff8ee7345b5c476c45003e5ac.1542841400.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-22 09:22:59 +01:00
Andy Lutomirski
1ad33f5aec x86/fault: Don't set thread.cr2, etc before OOPSing
The fault handling code sets the cr2, trap_nr, and error_code fields
in thread_struct before OOPSing.  No one reads those fields during
an OOPS, so remove the code to set them.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/d418022aa0fad9cb40467aa7acaf4e95be50ee96.1542667307.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20 08:44:30 +01:00
Andy Lutomirski
e49d3cbef0 x86/fault: Make error_code sanitization more robust
The error code in a page fault on a kernel address indicates
whether that address is mapped, which should not be revealed in a signal.

The normal code path for a page fault on a kernel address sanitizes the bit,
but the paths for vsyscall emulation and SIGBUS do not.  Both are
harmless, but for subtle reasons.  SIGBUS is never sent for a kernel
address, and vsyscall emulation will never fault on a kernel address
per se because it will fail an access_ok() check instead.

Make the code more robust by adding a helper that sets the relevant
fields and sanitizing the error code in the helper.  This also
cleans up the code -- we had three copies of roughly the same thing.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/b31159bd55bd0c4fa061a20dfd6c429c094bebaa.1542667307.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20 08:44:29 +01:00
Andy Lutomirski
6ea59b074f x86/fault: Improve the condition for signalling vs OOPSing
__bad_area_nosemaphore() currently checks the X86_PF_USER bit in the
error code to decide whether to send a signal or to treat the fault
as a kernel error.  This can cause somewhat erratic behavior.  The
straightforward cases where the CPL agrees with the hardware USER
bit are all correct, but the other cases are confusing.

 - A user instruction accessing a kernel address with supervisor
   privilege (e.g. a descriptor table access failed).  The USER bit
   will be clear, and we OOPS.  This is correct, because it indicates
   a kernel bug, not a user error.

 - A user instruction accessing a user address with supervisor
   privilege (e.g. a descriptor table was incorrectly pointing at
   user memory).  __bad_area_nosemaphore() will be passed a modified
   error code with the user bit set, and we will send a signal.
   Sending the signal will work (because the regs and the entry
   frame genuinely come from user mode), but we really ought to
   OOPS, as this event indicates a severe kernel bug.

 - A kernel instruction with user privilege (i.e. WRUSS).  This
   should OOPS or get fixed up.  The current code would instead try
   send a signal and malfunction.

Change the logic: a signal should be sent if the faulting context is
user mode *and* the access has user privilege.  Otherwise it's
either a kernel mode fault or a failed implicit access, either of
which should end up in no_context().

Note to -stable maintainers: don't backport this unless you backport
CET.  The bug it fixes is unobservable in current kernels unless
something is extremely wrong.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/10e509c43893170e262e82027ea399130ae81159.1542667307.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20 08:44:29 +01:00
Andy Lutomirski
e50928d721 x86/fault: Fix SMAP #PF handling buglet for implicit supervisor accesses
Currently, if a user program somehow triggers an implicit supervisor
access to a user address (e.g. if the kernel somehow sets LDTR to a
user address), it will be incorrectly detected as a SMAP violation
if AC is clear and SMAP is enabled.  This is incorrect -- the error
has nothing to do with SMAP.  Fix the condition so that only
accesses with the hardware USER bit set are diagnosed as SMAP
violations.

With the logic fixed, an implicit supervisor access to a user address
will hit the code lower in the function that is intended to handle it
even if SMAP is enabled.  That logic is still a bit buggy, and later
patches will clean it up.

I *think* this code is still correct for WRUSS, and I've added a
comment to that effect.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/d1d1b2e66ef31f884dba172084486ea9423ddcdb.1542667307.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20 08:44:29 +01:00
Andy Lutomirski
a15781b536 x86/fault: Fold smap_violation() into do_user_addr_fault()
smap_violation() has a single caller, and the contents are a bit
nonsensical.  I'm going to fix it, but first let's fold it into its
caller for ease of comprehension.

In this particular case, the user_mode(regs) check is incorrect --
it will cause false positives in the case of a user-initiated
kernel-privileged access.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/806c366f6ca861152398ce2c01744d59d9aceb6d.1542667307.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20 08:44:28 +01:00
Andy Lutomirski
dae0a10593 x86/cpufeatures, x86/fault: Mark SMAP as disabled when configured out
Add X86_FEATURE_SMAP to the disabled features mask as appropriate
and use cpu_feature_enabled() in the fault code.  This lets us get
rid of a redundant IS_ENABLED(CONFIG_X86_SMAP).

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/fe93332eded3d702f0b0b4cf83928d6830739ba3.1542667307.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20 08:44:28 +01:00
Andy Lutomirski
6344be608c x86/fault: Check user_mode(regs) when avoiding an mmap_sem deadlock
The fault-handling code that takes mmap_sem needs to avoid a
deadlock that could occur if the kernel took a bad (OOPS-worthy)
page fault on a user address while holding mmap_sem.  This can only
happen if the faulting instruction was in the kernel
(i.e. user_mode(regs)).  Rather than checking the sw_error_code
(which will have the USER bit set if the fault was a USER-permission
access *or* if user_mode(regs)), just check user_mode(regs)
directly.

The old code would have malfunctioned if the kernel executed a bogus
WRUSS instruction while holding mmap_sem.  Fortunately, that is
extremely unlikely in current kernels, which don't use WRUSS.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Link: http://lkml.kernel.org/r/4b89b542e8ceba9bd6abde2f386afed6d99244a9.1542667307.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-20 08:44:27 +01:00
Waiman Long
1d8ca3be86 x86/mm/fault: Allow stack access below %rsp
The current x86 page fault handler allows stack access below the stack
pointer if it is no more than 64k+256 bytes. Any access beyond the 64k+
limit will cause a segmentation fault.

The gcc -fstack-check option generates code to probe the stack for
large stack allocation to see if the stack is accessible. The newer gcc
does that while updating the %rsp simultaneously. Older gcc's like gcc4
doesn't do that. As a result, an application compiled with an old gcc
and the -fstack-check option may fail to start at all:

  $ cat test.c
  int main() {
	char tmp[1024*128];
	printf("### ok\n");
	return 0;
  }

  $ gcc -fstack-check -g -o test test.c

  $ ./test
  Segmentation fault

The old binary was working in older kernels where expand_stack() was
somehow called before the check. But it is not working in newer kernels.
Besides, the 64k+ limit check is kind of crude and will not catch a
lot of mistakes that userspace applications may be misbehaving anyway.
I think the kernel isn't the right place for this kind of tests. We
should leave it to userspace instrumentation tools to perform them.

The 64k+ limit check is now removed to just let expand_stack() decide
if a segmentation fault should happen, when the RLIMIT_STACK limit is
exceeded, for example.

Signed-off-by: Waiman Long <longman@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1541535149-31963-1-git-send-email-longman@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-11-12 11:06:19 +01:00
Mike Rapoport
57c8a661d9 mm: remove include/linux/bootmem.h
Move remaining definitions and declarations from include/linux/bootmem.h
into include/linux/memblock.h and remove the redundant header.

The includes were replaced with the semantic patch below and then
semi-automated removal of duplicated '#include <linux/memblock.h>

@@
@@
- #include <linux/bootmem.h>
+ #include <linux/memblock.h>

[sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
[sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
  Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
[sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
  Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Greentime Hu <green.hu@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Paul Burton <paul.burton@mips.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Serge Semin <fancer.lancer@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-10-31 08:54:16 -07:00
Linus Torvalds
ba9f6f8954 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull siginfo updates from Eric Biederman:
 "I have been slowly sorting out siginfo and this is the culmination of
  that work.

  The primary result is in several ways the signal infrastructure has
  been made less error prone. The code has been updated so that manually
  specifying SEND_SIG_FORCED is never necessary. The conversion to the
  new siginfo sending functions is now complete, which makes it
  difficult to send a signal without filling in the proper siginfo
  fields.

  At the tail end of the patchset comes the optimization of decreasing
  the size of struct siginfo in the kernel from 128 bytes to about 48
  bytes on 64bit. The fundamental observation that enables this is by
  definition none of the known ways to use struct siginfo uses the extra
  bytes.

  This comes at the cost of a small user space observable difference.
  For the rare case of siginfo being injected into the kernel only what
  can be copied into kernel_siginfo is delivered to the destination, the
  rest of the bytes are set to 0. For cases where the signal and the
  si_code are known this is safe, because we know those bytes are not
  used. For cases where the signal and si_code combination is unknown
  the bits that won't fit into struct kernel_siginfo are tested to
  verify they are zero, and the send fails if they are not.

  I made an extensive search through userspace code and I could not find
  anything that would break because of the above change. If it turns out
  I did break something it will take just the revert of a single change
  to restore kernel_siginfo to the same size as userspace siginfo.

  Testing did reveal dependencies on preferring the signo passed to
  sigqueueinfo over si->signo, so bit the bullet and added the
  complexity necessary to handle that case.

  Testing also revealed bad things can happen if a negative signal
  number is passed into the system calls. Something no sane application
  will do but something a malicious program or a fuzzer might do. So I
  have fixed the code that performs the bounds checks to ensure negative
  signal numbers are handled"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (80 commits)
  signal: Guard against negative signal numbers in copy_siginfo_from_user32
  signal: Guard against negative signal numbers in copy_siginfo_from_user
  signal: In sigqueueinfo prefer sig not si_signo
  signal: Use a smaller struct siginfo in the kernel
  signal: Distinguish between kernel_siginfo and siginfo
  signal: Introduce copy_siginfo_from_user and use it's return value
  signal: Remove the need for __ARCH_SI_PREABLE_SIZE and SI_PAD_SIZE
  signal: Fail sigqueueinfo if si_signo != sig
  signal/sparc: Move EMT_TAGOVF into the generic siginfo.h
  signal/unicore32: Use force_sig_fault where appropriate
  signal/unicore32: Generate siginfo in ucs32_notify_die
  signal/unicore32: Use send_sig_fault where appropriate
  signal/arc: Use force_sig_fault where appropriate
  signal/arc: Push siginfo generation into unhandled_exception
  signal/ia64: Use force_sig_fault where appropriate
  signal/ia64: Use the force_sig(SIGSEGV,...) in ia64_rt_sigreturn
  signal/ia64: Use the generic force_sigsegv in setup_frame
  signal/arm/kvm: Use send_sig_mceerr
  signal/arm: Use send_sig_fault where appropriate
  signal/arm: Use force_sig_fault where appropriate
  ...
2018-10-24 11:22:39 +01:00
Linus Torvalds
99792e0cea Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:
 "Lots of changes in this cycle:

   - Lots of CPA (change page attribute) optimizations and related
     cleanups (Thomas Gleixner, Peter Zijstra)

   - Make lazy TLB mode even lazier (Rik van Riel)

   - Fault handler cleanups and improvements (Dave Hansen)

   - kdump, vmcore: Enable kdumping encrypted memory with AMD SME
     enabled (Lianbo Jiang)

   - Clean up VM layout documentation (Baoquan He, Ingo Molnar)

   - ... plus misc other fixes and enhancements"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (51 commits)
  x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()
  x86/mm: Kill stray kernel fault handling comment
  x86/mm: Do not warn about PCI BIOS W+X mappings
  resource: Clean it up a bit
  resource: Fix find_next_iomem_res() iteration issue
  resource: Include resource end in walk_*() interfaces
  x86/kexec: Correct KEXEC_BACKUP_SRC_END off-by-one error
  x86/mm: Remove spurious fault pkey check
  x86/mm/vsyscall: Consider vsyscall page part of user address space
  x86/mm: Add vsyscall address helper
  x86/mm: Fix exception table comments
  x86/mm: Add clarifying comments for user addr space
  x86/mm: Break out user address space handling
  x86/mm: Break out kernel address space handling
  x86/mm: Clarify hardware vs. software "error_code"
  x86/mm/tlb: Make lazy TLB mode lazier
  x86/mm/tlb: Add freed_tables element to flush_tlb_info
  x86/mm/tlb: Add freed_tables argument to flush_tlb_mm_range
  smp,cpumask: introduce on_each_cpu_cond_mask
  smp: use __cpumask_set_cpu in on_each_cpu_cond
  ...
2018-10-23 17:05:28 +01:00
Linus Torvalds
0200fbdd43 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking and misc x86 updates from Ingo Molnar:
 "Lots of changes in this cycle - in part because locking/core attracted
  a number of related x86 low level work which was easier to handle in a
  single tree:

   - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
     McKenney, Andrea Parri)

   - lockdep scalability improvements and micro-optimizations (Waiman
     Long)

   - rwsem improvements (Waiman Long)

   - spinlock micro-optimization (Matthew Wilcox)

   - qspinlocks: Provide a liveness guarantee (more fairness) on x86.
     (Peter Zijlstra)

   - Add support for relative references in jump tables on arm64, x86
     and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)

   - Be a lot less permissive on weird (kernel address) uaccess faults
     on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
     Horn)

   - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
     Amit)

   - ... and a handful of other smaller changes as well"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
  locking/lockdep: Make global debug_locks* variables read-mostly
  locking/lockdep: Fix debug_locks off performance problem
  locking/pvqspinlock: Extend node size when pvqspinlock is configured
  locking/qspinlock_stat: Count instances of nested lock slowpaths
  locking/qspinlock, x86: Provide liveness guarantee
  x86/asm: 'Simplify' GEN_*_RMWcc() macros
  locking/qspinlock: Rework some comments
  locking/qspinlock: Re-order code
  locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
  x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
  futex: Replace spin_is_locked() with lockdep
  locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
  x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
  x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
  x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
  x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
  x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
  x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
  x86/refcount: Work around GCC inlining bug
  x86/objtool: Use asm macros to work around GCC inlining bugs
  ...
2018-10-23 13:08:53 +01:00
Dave Hansen
1620414251 x86/mm: Kill stray kernel fault handling comment
I originally had matching user and kernel comments, but the kernel
one got improved.  Some errant conflict resolution kicked the commment
somewhere wrong.  Kill it.

Reported-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: aa37c51b94 ("x86/mm: Break out user address space handling")
Link: http://lkml.kernel.org/r/20181019140842.12F929FA@viggo.jf.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-10-21 10:58:10 +02:00
Dave Hansen
367e3f1d3f x86/mm: Remove spurious fault pkey check
Spurious faults only ever occur in the kernel's address space.  They
are also constrained specifically to faults with one of these error codes:

	X86_PF_WRITE | X86_PF_PROT
	X86_PF_INSTR | X86_PF_PROT

So, it's never even possible to reach spurious_kernel_fault_check() with
X86_PF_PK set.

In addition, the kernel's address space never has pages with user-mode
protections.  Protection Keys are only enforced on pages with user-mode
protection.

This gives us lots of reasons to not check for protection keys in our
sprurious kernel fault handling.

But, let's also add some warnings to ensure that these assumptions about
protection keys hold true.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160231.243A0D6A@viggo.jf.intel.com
2018-10-09 16:51:16 +02:00
Dave Hansen
3ae0ad92f5 x86/mm/vsyscall: Consider vsyscall page part of user address space
The vsyscall page is weird.  It is in what is traditionally part of
the kernel address space.  But, it has user permissions and we handle
faults on it like we would on a user page: interrupts on.

Right now, we handle vsyscall emulation in the "bad_area" code, which
is used for both user-address-space and kernel-address-space faults.
Move the handling to the user-address-space code *only* and ensure we
get there by "excluding" the vsyscall page from the kernel address
space via a check in fault_in_kernel_space().

Since the fault_in_kernel_space() check is used on 32-bit, also add a
64-bit check to make it clear we only use this path on 64-bit.  Also
move the unlikely() to be in is_vsyscall_vaddr() itself.

This helps clean up the kernel fault handling path by removing a case
that can happen in normal[1] operation.  (Yeah, yeah, we can argue
about the vsyscall page being "normal" or not.)  This also makes
sanity checks easier, like the "we never take pkey faults in the
kernel address space" check in the next patch.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160230.6E9336EE@viggo.jf.intel.com
2018-10-09 16:51:16 +02:00
Dave Hansen
02e983b760 x86/mm: Add vsyscall address helper
We will shortly be using this check in two locations.  Put it in
a helper before we do so.

Let's also insert PAGE_MASK instead of the open-coded ~0xfff.
It is easier to read and also more obviously correct considering
the implicit type conversion that has to happen when it is not
an implicit 'unsigned long'.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160228.C593509B@viggo.jf.intel.com
2018-10-09 16:51:16 +02:00
Dave Hansen
88259744e2 x86/mm: Fix exception table comments
The comments here are wrong.  They are too absolute about where
faults can occur when running in the kernel.  The comments are
also a bit hard to match up with the code.

Trim down the comments, and make them more precise.

Also add a comment explaining why we are doing the
bad_area_nosemaphore() path here.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160227.077DDD7A@viggo.jf.intel.com
2018-10-09 16:51:16 +02:00
Dave Hansen
5b0c2cac54 x86/mm: Add clarifying comments for user addr space
The SMAP and Reserved checking do not have nice comments.  Add
some to clarify and make it match everything else.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160225.FFD44B8D@viggo.jf.intel.com
2018-10-09 16:51:16 +02:00
Dave Hansen
aa37c51b94 x86/mm: Break out user address space handling
The last patch broke out kernel address space handing into its own
helper.  Now, do the same for user address space handling.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160223.9C4F6440@viggo.jf.intel.com
2018-10-09 16:51:15 +02:00
Dave Hansen
8fed620000 x86/mm: Break out kernel address space handling
The page fault handler (__do_page_fault())  basically has two sections:
one for handling faults in the kernel portion of the address space
and another for faults in the user portion of the address space.

But, these two parts don't stick out that well.  Let's make that more
clear from code separation and naming.  Pull kernel fault
handling into its own helper, and reflect that naming by renaming
spurious_fault() -> spurious_kernel_fault().

Also, rewrite the vmalloc() handling comment a bit.  It was a bit
stale and also glossed over the reserved bit handling.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160222.401F4E10@viggo.jf.intel.com
2018-10-09 16:51:15 +02:00
Dave Hansen
164477c233 x86/mm: Clarify hardware vs. software "error_code"
We pass around a variable called "error_code" all around the page
fault code.  Sounds simple enough, especially since "error_code" looks
like it exactly matches the values that the hardware gives us on the
stack to report the page fault error code (PFEC in SDM parlance).

But, that's not how it works.

For part of the page fault handler, "error_code" does exactly match
PFEC.  But, during later parts, it diverges and starts to mean
something a bit different.

Give it two names for its two jobs.

The place it diverges is also really screwy.  It's only in a spot
where the hardware tells us we have kernel-mode access that occurred
while we were in usermode accessing user-controlled address space.
Add a warning in there.

Cc: x86@kernel.org
Cc: Jann Horn <jannh@google.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180928160220.4A2272C9@viggo.jf.intel.com
2018-10-09 16:51:15 +02:00
Sai Praneeth
3425d934fc efi/x86: Handle page faults occurring while running EFI runtime services
Memory accesses performed by UEFI runtime services should be limited to:
- reading/executing from EFI_RUNTIME_SERVICES_CODE memory regions
- reading/writing from/to EFI_RUNTIME_SERVICES_DATA memory regions
- reading/writing by-ref arguments
- reading/writing from/to the stack.

Accesses outside these regions may cause the kernel to hang because the
memory region requested by the firmware isn't mapped in efi_pgd, which
causes a page fault in ring 0 and the kernel fails to handle it, leading
to die(). To save kernel from hanging, add an EFI specific page fault
handler which recovers from such faults by
1. If the efi runtime service is efi_reset_system(), reboot the machine
   through BIOS.
2. If the efi runtime service is _not_ efi_reset_system(), then freeze
   efi_rts_wq and schedule a new process.

The EFI page fault handler offers us two advantages:
1. Avoid potential hangs caused by buggy firmware.
2. Shout loud that the firmware is buggy and hence is not a kernel bug.

Tested-by: Bhupesh Sharma <bhsharma@redhat.com>
Suggested-by: Matt Fleming <matt@codeblueprint.co.uk>
Based-on-code-from: Ricardo Neri <ricardo.neri@intel.com>
Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
[ardb: clarify commit log]
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
2018-09-26 12:14:55 +02:00
Eric W. Biederman
419ceeb128 signal/x86: Pass pkey by value
Now that si_code == SEGV_PKUERR is the flag indicating that a pkey
is present there is no longer a need to pass a pointer to a local
pkey value, instead pkey can be passed more efficiently by value.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 15:29:57 +02:00
Eric W. Biederman
b4fd52f25c signal/x86: Replace force_sig_info_fault with force_sig_fault
Now that the pkey handling has been removed force_sig_info_fault and
force_sig_fault perform identical work.  Just the type of the address
paramter is different.  So replace calls to force_sig_info_fault with
calls to force_sig_fault, and remove force_sig_info_fault.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 15:26:24 +02:00
Eric W. Biederman
9db812dbb2 signal/x86: Call force_sig_pkuerr from __bad_area_nosemaphore
There is only one code path that can generate a pkuerr signal.  That
code path calls __bad_area_nosemaphore and can be dectected by testing
if si_code == SEGV_PKUERR.  It can be seen from inspection that all of
the other tests in fill_sig_info_pkey are unnecessary.

Therefore call force_sig_pkuerr directly from __bad_area_semaphore and
remove fill_sig_info_pkey.

At the same time move the comment above force_sig_info_pkey into
bad_area_access_error, so that the documentation about pkey generation
races is not lost.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 15:03:05 +02:00
Eric W. Biederman
aba1ecd32c signal/x86: Pass pkey not vma into __bad_area
There is only one caller of __bad_area that passes in PKUERR and thus
will generate a siginfo with si_pkey set.  Therefore simplify the
logic and hoist reading of vma_pkey up into that caller, and just
pass *pkey into __bad_area.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 14:54:17 +02:00
Eric W. Biederman
988bbc7b1a signal/x86: Don't compute pkey in __do_page_fault
There are no more users of the computed pkey value in __do_page_fault
so stop computing the value.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 14:52:12 +02:00
Eric W. Biederman
25c102d803 signal/x86: Remove pkey parameter from mm_fault_error
After the previous cleanups to do_sigbus and and bad_area_nosemaphore
mm_fault_error no now longer uses it's pkey parameter.  Therefore
remove the unused parameter.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 14:51:42 +02:00
Eric W. Biederman
27274f731c signal/x86: Remove the pkey parameter from do_sigbus
The function do_sigbus never sets si_code to PKUERR so it can never
return a pkey to userspace.  Therefore remove the unusable pkey
parameter from do_sigbus.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 14:48:44 +02:00
Eric W. Biederman
768fd9c69b signal/x86: Remove pkey parameter from bad_area_nosemaphore
The function bad_area_nosemaphore always sets si_code to SEGV_MAPERR
and as such can never return a pkey parameter.  Therefore remove the
unusable pkey parameter from bad_area_nosemaphore.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-21 14:48:04 +02:00
Eric W. Biederman
40e5539463 signal/x86: Move MCE error reporting out of force_sig_info_fault
Only the call from do_sigbus will send SIGBUS due to a memory machine
check error.  Consolidate all of the machine check signal generation
code in do_sigbus and remove the now unnecessary fault parameter from
force_sig_info_fault.

Explicitly use the now constant si_code BUS_ADRERR in the call
to force_sig_info_fault from do_sigbus.

This makes the code in arch/x86/mm/fault.c easier to follower and
simpler to maintain.

Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-09-19 15:50:08 +02:00
Jann Horn
81fd9c1844 x86/fault: Plumb error code and fault address through to fault handlers
This is preparation for looking at trap number and fault address in the
handlers for uaccess errors. No functional change.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: linux-kernel@vger.kernel.org
Cc: dvyukov@google.com
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-6-jannh@google.com
2018-09-03 15:12:09 +02:00
Jann Horn
a980c0ef9f x86/kprobes: Refactor kprobes_fault() like kprobe_exceptions_notify()
This is an extension of commit b506a9d08b ("x86: code clarification patch
to Kprobes arch code"). As that commit explains, even though
kprobe_running() can't be called with preemption enabled, preemption does
not need to be disabled. If preemption is enabled, then this can't be
originate from a kprobe.

Also, use X86_TRAP_PF instead of 14.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kees Cook <keescook@chromium.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: kernel-hardening@lists.openwall.com
Cc: dvyukov@google.com
Cc: "Naveen N. Rao" <naveen.n.rao@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20180828201421.157735-2-jannh@google.com
2018-09-03 15:12:08 +02:00
Jann Horn
342db04ae7 x86/dumpstack: Don't dump kernel memory based on usermode RIP
show_opcodes() is used both for dumping kernel instructions and for dumping
user instructions. If userspace causes #PF by jumping to a kernel address,
show_opcodes() can be reached with regs->ip controlled by the user,
pointing to kernel code. Make sure that userspace can't trick us into
dumping kernel memory into dmesg.

Fixes: 7cccf0725c ("x86/dumpstack: Add a show_ip() function")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: security@kernel.org
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180828154901.112726-1-jannh@google.com
2018-08-31 17:08:22 +02:00
Souptick Joarder
50a7ca3c6f mm: convert return type of handle_mm_fault() caller to vm_fault_t
Use new return type vm_fault_t for fault handler.  For now, this is just
documenting that the function returns a VM_FAULT value rather than an
errno.  Once all instances are converted, vm_fault_t will become a
distinct type.

Ref-> commit 1c8f422059 ("mm: change return type to vm_fault_t")

In this patch all the caller of handle_mm_fault() are changed to return
vm_fault_t type.

Link: http://lkml.kernel.org/r/20180617084810.GA6730@jordon-HP-15-Notebook-PC
Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: James Hogan <jhogan@kernel.org>
Cc: Ley Foon Tan <lftan@altera.com>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: James E.J. Bottomley <jejb@parisc-linux.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David S. Miller <davem@davemloft.net>
Cc: Richard Weinberger <richard@nod.at>
Cc: Guan Xuetao <gxt@pku.edu.cn>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Levin, Alexander (Sasha Levin)" <alexander.levin@verizon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-08-17 16:20:28 -07:00
Joerg Roedel
6863ea0cda x86/mm: Remove in_nmi() warning from vmalloc_fault()
It is perfectly okay to take page-faults, especially on the
vmalloc area while executing an NMI handler. Remove the
warning.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David H. Gutteridge <dhgutteridge@sympatico.ca>
Cc: "H . Peter Anvin" <hpa@zytor.com>
Cc: linux-mm@kvack.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Waiman Long <llong@redhat.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: joro@8bytes.org
Link: https://lkml.kernel.org/r/1532533683-5988-2-git-send-email-joro@8bytes.org
2018-07-30 13:53:48 +02:00
Dmitry Vyukov
d79d0d8ad0 x86/mm: Clean up the printk()s in show_fault_oops()
- Remove 'nx_warning' and 'smep_warning', which are just pointless obfuscation.
- Also convert to pr_crit().

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180627090715.28076-1-dvyukov@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-27 14:08:11 +02:00
Dmitry Vyukov
4188f063e3 x86/mm: Get rid of KERN_CONT in show_fault_oops()
KERN_CONT leads to split lines in kernel output
and complicates useful changes to printk like
printing context before each line.

Only acceptable use of continuations is basically
boot-time testing.

Get rid of it.

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180625123808.227417-1-dvyukov@gmail.com
[ Removed unnecessary parentheses and prettified the printk statement. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-06-26 09:00:25 +02:00
Linus Torvalds
8316385687 Merge branch 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 debug updates from Ingo Molnar:
 "This contains the x86 oops code printing reorganization and cleanups
  from Borislav Betkov, with a particular focus in enhancing opcode
  dumping all around"

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/dumpstack: Explain the reasoning for the prologue and buffer size
  x86/dumpstack: Save first regs set for the executive summary
  x86/dumpstack: Add a show_ip() function
  x86/fault: Dump user opcode bytes on fatal faults
  x86/dumpstack: Add loglevel argument to show_opcodes()
  x86/dumpstack: Improve opcodes dumping in the code section
  x86/dumpstack: Carve out code-dumping into a function
  x86/dumpstack: Unexport oops_begin()
  x86/dumpstack: Remove code_bytes
2018-06-04 19:19:16 -07:00
Linus Torvalds
5cef8c2a22 Merge branch 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 boot updates from Ingo Molnar:

 - Centaur CPU updates (David Wang)

 - AMD and other CPU topology enumeration improvements and fixes
   (Borislav Petkov, Thomas Gleixner, Suravee Suthikulpanit)

 - Continued 5-level paging work (Kirill A. Shutemov)

* 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/mm: Mark __pgtable_l5_enabled __initdata
  x86/mm: Mark p4d_offset() __always_inline
  x86/mm: Introduce the 'no5lvl' kernel parameter
  x86/mm: Stop pretending pgtable_l5_enabled is a variable
  x86/mm: Unify pgtable_l5_enabled usage in early boot code
  x86/boot/compressed/64: Fix trampoline page table address calculation
  x86/CPU: Move x86_cpuinfo::x86_max_cores assignment to detect_num_cpu_cores()
  x86/Centaur: Report correct CPU/cache topology
  x86/CPU: Move cpu_detect_cache_sizes() into init_intel_cacheinfo()
  x86/CPU: Make intel_num_cpu_cores() generic
  x86/CPU: Move cpu local function declarations to local header
  x86/CPU/AMD: Derive CPU topology from CPUID function 0xB when available
  x86/CPU: Modify detect_extended_topology() to return result
  x86/CPU/AMD: Calculate last level cache ID from number of sharing threads
  x86/CPU: Rename intel_cacheinfo.c to cacheinfo.c
  perf/events/amd/uncore: Fix amd_uncore_llc ID to use pre-defined cpu_llc_id
  x86/CPU/AMD: Have smp_num_siblings and cpu_llc_id always be present
  x86/Centaur: Initialize supported CPU features properly
2018-06-04 18:19:18 -07:00
Kirill A. Shutemov
ed7588d5dc x86/mm: Stop pretending pgtable_l5_enabled is a variable
pgtable_l5_enabled is defined using cpu_feature_enabled() but we refer
to it as a variable. This is misleading.

Make pgtable_l5_enabled() a function.

We cannot literally define it as a function due to circular dependencies
between header files. Function-alike macros is close enough.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180518103528.59260-4-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-05-19 11:56:57 +02:00
Borislav Petkov
ba54d856a9 x86/fault: Dump user opcode bytes on fatal faults
Sometimes it is useful to see which user opcode bytes RIP points to
when a fault happens: be it to rule out RIP corruption, to dump info
early during boot, when doing core dumps is impossible due to not having
a writable filesystem yet.

Sometimes it is useful if debugging an issue and one doesn't have access
to the executable which caused the fault in order to disassemble it.

That last aspect might have some security implications so
show_unhandled_signals could be revisited for that or a new config option
added.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20180417161124.5294-7-bp@alien8.de
2018-04-26 16:15:27 +02:00
Eric W. Biederman
3eb0f5193b signal: Ensure every siginfo we send has all bits initialized
Call clear_siginfo to ensure every stack allocated siginfo is properly
initialized before being passed to the signal sending functions.

Note: It is not safe to depend on C initializers to initialize struct
siginfo on the stack because C is allowed to skip holes when
initializing a structure.

The initialization of struct siginfo in tracehook_report_syscall_exit
was moved from the helper user_single_step_siginfo into
tracehook_report_syscall_exit itself, to make it clear that the local
variable siginfo gets fully initialized.

In a few cases the scope of struct siginfo has been reduced to make it
clear that siginfo siginfo is not used on other paths in the function
in which it is declared.

Instances of using memset to initialize siginfo have been replaced
with calls clear_siginfo for clarity.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2018-04-25 10:40:51 -05:00
Linus Torvalds
d22fff8141 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Ingo Molnar:

 - Extend the memmap= boot parameter syntax to allow the redeclaration
   and dropping of existing ranges, and to support all e820 range types
   (Jan H. Schönherr)

 - Improve the W+X boot time security checks to remove false positive
   warnings on Xen (Jan Beulich)

 - Support booting as Xen PVH guest (Juergen Gross)

 - Improved 5-level paging (LA57) support, in particular it's possible
   now to have a single kernel image for both 4-level and 5-level
   hardware (Kirill A. Shutemov)

 - AMD hardware RAM encryption support (SME/SEV) fixes (Tom Lendacky)

 - Preparatory commits for hardware-encrypted RAM support on Intel CPUs.
   (Kirill A. Shutemov)

 - Improved Intel-MID support (Andy Shevchenko)

 - Show EFI page tables in page_tables debug files (Andy Lutomirski)

 - ... plus misc fixes and smaller cleanups

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
  x86/cpu/tme: Fix spelling: "configuation" -> "configuration"
  x86/boot: Fix SEV boot failure from change to __PHYSICAL_MASK_SHIFT
  x86/mm: Update comment in detect_tme() regarding x86_phys_bits
  x86/mm/32: Remove unused node_memmap_size_bytes() & CONFIG_NEED_NODE_MEMMAP_SIZE logic
  x86/mm: Remove pointless checks in vmalloc_fault
  x86/platform/intel-mid: Add special handling for ACPI HW reduced platforms
  ACPI, x86/boot: Introduce the ->reduced_hw_early_init() ACPI callback
  ACPI, x86/boot: Split out acpi_generic_reduce_hw_init() and export
  x86/pconfig: Provide defines and helper to run MKTME_KEY_PROG leaf
  x86/pconfig: Detect PCONFIG targets
  x86/tme: Detect if TME and MKTME is activated by BIOS
  x86/boot/compressed/64: Handle 5-level paging boot if kernel is above 4G
  x86/boot/compressed/64: Use page table in trampoline memory
  x86/boot/compressed/64: Use stack from trampoline memory
  x86/boot/compressed/64: Make sure we have a 32-bit code segment
  x86/mm: Do not use paravirtualized calls in native_set_p4d()
  kdump, vmcoreinfo: Export pgtable_l5_enabled value
  x86/boot/compressed/64: Prepare new top-level page table for trampoline
  x86/boot/compressed/64: Set up trampoline memory
  x86/boot/compressed/64: Save and restore trampoline memory
  ...
2018-04-02 15:45:30 -07:00
Linus Torvalds
986b37c0ae Merge branch 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups and msr updates from Ingo Molnar:
 "The main change is a performance/latency improvement to /dev/msr
  access. The rest are misc cleanups"

* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/msr: Make rdmsrl_safe_on_cpu() scheduling safe as well
  x86/cpuid: Allow cpuid_read() to schedule
  x86/msr: Allow rdmsr_safe_on_cpu() to schedule
  x86/rtc: Stop using deprecated functions
  x86/dumpstack: Unify show_regs()
  x86/fault: Do not print IP in show_fault_oops()
  x86/MSR: Move native_* variants to msr.h
2018-04-02 15:16:43 -07:00
Toshi Kani
565977a3d9 x86/mm: Remove pointless checks in vmalloc_fault
vmalloc_fault() sets user's pgd or p4d from the kernel page table.  Once
it's set, all tables underneath are identical. There is no point of
following the same page table with two separate pointers and make sure they
see the same with BUG().

Remove the pointless checks in vmalloc_fault(). Also rename the kernel
pgd/p4d pointers to pgd_k/p4d_k so that their names are consistent in the
file.

Suggested-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Gratian Crisan <gratian.crisan@ni.com>
Link: https://lkml.kernel.org/r/20180314205932.7193-1-toshi.kani@hpe.com
2018-03-15 15:27:47 +01:00
Thomas Gleixner
745dd37f9d Merge branch 'x86/urgent' into x86/mm to pick up dependencies 2018-03-14 20:23:25 +01:00
Toshi Kani
18a955219b x86/mm: Fix vmalloc_fault to use pXd_large
Gratian Crisan reported that vmalloc_fault() crashes when CONFIG_HUGETLBFS
is not set since the function inadvertently uses pXn_huge(), which always
return 0 in this case.  ioremap() does not depend on CONFIG_HUGETLBFS.

Fix vmalloc_fault() to call pXd_large() instead.

Fixes: f4eafd8bcd ("x86/mm: Fix vmalloc_fault() to handle large pages properly")
Reported-by: Gratian Crisan <gratian.crisan@ni.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: linux-mm@kvack.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20180313170347.3829-2-toshi.kani@hpe.com
2018-03-14 20:22:42 +01:00
Ingo Molnar
3c76db70eb Merge branch 'x86/pti' into x86/mm, to pick up dependencies
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-03-12 12:10:03 +01:00
Borislav Petkov
9558080935 x86/fault: Do not print IP in show_fault_oops()
... because __show_regs() already does that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Link: https://lkml.kernel.org/r/20180306094920.16917-3-bp@alien8.de
2018-03-08 12:04:59 +01:00
Jann Horn
3b3a9268bb x86/mm: Remove stale comment about KMEMCHECK
This comment referred to a conditional call to kmemcheck_hide() that was
here until commit 4950276672 ("kmemcheck: remove annotations").

Now that kmemcheck has been removed, it doesn't make sense anymore.

Signed-off-by: Jann Horn <jannh@google.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20180219175039.253089-1-jannh@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-20 09:33:39 +01:00
Kirill A. Shutemov
91f606a8fa x86/mm: Replace compile-time checks for 5-level paging with runtime-time checks
This patch converts the of CONFIG_X86_5LEVEL check to runtime checks for
p4d folding.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/20180214182542.69302-9-kirill.shutemov@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2018-02-16 10:48:49 +01:00
Andy Lutomirski
36b3a77268 x86/mm/64: Tighten up vmalloc_fault() sanity checks on 5-level kernels
On a 5-level kernel, if a non-init mm has a top-level entry, it needs to
match init_mm's, but the vmalloc_fault() code skipped over the BUG_ON()
that would have checked it.

While we're at it, get rid of the rather confusing 4-level folded "pgd"
logic.

Cleans-up: b50858ce3e ("x86/mm/vmalloc: Add 5-level paging support")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Neil Berrington <neil.berrington@datacore.com>
Link: https://lkml.kernel.org/r/2ae598f8c279b0a29baf75df207e6f2fdddc0a1b.1516914529.git.luto@kernel.org
2018-01-26 15:56:23 +01:00
Eric W. Biederman
beacd6f7ed x86/mm/pkeys: Fix fill_sig_info_pkey
SEGV_PKUERR is a signal specific si_code which happens to have the same
numeric value as several others: BUS_MCEERR_AR, ILL_ILLTRP, FPE_FLTOVF,
TRAP_HWBKPT, CLD_TRAPPED, POLL_ERR, SEGV_THREAD_ID, as such it is not safe
to just test the si_code the signal number must also be tested to prevent a
false positive in fill_sig_info_pkey.

This error was by inspection, and BUS_MCEERR_AR appears to be a real
candidate for confusion.  So pass in si_signo and check for SIG_SEGV to
verify that it is actually a SEGV_PKUERR

Fixes: 019132ff3d ("x86/mm/pkeys: Fill in pkey field in siginfo")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20180112203135.4669-2-ebiederm@xmission.com
2018-01-14 12:14:51 +01:00
Kees Cook
10a7e9d849 Do not hash userspace addresses in fault handlers
The hashing of %p was designed to restrict kernel addresses. There is
no reason to hash the userspace values seen during a segfault report,
so switch these to %px. (Some architectures already use %lx.)

Fixes: ad67b74d24 ("printk: hash addresses printed with %p")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-19 17:04:43 -08:00
Linus Torvalds
328b4ed93b x86: don't hash faulting address in oops printout
Things like this will probably keep showing up for other architectures
and other special cases.

I actually thought we already used %lx for this, and that is indeed
_historically_ the case, but we moved to %p when merging the 32-bit and
64-bit cases as a convenient way to get the formatting right (ie
automatically picking "%08lx" vs "%016lx" based on register size).

So just turn this %p into %px.

Reported-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-12-05 17:59:29 -08:00
Levin, Alexander (Sasha Levin)
4950276672 kmemcheck: remove annotations
Patch series "kmemcheck: kill kmemcheck", v2.

As discussed at LSF/MM, kill kmemcheck.

KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow).  KASan is already upstream.

We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).

The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.

Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.

This patch (of 4):

Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
  Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Ingo Molnar
b3d9a13681 Merge branch 'linus' into x86/asm, to pick up fixes and resolve conflicts
Conflicts:
	arch/x86/kernel/cpu/Makefile

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 10:53:06 +01:00
Linus Torvalds
ead751507d License cleanup: add SPDX license identifiers to some files
Many source files in the tree are missing licensing information, which
 makes it harder for compliance tools to determine the correct license.
 
 By default all files without license information are under the default
 license of the kernel, which is GPL version 2.
 
 Update the files which contain no license information with the 'GPL-2.0'
 SPDX license identifier.  The SPDX identifier is a legally binding
 shorthand, which can be used instead of the full boiler plate text.
 
 This patch is based on work done by Thomas Gleixner and Kate Stewart and
 Philippe Ombredanne.
 
 How this work was done:
 
 Patches were generated and checked against linux-4.14-rc6 for a subset of
 the use cases:
  - file had no licensing information it it.
  - file was a */uapi/* one with no licensing information in it,
  - file was a */uapi/* one with existing licensing information,
 
 Further patches will be generated in subsequent months to fix up cases
 where non-standard license headers were used, and references to license
 had to be inferred by heuristics based on keywords.
 
 The analysis to determine which SPDX License Identifier to be applied to
 a file was done in a spreadsheet of side by side results from of the
 output of two independent scanners (ScanCode & Windriver) producing SPDX
 tag:value files created by Philippe Ombredanne.  Philippe prepared the
 base worksheet, and did an initial spot review of a few 1000 files.
 
 The 4.13 kernel was the starting point of the analysis with 60,537 files
 assessed.  Kate Stewart did a file by file comparison of the scanner
 results in the spreadsheet to determine which SPDX license identifier(s)
 to be applied to the file. She confirmed any determination that was not
 immediately clear with lawyers working with the Linux Foundation.
 
 Criteria used to select files for SPDX license identifier tagging was:
  - Files considered eligible had to be source code files.
  - Make and config files were included as candidates if they contained >5
    lines of source
  - File already had some variant of a license header in it (even if <5
    lines).
 
 All documentation files were explicitly excluded.
 
 The following heuristics were used to determine which SPDX license
 identifiers to apply.
 
  - when both scanners couldn't find any license traces, file was
    considered to have no license information in it, and the top level
    COPYING file license applied.
 
    For non */uapi/* files that summary was:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0                                              11139
 
    and resulted in the first patch in this series.
 
    If that file was a */uapi/* path one, it was "GPL-2.0 WITH
    Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                        930
 
    and resulted in the second patch in this series.
 
  - if a file had some form of licensing information in it, and was one
    of the */uapi/* ones, it was denoted with the Linux-syscall-note if
    any GPL family license was found in the file or had no licensing in
    it (per prior point).  Results summary:
 
    SPDX license identifier                            # files
    ---------------------------------------------------|------
    GPL-2.0 WITH Linux-syscall-note                       270
    GPL-2.0+ WITH Linux-syscall-note                      169
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
    LGPL-2.1+ WITH Linux-syscall-note                      15
    GPL-1.0+ WITH Linux-syscall-note                       14
    ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
    LGPL-2.0+ WITH Linux-syscall-note                       4
    LGPL-2.1 WITH Linux-syscall-note                        3
    ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
    ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
 
    and that resulted in the third patch in this series.
 
  - when the two scanners agreed on the detected license(s), that became
    the concluded license(s).
 
  - when there was disagreement between the two scanners (one detected a
    license but the other didn't, or they both detected different
    licenses) a manual inspection of the file occurred.
 
  - In most cases a manual inspection of the information in the file
    resulted in a clear resolution of the license that should apply (and
    which scanner probably needed to revisit its heuristics).
 
  - When it was not immediately clear, the license identifier was
    confirmed with lawyers working with the Linux Foundation.
 
  - If there was any question as to the appropriate license identifier,
    the file was flagged for further research and to be revisited later
    in time.
 
 In total, over 70 hours of logged manual review was done on the
 spreadsheet to determine the SPDX license identifiers to apply to the
 source files by Kate, Philippe, Thomas and, in some cases, confirmation
 by lawyers working with the Linux Foundation.
 
 Kate also obtained a third independent scan of the 4.13 code base from
 FOSSology, and compared selected files where the other two scanners
 disagreed against that SPDX file, to see if there was new insights.  The
 Windriver scanner is based on an older version of FOSSology in part, so
 they are related.
 
 Thomas did random spot checks in about 500 files from the spreadsheets
 for the uapi headers and agreed with SPDX license identifier in the
 files he inspected. For the non-uapi files Thomas did random spot checks
 in about 15000 files.
 
 In initial set of patches against 4.14-rc6, 3 files were found to have
 copy/paste license identifier errors, and have been fixed to reflect the
 correct identifier.
 
 Additionally Philippe spent 10 hours this week doing a detailed manual
 inspection and review of the 12,461 patched files from the initial patch
 version early this week with:
  - a full scancode scan run, collecting the matched texts, detected
    license ids and scores
  - reviewing anything where there was a license detected (about 500+
    files) to ensure that the applied SPDX license was correct
  - reviewing anything where there was no detection but the patch license
    was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
    SPDX license was correct
 
 This produced a worksheet with 20 files needing minor correction.  This
 worksheet was then exported into 3 different .csv files for the
 different types of files to be modified.
 
 These .csv files were then reviewed by Greg.  Thomas wrote a script to
 parse the csv files and add the proper SPDX tag to the file, in the
 format that the file expected.  This script was further refined by Greg
 based on the output to detect more types of files automatically and to
 distinguish between header and source .c files (which need different
 comment types.)  Finally Greg ran the script using the .csv files to
 generate the patches.
 
 Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
 Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
 Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWfswbQ8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykvEwCfXU1MuYFQGgMdDmAZXEc+xFXZvqgAoKEcHDNA
 6dVh26uchcEQLN/XqUDt
 =x306
 -----END PGP SIGNATURE-----

Merge tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull initial SPDX identifiers from Greg KH:
 "License cleanup: add SPDX license identifiers to some files

  Many source files in the tree are missing licensing information, which
  makes it harder for compliance tools to determine the correct license.

  By default all files without license information are under the default
  license of the kernel, which is GPL version 2.

  Update the files which contain no license information with the
  'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
  binding shorthand, which can be used instead of the full boiler plate
  text.

  This patch is based on work done by Thomas Gleixner and Kate Stewart
  and Philippe Ombredanne.

  How this work was done:

  Patches were generated and checked against linux-4.14-rc6 for a subset
  of the use cases:

   - file had no licensing information it it.

   - file was a */uapi/* one with no licensing information in it,

   - file was a */uapi/* one with existing licensing information,

  Further patches will be generated in subsequent months to fix up cases
  where non-standard license headers were used, and references to
  license had to be inferred by heuristics based on keywords.

  The analysis to determine which SPDX License Identifier to be applied
  to a file was done in a spreadsheet of side by side results from of
  the output of two independent scanners (ScanCode & Windriver)
  producing SPDX tag:value files created by Philippe Ombredanne.
  Philippe prepared the base worksheet, and did an initial spot review
  of a few 1000 files.

  The 4.13 kernel was the starting point of the analysis with 60,537
  files assessed. Kate Stewart did a file by file comparison of the
  scanner results in the spreadsheet to determine which SPDX license
  identifier(s) to be applied to the file. She confirmed any
  determination that was not immediately clear with lawyers working with
  the Linux Foundation.

  Criteria used to select files for SPDX license identifier tagging was:

   - Files considered eligible had to be source code files.

   - Make and config files were included as candidates if they contained
     >5 lines of source

   - File already had some variant of a license header in it (even if <5
     lines).

  All documentation files were explicitly excluded.

  The following heuristics were used to determine which SPDX license
  identifiers to apply.

   - when both scanners couldn't find any license traces, file was
     considered to have no license information in it, and the top level
     COPYING file license applied.

     For non */uapi/* files that summary was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0                                              11139

     and resulted in the first patch in this series.

     If that file was a */uapi/* path one, it was "GPL-2.0 WITH
     Linux-syscall-note" otherwise it was "GPL-2.0". Results of that
     was:

       SPDX license identifier                            # files
       ---------------------------------------------------|-------
       GPL-2.0 WITH Linux-syscall-note                        930

     and resulted in the second patch in this series.

   - if a file had some form of licensing information in it, and was one
     of the */uapi/* ones, it was denoted with the Linux-syscall-note if
     any GPL family license was found in the file or had no licensing in
     it (per prior point). Results summary:

       SPDX license identifier                            # files
       ---------------------------------------------------|------
       GPL-2.0 WITH Linux-syscall-note                       270
       GPL-2.0+ WITH Linux-syscall-note                      169
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
       ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
       LGPL-2.1+ WITH Linux-syscall-note                      15
       GPL-1.0+ WITH Linux-syscall-note                       14
       ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
       LGPL-2.0+ WITH Linux-syscall-note                       4
       LGPL-2.1 WITH Linux-syscall-note                        3
       ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
       ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

     and that resulted in the third patch in this series.

   - when the two scanners agreed on the detected license(s), that
     became the concluded license(s).

   - when there was disagreement between the two scanners (one detected
     a license but the other didn't, or they both detected different
     licenses) a manual inspection of the file occurred.

   - In most cases a manual inspection of the information in the file
     resulted in a clear resolution of the license that should apply
     (and which scanner probably needed to revisit its heuristics).

   - When it was not immediately clear, the license identifier was
     confirmed with lawyers working with the Linux Foundation.

   - If there was any question as to the appropriate license identifier,
     the file was flagged for further research and to be revisited later
     in time.

  In total, over 70 hours of logged manual review was done on the
  spreadsheet to determine the SPDX license identifiers to apply to the
  source files by Kate, Philippe, Thomas and, in some cases,
  confirmation by lawyers working with the Linux Foundation.

  Kate also obtained a third independent scan of the 4.13 code base from
  FOSSology, and compared selected files where the other two scanners
  disagreed against that SPDX file, to see if there was new insights.
  The Windriver scanner is based on an older version of FOSSology in
  part, so they are related.

  Thomas did random spot checks in about 500 files from the spreadsheets
  for the uapi headers and agreed with SPDX license identifier in the
  files he inspected. For the non-uapi files Thomas did random spot
  checks in about 15000 files.

  In initial set of patches against 4.14-rc6, 3 files were found to have
  copy/paste license identifier errors, and have been fixed to reflect
  the correct identifier.

  Additionally Philippe spent 10 hours this week doing a detailed manual
  inspection and review of the 12,461 patched files from the initial
  patch version early this week with:

   - a full scancode scan run, collecting the matched texts, detected
     license ids and scores

   - reviewing anything where there was a license detected (about 500+
     files) to ensure that the applied SPDX license was correct

   - reviewing anything where there was no detection but the patch
     license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
     applied SPDX license was correct

  This produced a worksheet with 20 files needing minor correction. This
  worksheet was then exported into 3 different .csv files for the
  different types of files to be modified.

  These .csv files were then reviewed by Greg. Thomas wrote a script to
  parse the csv files and add the proper SPDX tag to the file, in the
  format that the file expected. This script was further refined by Greg
  based on the output to detect more types of files automatically and to
  distinguish between header and source .c files (which need different
  comment types.) Finally Greg ran the script using the .csv files to
  generate the patches.

  Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
  Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
  Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
  Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

* tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  License cleanup: add SPDX license identifier to uapi header files with a license
  License cleanup: add SPDX license identifier to uapi header files with no license
  License cleanup: add SPDX GPL-2.0 license identifier to files with no license
2017-11-02 10:04:46 -07:00
Greg Kroah-Hartman
b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Ricardo Neri
1067f03099 x86/mm: Relocate page fault error codes to traps.h
Up to this point, only fault.c used the definitions of the page fault error
codes. Thus, it made sense to keep them within such file. Other portions of
code might be interested in those definitions too. For instance, the User-
Mode Instruction Prevention emulation code will use such definitions to
emulate a page fault when it is unable to successfully copy the results
of the emulated instructions to user space.

While relocating the error code enumeration, the prefix X86_ is used to
make it consistent with the rest of the definitions in traps.h. Of course,
code using the enumeration had to be updated as well. No functional changes
were performed.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Andy Lutomirski <luto@kernel.org>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: ricardo.neri@intel.com
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Chen Yucong <slaoub@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: https://lkml.kernel.org/r/1509135945-13762-2-git-send-email-ricardo.neri-calderon@linux.intel.com
2017-11-01 21:50:07 +01:00
Vlastimil Babka
cb0631fd3c x86/mm: fix use-after-free of vma during userfaultfd fault
Syzkaller with KASAN has reported a use-after-free of vma->vm_flags in
__do_page_fault() with the following reproducer:

  mmap(&(0x7f0000000000/0xfff000)=nil, 0xfff000, 0x3, 0x32, 0xffffffffffffffff, 0x0)
  mmap(&(0x7f0000011000/0x3000)=nil, 0x3000, 0x1, 0x32, 0xffffffffffffffff, 0x0)
  r0 = userfaultfd(0x0)
  ioctl$UFFDIO_API(r0, 0xc018aa3f, &(0x7f0000002000-0x18)={0xaa, 0x0, 0x0})
  ioctl$UFFDIO_REGISTER(r0, 0xc020aa00, &(0x7f0000019000)={{&(0x7f0000012000/0x2000)=nil, 0x2000}, 0x1, 0x0})
  r1 = gettid()
  syz_open_dev$evdev(&(0x7f0000013000-0x12)="2f6465762f696e7075742f6576656e742300", 0x0, 0x0)
  tkill(r1, 0x7)

The vma should be pinned by mmap_sem, but handle_userfault() might (in a
return to userspace scenario) release it and then acquire again, so when
we return to __do_page_fault() (with other result than VM_FAULT_RETRY),
the vma might be gone.

Specifically, per Andrea the scenario is
 "A return to userland to repeat the page fault later with a
  VM_FAULT_NOPAGE retval (potentially after handling any pending signal
  during the return to userland). The return to userland is identified
  whenever FAULT_FLAG_USER|FAULT_FLAG_KILLABLE are both set in
  vmf->flags"

However, since commit a3c4fb7c9c ("x86/mm: Fix fault error path using
unsafe vma pointer") there is a vma_pkey() read of vma->vm_flags after
that point, which can thus become use-after-free.  Fix this by moving
the read before calling handle_mm_fault().

Reported-by: syzbot <bot+6a5269ce759a7bb12754ed9622076dc93f65a1f6@syzkaller.appspotmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
Fixes: 3c4fb7c9c2e ("x86/mm: Fix fault error path using unsafe vma pointer")
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-01 08:09:58 -07:00
Laurent Dufour
a3c4fb7c9c x86/mm: Fix fault error path using unsafe vma pointer
commit 7b2d0dbac4 ("x86/mm/pkeys: Pass VMA down in to fault signal
generation code") passes down a vma pointer to the error path, but that is
done once the mmap_sem is released when calling mm_fault_error() from
__do_page_fault().

This is dangerous as the vma structure is no more safe to be used once the
mmap_sem has been released. As only the protection key value is required in
the error processing, we could just pass down this value.

Fix it by passing a pointer to a protection key value down to the fault
signal generation code. The use of a pointer allows to keep the check
generating a warning message in fill_sig_info_pkey() when the vma was not
known. If the pointer is valid, the protection value can be accessed by
deferencing the pointer.

[ tglx: Made *pkey u32 as that's the type which is passed in siginfo ]

Fixes: 7b2d0dbac4 ("x86/mm/pkeys: Pass VMA down in to fault signal generation code")
Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1504513935-12742-1-git-send-email-ldufour@linux.vnet.ibm.com
2017-09-25 09:36:15 +02:00
Josh Poimboeuf
f5caf621ee x86/asm: Fix inline asm call constraints for Clang
For inline asm statements which have a CALL instruction, we list the
stack pointer as a constraint to convince GCC to ensure the frame
pointer is set up first:

  static inline void foo()
  {
	register void *__sp asm(_ASM_SP);
	asm("call bar" : "+r" (__sp))
  }

Unfortunately, that pattern causes Clang to corrupt the stack pointer.

The fix is easy: convert the stack pointer register variable to a global
variable.

It should be noted that the end result is different based on the GCC
version.  With GCC 6.4, this patch has exactly the same result as
before:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9820389		9491555		8816046		8516940
 after	9820389		9491555		8816046		8516940

With GCC 7.2, however, GCC's behavior has changed.  It now changes its
behavior based on the conversion of the register variable to a global.
That somehow convinces it to *always* set up the frame pointer before
inserting *any* inline asm.  (Therefore, listing the variable as an
output constraint is a no-op and is no longer necessary.)  It's a bit
overkill, but the performance impact should be negligible.  And in fact,
there's a nice improvement with frame pointers disabled:

	defconfig	defconfig-nofp	distro		distro-nofp
 before	9796316		9468236		9076191		8790305
 after	9796957		9464267		9076381		8785949

So in summary, while listing the stack pointer as an output constraint
is no longer necessary for newer versions of GCC, it's still needed for
older versions.

Suggested-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Matthias Kaehlcke <mka@chromium.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miguel Bernal Marin <miguel.bernal.marin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3db862e970c432ae823cf515c52b54fec8270e0e.1505942196.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-09-23 15:06:20 +02:00