Commit Graph

141141 Commits

Author SHA1 Message Date
Michal Hocko
fcdaf842bd mm, sparse: do not swamp log with huge vmemmap allocation failures
While doing memory hotplug tests under heavy memory pressure we have
noticed too many page allocation failures when allocating vmemmap memmap
backed by huge page

  kworker/u3072:1: page allocation failure: order:9, mode:0x24084c0(GFP_KERNEL|__GFP_REPEAT|__GFP_ZERO)
  [...]
  Call Trace:
    dump_trace+0x59/0x310
    show_stack_log_lvl+0xea/0x170
    show_stack+0x21/0x40
    dump_stack+0x5c/0x7c
    warn_alloc_failed+0xe2/0x150
    __alloc_pages_nodemask+0x3ed/0xb20
    alloc_pages_current+0x7f/0x100
    vmemmap_alloc_block+0x79/0xb6
    __vmemmap_alloc_block_buf+0x136/0x145
    vmemmap_populate+0xd2/0x2b9
    sparse_mem_map_populate+0x23/0x30
    sparse_add_one_section+0x68/0x18e
    __add_pages+0x10a/0x1d0
    arch_add_memory+0x4a/0xc0
    add_memory_resource+0x89/0x160
    add_memory+0x6d/0xd0
    acpi_memory_device_add+0x181/0x251
    acpi_bus_attach+0xfd/0x19b
    acpi_bus_scan+0x59/0x69
    acpi_device_hotplug+0xd2/0x41f
    acpi_hotplug_work_fn+0x1a/0x23
    process_one_work+0x14e/0x410
    worker_thread+0x116/0x490
    kthread+0xbd/0xe0
    ret_from_fork+0x3f/0x70

and we do see many of those because essentially every allocation fails
for each memory section.  This is an excessive way to tell the user that
there is nothing to really worry about because we do have a fallback
mechanism to use base pages.  The only downside might be a performance
degradation due to TLB pressure.

This patch changes vmemmap_alloc_block() to use __GFP_NOWARN and warn
explicitly once on the first allocation failure.  This will reduce the
noise in the kernel log considerably, while we still have an indication
that a performance might be impacted.

[mhocko@kernel.org: forgot to git add the follow up fix]
  Link: http://lkml.kernel.org/r/20171107090635.c27thtse2lchjgvb@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20171106092228.31098-1-mhocko@kernel.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Cc: Joe Perches <joe@perches.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:07 -08:00
Mel Gorman
2d4894b5d2 mm: remove cold parameter from free_hot_cold_page*
Most callers users of free_hot_cold_page claim the pages being released
are cache hot.  The exception is the page reclaim paths where it is
likely that enough pages will be freed in the near future that the
per-cpu lists are going to be recycled and the cache hotness information
is lost.  As no one really cares about the hotness of pages being
released to the allocator, just ditch the parameter.

The APIs are renamed to indicate that it's no longer about hot/cold
pages.  It should also be less confusing as there are subtle differences
between them.  __free_pages drops a reference and frees a page when the
refcount reaches zero.  free_hot_cold_page handled pages whose refcount
was already zero which is non-obvious from the name.  free_unref_page
should be more obvious.

No performance impact is expected as the overhead is marginal.  The
parameter is removed simply because it is a bit stupid to have a useless
parameter copied everywhere.

[mgorman@techsingularity.net: add pages to head, not tail]
  Link: http://lkml.kernel.org/r/20171019154321.qtpzaeftoyyw4iey@techsingularity.net
Link: http://lkml.kernel.org/r/20171018075952.10627-8-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:06 -08:00
Pavel Tatashin
78c943662f sparc64: optimize struct page zeroing
Add an optimized mm_zero_struct_page(), so struct page's are zeroed
without calling memset().  We do eight to ten regular stores based on
the size of struct page.  Compiler optimizes out the conditions of
switch() statement.

SPARC-M6 with 15T of memory, single thread performance:

                               BASE            FIX  OPTIMIZED_FIX
        bootmem_init   28.440467985s   2.305674818s   2.305161615s
free_area_init_nodes  202.845901673s 225.343084508s 172.556506560s
                      --------------------------------------------
Total                 231.286369658s 227.648759326s 174.861668175s

BASE:  current linux
FIX:   This patch series without "optimized struct page zeroing"
OPTIMIZED_FIX: This patch series including the current patch.

bootmem_init() is where memory for struct pages is zeroed during
allocation.  Note, about two seconds in this function is a fixed time:
it does not increase as memory is increased.

Link: http://lkml.kernel.org/r/20171013173214.27300-11-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Will Deacon
e17d8025f0 arm64/mm/kasan: don't use vmemmap_populate() to initialize shadow
The kasan shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt.  However,
since that no longer zeroes the mapped pages, it is not suitable for
kasan, which requires zeroed shadow memory.

Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate().  Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.

Link: http://lkml.kernel.org/r/20171103185147.2688-3-pasha.tatashin@oracle.com
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Andrey Ryabinin
d17a1d97dc x86/mm/kasan: don't use vmemmap_populate() to initialize shadow
The kasan shadow is currently mapped using vmemmap_populate() since that
provides a semi-convenient way to map pages into init_top_pgt.  However,
since that no longer zeroes the mapped pages, it is not suitable for
kasan, which requires zeroed shadow memory.

Add kasan_populate_shadow() interface and use it instead of
vmemmap_populate().  Besides, this allows us to take advantage of
gigantic pages and use them to populate the shadow, which should save us
some memory wasted on page tables and reduce TLB pressure.

Link: http://lkml.kernel.org/r/20171103185147.2688-2-pasha.tatashin@oracle.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Cc: Steven Sistare <steven.sistare@oracle.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: Bob Picco <bob.picco@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Pavel Tatashin
df8ee57889 sparc64: simplify vmemmap_populate
Remove duplicating code by using common functions vmemmap_pud_populate
and vmemmap_pgd_populate.

Link: http://lkml.kernel.org/r/20171013173214.27300-5-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Pavel Tatashin
2a20aa1710 sparc64/mm: set fields in deferred pages
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to
first initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled there is a case where we set
some fields prior to initializing:

mem_init() {
     register_page_bootmem_info();
     free_all_bootmem();
     ...
}

When register_page_bootmem_info() is called only non-deferred struct
pages are initialized.  But, this function goes through some reserved
pages which might be part of the deferred, and thus are not yet
initialized.

mem_init
register_page_bootmem_info
register_page_bootmem_info_node
 get_page_bootmem
  .. setting fields here ..
  such as: page->freelist = (void *)type;

free_all_bootmem()
free_low_memory_core_early()
 for_each_reserved_mem_region()
  reserve_bootmem_region()
   init_reserved_page() <- Only if this is deferred reserved page
    __init_single_pfn()
     __init_single_page()
      memset(0) <-- Loose the set fields here

We end up with similar issue as in the previous patch, where currently
we do not observe problem as memory is zeroed.  But, if flag asserts are
changed we can start hitting issues.

Also, because in this patch series we will stop zeroing struct page
memory during allocation, we must make sure that struct pages are
properly initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Link: http://lkml.kernel.org/r/20171013173214.27300-4-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Pavel Tatashin
353b1e7b58 x86/mm: set fields in deferred pages
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"es are never changed prior to
first initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled, however, we set fields in
register_page_bootmem_info that are subsequently clobbered right after
in free_all_bootmem:

        mem_init() {
                register_page_bootmem_info();
                free_all_bootmem();
                ...
        }

When register_page_bootmem_info() is called only non-deferred struct
pages are initialized.  But, this function goes through some reserved
pages which might be part of the deferred, and thus are not yet
initialized.

  mem_init
   register_page_bootmem_info
    register_page_bootmem_info_node
     get_page_bootmem
      .. setting fields here ..
      such as: page->freelist = (void *)type;

  free_all_bootmem()
   free_low_memory_core_early()
    for_each_reserved_mem_region()
     reserve_bootmem_region()
      init_reserved_page() <- Only if this is deferred reserved page
       __init_single_pfn()
        __init_single_page()
            memset(0) <-- Loose the set fields here

We end up with issue where, currently we do not observe problem as
memory is explicitly zeroed.  But, if flag asserts are changed we can
start hitting issues.

Also, because in this patch series we will stop zeroing struct page
memory during allocation, we must make sure that struct pages are
properly initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Link: http://lkml.kernel.org/r/20171013173214.27300-3-pasha.tatashin@oracle.com
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
Tested-by: Bob Picco <bob.picco@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
4675ff05de kmemcheck: rip it out
Fix up makefiles, remove references, and git rm kmemcheck.

Link: http://lkml.kernel.org/r/20171007030159.22241-4-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Tim Hansen <devtimhansen@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
d8be75663c kmemcheck: remove whats left of NOTRACK flags
Now that kmemcheck is gone, we don't need the NOTRACK flags.

Link: http://lkml.kernel.org/r/20171007030159.22241-5-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:05 -08:00
Levin, Alexander (Sasha Levin)
75f296d93b kmemcheck: stop using GFP_NOTRACK and SLAB_NOTRACK
Convert all allocations that used a NOTRACK flag to stop using it.

Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Levin, Alexander (Sasha Levin)
4950276672 kmemcheck: remove annotations
Patch series "kmemcheck: kill kmemcheck", v2.

As discussed at LSF/MM, kill kmemcheck.

KASan is a replacement that is able to work without the limitation of
kmemcheck (single CPU, slow).  KASan is already upstream.

We are also not aware of any users of kmemcheck (or users who don't
consider KASan as a suitable replacement).

The only objection was that since KASAN wasn't supported by all GCC
versions provided by distros at that time we should hold off for 2
years, and try again.

Now that 2 years have passed, and all distros provide gcc that supports
KASAN, kill kmemcheck again for the very same reasons.

This patch (of 4):

Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

[alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
  Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tim Hansen <devtimhansen@gmail.com>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov
c4812909f5 mm: introduce wrappers to access mm->nr_ptes
Let's add wrappers for ->nr_ptes with the same interface as for nr_pmd
and nr_pud.

The patch also makes nr_ptes accounting dependent onto CONFIG_MMU.  Page
table accounting doesn't make sense if you don't have page tables.

It's preparation for consolidation of page-table counters in mm_struct.

Link: http://lkml.kernel.org/r/20171006100651.44742-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Kirill A. Shutemov
b4e98d9ac7 mm: account pud page tables
On a machine with 5-level paging support a process can allocate
significant amount of memory and stay unnoticed by oom-killer and memory
cgroup.  The trick is to allocate a lot of PUD page tables.  We don't
account PUD page tables, only PMD and PTE.

We already addressed the same issue for PMD page tables, see commit
dc6c9a35b6 ("mm: account pmd page tables to the process").
Introduction of 5-level paging brings the same issue for PUD page
tables.

The patch expands accounting to PUD level.

[kirill.shutemov@linux.intel.com: s/pmd_t/pud_t/]
  Link: http://lkml.kernel.org/r/20171004074305.x35eh5u7ybbt5kar@black.fi.intel.com
[heiko.carstens@de.ibm.com: s390/mm: fix pud table accounting]
  Link: http://lkml.kernel.org/r/20171103090551.18231-1-heiko.carstens@de.ibm.com
Link: http://lkml.kernel.org/r/20171002080427.3320-1-kirill.shutemov@linux.intel.com
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:04 -08:00
Michal Hocko
8745808fda mm, arch: remove empty_bad_page*
empty_bad_page() and empty_bad_pte_table() seem to be relics from old
days which is not used by any code for a long time.  I have tried to
find when exactly but this is not really all that straightforward due to
many code movements - traces disappear around 2.4 times.

Anyway no code really references neither empty_bad_page nor
empty_bad_pte_table.  We only allocate the storage which is not used by
anybody so remove them.

Link: http://lkml.kernel.org/r/20171004150045.30755-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Ralf Baechle <ralf@linus-mips.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: David Howells <dhowells@redhat.com>
Cc: Rich Felker <dalias@libc.org>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:03 -08:00
Geert Uytterhoeven
c95f121142 m32r: fix endianness constraints
The m32r Kconfig provides both CPU_BIG_ENDIAN and CPU_LITTLE_ENDIAN
configuration options.  As they are user-selectable and independent,
this allows invalid configurations:

  - All m32r defconfigs build a big endian kernel, but CPU_BIG_ENDIAN is
    not set, causing compiler warnings like:

	include/linux/byteorder/big_endian.h:7:2: warning: #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN [-Wcpp]
	 #warning inconsistent configuration, needs CONFIG_CPU_BIG_ENDIAN
	  ^

  - Since commit 5bdfca6435 ("m32r: define CPU_BIG_ENDIAN"),
    building an allmodconfig or allyesconfig enables both
    CONFIG_CPU_BIG_ENDIAN and CONFIG_CPU_LITTLE_ENDIAN.
    While this did get rid of the warning above, both options are
    obviously mutually exclusive.

Fix this by making only CPU_LITTLE_ENDIAN configurable by the user, as
before, and by making sure exactly one of CPU_BIG_ENDIAN and
CPU_LITTLE_ENDIAN is always enabled.

Link: http://lkml.kernel.org/r/1509361505-18150-1-git-send-email-geert@linux-m68k.org
Fixes: 5bdfca6435 ("m32r: define CPU_BIG_ENDIAN")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-11-15 18:21:00 -08:00
Masahiro Yamada
8a78756eb5 kbuild: create object directories simpler and faster
For the out-of-tree build, scripts/Makefile.build creates output
directories, but this operation is not efficient.

scripts/Makefile.lib calculates obj-dirs as follows:

  obj-dirs := $(dir $(multi-objs) $(obj-y))

Please notice $(sort ...) is not used here.  Usually the result is
as many "./" as objects here.

For a lot of duplicated paths, the following command is invoked.

  _dummy := $(foreach d,$(obj-dirs), $(shell [ -d $(d) ] || mkdir -p $(d)))

Then, the costly shell command is run over and over again.

I see many points for optimization:

[1] Use $(sort ...) to cut down duplicated paths before passing them
    to system call
[2] Use single $(shell ...) instead of repeating it with $(foreach ...)
    This will reduce forking.
[3] We can calculate obj-dirs more simply.  Most of objects are already
    accumulated in $(targets).  So, $(dir $(targets)) is fine and more
    comprehensive.

I also removed ugly code in arch/x86/entry/vdso/Makefile.  This is now
really unnecessary.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Douglas Anderson <dianders@chromium.org>
2017-11-16 09:07:35 +09:00
Linus Torvalds
1b6115fbe3 Merge tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:

  - detach driver before tearing down procfs/sysfs (Alex Williamson)

  - disable PCIe services during shutdown (Sinan Kaya)

  - fix ASPM oops on systems with no Root Ports (Ard Biesheuvel)

  - fix ASPM LTR_L1.2_THRESHOLD programming (Bjorn Helgaas)

  - fix ASPM Common_Mode_Restore_Time computation (Bjorn Helgaas)

  - fix portdrv MSI/MSI-X vector allocation (Dongdong Liu, Bjorn
    Helgaas)

  - report non-fatal AER errors only to the affected endpoint (Gabriele
    Paoloni)

  - distribute bus numbers, MMIO, and I/O space among hotplug bridges to
    allow more devices to be hot-added (Mika Westerberg)

  - fix pciehp races during initialization and surprise link down (Mika
    Westerberg)

  - handle surprise-removed devices in PME handling (Qiang)

  - support resizable BARs for large graphics devices (Christian König)

  - expose SR-IOV offset, stride, and VF device ID via sysfs (Filippo
    Sironi)

  - create SR-IOV virtfn/physfn sysfs links before attaching driver
    (Stuart Hayes)

  - fix SR-IOV "ARI Capable Hierarchy" restore issue (Tony Nguyen)

  - enforce Kconfig IOV/REALLOC dependency (Sascha El-Sharkawy)

  - avoid slot reset if bridge itself is broken (Jan Glauber)

  - clean up pci_reset_function() path (Jan H. Schönherr)

  - make pci_map_rom() fail if the option ROM is invalid (Changbin Du)

  - convert timers to timer_setup() (Kees Cook)

  - move PCI_QUIRKS to PCI bus Kconfig menu (Randy Dunlap)

  - constify pci_dev_type and intel_mid_pci_ops (Bhumika Goyal)

  - remove unnecessary pci_dev, pci_bus, resource, pcibios_set_master()
    declarations (Bjorn Helgaas)

  - fix endpoint framework overflows and BUG()s (Dan Carpenter)

  - fix endpoint framework issues (Kishon Vijay Abraham I)

  - avoid broken Cavium CN8xxx bus reset behavior (David Daney)

  - extend Cavium ACS capability quirks (Vadim Lomovtsev)

  - support Synopsys DesignWare RC in ECAM mode (Ard Biesheuvel)

  - turn off dra7xx clocks cleanly on shutdown (Keerthy)

  - fix Faraday probe error path (Wei Yongjun)

  - support HiSilicon STB SoC PCIe host controller (Jianguo Sun)

  - fix Hyper-V interrupt affinity issue (Dexuan Cui)

  - remove useless ACPI warning for Hyper-V pass-through devices (Vitaly
    Kuznetsov)

  - support multiple MSI on iProc (Sandor Bodo-Merle)

  - support Layerscape LS1012a and LS1046a PCIe host controllers (Hou
    Zhiqiang)

  - fix Layerscape default error response (Minghuan Lian)

  - support MSI on Tango host controller (Marc Gonzalez)

  - support Tegra186 PCIe host controller (Manikanta Maddireddy)

  - use generic accessors on Tegra when possible (Thierry Reding)

  - support V3 Semiconductor PCI host controller (Linus Walleij)

* tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (85 commits)
  PCI/ASPM: Add L1 Substates definitions
  PCI/ASPM: Reformat ASPM register definitions
  PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD
  PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time
  PCI: xgene: Rename xgene_pcie_probe_bridge() to xgene_pcie_probe()
  PCI: xilinx: Rename xilinx_pcie_link_is_up() to xilinx_pcie_link_up()
  PCI: altera: Rename altera_pcie_link_is_up() to altera_pcie_link_up()
  PCI: Fix kernel-doc build warning
  PCI: Fail pci_map_rom() if the option ROM is invalid
  PCI: Move pci_map_rom() error path
  PCI: Move PCI_QUIRKS to the PCI bus menu
  alpha/PCI: Make pdev_save_srm_config() static
  PCI: Remove unused declarations
  PCI: Remove redundant pci_dev, pci_bus, resource declarations
  PCI: Remove redundant pcibios_set_master() declarations
  PCI/PME: Handle invalid data when reading Root Status
  PCI: hv: Use effective affinity mask
  PCI: pciehp: Do not clear Presence Detect Changed during initialization
  PCI: pciehp: Fix race condition handling surprise link down
  PCI: Distribute available resources to hotplug-capable bridges
  ...
2017-11-15 15:01:28 -08:00
Linus Torvalds
1be2172e96 Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
 "Summary of modules changes for the 4.15 merge window:

   - treewide module_param_call() cleanup, fix up set/get function
     prototype mismatches, from Kees Cook

   - minor code cleanups"

* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
  module: Do not paper over type mismatches in module_param_call()
  treewide: Fix function prototypes for module_param_call()
  module: Prepare to convert all module_param_call() prototypes
  kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
2017-11-15 13:46:33 -08:00
Linus Torvalds
5bbcc0f595 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights:

   1) Maintain the TCP retransmit queue using an rbtree, with 1GB
      windows at 100Gb this really has become necessary. From Eric
      Dumazet.

   2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.

   3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
      Lunn.

   4) Add meter action support to openvswitch, from Andy Zhou.

   5) Add a data meta pointer for BPF accessible packets, from Daniel
      Borkmann.

   6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.

   7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.

   8) More work to move the RTNL mutex down, from Florian Westphal.

   9) Add 'bpftool' utility, to help with bpf program introspection.
      From Jakub Kicinski.

  10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
      Dangaard Brouer.

  11) Support 'blocks' of transformations in the packet scheduler which
      can span multiple network devices, from Jiri Pirko.

  12) TC flower offload support in cxgb4, from Kumar Sanghvi.

  13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
      Leitner.

  14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.

  15) Add RED qdisc offloadability, and use it in mlxsw driver. From
      Nogah Frankel.

  16) eBPF based device controller for cgroup v2, from Roman Gushchin.

  17) Add some fundamental tracepoints for TCP, from Song Liu.

  18) Remove garbage collection from ipv6 route layer, this is a
      significant accomplishment. From Wei Wang.

  19) Add multicast route offload support to mlxsw, from Yotam Gigi"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
  tcp: highest_sack fix
  geneve: fix fill_info when link down
  bpf: fix lockdep splat
  net: cdc_ncm: GetNtbFormat endian fix
  openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
  netem: remove unnecessary 64 bit modulus
  netem: use 64 bit divide by rate
  tcp: Namespace-ify sysctl_tcp_default_congestion_control
  net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
  ipv6: set all.accept_dad to 0 by default
  uapi: fix linux/tls.h userspace compilation error
  usbnet: ipheth: prevent TX queue timeouts when device not ready
  vhost_net: conditionally enable tx polling
  uapi: fix linux/rxrpc.h userspace compilation errors
  net: stmmac: fix LPI transitioning for dwmac4
  atm: horizon: Fix irq release error
  net-sysfs: trigger netlink notification on ifalias change via sysfs
  openvswitch: Using kfree_rcu() to simplify the code
  openvswitch: Make local function ovs_nsh_key_attr_size() static
  openvswitch: Fix return value check in ovs_meter_cmd_features()
  ...
2017-11-15 11:56:19 -08:00
Linus Torvalds
892204e06c Merge tag 'mips_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS updates from James Hogan:
 "These are the main MIPS changes for 4.15.

  Fixes:
   - ralink: Fix MT7620 PCI build issues (4.5)
   - Disable cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN for 32-bit SMP
     (4.1)
   - Fix MIPS64 FP save/restore on 32-bit kernels (4.0)
   - ptrace: Pick up ptrace/seccomp changed syscall numbers (3.19)
   - ralink: Fix MT7628 pinmux (3.19)
   - BCM47XX: Fix LED inversion on WRT54GSv1 (3.17)
   - Fix n32 core dumping as o32 since regset support (3.13)
   - ralink: Drop obsolete USB_ARCH_HAS_HCD select

  Build system:
   - Default to "generic" (multiplatform) system type instead of IP22
   - Use generic little endian MIPS32 r2 configuration as default
     defconfig instead of ip22_defconfig

  FPU emulation:
   - Fix exception generation for certain R6 FPU instructions

  SMP:
   - Allow __cpu_number_map to be larger than NR_CPUS for sparse CPU id
     spaces

  Miscellaneous:
   - Add iomem resource for kernel bss section for kexec/kdump
   - Atomics: Nudge writes on bit unlock
   - DT files: Standardise "ok" -> "okay"

  Minor cleanups:
   - Define virt_to_pfn()
   - Make thread_saved_pc static
   - Simplify 32-bit sign extension in __read_64bit_c0_split()
   - DMA: Use vma_pages() helper
   - FPU emulation: Replace unsigned with unsigned int
   - MM: Removed unused lastpfn
   - Alchemy: Make clk_ops const
   - Lasat: Use setup_timer() helper
   - ralink: Use BIT() in MT7620 PCI driver

  Platform support:

  BMIPS:
  - Enable HARDIRQS_SW_RESEND

  Broadcom BCM63XX:
  - Add clkdev lookup support
  - Update clk driver, UART driver, DTs to handle named refclk from DTs
  - Split apart various clocks to more closely match hardware
  - Add ethernet clocks

  Cavium Octeon:
  - Remove usage of cvmx_wait() in favour of __delay()

  ImgTec Pistachio:
  - DT: Drop deprecated dwmmc num-slots property

  Ingenic JZ4780:
  - Add NFS root to Ci20 defconfig
  - Add watchdog to Ci20 DT & defconfig, and allow building of watchdog
    driver with this SoC

  Generic (multiplatform):
  - Migrate xilfpga (MIPSfpga) platform to the generic platform

  Lantiq xway:
  - Fix ASC0/ASC1 clocks"

* tag 'mips_4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips: (46 commits)
  MIPS: Add iomem resource for kernel bss section.
  MIPS: cmpxchg64() and HAVE_VIRT_CPU_ACCOUNTING_GEN don't work for 32-bit SMP
  MIPS: BMIPS: Enable HARDIRQS_SW_RESEND
  MIPS: pci: Make use of the BIT() macro inside the mt7620 driver
  MIPS: pci: Remove KERN_WARN instance inside the mt7620 driver
  MIPS: pci: Remove duplicate define in mt7620 driver
  MIPS: ralink: Fix typo in mt7628 pinmux function
  MIPS: ralink: Fix MT7628 pinmux
  MIPS: Fix odd fp register warnings with MIPS64r2
  watchdog: jz4780: Allow selection of jz4740-wdt driver
  MIPS/ptrace: Update syscall nr on register changes
  MIPS/ptrace: Pick up ptrace/seccomp changed syscalls
  MIPS: Fix an n32 core file generation regset support regression
  MIPS: Fix MIPS64 FP save/restore on 32-bit kernels
  MIPS: page.h: Define virt_to_pfn()
  MIPS: Xilfpga: Switch to using generic defconfigs
  MIPS: generic: Add support for MIPSfpga
  MIPS: Set defconfig target to a generic system for 32r2el
  MIPS: Kconfig: Set default MIPS system type as generic
  MIPS: DTS: Remove num-slots from Pistachio SoC
  ...
2017-11-15 11:36:08 -08:00
Linus Torvalds
c9b012e5f4 Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Will Deacon:
 "The big highlight is support for the Scalable Vector Extension (SVE)
  which required extensive ABI work to ensure we don't break existing
  applications by blowing away their signal stack with the rather large
  new vector context (<= 2 kbit per vector register). There's further
  work to be done optimising things like exception return, but the ABI
  is solid now.

  Much of the line count comes from some new PMU drivers we have, but
  they're pretty self-contained and I suspect we'll have more of them in
  future.

  Plenty of acronym soup here:

   - initial support for the Scalable Vector Extension (SVE)

   - improved handling for SError interrupts (required to handle RAS
     events)

   - enable GCC support for 128-bit integer types

   - remove kernel text addresses from backtraces and register dumps

   - use of WFE to implement long delay()s

   - ACPI IORT updates from Lorenzo Pieralisi

   - perf PMU driver for the Statistical Profiling Extension (SPE)

   - perf PMU driver for Hisilicon's system PMUs

   - misc cleanups and non-critical fixes"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (97 commits)
  arm64: Make ARMV8_DEPRECATED depend on SYSCTL
  arm64: Implement __lshrti3 library function
  arm64: support __int128 on gcc 5+
  arm64/sve: Add documentation
  arm64/sve: Detect SVE and activate runtime support
  arm64/sve: KVM: Hide SVE from CPU features exposed to guests
  arm64/sve: KVM: Treat guest SVE use as undefined instruction execution
  arm64/sve: KVM: Prevent guests from using SVE
  arm64/sve: Add sysctl to set the default vector length for new processes
  arm64/sve: Add prctl controls for userspace vector length management
  arm64/sve: ptrace and ELF coredump support
  arm64/sve: Preserve SVE registers around EFI runtime service calls
  arm64/sve: Preserve SVE registers around kernel-mode NEON use
  arm64/sve: Probe SVE capabilities and usable vector lengths
  arm64: cpufeature: Move sys_caps_initialised declarations
  arm64/sve: Backend logic for setting the vector length
  arm64/sve: Signal handling support
  arm64/sve: Support vector length resetting for new processes
  arm64/sve: Core task context handling
  arm64/sve: Low-level CPU setup
  ...
2017-11-15 10:56:56 -08:00
Linus Torvalds
b293fca43b Merge tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux
Pull RISC-V architecture support from Palmer Dabbelt:
 "This contains the core RISC-V Linux port, which has been through nine
  rounds of review on various mailing lists. The port is not complete:
  there's some cleanup patches moving through the review process, a
  whole bunch of drivers that need some work, and a lot of feature
  additions that will be needed.

  The patches contained in this tag have been through nine rounds of
  review on the various mailing lists. I have some outstanding cleanup
  patches, but since there's been so much review on these patches I
  thought it would be best to submit them as-is and then submit explicit
  cleanup patches so everyone can review them. This first patch set is
  big enough that it's a bit of a pain to constantly rewrite, and it's
  caused a few headaches with various contributors.

  The port is definately a work in progress. While what's there builds
  and boots with 4.14, it's a bit hard to actually see anything happen
  because there are no device drivers yet. I maintain a staging branch
  that contains all the device drivers and cleanup that actually works,
  but those patches won't all be ready for a while. I'd like to get what
  we currently have into your tree so everyone can start working from a
  single base -- of particular importance is allowing the glibc
  upstreaming process to proceed so we can sort out any possibly
  lingering user-visible ABI problems we might have.

  Copied below is the ChangeLog that contains the history of this patch
  set:

   (v9) As per suggestions on our v8 patch set, I've split the core
        architecture code out from our drivers and would like to submit
        this patch set to be included into linux-next, with the goal
        being to be merged in during the next merge window. This patch
        set is based on 4.14-rc2, but if it's better to have it based on
        something else then I can change it around.

        This patch set contains just the core arch code for RISC-V, so
        while it builds an nominally boots, you can't print or take an
        interrupt so it's not that useful. If you're looking to actually
        boot a system it would probably be better to use the full patch
        set listed below.

        We've collected a handful of tags from reviewers, and the
        remainder of the patch set only got minimal feedback last time.
        Here's what changed:

         - We now use the device tree to initialize the timer driver so
           it's less tighly coupled with the arch port.

         - I cleaned up the defconfigs -- there's actually now just one,
           and it's empty. For now I think we're OK with what the kernel
           sets as defaults, but I anticipate we'll begin to expand this
           as people start to use the port more.

         - The VDSO symbols version is sane.

         - We WFI while spinning in the boot loop.

         - A handful of comments have been added.

        While there are still a handful of FIXMEs in this patch set,
        we've started to get enough interest from various users and
        contributors that maintaining an out of tree patch set is
        starting to become a big burden. Hopefully the patches are good
        enough to merge now, which will at least get everyone working in
        a more reasonable manner as we clean up the remaining issues.

   (v8) I know it may not be the ideal time to submit a patch set right
        now, as it's the middle of the merge window, but things have
        calmed down quite a bit in the last month so I thought it would
        be good to get everyone on the same page. There's been a handful
        of changes since the last patch set, but most of them are fairly
        minor:

         - We changed PAGE_OFFSET to allowing mapping more physical
           memory on 64-bit systems. This is user configurable, as it
           triggers a different code model that generates slightly less
           efficient code.

         - The device tree binding documentation is back, I'd managed to
           lose it at some point.

         - We now pass the atomic64 test suite

         - The SBI timer driver has been refactored.

   (v7) It's been a while since my last patch set, but the changes han
        been fairly minimal:

         - The PCI cleanup patches have been dropped, we'll do them as a
           separate patch set later.

         - We've the Kconfig entries from CONFIG_ISA_* to
           CONFIG_RISCV_ISA_*, to make grep easier.

         - There have been a handful of memory model related tweaks in
           I/O land, particularly relating the PCI and the upcoming
           platform specification. There are significant comments in the
           relevant files. This is still a WIP, but I think we're close
           to getting as good as we're going to get until we end up with
           some more specifications.

   (v6) As it's been only a day since the v5 patch set, the changes are
        pretty minimal:

         - The patch set is now based on linux-next/master, which I
           believe is a better base now that we're getting closer to
           upstream.

         - EARLY_PRINTK is no longer an option. Since the SBI console is
           reasonable, there's no penalty to enabling it (and thus no
           benefit to disabling it).

         - The mmap syscalls were refactored a bit.

   (v5) Things have really started to calm down, so this is fairly
        similar to the v4 patch set. The most interesting changes
        include:

         - We've moved back to a single patch set.

         - SMP support has been fixed, I was accidentally running on a
           non-SMP configuration. There were various mistakes all over
           the tree as a result of this.

         - The cmpxchg syscalls have been removed, as they were deemed a
           bad idea. As a result, RISC-V Linux systems mandate the A
           extension. The corresponding Kconfig entry to enable builds
           on non-A systems has been removed.

         - A few more atomic fixes: mostly fence changes, but those
           resulted in a handful of additional macros that were no
           longer necessary.

         - riscv_early_sie has been removed.

   (v4) There have only been a few changes since the v3 patch set:

         - The cmpxchg64 syscall is no longer enabled on 32-bit systems.
           It's not possible to provide this on SMP systems, and it's
           not necessary as glibc knows not to call it.

         - We provide a ELF_HWCAP so users can determine the ISA of the
           machine the kernel is running on.

         - The multi-line comments are in a better form.

         - There were a handful of headers that could be replaced with
           the asm-generic versions, and a few unnecessary definitions.

         - We no longer use printk, but instead use pr_*.

         - A few Kconfig and defconfig entries have been cleaned up.

   (v3) A highlight of the changes since the v2 patch set includes:

         - We've split out all our drivers into separate patch sets,
           which I've already sent out to the relevant maintainers. I
           haven't included those patches in this patch set, but some of
           them are necessary to build our port.

         - The patch set is now split up differently: rather than being
           split per directory it is split per topic. Hopefully this
           will make it easier to review the port on the mailing list.
           The split is a bit rough, so you probably still want to look
           at the patch set as a whole.

         - atomic.h has been completely rewritten and is hopefully now
           correct. I've attempted to sanitize the various other memory
           model related code as well, and I think it should all be sane
           now aside from a handful of FIXMEs commented in the code.

         - We've changed the cmpexchg syscall to always exist and to not
           be multiplexed. There is also a VDSO entry for compare and
           exchange, which allows kernels with the A extension to
           execute user code without the A extension reasonably fast.

         - Our user-visible register state now contains enough space for
           the Q extension for 128-bit floating point, as well as a few
           words to allow extensibility to future ISA extensions like
           the eventual V extension for vectors.

         - A handful of driver cleanups, but these have been split into
           separate patch sets now so I won't duplicate them here.

   (v2) A highlight of the changes since the v1 patch set includes:

         - We've split out our drivers into the right places, which
           means now there's a lot more patches. I'll be submitting
           these patches to various subsystem maintainers and including
           them in any future RISC-V patch sets until they've been
           merged.

         - The SBI console driver has been completely rewritten to use
           the HVC helpers and is now significantly smaller.

         - We've begun to use weaker barriers as opposed to just the big
           "fence". There's still some work to do here, specifically:
            - We need fences in the relaxed MMIO functions.
            - The non-relaxed MMIO functions are missing R/W bits on their fences.
            - Many AMOs need the aq and rl bits set.

         - We now have thread_info in task_struct. As a result, sscratch
           now contains TP instead of SP. This was necessary because
           thread_info is no longer on the stack.

         - A few shared routines have been added that we use instead of
           creating another arch copy"

Reviewed-by: Arnd Bergmann <arnd@arndb.de>

* tag 'riscv-for-linus-4.15-arch-v9-premerge' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/linux:
  RISC-V: Build Infrastructure
  RISC-V: User-facing API
  RISC-V: Paging and MMU
  RISC-V: Device, timer, IRQs, and the SBI
  RISC-V: Task implementation
  RISC-V: ELF and module implementation
  RISC-V: Generic library routines and assembly
  RISC-V: Atomic and Locking Code
  RISC-V: Init and Halt Code
  dt-bindings: RISC-V CPU Bindings
  lib: Add shared copies of some GCC library routines
  MAINTAINERS: Add RISC-V
2017-11-15 10:49:15 -08:00
Rafael J. Wysocki
7d5905dc14 x86 / CPU: Always show current CPU frequency in /proc/cpuinfo
After commit 890da9cf09 (Revert "x86: do not use cpufreq_quick_get()
for /proc/cpuinfo "cpu MHz"") the "cpu MHz" number in /proc/cpuinfo
on x86 can be either the nominal CPU frequency (which is constant)
or the frequency most recently requested by a scaling governor in
cpufreq, depending on the cpufreq configuration.  That is somewhat
inconsistent and is different from what it was before 4.13, so in
order to restore the previous behavior, make it report the current
CPU frequency like the scaling_cur_freq sysfs file in cpufreq.

To that end, modify the /proc/cpuinfo implementation on x86 to use
aperfmperf_snapshot_khz() to snapshot the APERF and MPERF feedback
registers, if available, and use their values to compute the CPU
frequency to be reported as "cpu MHz".

However, do that carefully enough to avoid accumulating delays that
lead to unacceptable access times for /proc/cpuinfo on systems with
many CPUs.  Run aperfmperf_snapshot_khz() once on all CPUs
asynchronously at the /proc/cpuinfo open time, add a single delay
upfront (if necessary) at that point and simply compute the current
frequency while running show_cpuinfo() for each individual CPU.

Also, to avoid slowing down /proc/cpuinfo accesses too much, reduce
the default delay between consecutive APERF and MPERF reads to 10 ms,
which should be sufficient to get large enough numbers for the
frequency computation in all cases.

Fixes: 890da9cf09 (Revert "x86: do not use cpufreq_quick_get() for /proc/cpuinfo "cpu MHz"")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@kernel.org>
2017-11-15 19:46:50 +01:00
Linus Torvalds
9682b3dea2 Merge branch 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual rocket-science from trivial tree for 4.15"

* 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  MAINTAINERS: relinquish kconfig
  MAINTAINERS: Update my email address
  treewide: Fix typos in Kconfig
  kfifo: Fix comments
  init/Kconfig: Fix module signing document location
  misc: ibmasm: Return error on error path
  HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback"
  MAINTAINERS: Correct path to uDraw PS3 driver
  tracing: Fix doc mistakes in trace sample
  tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER
  MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig
  mm/huge_memory.c: fixup grammar in comment
  lib/xz: Add fall-through comments to a switch statement
2017-11-15 10:14:11 -08:00
Eugeniy Paltsev
ff64d695f9 ARC: [plat-axs10x] DTS: Add reset controller node to manage ethernet reset
DW ethernet controller on axs10x hangs sometimes after SW reset.
Invoke the newly aded driver (reset-axs10x.c) by adding the DT bits.

With this in place, we don't need the open-coded quirk in platform
code, so get rid of it as well !

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>

Signed-off-by: Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2017-11-15 09:40:43 -08:00
Jeff Layton
4d2dc2cc76 fcntl: don't cap l_start and l_end values for F_GETLK64 in compat syscall
Currently, we're capping the values too low in the F_GETLK64 case. The
fields in that structure are 64-bit values, so we shouldn't need to do
any sort of fixup there.

Make sure we check that assumption at build time in the future however
by ensuring that the sizes we're copying will fit.

With this, we no longer need COMPAT_LOFF_T_MAX either, so remove it.

Fixes: 94073ad77f (fs/locks: don't mess with the address limit in compat_fcntl64)
Reported-by: Vitaly Lipatov <lav@etersoft.ru>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: David Howells <dhowells@redhat.com>
2017-11-15 08:08:36 -05:00
Nitin Gupta
70f3c8b7c2 sparc64: Fix page table walk for PUD hugepages
For a PUD hugepage entry, we need to propagate bits [32:22]
from virtual address to resolve at 4M granularity. However,
the current code was incorrectly propagating bits [29:19].
This bug can cause incorrect data to be returned for pages
backed with 16G hugepages.

Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:37:43 +09:00
Allen Pais
ff0296877a sparc64: Convert timers to user timer_setup()
Switch to using the new timer_setup() and from_timer()
in LDOM Virtual I/O handshake.

Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Allen Pais <allen.pais@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:36:27 +09:00
Elena Reshetova
cdf5976fcb sparc64: convert mdesc_handle.refcnt from atomic_t to refcount_t
atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable mdesc_handle.refcnt is used as pure reference counter.
Convert it to refcount_t and fix up the operations.

Suggested-by: Kees Cook <keescook@chromium.org>
Reviewed-by: David Windsor <dwindsor@gmail.com>
Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:28:23 +09:00
Kees Cook
68fa10dce0 sparc/led: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Adds a static variable to hold timeout
value.

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Geliang Tang <geliangtang@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: sparclinux@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:27:50 +09:00
Vijay Kumar
46ad8d2d22 sparc64: Use sparc optimized fls and __fls for T4 and above
For T4 and above, patch fls and __fls functions
at the boot time to use lzcnt instruction.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
2b41ce5df2 sparc64: SPARC optimized __fls function
Defined SPARC optimized __fls using lzcnt opcode.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
70cbec0c53 sparc64: SPARC optimized fls function
Defined SPARC optimized fls using lzcnt opcode.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
be52bbe3ea sparc64: Define SPARC default __fls function
__fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:46 +09:00
Vijay Kumar
41413a6035 sparc64: Define SPARC default fls function
fls will now require a boot time patching on T4 and above.
Redefining it under arch/sparc/lib.

Signed-off-by: Vijay Kumar <vijay.ac.kumar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:26:45 +09:00
Nagarathnam Muthusamy
9a08862a5d vDSO for sparc
Following patch is based on work done by Nick Alcock on 64-bit vDSO for sparc
in Oracle linux. I have extended it to include support for 32-bit vDSO for sparc
on 64-bit kernel.

vDSO for sparc is based on the X86 implementation. This patch
provides vDSO support for both 64-bit and 32-bit programs on 64-bit kernel.
vDSO will be disabled on 32-bit linux kernel on sparc.

*) vclock_gettime.c contains all the vdso functions. Since data page is mapped
   before the vdso code page, the pointer to data page is got by subracting offset
   from an address in the vdso code page. The return address stored in
   %i7 is used for this purpose.
*) During compilation, both 32-bit and 64-bit vdso images are compiled and are
   converted into raw bytes by vdso2c program to be ready for mapping into the
   process. 32-bit images are compiled only if CONFIG_COMPAT is enabled. vdso2c
   generates two files vdso-image-64.c and vdso-image-32.c which contains the
   respective vDSO image in C structure.
*) During vdso initialization, required number of vdso pages are allocated and
   raw bytes are copied into the pages.
*) During every exec, these pages are mapped into the process through
   arch_setup_additional_pages and the location of mapping is passed on to the
   process through aux vector AT_SYSINFO_EHDR which is used by glibc.
*) A new update_vsyscall routine for sparc is added to keep the data page in
   vdso updated.
*) As vDSO cannot contain dynamically relocatable references, a new version of
   cpu_relax is added for the use of vDSO.

This change also requires a putback to glibc to use vDSO. For testing,
programs planning to try vDSO can be compiled against the generated
vdso(64/32).so in the source.

Testing:

========
[root@localhost ~]# cat vdso_test.c
int main() {
        struct timespec tv_start, tv_end;
        struct timeval tv_tmp;
	int i;
        int count = 1 * 1000 * 10000;
	long long diff;

        clock_gettime(0, &tv_start);
        for (i = 0; i < count; i++)
              gettimeofday(&tv_tmp, NULL);
        clock_gettime(0, &tv_end);
        diff = (long long)(tv_end.tv_sec -
		tv_start.tv_sec)*(1*1000*1000*1000);
        diff += (tv_end.tv_nsec - tv_start.tv_nsec);
	printf("Start sec: %d\n", tv_start.tv_sec);
	printf("End sec  : %d\n", tv_end.tv_sec);
        printf("%d cycles in %lld ns = %f ns/cycle\n", count, diff,
		(double)diff / (double)count);
        return 0;
}

[root@localhost ~]# cc vdso_test.c -o t32_without_fix -m32 -lrt
[root@localhost ~]# ./t32_without_fix
Start sec: 1502396130
End sec  : 1502396140
10000000 cycles in 9565148528 ns = 956.514853 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t32_with_fix -m32 ./vdso32.so.dbg
[root@localhost ~]# ./t32_with_fix
Start sec: 1502396168
End sec  : 1502396169
10000000 cycles in 798141262 ns = 79.814126 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t64_without_fix -m64 -lrt
[root@localhost ~]# ./t64_without_fix
Start sec: 1502396208
End sec  : 1502396218
10000000 cycles in 9846091800 ns = 984.609180 ns/cycle
[root@localhost ~]# cc vdso_test.c -o t64_with_fix -m64 ./vdso64.so.dbg
[root@localhost ~]# ./t64_with_fix
Start sec: 1502396257
End sec  : 1502396257
10000000 cycles in 380984048 ns = 38.098405 ns/cycle

V1 to V2 Changes:
=================
	Added hot patching code to switch the read stick instruction to read
tick instruction based on the hardware.

V2 to V3 Changes:
=================
	Merged latest changes from sparc-next and moved the initialization
of clocksource_tick.archdata.vclock_mode to time_init_early. Disabled
queued spinlock and rwlock configuration when simulating 32-bit config
to compile 32-bit VDSO.

V3 to V4 Changes:
=================
	Hardcoded the page size as 8192 in linker script for both 64-bit and
32-bit binaries. Removed unused variables in vdso2c.h. Added -mv8plus flag to
Makefile to prevent the generation of relocation entries for __lshrdi3 in 32-bit
vdso binary.

Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Signed-off-by: Nagarathnam Muthusamy <nagarathnam.muthusamy@oracle.com>
Reviewed-by: Shannon Nelson <shannon.nelson@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-11-15 14:21:03 +09:00
Michael Ellerman
3ffa9d9e2a powerpc/64s: Fix Power9 DD2.0 workarounds by adding DD2.1 feature
Recently we added a CPU feature for Power9 DD2.0, to capture the fact
that some workarounds are required only on Power9 DD1 and DD2.0 but
not DD2.1 or later.

Then in commit 9d2f510a66 ("powerpc/64s/idle: avoid POWER9 DD1 and
DD2.0 ERAT workaround on DD2.1") and commit e3646330cf
"powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on
DD2.1") we changed CPU_FTR_SECTIONs to check for DD1 or DD20, eg:

  BEGIN_FTR_SECTION
          PPC_INVALIDATE_ERAT
  END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1 | CPU_FTR_POWER9_DD20)

Unfortunately although this reads as "if set DD1 or DD2.0", the or is
a bitwise or and actually generates a mask of both bits. The code that
does the feature patching then checks that the value of the CPU
features masked with that mask are equal to the mask.

So the end result is we're checking for DD1 and DD20 being set, which
never happens. Yes the API is terrible.

Removing the ERAT workaround on DD2.0 results in random SEGVs, the
system tends to boot, but things randomly die including sometimes
dhclient, udev etc.

To fix the problem and hopefully avoid it in future, we remove the
DD2.0 CPU feature and instead add a DD2.1 (or later) feature. This
allows us to easily express that the workarounds are required if DD2.1
is not set.

At some point we will drop the DD1 workarounds entirely and some of
this can be cleaned up.

Fixes: 9d2f510a66 ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 ERAT workaround on DD2.1")
Fixes: e3646330cf ("powerpc/64s/idle: avoid POWER9 DD1 and DD2.0 PMU workaround on DD2.1")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2017-11-15 14:25:42 +11:00
Linus Torvalds
37cb8e1f8e Merge tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull DeviceTree updates from Rob Herring:
 "A bigger diffstat than usual with the kbuild changes and a tree wide
  fix in the binding documentation.

  Summary:

   - kbuild cleanups and improvements for dtbs

   - Code clean-up of overlay code and fixing for some long standing
     memory leak and race condition in applying overlays

   - Improvements to DT memory usage making sysfs/kobjects optional and
     skipping unflattening of disabled nodes. This is part of kernel
     tinification efforts.

   - Final piece of removing storing the full path for every DT node.
     The prerequisite conversion of printk's to use device_node format
     specifier happened in 4.14.

   - Sync with current upstream dtc. This brings additional checks to
     dtb compiling.

   - Binding doc tree wide removal of leading 0s from examples

   - RTC binding documentation adding missing devices and some
     consolidation of duplicated bindings

   - Vendor prefix documentation for nutsboard, Silicon Storage
     Technology, shimafuji, Tecon Microprocessor Technologies, DH
     electronics GmbH, Opal Kelly, and Next Thing"

* tag 'devicetree-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (55 commits)
  dt-bindings: usb: add #phy-cells to usb-nop-xceiv
  dt-bindings: Remove leading zeros from bindings notation
  kbuild: handle dtb-y and CONFIG_OF_ALL_DTBS natively in Makefile.lib
  MIPS: dts: remove bogus bcm96358nb4ser.dtb from dtb-y entry
  kbuild: clean up *.dtb and *.dtb.S patterns from top-level Makefile
  .gitignore: move *.dtb and *.dtb.S patterns to the top-level .gitignore
  .gitignore: sort normal pattern rules alphabetically
  dt-bindings: add vendor prefix for Next Thing Co.
  scripts/dtc: Update to upstream version v1.4.5-6-gc1e55a5513e9
  of: dynamic: fix memory leak related to properties of __of_node_dup
  of: overlay: make pr_err() string unique
  of: overlay: pr_err from return NOTIFY_OK to overlay apply/remove
  of: overlay: remove unneeded check for NULL kbasename()
  of: overlay: remove a dependency on device node full_name
  of: overlay: simplify applying symbols from an overlay
  of: overlay: avoid race condition between applying multiple overlays
  of: overlay: loosen overly strict phandle clash check
  of: overlay: expand check of whether overlay changeset can be removed
  of: overlay: detect cases where device tree may become corrupt
  of: overlay: minor restructuring
  ...
2017-11-14 18:25:40 -08:00
Linus Torvalds
4008e6a9bc Merge branch 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
 "This contains two bigger than usual tree-wide changes this time. They
  all have proper acks, caused no merge conflicts in linux-next where
  they have been for a while. They are namely:

   - to-gpiod conversion of the i2c-gpio driver and its users (touching
     arch/* and drivers/mfd/*)

   - adding a sbs-manager based on I2C core updates to SMBus alerts
     (touching drivers/power/*)

  Other notable changes:

   - i2c_boardinfo can now carry a dev_name to be used when the device
     is created. This is because some devices in ACPI world need fixed
     names to find the regulators.

   - the designware driver got a long discussed overhaul of its PM
     handling. img-scb and davinci got PM support, too.

   - at24 driver has way better OF support. And it has a new maintainer.
     Thanks Bartosz for stepping up!

  The rest is regular driver updates and fixes"

* 'i2c/for-4.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/wsa/linux: (55 commits)
  ARM: sa1100: simpad: Correct I2C GPIO offsets
  i2c: aspeed: Deassert reset in probe
  eeprom: at24: Add OF device ID table
  MAINTAINERS: new maintainer for AT24 driver
  i2c: nuc900: remove platform_data, too
  i2c: thunderx: Remove duplicate NULL check
  i2c: taos-evm: Remove duplicate NULL check
  i2c: Make i2c_unregister_device() NULL-aware
  i2c: xgene-slimpro: Support v2
  i2c: mpc: remove useless variable initialization
  i2c: omap: Trigger bus recovery in lockup case
  i2c: gpio: Add support for named gpios in DT
  dt-bindings: i2c: i2c-gpio: Add support for named gpios
  i2c: gpio: Local vars in probe
  i2c: gpio: Augment all boardfiles to use open drain
  i2c: gpio: Enforce open drain through gpiolib
  gpio: Make it possible for consumers to enforce open drain
  i2c: gpio: Convert to use descriptors
  power: supply: sbs-message: fix some code style issues
  power: supply: sbs-battery: remove unchecked return var
  ...
2017-11-14 17:52:21 -08:00
Linus Torvalds
e37e0ee019 Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping updates from Christoph Hellwig:

 - turn dma_cache_sync into a dma_map_ops instance and remove
   implementation that purely are dead because the architecture doesn't
   support noncoherent allocations

 - add a flag for busses that need DMA configuration (Robin Murphy)

* tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping:
  dma-mapping: turn dma_cache_sync into a dma_map_ops method
  sh: make dma_cache_sync a no-op
  xtensa: make dma_cache_sync a no-op
  unicore32: make dma_cache_sync a no-op
  powerpc: make dma_cache_sync a no-op
  mn10300: make dma_cache_sync a no-op
  microblaze: make dma_cache_sync a no-op
  ia64: make dma_cache_sync a no-op
  frv: make dma_cache_sync a no-op
  x86: make dma_cache_sync a no-op
  floppy: consolidate the dummy fd_cacheflush definition
  drivers: flag buses which demand DMA configuration
2017-11-14 16:54:12 -08:00
Linus Torvalds
e2c5923c34 Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe:
 "This is the main pull request for block storage for 4.15-rc1.

  Nothing out of the ordinary in here, and no API changes or anything
  like that. Just various new features for drivers, core changes, etc.
  In particular, this pull request contains:

   - A patch series from Bart, closing the whole on blk/scsi-mq queue
     quescing.

   - A series from Christoph, building towards hidden gendisks (for
     multipath) and ability to move bio chains around.

   - NVMe
        - Support for native multipath for NVMe (Christoph).
        - Userspace notifications for AENs (Keith).
        - Command side-effects support (Keith).
        - SGL support (Chaitanya Kulkarni)
        - FC fixes and improvements (James Smart)
        - Lots of fixes and tweaks (Various)

   - bcache
        - New maintainer (Michael Lyle)
        - Writeback control improvements (Michael)
        - Various fixes (Coly, Elena, Eric, Liang, et al)

   - lightnvm updates, mostly centered around the pblk interface
     (Javier, Hans, and Rakesh).

   - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph)

   - Writeback series that fix the much discussed hundreds of millions
     of sync-all units. This goes all the way, as discussed previously
     (me).

   - Fix for missing wakeup on writeback timer adjustments (Yafang
     Shao).

   - Fix laptop mode on blk-mq (me).

   - {mq,name} tupple lookup for IO schedulers, allowing us to have
     alias names. This means you can use 'deadline' on both !mq and on
     mq (where it's called mq-deadline). (me).

   - blktrace race fix, oopsing on sg load (me).

   - blk-mq optimizations (me).

   - Obscure waitqueue race fix for kyber (Omar).

   - NBD fixes (Josef).

   - Disable writeback throttling by default on bfq, like we do on cfq
     (Luca Miccio).

   - Series from Ming that enable us to treat flush requests on blk-mq
     like any other request. This is a really nice cleanup.

   - Series from Ming that improves merging on blk-mq with schedulers,
     getting us closer to flipping the switch on scsi-mq again.

   - BFQ updates (Paolo).

   - blk-mq atomic flags memory ordering fixes (Peter Z).

   - Loop cgroup support (Shaohua).

   - Lots of minor fixes from lots of different folks, both for core and
     driver code"

* 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits)
  nvme: fix visibility of "uuid" ns attribute
  blk-mq: fixup some comment typos and lengths
  ide: ide-atapi: fix compile error with defining macro DEBUG
  blk-mq: improve tag waiting setup for non-shared tags
  brd: remove unused brd_mutex
  blk-mq: only run the hardware queue if IO is pending
  block: avoid null pointer dereference on null disk
  fs: guard_bio_eod() needs to consider partitions
  xtensa/simdisk: fix compile error
  nvme: expose subsys attribute to sysfs
  nvme: create 'slaves' and 'holders' entries for hidden controllers
  block: create 'slaves' and 'holders' entries for hidden gendisks
  nvme: also expose the namespace identification sysfs files for mpath nodes
  nvme: implement multipath access to nvme subsystems
  nvme: track shared namespaces
  nvme: introduce a nvme_ns_ids structure
  nvme: track subsystems
  block, nvme: Introduce blk_mq_req_flags_t
  block, scsi: Make SCSI quiesce and resume work reliably
  block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag
  ...
2017-11-14 15:32:19 -08:00
Heiko Carstens
049a2c2d48 s390: enable CPU alternatives unconditionally
Remove the CPU_ALTERNATIVES config option and enable the code
unconditionally. The config option was only added to avoid a conflict
with the named saved segment support. Since that code is gone there is
no reason to keep the CPU_ALTERNATIVES config option.

Just enable it unconditionally to also reduce the number of config
options and make it less likely that something breaks.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 22:08:12 +01:00
Heiko Carstens
a6de0a91d9 s390/nmi: remove unused code
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 22:08:08 +01:00
Heiko Carstens
2be1da8d4d s390/mm: remove unused code
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 22:08:05 +01:00
Heiko Carstens
78ca4fe3bb s390/spinlock: fix indentation
checkpatch:
    WARNING: Statements should start on a tabstop
    #9499: FILE: arch/s390/lib/spinlock.c:231:
    +                          return;

sparse:
arch/s390/lib/spinlock.c:81 arch_load_niai4()
    warn: inconsistent indenting

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 22:07:58 +01:00
Heiko Carstens
3c6153e814 s390/vdso: add missing boot_vdso_data declaration
sparse says:
arch/s390/kernel/vdso.c:150:18:
 warning: symbol 'boot_vdso_data' was not declared. Should it be static?

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2017-11-14 22:07:49 +01:00
Linus Torvalds
37dc79565c Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "Here is the crypto update for 4.15:

  API:

   - Disambiguate EBUSY when queueing crypto request by adding ENOSPC.
     This change touches code outside the crypto API.
   - Reset settings when empty string is written to rng_current.

  Algorithms:

   - Add OSCCA SM3 secure hash.

  Drivers:

   - Remove old mv_cesa driver (replaced by marvell/cesa).
   - Enable rfc3686/ecb/cfb/ofb AES in crypto4xx.
   - Add ccm/gcm AES in crypto4xx.
   - Add support for BCM7278 in iproc-rng200.
   - Add hash support on Exynos in s5p-sss.
   - Fix fallback-induced error in vmx.
   - Fix output IV in atmel-aes.
   - Fix empty GCM hash in mediatek.

  Others:

   - Fix DoS potential in lib/mpi.
   - Fix potential out-of-order issues with padata"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (162 commits)
  lib/mpi: call cond_resched() from mpi_powm() loop
  crypto: stm32/hash - Fix return issue on update
  crypto: dh - Remove pointless checks for NULL 'p' and 'g'
  crypto: qat - Clean up error handling in qat_dh_set_secret()
  crypto: dh - Don't permit 'key' or 'g' size longer than 'p'
  crypto: dh - Don't permit 'p' to be 0
  crypto: dh - Fix double free of ctx->p
  hwrng: iproc-rng200 - Add support for BCM7278
  dt-bindings: rng: Document BCM7278 RNG200 compatible
  crypto: chcr - Replace _manual_ swap with swap macro
  crypto: marvell - Add a NULL entry at the end of mv_cesa_plat_id_table[]
  hwrng: virtio - Virtio RNG devices need to be re-registered after suspend/resume
  crypto: atmel - remove empty functions
  crypto: ecdh - remove empty exit()
  MAINTAINERS: update maintainer for qat
  crypto: caam - remove unused param of ctx_map_to_sec4_sg()
  crypto: caam - remove unneeded edesc zeroization
  crypto: atmel-aes - Reset the controller before each use
  crypto: atmel-aes - properly set IV after {en,de}crypt
  hwrng: core - Reset user selected rng by writing "" to rng_current
  ...
2017-11-14 10:52:09 -08:00
Bjorn Helgaas
89000e89bf Merge branch 'pci/host-layerscape' into next
* pci/host-layerscape:
  PCI: layerscape: Change default error response behavior
  PCI: Disable MSI for Freescale Layerscape PCIe RC mode
  arm64: dts: ls1046a: Add PCIe controller DT nodes
  arm64: dts: ls1012a: Add PCIe controller DT node
  PCI: layerscape: Add support for ls1012a
  arm64: dts: ls1012a: Add MSI controller DT node
  irqchip/ls-scfg-msi: Add LS1012a MSI support
2017-11-14 12:11:33 -06:00
Bjorn Helgaas
9ceb09cce1 Merge branch 'pci/virtualization' into next
* pci/virtualization:
  PCI: Document reset method return values
  PCI: Detach driver before procfs & sysfs teardown on device remove
  PCI: Apply Cavium ThunderX ACS quirk to more Root Ports
  PCI: Set Cavium ACS capability quirk flags to assert RR/CR/SV/UF
  PCI: Restore ARI Capable Hierarchy before setting numVFs
  PCI: Create SR-IOV virtfn/physfn links before attaching driver
  PCI: Expose SR-IOV offset, stride, and VF device ID via sysfs
  PCI: Cache the VF device ID in the SR-IOV structure
  PCI: Add Kconfig PCI_IOV dependency for PCI_REALLOC_ENABLE_AUTO
  PCI: Remove unused function __pci_reset_function()
  PCI: Remove reset argument from pci_iov_{add,remove}_virtfn()
2017-11-14 12:11:26 -06:00