Commit Graph

3164 Commits

Author SHA1 Message Date
Andrey Ryabinin
bebf56a1b1 kasan: enable instrumentation of global variables
This feature let us to detect accesses out of bounds of global variables.
This will work as for globals in kernel image, so for globals in modules.
Currently this won't work for symbols in user-specified sections (e.g.
__init, __read_mostly, ...)

The idea of this is simple.  Compiler increases each global variable by
redzone size and add constructors invoking __asan_register_globals()
function.  Information about global variable (address, size, size with
redzone ...) passed to __asan_register_globals() so we could poison
variable's redzone.

This patch also forces module_alloc() to return 8*PAGE_SIZE aligned
address making shadow memory handling (
kasan_module_alloc()/kasan_module_free() ) more simple.  Such alignment
guarantees that each shadow page backing modules address space correspond
to only one module_alloc() allocation.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:42 -08:00
Andrey Ryabinin
3f15801cdc lib: add kasan test module
This is a test module doing various nasty things like out of bounds
accesses, use after free.  It is useful for testing kernel debugging
features like kernel address sanitizer.

It mostly concentrates on testing of slab allocator, but we might want to
add more different stuff here in future (like stack/global variables out
of bounds accesses and so on).

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:41 -08:00
Andrey Ryabinin
0316bec22e mm: slub: add kernel address sanitizer support for slub allocator
With this patch kasan will be able to catch bugs in memory allocated by
slub.  Initially all objects in newly allocated slab page, marked as
redzone.  Later, when allocation of slub object happens, requested by
caller number of bytes marked as accessible, and the rest of the object
(including slub's metadata) marked as redzone (inaccessible).

We also mark object as accessible if ksize was called for this object.
There is some places in kernel where ksize function is called to inquire
size of really allocated area.  Such callers could validly access whole
allocated memory, so it should be marked as accessible.

Code in slub.c and slab_common.c files could validly access to object's
metadata, so instrumentation for this files are disabled.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Signed-off-by: Dmitry Chernenkov <dmitryc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:41 -08:00
Andrey Ryabinin
ef7f0d6a6c x86_64: add KASan support
This patch adds arch specific code for kernel address sanitizer.

16TB of virtual addressed used for shadow memory.  It's located in range
[ffffec0000000000 - fffffc0000000000] between vmemmap and %esp fixup
stacks.

At early stage we map whole shadow region with zero page.  Latter, after
pages mapped to direct mapping address range we unmap zero pages from
corresponding shadow (see kasan_map_shadow()) and allocate and map a real
shadow memory reusing vmemmap_populate() function.

Also replace __pa with __pa_nodebug before shadow initialized.  __pa with
CONFIG_DEBUG_VIRTUAL=y make external function call (__phys_addr)
__phys_addr is instrumented, so __asan_load could be called before shadow
area initialized.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Jim Davis <jim.epost@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:41 -08:00
Andrey Ryabinin
0b24becc81 kasan: add kernel address sanitizer infrastructure
Kernel Address sanitizer (KASan) is a dynamic memory error detector.  It
provides fast and comprehensive solution for finding use-after-free and
out-of-bounds bugs.

KASAN uses compile-time instrumentation for checking every memory access,
therefore GCC > v4.9.2 required.  v4.9.2 almost works, but has issues with
putting symbol aliases into the wrong section, which breaks kasan
instrumentation of globals.

This patch only adds infrastructure for kernel address sanitizer.  It's
not available for use yet.  The idea and some code was borrowed from [1].

Basic idea:

The main idea of KASAN is to use shadow memory to record whether each byte
of memory is safe to access or not, and use compiler's instrumentation to
check the shadow memory on each memory access.

Address sanitizer uses 1/8 of the memory addressable in kernel for shadow
memory and uses direct mapping with a scale and offset to translate a
memory address to its corresponding shadow address.

Here is function to translate address to corresponding shadow address:

     unsigned long kasan_mem_to_shadow(unsigned long addr)
     {
                return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
     }

where KASAN_SHADOW_SCALE_SHIFT = 3.

So for every 8 bytes there is one corresponding byte of shadow memory.
The following encoding used for each shadow byte: 0 means that all 8 bytes
of the corresponding memory region are valid for access; k (1 <= k <= 7)
means that the first k bytes are valid for access, and other (8 - k) bytes
are not; Any negative value indicates that the entire 8-bytes are
inaccessible.  Different negative values used to distinguish between
different kinds of inaccessible memory (redzones, freed memory) (see
mm/kasan/kasan.h).

To be able to detect accesses to bad memory we need a special compiler.
Such compiler inserts a specific function calls (__asan_load*(addr),
__asan_store*(addr)) before each memory access of size 1, 2, 4, 8 or 16.

These functions check whether memory region is valid to access or not by
checking corresponding shadow memory.  If access is not valid an error
printed.

Historical background of the address sanitizer from Dmitry Vyukov:

	"We've developed the set of tools, AddressSanitizer (Asan),
	ThreadSanitizer and MemorySanitizer, for user space. We actively use
	them for testing inside of Google (continuous testing, fuzzing,
	running prod services). To date the tools have found more than 10'000
	scary bugs in Chromium, Google internal codebase and various
	open-source projects (Firefox, OpenSSL, gcc, clang, ffmpeg, MySQL and
	lots of others): [2] [3] [4].
	The tools are part of both gcc and clang compilers.

	We have not yet done massive testing under the Kernel AddressSanitizer
	(it's kind of chicken and egg problem, you need it to be upstream to
	start applying it extensively). To date it has found about 50 bugs.
	Bugs that we've found in upstream kernel are listed in [5].
	We've also found ~20 bugs in out internal version of the kernel. Also
	people from Samsung and Oracle have found some.

	[...]

	As others noted, the main feature of AddressSanitizer is its
	performance due to inline compiler instrumentation and simple linear
	shadow memory. User-space Asan has ~2x slowdown on computational
	programs and ~2x memory consumption increase. Taking into account that
	kernel usually consumes only small fraction of CPU and memory when
	running real user-space programs, I would expect that kernel Asan will
	have ~10-30% slowdown and similar memory consumption increase (when we
	finish all tuning).

	I agree that Asan can well replace kmemcheck. We have plans to start
	working on Kernel MemorySanitizer that finds uses of unitialized
	memory. Asan+Msan will provide feature-parity with kmemcheck. As
	others noted, Asan will unlikely replace debug slab and pagealloc that
	can be enabled at runtime. Asan uses compiler instrumentation, so even
	if it is disabled, it still incurs visible overheads.

	Asan technology is easily portable to other architectures. Compiler
	instrumentation is fully portable. Runtime has some arch-dependent
	parts like shadow mapping and atomic operation interception. They are
	relatively easy to port."

Comparison with other debugging features:
========================================

KMEMCHECK:

  - KASan can do almost everything that kmemcheck can.  KASan uses
    compile-time instrumentation, which makes it significantly faster than
    kmemcheck.  The only advantage of kmemcheck over KASan is detection of
    uninitialized memory reads.

    Some brief performance testing showed that kasan could be
    x500-x600 times faster than kmemcheck:

$ netperf -l 30
		MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to localhost (127.0.0.1) port 0 AF_INET
		Recv   Send    Send
		Socket Socket  Message  Elapsed
		Size   Size    Size     Time     Throughput
		bytes  bytes   bytes    secs.    10^6bits/sec

no debug:	87380  16384  16384    30.00    41624.72

kasan inline:	87380  16384  16384    30.00    12870.54

kasan outline:	87380  16384  16384    30.00    10586.39

kmemcheck: 	87380  16384  16384    30.03      20.23

  - Also kmemcheck couldn't work on several CPUs.  It always sets
    number of CPUs to 1.  KASan doesn't have such limitation.

DEBUG_PAGEALLOC:
	- KASan is slower than DEBUG_PAGEALLOC, but KASan works on sub-page
	  granularity level, so it able to find more bugs.

SLUB_DEBUG (poisoning, redzones):
	- SLUB_DEBUG has lower overhead than KASan.

	- SLUB_DEBUG in most cases are not able to detect bad reads,
	  KASan able to detect both reads and writes.

	- In some cases (e.g. redzone overwritten) SLUB_DEBUG detect
	  bugs only on allocation/freeing of object. KASan catch
	  bugs right before it will happen, so we always know exact
	  place of first bad read/write.

[1] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel
[2] https://code.google.com/p/address-sanitizer/wiki/FoundBugs
[3] https://code.google.com/p/thread-sanitizer/wiki/FoundBugs
[4] https://code.google.com/p/memory-sanitizer/wiki/FoundBugs
[5] https://code.google.com/p/address-sanitizer/wiki/AddressSanitizerForKernel#Trophies

Based on work by Andrey Konovalov.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Acked-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:40 -08:00
Tejun Heo
46385326cc bitmap, cpumask, nodemask: remove dedicated formatting functions
Now that all bitmap formatting usages have been converted to
'%*pb[l]', the separate formatting functions are unnecessary.  The
following functions are removed.

* bitmap_scn[list]printf()
* cpumask_scnprintf(), cpulist_scnprintf()
* [__]nodemask_scnprintf(), [__]nodelist_scnprintf()
* seq_bitmap[_list](), seq_cpumask[_list](), seq_nodemask[_list]()
* seq_buf_bitmask()

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:39 -08:00
Tejun Heo
4a0792b0e7 bitmap: use %*pb[l] to print bitmaps including cpumasks and nodemasks
printk and friends can now format bitmaps using '%*pb[l]'.  cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:36 -08:00
Tejun Heo
dbc760bcc1 lib/vsprintf: implement bitmap printing through '%*pb[l]'
bitmap and its derivatives such as cpumask and nodemask currently only
provide formatting functions which put the output string into the
provided buffer; however, how long this buffer should be isn't defined
anywhere and given that some of these bitmaps can be too large to be
formatted into an on-stack buffer it users sometimes are unnecessarily
forced to come up with creative solutions and compromises for the
buffer just to printk these bitmaps.

There have been a couple different attempts at making this easier.

1. Way back, PeterZ tried printk '%pb' extension with the precision
   for bit width - '%.*pb'.  This was intuitive and made sense but
   unfortunately triggered a compile warning about using precision
   for a pointer.

   http://lkml.kernel.org/g/1336577562.2527.58.camel@twins

2. I implemented bitmap_pr_cont[_list]() and its wrappers for cpumask
   and nodemask.  This works but PeterZ pointed out that pr_cont's
   tendency to produce broken lines when multiple CPUs are printing is
   bothering considering the usages.

   http://lkml.kernel.org/g/1418226774-30215-3-git-send-email-tj@kernel.org

So, this patch is another attempt at teaching printk and friends how
to print bitmaps.  It's almost identical to what PeterZ tried with
precision but it uses the field width for the number of bits instead
of precision.  The format used is '%*pb[l]', with the optional
trailing 'l' specifying list format instead of hex masks.

This is a valid format string and doesn't trigger compiler warnings;
however, it does make it impossible to specify output field width when
printing bitmaps.  I think this is an acceptable trade-off given how
much easier it makes printing bitmaps and that we don't have any
in-kernel user which is using the field width specification.  If any
future user wants to use field width with a bitmap, it'd have to
format the bitmap into a string buffer and then print that buffer with
width spec, which isn't different from how it should be done now.

This patch implements bitmap[_list]_string() which are called from the
vsprintf pointer() formatting function.  The implementation is mostly
identical to bitmap_scn[list]printf() except that the output is
performed in the vsprintf way.  These functions handle formatting into
too small buffers and sprintf() family of functions report the correct
overrun output length.

bitmap_scn[list]printf() are now thin wrappers around scnprintf().

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Mike Travis <travis@sgi.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:36 -08:00
Jan Kara
310ee9e8f3 lib/genalloc.c: check result of devres_alloc()
devm_gen_pool_create() calls devres_alloc() and dereferences its result
without checking whether devres_alloc() succeeded.  Check for error and
bail out if it happened.

Coverity-id 1016493.

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:36 -08:00
Rasmus Villemoes
8da53d4595 lib/string.c: improve strrchr()
Instead of potentially passing over the string twice in case c is not
found, just keep track of the last occurrence.  According to
bloat-o-meter, this also cuts the generated code by a third (54 vs 36
bytes).  Oh, and we get rid of those 7-space indented lines.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:36 -08:00
Daniel Borkmann
f5e38b9284 lib: crc32: constify crc32 lookup table
Commit 8f243af42a ("sections: fix const sections for crc32 table")
removed the compile-time generated crc32 tables from the RO sections,
because it conflicts with the definition of __cacheline_aligned which
puts all such aligned data into .data..cacheline_aligned section
optimized for wasting less space, and can cause alignment issues when
used in combination with const with some gcc versions like 4.7.0 due to
a gcc bug [1].

Given that most gcc versions should have the fix by now, we can just use
____cacheline_aligned, which only aligns the data but doesn't move it
into specific sections as opposed to __cacheline_aligned.  In case of
gcc versions having the mentioned bug, the alignment attribute will have
no effect, but the data will still be made RO.

After patch tables are in RO:

  $ nm -v lib/crc32.o | grep -1 -E "crc32c?table"
  0000000000000000 t arch_local_irq_enable
  0000000000000000 r crc32ctable_le
  0000000000000000 t crc32_exit
  --
  0000000000000960 t test_buf
  0000000000002000 r crc32table_be
  0000000000004000 r crc32table_le
  000000001d1056e5 A __crc_crc32_be

  [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52181

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Cc: Joe Mario <jmario@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
7f59065793 lib: bitmap: remove redundant code from __bitmap_shift_left
The first of these conditionals is completely redundant: If k == lim-1, we
must have off==0, so the second conditional will also trigger and then it
wouldn't matter if upper had some high bits set.  But the second
conditional is in fact also redundant, since it only serves to clear out
some high-order "don't care" bits of dst, about which no guarantee is
made.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
6d874eca65 lib: bitmap: eliminate branch in __bitmap_shift_left
We can shift the bits from lower and upper into place before assembling
dst[k + off]; moving the shift of lower into the branch where we already
know that rem is non-zero allows us to remove a conditional.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
dba94c2553 lib: bitmap: change bitmap_shift_left to take unsigned parameters
gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative.  Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.

If off >= lim (which requires shift >= nbits), k is initialized with a
large positive value, but since I've let k continue to be signed, the loop
will never run and dst will be zeroed as expected.  Inside the loop, k is
guaranteed to be non-negative, so the fact that it is promoted to unsigned
in the various expressions it appears in is harmless.

Also use "shift" and "nbits" consistently for the parameter names.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
cfac1d080a lib: bitmap: yet another simplification in __bitmap_shift_right
If left is 0, we can just let mask be ~0UL, so that anding with it is a
no-op.  Conveniently, BITMAP_LAST_WORD_MASK provides precisely what we
need, and we can eliminate left.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
97fb8e940b lib: bitmap: remove redundant code from __bitmap_shift_right
If the condition k==lim-1 is true, we must have off == 0 (otherwise, k
could never become that big).  But in that case we have upper == 0 and
hence dst[k] == (src[k] & mask) >> rem.  Since mask consists of a
consecutive range of bits starting from the LSB, anding dst[k] with mask
is a no-op.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
9d8a6b2a02 lib: bitmap: eliminate branch in __bitmap_shift_right
We can shift the bits from lower and upper into place before assembling
dst[k]; moving the shift of upper into the branch where we already know
that rem is non-zero allows us to remove a conditional.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
2fbad29917 lib: bitmap: change bitmap_shift_right to take unsigned parameters
I've previously changed the nbits parameter of most bitmap_* functions to
unsigned; now it is bitmap_shift_{left,right}'s turn.  This alone saves
some .text, but while at it I found that there were a few other things one
could do.  The end result of these seven patches is

  $ scripts/bloat-o-meter /tmp/bitmap.o.{old,new}
  add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-328 (-328)
  function                                     old     new   delta
  __bitmap_shift_right                         384     226    -158
  __bitmap_shift_left                          306     136    -170

and less importantly also a smaller stack footprint

  $ stack-o-meter.pl master bitmap
  file                 function                       old  new  delta
  lib/bitmap.o         __bitmap_shift_right             24    8  -16
  lib/bitmap.o         __bitmap_shift_left              24    0  -24

For each pair of 0 <= shift <= nbits <= 256 I've tested the end result
with a few randomly filled src buffers (including garbage beyond nbits),
in each case verifying that the shift {left,right}-most bits of dst are
zero and the remaining nbits-shift bits correspond to src, so I'm fairly
confident I didn't screw up.  That hasn't stopped me from being wrong
before, though.

This patch (of 7):

gcc can generate slightly better code for stuff like "nbits %
BITS_PER_LONG" when it knows nbits is not negative.  Since negative size
bitmaps or shift amounts don't make sense, change these parameters of
bitmap_shift_right to unsigned.

The expressions involving "lim - 1" are still ok, since if lim is 0 the
loop is never executed.

Also use "shift" and "nbits" consistently for the parameter names.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
e8f2427832 lib/bitmap.c: elide bitmap_copy_le on little-endian
On little-endian, there's no reason to have an extra, presumably less
efficient, way of copying a bitmap.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:35 -08:00
Rasmus Villemoes
9b6c2d2e2b lib/bitmap.c: change prototype of bitmap_copy_le
Make the prototype of bitmap_copy_le the same as bitmap_copy's.  All other
bitmap_* functions take unsigned long* parameters; there's no reason this
should be special.

The only current user is the static inline uwb_mas_bm_copy_le, which
already does the void* laundering, so the end users can pass their u8 or
__le32 buffers without a cast.

Furthermore, this allows us to simply let bitmap_copy_le be an alias for
bitmap_copy on little-endian; see next patch.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:34 -08:00
Linus Torvalds
818099574b Merge branch 'akpm' (patches from Andrew)
Merge third set of updates from Andrew Morton:

 - the rest of MM

   [ This includes getting rid of the numa hinting bits, in favor of
     just generic protnone logic.  Yay.     - Linus ]

 - core kernel

 - procfs

 - some of lib/ (lots of lib/ material this time)

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (104 commits)
  lib/lcm.c: replace include
  lib/percpu_ida.c: remove redundant includes
  lib/strncpy_from_user.c: replace module.h include
  lib/stmp_device.c: replace module.h include
  lib/sort.c: move include inside #if 0
  lib/show_mem.c: remove redundant include
  lib/radix-tree.c: change to simpler include
  lib/plist.c: remove redundant include
  lib/nlattr.c: remove redundant include
  lib/kobject_uevent.c: remove redundant include
  lib/llist.c: remove redundant include
  lib/md5.c: simplify include
  lib/list_sort.c: rearrange includes
  lib/genalloc.c: remove redundant include
  lib/idr.c: remove redundant include
  lib/halfmd4.c: simplify includes
  lib/dynamic_queue_limits.c: simplify includes
  lib/sort.c: use simpler includes
  lib/interval_tree.c: simplify includes
  hexdump: make it return number of bytes placed in buffer
  ...
2015-02-12 18:54:28 -08:00
Rasmus Villemoes
6016daed58 lib/lcm.c: replace include
We don't need all the stuff kernel.h pulls in; just compiler.h since
export.h doesn't do necessary #includes.  This removes more than 100
dependencies.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
6918584aad lib/percpu_ida.c: remove redundant includes
These three #includes seem to be completely redundant: Removing them
yields identical objdump -d output for each of {allyes,allno,def}config,
and neither included file end up in the generated dependency file through
some recursive include.  In total, about 50 lines are eliminated from
.percpu.o.cmd.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
bf3c2d6d2f lib/strncpy_from_user.c: replace module.h include
strncpy_from_user.c only needs EXPORT_SYMBOL, so just include compiler.h
and export.h instead of the whole module.h machinery.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
b6d4f3221d lib/stmp_device.c: replace module.h include
stmp_device.c only needs EXPORT_SYMBOL, so just include compiler.h and
export.h instead of the whole module.h machinery.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
2ddae683bf lib/sort.c: move include inside #if 0
The sort function and its helpers don't do memory allocation, so the
slab.h include is redundant.  Move it inside the #if 0 protecting the
self-test, similar to how it is done in lib/list_sort.c.  This removes
over 450 lines from the generated dependency file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
b8b6db1793 lib/show_mem.c: remove redundant include
show_mem.c doesn't use anything from nmi.h.  Removing it yields identical
objdump -d output for each of {allyes,allno,def}config and eliminates more
than 100 lines in the dependency file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
886d3dfa85 lib/radix-tree.c: change to simpler include
The comment helpfully explains why hardirq.h is included, but since
commit 2d4b84739f ("hardirq: Split preempt count mask definitions")
in_interrupt() has been provided by preempt_mask.h.  Use that instead,
saving around 40 lines in the generated dependency file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
7f1ce3c864 lib/plist.c: remove redundant include
Removing the include of linux/spinlock.h produces byte-identical output
for {allno,def}config, and identical objdump -d output for allyesconfig.
In the former two cases, more than a 100 lines are eliminated from the
generated dependency file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
fb41f9d71c lib/nlattr.c: remove redundant include
nlattr.c doesn't seem to rely on anything from netdevice.h.  Removing it
yields identical objdump -d output for each of {allyes,allno,def}config,
and eliminates more than 200 lines from the generated dependency file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:16 -08:00
Rasmus Villemoes
a69ae45c26 lib/kobject_uevent.c: remove redundant include
The file doesn't seem to use anything from linux/user_namespace.h, and
removing it yields byte-identical object code and strictly fewer
dependencies in the .cmd file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
9b40570bd9 lib/llist.c: remove redundant include
This file doesn't seem to use anything provided by linux/interrupt.h or
anything recursively included through that.  Removing it produces
byte-identical output, while reducing .llist.o.cmd from 541 to 156 lines.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
9a29ae84c1 lib/md5.c: simplify include
md5.c doesn't use anything from kernel.h, except that that pulls in
compiler.h, which is needed for the export.h to work.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
7259fa0424 lib/list_sort.c: rearrange includes
Memory allocation only happens in the self test, just as random numbers
are only used there.  So move the inclusion of slab.h inside the
CONFIG_TEST_LIST_SORT.

We don't need module.h and all of the stuff it carries with it, so replace
with export.h and compiler.h.  Unfortunately, the ARRAY_SIZE macro from
kernel.h requires the user to ensure bug.h is also included (for
BUILD_BUG_ON_ZERO, used by __must_be_array).  We used to get that through
some maze of nested includes, but just include it explicitly.

linux/string.h is then only included implicitly through
kernel.h->printk.h->dynamic_debug.h, but only if !CONFIG_DYNAMIC_DEBUG, so
just include it explicitly (for memset).

objdump -d says the generated code is the same, and wc -l says that
lib/.list_sort.o.cmd went from 579 to 165 lines.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
18fa6d2e45 lib/genalloc.c: remove redundant include
Removing this include produces byte-identical output, and thus removes a
false dependency.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
87d1d16937 lib/idr.c: remove redundant include
idr.c doesn't seem to use anything from hardirq.h (or anything included
from that).  Removing it produces identical objdump -d output, and gives
44 fewer lines in the .idr.o.cmd dependency file.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
3248340d3f lib/halfmd4.c: simplify includes
We only need EXPORT_SYMBOL, so compiler.h and export.h suffice.  This
means linux/types.h is no longer implicitly included, so add an include of
uapi/linux/types.h to linux/cryptohash.h for __u32.  Other users of
cryptohash.h cannot be affected, since they must already have been
including uapi/linux/types.h in order for gcc not to complain about
unknown types.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
565ac23b81 lib/dynamic_queue_limits.c: simplify includes
The file doesn't use anything from ctype.h.  Instead of module.h, just use
export.h for EXPORT_SYMBOL.  The latter requires the user to include
compiler.h, so do that explicitly instead of relying on some other header
pulling it in.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
42cf809654 lib/sort.c: use simpler includes
sort.c doesn't use facilities from kernel.h, but does use some types
defined in linux/types.h.  Include the latter directly instead of relying
on some other header doing it.  Similarly, include linux/export.h directly
instead of through module.h.  This removes 80 lines from the dependency
file .sort.o.cmd.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Rasmus Villemoes
85c5e27c4a lib/interval_tree.c: simplify includes
The file uses nothing from init.h, and also doesn't need the full module.h
machinery; export.h is sufficient.  The latter requires the user to ensure
compiler.h is included, so do that explicitly instead of relying on some
other header pulling it in.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Andy Shevchenko
114fc1afb2 hexdump: make it return number of bytes placed in buffer
This patch makes hexdump return the number of bytes placed in the buffer
excluding trailing NUL.  In the case of overflow it returns the desired
amount of bytes to produce the entire dump.  Thus, it mimics snprintf().

This will be useful for users that would like to repeat with a bigger
buffer.

[akpm@linux-foundation.org: fix printk warning]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Andy Shevchenko
5d909c8d54 hexdump: do a few calculations ahead
Instead of doing calculations in each case of different groupsize let's do
them beforehand.  While there, change the switch to an if-else-if
construction.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:15 -08:00
Andy Shevchenko
6f6f3fcb87 hexdump: fix ascii column for the tail of a dump
In the current implementation we have a floating ascii column in the tail
of the dump.

For example, for row size equal to 16 the ascii column as in following
table

group size \ length	8	12	16
	1		50	50	50
	2		22	32	42
	4		20	29	38
	8		19	-	36

This patch makes it the same independently of amount of bytes dumped.

The change is safe since all current users, which use ASCII part of the
dump, rely on the group size equal to 1.  The patch doesn't change
behaviour for such group size (see the table above).

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Andy Shevchenko
64d1d77a44 hexdump: introduce test suite
Test different scenarios of function calls located in lib/hexdump.c.

Currently hex_dump_to_buffer() is only tested and test data is provided
for little endian CPUs.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Toshi Kikuchi
ad3d5d2f7d lib/genalloc.c: fix the end addr check in addr_in_gen_pool()
Since chunk->end_addr is (chunk->start_addr + size - 1), the end address
to compare should be (start + size - 1).

Signed-off-by: Toshi Kikuchi <toshik@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Rasmus Villemoes
af3cd13501 lib/string.c: remove strnicmp()
Now that all in-tree users of strnicmp have been converted to
strncasecmp, the wrapper can be removed.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: David Howells <dhowells@redhat.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Rasmus Villemoes
9814ec135d lib/bitmap.c: make the bits parameter of bitmap_remap unsigned
Also, rename bits to nbits. Both changes for consistency with other
bitmap_* functions.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Rasmus Villemoes
f6a1f5db8d lib/bitmap.c: simplify bitmap_ord_to_pos
Make the return value and the ord and nbits parameters of
bitmap_ord_to_pos unsigned.

Also, simplify the implementation and as a side effect make the result
fully defined, returning nbits for ord >= weight, in analogy with what
find_{first,next}_bit does.  This is a better sentinel than the former
("unofficial") 0.  No current users are affected by this change.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Rasmus Villemoes
df1d80a9eb lib/bitmap.c: simplify bitmap_pos_to_ord
The ordinal of a set bit is simply the number of set bits before it;
counting those doesn't need to be done one bit at a time.  While at it,
update the parameters to unsigned int.

It is not completely unthinkable that gcc would see pos as compile-time
constant 0 in one of the uses of bitmap_pos_to_ord.  Since the static
inline frontend bitmap_weight doesn't handle nbits==0 correctly (it would
behave exactly as if nbits==BITS_PER_LONG), use __bitmap_weight.

Alternatively, the last line could be spelled bitmap_weight(buf, pos+1)-1,
but this is simpler.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Rasmus Villemoes
b26ad5836c lib/bitmap.c: change parameters of bitmap_fold to unsigned
Change the sz and nbits parameters of bitmap_fold to unsigned int for
consistency with other bitmap_* functions, and to save another few bytes
in the generated code.

[akpm@linux-foundation.org: fix kerneldoc]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Rasmus Villemoes
eb56988378 lib/bitmap.c: update bitmap_onto to unsigned
Change the nbits parameter of bitmap_onto to unsigned int for consistency
with other bitmap_* functions.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:14 -08:00
Rasmus Villemoes
d1214c65c0 libstring_helpers.c:string_get_size(): return void
string_get_size() was documented to return an error, but in fact always
returned 0.  Since the output always fits in 9 bytes, just document that
and let callers do what they do now: pass a small stack buffer and ignore
the return value.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Rasmus Villemoes
84b9fbedf5 lib/string_helpers.c:string_get_size(): use 32 bit arithmetic when possible
The remainder from do_div is always a u32, and after size has been reduced
to be below 1000 (or 1024), it certainly fits in u32.  So both remainder
and sf_cap can be made u32s, the format specifiers can be simplified (%lld
wasn't the right thing to use for _unsigned_ long long anyway), and we can
replace a do_div with an ordinary 32/32 bit division.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Rasmus Villemoes
7eed8fde02 lib/string_helpers.c:string_get_size(): remove redundant prefixes
While commit 3c9f3681d0 ("[SCSI] lib: add generic helper to print
sizes rounded to the correct SI range") says that Z and Y are included
in preparation for 128 bit computers, they just waste .text currently.
If and when we get u128, string_get_size needs updating anyway (and ISO
needs to come up with four more prefixes).

Also there's no need to include and test for the NULL sentinel; once we
reach "E" size is at most 18.  [The test is also wrong; it should be
units_str[units][i+1]; if we've reached NULL we're already doomed.]

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Rasmus Villemoes
43e5b666cf lib/vsprintf.c: replace while with do-while in skip_atoi
All callers of skip_atoi have already checked for the first character
being a digit.  In this case, gcc generates simpler code for a do
while-loop.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Rasmus Villemoes
2aa2f9e21e lib/vsprintf.c: improve sanity check in vsnprintf()
On 64 bit, size may very well be huge even if bit 31 happens to be 0.
Somehow it doesn't feel right that one can pass a 5 GiB buffer but not a
3 GiB one.  So cap at INT_MAX as was probably the intention all along.
This is also the made-up value passed by sprintf and vsprintf.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Rasmus Villemoes
ffbfed03b4 lib/vsprintf.c: consume 'p' in format_decode
It seems a little simpler to consume the p from a %p specifier in
format_decode, just as it is done for the surrounding %c, %s and %% cases.

While there, delete a redundant and misplaced comment.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:13 -08:00
Linus Torvalds
5d8e7fb691 md updates for 3.20
- assorted locking changes so that access to /proc/mdstat
    and much of /sys/block/mdXX/md/* is protected by a spinlock
    rather than a mutex and will never block indefinitely.
 
  - Make an 'if' condition in RAID5 - which has been implicated
    in recent bugs - more readable.
 
  - misc minor fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIVAwUAVNwaHTnsnt1WYoG5AQLssw//TniaqtlsK6RPd4PbWdyKFBfUx6bPurtx
 ZRQvw7FcX8sFW/QtxKyJZsOhKcCc1CQEByzZzvuR5SLE8scyw/SxC8eFqMY+vSWF
 HmqJtoK3LdxY/PPC2qRYcdzpU4DzYuIu3ChkJvdXJa4mULKJhutR8nbp1k52JXN8
 U2KPoE+hScVLKCuhwkvp6h4Fg/Caa9MjSpk3uMEkGDm75Iwl175EIb/fV95pIfqd
 cXs2wJBW3TA/YQecMeHC/YmVSIjF8nobCfhFlWfWYg0ySrSh1YSPLjb707Ohs5cF
 efEGoIWfm2uKd0XmsKEDclYDTNtwqgO7A3QDuWFP5ab9cTpi0Fuj1CE9LNBCf5Hf
 a5mwggBQ2KkpGwgtBuz6MJRaVu1xtOOjyFG8qzzeDOHLKxHRvZgiUnkb5U2I+YxB
 iMmyk7ijm9PF5M+wm3AqzFmG8icrAf6ixkSZidQy0PfAIFFFph+jQvdHqiXnLOGD
 6S/xpG0avHxdC5pC4R0iRsNMNl3IESqy6qLkTOYjQ+yoZ9A+LY19nsMf9LBnwgdY
 hz9WxV+EvFVggVl19U5Bg+3z1EwPt3kNr4Se1bkPI8reHH/adacNmHWBBLQ5KHle
 WYdqcNXiTwaLje/7Qw5TKQFSGgLMAPc8ciDA7QOX/ZFoyUmWFn89YAoDdGqvvDd9
 g5mQc/Amxt8=
 =/FmP
 -----END PGP SIGNATURE-----

Merge tag 'md/3.20' of git://neil.brown.name/md

Pull md updates from Neil Brown:

 - assorted locking changes so that access to /proc/mdstat
   and much of /sys/block/mdXX/md/* is protected by a spinlock
   rather than a mutex and will never block indefinitely.

 - Make an 'if' condition in RAID5 - which has been implicated
   in recent bugs - more readable.

 - misc minor fixes

* tag 'md/3.20' of git://neil.brown.name/md: (28 commits)
  md/raid10: fix conversion from RAID0 to RAID10
  md: wakeup thread upon rdev_dec_pending()
  md: make reconfig_mutex optional for writes to md sysfs files.
  md: move mddev_lock and related to md.h
  md: use mddev->lock to protect updates to resync_{min,max}.
  md: minor cleanup in safe_delay_store.
  md: move GET_BITMAP_FILE ioctl out from mddev_lock.
  md: tidy up set_bitmap_file
  md: remove unnecessary 'buf' from get_bitmap_file.
  md: remove mddev_lock from rdev_attr_show()
  md: remove mddev_lock() from md_attr_show()
  md/raid5: use ->lock to protect accessing raid5 sysfs attributes.
  md: remove need for mddev_lock() in md_seq_show()
  md/bitmap: protect clearing of ->bitmap by mddev->lock
  md: protect ->pers changes with mddev->lock
  md: level_store: group all important changes into one place.
  md: rename ->stop to ->free
  md: split detach operation out from ->stop.
  md/linear: remove rcu protections in favour of suspend/resume
  md: make merge_bvec_fn more robust in face of personality changes.
  ...
2015-02-12 11:05:49 -08:00
Linus Torvalds
42cf0f203e Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:

 - clang assembly fixes from Ard

 - optimisations and cleanups for Aurora L2 cache support

 - efficient L2 cache support for secure monitor API on Exynos SoCs

 - debug menu cleanup from Daniel Thompson to allow better behaviour for
   multiplatform kernels

 - StrongARM SA11x0 conversion to irq domains, and pxa_timer

 - kprobes updates for older ARM CPUs

 - move probes support out of arch/arm/kernel to arch/arm/probes

 - add inline asm support for the rbit (reverse bits) instruction

 - provide an ARM mode secondary CPU entry point (for Qualcomm CPUs)

 - remove the unused ARMv3 user access code

 - add driver_override support to AMBA Primecell bus

* 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: (55 commits)
  ARM: 8256/1: driver coamba: add device binding path 'driver_override'
  ARM: 8301/1: qcom: Use secondary_startup_arm()
  ARM: 8302/1: Add a secondary_startup that assumes ARM mode
  ARM: 8300/1: teach __asmeq that r11 == fp and r12 == ip
  ARM: kprobes: Fix compilation error caused by superfluous '*'
  ARM: 8297/1: cache-l2x0: optimize aurora range operations
  ARM: 8296/1: cache-l2x0: clean up aurora cache handling
  ARM: 8284/1: sa1100: clear RCSR_SMR on resume
  ARM: 8283/1: sa1100: collie: clear PWER register on machine init
  ARM: 8282/1: sa1100: use handle_domain_irq
  ARM: 8281/1: sa1100: move GPIO-related IRQ code to gpio driver
  ARM: 8280/1: sa1100: switch to irq_domain_add_simple()
  ARM: 8279/1: sa1100: merge both GPIO irqdomains
  ARM: 8278/1: sa1100: split irq handling for low GPIOs
  ARM: 8291/1: replace magic number with PAGE_SHIFT macro in fixup_pv code
  ARM: 8290/1: decompressor: fix a wrong comment
  ARM: 8286/1: mm: Fix dma_contiguous_reserve comment
  ARM: 8248/1: pm: remove outdated comment
  ARM: 8274/1: Fix DEBUG_LL for multi-platform kernels (without PL01X)
  ARM: 8273/1: Seperate DEBUG_UART_PHYS from DEBUG_LL on EP93XX
  ...
2015-02-12 08:51:56 -08:00
Linus Torvalds
8cc748aa76 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security layer updates from James Morris:
 "Highlights:

   - Smack adds secmark support for Netfilter
   - /proc/keys is now mandatory if CONFIG_KEYS=y
   - TPM gets its own device class
   - Added TPM 2.0 support
   - Smack file hook rework (all Smack users should review this!)"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (64 commits)
  cipso: don't use IPCB() to locate the CIPSO IP option
  SELinux: fix error code in policydb_init()
  selinux: add security in-core xattr support for pstore and debugfs
  selinux: quiet the filesystem labeling behavior message
  selinux: Remove unused function avc_sidcmp()
  ima: /proc/keys is now mandatory
  Smack: Repair netfilter dependency
  X.509: silence asn1 compiler debug output
  X.509: shut up about included cert for silent build
  KEYS: Make /proc/keys unconditional if CONFIG_KEYS=y
  MAINTAINERS: email update
  tpm/tpm_tis: Add missing ifdef CONFIG_ACPI for pnp_acpi_device
  smack: fix possible use after frees in task_security() callers
  smack: Add missing logging in bidirectional UDS connect check
  Smack: secmark support for netfilter
  Smack: Rework file hooks
  tpm: fix format string error in tpm-chip.c
  char/tpm/tpm_crb: fix build error
  smack: Fix a bidirectional UDS connect check typo
  smack: introduce a special case for tmpfs in smack_d_instantiate()
  ...
2015-02-11 20:25:11 -08:00
Linus Torvalds
b3d6524ff7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Martin Schwidefsky:

 - The remaining patches for the z13 machine support: kernel build
   option for z13, the cache synonym avoidance, SMT support,
   compare-and-delay for spinloops and the CES5S crypto adapater.

 - The ftrace support for function tracing with the gcc hotpatch option.
   This touches common code Makefiles, Steven is ok with the changes.

 - The hypfs file system gets an extension to access diagnose 0x0c data
   in user space for performance analysis for Linux running under z/VM.

 - The iucv hvc console gets wildcard spport for the user id filtering.

 - The cacheinfo code is converted to use the generic infrastructure.

 - Cleanup and bug fixes.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits)
  s390/process: free vx save area when releasing tasks
  s390/hypfs: Eliminate hypfs interval
  s390/hypfs: Add diagnose 0c support
  s390/cacheinfo: don't use smp_processor_id() in preemptible context
  s390/zcrypt: fixed domain scanning problem (again)
  s390/smp: increase maximum value of NR_CPUS to 512
  s390/jump label: use different nop instruction
  s390/jump label: add sanity checks
  s390/mm: correct missing space when reporting user process faults
  s390/dasd: cleanup profiling
  s390/dasd: add locking for global_profile access
  s390/ftrace: hotpatch support for function tracing
  ftrace: let notrace function attribute disable hotpatching if necessary
  ftrace: allow architectures to specify ftrace compile options
  s390: reintroduce diag 44 calls for cpu_relax()
  s390/zcrypt: Add support for new crypto express (CEX5S) adapter.
  s390/zcrypt: Number of supported ap domains is not retrievable.
  s390/spinlock: add compare-and-delay to lock wait loops
  s390/tape: remove redundant if statement
  s390/hvc_iucv: add simple wildcard matches to the iucv allow filter
  ...
2015-02-11 17:42:32 -08:00
Linus Torvalds
c5ce28df0e Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) More iov_iter conversion work from Al Viro.

    [ The "crypto: switch af_alg_make_sg() to iov_iter" commit was
      wrong, and this pull actually adds an extra commit on top of the
      branch I'm pulling to fix that up, so that the pre-merge state is
      ok.   - Linus ]

 2) Various optimizations to the ipv4 forwarding information base trie
    lookup implementation.  From Alexander Duyck.

 3) Remove sock_iocb altogether, from CHristoph Hellwig.

 4) Allow congestion control algorithm selection via routing metrics.
    From Daniel Borkmann.

 5) Make ipv4 uncached route list per-cpu, from Eric Dumazet.

 6) Handle rfs hash collisions more gracefully, also from Eric Dumazet.

 7) Add xmit_more support to r8169, e1000, and e1000e drivers.  From
    Florian Westphal.

 8) Transparent Ethernet Bridging support for GRO, from Jesse Gross.

 9) Add BPF packet actions to packet scheduler, from Jiri Pirko.

10) Add support for uniqu flow IDs to openvswitch, from Joe Stringer.

11) New NetCP ethernet driver, from Muralidharan Karicheri and Wingman
    Kwok.

12) More sanely handle out-of-window dupacks, which can result in
    serious ACK storms.  From Neal Cardwell.

13) Various rhashtable bug fixes and enhancements, from Herbert Xu,
    Patrick McHardy, and Thomas Graf.

14) Support xmit_more in be2net, from Sathya Perla.

15) Group Policy extensions for vxlan, from Thomas Graf.

16) Remove Checksum Offload support for vxlan, from Tom Herbert.

17) Like ipv4, support lockless transmit over ipv6 UDP sockets.  From
    Vlad Yasevich.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1494+1 commits)
  crypto: fix af_alg_make_sg() conversion to iov_iter
  ipv4: Namespecify TCP PMTU mechanism
  i40e: Fix for stats init function call in Rx setup
  tcp: don't include Fast Open option in SYN-ACK on pure SYN-data
  openvswitch: Only set TUNNEL_VXLAN_OPT if VXLAN-GBP metadata is set
  ipv6: Make __ipv6_select_ident static
  ipv6: Fix fragment id assignment on LE arches.
  bridge: Fix inability to add non-vlan fdb entry
  net: Mellanox: Delete unnecessary checks before the function call "vunmap"
  cxgb4: Add support in cxgb4 to get expansion rom version via ethtool
  ethtool: rename reserved1 memeber in ethtool_drvinfo for expansion ROM version
  net: dsa: Remove redundant phy_attach()
  IB/mlx4: Reset flow support for IB kernel ULPs
  IB/mlx4: Always use the correct port for mirrored multicast attachments
  net/bonding: Fix potential bad memory access during bonding events
  tipc: remove tipc_snprintf
  tipc: nl compat add noop and remove legacy nl framework
  tipc: convert legacy nl stats show to nl compat
  tipc: convert legacy nl net id get to nl compat
  tipc: convert legacy nl net id set to nl compat
  ...
2015-02-10 20:01:30 -08:00
Linus Torvalds
29afc4e9a4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree changes from Jiri Kosina:
 "Patches from trivial.git that keep the world turning around.

  Mostly documentation and comment fixes, and a two corner-case code
  fixes from Alan Cox"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
  kexec, Kconfig: spell "architecture" properly
  mm: fix cleancache debugfs directory path
  blackfin: mach-common: ints-priority: remove unused function
  doubletalk: probe failure causes OOPS
  ARM: cache-l2x0.c: Make it clear that cache-l2x0 handles L310 cache controller
  msdos_fs.h: fix 'fields' in comment
  scsi: aic7xxx: fix comment
  ARM: l2c: fix comment
  ibmraid: fix writeable attribute with no store method
  dynamic_debug: fix comment
  doc: usbmon: fix spelling s/unpriviledged/unprivileged/
  x86: init_mem_mapping(): use capital BIOS in comment
2015-02-10 18:57:15 -08:00
Linus Torvalds
23e8fe2e16 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull RCU updates from Ingo Molnar:
 "The main RCU changes in this cycle are:

   - Documentation updates.

   - Miscellaneous fixes.

   - Preemptible-RCU fixes, including fixing an old bug in the
     interaction of RCU priority boosting and CPU hotplug.

   - SRCU updates.

   - RCU CPU stall-warning updates.

   - RCU torture-test updates"

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
  rcu: Initialize tiny RCU stall-warning timeouts at boot
  rcu: Fix RCU CPU stall detection in tiny implementation
  rcu: Add GP-kthread-starvation checks to CPU stall warnings
  rcu: Make cond_resched_rcu_qs() apply to normal RCU flavors
  rcu: Optionally run grace-period kthreads at real-time priority
  ksoftirqd: Use new cond_resched_rcu_qs() function
  ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
  rcutorture: Add more diagnostics in rcu_barrier() test failure case
  torture: Flag console.log file to prevent holdovers from earlier runs
  torture: Add "-enable-kvm -soundhw pcspk" to qemu command line
  rcutorture: Handle different mpstat versions
  rcutorture: Check from beginning to end of grace period
  rcu: Remove redundant rcu_batches_completed() declaration
  rcutorture: Drop rcu_torture_completed() and friends
  rcu: Provide rcu_batches_completed_sched() for TINY_RCU
  rcutorture: Use unsigned for Reader Batch computations
  rcutorture: Make build-output parsing correctly flag RCU's warnings
  rcu: Make _batches_completed() functions return unsigned long
  rcutorture: Issue warnings on close calls due to Reader Batch blows
  documentation: Fix smp typo in memory-barriers.txt
  ...
2015-02-09 14:28:42 -08:00
Stephen Rothwell
61d7b09773 rhashtable: using ERR_PTR requires linux/err.h
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 21:52:24 -08:00
Thomas Graf
020219a69d rhashtable: Fix remove logic to avoid cross references between buckets
The remove logic properly searched the remaining chain for a matching
entry with an identical hash but it did this while searching from both
the old and new table. Instead in order to not leave stale references
behind we need to:

 1. When growing and searching from the new table:
    Search remaining chain for entry with same hash to avoid having
    the new table directly point to a entry with a different hash.

 2. When shrinking and searching from the old table:
    Check if the element after the removed would create a cross
    reference and avoid it if so.

These bugs were present from the beginning in nft_hash.

Also, both insert functions calculated the hash based on the mask of
the new table. This worked while growing. Wwhile shrinking, the mask
of the inew table is smaller than the mask of the old table. This lead
to a bit not being taken into account when selecting the bucket lock
and thus caused the wrong bucket to be locked eventually.

Fixes: 7e1e77636e ("lib: Resizable, Scalable, Concurrent Hash Table")
Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:19:17 -08:00
Thomas Graf
cf52d52f9c rhashtable: Avoid bucket cross reference after removal
During a resize, when two buckets in the larger table map to
a single bucket in the smaller table and the new table has already
been (partially) linked to the old table. Removal of an element
may result the bucket in the larger table to point to entries
which all hash to a different value than the bucket index. Thus
causing two buckets to point to the same sub chain after unzipping.
This is not illegal *during* the resize phase but after it has
completed.

Keep the old table around until all of the unzipping is done to
allow the removal code to only search for matching hashed entries
during this special period.

Reported-by: Ying Xue <ying.xue@windriver.com>
Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:35 -08:00
Thomas Graf
7cd10db8de rhashtable: Add more lock verification
Catch hash miscalculations which result in hard to track down race
conditions.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
a03eaec0df rhashtable: Dump bucket tables on locking violation under PROVE_LOCKING
This simplifies debugging of locking violations if compiled with
CONFIG_PROVE_LOCKING.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
2af4b52988 rhashtable: Wait for RCU readers after final unzip work
We need to wait for all RCU readers to complete after the last bit of
unzipping has been completed. Otherwise the old table is freed up
prematurely.

Fixes: 7e1e77636e ("lib: Resizable, Scalable, Concurrent Hash Table")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
a5ec68e3b8 rhashtable: Use a single bucket lock for sibling buckets
rhashtable currently allows to use a bucket lock per bucket. This
requires multiple levels of complicated nested locking because when
resizing, a single bucket of the smaller table will map to two
buckets in the larger table. So far rhashtable has explicitly locked
both buckets in the larger table.

By excluding the highest bit of the hash from the bucket lock map and
thus only allowing locks to buckets in a ratio of 1:2, the locking
can be simplified a lot without losing the benefits of multiple locks.
Larger tables which benefit from multiple locks will not have a single
lock per bucket anyway.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
Thomas Graf
c88455ce50 rhashtable: key_hashfn() must return full hash value
The value computed by key_hashfn() is used by rhashtable_lookup_compare()
to traverse both tables during a resize. key_hashfn() must therefore
return the hash value without the buckets mask applied so it can be
masked to the size of each individual table.

Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-06 15:18:34 -08:00
David S. Miller
6e03f896b5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/vxlan.c
	drivers/vhost/net.c
	include/linux/if_vlan.h
	net/core/dev.c

The net/core/dev.c conflict was the overlap of one commit marking an
existing function static whilst another was adding a new function.

In the include/linux/if_vlan.h case, the type used for a local
variable was changed in 'net', whereas the function got rewritten
to fix a stacked vlan bug in 'net-next'.

In drivers/vhost/net.c, Al Viro's iov_iter conversions in 'net-next'
overlapped with an endainness fix for VHOST 1.0 in 'net'.

In drivers/net/vxlan.c, vxlan_find_vni() added a 'flags' parameter
in 'net-next' whereas in 'net' there was a bug fix to pass in the
correct network namespace pointer in calls to this function.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-05 14:33:28 -08:00
David S. Miller
f2683b743f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More iov_iter work from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:46:55 -08:00
Herbert Xu
f2dba9c6ff rhashtable: Introduce rhashtable_walk_*
Some existing rhashtable users get too intimate with it by walking
the buckets directly.  This prevents us from easily changing the
internals of rhashtable.

This patch adds the helpers rhashtable_walk_init/exit/start/next/stop
which will replace these custom walkers.

They are meant to be usable for both procfs seq_file walks as well
as walking by a netlink dump.  The iterator structure should fit
inside a netlink dump cb structure, with at least one element to
spare.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:34:52 -08:00
Herbert Xu
28134a53d6 rhashtable: Fix potential crash on destroy in rhashtable_shrink
The current being_destroyed check in rhashtable_expand is not
enough since if we start a shrinking process after freeing all
elements in the table that's also going to crash.

This patch adds a being_destroyed check to the deferred worker
thread so that we bail out as soon as we take the lock.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-04 20:34:52 -08:00
Al Viro
57dd8a0735 vhost: vhost_scsi_handle_vq() should just use copy_from_user()
it has just verified that it asks no more than the length of the
first segment of iovec.

And with that the last user of stuff in lib/iovec.c is gone.
RIP.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:16 -05:00
Al Viro
ba7438aed9 vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:16 -05:00
Al Viro
aad9a1cec7 vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-04 01:34:15 -05:00
Jan Beulich
75aaf4c3e6 x86/raid6: correctly check for assembler capabilities
Just like for AVX2 (which simply needs an #if -> #ifdef conversion),
SSSE3 assembler support should be checked for before using it.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NeilBrown <neilb@suse.de>
2015-02-04 08:35:51 +11:00
Geert Uytterhoeven
9d6dbe1bba rhashtable: Make selftest modular
Allow the selftest on the resizable hash table to be built modular, just
like all other tests that do not depend on DEBUG_KERNEL.

Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-30 18:06:33 -08:00
karl beldan
9ce357795e lib/checksum.c: fix build for generic csum_tcpudp_nofold
Fixed commit added from64to32 under _#ifndef do_csum_ but used it
under _#ifndef csum_tcpudp_nofold_, breaking some builds (Fengguang's
robot reported TILEGX's). Move from64to32 under the latter.

Fixes: 150ae0e946 ("lib/checksum.c: fix carry in csum_tcpudp_nofold")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-29 11:57:38 -08:00
Heiko Carstens
c0a80c0c27 ftrace: allow architectures to specify ftrace compile options
If the kernel is compiled with function tracer support the -pg compile option
is passed to gcc to generate extra code into the prologue of each function.

This patch replaces the "open-coded" -pg compile flag with a CC_FLAGS_FTRACE
makefile variable which architectures can override if a different option
should be used for code generation.

Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2015-01-29 09:19:19 +01:00
karl beldan
150ae0e946 lib/checksum.c: fix carry in csum_tcpudp_nofold
The carry from the 64->32bits folding was dropped, e.g with:
saddr=0xFFFFFFFF daddr=0xFF0000FF len=0xFFFF proto=0 sum=1,
csum_tcpudp_nofold returned 0 instead of 1.

Signed-off-by: Karl Beldan <karl.beldan@rivierawaves.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-28 22:32:33 -08:00
Thomas Graf
fe6a043c53 rhashtable: rhashtable_remove() must unlink in both tbl and future_tbl
As removals can occur during resizes, entries may be referred to from
both tbl and future_tbl when the removal is requested. Therefore
rhashtable_remove() must unlink the entry in both tables if this is
the case. The existing code did search both tables but stopped when it
hit the first match.

Failing to unlink in both tables resulted in use after free.

Fixes: 97defe1ecf ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-26 11:56:34 -08:00
Borislav Petkov
edb0ec0725 kexec, Kconfig: spell "architecture" properly
Grepping for "archicture" showed it actually twice! Most unusual
spelling error, very interesting. :)

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-01-26 14:36:46 +01:00
Linus Torvalds
360f54796e dcache: let the dentry count go down to zero without taking d_lock
We can be more aggressive about this, if we are clever and careful. This is subtle.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:29 -05:00
Michael S. Tsirkin
eb29d8d2aa pci: add pci_iomap_range
Virtio drivers should map the part of the BAR they need, not necessarily
all of it.

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2015-01-21 16:28:49 +10:30
Ingo Molnar
f49028292c Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu
Pull RCU updates from Paul E. McKenney:

  - Documentation updates.

  - Miscellaneous fixes.

  - Preemptible-RCU fixes, including fixing an old bug in the
    interaction of RCU priority boosting and CPU hotplug.

  - SRCU updates.

  - RCU CPU stall-warning updates.

  - RCU torture-test updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-21 06:12:21 +01:00
Paul E. McKenney
78e691f4ae Merge branches 'doc.2015.01.07a', 'fixes.2015.01.15a', 'preempt.2015.01.06a', 'srcu.2015.01.06a', 'stall.2015.01.16a' and 'torture.2015.01.11a' into HEAD
doc.2015.01.07a: Documentation updates.
fixes.2015.01.15a: Miscellaneous fixes.
preempt.2015.01.06a: Changes to handling of lists of preempted tasks.
srcu.2015.01.06a: SRCU updates.
stall.2015.01.16a: RCU CPU stall-warning updates and fixes.
torture.2015.01.11a: RCU torture-test updates and fixes.
2015-01-15 23:34:34 -08:00
Ying Xue
57699a40b4 rhashtable: Fix race in rhashtable_destroy() and use regular work_struct
When we put our declared work task in the global workqueue with
schedule_delayed_work(), its delay parameter is always zero.
Therefore, we should define a regular work in rhashtable structure
instead of a delayed work.

By the way, we add a condition to check whether resizing functions
are NULL before cancelling the work, avoiding to cancel an
uninitialized work.

Lastly, while we wait for all work items we submitted before to run
to completion with cancel_delayed_work(), ht->mutex has been taken in
rhashtable_destroy(). Moreover, cancel_delayed_work() doesn't return
until all work items are accomplished, and when work items are
scheduled, the work's function - rht_deferred_worker() will be called.
However, as rht_deferred_worker() also needs to acquire the lock,
deadlock might happen at the moment as the lock is already held before.
So if the cancel work function is moved out of the lock covered scope,
this will avoid the deadlock.

Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-16 01:18:51 -05:00
David S. Miller
3f3558bb51 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/xen-netfront.c

Minor overlapping changes in xen-netfront.c, mostly to do
with some buffer management changes alongside the split
of stats into TX and RX.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-15 00:53:17 -05:00
James Morris
bb31f607a0 Merge tag 'keys-next-fixes-20150114' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs into next 2015-01-15 11:30:54 +11:00
Rasmus Villemoes
b9f918a31d MPILIB: Fix comparison of negative MPIs
If u and v both represent negative integers and their limb counts
happen to differ, mpi_cmp will always return a positive value - this
is obviously bogus. u is smaller than v if and only if it is larger in
absolute value.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
2015-01-14 16:10:12 +00:00
Rasmus Villemoes
98dbbcba1b MPILIB: Fix obvious but harmless typo
The macro MPN_COPY_INCR this occurs in isn't used anywhere.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David Howells <dhowells@redhat.com>
2015-01-14 15:16:00 +00:00
Rasmus Villemoes
7fe21291ba MPILIB: Deobfuscate mpi_cmp
The condition preceding 'return 1;' makes my head hurt. At this point,
we know that u and v have the same sign; if they are negative, they
compare opposite to how their absolute values compare (which
mpihelp_cmp found for us), otherwise cmp itself is the
answer. Negating cmp is ok since mpihelp_cmp returns {-1,0,1};
-INT_MIN==INT_MIN won't bite us.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
2015-01-14 15:15:57 +00:00
Thomas Graf
80ca8c3a84 rhashtable: Lower/upper bucket may map to same lock while shrinking
Each per bucket lock covers a configurable number of buckets. While
shrinking, two buckets in the old table contain entries for a single
bucket in the new table. We need to lock down both while linking.
Check if they are protected by different locks to avoid a recursive
lock.

Fixes: 97defe1e ("rhashtable: Per bucket locks & deferred expansion/shrinking")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-14 00:21:44 -05:00
Ying Xue
7a868d1e9a rhashtable: involve rhashtable_lookup_compare_insert routine
Introduce a new function called rhashtable_lookup_compare_insert()
which is very similar to rhashtable_lookup_insert(). But the former
makes use of users' given compare function to look for an object,
and then inserts it into hash table if found. As the entire process
of search and insertion is under protection of per bucket lock, this
can help users to avoid the involvement of extra lock.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-13 14:01:00 -05:00
Linus Torvalds
aa9291355e KGDB/KDB fixes and cleanups
Cleanups
    kdb: Remove unused command flags, repeat flags and KDB_REPEAT_NONE
 
  Fixes
    kgdb/kdb: Allow access on a single core, if a CPU round up is deemed
       impossible, which will allow inspection of the now "trashed" kernel
    kdb: Add enable mask for the command groups
    kdb: access controls to restrict sensitive commands
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUrq8WAAoJEIciOldedpOj+C8P/AjSUVBZdBLWzCU2VG150sQ0
 UacwFVLve9heoColHBF7VqIDCRkZokIKJmCbHUBPZTbs22auLRpNI+D6CY5lZD17
 jEHxrkKY4ragRRc/W3Y1MSc3aeGnS0i5AR8PJermMWxyUBfN3FBxgFHzTaLB2ZTT
 8A+tvmwiG4mHue52gSiYZPCl/52WWOh+NjDe7T9OZ+mNmQKwZ5ssQZmmyUkxrs3b
 LKXVXVtTUXxfEgB2x+lYTYAztcTsM5h+NbkT74FpSmwPjvU/p81Ptqveh+3JTdmX
 H+Jz/SqD1/NfxC1Eenh5Mc++p/UVxeRbBulV9jwqjOyJqDjw3qHs1cjm8tZZj1qG
 J3LODKi3GWhujMCfwdu5EJRnrFxgHCPiWInc2708oLbRi5SyOe6P6hNQ3K3Y4JtF
 VkYa62wSaI0fDNQUFRc3bXUOUdMOCXjuzw3BtTi93tcUNcQwCXuYCmWtVvBgmK1h
 LTrFCJmzbopiwpomxCwZ4BQm8id9HxP5pod95ypYb8K5aheXHCuSgibqj0nswWMm
 ix0YTd4UNTn79r6p4d0fXFjOOYpXZA80ojeVI27D9zW7dBYc5CGVA1IDNH0ZfiPo
 qySPUNUMXIjiTSOGZdUehByEC7tliLZczelRPnNh/9fmhJkJ745S7zs3DNQ7Ypg4
 xDKthlRGNjn6cXOPl7gX
 =cf1c
 -----END PGP SIGNATURE-----

Merge tag 'for_linus-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb

Pull kgdb/kdb fixes from Jason Wessel:
 "These have been around since 3.17 and in kgdb-next for the last 9
  weeks and some will go back to -stable.

  Summary of changes:

  Cleanups
   - kdb: Remove unused command flags, repeat flags and KDB_REPEAT_NONE

  Fixes
   - kgdb/kdb: Allow access on a single core, if a CPU round up is
     deemed impossible, which will allow inspection of the now "trashed"
     kernel
   - kdb: Add enable mask for the command groups
   - kdb: access controls to restrict sensitive commands"

* tag 'for_linus-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb:
  kernel/debug/debug_core.c: Logging clean-up
  kgdb: timeout if secondary CPUs ignore the roundup
  kdb: Allow access to sensitive commands to be restricted by default
  kdb: Add enable mask for groups of commands
  kdb: Categorize kdb commands (similar to SysRq categorization)
  kdb: Remove KDB_REPEAT_NONE flag
  kdb: Use KDB_REPEAT_* values as flags
  kdb: Rename kdb_register_repeat() to kdb_register_flags()
  kdb: Rename kdb_repeat_t to kdb_cmdflags_t, cmd_repeat to cmd_flags
  kdb: Remove currently unused kdbtab_t->cmd_flags
2015-01-09 20:51:10 -08:00
Ying Xue
545a148e43 rhashtable: initialize atomic nelems variable
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-01-08 19:47:13 -08:00