Commit Graph

495023 Commits

Author SHA1 Message Date
Linus Torvalds
03c751a5e1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "None of these are huge, but my commit does fix a regression from 3.18
  that could cause lost files during log replay.

  This also adds Dave Sterba to the list of Btrfs maintainers.  It
  doesn't mean we're doing things differently, but Dave has really been
  helping with the maintainer workload for years"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: don't delay inode ref updates during log replay
  Btrfs: correctly get tree level in tree_backref_for_extent
  Btrfs: call inode_dec_link_count() on mkdir error path
  Btrfs: abort transaction if we don't find the block group
  Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
  Btrfs: add more maintainers
2015-01-09 17:46:07 -08:00
Linus Torvalds
b3d574aec7 Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "12 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
  memcg: fix destination cgroup leak on task charges migration
  mm: memcontrol: switch soft limit default back to infinity
  mm/debug_pagealloc: remove obsolete Kconfig options
  vfs: renumber FMODE_NONOTIFY and add to uniqueness check
  arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.h
  ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file
  MAINTAINERS: update rydberg's addresses
  mm: protect set_page_dirty() from ongoing truncation
  mm: prevent endless growth of anon_vma hierarchy
  exit: fix race between wait_consider_task() and wait_task_zombie()
  ocfs2: remove bogus check in dlm_process_recovery_data
2015-01-09 15:10:59 -08:00
Victor Kamensky
1e3479225a ARM: 8275/1: mm: fix PMD_SECT_RDONLY undeclared compile error
In v3.19-rc3 tree when CONFIG_ARM_LPAE and CONFIG_DEBUG_RODATA are enabled
image failed to compile with the following error:

arch/arm/mm/init.c:661:14: error: ‘PMD_SECT_RDONLY’ undeclared here (not in a function)

It seems that '80d6b0c ARM: mm: allow text and rodata sections to be read-only'
and 'ded9477 ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE'
commits crossed. 80d6b0c uses PMD_SECT_RDONLY macro but ded9477 renames it
and uses software bits L_PMD_SECT_RDONLY instead.

Fix is to use L_PMD_SECT_RDONLY instead PMD_SECT_RDONLY as ded9477 does in
another places.

Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-01-09 20:44:12 +00:00
Dan Carpenter
606185b20c HID: roccat: potential out of bounds in pyra_sysfs_write_settings()
This is a static checker fix.  We write some binary settings to the
sysfs file.  One of the settings is the "->startup_profile".  There
isn't any checking to make sure it fits into the
pyra->profile_settings[] array in the profile_activated() function.

I added a check to pyra_sysfs_write_settings() in both places because
I wasn't positive that the other callers were correct.

Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2015-01-09 14:41:01 +01:00
Chris Wilson
a63b03e2d2 mutex: Always clear owner field upon mutex_unlock()
Currently if DEBUG_MUTEXES is enabled, the mutex->owner field is only
cleared iff debug_locks is active. This exposes a race to other users of
the field where the mutex->owner may be still set to a stale value,
potentially upsetting mutex_spin_on_owner() among others.

References: https://bugs.freedesktop.org/show_bug.cgi?id=87955
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1420540175-30204-1-git-send-email-chris@chris-wilson.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:20:39 +01:00
Tetsuo Handa
7f1a169b88 sched/fair: Fix RCU stall upon -ENOMEM in sched_create_group()
When alloc_fair_sched_group() in sched_create_group() fails,
free_sched_group() is called, and free_fair_sched_group() is called by
free_sched_group(). Since destroy_cfs_bandwidth() is called by
free_fair_sched_group() without calling init_cfs_bandwidth(),
RCU stall occurs at hrtimer_cancel():

  INFO: rcu_sched self-detected stall on CPU { 1}  (t=60000 jiffies g=13074 c=13073 q=0)
  Task dump for CPU 1:
  (fprintd)       R  running task        0  6249      1 0x00000088
  ...
  Call Trace:
   <IRQ>  [<ffffffff81094988>] sched_show_task+0xa8/0x110
   [<ffffffff81097acd>] dump_cpu_task+0x3d/0x50
   [<ffffffff810c3a80>] rcu_dump_cpu_stacks+0x90/0xd0
   [<ffffffff810c7751>] rcu_check_callbacks+0x491/0x700
   [<ffffffff810cbf2b>] update_process_times+0x4b/0x80
   [<ffffffff810db046>] tick_sched_handle.isra.20+0x36/0x50
   [<ffffffff810db0a2>] tick_sched_timer+0x42/0x70
   [<ffffffff810ccb19>] __run_hrtimer+0x69/0x1a0
   [<ffffffff810db060>] ? tick_sched_handle.isra.20+0x50/0x50
   [<ffffffff810ccedf>] hrtimer_interrupt+0xef/0x230
   [<ffffffff810452cb>] local_apic_timer_interrupt+0x3b/0x70
   [<ffffffff8164a465>] smp_apic_timer_interrupt+0x45/0x60
   [<ffffffff816485bd>] apic_timer_interrupt+0x6d/0x80
   <EOI>  [<ffffffff810cc588>] ? lock_hrtimer_base.isra.23+0x18/0x50
   [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
   [<ffffffff810cc9d2>] hrtimer_try_to_cancel+0x22/0xd0
   [<ffffffff81193cf1>] ? __kmalloc+0x211/0x230
   [<ffffffff810ccaa2>] hrtimer_cancel+0x22/0x30
   [<ffffffff810a3cb5>] free_fair_sched_group+0x25/0xd0
   [<ffffffff8108df46>] free_sched_group+0x16/0x40
   [<ffffffff810971bb>] sched_create_group+0x4b/0x80
   [<ffffffff810aa383>] sched_autogroup_create_attach+0x43/0x1c0
   [<ffffffff8107dc9c>] sys_setsid+0x7c/0x110
   [<ffffffff81647729>] system_call_fastpath+0x12/0x17

Check whether init_cfs_bandwidth() was called before calling
destroy_cfs_bandwidth().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
[ Move the check into destroy_cfs_bandwidth() to aid compilability. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul Turner <pjt@google.com>
Cc: Ben Segall <bsegall@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/201412252210.GCC30204.SOMVFFOtQJFLOH@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:19:00 +01:00
Luca Abeni
269ad8015a sched/deadline: Avoid double-accounting in case of missed deadlines
The dl_runtime_exceeded() function is supposed to ckeck if
a SCHED_DEADLINE task must be throttled, by checking if its
current runtime is <= 0. However, it also checks if the
scheduling deadline has been missed (the current time is
larger than the current scheduling deadline), further
decreasing the runtime if this happens.
This "double accounting" is wrong:

- In case of partitioned scheduling (or single CPU), this
  happens if task_tick_dl() has been called later than expected
  (due to small HZ values). In this case, the current runtime is
  also negative, and replenish_dl_entity() can take care of the
  deadline miss by recharging the current runtime to a value smaller
  than dl_runtime

- In case of global scheduling on multiple CPUs, scheduling
  deadlines can be missed even if the task did not consume more
  runtime than expected, hence penalizing the task is wrong

This patch fix this problem by throttling a SCHED_DEADLINE task
only when its runtime becomes negative, and not modifying the runtime

Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-3-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:18:57 +01:00
Luca Abeni
6a503c3be9 sched/deadline: Fix migration of SCHED_DEADLINE tasks
According to global EDF, tasks should be migrated between runqueues
without checking if their scheduling deadlines and runtimes are valid.
However, SCHED_DEADLINE currently performs such a check:
a migration happens doing:

	deactivate_task(rq, next_task, 0);
	set_task_cpu(next_task, later_rq->cpu);
	activate_task(later_rq, next_task, 0);

which ends up calling dequeue_task_dl(), setting the new CPU, and then
calling enqueue_task_dl().

enqueue_task_dl() then calls enqueue_dl_entity(), which calls
update_dl_entity(), which can modify scheduling deadline and runtime,
breaking global EDF scheduling.

As a result, some of the properties of global EDF are not respected:
for example, a taskset {(30, 80), (40, 80), (120, 170)} scheduled on
two cores can have unbounded response times for the third task even
if 30/80+40/80+120/170 = 1.5809 < 2

This can be fixed by invoking update_dl_entity() only in case of
wakeup, or if this is a new SCHED_DEADLINE task.

Signed-off-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1418813432-20797-2-git-send-email-luca.abeni@unitn.it
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:18:56 +01:00
Yuyang Du
32a8df4e0b sched: Fix odd values in effective_load() calculations
In effective_load, we have (long w * unsigned long tg->shares) / long W,
when w is negative, it is cast to unsigned long and hence the product is
insanely large. Fix this by casting tg->shares to long.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Yuyang Du <yuyang.du@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Andrey Ryabinin <a.ryabinin@samsung.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20141219002956.GA25405@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:18:54 +01:00
Peter Zijlstra
536ebe9ca9 sched, fanotify: Deal with nested sleeps
As per e23738a730 ("sched, inotify: Deal with nested sleeps").

fanotify_read is a wait loop with sleeps in. Wait loops rely on
task_struct::state and sleeps do too, since that's the only means of
actually sleeping. Therefore the nested sleeps destroy the wait loop
state and the wait loop breaks the sleep functions that assume
TASK_RUNNING (mutex_lock).

Fix this by using the new woken_wake_function and wait_woken() stuff,
which registers wakeups in wait and thereby allows shrinking the
task_state::state changes to the actual sleep part.

Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Eric Paris <eparis@redhat.com>
Link: http://lkml.kernel.org/r/20141216152838.GZ3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:18:12 +01:00
Andi Kleen
5306c31c57 perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes
There was another report of a boot failure with a #GP fault in the
uncore SBOX initialization. The earlier work around was not enough
for this system.

The boot was failing while trying to initialize the third SBOX.

This patch detects parts with only two SBOXes and limits the number
of SBOX units to two there.

Stable material, as it affects boot problems on 3.18.

Tested-by: Andreas Oehler <andreas@oehler-net.de>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Link: http://lkml.kernel.org/r/1420583675-9163-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:30 +01:00
Andy Lutomirski
86c269fea3 perf/x86_64: Improve user regs sampling
Perf reports user regs for kernel-mode samples so that samples can
be backtraced through user code.  The old code was very broken in
syscall context, resulting in useless backtraces.

The new code, in contrast, is still dangerously racy, but it should
at least work most of the time.

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: chenggang.qcg@taobao.com
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/243560c26ff0f739978e2459e203f6515367634d.1420396372.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:29 +01:00
Andy Lutomirski
88a7c26af8 perf: Move task_pt_regs sampling into arch code
On x86_64, at least, task_pt_regs may be only partially initialized
in many contexts, so x86_64 should not use it without extra care
from interrupt context, let alone NMI context.

This will allow x86_64 to override the logic and will supply some
scratch space to use to make a cleaner copy of user regs.

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: chenggang.qcg@taobao.com
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Jean Pihet <jean.pihet@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mark Salter <msalter@redhat.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Link: http://lkml.kernel.org/r/e431cd4c18c2e1c44c774f10758527fb2d1025c4.1420396372.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:28 +01:00
Peter Zijlstra
0f363b250b x86: Fix off-by-one in instruction decoder
Stephane reported that the PEBS fixup was broken by the recent commit to
the instruction decoder. The thing had an off-by-one which resulted in
not being able to decode the last instruction and always bail.

Reported-by: Stephane Eranian <eranian@google.com>
Fixes: 6ba48ff46f ("x86: Remove arbitrary instruction size limit in instruction decoder")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org # 3.18
Cc: <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Liang Kan <kan.liang@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/20141216104614.GV3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:12:26 +01:00
Ingo Molnar
300176af03 perf/urgent fixes:
. Free callchains when hist entries are deleted, plugging a massive leak in
   'top -g', where hist_entries (and its callchains) are decayed over time. (Namhyung Kim)
 
 . Fix segfault when showing callchain in the hists browser (report & top) (Namhyung Kim)
 
 . Fix children sort key behavior, and also the 'perf test 32' test that
   was failing due to reliance on undefined behaviour (Namhyung Kim)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUrqORAAoJEBpxZoYYoA71ztUIAKt5ElyjO4yVdUctT+/ER9Ku
 Wl0KDEtpbRswtZJmNa88AteBx+ZUwgcDp0A3kKFArxAxnPZ5C0GjeZrd1MR9cY4j
 0hx311ZR2UWi9M51rVBpmy1Cc5HoroNJY6zA/j3o9baeeDfWLLcavo1O5nl7II4n
 SekmyO+zqhNj+kN26OO2tMwzBstGYUJYSlGLKXZ1KCNWYi9qUlvQ5tmb7tAD6/mH
 Tu0ZpeI4QbhH3rb33JJYx4xLap+zYsb67/yzAeSw7wiLeJq3NhWVDHGaLbAUR1hF
 FgRnBV+cxuTEAvehdhwqdd4Gw0CpEdFxENKlZaZIOOPQj+oqcVSXlkxpj00z9ko=
 =aWvF
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

 - Free callchains when hist entries are deleted, plugging a massive leak in
   'top -g', where hist_entries (and its callchains) are decayed over time. (Namhyung Kim)

 - Fix segfault when showing callchain in the hists browser (report & top) (Namhyung Kim)

 - Fix children sort key behavior, and also the 'perf test 32' test that
   was failing due to reliance on undefined behaviour (Namhyung Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-09 11:04:13 +01:00
Vlastimil Babka
9e5e366172 mm, vmscan: prevent kswapd livelock due to pfmemalloc-throttled process being killed
Charles Shirron and Paul Cassella from Cray Inc have reported kswapd
stuck in a busy loop with nothing left to balance, but
kswapd_try_to_sleep() failing to sleep.  Their analysis found the cause
to be a combination of several factors:

1. A process is waiting in throttle_direct_reclaim() on pgdat->pfmemalloc_wait

2. The process has been killed (by OOM in this case), but has not yet been
   scheduled to remove itself from the waitqueue and die.

3. kswapd checks for throttled processes in prepare_kswapd_sleep():

        if (waitqueue_active(&pgdat->pfmemalloc_wait)) {
                wake_up(&pgdat->pfmemalloc_wait);
		return false; // kswapd will not go to sleep
	}

   However, for a process that was already killed, wake_up() does not remove
   the process from the waitqueue, since try_to_wake_up() checks its state
   first and returns false when the process is no longer waiting.

4. kswapd is running on the same CPU as the only CPU that the process is
   allowed to run on (through cpus_allowed, or possibly single-cpu system).

5. CONFIG_PREEMPT_NONE=y kernel is used. If there's nothing to balance, kswapd
   encounters no voluntary preemption points and repeatedly fails
   prepare_kswapd_sleep(), blocking the process from running and removing
   itself from the waitqueue, which would let kswapd sleep.

So, the source of the problem is that we prevent kswapd from going to
sleep until there are processes waiting on the pfmemalloc_wait queue,
and a process waiting on a queue is guaranteed to be removed from the
queue only when it gets scheduled.  This was done to make sure that no
process is left sleeping on pfmemalloc_wait when kswapd itself goes to
sleep.

However, it isn't necessary to postpone kswapd sleep until the
pfmemalloc_wait queue actually empties.  To prevent processes from being
left sleeping, it's actually enough to guarantee that all processes
waiting on pfmemalloc_wait queue have been woken up by the time we put
kswapd to sleep.

This patch therefore fixes this issue by substituting 'wake_up' with
'wake_up_all' and removing 'return false' in the code snippet from
prepare_kswapd_sleep() above.  Note that if any process puts itself in
the queue after this waitqueue_active() check, or after the wake up
itself, it means that the process will also wake up kswapd - and since
we are under prepare_to_wait(), the wake up won't be missed.  Also we
update the comment prepare_kswapd_sleep() to hopefully more clearly
describe the races it is preventing.

Fixes: 5515061d22 ("mm: throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage")
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>	[3.6+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
Vladimir Davydov
4bdfc1c4a9 memcg: fix destination cgroup leak on task charges migration
We are supposed to take one css reference per each memory page and per
each swap entry accounted to a memory cgroup.  However, during task
charges migration we take a reference to the destination cgroup twice
per each swap entry: first in mem_cgroup_do_precharge()->try_charge()
and then in mem_cgroup_move_swap_account(), permanently leaking the
destination cgroup.

The hunk taking the second reference seems to be a leftover from the
pre-00501b531c472 ("mm: memcontrol: rewrite charge API") era.  Remove it
to fix the leak.

Fixes: e8ea14cc6e (mm: memcontrol: take a css reference for each charged page)
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
Johannes Weiner
24d404dc10 mm: memcontrol: switch soft limit default back to infinity
Commit 3e32cb2e0a ("mm: memcontrol: lockless page counters")
accidentally switched the soft limit default from infinity to zero,
which turns all memcgs with even a single page into soft limit excessors
and engages soft limit reclaim on all of them during global memory
pressure.  This makes global reclaim generally more aggressive, but also
inverts the meaning of existing soft limit configurations where unset
soft limits are usually more generous than set ones.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
Joonsoo Kim
70ecb3cb03 mm/debug_pagealloc: remove obsolete Kconfig options
These are obsolete since commit e30825f186 ("mm/debug-pagealloc:
prepare boottime configurable") was merged.  So remove them.

[pebolle@tiscali.nl: find obsolete Kconfig options]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Paul Bolle <pebolle@tiscali.nl>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
David Drysdale
75069f2b5b vfs: renumber FMODE_NONOTIFY and add to uniqueness check
Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc.  The
clashing O_PATH value was added in commit 5229645bdc ("vfs: add
nonconflicting values for O_PATH") but this can't be changed as it is
user-visible.

FMODE_NONOTIFY is only used internally in the kernel, but it is in the
same numbering space as the other O_* flags, as indicated by the comment
at the top of include/uapi/asm-generic/fcntl.h (and its use in
fs/notify/fanotify/fanotify_user.c).  So renumber it to avoid the clash.

All of this has happened before (commit 12ed2e36c9: "fanotify:
FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will
happen again -- so update the uniqueness check in fcntl_init() to
include __FMODE_NONOTIFY.

Signed-off-by: David Drysdale <drysdale@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Jan Kara <jack@suse.cz>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
Oleg Nesterov
9de93e7873 arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.h
build error

  arch/blackfin/mach-bf533/boards/stamp.c:834:2: error: implicit declaration of function 'mdelay'

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:52 -08:00
Xue jiufei
53dc20b9a3 ocfs2: fix the wrong directory passed to ocfs2_lookup_ino_from_name() when link file
In ocfs2_link(), the parent directory inode passed to function
ocfs2_lookup_ino_from_name() is wrong.  Parameter dir is the parent of
new_dentry not old_dentry.  We should get old_dir from old_dentry and
lookup old_dentry in old_dir in case another node remove the old dentry.

With this change, hard linking works again, when paths are relative with
at least one subdirectory.  This is how the problem was reproducable:

  # mkdir a
  # mkdir b
  # touch a/test
  # ln a/test b/test
  ln: failed to create hard link `b/test' => `a/test': No such file or  directory

However when creating links in the same dir, it worked well.

Now the link gets created.

Fixes: 0e048316ff ("ocfs2: check existence of old dentry in ocfs2_link()")
Signed-off-by: joyce.xue <xuejiufei@huawei.com>
Reported-by: Szabo Aron - UBIT <aron@ubit.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Tested-by: Aron Szabo <aron@ubit.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Henrik Rydberg
75dd112aac MAINTAINERS: update rydberg's addresses
My ISP finally gave up on the old mail address, so I am moving things
over to bitmath.org instead.  Also change the status fields to better
reflect reality.

Signed-off-by: Henrik Rydberg <rydberg@bitmath.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Johannes Weiner
2d6d7f9828 mm: protect set_page_dirty() from ongoing truncation
Tejun, while reviewing the code, spotted the following race condition
between the dirtying and truncation of a page:

__set_page_dirty_nobuffers()       __delete_from_page_cache()
  if (TestSetPageDirty(page))
                                     page->mapping = NULL
				     if (PageDirty())
				       dec_zone_page_state(page, NR_FILE_DIRTY);
				       dec_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);
    if (page->mapping)
      account_page_dirtied(page)
        __inc_zone_page_state(page, NR_FILE_DIRTY);
	__inc_bdi_stat(mapping->backing_dev_info, BDI_RECLAIMABLE);

which results in an imbalance of NR_FILE_DIRTY and BDI_RECLAIMABLE.

Dirtiers usually lock out truncation, either by holding the page lock
directly, or in case of zap_pte_range(), by pinning the mapcount with
the page table lock held.  The notable exception to this rule, though,
is do_wp_page(), for which this race exists.  However, do_wp_page()
already waits for a locked page to unlock before setting the dirty bit,
in order to prevent a race where clear_page_dirty() misses the page bit
in the presence of dirty ptes.  Upgrade that wait to a fully locked
set_page_dirty() to also cover the situation explained above.

Afterwards, the code in set_page_dirty() dealing with a truncation race
is no longer needed.  Remove it.

Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Konstantin Khlebnikov
7a3ef208e6 mm: prevent endless growth of anon_vma hierarchy
Constantly forking task causes unlimited grow of anon_vma chain.  Each
next child allocates new level of anon_vmas and links vma to all
previous levels because pages might be inherited from any level.

This patch adds heuristic which decides to reuse existing anon_vma
instead of forking new one.  It adds counter anon_vma->degree which
counts linked vmas and directly descending anon_vmas and reuses anon_vma
if counter is lower than two.  As a result each anon_vma has either vma
or at least two descending anon_vmas.  In such trees half of nodes are
leafs with alive vmas, thus count of anon_vmas is no more than two times
bigger than count of vmas.

This heuristic reuses anon_vmas as few as possible because each reuse
adds false aliasing among vmas and rmap walker ought to scan more ptes
when it searches where page is might be mapped.

Link: http://lkml.kernel.org/r/20120816024610.GA5350@evergreen.ssec.wisc.edu
Fixes: 5beb493052 ("mm: change anon_vma linking to fix multi-process server scalability issue")
[akpm@linux-foundation.org: fix typo, per Rik]
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Daniel Forrest <dan.forrest@ssec.wisc.edu>
Tested-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Jerome Marchand <jmarchan@redhat.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>	[2.6.34+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Oleg Nesterov
3245d6acab exit: fix race between wait_consider_task() and wait_task_zombie()
wait_consider_task() checks EXIT_ZOMBIE after EXIT_DEAD/EXIT_TRACE and
both checks can fail if we race with EXIT_ZOMBIE -> EXIT_DEAD/EXIT_TRACE
change in between, gcc needs to reload p->exit_state after
security_task_wait().  In this case ->notask_error will be wrongly
cleared and do_wait() can hang forever if it was the last eligible
child.

Many thanks to Arne who carefully investigated the problem.

Note: this bug is very old but it was pure theoretical until commit
b3ab03160d ("wait: completely ignore the EXIT_DEAD tasks").  Before
this commit "-O2" was probably enough to guarantee that compiler won't
read ->exit_state twice.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Arne Goedeke <el@laramies.com>
Tested-by: Arne Goedeke <el@laramies.com>
Cc: <stable@vger.kernel.org>	[3.15+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Joseph Qi
eb4f73b4ca ocfs2: remove bogus check in dlm_process_recovery_data
In dlm_process_recovery_data, only when dlm_new_lock failed the ret will
be set to -ENOMEM.  And in this case, newlock is definitely NULL.  So
test newlock is meaningless, remove it.

Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
Reviewed-by: Alex Chen <alex.chen@huawei.com>
Reviewed-by: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 15:10:51 -08:00
Linus Torvalds
11c8f01b42 Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild fix from Michal Marek:
 "make mrproper / distclean stopped removing the generated debian/
  directory in v3.16.  This fixes it"

* 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
  kbuild: Fix removal of the debian/ directory
2015-01-08 14:35:00 -08:00
Michal Marek
90ac086bca Makefile: include arch/*/include/generated/uapi before .../generated
The introduction of the uapi directories in v3.7-rc1 moved some of the
generated headers from arch/*/include/generated to the uapi directory,
keeping the #include directives intact.

This creates a problem when bisecting, because the unversioned files are
not cleaned automatically by git and the compiler might include stale
headers as a result.  Instead of cleaning them in the Makefiles, promote
arch/*/include/generated/uapi in the search path.  Under normal
circumstances, there is no overlap between this uapi subdirectory and
its parent, so the include choices remain the same.  We keep
arch/*/include/generated/uapi in the USERINCLUDE variable so that it is
usable standalone.

Note that we cannot completely swap the order of the uapi and
kernel-only directories, since the headers in include/uapi/asm-generic
are meant to be wrapped by their include/asm-generic counterparts when
building kernel code.

Reported-by: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Reported-by: David Drysdale <dmd@lurklurk.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-01-08 14:24:54 -08:00
Linus Torvalds
c4ccac460a A set of assorted pin control fixes for the Rockchip
and STi drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUrj7qAAoJEEEQszewGV1zMe0P/jhTWx/hP6IsgwTxlZrxrQjP
 jLzoRyA9WE3+Rxk+W5XFl/NBX4JJxcIJvJ1ffc5VbjVNV1gZSTnxGn44YdSY3aCW
 7Gkw0GGz8JzbbeZH8jJyPNM01SAMzKHZOIx9xIl9e+97kSku/wPQvOnmMoqXgEjS
 d2Jxrh8VscL8pUexNyqXzI4JNMPnQ/opuzP/ODWoQ9UCv5/2eRlHlam+347yTYAo
 fI65aMHXG/vOjDqMs//vtIs84Lyeh1oQnt6O/wDduzECpaA4y2AKMJpWUalRASaJ
 Rm1B7OEpYDg3HPTEh1lhEUGyYCX0uTzEovJG+t5MCnaxX2S5lU/RJBNfbTHLOkb3
 8JKU2DWrbDnhweyaAJE7QeFD2y35gckz8hnhN+3nmOTBxlae7qXBTEMDk0YJzvC3
 tYPFjoj+/P3IROgWOKDzG5MUSx9pgUFDiEhTvQZeuDWjwQ+9NOZq/xfoyR2yk0ob
 sBPv7SMaiubhAcP0F5GkXjH3G8Kyb9tsOZZsZV5v12EZtb64c4HYqHvlveN4mKRP
 MEKknTVz8vY4yXqEtap4H/T+19aLzHJdr/pA75KKSbmOZ25cCEMBV2Ruqk5m8RFy
 KdbWDaiPaen28TTB0wMyW00jheHa8P1xMF8cwTxUBDH6pM3N9QTG9U270HaIvbAu
 9/6w1W880RCQMox+phVT
 =gZIz
 -----END PGP SIGNATURE-----

Merge tag 'pinctrl-v3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pinctrl fixes from Linus Walleij:
 "Allright allright I've been lazy over christmas and New Years.  Here
  are a few collected pin control fixes eventually.  Details:

  A set of assorted pin control fixes for the Rockchip and STi drivers"

* tag 'pinctrl-v3.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: st: Add irq_disable hook to st_gpio_irqchip
  pinctrl: st: avoid multiple mutex lock
  pinctrl: rockchip: Fix enable/disable/mask/unmask
  pinctrl: rockchip: Handle wakeup pins
2015-01-08 14:19:44 -08:00
Linus Torvalds
74e59ea05c Power management and ACPI material for 3.19-rc4
- Fix ACPI power management intialization for device objects
    corresponding to devices that are not present at the init time
    (the _STA control method returns 0 for them) and therefore should
    not be regarded as power manageable (Rafael J Wysocki).
 
  - Rename a structure field and two functions used by the ACPI
    processor driver to make them less tied to architectures that
    use APICs (both x86 and ia64) and more suitable for ARM64
    processors (Hanjun Guo).
 
  - Add a disable_native_backlight quirk for Dell XPS15 L521X
    designed in an unusual way preventing native backlight from
    working on that machine (Hans de Goede).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJUrcCTAAoJEILEb/54YlRxeMUQAKOHS8rlq0XtOxieufCks0Rq
 e96ZExiTdLI/KYZCgqwkSJxR7w2981hbFVIcFccPpo1k9Z1YowSOvWcHvmn1RwHQ
 lXDfCEqWssIXMn0zctH3Ob/uBggvWFx9g5y8slOZ6W2LaUzzNdA+ZqxIbNsgwNSN
 vji2E1m2ZhcwgThPYeXNsvvWJzYyC3LI4nZ8UIKE8kUMM5oxYoIlrW/qjoHc8KNV
 AlUv+e0Z43XHy5jHaS8sQrKGrAPrdUroDRcByw1MG/V8r4vSXepvNjOXXwEZ9SF9
 xPp/bzLLRPK8FW4MQ2VEhTcO0FcjblDTQDhenDHKyjEonWOse5A9VbAyZ2dORfZ+
 juDsO3xalYk+OoHjRDdzxkQLIyQOATTcxkyGxyJf+HjcD6nfsdibWzisM6nl7E9h
 hpwGRhb5sgbWV9QRwmkkvFBP/uCsF3rHA/qCZQCFsxs0n3Ty5wiAg2JEHEL/kPF9
 6vPsX4ttukw5MbNzTyHfXOm41Ula3u4EfJvTdQjWNdx9uifBjsosetw3ho6q4XmP
 e6mCBFIU0Fxxfnmgij5x6ufGCBCrkpbLuZsC63HyaRDiRDN4DuftDLesVEySfJix
 +FiqilOyiOmSXHHC17cF8YNYsxMdFo1mGJDpV1PS1gnh/xwmEgfzwdWrjso7Y76L
 9MilW/AXspuXxegu6JDu
 =NEY0
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "These are an ACPI device power management initialization fix (-stable
  material), two commits renaming stuff in the ACPI processor driver to
  make it more suitable for ARM64 processors and a new ACPI backlight
  blacklist entry.

  Specifics:

   - Fix ACPI power management intialization for device objects
     corresponding to devices that are not present at the init time (the
     _STA control method returns 0 for them) and therefore should not be
     regarded as power manageable (Rafael J Wysocki).

   - Rename a structure field and two functions used by the ACPI
     processor driver to make them less tied to architectures that use
     APICs (both x86 and ia64) and more suitable for ARM64 processors
     (Hanjun Guo).

   - Add a disable_native_backlight quirk for Dell XPS15 L521X designed
     in an unusual way preventing native backlight from working on that
     machine (Hans de Goede)"

* tag 'pm+acpi-3.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / video: Add disable_native_backlight quirk for Dell XPS15 L521X
  ACPI / processor: Rename acpi_(un)map_lsapic() to acpi_(un)map_cpu()
  ACPI / processor: Convert apic_id to phys_id to make it arch agnostic
  ACPI / PM: Fix PM initialization for devices that are not present
2015-01-08 14:11:03 -08:00
Linus Torvalds
086b2a942e Keyrings fixes
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVK1agvu3V2unywtrAQLBEg/9FfHWzF0CcjPeYRaHgpmf28F+VjbYCu3R
 GoUHGhugEwkXOs0jIEBSLxpMjKLC6EhbypjFyQpKpvKUKxa2gwqc8NopFaeN8rhP
 pCwyCTgVVmKmhpEct73jp0B7rw7HGQ9v87oLTBgdaDKy2RQkLLhgJZVC4SAr0/sn
 NXXA/An+xrWyzrisT085Pb7Cklj27k44lbeJfgKBDNqXrAOYG5LiomZmWPphHxq8
 6bvgQulZZQiHEsASRC2JNly6Ijh6HW5OWtSqKIxo904HDaU5DnZsWopaCdcAl3le
 auLE9MRDdhG8oOuOrMqcj9hSieSe1fCAG7fFNqGLHwcXitxvNO/A6W2psCfe9/4N
 umHEjs6F9OBqAnu0AsbAO+K1NgJAikUy56wxpucRK6lfcJ6/xDCKFN3xLoKP+d2J
 AzhNRB7qMah9deVVudU+XuipYuv5F397XIRBJTbR0cvpNO6K71OS4YCaf+q8r5Ex
 gFtwLWlcY5XEqVPSiARaQiNmlx1kpRQzhnu2soMl+Bd0l0lYaI99Xig/z04ay4L+
 Ed+BYWZfu2T/ueDm/INA40b5ZF2y9Qt0RN1CzUyqras+ZIwGhGOeRJfsEvO33sx9
 W8gnjnMDLynarwmajmfaJfvTaj1iyT6omQiYgsjT0fe/JDgnEgKri/XJqxRtcPy5
 nmMN9CdT+hw=
 =CGYU
 -----END PGP SIGNATURE-----

Merge tag 'keys-fixes-20150107' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs

Pull keyrings fixes from David Howells:
 "Two fixes:

   - Fix for the order in which things are done during key garbage
     collection to prevent named keyrings causing a crash
     [CVE-2014-9529].

   - Fix assoc_array to explicitly #include rcupdate.h to prevent
     compilation errors under certain circumstances"

* tag 'keys-fixes-20150107' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
  assoc_array: Include rcupdate.h for call_rcu() definition
  KEYS: close race between key lookup and freeing
2015-01-08 13:52:16 -08:00
Linus Torvalds
b11ecb2785 virtio, vhost fixes for 3.19
This fixes a couple of bugs triggered by hot-unplug
 of virtio devices, as well as a regression in vhost-net.
 
 Cc: Rusty Russell <rusty@rustcorp.com.au>
 Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUrQ7bAAoJECgfDbjSjVRptOUH/R8uZWxD2LY8cyUpGuUKkhXi
 SX727WQXMYd5tZH0MAiakFvL8+rGz5A/MAZFr+kT1xjgt3WAYHk9CwdvPBpd0YlD
 uawqr2t5t+AzGbNLmn5S7D2OD89Yzk6dOsvotgvjtlOxWTcMhZxS6ynNB6iSM4Ul
 K/5DCnDxgm3BYyljtdAEStQS7pft1BITtK1BBmJZ5EYR2+sOKcNAM7imYKJ0XA52
 Spvkn/SIqLCaapLcDg451f3T1BupqqceaUMO0XKDDrGcLesJC1X1nXcxs4Q6zMED
 Zh6frLnoKrCvzInuIrW5VyLwLn72HTJZ1EtKmp/Q1IdfXXuoBhOZxS8mFnvG09E=
 =q55J
 -----END PGP SIGNATURE-----

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

pull virtio/vhost fixes from Michael Tsirkin:
 "This fixes a couple of bugs triggered by hot-unplug of virtio devices,
  as well as a regression in vhost-net"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  vhost/net: length miscalculation
  virtio_pci: document why we defer kfree
  virtio_pci: defer kfree until release callback
  virtio_pci: device-specific release callback
  virtio: make del_vqs idempotent
2015-01-08 13:05:56 -08:00
Linus Torvalds
1c169383c8 IOMMU Fixes for Linux v3.19-rc3
Including:
 
 	* A domain structure leak fix in the Intel VT-d driver
 
 	* Compile error fix for the VMSA IPMMU driver because of the
 	  IOMMU_EXEC -> IOMMU_NOEXEC conversion
 
 	* Two small cleanups as an aftermath of the merge window and the
 	  domain-leak fix
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUrU8AAAoJECvwRC2XARrjMLwP/3bv9V+bX4JdCDb0pIY/69kC
 +7pQqrE1la4yFVYGwGrzUD8mcswkMudc0j+mgmWZAoM73AV7k9qTcPRhtOPDIijn
 nE1JezbY0CHjkW2QnKkT9HLMoC5xxESurc1FFeNnAfQD3Lx9T9ZMcsnyT5z3AxHc
 IhqWtRC1bcqEUXmnGCm+VjnzsVH4sxSWc7tGkS6gMflqjQgAd8Fla9EeStwQo0rX
 g139MuS7m1CwPLh9yXN0sQ1PWdczYy3o3p4Q0j+XUtGdJalmSS32NcoJYBHVQgJf
 IzOFqCvVgzWtaiYS9YqKSX+dOJ3sUlK1q8XPikCoDX6TcipTpEirKdzqxmEXCM8h
 WZ0GMfFzSerCfVR9imlKiTuei1SM4t/hQy0VBqA1gOwVb50RVeyUDIRlDECPpnJT
 9uZuB1B5kHGXmpiJ7cUDZjUHnx/wH/mEuJj1eSMXr+KEEMu9QQpM4JgK1eKz5Vw3
 qaaJBOQuccPj2tOgW7TN5liHL2rj+KkDtDGgeUX+sga0yenV0zNfa6Qe5gEt2BFu
 fMHJlrKjabMouPB0Vx5R221/Q1ZdzZkFZ2xsyyk02yQqNrvYB6n0ZEreuq39tPfJ
 XIfIfOTHzpw4DxrGsaK2D8mTOtlnhwFnsalE92pMhE2eU3jprpa3MRDMz9vvPRnu
 7lUl8pv2C8chfBh1miVy
 =e3/b
 -----END PGP SIGNATURE-----

Merge tag 'iommu-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

Pull IOMMU fixes from Joerg Roedel:
 "Including:

   - a domain structure leak fix in the Intel VT-d driver

   - compile error fix for the VMSA IPMMU driver because of the
     IOMMU_EXEC -> IOMMU_NOEXEC conversion

   - two small cleanups as an aftermath of the merge window and the
     domain-leak fix"

* tag 'iommu-fixes-v3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
  iommu/rockchip: Drop owner assignment from platform_drivers
  iommu/vt-d: Remove dead code in device_notifier
  iommu/vt-d: Fix dmar_domain leak in iommu_attach_device
  iommu/ipmmu-vmsa: Change IOMMU_EXEC to IOMMU_NOEXEC
2015-01-08 12:07:34 -08:00
Linus Torvalds
716c13a817 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
 "This fixes a build problem with sha-mb with old toolchains and an
  implementation bug in the ctr(aes)/by8 branch of aesni-intel that's
  enabled when AVX is available"

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: sha-mb - Add avx2_supported check.
  crypto: aesni - fix "by8" variant for 128 bit keys
2015-01-08 11:33:51 -08:00
Ilya Dryomov
d7d5a007b1 libceph: fix sparse endianness warnings
The only real issue is the one in auth_x.c and it came with
3.19-rc1 merge.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-08 20:36:57 +03:00
Ilya Dryomov
0668ff52e2 ceph: use %zu for len in ceph_fill_inline_data()
len is size_t, should be printed with %zu.

Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
2015-01-08 20:36:56 +03:00
Namhyung Kim
c09e31cc12 perf hists browser: Fix segfault when showing callchain
When perf report on TUI shows callchain it checks first node has
siblings to determine whether it needs to print percentage value.

But it missed a case that first node is NULL.  So sometimes it segfaults
like below:

  $ perf top -g
  perf: Segmentation fault
  -------- backtrace --------
  perf[0x4fcefb]
  /usr/lib/libc.so.6(+0x33b20)[0x7f2a35839b20]
  perf(rb_next+0x8)[0x47d3d8]
  perf[0x4f6058]
  perf[0x4f833b]
  perf[0x4f8610]
  perf[0x4f209e]
  perf(ui_browser__run+0x3a)[0x4f2e6a]
  perf[0x4f94ee]
  perf(perf_evlist__tui_browse_hists+0x94)[0x4fbbf4]
  perf[0x444d10]
  /usr/lib/libpthread.so.0(+0x7314)[0x7f2a37070314]
  /usr/lib/libc.so.6(clone+0x6d)[0x7f2a358ee5bd]

  $ addr2line -e `which perf` 0x4f6058
  /home/namhyung/project/linux/tools/perf/ui/browsers/hists.c:553

I don't know why the backtrace didn't print some symbols..

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 4087d11cd9 ("perf hists browser: Print overhead percent value for first-level callchain")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1419401076-21700-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-08 11:59:16 -03:00
Namhyung Kim
d114960c48 perf callchain: Free callchains when hist entries are deleted
Markus reported that "perf top -g" can leak ~300MB per second on his
machine.  This is partly because it missed to free callchains when hist
entries are deleted.  Fix it.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20141230053813.GD6081@sejong
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-08 11:56:35 -03:00
Namhyung Kim
5ca8271022 perf hists: Fix children sort key behavior
When perf report --children resorts output fields, it tries to put
caller above the callee.  But this was only meaningful for a same thread
and doing this requires callchain enabled.  So fix its check before
comparing the callchain depth.

This also changes the hist accumulation tests: In test 3, xmalloc in
bash thread should be above than other perf threads due to alphabetical
order of comm string.  Also it's under page_fault in bash thread since
alphabetical order of dso name.  The sys_perf_event_open in perf thread
is put on the last line since it's self overhead is 0.

In test 4, the sys_perf_event_open is put above other perf entries that
have same children overhead since its callchain depth is smaller.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1419309381-2593-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-01-08 11:26:56 -03:00
Ard Biesheuvel
0e63ea48b4 arm64/efi: add missing call to early_ioremap_reset()
The early ioremap support introduced by patch bf4b558eba
("arm64: add early_ioremap support") failed to add a call to
early_ioremap_reset() at an appropriate time. Without this call,
invocations of early_ioremap etc. that are done too late will go
unnoticed and may cause corruption.

This is exactly what happened when the first user of this feature
was added in patch f84d02755f ("arm64: add EFI runtime services").
The early mapping of the EFI memory map is unmapped during an early
initcall, at which time the early ioremap support is long gone.

Fix by adding the missing call to early_ioremap_reset() to
setup_arch(), and move the offending early_memunmap() to right after
the point where the early mapping of the EFI memory map is last used.

Fixes: f84d02755f ("arm64: add EFI runtime services")
Cc: <stable@vger.kernel.org>
Signed-off-by: Leif Lindholm <leif.lindholm@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
2015-01-08 11:57:04 +00:00
Ingo Molnar
ed9eb845d7 perf/urgent fixes:
- 'perf probe' should fall back to find probe point in symbols when failing
   to do so in a debuginfo file (Masami Hiramatsu)
 
 - Fix 'perf probe' crash in dwarf_getcfi_elf (Namhyung Kim)
 
 - Fix shell completion with 'perf list' --raw-dump option (Taesoo Kim)
 
 - Fix 'perf diff' to sort by baseline field by default (Namhyung Kim)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJUp1VfAAoJEBpxZoYYoA71q8oIAJcsGVKioROAGgcaorZJlm4M
 XeC1THov9Rly96h73rKn7PNXs+YDouixNc6F63rxjKxePpLk3TWxcRdhjkXZtkzQ
 m4VJR2nZT8UC5OBBonuKB5dvDBrNnhvDYD3VR6cTThBuwKv7eZzLcNBdc2K/oa+y
 fmvv+Mk/6Kv3Ru0giOkSbiMLKaWUDP3stKl10cVIsVP7lIdbDm5IceXJChWnK1Uo
 ydEknREXyNm11dUQ2LR/PpO+R/mAq+/IfgYYbr1mJd67Zgc+iix2H/Y0yq+PtwuH
 Vsw5W+LFAzjQid7PhgNbsYXviMrZ5Vt74tskD9Ux3vCpcjvF/MoCrUJrcAe+Epw=
 =0JUz
 -----END PGP SIGNATURE-----

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

"
  - 'perf probe' should fall back to find probe point in symbols when failing
    to do so in a debuginfo file (Masami Hiramatsu)

  - Fix 'perf probe' crash in dwarf_getcfi_elf (Namhyung Kim)

  - Fix shell completion with 'perf list' --raw-dump option (Taesoo Kim)

  - Fix 'perf diff' to sort by baseline field by default (Namhyung Kim)
"

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-01-08 08:59:22 +01:00
Dave Airlie
79305ec6e6 Merge tag 'amdkfd-fixes-2015-01-06' of git://people.freedesktop.org/~gabbayo/linux into drm-fixes
- Complete overhaul to the main IOCTL function, kfd_ioctl(), according to
  drm_ioctl() example. This includes changing the IOCTL definitions, so it
  breaks compatibility with previous versions of the userspace. However,
  because the kernel was not officialy released yet, and this the first
  kernel that includes amdkfd, I assume I can still do that at this stage.

- A couple of bug fixes for the non-HWS path (used for bring-ups and
  debugging purposes only).

* tag 'amdkfd-fixes-2015-01-06' of git://people.freedesktop.org/~gabbayo/linux:
  drm/amdkfd: rewrite kfd_ioctl() according to drm_ioctl()
  drm/amdkfd: reformat IOCTL definitions to drm-style
  drm/amdkfd: Do copy_to/from_user in general kfd_ioctl()
  drm/amdkfd: unmap VMID<-->PASID when relesing VMID (non-HWS)
  drm/radeon: Assign VMID to PASID for IH in non-HWS mode
  drm/radeon: do not leave queue acquired if timeout happens in kgd_hqd_destroy()
  drm/amdkfd: Load mqd to hqd in non-HWS mode
  drm/amd: Fixing typos in kfd<->kgd interface
2015-01-08 10:36:37 +10:00
Dave Airlie
eaee8ec4eb Merge branch 'drm-fixes-3.19' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
some minor radeon fixes.

* 'drm-fixes-3.19' of git://people.freedesktop.org/~agd5f/linux:
  drm/radeon: integer underflow in radeon_cp_dispatch_texture()
  drm/radeon: adjust default bapm settings for KV
  drm/radeon: properly filter DP1.2 4k modes on non-DP1.2 hw
  drm/radeon: fix sad_count check for dce3
  drm/radeon: KV has three PPLLs (v2)
2015-01-08 10:19:59 +10:00
Dave Airlie
f6624888a5 Merge branch 'linux-3.19' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-fixes
- Fix BUG() on !SMP builds
    - Fix for OOPS on pre-NV50 that snuck into -next
    - MCP7[789A] hang fix where firmware hasn't already setup NISO pollers
    - NV4x IGP MSI disable, it doesn't appear to work correctly
    - Add GK208B to recognised boards (no code change aside from adding
    chipset recognition)

* 'linux-3.19' of git://anongit.freedesktop.org/git/nouveau/linux-2.6:
  drm/nouveau/nouveau: Do not BUG_ON(!spin_is_locked()) on UP
  drm/nv4c/mc: disable msi
  drm/nouveau/fb/ram/mcp77: enable NISO poller
  drm/nouveau/fb/ram/mcp77: use carveout reg to determine size
  drm/nouveau/fb/ram/mcp77: subclass nouveau_ram
  drm/nouveau: wake up the card if necessary during gem callbacks
  drm/nouveau/device: Add support for GK208B, resolves bug 86935
  drm/nouveau: fix missing return statement in nouveau_ttm_tt_unpopulate
  drm/nouveau/bios: fix oops on pre-nv50 chipsets
2015-01-08 10:19:24 +10:00
Grygorii Strashko
ac08468867 ARM: 8253/1: mm: use phys_addr_t type in map_lowmem() for kernel mem region
Now local variables kernel_x_start and kernel_x_end defined using
'unsigned long' type which is wrong because they represent physical
memory range and will be calculated wrongly if LPAE is enabled.
As result, all following code in map_lowmem() will not work correctly.

For example, Keystone 2 boot is broken because
 kernel_x_start == 0x0000 0000
 kernel_x_end   == 0x0080 0000

instead of
 kernel_x_start == 0x0000 0008 0000 0000
 kernel_x_end   == 0x0000 0008 0080 0000
and as result whole low memory will be mapped with MT_MEMORY_RW
permissions by code (start > kernel_x_end):
		} else if (start >= kernel_x_end) {
			map.pfn = __phys_to_pfn(start);
			map.virtual = __phys_to_virt(start);
			map.length = end - start;
			map.type = MT_MEMORY_RW;

			create_mapping(&map);
		}

Hence, fix it by using phys_addr_t type for variables kernel_x_start
and kernel_x_end.

Tested-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: Grygorii Strashko <grygorii.strashko@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-01-07 20:33:34 +00:00
Mark Rutland
cca547e9aa ARM: 8249/1: mm: dump: don't skip regions
Currently the arm page table dumping code starts dumping page tables
from USER_PGTABLES_CEILING. This is unnecessary for skipping any entries
related to userspace as the swapper_pg_dir does not contain such
entries, and results in a couple of unfortuante side effects.

Firstly, any kernel mappings which might exist below
USER_PGTABLES_CEILING will not be accounted in the dump output. This
masks any entries erroneously created below this address.

Secondly, if the final page table entry walked is part of a valid
mapping the page table dumping code will not log the region this entry
is part of, as the final note_page call in walk_pgd will trigger an
early return when 0 < USER_PGTABLES_CEILING. Luckily this isn't seen on
contemporary systems as they typically don't have enough RAM to extend
the linear mapping right to the end of the address space.

Due to the way addr is constructed in the walk_* functions, it can never
be less than USER_PGTABLES_CEILING when walking the page tables, so it
is not necessary to avoid dereferencing invalid table addresses. The
existing checks for st->current_prot and st->marker[1].start_address are
sufficient to ensure we will not print and/or dereference garbage when
trying to log information.

This patch removes both problematic uses of USER_PGTABLES_CEILING from
the arm page table dumping code, preventing both of these issues. We
will now report any low mappings, and the final note_page call will not
return early, ensuring all regions are logged.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Steve Capper <steve.capper@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-01-07 20:33:33 +00:00
Russell King
841ee23025 ARM: wire up execveat syscall
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2015-01-07 20:31:54 +00:00
J. Bruce Fields
49a068f82a rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
A struct xdr_stream at a page boundary might point to the end of one
page or the beginning of the next, but xdr_truncate_encode isn't
prepared to handle the former.

This can cause corruption of NFSv4 READDIR replies in the case that a
readdir entry that would have exceeded the client's dircount/maxcount
limit would have ended exactly on a 4k page boundary.  You're more
likely to hit this case on large directories.

Other xdr_truncate_encode callers are probably also affected.

Reported-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Tested-by: Holger Hoffstätte <holger.hoffstaette@googlemail.com>
Fixes: 3e19ce762b "rpc: xdr_truncate_encode"
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 14:03:58 -05:00
Jeff Layton
94ae1db226 nfsd: fix fi_delegees leak when fi_had_conflict returns true
Currently, nfs4_set_delegation takes a reference to an existing
delegation and then checks to see if there is a conflict. If there is
one, then it doesn't release that reference.

Change the code to take the reference after the check and only if there
is no conflict.

Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-01-07 13:38:21 -05:00