- Fixed style error: Missing space before the open parenthesis
- Fixed style warnings: 2x Missing blank line after declaration
One warning left: else after return
(I don't feel comfortable fixing that without side effects)
Signed-off-by: Mario Leinweber <marioleinweber@web.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/20180302182007.28691-1-marioleinweber@web.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Now that the 1Hz tick is offloaded to workqueues, we can safely remove
the residual code that used to handle it locally.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-7-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a CPU runs in full dynticks mode, a 1Hz tick remains in order to
keep the scheduler stats alive. However this residual tick is a burden
for bare metal tasks that can't stand any interruption at all, or want
to minimize them.
The usual boot parameters "nohz_full=" or "isolcpus=nohz" will now
outsource these scheduler ticks to the global workqueue so that a
housekeeping CPU handles those remotely. The sched_class::task_tick()
implementations have been audited and look safe to be called remotely
as the target runqueue and its current task are passed in parameter
and don't seem to be accessed locally.
Note that in the case of using isolcpus, it's still up to the user to
affine the global workqueues to the housekeeping CPUs through
/sys/devices/virtual/workqueue/cpumask or domains isolation
"isolcpus=nohz,domain".
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-6-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As we prepare for offloading the residual 1hz scheduler ticks to
workqueue, let's affine those to housekeepers so that they don't
interrupt the CPUs that don't want to be disturbed.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-5-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This check is racy but provides a good heuristic to determine whether
a CPU may need a remote tick or not.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-4-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It makes this function more self-explanatory about what it does and how
to use it.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-3-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Do that rename in order to normalize the hrtick namespace.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Wanpeng Li <kernellwp@gmail.com>
Link: http://lkml.kernel.org/r/1519186649-3242-2-git-send-email-frederic@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If wake_affine() pulls a task to another node for any reason and the node is
no longer preferred then temporarily stop automatic NUMA balancing pulling
the task back. Otherwise, tasks with a strong waker/wakee relationship
may constantly fight automatic NUMA balancing over where a task should
be placed.
Once again netperf is interesting here. The performance barely changes
but automatic NUMA balancing is interesting:
Hmean send-64 354.67 ( 0.00%) 352.15 ( -0.71%)
Hmean send-128 702.91 ( 0.00%) 693.84 ( -1.29%)
Hmean send-256 1350.07 ( 0.00%) 1344.19 ( -0.44%)
Hmean send-1024 5124.38 ( 0.00%) 4941.24 ( -3.57%)
Hmean send-2048 9687.44 ( 0.00%) 9624.45 ( -0.65%)
Hmean send-3312 14577.64 ( 0.00%) 14514.35 ( -0.43%)
Hmean send-4096 16393.62 ( 0.00%) 16488.30 ( 0.58%)
Hmean send-8192 26877.26 ( 0.00%) 26431.63 ( -1.66%)
Hmean send-16384 38683.43 ( 0.00%) 38264.91 ( -1.08%)
Hmean recv-64 354.67 ( 0.00%) 352.15 ( -0.71%)
Hmean recv-128 702.91 ( 0.00%) 693.84 ( -1.29%)
Hmean recv-256 1350.07 ( 0.00%) 1344.19 ( -0.44%)
Hmean recv-1024 5124.38 ( 0.00%) 4941.24 ( -3.57%)
Hmean recv-2048 9687.43 ( 0.00%) 9624.45 ( -0.65%)
Hmean recv-3312 14577.59 ( 0.00%) 14514.35 ( -0.43%)
Hmean recv-4096 16393.55 ( 0.00%) 16488.20 ( 0.58%)
Hmean recv-8192 26876.96 ( 0.00%) 26431.29 ( -1.66%)
Hmean recv-16384 38682.41 ( 0.00%) 38263.94 ( -1.08%)
NUMA alloc hit 1465986 1423090
NUMA alloc miss 0 0
NUMA interleave hit 0 0
NUMA alloc local 1465897 1423003
NUMA base PTE updates 1473 1420
NUMA huge PMD updates 0 0
NUMA page range updates 1473 1420
NUMA hint faults 1383 1312
NUMA hint local faults 451 124
NUMA hint local percent 32 9
There is a slight degrading in performance but there are slightly fewer
NUMA faults. There is a large drop in the percentage of local faults but
the bulk of migrations for netperf are in small shared libraries so it's
reflecting the fact that automatic NUMA balancing has backed off. This is
a case where despite wake_affine() and automatic NUMA balancing fighting
for placement that there is a marginal benefit to rescheduling to local
data quickly. However, it should be noted that wake_affine() and automatic
NUMA balancing fighting each other constantly is undesirable.
However, the benefit in other cases is large. This is the result for NAS
with the D class sizing on a 4-socket machine:
nas-mpi
4.15.0 4.15.0
sdnuma-v1r23 delayretry-v1r23
Time cg.D 557.00 ( 0.00%) 431.82 ( 22.47%)
Time ep.D 77.83 ( 0.00%) 79.01 ( -1.52%)
Time is.D 26.46 ( 0.00%) 26.64 ( -0.68%)
Time lu.D 727.14 ( 0.00%) 597.94 ( 17.77%)
Time mg.D 191.35 ( 0.00%) 146.85 ( 23.26%)
4.15.0 4.15.0
sdnuma-v1r23delayretry-v1r23
User 75665.20 70413.30
System 20321.59 8861.67
Elapsed 766.13 634.92
Minor Faults 16528502 7127941
Major Faults 4553 5068
NUMA alloc local 6963197 6749135
NUMA base PTE updates 366409093 107491434
NUMA huge PMD updates 687556 198880
NUMA page range updates 718437765 209317994
NUMA hint faults 13643410 4601187
NUMA hint local faults 9212593 3063996
NUMA hint local percent 67 66
Note the massive reduction in system CPU usage even though the percentage
of local faults is barely affected. There is a massive reduction in the
number of PTE updates showing that automatic NUMA balancing has backed off.
A critical observation is also that there is a massive reduction in minor
faults which is due to far fewer NUMA hinting faults being trapped.
There were questions on NAS OMP and how it behaved related to threads
being bound to CPUs. First, there are more gains than losses with this
patch applied and a reduction in system CPU usage:
nas-omp
4.16.0-rc1 4.16.0-rc1
sdnuma-v2r1 delayretry-v2r1
Time bt.D 436.71 ( 0.00%) 430.05 ( 1.53%)
Time cg.D 201.02 ( 0.00%) 180.87 ( 10.02%)
Time ep.D 32.84 ( 0.00%) 32.68 ( 0.49%)
Time is.D 9.63 ( 0.00%) 9.64 ( -0.10%)
Time lu.D 331.20 ( 0.00%) 304.80 ( 7.97%)
Time mg.D 54.87 ( 0.00%) 52.72 ( 3.92%)
Time sp.D 1108.78 ( 0.00%) 917.10 ( 17.29%)
Time ua.D 378.81 ( 0.00%) 398.83 ( -5.28%)
4.16.0-rc1 4.16.0-rc1
sdnuma-v2r1delayretry-v2r1
User 305633.08 296751.91
System 451.75 357.80
Elapsed 2595.73 2368.13
However, it does not close the gap between binding and being unbound. There
is negligible difference between the performance of the baseline and a
patched kernel when threads are bound so it is not presented here:
4.16.0-rc1 4.16.0-rc1
delayretry-bind delayretry-unbound
Time bt.D 385.02 ( 0.00%) 430.05 ( -11.70%)
Time cg.D 144.02 ( 0.00%) 180.87 ( -25.59%)
Time ep.D 32.85 ( 0.00%) 32.68 ( 0.52%)
Time is.D 10.52 ( 0.00%) 9.64 ( 8.37%)
Time lu.D 285.31 ( 0.00%) 304.80 ( -6.83%)
Time mg.D 43.21 ( 0.00%) 52.72 ( -22.01%)
Time sp.D 820.24 ( 0.00%) 917.10 ( -11.81%)
Time ua.D 337.09 ( 0.00%) 398.83 ( -18.32%)
4.16.0-rc1 4.16.0-rc1
delayretry-binddelayretry-unbound
User 277731.25 296751.91
System 261.29 357.80
Elapsed 2100.55 2368.13
Unfortunately, while performance is improved by the patch, there is still
quite a long way to go before it's equivalent to hard binding.
Other workloads like hackbench, tbench, dbench and schbench are barely
affected. dbench shows a mix of gains and losses depending on the machine
although in general, the results are more stable.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-7-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
find_idlest_group() compares a local group with each other group to select
the one that is most idle. When comparing groups in different NUMA domains,
a very slight imbalance is enough to select a remote NUMA node even if the
runnable load on both groups is 0 or close to 0. This ignores the cost of
remote accesses entirely and is a problem when selecting the CPU for a
newly forked task to run on. This is problematic when a forking server
is almost guaranteed to run on a remote node incurring numerous remote
accesses and potentially causing automatic NUMA balancing to try migrate
the task back or migrate the data to another node. Similar weirdness is
observed if a basic shell command pipes output to another as each process
in the pipeline is likely to start on different nodes and then get adjusted
later by wake_affine().
This patch adds imbalance to remote domains when considering whether to
select CPUs from remote domains. If the local domain is selected, imbalance
will still be used to try select a CPU from a lower scheduler domain's group
instead of stacking tasks on the same CPU.
A variety of workloads and machines were tested and as expected, there is no
difference on UMA. The difference on NUMA can be dramatic. This is a comparison
of elapsed times running the git regression test suite. It's fork-intensive with
short-lived processes:
4.15.0 4.15.0
noexit-v1r23 sdnuma-v1r23
Elapsed min 1706.06 ( 0.00%) 1435.94 ( 15.83%)
Elapsed mean 1709.53 ( 0.00%) 1436.98 ( 15.94%)
Elapsed stddev 2.16 ( 0.00%) 1.01 ( 53.38%)
Elapsed coeffvar 0.13 ( 0.00%) 0.07 ( 44.54%)
Elapsed max 1711.59 ( 0.00%) 1438.01 ( 15.98%)
4.15.0 4.15.0
noexit-v1r23 sdnuma-v1r23
User 5434.12 5188.41
System 4878.77 3467.09
Elapsed 10259.06 8624.21
That shows a considerable reduction in elapsed times. It's important to
note that automatic NUMA balancing does not affect this load as processes
are too short-lived.
There is also a noticable impact on hackbench such as this example using
processes and pipes:
hackbench-process-pipes
4.15.0 4.15.0
noexit-v1r23 sdnuma-v1r23
Amean 1 1.0973 ( 0.00%) 0.9393 ( 14.40%)
Amean 4 1.3427 ( 0.00%) 1.3730 ( -2.26%)
Amean 7 1.4233 ( 0.00%) 1.6670 ( -17.12%)
Amean 12 3.0250 ( 0.00%) 3.3013 ( -9.13%)
Amean 21 9.0860 ( 0.00%) 9.5343 ( -4.93%)
Amean 30 14.6547 ( 0.00%) 13.2433 ( 9.63%)
Amean 48 22.5447 ( 0.00%) 20.4303 ( 9.38%)
Amean 79 29.2010 ( 0.00%) 26.7853 ( 8.27%)
Amean 110 36.7443 ( 0.00%) 35.8453 ( 2.45%)
Amean 141 45.8533 ( 0.00%) 42.6223 ( 7.05%)
Amean 172 55.1317 ( 0.00%) 50.6473 ( 8.13%)
Amean 203 64.4420 ( 0.00%) 58.3957 ( 9.38%)
Amean 234 73.2293 ( 0.00%) 67.1047 ( 8.36%)
Amean 265 80.5220 ( 0.00%) 75.7330 ( 5.95%)
Amean 296 88.7567 ( 0.00%) 82.1533 ( 7.44%)
It's not a universal win as there are occasions when spreading wide and
quickly is a benefit but it's more of a win than it is a loss. For other
workloads, there is little difference but netperf is interesting. Without
the patch, the server and client starts on different nodes but quickly get
migrated due to wake_affine. Hence, the difference is overall performance
is marginal but detectable:
4.15.0 4.15.0
noexit-v1r23 sdnuma-v1r23
Hmean send-64 349.09 ( 0.00%) 354.67 ( 1.60%)
Hmean send-128 699.16 ( 0.00%) 702.91 ( 0.54%)
Hmean send-256 1316.34 ( 0.00%) 1350.07 ( 2.56%)
Hmean send-1024 5063.99 ( 0.00%) 5124.38 ( 1.19%)
Hmean send-2048 9705.19 ( 0.00%) 9687.44 ( -0.18%)
Hmean send-3312 14359.48 ( 0.00%) 14577.64 ( 1.52%)
Hmean send-4096 16324.20 ( 0.00%) 16393.62 ( 0.43%)
Hmean send-8192 26112.61 ( 0.00%) 26877.26 ( 2.93%)
Hmean send-16384 37208.44 ( 0.00%) 38683.43 ( 3.96%)
Hmean recv-64 349.09 ( 0.00%) 354.67 ( 1.60%)
Hmean recv-128 699.16 ( 0.00%) 702.91 ( 0.54%)
Hmean recv-256 1316.34 ( 0.00%) 1350.07 ( 2.56%)
Hmean recv-1024 5063.99 ( 0.00%) 5124.38 ( 1.19%)
Hmean recv-2048 9705.16 ( 0.00%) 9687.43 ( -0.18%)
Hmean recv-3312 14359.42 ( 0.00%) 14577.59 ( 1.52%)
Hmean recv-4096 16323.98 ( 0.00%) 16393.55 ( 0.43%)
Hmean recv-8192 26111.85 ( 0.00%) 26876.96 ( 2.93%)
Hmean recv-16384 37206.99 ( 0.00%) 38682.41 ( 3.97%)
However, what is very interesting is how automatic NUMA balancing behaves.
Each netperf instance runs long enough for balancing to activate:
NUMA base PTE updates 4620 1473
NUMA huge PMD updates 0 0
NUMA page range updates 4620 1473
NUMA hint faults 4301 1383
NUMA hint local faults 1309 451
NUMA hint local percent 30 32
NUMA pages migrated 1335 491
AutoNUMA cost 21% 6%
There is an unfortunate number of remote faults although tracing indicated
that the vast majority are in shared libraries. However, the tendency to
start tasks on the same node if there is capacity means that there were
far fewer PTE updates and faults incurred overall.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-6-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
When a task exits, it notifies the parent that it has exited. This is a
sync wakeup and the exiting task may pull the parent towards the wakers
CPU. For simple workloads like using a shell, it was observed that the
shell is pulled across nodes by exiting processes. This is daft as the
parent may be long-lived and properly placed. This patch special cases a
sync wakeup on exit to avoid pulling tasks across nodes. Testing on a range
of workloads and machines showed very little differences in performance
although there was a small 3% boost on some machines running a shellscript
intensive workload (git regression test suite).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-5-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
wake_affine_weight() will consider migrating a task to, or near, the current
CPU if there is a load imbalance. If the CPUs share LLC then either CPU
is valid as a search-for-idle-sibling target and equally appropriate for
stacking two tasks on one CPU if an idle sibling is unavailable. If they do
not share cache then a cross-node migration potentially impacts locality
so while they are equal from a CPU capacity point of view, they are not
equal in terms of memory locality. In either case, it's more appropriate
to migrate only if there is a difference in their effective load.
This patch modifies wake_affine_weight() to only consider migrating a task
if there is a load imbalance for normal wakeups but will allow potential
stacking if the loads are equal and it's a sync wakeup.
For the most part, the different in performance is marginal. For example,
on a 4-socket server running netperf UDP_STREAM on localhost the differences
are as follows:
4.15.0 4.15.0
16rc0 noequal-v1r23
Hmean send-64 355.47 ( 0.00%) 349.50 ( -1.68%)
Hmean send-128 697.98 ( 0.00%) 693.35 ( -0.66%)
Hmean send-256 1328.02 ( 0.00%) 1318.77 ( -0.70%)
Hmean send-1024 5051.83 ( 0.00%) 5051.11 ( -0.01%)
Hmean send-2048 9637.02 ( 0.00%) 9601.34 ( -0.37%)
Hmean send-3312 14355.37 ( 0.00%) 14414.51 ( 0.41%)
Hmean send-4096 16464.97 ( 0.00%) 16301.37 ( -0.99%)
Hmean send-8192 26722.42 ( 0.00%) 26428.95 ( -1.10%)
Hmean send-16384 38137.81 ( 0.00%) 38046.11 ( -0.24%)
Hmean recv-64 355.47 ( 0.00%) 349.50 ( -1.68%)
Hmean recv-128 697.98 ( 0.00%) 693.35 ( -0.66%)
Hmean recv-256 1328.02 ( 0.00%) 1318.77 ( -0.70%)
Hmean recv-1024 5051.83 ( 0.00%) 5051.11 ( -0.01%)
Hmean recv-2048 9636.95 ( 0.00%) 9601.30 ( -0.37%)
Hmean recv-3312 14355.32 ( 0.00%) 14414.48 ( 0.41%)
Hmean recv-4096 16464.74 ( 0.00%) 16301.16 ( -0.99%)
Hmean recv-8192 26721.63 ( 0.00%) 26428.17 ( -1.10%)
Hmean recv-16384 38136.00 ( 0.00%) 38044.88 ( -0.24%)
Stddev send-64 7.30 ( 0.00%) 4.75 ( 34.96%)
Stddev send-128 15.15 ( 0.00%) 22.38 ( -47.66%)
Stddev send-256 13.99 ( 0.00%) 19.14 ( -36.81%)
Stddev send-1024 105.73 ( 0.00%) 67.38 ( 36.27%)
Stddev send-2048 294.57 ( 0.00%) 223.88 ( 24.00%)
Stddev send-3312 302.28 ( 0.00%) 271.74 ( 10.10%)
Stddev send-4096 195.92 ( 0.00%) 121.10 ( 38.19%)
Stddev send-8192 399.71 ( 0.00%) 563.77 ( -41.04%)
Stddev send-16384 1163.47 ( 0.00%) 1103.68 ( 5.14%)
Stddev recv-64 7.30 ( 0.00%) 4.75 ( 34.96%)
Stddev recv-128 15.15 ( 0.00%) 22.38 ( -47.66%)
Stddev recv-256 13.99 ( 0.00%) 19.14 ( -36.81%)
Stddev recv-1024 105.73 ( 0.00%) 67.38 ( 36.27%)
Stddev recv-2048 294.59 ( 0.00%) 223.89 ( 24.00%)
Stddev recv-3312 302.24 ( 0.00%) 271.75 ( 10.09%)
Stddev recv-4096 196.03 ( 0.00%) 121.14 ( 38.20%)
Stddev recv-8192 399.86 ( 0.00%) 563.65 ( -40.96%)
Stddev recv-16384 1163.79 ( 0.00%) 1103.86 ( 5.15%)
The difference in overall performance is marginal but note that most
measurements are less variable. There were similar observations for other
netperf comparisons. hackbench with sockets or threads with processes or
threads showed minor difference with some reduction of migration. tbench
showed only marginal differences that were within the noise. dbench,
regardless of filesystem, showed minor differences all of which are
within noise. Multiple machines, both UMA and NUMA were tested without
any regressions showing up.
The biggest risk with a patch like this is affecting wakeup latencies.
However, the schbench load from Facebook which is very sensitive to wakeup
latency showed a mixed result with mostly improvements in wakeup latency:
4.15.0 4.15.0
16rc0 noequal-v1r23
Lat 50.00th-qrtle-1 38.00 ( 0.00%) 38.00 ( 0.00%)
Lat 75.00th-qrtle-1 49.00 ( 0.00%) 41.00 ( 16.33%)
Lat 90.00th-qrtle-1 52.00 ( 0.00%) 50.00 ( 3.85%)
Lat 95.00th-qrtle-1 54.00 ( 0.00%) 51.00 ( 5.56%)
Lat 99.00th-qrtle-1 63.00 ( 0.00%) 60.00 ( 4.76%)
Lat 99.50th-qrtle-1 66.00 ( 0.00%) 61.00 ( 7.58%)
Lat 99.90th-qrtle-1 78.00 ( 0.00%) 65.00 ( 16.67%)
Lat 50.00th-qrtle-2 38.00 ( 0.00%) 38.00 ( 0.00%)
Lat 75.00th-qrtle-2 42.00 ( 0.00%) 43.00 ( -2.38%)
Lat 90.00th-qrtle-2 46.00 ( 0.00%) 48.00 ( -4.35%)
Lat 95.00th-qrtle-2 49.00 ( 0.00%) 50.00 ( -2.04%)
Lat 99.00th-qrtle-2 55.00 ( 0.00%) 57.00 ( -3.64%)
Lat 99.50th-qrtle-2 58.00 ( 0.00%) 60.00 ( -3.45%)
Lat 99.90th-qrtle-2 65.00 ( 0.00%) 68.00 ( -4.62%)
Lat 50.00th-qrtle-4 41.00 ( 0.00%) 41.00 ( 0.00%)
Lat 75.00th-qrtle-4 45.00 ( 0.00%) 46.00 ( -2.22%)
Lat 90.00th-qrtle-4 50.00 ( 0.00%) 50.00 ( 0.00%)
Lat 95.00th-qrtle-4 54.00 ( 0.00%) 53.00 ( 1.85%)
Lat 99.00th-qrtle-4 61.00 ( 0.00%) 61.00 ( 0.00%)
Lat 99.50th-qrtle-4 65.00 ( 0.00%) 64.00 ( 1.54%)
Lat 99.90th-qrtle-4 76.00 ( 0.00%) 82.00 ( -7.89%)
Lat 50.00th-qrtle-8 48.00 ( 0.00%) 46.00 ( 4.17%)
Lat 75.00th-qrtle-8 55.00 ( 0.00%) 54.00 ( 1.82%)
Lat 90.00th-qrtle-8 60.00 ( 0.00%) 59.00 ( 1.67%)
Lat 95.00th-qrtle-8 63.00 ( 0.00%) 63.00 ( 0.00%)
Lat 99.00th-qrtle-8 71.00 ( 0.00%) 69.00 ( 2.82%)
Lat 99.50th-qrtle-8 74.00 ( 0.00%) 73.00 ( 1.35%)
Lat 99.90th-qrtle-8 98.00 ( 0.00%) 90.00 ( 8.16%)
Lat 50.00th-qrtle-16 56.00 ( 0.00%) 55.00 ( 1.79%)
Lat 75.00th-qrtle-16 68.00 ( 0.00%) 67.00 ( 1.47%)
Lat 90.00th-qrtle-16 77.00 ( 0.00%) 78.00 ( -1.30%)
Lat 95.00th-qrtle-16 82.00 ( 0.00%) 84.00 ( -2.44%)
Lat 99.00th-qrtle-16 90.00 ( 0.00%) 93.00 ( -3.33%)
Lat 99.50th-qrtle-16 93.00 ( 0.00%) 97.00 ( -4.30%)
Lat 99.90th-qrtle-16 110.00 ( 0.00%) 110.00 ( 0.00%)
Lat 50.00th-qrtle-32 68.00 ( 0.00%) 62.00 ( 8.82%)
Lat 75.00th-qrtle-32 90.00 ( 0.00%) 83.00 ( 7.78%)
Lat 90.00th-qrtle-32 110.00 ( 0.00%) 100.00 ( 9.09%)
Lat 95.00th-qrtle-32 122.00 ( 0.00%) 111.00 ( 9.02%)
Lat 99.00th-qrtle-32 145.00 ( 0.00%) 133.00 ( 8.28%)
Lat 99.50th-qrtle-32 154.00 ( 0.00%) 143.00 ( 7.14%)
Lat 99.90th-qrtle-32 2316.00 ( 0.00%) 515.00 ( 77.76%)
Lat 50.00th-qrtle-35 69.00 ( 0.00%) 72.00 ( -4.35%)
Lat 75.00th-qrtle-35 92.00 ( 0.00%) 95.00 ( -3.26%)
Lat 90.00th-qrtle-35 111.00 ( 0.00%) 114.00 ( -2.70%)
Lat 95.00th-qrtle-35 122.00 ( 0.00%) 124.00 ( -1.64%)
Lat 99.00th-qrtle-35 142.00 ( 0.00%) 144.00 ( -1.41%)
Lat 99.50th-qrtle-35 150.00 ( 0.00%) 154.00 ( -2.67%)
Lat 99.90th-qrtle-35 6104.00 ( 0.00%) 5640.00 ( 7.60%)
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-4-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
On sync wakeups, the previous CPU effective load may not be used so delay
the calculation until it's needed.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-3-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The only caller of wake_affine() knows the CPU ID. Pass it in instead of
rechecking it.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Giovanni Gherdovich <ggherdovich@suse.cz>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180213133730.24064-2-mgorman@techsingularity.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 Kconfig fixes from Thomas Gleixner:
"Three patchlets to correct HIGHMEM64G and CMPXCHG64 dependencies in
Kconfig when CPU selections are explicitely set to M586 or M686"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/Kconfig: Explicitly enumerate i686-class CPUs in Kconfig
x86/Kconfig: Exclude i586-class CPUs lacking PAE support from the HIGHMEM64G Kconfig group
x86/Kconfig: Add missing i586-class CPUs to the X86_CMPXCHG64 Kconfig group
Pull perf updates from Thomas Gleixner:
"Perf tool updates and kprobe fixes:
- perf_mmap overwrite mode fixes/overhaul, prep work to get 'perf
top' using it, making it bearable to use it in large core count
systems such as Knights Landing/Mill Intel systems (Kan Liang)
- s/390 now uses syscall.tbl, just like x86-64 to generate the
syscall table id -> string tables used by 'perf trace' (Hendrik
Brueckner)
- Use strtoull() instead of home grown function (Andy Shevchenko)
- Synchronize kernel ABI headers, v4.16-rc1 (Ingo Molnar)
- Document missing 'perf data --force' option (Sangwon Hong)
- Add perf vendor JSON metrics for ARM Cortex-A53 Processor (William
Cohen)
- Improve error handling and error propagation of ftrace based
kprobes so failures when installing kprobes are not silently
ignored and create disfunctional tracepoints"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
kprobes: Propagate error from disarm_kprobe_ftrace()
kprobes: Propagate error from arm_kprobe_ftrace()
Revert "tools include s390: Grab a copy of arch/s390/include/uapi/asm/unistd.h"
perf s390: Rework system call table creation by using syscall.tbl
perf s390: Grab a copy of arch/s390/kernel/syscall/syscall.tbl
tools/headers: Synchronize kernel ABI headers, v4.16-rc1
perf test: Fix test trace+probe_libc_inet_pton.sh for s390x
perf data: Document missing --force option
perf tools: Substitute yet another strtoull()
perf top: Check the latency of perf_top__mmap_read()
perf top: Switch default mode to overwrite mode
perf top: Remove lost events checking
perf hists browser: Add parameter to disable lost event warning
perf top: Add overwrite fall back
perf evsel: Expose the perf_missing_features struct
perf top: Check per-event overwrite term
perf mmap: Discard legacy interface for mmap read
perf test: Update mmap read functions for backward-ring-buffer test
perf mmap: Introduce perf_mmap__read_event()
perf mmap: Introduce perf_mmap__read_done()
...
Pull irq updates from Thomas Gleixner:
"A small set of updates mostly for irq chip drivers:
- MIPS GIC fix for spurious, masked interrupts
- fix for a subtle IPI bug in GICv3
- do not probe GICv3 ITSs that are marked as disabled
- multi-MSI support for GICv2m
- various small cleanups"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Re-use DEFINE_SHOW_ATTRIBUTE() macro
irqchip/bcm: Remove hashed address printing
irqchip/gic-v2m: Add PCI Multi-MSI support
irqchip/gic-v3: Ignore disabled ITS nodes
irqchip/gic-v3: Use wmb() instead of smb_wmb() in gic_raise_softirq()
irqchip/gic-v3: Change pr_debug message to pr_devel
irqchip/mips-gic: Avoid spuriously handling masked interrupts
Pull core fix from Thomas Gleixner:
"A small fix which adds the missing for_each_cpu_wrap() stub for the UP
case to avoid build failures"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpumask: Make for_each_cpu_wrap() available on UP as well
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJaiF9dAAoJEPfTWPspceCmkS8P/1bLQUbCKuby+aKG52ik80Xb
ao+CM0Ytn1vKxDRnk3rcZyN35++0c2rLzRlK7SCYQ006ivFFGBBrdPJlJq2WismK
06dMnkqGQGr1I6cIryFsUzi3dSk/uc9S3afgYuc6Ga3tvYvM90q1JA4PNUf4u463
pjJoDwL1ZgXeACtG7r8Bmbjb2LxoWODDqeNe3nTUdZLrdRPROn/mkjqOB+NhsTcL
47nIic+U1+QT8A3+gZgmDRz9TKXgLU+5BdUMGavOi3V3d8ZIsBijY20Inr3ovsCc
rSO6WIipk2u3kTIZr3nXhZs2WfDEo+q/G+7vKz+F0ICf4luPScwpPJk0rv9Uf838
LYKn97uucAssV3+tNTWHprCdOBpG1w2fX7a1oSTczYZztWY6CNJzbOQ9w9WFXxUc
cskF7jBShC5l9XmgwoKOFGrnSsuOG5TOzadNcuW5IDBFGOizAEKiIHyQOobYxIHT
ZwipUgVZFbiK7vlxLssYihgrO5rMgpWz4o54OPmCzpD04d1We+Yf1VSOpMFdpR05
h3YQ3y8tj1Ndicnw4P0aj0wDPh4wFd9vuxVtLuvYryh4dffIeU/GWSfmmedakn+0
uc/7QiOxbXj0NcPBd9cJpTNCEGttfKLbedMyGj9ztMoQJAuRnbwWwY4VUJwY+Lvn
hXBr/UYqkwYXkJLy2uKK
=n5RP
-----END PGP SIGNATURE-----
Merge tag 'for-linus-20180217' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- NVMe pull request from Keith, with fixes all over the map for nvme.
From various folks.
- Classic polling fix, that avoids a latency issue where we still end
up waiting for an interrupt in some cases. From Nitesh Shetty.
- Comment typo fix from Minwoo Im.
* tag 'for-linus-20180217' of git://git.kernel.dk/linux-block:
block: fix a typo in comment of BLK_MQ_POLL_STATS_BKTS
nvme-rdma: fix sysfs invoked reset_ctrl error flow
nvmet: Change return code of discard command if not supported
nvme-pci: Fix timeouts in connecting state
nvme-pci: Remap CMB SQ entries on every controller reset
nvme: fix the deadlock in nvme_update_formats
blk: optimization for classic polling
nvme: Don't use a stack buffer for keep-alive command
nvme_fc: cleanup io completion
nvme_fc: correct abort race condition on resets
nvme: Fix discard buffer overrun
nvme: delete NVME_CTRL_LIVE --> NVME_CTRL_CONNECTING transition
nvme-rdma: use NVME_CTRL_CONNECTING state to mark init process
nvme: rename NVME_CTRL_RECONNECTING state to NVME_CTRL_CONNECTING
- meson-gx: Revert to earlier tuning process
- bcm2835: Don't overwrite max frequency unconditionally
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJah/1EAAoJEP4mhCVzWIwp7CsQAMWYAyzQTNJ6WjI5CRcSq+QM
1Cykv98XarR6s3WhUA6xd+p3C6d5PDHsXdrkS0RYLqDHtfOiKdodbSjlh9oo/Uub
KnjCIM8mDRwoJs6TxvBjqIyfpGQYHVF0rf0W71Ik607NGcosoHr7vgYZo2o/3V6g
lu3xXKam9Uc691VFcTcD2ub4H0qI/RsoZaqIEApTnL1m0d6mAPvxald5AV/tJvKc
5R+l4Wb4dXB64q3uQvcyxQRSaWWbWEETHA79qS04W5+RJdyy/TtpPbCnPs3VqCHn
WUEJpoqmFeWNhkg1lmq9MHQvjaweKyLWFKawlpft3mjhImdQBQNjHaLAWW0spsdN
ifqHCs/9NMJOo0fSWwPjJNeC9UxY0HG1s8yznY+CUnESnMhZ3+h5qToFwI3iMaQF
BjJCjxmNvvRAqlptcR3hqVOdrWmzkwuqTKdh/5LHuYOrrvVbDiFM1EQbCT3pe0B+
pHdlB10+wCz1PlNOvi8OUlqW+Co3/lSasWvkS6hKb8SQgZvWfmtyxewR/Mq/zAqT
WKoYDjsyJCHbDXdZDKLcab5Eo8E67ogEFlkIqMvDTwaVEb1xizydxFFCVMUb8SER
mROOGCS9OhJwugjOvA8HAy8ueOuTrSMWZI5XvjoBIaB8mQLEJGEVkc0IneDiC6P+
5wopD/VKGjYqg6xz1EKC
=+wYZ
-----END PGP SIGNATURE-----
Merge tag 'mmc-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
- meson-gx: Revert to earlier tuning process
- bcm2835: Don't overwrite max frequency unconditionally
* tag 'mmc-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: bcm2835: Don't overwrite max frequency unconditionally
Revert "mmc: meson-gx: include tx phase in the tuning process"
* Use the appropriate OOB layout in the VF610 driver
-----BEGIN PGP SIGNATURE-----
iQI5BAABCAAjBQJah+/3HBxib3Jpcy5icmV6aWxsb25AYm9vdGxpbi5jb20ACgkQ
Ze02AX4ItwCMoQ/+NQjeBPDslHM7FdnrPSF6UXAhtuw6NaXnIp6vmnTnkgV9dru/
nNsiWMZCWD8VAXrtSDRrHUbmuHS+qzG/yxUtTkmnDaOG4YaxBHYPaiWg3C1+9ioV
1EzAX/XEb8EK3xbeFVnr1gMxr2E+/43mJF8b0DXpcX340+AQaMxfKYnHlNl0KqCS
qQV4rhqZ4Zm3j+pXcqt1S21EIQTVcmq/iIziLCMRn+T6zESheTMRwh32kh2E8mea
B8L3cmocf34QdPbJZLULgdjaNbHQHSK8WkYI6x2w+1lJvUo7RHmhrU2di41UAfD+
uTeNhratFWrAvBa2kI60cA/SQe2ZqXXuoAKC1WGir9qxLqHmdx54x6rGQZWar+0K
n93deqQyTef20kOiQCUT/Ps4cA8zveZCKpt0CG8Znx4Z7Znjm8J6HMe1yTSVBbQh
lrevi7hcxDnGn61/2zEUld+BmmDDgDLYoEdV1ofenMRbL4cb4WaKDdwdFUP5mMCM
hC6NlrwT+7zZVtdoEUe6QbTeiottDNovow2L5BAv6c3NyYzV48f7KxTb3Rwyv0Yp
Apje7ZUQpNux+NeuJVCQUrerY91rDk8WtnUEI2PIcL32AjR60brtY/MWBqBWclCk
4/iwHihAttz/J5GuDVWcFcF/QId9zYENT4jJLZjDT7MPAvmFcw5rmI4e4AM=
=MaSC
-----END PGP SIGNATURE-----
Merge tag 'mtd/fixes-for-4.16-rc2' of git://git.infradead.org/linux-mtd
Pull mtd fixes from Boris Brezillon:
- add missing dependency to NAND_MARVELL Kconfig entry
- use the appropriate OOB layout in the VF610 driver
* tag 'mtd/fixes-for-4.16-rc2' of git://git.infradead.org/linux-mtd:
mtd: nand: MTD_NAND_MARVELL should depend on HAS_DMA
mtd: nand: vf610: set correct ooblayout
The main attraction is a fix for a bug in the new drmem code, which was causing
an oops on boot on some versions of Qemu.
There's also a fix for XIVE (Power9 interrupt controller) on KVM, as well as a
few other minor fixes.
Thanks to:
Corentin Labbe, Cyril Bur, Cédric Le Goater, Daniel Black, Nathan Fontenot,
Nicholas Piggin.
-----BEGIN PGP SIGNATURE-----
iQIwBAABCAAaBQJah/esExxtcGVAZWxsZXJtYW4uaWQuYXUACgkQUevqPMjhpYCi
KA/+IaDlvxKezRBNQnj6GElBrgfUzICH6MtG6qo+rCKsTMgbAiZJkk3vz/JgqKY/
EusTNCwcqLaPBDgwoSmbazdtnj7YOwBGdIQOq+rC/qeSV0/gpdo02dPUWaMMOE/x
nj+zASrOsv/o9XWX4XmJeuYWhW/8a/nWXKa+oLt3g/5pIHHP5TXTzMHvHH0Rn23D
1ejwDHDwMNL3p2jHlcf+v1DDol51/Kaa8e8KwJJMf00HVfFVXtdnH7do6I1qBeC0
t7PLDeWnpyY+3M1fNJ303EXIqc9DArUCn6tdhy6om96rGvBddORFuRkS4kkXbx76
pnTRPWnPa9aeC2rU+C84sJDQJgeBCpMYOvw96Yr1SxFhE9z0T+9YYTnZxiB7GISK
5BAf3EzE9dc0RtStrfTKnvcuz2OffPq2VZi3sqjiHFDit2TsF+i7ZkX/CR9UAmaH
HPk4Fbi/IzlSfx/RffrOXYrpsTNcUmzvA/Tj83qGhM30LCMQ24o84eGTZN36a/eg
Z+7/MZawtNElsNNpJz6MYtvEQkHZyrTUS+9iyqRwLXCIy/JIHZYwb4aRtu41SET8
lWwuaLjfwdpVPCeEkiNQwCswtppt4j2XS8Ggqef9GkqElsk6JKyxuZWTv2hRfJUK
DTQvPV4PIhvhrB6qvT11qOm5yDSSV9f6yaLJ5dm3BiXulF0=
=8QPF
-----END PGP SIGNATURE-----
Merge tag 'powerpc-4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"The main attraction is a fix for a bug in the new drmem code, which
was causing an oops on boot on some versions of Qemu.
There's also a fix for XIVE (Power9 interrupt controller) on KVM, as
well as a few other minor fixes.
Thanks to: Corentin Labbe, Cyril Bur, Cédric Le Goater, Daniel Black,
Nathan Fontenot, Nicholas Piggin"
* tag 'powerpc-4.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/pseries: Check for zero filled ibm,dynamic-memory property
powerpc/pseries: Add empty update_numa_cpu_lookup_table() for NUMA=n
powerpc/powernv: IMC fix out of bounds memory access at shutdown
powerpc/xive: Use hw CPU ids when configuring the CPU queues
powerpc: Expose TSCR via sysfs only on powernv
- Updated the page table accessors to use READ/WRITE_ONCE and prevent
compiler transformation that could lead to an apparent loss of
coherency
- Enabled branch predictor hardening for the Falkor CPU
- Fix interaction between kpti enabling and KASan causing the recursive
page table walking to take a significant time
- Fix some sparse warnings
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAlqH6zoACgkQa9axLQDI
XvHwLhAAnnsscis7BVmUCKo0yGQbPAgJaTaWDQGADkSi8N+h3GmdQYWu+PxVMrwh
eLyASZJ+ioYiWfXQ4rvZcOaWeBzJ7/FCxJb/W+RkZoPIkGJl/myY5UEvm+YefbTT
7ilGae0dpNZBcqiO4OPv36Cqc47AnVw+HsjGcBjMrnUHv7sykqdy8VknLHY9VXdu
l3N0XJ2zc8e29yZFq4i+1ryx/pO42Ps+oKGuPVCB6U5wIYRleTU4dSoiKYRMp1jI
4giL0f1Ho8Dw7DTQp/l/KIlFoMPRGCsda5/miByn1MGXYSoyf2mlso5jyegiKX5f
PSRqPIwwKwIYmC/FR3Lb+P+zZo4YltPnfGfXV5VnzUlYLXAm64qDCPiDbG3I/8wL
5VhpTR40XYB456i4w9+/hu8giMpRt19jTbG/Ugk7M+i6OQ2ZD9L/olCn7OlhL0Np
Ra0Uw8CHyQbU8psyN9zZ6EeYhucA41YC2lMxNNPFupwhoB68VvLDBrSPBaisA0Pu
C2fXJEvsahFYxMqHpw/DAOmb8xokVcdHONiMaDV5ovtRXRlxaTZdPEOHhf/I8hF4
n7ng2GBOzOnIAc+GcYboDVtVbdnoCPcijKSROI+XeIeDpfD6gv/iR1HmRJXTK+eV
lnqqXHRJ7M/U2XpJzGDAQulSc9iOxLV8H2tnsxwxMfI8c7T6Kw0=
=O6of
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
"The bulk of this is the pte accessors annotation to READ/WRITE_ONCE
(we tried to avoid pushing this during the merge window to avoid
conflicts)
- Updated the page table accessors to use READ/WRITE_ONCE and prevent
compiler transformation that could lead to an apparent loss of
coherency
- Enabled branch predictor hardening for the Falkor CPU
- Fix interaction between kpti enabling and KASan causing the
recursive page table walking to take a significant time
- Fix some sparse warnings"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cputype: Silence Sparse warnings
arm64: mm: Use READ_ONCE/WRITE_ONCE when accessing page tables
arm64: proc: Set PTE_NG for table entries to avoid traversing them twice
arm64: Add missing Falkor part number for branch predictor hardening
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABAgAGBQJah+tbAAoJELDendYovxMvHIYH/38IC3Nd3IWTVsLvHXUCxFrn
fNPKgSyC5/igLbmwjPQ+kbr7bha6Vi3uZwmovoC/9i03gYfzmuuMhhOvOVByYHXg
HHC+kqegB7tZ2GFeR2hrIba4UxBz4ZC0R5+qQYHZMx5dRt0/Llby663mkcK7WEWr
Na8jT32AbIOiCKWHgsmTC7h2ZiSXeY+WVj1B3Re7ovLHMTYoMDQhVi5I7w34bcch
bgTLx4TokC8Z3kCNotPAwrL0rQggEJ+PR91j2mL52uEWv80Q4hgR+QIFtEwiYXmG
jDbx4Y+jQUAu4+r6/2z4S/gFTV93lB+dWbXYuHoFv5Mv1A+Ve7Al744RvrNh3gM=
=UVdB
-----END PGP SIGNATURE-----
Merge tag 'for-linus-4.16a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
- fixes for the Xen pvcalls frontend driver
- fix for booting Xen pv domains
- fix for the xenbus driver user interface
* tag 'for-linus-4.16a-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
pvcalls-front: wait for other operations to return when release passive sockets
pvcalls-front: introduce a per sock_mapping refcount
x86/xen: Calculate __max_logical_packages on PV domains
xenbus: track caller request id
Passive sockets can have ongoing operations on them, specifically, we
have two wait_event_interruptable calls in pvcalls_front_accept.
Add two wake_up calls in pvcalls_front_release, then wait for the
potential waiters to return and release the sock_mapping refcount.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Introduce a per sock_mapping refcount, in addition to the existing
global refcount. Thanks to the sock_mapping refcount, we can safely wait
for it to be 1 in pvcalls_front_release before freeing an active socket,
instead of waiting for the global refcount to be 1.
Signed-off-by: Stefano Stabellini <stefano@aporeto.com>
Acked-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Commit fd8aa9095a ("xen: optimize xenbus driver for multiple concurrent
xenstore accesses") optimized xenbus concurrent accesses but in doing so
broke UABI of /dev/xen/xenbus. Through /dev/xen/xenbus applications are in
charge of xenbus message exchange with the correct header and body. Now,
after the mentioned commit the replies received by application will no
longer have the header req_id echoed back as it was on request (see
specification below for reference), because that particular field is being
overwritten by kernel.
struct xsd_sockmsg
{
uint32_t type; /* XS_??? */
uint32_t req_id;/* Request identifier, echoed in daemon's response. */
uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */
uint32_t len; /* Length of data following this. */
/* Generally followed by nul-terminated string(s). */
};
Before there was only one request at a time so req_id could simply be
forwarded back and forth. To allow simultaneous requests we need a
different req_id for each message thus kernel keeps a monotonic increasing
counter for this field and is written on every request irrespective of
userspace value.
Forwarding again the req_id on userspace requests is not a solution because
we would open the possibility of userspace-generated req_id colliding with
kernel ones. So this patch instead takes another route which is to
artificially keep user req_id while keeping the xenbus logic as is. We do
that by saving the original req_id before xs_send(), use the private kernel
counter as req_id and then once reply comes and was validated, we restore
back the original req_id.
Cc: <stable@vger.kernel.org> # 4.11
Fixes: fd8aa9095a ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
Reported-by: Bhavesh Davda <bhavesh.davda@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Sparse makes a fair bit of noise about our MPIDR mask being implicitly
long - let's explicitly describe it as such rather than just relying on
the value forcing automatic promotion.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlqHGfMLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYNqBhAAicRKvMghVLqrmW8wiy81cBxCZ96UL6gaogmtVnL/
jQ37zcgX77qKMzf/5M2grHQsttURBGa3TaMGPC21E6g8vJ++Oe7gTDhswDGj24yY
yJOK5PrqKAqaTSjHn9c64DsCNia8BwMnY2ypT+c9nCAsUh1Jk+bBJMkyQQAx0/i5
/z2rsc7FDZB9Lq7+DOApQB86ALfbeRaS29QRl1yl6wlLKmKKC57mFjHKom9HujsY
UUuzHO8TFppbv/Gsl/UPns3ONPT6of88iCbSTIC44lO0WFtk/lS0qP3KVI9K96uo
/DTmpTJOZn5d1GPGW0tQ23KjRXH+6MZryMX5SRoPZnJJvQLzLHDCu2OCRNFN3SXD
t+wWBS6kW2ZoeDOAwh2Ncp1SC1hhri9WBAT2MS41kwTeMJ4fHt7rofsIRkMjRJEr
vx6j9fmloL9rYT3KOu0eMapfYIlkg549FsPK5QZfOuXDyNdPw+Wxq7wRoEsTjTkI
32rLWnl+5/1nHMlSjPTpnbK9V+42WL8pTy8Rz2TkmjiiNh9WAsxHVg1XzsrEWwKD
5RQBQl7LBFI8jNlF2Ke9iubm45R3Eu9U8BmduF7pfaACrF8uh5KPMkhKFQs/KHl7
NPvFGbKD/1c3BMsRO0ehnoEchL1mo6K4Tnwos9u4TzxcC/bniWmllV0gRAAvs5TF
pQQ=
=p0Hm
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.16-2' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
"A few dma-mapping fixes for the fallout from the changes in rc1"
* tag 'dma-mapping-4.16-2' of git://git.infradead.org/users/hch/dma-mapping:
powerpc/macio: set a proper dma_coherent_mask
dma-mapping: fix a comment typo
dma-direct: comment the dma_direct_free calling convention
dma-direct: mark as is_phys
ia64: fix build failure with CONFIG_SWIOTLB
In many cases, page tables can be accessed concurrently by either another
CPU (due to things like fast gup) or by the hardware page table walker
itself, which may set access/dirty bits. In such cases, it is important
to use READ_ONCE/WRITE_ONCE when accessing page table entries so that
entries cannot be torn, merged or subject to apparent loss of coherence
due to compiler transformations.
Whilst there are some scenarios where this cannot happen (e.g. pinned
kernel mappings for the linear region), the overhead of using READ_ONCE
/WRITE_ONCE everywhere is minimal and makes the code an awful lot easier
to reason about. This patch consistently uses these macros in the arch
code, as well as explicitly namespacing pointers to page table entries
from the entries themselves by using adopting a 'p' suffix for the former
(as is sometimes used elsewhere in the kernel source).
Tested-by: Yury Norov <ynorov@caviumnetworks.com>
Tested-by: Richard Ruigrok <rruigrok@codeaurora.org>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We get a warning about some slow configurations in randconfig kernels:
mm/memory.c:83:2: error: #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid. [-Werror=cpp]
The warning is reasonable by itself, but gets in the way of randconfig
build testing, so I'm hiding it whenever CONFIG_COMPILE_TEST is set.
The warning was added in 2013 in commit 75980e97da ("mm: fold
page->_last_nid into page->flags where possible").
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A few fixes for outstanding MIPS issues:
- An __init section mismatch warning when brcmstb_pm is enabled.
- A regression handling multiple mem=X@Y arguments (4.11).
- A USB Kconfig select warning, and related sparc cleanup (4.16).
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEd80NauSabkiESfLYbAtpk944dnoFAlqGyoAACgkQbAtpk944
dnrYFg//VABBzIxIfX45PyZdCyPwcCPT+kY1CithGSQwn54E14ckP9OMjwSdFeUf
LNYVtolGWUDWnf6QDYRMeIBfXve8Yury2ekEezJcq5fZlyHJltDnYnGedqfgl7mT
bSJ9in1nPJnV7O68A53YJD+hDdXbBWcHx0g11nOAXGjKOoZecx9WcN/tjecaC12f
9qnsK3q3PDiDPXkl2u9hPBKkEVzK7aZucrVq92ledHcaO+XM+h7bYKRlNP94VxCq
KPzytCbxHRO3VxO7YazE+C6pBVlOMWm4on665qwIqI+huyUV8RTnAsNXk+F0k1kj
QSTa5dr9bgfb1AdRJQeGyHBFcx2rgfcVQ0AEvbPdsiraIDImBT4MpVmq0t7lGJkN
SoMw/bNovlHiNsnU3hpMo8x4wLJ21PFmZ8vBnpn5aVZWpnMaYbmTnD+53WzVuocA
zgARVOYDoAU2rSyrYpnhQGD3f4K7D8e3hHc3SaYpDbBRop/7NGaU8+l+y65bny8B
gNrPNVrJ4+W5se3/ljmhai0/iF4cnqF2UljRxGkqhuUGhb03zDMlxlLe4xzv5au1
fBPowJzueq+b2i7eJ3RZeHs1rZb1O2t18Aud+jv1KSc3cnHmoiBMxcP2QCcknV9F
JMXJ0k6jTK/aArrvNrZeOgMrUXBhzs716g4zUlsCXgy7CVBTUPA=
=BGD8
-----END PGP SIGNATURE-----
Merge tag 'mips_fixes_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips
Pull MIPS fixes from James Hogan:
"A few fixes for outstanding MIPS issues:
- an __init section mismatch warning when brcmstb_pm is enabled
- a regression handling multiple mem=X@Y arguments (4.11)
- a USB Kconfig select warning, and related sparc cleanup (4.16)"
* tag 'mips_fixes_4.16_2' of git://git.kernel.org/pub/scm/linux/kernel/git/jhogan/mips:
sparc,leon: Select USB_UHCI_BIG_ENDIAN_{MMIO,DESC}
usb: Move USB_UHCI_BIG_ENDIAN_* out of USB_SUPPORT
MIPS: Fix incorrect mem=X@Y handling
MIPS: BMIPS: Fix section mismatch warning
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAlqG8poACgkQxWXV+ddt
WDuHSA//eC+69XpHwohI6pcPQ7Jbr9UCj1L/Gt0U96YSzijGW4Hv3OQEWLIRBu4c
nZbzQYtUunpguLYwfXgUUgXRHBTo2Y5bXZNmF2MtL7JcPOLhLh4h/IcGY7eRd2Vq
qvv2bqr3yAcQo7s6z5U/D8ulohzHQTxG7Jaq/BkVxQqhvu+vdu/9T8ikAWnmSTjw
lONu8soR5QO7tewxz23Cguw/t1bWe1aMXG9Ykd4avyhQHtgzNE+l82i4DYUhK2CM
x8M5/CxnDLPe73IJuA2INCUtpPvR4Qufi5Nz6EN3BrJNCGBkmg18sPIvWlH6LsVh
bsm4Lwz/piq+hkDq2GG+Z79uiGAfCVUWAsnm7yYHwpVyMvwHKlfrcVSAuRZixw5E
/NZ0JEkEOtvzpv4inZFYbAgD+oKfvYvwj9BW5BXfu2aH6hJBImfAeMSd1aHB3uZI
kGgy52k2v2P3WKQOFUbmW417P05DvvGmRvRmU+tSFpB+lXAZqRzoiVIuFm0xwhf1
1SmnYgnSYzPmzIRXAMsSYQeK/8NXDdMZMutaw/AYwX+QBEdIAErf6MWcjI6XZRyG
g8Gr8JcpwSa+H5/LKN5uswfXxfSAsqVHnZhbOVrjyGX0wyR4KJg3ag3KsHd9SCxb
LDEjPSYEDn9yfmw6pK2Q6J26FGYiKpuUXaNiYVNymGe6162IiBM=
=VeA/
-----END PGP SIGNATURE-----
Merge tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"We have a few assorted fixes, some of them show up during fstests so I
gave them more testing"
* tag 'for-4.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
Btrfs: fix null pointer dereference when replacing missing device
btrfs: remove spurious WARN_ON(ref->count < 0) in find_parent_nodes
btrfs: Ignore errors from btrfs_qgroup_trace_extent_post
Btrfs: fix unexpected -EEXIST when creating new inode
Btrfs: fix use-after-free on root->orphan_block_rsv
Btrfs: fix btrfs_evict_inode to handle abnormal inodes correctly
Btrfs: fix extent state leak from tree log
Btrfs: fix crash due to not cleaning up tree log block's dirty bits
Btrfs: fix deadlock in run_delalloc_nocow
non-zero error with 0). This is particularly important given DM
core's increased use of chained bios.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJahv8zAAoJEMUj8QotnQNaiCEH/Attdh6bOzUImbdxIpFWJNNk
F3Jyhge+OI3OHTAbyslFBl3kF0M7jZS3xq9wLpyIb/iIBs6o0N7eGNjkVuDg+Xnw
/SwWNyM1KBr+eYIs55T3KY1vX4YKzwO65hm4sXN6GiSMxPeFsXRTcPJKYzhW42ST
2gKqbtWG9JDyAZgdIFe0AYQF+oVYPX8lCEPNXy7WtmMCFjRan/g7FT0i14GSHy7S
YnQn+Db6Z/BzApDxAozrzj6OUxlVAgIo+6qp/jR8CoN4TX/V8L6gmLWlirrTJxr0
9hET9/NFMWoR9j3Jfatw2YyuMV2LnMo9FJVW9+cRHMLi5NSUdk03A+H1CgsTSNk=
=z7xt
-----END PGP SIGNATURE-----
Merge tag 'for-4.16/dm-chained-bios-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fix from Mike Snitzer:
"Fix for DM core to properly propagate errors (avoids overriding
non-zero error with 0). This is particularly important given DM core's
increased use of chained bios"
* tag 'for-4.16/dm-chained-bios-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: correctly handle chained bios in dec_pending()
Regression fix in keyboard support for Dell laptops.
Prevent out-of-boundary write in WMI bus driver.
Increase timeout to read functional key status on Lenovo laptops.
The following is an automated git shortlog grouped by driver:
dell-laptop:
- Removed duplicates in DMI whitelist
- fix kbd_get_state's request value
ideapad-laptop:
- Increase timeout to wait for EC answer
wmi:
- fix off-by-one write in wmi_dev_probe()
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEhiZOUlnC9oKN3n3AmT3/83c5Sy0FAlqG9DoACgkQmT3/83c5
Sy13xw/9GsRBnuCp3nkZ/Onq1V1q9ij/pfhpB5TCnQpdK5fMYbkd3KWnU78Fz5Ee
xi5NQaIL4xfcClY6ao1jQSBCorM9QEWT7PR9MBUi6+rc+ZrskXKX2XsCB34hTcpF
vMhTf9uqYxZtDNDfctdsRWlezw2DOMwXRZhilcLFm/579gUJR6mvw3k9Tldwaiac
qSRwIwyiZtzpJYCBdHsCzQGH0EUOSBiyWe+l9/G2hUPY+6p0Tt4n08FKAGEMUJL7
qQJj1olzBaTvJHXxsxazs1SZAT+Ti2RXuGnrK0+cdRFYGw1R1yYgNdOoyT5lHsz5
gou0lSzEb4hylqNln/4zae8+lDSG2mrMJ6hWusFCR0mvM7naXRQ2mGq2j9OuU7i5
Pn6bkI2orFIIvvor/M12ehxdYwujnZPqwlQ3TaKvBQ2nJqwqcIC0m/w9kbQeSgEt
83LUOUTI7xlQuHnHbCVq0voOY9cszLbRrolsqTFBSv2yBNZFGZaGet3qU4yCRv5M
MzALOsyl9r3X2gZzDoL5ZguKzprrbkr4ENnUTYf4UATx4SniqtMp1fbPECLtHEm9
DQqVlEjp1MvOQLghcATSrbshxBv4CY53fcDXdunfKetz822zycxDu6y94wcQfFT/
c1twPUHKvchfImVk7EBeYqEVywlQgyjr24wRV/EACXjhJwo7zNE=
=VkoD
-----END PGP SIGNATURE-----
Merge tag 'platform-drivers-x86-v4.16-4' of git://git.infradead.org/linux-platform-drivers-x86
Pull x86 platform driver fixes from Andy Shevchenko:
- regression fix in keyboard support for Dell laptops
- prevent out-of-boundary write in WMI bus driver
- increase timeout to read functional key status on Lenovo laptops
* tag 'platform-drivers-x86-v4.16-4' of git://git.infradead.org/linux-platform-drivers-x86:
platform/x86: dell-laptop: Removed duplicates in DMI whitelist
platform/x86: dell-laptop: fix kbd_get_state's request value
platform/x86: ideapad-laptop: Increase timeout to wait for EC answer
platform/x86: wmi: fix off-by-one write in wmi_dev_probe()
A collection of usual suspects:
- A handful USB-audio and HD-audio device-specific quirks
- Some trivial fixes for the new AC97 bus stuff
- Another race fix in ALSA sequencer core
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAlqGpdIOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE/8Dg//Y/jOsYVVatg8JPYZOv/8FopyjM/+tGCFLWij
oxFnsqK3f4mEbE2MQGZVZDSexwfrro7U7PWRrmxE9/DWkakRbwcjOiKRRDfYcGHA
jE1BIQkcbu3HgTzRWACnHpHMFzwWoOyRv8HVq/qMbYcCPgCDgNPXJat1GopwSkK/
k5SCbEOLAqefTv6tcGPSVie9cjfjj1wSJ0M5jV6zJp1+9feDfN0yttEi+fUDOGeR
dfEuHoeaPpd6PnINxdkY6jaoMZtk91TdUpVEaq0srJXxYwO5V7JiOGt7VV4hgk23
tJMLCfA7NGNQBKVcjs/uLDTeWWqlzYVCD7caCp1z97CidfCX5k9f4rK1r1GzMfez
zrIxAY+Y7nHPlE65sTVToHTa5FjS8J3cdAPovJWRjbY6/C9GYvdx1epaz02HPpZm
va1r21oD02a5+UqwOvX0H4rii5Yc63Mt4FelBfEhv+cGq7m2Sduw32UDk09n3Wst
1Q9ivL5gigf4hEnPMG+aZba0KOgLM+Q3P+EeivoAQhQrp0A9TqZ3aCmDNGTmhc51
Yhgz+kJHcLtDfJnGQXlG7M+pDSepk1Pb8Yl+m0u305sf+hyFwYrcHU5tWc/u+bX+
ZMa1rGEr21yIvBadcO8qrGPiOgWCgGzTTAeI0kMlMZaofOmQQTaiszGVL+/gjOZu
pmNbkfc=
=UMDx
-----END PGP SIGNATURE-----
Merge tag 'sound-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A collection of usual suspects:
- a handful USB-audio and HD-audio device-specific quirks
- some trivial fixes for the new AC97 bus stuff
- another race fix in ALSA sequencer core"
* tag 'sound-4.16-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: PCI quirk for Fujitsu U7x7
ALSA: seq: Fix racy pool initializations
ALSA: usb: add more device quirks for USB DSD devices
ALSA: usb-audio: Fix UAC2 get_ctl request with a RANGE attribute
ALSA: ac97: Fix copy and paste typo in documentation
ALSA: usb-audio: add implicit fb quirk for Behringer UFX1204
ALSA: ac97: kconfig: Remove select of undefined symbol AC97
ALSA: hda/realtek - Enable Thinkpad Dock device for ALC298 platform
ALSA: hda/realtek - Add headset mode support for Dell laptop
ALSA: hda - Fix headset mic detection problem for two Dell machines
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJahnE9AAoJEAx081l5xIa+io0QAJlHvqgoDwzgzEJ2ORyLEMkR
GOg79Bq7Km3KEtrPNMYpVC/c0jsZHeaYIYdo0RhRhpeJRJtuudcqielyVsTGPbBL
Hb7rE2XrcfT3ttmcAe/1OlwVhEyzn9+6DEMIQgY1nYmxBjTMCoeZisOkUSNZzMrk
f5/x+4lGp4mSJ33ruKI01pq22DPDXC7rKobjr22OiZSSr1Gw9ZVvTBdPi/zU6izi
vqNjzgdTnBfW2lOqB/MZneUioXUsagZ+GahCm0AxUYUIFbuUtwnu/dG7eljhjQZg
pXLcsnssXn4+a9Q2uLZ1FSXBei7m6ye23UmRWW8omS45sfXPVsfeNcP9WHISE/Ln
7HDlIdU+KOvTLnq4dLbeWTPgK30cokXJ7SVFR+R3VyVB4hWVDLwZzXSMRnLlxUUT
qc0+9txZDkTBo6COjRHt0B3Umw3n2NABa0cx8T44m/N2S0pNdgksxltZIq84GzbL
btRIYf5LOqJeClzlT7X+XfSOGycEpeKul1NecfBHlW3UHI7EupN5X28hd7nLVM8E
lGkljaOezFhSqBSDkR2dQueY04ZsdGUH+7ehh3cLoMpxlsj/CHeh7BNre0Lzs+LT
1q02X/YyeB/ErMCQypN1hBrRk5D5TSDBjAVb5UhyUClFONgakdnfGgvby8jQ6pqh
WeuBV0kG31A8c/+OXjn+
=+PbR
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-for-v4.16-rc2' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"One nouveau regression fix, one AMD quirk and a full set of i915
fixes.
The i915 fixes are mostly for things caught by their CI system, main
ones being DSI panel fixes and GEM fixes"
* tag 'drm-fixes-for-v4.16-rc2' of git://people.freedesktop.org/~airlied/linux:
drm/nouveau: Make clock gate support conditional
drm/i915: Fix DSI panels with v1 MIPI sequences without a DEASSERT sequence v3
drm/i915: Free memdup-ed DSI VBT data structures on driver_unload
drm/i915: Add intel_bios_cleanup() function
drm/i915/vlv: Add cdclk workaround for DSI
drm/i915/gvt: fix one typo of render_mmio trace
drm/i915/gvt: Support BAR0 8-byte reads/writes
drm/i915/gvt: add 0xe4f0 into gen9 render list
drm/i915/pmu: Fix building without CONFIG_PM
drm/i915/pmu: Fix sleep under atomic in RC6 readout
drm/i915/pmu: Fix PMU enable vs execlists tasklet race
drm/i915: Lock out execlist tasklet while peeking inside for busy-stats
drm/i915/breadcrumbs: Ignore unsubmitted signalers
drm/i915: Don't wake the device up to check if the engine is asleep
drm/i915: Avoid truncation before clamping userspace's priority value
drm/i915/perf: Fix compiler warning for string truncation
drm/i915/perf: Fix compiler warning for string truncation
drm/amdgpu: add new device to use atpx quirk
dec_pending() is given an error status (possibly 0) to be recorded
against a bio. It can be called several times on the one 'struct
dm_io', and it is careful to only assign a non-zero error to
io->status. However when it then assigned io->status to bio->bi_status,
it is not careful and could overwrite a genuine error status with 0.
This can happen when chained bios are in use. If a bio is chained
beneath the bio that this dm_io is handling, the child bio might
complete and set bio->bi_status before the dm_io completes.
This has been possible since chained bios were introduced in 3.14, and
has become a lot easier to trigger with commit 18a25da843 ("dm: ensure
bio submission follows a depth-first tree walk") as that commit caused
dm to start using chained bios itself.
A particular failure mode is that if a bio spans an 'error' target and a
working target, the 'error' fragment will complete instantly and set the
->bi_status, and the other fragment will normally complete a little
later, and will clear ->bi_status.
The fix is simply to only assign io_error to bio->bi_status when
io_error is not zero.
Reported-and-tested-by: Milan Broz <gmazyland@gmail.com>
Cc: stable@vger.kernel.org (v3.14+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
- a MIPS GIC fix for spurious, masked interrupts
- a fix for a subtle IPI bug in GICv3
- do no probe GICv3 ITSs that are marked as disabled
- multi-MSI support for GICv2m
- various cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCAAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlqG6iAVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDKp8P/2NinZ/E7OveFyMAIKXx9yHTHIsu
upAXBs6sEn8RKuKXgT0bcsiCrDNsZK7kbMmgJ6OgLOi0pNeR1/j8DzzCbrJ7OLyc
Nl0/jQkEkVSMjULqUbwLdxTnS5WP/mP/MPTIQQWc2Hagq9633oZ4IlJVg56z+gJ5
rzISq+d6PlpT7ruknQI8GDXPes7bDms2mKGmayCKbQ5lD88NUVu3V85WHIeKh8ax
U+2gW/9EfGdtcJg/zSJKJis82X1JvMEBt0xLE8xvI3YxRTpJc30DfEamATrqRiUK
tYYeUPy8QDvOuOwDv+mMU9I5jMIFK/WZOPjhkXfa7NK1NSAG6IDMiioaSRX/3sJM
iNhgrNuJvh5bescobOK35HOLM/t3bLoqXBSAkahus2xlRY9GI14Khf402VKqpZRf
BjLhpqkwRs4rSe3FtQJ4J/Aq5Rk5sCeSzGZXsQK9CJjZtIDeGXc8bD5uGHNoczW4
44XP2HG/J/V4ZovNRHT7GqZkL0Md96KNJ+f3SifRucYi2rCatD6D75KDw3H2sdtL
Fs+DPkgm2vPlMowfpPH4VWvwsZ7KEEe5i1zU80j7llopaY6Ivy537jg9s1Gh0tj5
z3yd/IEb0Ytn2m1/dBrnr80pP2TyGdbNJkZAZkj/mlzl6saCgtHTksAUK7doMByK
aLy2TxU+NzBXotuo
=hEZQ
-----END PGP SIGNATURE-----
Merge tag 'irqchip-4.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Pull irqchip updates for 4.16-rc2 from Marc Zyngier
- A MIPS GIC fix for spurious, masked interrupts
- A fix for a subtle IPI bug in GICv3
- Do not probe GICv3 ITSs that are marked as disabled
- Multi-MSI support for GICv2m
- Various cleanups
...instead of open coding file operations followed by custom ->open()
callbacks per each attribute.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Since commit ad67b74d24 ("printk: hash addresses printed with %p")
pointers are being hashed when printed. Displaying the virtual memory at
bootup time is not helpful. so delete the prints.
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Jaedon Shin <jaedon.shin@gmail.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
We'd never implemented Multi-MSI support with GICv2m, because
it is weird and clunky, and you'd think people would rather use
MSI-X.
Turns out there is still plenty of devices out there that rely
on Multi-MSI. Oh well, let's teach that trick to the v2m widget,
it is not a big deal anyway.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
On some platforms there's an ITS available but it's not enabled
because reading or writing the registers is denied by the
firmware. In fact, reading or writing them will cause the system
to reset. We could remove the node from DT in such a case, but
it's better to skip nodes that are marked as "disabled" in DT so
that we can describe the hardware that exists and use the status
property to indicate how the firmware has configured things.
Cc: Stuart Yoder <stuyoder@gmail.com>
Cc: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Rajendra Nayak <rnayak@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
A DMB instruction can be used to ensure the relative order of only
memory accesses before and after the barrier. Since writes to system
registers are not memory operations, barrier DMB is not sufficient
for observability of memory accesses that occur before ICC_SGI1R_EL1
writes.
A DSB instruction ensures that no instructions that appear in program
order after the DSB instruction, can execute until the DSB instruction
has completed.
Cc: stable@vger.kernel.org
Acked-by: Will Deacon <will.deacon@arm.com>,
Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Commit 7778c4b27c ("irqchip: mips-gic: Use pcpu_masks to avoid reading
GIC_SH_MASK*") removed the read of the hardware mask register when
handling shared interrupts, instead using the driver's shadow pcpu_masks
entry as the effective mask. Unfortunately this did not take account of
the write to pcpu_masks during gic_shared_irq_domain_map, which
effectively unmasks the interrupt early. If an interrupt is asserted,
gic_handle_shared_int decodes and processes the interrupt even though it
has not yet been unmasked via gic_unmask_irq, which also sets the
appropriate bit in pcpu_masks.
On the MIPS Boston board, when a console command line of
"console=ttyS0,115200n8r" is passed, the modem status IRQ is enabled in
the UART, which is immediately raised to the GIC. The interrupt has been
mapped, but no handler has yet been registered, nor is it expected to be
unmasked. However, the write to pcpu_masks in gic_shared_irq_domain_map
has effectively unmasked it, resulting in endless reports of:
[ 5.058454] irq 13, desc: ffffffff80a7ad80, depth: 1, count: 0, unhandled: 0
[ 5.062057] ->handle_irq(): ffffffff801b1838,
[ 5.062175] handle_bad_irq+0x0/0x2c0
Where IRQ 13 is the UART interrupt.
To fix this, just remove the write to pcpu_masks in
gic_shared_irq_domain_map. The existing write in gic_unmask_irq is the
correct place for what is now the effective unmasking.
Cc: stable@vger.kernel.org
Fixes: 7778c4b27c ("irqchip: mips-gic: Use pcpu_masks to avoid reading GIC_SH_MASK*")
Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
Reviewed-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Some versions of QEMU will produce an ibm,dynamic-reconfiguration-memory
node with a ibm,dynamic-memory property that is zero-filled. This
causes the drmem code to oops trying to parse this property.
The fix for this is to validate that the property does contain LMB
entries before trying to parse it and bail if the count is zero.
Oops: Kernel access of bad area, sig: 11 [#1]
DAR: 0000000000000010
NIP read_drconf_v1_cell+0x54/0x9c
LR read_drconf_v1_cell+0x48/0x9c
Call Trace:
__param_initcall_debug+0x0/0x28 (unreliable)
drmem_init+0x144/0x2f8
do_one_initcall+0x64/0x1d0
kernel_init_freeable+0x298/0x38c
kernel_init+0x24/0x160
ret_from_kernel_thread+0x5c/0xb4
The ibm,dynamic-reconfiguration-memory device tree property generated
that causes this:
ibm,dynamic-reconfiguration-memory {
ibm,lmb-size = <0x0 0x10000000>;
ibm,memory-flags-mask = <0xff>;
ibm,dynamic-memory = <0x0 0x0 0x0 0x0 0x0 0x0>;
linux,phandle = <0x7e57eed8>;
ibm,associativity-lookup-arrays = <0x1 0x4 0x0 0x0 0x0 0x0>;
ibm,memory-preservation-time = <0x0>;
};
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Tested-by: Daniel Black <daniel@linux.vnet.ibm.com>
[mpe: Trim oops report]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>