With ip6gre we have a tunnel header which also makes the tunnel MTU
smaller. We need to reserve room for it. Previously we were using up
space reserved for the Tunnel Encapsulation Limit option
header (RFC 2473).
Also, after commit b05229f442 ("gre6: Cleanup GREv6 transmit path,
call common GRE functions") our contract with the caller has
changed. Now we check if the packet length exceeds the tunnel MTU after
the tunnel header has been pushed, unlike before.
This is reflected in the check where we look at the packet length minus
the size of the tunnel header, which is already accounted for in tunnel
MTU.
Fixes: b05229f442 ("gre6: Cleanup GREv6 transmit path, call common GRE functions")
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix to show correct locations for events on modules by relocating given
address instead of retrying after failure.
This happens when the module text size is big enough, bigger than
sh_addr, because the original code retries with given address + sh_addr
if it failed to find CU DIE at the given address.
Any address smaller than sh_addr always fails and it retries with the
correct address, but addresses bigger than sh_addr will get a CU DIE
which is on the given address (not adjusted by sh_addr).
In my environment(x86-64), the sh_addr of ".text" section is 0x10030.
Since i915 is a huge kernel module, we can see this issue as below.
$ grep "[Tt] .*\[i915\]" /proc/kallsyms | sort | head -n1
ffffffffc0270000 t i915_switcheroo_can_switch [i915]
ffffffffc0270000 + 0x10030 = ffffffffc0280030, so we'll check
symbols cross this boundary.
$ grep "[Tt] .*\[i915\]" /proc/kallsyms | grep -B1 ^ffffffffc028\
| head -n 2
ffffffffc027ff80 t haswell_init_clock_gating [i915]
ffffffffc0280110 t valleyview_init_clock_gating [i915]
So setup probes on both function and see what happen.
$ sudo ./perf probe -m i915 -a haswell_init_clock_gating \
-a valleyview_init_clock_gating
Added new events:
probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
You can now use it in all perf tools, such as:
perf record -e probe:valleyview_init_clock_gating -aR sleep 1
$ sudo ./perf probe -l
probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
probe:valleyview_init_clock_gating (on i915_vga_set_decode:4@gpu/drm/i915/i915_drv.c in i915)
As you can see, haswell_init_clock_gating is correctly shown,
but valleyview_init_clock_gating is not.
With this patch, both events are shown correctly.
$ sudo ./perf probe -l
probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
Committer notes:
In my case:
# perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
Added new events:
probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
You can now use it in all perf tools, such as:
perf record -e probe:valleyview_init_clock_gating -aR sleep 1
# perf probe -l
probe:haswell_init_clock_gating (on i915_getparam+432@gpu/drm/i915/i915_drv.c in i915)
probe:valleyview_init_clock_gating (on __i915_printk+240@gpu/drm/i915/i915_drv.c in i915)
#
# readelf -SW /lib/modules/4.9.0+/build/vmlinux | egrep -w '.text|Name'
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
[ 1] .text PROGBITS ffffffff81000000 200000 822fd3 00 AX 0 0 4096
#
So both are b0rked, now with the fix:
# perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
Added new events:
probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
You can now use it in all perf tools, such as:
perf record -e probe:valleyview_init_clock_gating -aR sleep 1
# perf probe -l
probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
#
Both looks correct.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/148411436777.9978.1440275861947194930.stgit@devbox
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is an IPv6 version of commit 24803f38a5 ("igmp: do not remove igmp
souce list..."). In mld_del_delrec(), we will restore back all source filter
info instead of flush them.
Move mld_clear_delrec() from ipv6_mc_down() to ipv6_mc_destroy_dev() since
we should not remove source list info when set link down. Remove
igmp6_group_dropped() in ipv6_mc_destroy_dev() since we have called it in
ipv6_mc_down().
Also clear all source info after igmp6_group_dropped() instead of in it
because ipv6_mc_down() will call igmp6_group_dropped().
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
bugs.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYfOk6AAoJECebzXlCjuG+Lj4QALaLKRRbIdrz6nmg7gUmpTWc
CdW8NMbzwSCXmYoivsTHBlhXZKsi5vVjnFXMCM/P85ddmipXdcTFCDLmmNoKUQ0M
jODlLX90ctaZKCDBVSaH4htAz2gkFv7z5IllX0YDQqHyiuzh/9KoV+AFCgPZPTpL
O1XRmfWz+yJDydz4hb3i5f2JvMk9P/tCXLnheuxxTIMSl2/fIfgF81eWwDpFqcA2
27+PyWWjZehVnZ77ca/mWJj2n0+gBINiKafcfF39NK/Hv2q4aauB3k7c4blecc9Q
m/IT3mKifvHvdNCmvHD5s74h4OikEGYpqaSjonMptZnWgfM4/gtF7yTiQjsOMDx/
w6W/tfHlGrvegpzhjaIaoZZ50EZp7xwGNNZYgH4J44kytYpolrhsOR6NqCLTqpej
xG2Kd89ZtnAgc/7T7ET/1PqpZ8f9M9pyV3E8s36OvF4AYQUNrfzbWSTQcZy3WGBP
YuoUCzacIbNbGgu4m6Zx5l/vKW5yn45xbUMp7T9S4WoxYMx6a5vViU0NiF7KsQDu
pcDT92DZ57KJFtCw7Ig08ILKsSXmNApH5/4mIrkX3quZuH4j2XapEJ9u//fmfZBd
Q+Sgv8RXcGELUJIg9yfmoWgPDA/oYslc7ynBV0lXLNgBuod//dGSlZ+6KfFFJYr8
XVOxwPTiiBIlc9lvB9eA
=tb4L
-----END PGP SIGNATURE-----
Merge tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux
Pull nfsd fixes from Bruce Fields:
"Miscellaneous nfsd bugfixes, one for a 4.10 regression, three for
older bugs"
* tag 'nfsd-4.10-1' of git://linux-nfs.org/~bfields/linux:
svcrdma: avoid duplicate dma unmapping during error recovery
sunrpc: don't call sleeping functions from the notifier block callbacks
svcrpc: don't leak contexts on PROC_DESTROY
nfsd: fix supported attributes for acl & labels
The following patch was sketched by Russell in response to my
crashes on the PB11MPCore after the patch for software-based
priviledged no access support for ARMv8.1. See this thread:
http://marc.info/?l=linux-arm-kernel&m=144051749807214&w=2
I am unsure what is going on, I suspect everyone involved in
the discussion is. I just want to repost this to get the
discussion restarted, as I still have to apply this patch
with every kernel iteration to get my PB11MPCore Realview
running.
Testing by Neil Armstrong on the Oxnas NAS has revealed that
this bug exist also on that widely deployed hardware, so
we are probably currently regressing all ARM11MPCore systems.
Cc: Russell King <linux@armlinux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: a5e090acbf ("ARM: software-based priviledged-no-access support")
Tested-by: Neil Armstrong <narmstrong@baylibre.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
During interface opening MAC address stored in netdev->dev_addr is
programmed in the HW with exception of BE3 VFs where the initial
MAC is programmed by parent PF. This is OK when MAC address is not
changed when an interfaces is down. In this case the requested MAC is
stored to netdev->dev_addr and later is stored into HW during opening.
But this is not done for all BE3 VFs so the NIC HW does not know
anything about this change and all traffic is filtered.
This is the case of bonding if fail_over_mac == 0 where the MACs of
the slaves are changed while they are down.
The be2net behavior is too restrictive because if a BE3 VF has
the FILTMGMT privilege then it is able to modify its MAC without
any restriction.
To solve the described problem the driver should take care about these
privileged BE3 VFs so the MAC is programmed during opening. And by
contrast unpriviled BE3 VFs should not be allowed to change its MAC
in any case.
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
BE3 VFs without FILTMGMT privilege are not allowed to modify its MAC,
VLAN table and UC/MC lists. So don't try to delete MAC on such VFs.
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Return value from be_mcc_notify_wait() contains a base completion status
together with an additional status. The base_status() macro need to be
used to access base status.
Fixes: e3a7ae2 be2net: Changing MAC Address of a VF was broken
Cc: Sathya Perla <sathya.perla@broadcom.com>
Cc: Ajit Khaparde <ajit.khaparde@broadcom.com>
Cc: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Cc: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Ivan Vecera <cera@cera.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
The #warning was present 10 years ago when the driver first got merged.
As the platform is rather obsolete by now, it seems very unlikely that
the warning will cause anyone to fix the code properly.
kernelci.org reports the warning for every build in the meantime, so
I think it's better to just turn it into a code comment to reduce
noise.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Update my entries in the MAINTAINERS file with the same email address
for kernel work, and, now that the git tree is hosted on more suitable
hardware, add git tree references where appropriate.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Disable BH around the call to napi_schedule() to avoid following warning
[ 52.095499] NOHZ: local_softirq_pending 08
[ 52.421291] NOHZ: local_softirq_pending 08
[ 52.608313] NOHZ: local_softirq_pending 08
Fixes: 8d59de8f7b ("net/mlx4_en: Process all completions in RX rings after port goes up")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Acked-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg says:
====================
pull request: bluetooth 2017-01-16
Here are a couple of important 802.15.4 driver fixes for the 4.10
kernel.
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Regressions for not being able to detect an eMMC HS DDR mode card has been
reported for the sdhci-esdhc-imx driver, but potentially other sdhci
variants may suffer from the similar problem.
The commit e173f8911f ("mmc: core: Update CMD13 polling policy when
switch to HS DDR mode"), is causing the problem. It seems that change moved
one step to far, regarding changing the host's timing before polling for a
busy card.
To fix this, let's move back to the behaviour when the host's timing is
updated after the polling, but before the switch status is fetched and
validated.
In cases when polling with CMD13, we keep validating the switch status at
each attempt. However, to align with the other card busy detections
mechanism, let's fetch and validate the switch status also after the host's
timing is updated.
Reported-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Reported-by: Gary Bisson <gary.bisson@boundarydevices.com>
Fixes: e173f8911f ("mmc: core: Update CMD13 polling policy when switch..")
Cc: Shawn Lin <shawn.lin@rock-chips.com>
Cc: Dong Aisheng <aisheng.dong@nxp.com>
Cc: Haibo Chen <haibo.chen@nxp.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Tested-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Tested-by: Jagan Teki <jagan@amarulasolutions.com>
Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com>
Tested-by: Haibo Chen <haibo.chen@nxp.com>
Reviewed-by: Dong Aisheng <aisheng.dong@nxp.com>
Mathieu reported that the LTTNG modules are broken as of 4.10-rc1 due to
the removal of the cpu hotplug notifiers.
Usually I don't care much about out of tree modules, but LTTNG is widely
used in distros. There are two ways to solve that:
1) Reserve a hotplug state for LTTNG
2) Add a dynamic range for the prepare states.
While #1 is the simplest solution, #2 is the proper one as we can convert
in tree users, which do not care about ordering, to the dynamic range as
well.
Add a dynamic range which allows LTTNG to request states in the prepare
stage.
Reported-and-tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1701101353010.3401@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull an urgent RCU fix from Paul E. McKenney:
"This series contains a pair of commits that permit RCU synchronous grace
periods (synchronize_rcu() and friends) to work correctly throughout boot.
This eliminates the current "dead time" starting when the scheduler spawns
its first taks and ending when the last of RCU's kthreads is spawned
(this last happens during early_initcall() time). Although RCU's
synchronous grace periods have long been documented as not working
during this time, prior to 4.9, the expedited grace periods worked by
accident, and some ACPI code came to rely on this unintentional behavior.
(Note that this unintentional behavior was -not- reliable. For example,
failures from ACPI could occur on !SMP systems and on systems booting
with the rcu_normal kernel boot parameter.)
Either way, there is a bug that needs fixing, and the 4.9 switch of RCU's
expedited grace periods to workqueues could be considered to have caused
a regression. This series therefore makes RCU's expedited grace periods
operate correctly throughout the boot process. This has been demonstrated
to fix the problems ACPI was encountering, and has the added longer-term
benefit of simplifying RCU's behavior."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We have quite a lot of code that depends on the order of the
__ctl_load inline assemby and subsequent memory accesses, like
e.g. disabling lowcore protection and the writing to lowcore.
Since the __ctl_load macro does not have memory barrier semantics, nor
any other dependencies the compiler is, theoretically, free to shuffle
code around. Or in other words: storing to lowcore could happen before
lowcore protection is disabled.
In order to avoid this class of potential bugs simply add a full
memory barrier to the __ctl_load macro.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
in actually sending them out - will try to do better in
the future:
* handle VHT opmode properly when hostapd is controlling
full station state
* two fixes for minimum channel width in mac80211
* don't leave SMPS set to junk in HT capabilities
* fix headroom when forwarding mesh packets, recently
broken by another fix that failed to take into account
frame encryption
* fix the TID in null-data packets indicating EOSP (end
of service period) in U-APSD
* prevent attempting to use (and then failing which
results in crashes) TXQs on stations that aren't added
to the driver yet
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJYeNveAAoJEGt7eEactAAdCjgP+gIIUjpH08MazRE9Nl6gqRGW
G6EAoaQDQ8AjiVo8nIJ33X2jmHRIPcyelyfMhdww3AtMJRrvl9wVhQvwKWI4+l6D
o8UVMBu66fq2luQA6AtOkkU6SkKwC7Se72Mdx6OA/zvdZz4/9Y4nZpOFPVwCbxqL
UJRzjrhnbS51YgL/Y0s0Vp+7Rv7IYCuH9JORNarMO5sYxaVpLWihJhbShX1bOshw
uFTHOjNRseImLhM4GOvVA7fSUiK8jxEuMECmlDKQB/6nVxMskE54yLOqMB5Ys6va
2CKjp5xqM4FfqB4LMNp7soJAXUOXvqk25JXAAbkVNo4VRd2Y4GoJ7XdBPqd4kfKb
rPlr+UP2xaSQsfqIoe+uFr/lVUmm4oCTrS/Mo7YVrjSMU7fntYwZccr9aw5jrQFW
YU+1QTF25HEb1SL18FH9JNWpoTlOlpB3bwpbAW4BHzsqDYc76CE/oyDLdZ6zQYlu
z92cLuycGTwgNhi1zDNThxB81zhZsWH1Jbh9ppZBdDPxo6E4DMK71okreGAXBEMJ
IdFZZqJbYW4I/TsUtse4atozk6oXlTEFY/pX4Qm0gwGwmqQx+wfSNLbymsD7gR42
OkL+bZ701HBl2gJUWuICrM2lFWtD4/o6oWpaW2I7QUAQhjvUr4ld9kthVMF2vdlt
w316EIfZQzBKw7Z7zjZI
=GU6q
-----END PGP SIGNATURE-----
Merge tag 'mac80211-for-davem-2017-01-13' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
We have a number of fixes, in part because I was late
in actually sending them out - will try to do better in
the future:
* handle VHT opmode properly when hostapd is controlling
full station state
* two fixes for minimum channel width in mac80211
* don't leave SMPS set to junk in HT capabilities
* fix headroom when forwarding mesh packets, recently
broken by another fix that failed to take into account
frame encryption
* fix the TID in null-data packets indicating EOSP (end
of service period) in U-APSD
* prevent attempting to use (and then failing which
results in crashes) TXQs on stations that aren't added
to the driver yet
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull namespace fixes from Eric Biederman:
"This tree contains 4 fixes.
The first is a fix for a race that can causes oopses under the right
circumstances, and that someone just recently encountered.
Past that are several small trivial correct fixes. A real issue that
was blocking development of an out of tree driver, but does not appear
to have caused any actual problems for in-tree code. A potential
deadlock that was reported by lockdep. And a deadlock people have
experienced and took the time to track down caused by a cleanup that
removed the code to drop a reference count"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
sysctl: Drop reference added by grab_header in proc_sys_readdir
pid: fix lockdep deadlock warning due to ucount_lock
libfs: Modify mount_pseudo_xattr to be clear it is not a userspace mount
mnt: Protect the mountpoint hashtable with mount_lock
Here are some small char/misc driver fixes for 4.10-rc4 that resolve
some reported issues. The MEI driver issue resolves a lot of problems
that people have been having, as does the mem driver fix. The other
minor fixes resolve other reported issues.
All of these have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHtrhQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk7xACeJZyxp00SO0WqDZlOWmLShYtwfdEAoLgT3A7t
JkoIfE+/JNbpfLAPwl4N
=gSvf
-----END PGP SIGNATURE-----
Merge tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small char/misc driver fixes for 4.10-rc4 that resolve
some reported issues.
The MEI driver issue resolves a lot of problems that people have been
having, as does the mem driver fix. The other minor fixes resolve
other reported issues.
All of these have been in linux-next for a while"
* tag 'char-misc-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
vme: Fix wrong pointer utilization in ca91cx42_slave_get
auxdisplay: fix new ht16k33 build errors
ppdev: don't print a free'd string
extcon: return error code on failure
drivers: char: mem: Fix thinkos in kmem address checks
mei: bus: enable OS version only for SPT and newer
Here is a single patch being reverted to remove a feature that was added
in 4.10-rc1 that isn't quite ready for release. It will be redone as a
debugfs file instead of a sysfs file in the future.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHtr+A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymkJQCfZgr9tNZfzUZxojP6lEIiSAEsDO0Anj8l1ZZf
PRICi50BvVtENhOyE25m
=arvB
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fix from Greg KH:
"Here is a single patch being reverted to remove a feature that was
added in 4.10-rc1 that isn't quite ready for release.
It will be redone as a debugfs file instead of a sysfs file in the
future"
* tag 'driver-core-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
Revert "driver core: Add deferred_probe attribute to devices in sysfs"
Here are some small tty/serial driver fixes for 4.10-rc4 to resolve a
number of reported issues.
Nothing major here at all, one revert of a problematic patch, and some
other tiny bugfixes. Full details are in the shortlog below.
All have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHtshA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ym6PQCeJtdwrkIQmisQZYcMpAZrnQnpF9YAnjrcqWZ4
2glC2IsH8mDm5DR81axE
=GIdT
-----END PGP SIGNATURE-----
Merge tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some small tty/serial driver fixes for 4.10-rc4 to resolve a
number of reported issues.
Nothing major here at all, one revert of a problematic patch, and some
other tiny bugfixes. Full details are in the shortlog below.
All have been in linux-next with no reported issues"
* tag 'tty-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
sysrq: attach sysrq handler correctly for 32-bit kernel
Revert "tty: serial: 8250: add CON_CONSDEV to flags"
Clearing FIFOs in RS485 emulation mode causes subsequent transmits to break
8250_pci: Fix potential use-after-free in error path
tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
tty/serial: atmel_serial: BUG: stop DMA from transmitting in stop_tx
Here are a few small USB driver fixes for 4.10-rc4 to resolve some
reported issues. The "largest" here is a number of bugs being fixed in
the ch341 usb-serial driver, to hopefully resolve the mess of different
devices floating around that use this driver that have been having
problems with the 4.10-rc1 release. There's also a tiny musb fix that I
missed in the last pull request, as well as the traditional xhci fix
rounding out the batch.
All have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWHttLg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylfnQCguPTDSPkdU5vSsu8eCEplsql6izUAnjz5WAuI
YDQBXrYkmQ5HRM4U2/8T
=M2Eg
-----END PGP SIGNATURE-----
Merge tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are a few small USB driver fixes for 4.10-rc4 to resolve some
reported issues.
The "largest" here is a number of bugs being fixed in the ch341
usb-serial driver, to hopefully resolve the mess of different devices
floating around that use this driver that have been having problems
with the 4.10-rc1 release.
There's also a tiny musb fix that I missed in the last pull request,
as well as the traditional xhci fix rounding out the batch.
All have been in linux-next with no reported issues"
* tag 'usb-4.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
xhci: fix deadlock at host remove by running watchdog correctly
USB: serial: ch341: fix control-message error handling
usb: musb: fix runtime PM in debugfs
wusbcore: Fix one more crypto-on-the-stack bug
USB: serial: kl5kusb105: fix line-state error handling
USB: serial: ch341: fix baud rate and line-control handling
USB: serial: ch341: fix line settings after reset-resume
USB: serial: ch341: fix resume after reset
USB: serial: ch341: fix open error handling
USB: serial: ch341: fix modem-control and B0 handling
USB: serial: ch341: fix open and resume after B0
USB: serial: ch341: fix initial modem-control state
Pull i2c fixes from Wolfram Sang:
"Bugfixes for I2C. Mostly core this time which is a bit unusual but
nothing really scary in there"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: piix4: Avoid race conditions with IMC
i2c: fix spelling mistake: "insufficent" -> "insufficient"
i2c: print correct device invalid address
i2c: do not enable fall back to Host Notify by default
i2c: fix kernel memory disclosure in dev interface
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- unwinder fixes
- AMD CPU topology enumeration fixes
- microcode loader fixes
- x86 embedded platform fixes
- fix for a bootup crash that may trigger when clearcpuid= is used
with invalid values"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mpx: Use compatible types in comparison to fix sparse error
x86/tsc: Add the Intel Denverton Processor to native_calibrate_tsc()
x86/entry: Fix the end of the stack for newly forked tasks
x86/unwind: Include __schedule() in stack traces
x86/unwind: Disable KASAN checks for non-current tasks
x86/unwind: Silence warnings for non-current tasks
x86/microcode/intel: Use correct buffer size for saving microcode data
x86/microcode/intel: Fix allocation size of struct ucode_patch
x86/microcode/intel: Add a helper which gives the microcode revision
x86/microcode: Use native CPUID to tickle out microcode revision
x86/CPU: Add native CPUID variants returning a single datum
x86/boot: Add missing declaration of string functions
x86/CPU/AMD: Fix Bulldozer topology
x86/platform/intel-mid: Rename 'spidev' to 'mrfld_spidev'
x86/cpu: Fix typo in the comment for Anniedale
x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option
Pull NOHZ fix from Ingo Molnar:
"This fixes an old NOHZ race where we incorrectly calculate the next
timer interrupt in certain circumstances where hrtimers are pending,
that can cause hard to reproduce stalled-values artifacts in
/proc/stat"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
nohz: Fix collision between tick and other hrtimers
Pull perf fixes from Ingo Molnar:
"Misc race fixes uncovered by fuzzing efforts, a Sparse fix, two PMU
driver fixes, plus miscellanous tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86: Reject non sampling events with precise_ip
perf/x86/intel: Account interrupts for PEBS errors
perf/core: Fix concurrent sys_perf_event_open() vs. 'move_group' race
perf/core: Fix sys_perf_event_open() vs. hotplug
perf/x86/intel: Use ULL constant to prevent undefined shift behaviour
perf/x86/intel/uncore: Fix hardcoded socket 0 assumption in the Haswell init code
perf/x86: Set pmu->module in Intel PMU modules
perf probe: Fix to probe on gcc generated symbols for offline kernel
perf probe: Fix --funcs to show correct symbols for offline module
perf symbols: Robustify reading of build-id from sysfs
perf tools: Install tools/lib/traceevent plugins with install-bin
tools lib traceevent: Fix prev/next_prio for deadline tasks
perf record: Fix --switch-output documentation and comment
perf record: Make __record_options static
tools lib subcmd: Add OPT_STRING_OPTARG_SET option
perf probe: Fix to get correct modname from elf header
samples/bpf trace_output_user: Remove duplicate sys/ioctl.h include
samples/bpf sock_example: Avoid getting ethhdr from two includes
perf sched timehist: Show total scheduling time
Pull EFI fixes from Ingo Molnar:
"A number of regression fixes:
- Fix a boot hang on machines that have somewhat unusual memory map
entries of phys_addr=0x0 num_pages=0, which broke due to a recent
commit. This commit got cherry-picked from the v4.11 queue because
the bug is affecting real machines.
- Fix a boot hang also reported by KASAN, caused by incorrect init
ordering introduced by a recent optimization.
- Fix a recent robustification fix to allocate_new_fdt_and_exit_boot()
that introduced an invalid assumption. Neither bugs were seen in
the wild AFAIK"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/x86: Prune invalid memory map entries and fix boot regression
x86/efi: Don't allocate memmap through memblock after mm_init()
efi/libstub/arm*: Pass latest memory map to the kernel
Some drivers do depend on page mappings to be page aligned.
Swiotlb already enforces such alignment for mappings greater than page,
extend that to page-sized mappings as well.
Without this fix, nvme hits BUG() in nvme_setup_prps(), because that routine
assumes page-aligned mappings.
Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
The current preemptible RCU implementation goes through three phases
during bootup. In the first phase, there is only one CPU that is running
with preemption disabled, so that a no-op is a synchronous grace period.
In the second mid-boot phase, the scheduler is running, but RCU has
not yet gotten its kthreads spawned (and, for expedited grace periods,
workqueues are not yet running. During this time, any attempt to do
a synchronous grace period will hang the system (or complain bitterly,
depending). In the third and final phase, RCU is fully operational and
everything works normally.
This has been OK for some time, but there has recently been some
synchronous grace periods showing up during the second mid-boot phase.
This code worked "by accident" for awhile, but started failing as soon
as expedited RCU grace periods switched over to workqueues in commit
8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue").
Note that the code was buggy even before this commit, as it was subject
to failure on real-time systems that forced all expedited grace periods
to run as normal grace periods (for example, using the rcu_normal ksysfs
parameter). The callchain from the failure case is as follows:
early_amd_iommu_init()
|-> acpi_put_table(ivrs_base);
|-> acpi_tb_put_table(table_desc);
|-> acpi_tb_invalidate_table(table_desc);
|-> acpi_tb_release_table(...)
|-> acpi_os_unmap_memory
|-> acpi_os_unmap_iomem
|-> acpi_os_map_cleanup
|-> synchronize_rcu_expedited
The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
which caused the code to try using workqueues before they were
initialized, which did not go well.
This commit therefore reworks RCU to permit synchronous grace periods
to proceed during this mid-boot phase. This commit is therefore a
fix to a regression introduced in v4.9, and is therefore being put
forward post-merge-window in v4.10.
This commit sets a flag from the existing rcu_scheduler_starting()
function which causes all synchronous grace periods to take the expedited
path. The expedited path now checks this flag, using the requesting task
to drive the expedited grace period forward during the mid-boot phase.
Finally, this flag is updated by a core_initcall() function named
rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.
Note that this arrangement assumes that tasks are not sent POSIX signals
(or anything similar) from the time that the first task is spawned
through core_initcall() time.
Fixes: 8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue")
Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: Stan Kain <stan.kain@gmail.com>
Tested-by: Ivan <waffolz@hotmail.com>
Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
Tested-by: Bruno Pesavento <bpesavento@infinito.it>
Tested-by: Borislav Petkov <bp@suse.de>
Tested-by: Frederic Bezies <fredbezies@gmail.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
It is now legal to invoke synchronize_sched() at early boot, which causes
Tiny RCU's synchronize_sched() to emit spurious splats. This commit
therefore removes the cond_resched() from Tiny RCU's synchronize_sched().
Fixes: 8b355e3bc1 ("rcu: Drive expedited grace periods from workqueue")
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org> # 4.9.0-
Pull vfs fixes from Al Viro.
The most notable fix here is probably the fix for a splice regression
("fix a fencepost error in pipe_advance()") noticed by Alan Wylie.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fix a fencepost error in pipe_advance()
coredump: Ensure proper size of sparse core files
aio: fix lock dep warning
tmpfs: clear S_ISGID when setting posix ACLs
Pull block fixes from Jens Axboe:
- the virtio_blk stack DMA corruption fix from Christoph, fixing and
issue with VMAP stacks.
- O_DIRECT blkbits calculation fix from Chandan.
- discard regression fix from Christoph.
- queue init error handling fixes for nbd and virtio_blk, from Omar and
Jeff.
- two small nvme fixes, from Christoph and Guilherme.
- rename of blk_queue_zone_size and bdev_zone_size to _sectors instead,
to more closely follow what we do in other places in the block layer.
This interface is new for this series, so let's get the naming right
before releasing a kernel with this feature. From Damien.
* 'for-linus' of git://git.kernel.dk/linux-block:
block: don't try to discard from __blkdev_issue_zeroout
sd: remove __data_len hack for WRITE SAME
nvme: use blk_rq_payload_bytes
scsi: use blk_rq_payload_bytes
block: add blk_rq_payload_bytes
block: Rename blk_queue_zone_size and bdev_zone_size
nvme: apply DELAY_BEFORE_CHK_RDY quirk at probe time too
nvme-rdma: fix nvme_rdma_queue_is_ready
virtio_blk: fix panic in initialization error path
nbd: blk_mq_init_queue returns an error code on failure, not NULL
virtio_blk: avoid DMA to stack for the sense buffer
do_direct_IO: Use inode->i_blkbits to compute block count to be cleaned
The logics in pipe_advance() used to release all buffers past the new
position failed in cases when the number of buffers to release was equal
to pipe->buffers. If that happened, none of them had been released,
leaving pipe full. Worse, it was trivial to trigger and we end up with
pipe full of uninitialized pages. IOW, it's an infoleak.
Cc: stable@vger.kernel.org # v4.9
Reported-by: "Alan J. Wylie" <alan@wylie.me.uk>
Tested-by: "Alan J. Wylie" <alan@wylie.me.uk>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
If the last section of a core file ends with an unmapped or zero page,
the size of the file does not correspond with the last dump_skip() call.
gdb complains that the file is truncated and can be confusing to users.
After all of the vma sections are written, make sure that the file size
is no smaller than the current file position.
This problem can be demonstrated with gdb's bigcore testcase on the
sparc architecture.
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
dmaengine fixes for 4.10-rc4
The fixes this time around are spread over drivers, pretty normal update.
o PCI ID for SKL ioatdma, workaround for SKX and ioat_alloc_chan_resources
sleepy allocation fix.
o dw kconfig typo fix
o null pointer deref for stm32
o MAINTAINERS Update for at_hdmac
o pl330 runtime pm fixes
o omap-dma port window fix
o rcar-dmac unmap slave resource fix.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYekAeAAoJEHwUBw8lI4NHyWAQAMQxH4dOms2p4PuM65HCwJFW
BvEsljmvn+XmlqwhaMxrwGcyMPUpVSkQUkMQZYa9hMAH4fwWLIAemxdHYR9WpXL4
YsVz9pAgDs6G3XbunhMA6mM5vyR18uKkzUaLWtU/z4nw45RWe7NkG9Gw7k2gWKfK
rNYIOYts4Ue3sPXsYjYn4mdHkWslFcaKuxxRZGpKAmF/V7i7m5xg6WVEykVRK7EL
0e944zdAwmRkc3dcW+SWmZqbZwp+Sjj0Vn5kCbTEoItUQN40VnxOK25ZN+YAxpxg
28pljjTfo0LRW0rgNUGulFhg0ssMns4J3gArfqbzRvDWOFCH6MZ1e1CwbLmZLF9U
rK/escqsKuXFG76EstdZzfjq0H15vfF1backq23ww6WXr9w4TAUwUJR7mwShVP+O
9WTkfjwlXEnUoixutfsfTr7YSdztCBXB34BLJxNKItl5Mm7N923BTPW06e9J7S61
Fiv8E85U4WnMe1BLBPNK550Zz6YUAtqfBeYBFWZExury9jSX5tIjeoEbaxDO+FEJ
fvWcjFoHZ2nQfigyafaoEIkpvjnTgveSs39tPWhTat/pucCb7za/aiA7syR9oGF2
kRa7++5TKt7q+hr1KOzvK3TNgI8fVItBd0Asi0LejIdJa4OoxAPNa7EuRLNh+vZG
YQoKllPDYToI/RowjGKW
=pJ7d
-----END PGP SIGNATURE-----
Merge tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma
Pull dmaengine fixes from Vinod Koul:
"The fixes this time around are spread over drivers, pretty normal
update:
- PCI ID for SKL ioatdma, workaround for SKX and
ioat_alloc_chan_resources sleepy allocation fix
- dw kconfig typo fix
- null pointer deref for stm32
- MAINTAINERS Update for at_hdmac
- pl330 runtime pm fixes
- omap-dma port window fix
- rcar-dmac unmap slave resource fix"
* tag 'dmaengine-fix-4.10-rc4' of git://git.infradead.org/users/vkoul/slave-dma:
dmaengine: rcar-dmac: unmap slave resource when channel is freed
dmaengine: omap-dma: Fix the port_window support
dmaengine: iota: ioat_alloc_chan_resources should not perform sleeping allocations.
dmaengine: pl330: Fix runtime PM support for terminated transfers
MAINTAINERS: dmaengine: Update + Hand over the at_hdmac driver to Ludovic
dmaengine: omap-dma: Fix dynamic lch_map allocation
dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
dmaengine: stm32-dma: Fix null pointer dereference in stm32_dma_tx_status
dmaengine: stm32-dma: Set correct args number for DMA request from DT
dmaengine: dw: fix typo in Kconfig
dmaengine: ioatdma: workaround SKX ioatdma version
dmaengine: ioatdma: Add Skylake PCI Dev ID
Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
(2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
These machines fail to boot after the following commit,
commit 8e80632fb2 ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
Fix this by removing such bogus entries from the memory map.
Furthermore, currently the log output for this case (with efi=debug)
looks like:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
This is clearly wrong, and also not as informative as it could be. This
patch changes it so that if we find obviously invalid memory map
entries, we print an error and skip those entries. It also detects the
display of the address range calculation overflow, so the new output is:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid)
It also detects memory map sizes that would overflow the physical
address, for example phys_addr=0xfffffffffffff000 and
num_pages=0x0200000000000001, and prints:
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
It then removes these entries from the memory map.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
[Matt: Include bugzilla info in commit log]
Cc: <stable@vger.kernel.org> # v4.9+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This reverts commit 6751667a29.
Rob Herring objected to it, and a replacement for it will be added using
debugfs in the future.
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Reported-by: Rob Herring <robh@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As Peter suggested [1] rejecting non sampling PEBS events,
because they dont make any sense and could cause bugs
in the NMI handler [2].
[1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop
[2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.org
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/20170103142454.GA26251@krava
Signed-off-by: Ingo Molnar <mingo@kernel.org>
It's possible to set up PEBS events to get only errors and not
any data, like on SNB-X (model 45) and IVB-EP (model 62)
via 2 perf commands running simultaneously:
taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
This leads to a soft lock up, because the error path of the
intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt
for error PEBS interrupts, so in case you're getting ONLY
errors you don't have a way to stop the event when it's over
the max_samples_per_tick limit:
NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
...
RIP: 0010:[<ffffffff81159232>] [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
...
Call Trace:
? trace_hardirqs_on_caller+0xf5/0x1b0
? perf_cgroup_attach+0x70/0x70
perf_install_in_context+0x199/0x1b0
? ctx_resched+0x90/0x90
SYSC_perf_event_open+0x641/0xf90
SyS_perf_event_open+0x9/0x10
do_syscall_64+0x6c/0x1f0
entry_SYSCALL64_slow_path+0x25/0x25
Add perf_event_account_interrupt() which does the interrupt
and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s
error path.
We keep the pending_kill and pending_wakeup logic only in the
__perf_event_overflow() path, because they make sense only if
there's any data to deliver.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Di Shen reported a race between two concurrent sys_perf_event_open()
calls where both try and move the same pre-existing software group
into a hardware context.
The problem is exactly that described in commit:
f63a8daa58 ("perf: Fix event->ctx locking")
... where, while we wait for a ctx->mutex acquisition, the event->ctx
relation can have changed under us.
That very same commit failed to recognise sys_perf_event_context() as an
external access vector to the events and thereby didn't apply the
established locking rules correctly.
So while one sys_perf_event_open() call is stuck waiting on
mutex_lock_double(), the other (which owns said locks) moves the group
about. So by the time the former sys_perf_event_open() acquires the
locks, the context we've acquired is stale (and possibly dead).
Apply the established locking rules as per perf_event_ctx_lock_nested()
to the mutex_lock_double() for the 'move_group' case. This obviously means
we need to validate state after we acquire the locks.
Reported-by: Di Shen (Keen Lab)
Tested-by: John Dias <joaodias@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Min Chong <mchong@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Fixes: f63a8daa58 ("perf: Fix event->ctx locking")
Link: http://lkml.kernel.org/r/20170106131444.GZ3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
There is problem with installing an event in a task that is 'stuck' on
an offline CPU.
Blocked tasks are not dis-assosciated from offlined CPUs, after all, a
blocked task doesn't run and doesn't require a CPU etc.. Only on
wakeup do we ammend the situation and place the task on a available
CPU.
If we hit such a task with perf_install_in_context() we'll loop until
either that task wakes up or the CPU comes back online, if the task
waking depends on the event being installed, we're stuck.
While looking into this issue, I also spotted another problem, if we
hit a task with perf_install_in_context() that is in the middle of
being migrated, that is we observe the old CPU before sending the IPI,
but run the IPI (on the old CPU) while the task is already running on
the new CPU, things also go sideways.
Rework things to rely on task_curr() -- outside of rq->lock -- which
is rather tricky. Imagine the following scenario where we're trying to
install the first event into our task 't':
CPU0 CPU1 CPU2
(current == t)
t->perf_event_ctxp[] = ctx;
smp_mb();
cpu = task_cpu(t);
switch(t, n);
migrate(t, 2);
switch(p, t);
ctx = t->perf_event_ctxp[]; // must not be NULL
smp_function_call(cpu, ..);
generic_exec_single()
func();
spin_lock(ctx->lock);
if (task_curr(t)) // false
add_event_to_ctx();
spin_unlock(ctx->lock);
perf_event_context_sched_in();
spin_lock(ctx->lock);
// sees event
So its CPU0's store of t->perf_event_ctxp[] that must not go 'missing'.
Because if CPU2's load of that variable were to observe NULL, it would
not try to schedule the ctx and we'd have a task running without its
counter, which would be 'bad'.
As long as we observe !NULL, we'll acquire ctx->lock. If we acquire it
first and not see the event yet, then CPU0 must observe task_curr()
and retry. If the install happens first, then we must see the event on
sched-in and all is well.
I think we can translate the first part (until the 'must not be NULL')
of the scenario to a litmus test like:
C C-peterz
{
}
P0(int *x, int *y)
{
int r1;
WRITE_ONCE(*x, 1);
smp_mb();
r1 = READ_ONCE(*y);
}
P1(int *y, int *z)
{
WRITE_ONCE(*y, 1);
smp_store_release(z, 1);
}
P2(int *x, int *z)
{
int r1;
int r2;
r1 = smp_load_acquire(z);
smp_mb();
r2 = READ_ONCE(*x);
}
exists
(0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
Where:
x is perf_event_ctxp[],
y is our tasks's CPU, and
z is our task being placed on the rq of CPU2.
The P0 smp_mb() is the one added by this patch, ordering the store to
perf_event_ctxp[] from find_get_context() and the load of task_cpu()
in task_function_call().
The smp_store_release/smp_load_acquire model the RCpc locking of the
rq->lock and the smp_mb() of P2 is the context switch switching from
whatever CPU2 was running to our task 't'.
This litmus test evaluates into:
Test C-peterz Allowed
States 7
0:r1=0; 2:r1=0; 2:r2=0;
0:r1=0; 2:r1=0; 2:r2=1;
0:r1=0; 2:r1=1; 2:r2=1;
0:r1=1; 2:r1=0; 2:r2=0;
0:r1=1; 2:r1=0; 2:r2=1;
0:r1=1; 2:r1=1; 2:r2=0;
0:r1=1; 2:r1=1; 2:r2=1;
No
Witnesses
Positive: 0 Negative: 7
Condition exists (0:r1=0 /\ 2:r1=1 /\ 2:r2=0)
Observation C-peterz Never 0 7
Hash=e427f41d9146b2a5445101d3e2fcaa34
And the strong and weak model agree.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: jeremy.linton@arm.com
Link: http://lkml.kernel.org/r/20161209135900.GU3174@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
info->si_addr is of type void __user *, so it should be compared against
something from the same address space.
This fixes the following sparse error:
arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The Intel Denverton microserver uses a 25 MHz TSC crystal,
so we can derive its exact [*] TSC frequency
using CPUID and some arithmetic, eg.:
TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000)
[*] 'exact' is only as good as the crystal, which should be +/- 20ppm
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull btrfs fixes from Chris Mason:
"These are all over the place.
The tracepoint part of the pull fixes a crash and adds a little more
information to two tracepoints, while the rest are good old fashioned
fixes"
* 'for-linus-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: make tracepoint format strings more compact
Btrfs: add truncated_len for ordered extent tracepoints
Btrfs: add 'inode' for extent map tracepoint
btrfs: fix crash when tracepoint arguments are freed by wq callbacks
Btrfs: adjust outstanding_extents counter properly when dio write is split
Btrfs: fix lockdep warning about log_mutex
Btrfs: use down_read_nested to make lockdep silent
btrfs: fix locking when we put back a delayed ref that's too new
btrfs: fix error handling when run_delayed_extent_op fails
btrfs: return the actual error value from from btrfs_uuid_tree_iterate
window.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQEcBAABCAAGBQJYeQymAAoJEEp/3jgCEfOLLVsH/28qRsjVPWr5JuL1SF86//kd
rAi7QUfbNgXHqbb10a9za9pNuLhHr3kImIfvQ04wYiYQY+IaAapiRXwQev8BsNAa
yENUc8XwNgydw4FU1ia5PkGOJLDtujtfgjWT2v+gf1HUzLaV6alBzqDwUZBt3xJz
mlYC82oFkXPa0BFmLUXtT/jJu/ZI8caO4KB34/UKi7LjBQk1ca7E2xVUoDtdQmEm
ciPE98akU4JiB99aOgGdwemBzkAMHEGQpImTzqHr/tbIUj0MqVAjH9FVOhRCbjMy
6MSR+U9yUzJkBzefS5enijAoExVc8cD/A0nIaKGVb6qWrIrk51/Opl6iILeVLUo=
=28cq
-----END PGP SIGNATURE-----
Merge tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two small fixups for the filesystem changes that went into this merge
window"
* tag 'ceph-for-4.10-rc4' of git://github.com/ceph/ceph-client:
ceph: fix get_oldest_context()
ceph: fix mds cluster availability check