Nadav reported parallel MADV_DONTNEED on same range has a stale TLB
problem and Mel fixed it[1] and found same problem on MADV_FREE[2].
Quote from Mel Gorman:
"The race in question is CPU 0 running madv_free and updating some PTEs
while CPU 1 is also running madv_free and looking at the same PTEs.
CPU 1 may have writable TLB entries for a page but fail the pte_dirty
check (because CPU 0 has updated it already) and potentially fail to
flush.
Hence, when madv_free on CPU 1 returns, there are still potentially
writable TLB entries and the underlying PTE is still present so that a
subsequent write does not necessarily propagate the dirty bit to the
underlying PTE any more. Reclaim at some unknown time at the future
may then see that the PTE is still clean and discard the page even
though a write has happened in the meantime. I think this is possible
but I could have missed some protection in madv_free that prevents it
happening."
This patch aims for solving both problems all at once and is ready for
other problem with KSM, MADV_FREE and soft-dirty story[3].
TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending
and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can
catch there are parallel threads going on. In that case, forcefully,
flush TLB to prevent for user to access memory via stale TLB entry
although it fail to gather page table entry.
I confirmed this patch works with [4] test program Nadav gave so this
patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
v2" in current mmotm.
NOTE:
This patch modifies arch-specific TLB gathering interface(x86, ia64,
s390, sh, um). It seems most of architecture are straightforward but
s390 need to be careful because tlb_flush_mmu works only if
mm->context.flush_mm is set to non-zero which happens only a pte entry
really is cleared by ptep_get_and_clear and friends. However, this
problem never changes the pte entries but need to flush to prevent
memory access from stale tlb.
[1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
[2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
[3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
[4] https://patchwork.kernel.org/patch/9861621/
[minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Reported-by: Nadav Amit <namit@vmware.com>
Reported-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch is a preparatory patch for solving race problems caused by
TLB batch. For that, we will increase/decrease TLB flush pending count
of mm_struct whenever tlb_[gather|finish]_mmu is called.
Before making it simple, this patch separates architecture specific part
and rename it to arch_tlb_[gather|finish]_mmu and generic part just
calls it.
It shouldn't change any behavior.
Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The send call ignores unknown flags. Legacy applications may already
unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
socket opts in to zerocopy.
Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
Processes can also query this socket option to detect kernel support
for the feature. Older kernels will return ENOPROTOOPT.
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
struct siginfo is a union and the kernel since 2.4 has been hiding a union
tag in the high 16bits of si_code using the values:
__SI_KILL
__SI_TIMER
__SI_POLL
__SI_FAULT
__SI_CHLD
__SI_RT
__SI_MESGQ
__SI_SYS
While this looks plausible on the surface, in practice this situation has
not worked well.
- Injected positive signals are not copied to user space properly
unless they have these magic high bits set.
- Injected positive signals are not reported properly by signalfd
unless they have these magic high bits set.
- These kernel internal values leaked to userspace via ptrace_peek_siginfo
- It was possible to inject these kernel internal values and cause the
the kernel to misbehave.
- Kernel developers got confused and expected these kernel internal values
in userspace in kernel self tests.
- Kernel developers got confused and set si_code to __SI_FAULT which
is SI_USER in userspace which causes userspace to think an ordinary user
sent the signal and that it was not kernel generated.
- The values make it impossible to reorganize the code to transform
siginfo_copy_to_user into a plain copy_to_user. As si_code must
be massaged before being passed to userspace.
So remove these kernel internal si codes and make the kernel code simpler
and more maintainable.
To replace these kernel internal magic si_codes introduce the helper
function siginfo_layout, that takes a signal number and an si_code and
computes which union member of siginfo is being used. Have
siginfo_layout return an enumeration so that gcc will have enough
information to warn if a switch statement does not handle all of union
members.
A couple of architectures have a messed up ABI that defines signal
specific duplications of SI_USER which causes more special cases in
siginfo_layout than I would like. The good news is only problem
architectures pay the cost.
Update all of the code that used the previous magic __SI_ values to
use the new SIL_ values and to call siginfo_layout to get those
values. Escept where not all of the cases are handled remove the
defaults in the switch statements so that if a new case is missed in
the future the lack will show up at compile time.
Modify the code that copies siginfo si_code to userspace to just copy
the value and not cast si_code to a short first. The high bits are no
longer used to hold a magic union member.
Fixup the siginfo header files to stop including the __SI_ values in
their constants and for the headers that were missing it to properly
update the number of si_codes for each signal type.
The fixes to copy_siginfo_from_user32 implementations has the
interesting property that several of them perviously should never have
worked as the __SI_ values they depended up where kernel internal.
With that dependency gone those implementations should work much
better.
The idea of not passing the __SI_ values out to userspace and then
not reinserting them has been tested with criu and criu worked without
changes.
Ref: 2.4.0-test1
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Setting si_code to __SI_FAULT results in a userspace seeing
an si_code of 0. This is the same si_code as SI_USER. Posix
and common sense requires that SI_USER not be a signal specific
si_code. As such this use of 0 for the si_code is a pretty
horribly broken ABI.
Given that ia64 is on it's last legs I don't know that it is worth
fixing this, but it is worth documenting what is going on so that
no one decides to copy this bad decision.
This was introduced in 2.3.51 so this mess has had a long time for
people to be able to start depending on it.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
The SME patches we are about to apply add some E820 logic, so merge in
pending E820 code changes first, to have a single code base.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull uacess-unaligned removal from Al Viro:
"That stuff had just one user, and an exotic one, at that - binfmt_flat
on arm and m68k"
* 'work.uaccess-unaligned' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill {__,}{get,put}_user_unaligned()
binfmt_flat: flat_{get,put}_addr_from_rp() should be able to fail
"generic-y += kvm_para.h" is doubled in asm/Kbuild and uapi/asm/Kbuild.
The one in the former should be simply removed because kvm_para.h is
exported.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Pull user access str* updates from Al Viro:
"uaccess str...() dead code removal"
* 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
s390 keyboard.c: don't open-code strndup_user()
mips: get rid of unused __strnlen_user()
get rid of unused __strncpy_from_user() instances
kill strlen_user()
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls
to ->mapping_error so that the dma_map_ops instances are
more self contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAlldmw0LHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOiKA/+Ln1mFLSf3nfTzIHa24Bbk8ZTGr0B8TD4Vmyyt8iG
oO3AeaTLn3d6ugbH/uih/tPz8PuyXsdiTC1rI/ejDMiwMTSjW6phSiIHGcStSR9X
VFNhmMFacp7QpUpvxceV0XZYKDViAoQgHeGdp3l+K5h/v4AYePV/v/5RjQPaEyOh
YLbCzETO+24mRWdJxdAqtTW4ovYhzj6XsiJ+pAjlV0+SWU6m5L5E+VAPNi1vqv1H
1O2KeCFvVYEpcnfL3qnkw2timcjmfCfeFAd9mCUAc8mSRBfs3QgDTKw3XdHdtRml
LU2WuA5cpMrOdBO4mVra2plo8E2szvpB1OZZXoKKdCpK3VGwVpVHcTvClK2Ks/3B
GDLieroEQNu2ZIUIdWXf/g2x6le3BcC9MmpkAhnGPqCZ7skaIBO5Cjpxm0zTJAPl
PPY3CMBBEktAvys6DcudOYGixNjKUuAm5lnfpcfTEklFdG0AjhdK/jZOplAFA6w4
LCiy0rGHM8ZbVAaFxbYoFCqgcjnv6EjSiqkJxVI4fu/Q7v9YXfdPnEmE0PJwCVo5
+i7aCLgrYshTdHr/F3e5EuofHN3TDHwXNJKGh/x97t+6tt326QMvDKX059Kxst7R
rFukGbrYvG8Y7yXwrSDbusl443ta0Ht7T1oL4YUoJTZp0nScAyEluDTmrH1JVCsT
R4o=
=0Fso
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping infrastructure from Christoph Hellwig:
"This is the first pull request for the new dma-mapping subsystem
In this new subsystem we'll try to properly maintain all the generic
code related to dma-mapping, and will further consolidate arch code
into common helpers.
This pull request contains:
- removal of the DMA_ERROR_CODE macro, replacing it with calls to
->mapping_error so that the dma_map_ops instances are more self
contained and can be shared across architectures (me)
- removal of the ->set_dma_mask method, which duplicates the
->dma_capable one in terms of functionality, but requires more
duplicate code.
- various updates for the coherent dma pool and related arm code
(Vladimir)
- various smaller cleanups (me)"
* tag 'dma-mapping-4.13' of git://git.infradead.org/users/hch/dma-mapping: (56 commits)
ARM: dma-mapping: Remove traces of NOMMU code
ARM: NOMMU: Set ARM_DMA_MEM_BUFFERABLE for M-class cpus
ARM: NOMMU: Introduce dma operations for noMMU
drivers: dma-mapping: allow dma_common_mmap() for NOMMU
drivers: dma-coherent: Introduce default DMA pool
drivers: dma-coherent: Account dma_pfn_offset when used with device tree
dma: Take into account dma_pfn_offset
dma-mapping: replace dmam_alloc_noncoherent with dmam_alloc_attrs
dma-mapping: remove dmam_free_noncoherent
crypto: qat - avoid an uninitialized variable warning
au1100fb: remove a bogus dma_free_nonconsistent call
MAINTAINERS: add entry for dma mapping helpers
powerpc: merge __dma_set_mask into dma_set_mask
dma-mapping: remove the set_dma_mask method
powerpc/cell: use the dma_supported method for ops switching
powerpc/cell: clean up fixed mapping dma_ops initialization
tile: remove dma_supported and mapping_error methods
xen-swiotlb: remove xen_swiotlb_set_dma_mask
arm: implement ->dma_supported instead of ->set_dma_mask
mips/loongson64: implement ->dma_supported instead of ->set_dma_mask
...
Pull networking updates from David Miller:
"Reasonably busy this cycle, but perhaps not as busy as in the 4.12
merge window:
1) Several optimizations for UDP processing under high load from
Paolo Abeni.
2) Support pacing internally in TCP when using the sch_fq packet
scheduler for this is not practical. From Eric Dumazet.
3) Support mutliple filter chains per qdisc, from Jiri Pirko.
4) Move to 1ms TCP timestamp clock, from Eric Dumazet.
5) Add batch dequeueing to vhost_net, from Jason Wang.
6) Flesh out more completely SCTP checksum offload support, from
Davide Caratti.
7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
Neira Ayuso, and Matthias Schiffer.
8) Add devlink support to nfp driver, from Simon Horman.
9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
Prabhu.
10) Add stack depth tracking to BPF verifier and use this information
in the various eBPF JITs. From Alexei Starovoitov.
11) Support XDP on qed device VFs, from Yuval Mintz.
12) Introduce BPF PROG ID for better introspection of installed BPF
programs. From Martin KaFai Lau.
13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.
14) For loads, allow narrower accesses in bpf verifier checking, from
Yonghong Song.
15) Support MIPS in the BPF selftests and samples infrastructure, the
MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
Daney.
16) Support kernel based TLS, from Dave Watson and others.
17) Remove completely DST garbage collection, from Wei Wang.
18) Allow installing TCP MD5 rules using prefixes, from Ivan
Delalande.
19) Add XDP support to Intel i40e driver, from Björn Töpel
20) Add support for TC flower offload in nfp driver, from Simon
Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
Kicinski, and Bert van Leeuwen.
21) IPSEC offloading support in mlx5, from Ilan Tayari.
22) Add HW PTP support to macb driver, from Rafal Ozieblo.
23) Networking refcount_t conversions, From Elena Reshetova.
24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
for tuning the TCP sockopt settings of a group of applications,
currently via CGROUPs"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
cxgb4: Support for get_ts_info ethtool method
cxgb4: Add PTP Hardware Clock (PHC) support
cxgb4: time stamping interface for PTP
nfp: default to chained metadata prepend format
nfp: remove legacy MAC address lookup
nfp: improve order of interfaces in breakout mode
net: macb: remove extraneous return when MACB_EXT_DESC is defined
bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
bpf: fix return in load_bpf_file
mpls: fix rtm policy in mpls_getroute
net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
...
Now that lazy TLB suppresses all flush IPIs (as opposed to all but
the first), there's no need to leave_mm() when going idle.
This means we can get rid of the rcuidle hack in
switch_mm_irqs_off() and we can unexport leave_mm().
This also removes acpi_unlazy_tlb() from the x86 and ia64 headers,
since it has no callers any more.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/03c699cfd6021e467be650d6b73deaccfe4b4bd7.1498751203.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
around. Highlights include:
- Conversion of a bunch of security documentation into RST
- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.
- The usual collection of fixes and minor updates.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJZWkGAAAoJEI3ONVYwIuV6rf0P/0B3JTiVPKS/WUx53+jzbAi4
1BN7dmmuMxE1bWpgdEq+ac4aKxm07iAojuntuMj0qz/ZB1WARcmvEqqzI5i4wfq9
5MrLduLkyuWfr4MOPseKJ2VK83p8nkMOiO7jmnBsilu7fE4nF+5YY9j4cVaArfMy
cCQvAGjQzvej2eiWMGUSLHn4QFKh00aD7cwKyBVsJ08b27C9xL0J2LQyCDZ4yDgf
37/MH3puEd3HX/4qAwLonIxT3xrIrrbDturqLU7OSKcWTtGZNrYyTFbwR3RQtqWd
H8YZVg2Uyhzg9MYhkbQ2E5dEjUP4mkegcp6/JTINH++OOPpTbdTJgirTx7VTkSf1
+kL8t7+Ayxd0FH3+77GJ5RMj8LUK6rj5cZfU5nClFQKWXP9UL3IelQ3Nl+SpdM8v
ZAbR2KjKgH9KS6+cbIhgFYlvY+JgPkOVruwbIAc7wXVM3ibk1sWoBOFEujcbueWh
yDpQv3l1UX0CKr3jnevJoW26LtEbGFtC7gSKZ+3btyeSBpWFGlii42KNycEGwUW0
ezlwryDVHzyTUiKllNmkdK4v73mvPsZHEjgmme4afKAIiUilmcUF4XcqD86hISFT
t+UJLA/zEU+0sJe26o2nK6GNJzmo4oCtVyxfhRe26Ojs1n80xlYgnZRfuIYdd31Z
nwLBnwDCHAOyX91WXp9G
=cVjZ
-----END PGP SIGNATURE-----
Merge tag 'docs-4.13' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"There has been a fair amount of activity in the docs tree this time
around. Highlights include:
- Conversion of a bunch of security documentation into RST
- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.
- The usual collection of fixes and minor updates"
* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
scripts/kernel-doc: handle DECLARE_HASHTABLE
Documentation: atomic_ops.txt is core-api/atomic_ops.rst
Docs: clean up some DocBook loose ends
Make the main documentation title less Geocities
Docs: Use kernel-figure in vidioc-g-selection.rst
Docs: fix table problems in ras.rst
Docs: Fix breakage with Sphinx 1.5 and upper
Docs: Include the Latex "ifthen" package
doc/kokr/howto: Only send regression fixes after -rc1
docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
doc: Document suitability of IBM Verse for kernel development
Doc: fix a markup error in coding-style.rst
docs: driver-api: i2c: remove some outdated information
Documentation: DMA API: fix a typo in a function name
Docs: Insert missing space to separate link from text
doc/ko_KR/memory-barriers: Update control-dependencies example
Documentation, kbuild: fix typo "minimun" -> "minimum"
docs: Fix some formatting issues in request-key.rst
doc: ReSTify keys-trusted-encrypted.txt
doc: ReSTify keys-request-key.txt
...
Pull timer updates from Thomas Gleixner:
"A rather large update for timers/timekeeping:
- compat syscall consolidation (Al Viro)
- Posix timer consolidation (Christoph Helwig / Thomas Gleixner)
- Cleanup of the device tree based initialization for clockevents and
clocksources (Daniel Lezcano)
- Consolidation of the FTTMR010 clocksource/event driver (Linus
Walleij)
- The usual set of small fixes and updates all over the place"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (93 commits)
timers: Make the cpu base lock raw
clocksource/drivers/mips-gic-timer: Fix an error code in 'gic_clocksource_of_init()'
clocksource/drivers/fsl_ftm_timer: Unmap region obtained by of_iomap
clocksource/drivers/tcb_clksrc: Make IO endian agnostic
clocksource/drivers/sun4i: Switch to the timer-of common init
clocksource/drivers/timer-of: Fix invalid iomap check
Revert "ktime: Simplify ktime_compare implementation"
clocksource/drivers: Fix uninitialized variable use in timer_of_init
kselftests: timers: Add test for frequency step
kselftests: timers: Fix inconsistency-check to not ignore first timestamp
time: Add warning about imminent deprecation of CONFIG_GENERIC_TIME_VSYSCALL_OLD
time: Clean up CLOCK_MONOTONIC_RAW time handling
posix-cpu-timers: Make timespec to nsec conversion safe
itimer: Make timeval to nsec conversion range limited
timers: Fix parameter description of try_to_del_timer_sync()
ktime: Simplify ktime_compare implementation
clocksource/drivers/fttmr010: Factor out clock read code
clocksource/drivers/fttmr010: Implement delay timer
clocksource/drivers: Add timer-of common init routine
clocksource/drivers/tcb_clksrc: Save timer context on suspend/resume
...
The only user of thread_saved_pc() in non-arch-specific code was removed
in commit 8243d55977 ("sched/core: Remove pointless printout in
sched_show_task()"). Remove the implementations as well.
Some architectures use thread_saved_pc() in their arch-specific code.
Leave their thread_saved_pc() intact.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.
While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.
Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to quering network databases (eg., LDAP), using IPC or network
communication.
Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.
So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS ioctl would allow us
to make dbus-daemon work without ever calling into NSS.
Cc: Michal Sekletar <msekleta@redhat.com>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
By moving the kernel side __SI_* defintions right next to the userspace
ones we can kill the non-uapi versions of <asm/siginfo.h> include
include/asm-generic/siginfo.h and untangle the unholy mess of includes.
[ tglx: Removed uapi/asm/siginfo.h from m32r, microblaze, mn10300 and score ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-6-hch@lst.de
Since ia64 defines __ARCH_SI_PREAMBLE_SIZE it can just use the generic
copy_siginfo implementation, which is identical to the architecture
specific one.
With that support for HAVE_ARCH_COPY_SIGINFO can go away entirely.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-ia64@vger.kernel.org
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: sparclinux@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>
Link: http://lkml.kernel.org/r/20170603190102.28866-3-hch@lst.de
A definition was only provided for asm-generic/socket.h
using platforms, define it for the others as well
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Mauro says:
This patch series convert the remaining DocBooks to ReST.
The first version was originally
send as 3 patch series:
[PATCH 00/36] Convert DocBook documents to ReST
[PATCH 0/5] Convert more books to ReST
[PATCH 00/13] Get rid of DocBook
The lsm book was added as if it were a text file under
Documentation. The plan is to merge it with another file
under Documentation/security, after both this series and
a security Documentation patch series gets merged.
It also adjusts some Sphinx-pedantic errors/warnings on
some kernel-doc markups.
I also added some patches here to add PDF output for all
existing ReST books.
Regularly, when a new header is created in include/uapi/, the developer
forgets to add it in the corresponding Kbuild file. This error is usually
detected after the release is out.
In fact, all headers under uapi directories should be exported, thus it's
useless to have an exhaustive list.
After this patch, the following files, which were not exported, are now
exported (with make headers_install_all):
asm-arc/kvm_para.h
asm-arc/ucontext.h
asm-blackfin/shmparam.h
asm-blackfin/ucontext.h
asm-c6x/shmparam.h
asm-c6x/ucontext.h
asm-cris/kvm_para.h
asm-h8300/shmparam.h
asm-h8300/ucontext.h
asm-hexagon/shmparam.h
asm-m32r/kvm_para.h
asm-m68k/kvm_para.h
asm-m68k/shmparam.h
asm-metag/kvm_para.h
asm-metag/shmparam.h
asm-metag/ucontext.h
asm-mips/hwcap.h
asm-mips/reg.h
asm-mips/ucontext.h
asm-nios2/kvm_para.h
asm-nios2/ucontext.h
asm-openrisc/shmparam.h
asm-parisc/kvm_para.h
asm-powerpc/perf_regs.h
asm-sh/kvm_para.h
asm-sh/ucontext.h
asm-tile/shmparam.h
asm-unicore32/shmparam.h
asm-unicore32/ucontext.h
asm-x86/hwcap2.h
asm-xtensa/kvm_para.h
drm/armada_drm.h
drm/etnaviv_drm.h
drm/vgem_drm.h
linux/aspeed-lpc-ctrl.h
linux/auto_dev-ioctl.h
linux/bcache.h
linux/btrfs_tree.h
linux/can/vxcan.h
linux/cifs/cifs_mount.h
linux/coresight-stm.h
linux/cryptouser.h
linux/fsmap.h
linux/genwqe/genwqe_card.h
linux/hash_info.h
linux/kcm.h
linux/kcov.h
linux/kfd_ioctl.h
linux/lightnvm.h
linux/module.h
linux/nbd-netlink.h
linux/nilfs2_api.h
linux/nilfs2_ondisk.h
linux/nsfs.h
linux/pr.h
linux/qrtr.h
linux/rpmsg.h
linux/sched/types.h
linux/sed-opal.h
linux/smc.h
linux/smc_diag.h
linux/stm.h
linux/switchtec_ioctl.h
linux/vfio_ccw.h
linux/wil6210_uapi.h
rdma/bnxt_re-abi.h
Note that I have removed from this list the files which are generated in every
exported directories (like .install or .install.cmd).
Thanks to Julien Floret <julien.floret@6wind.com> for the tip to get all
subdirs with a pure makefile command.
For the record, note that exported files for asm directories are a mix of
files listed by:
- include/uapi/asm-generic/Kbuild.asm;
- arch/<arch>/include/uapi/asm/Kbuild;
- arch/<arch>/include/asm/Kbuild.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Acked-by: Mark Salter <msalter@redhat.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJZEHmsAAoJEFmIoMA60/r88SgQAJbFddueb0+DfJ+USDud4b/Z
akfS+G1UAm+TgtMyh1wM49dHzFssp36uWJxtWI+bPqBzuy94PMCbz7JVUV28gX9G
tFhFuc5YH94I/3y85rbZnolb6uZN9MhLjzTFqDC9ilW6HFqmwK4t4wlHSCjQN1St
svLYvs2G6n6/VK3Fre7/wOvdZ1erG4Qod+kn5Tx3K5TQydmRlaSBfK+DRANuDBkM
KzGO7Bkc/Cx8hb9pHmaey/wxmNrrgmVjTtWrEnb2tEq833zP4h6GhUIJEKodMSi5
gXPNZgKlu3n5L592M0UCh4EoHejzkv9wrcsoDm+djmsc5Zg2Howq4kAdHP8k4hUG
0gt8n0ni9vhJN56jikrGi7cAdHCKSNnx2Ue/qTCbX0ncB3XUMuJxJwCsgW/6wa9f
oU7tRtTS03UltnKoFAcyYclS4TaSY4SA4ySaK6Hi+cRkdVFDdyHQYbHHNSU7MsA+
IS2tXvGoIdSYyrZMHSRcl2rRTfYQUkmPEvBF3LvqZr32M4mJMmUNAPLZaly373ZE
iwq0ZJlrLeM0cqdFIG3S60RtJyQk/HBN1NMqrYHArWOxvWIgNd5F8NCsTTxY3wU3
IxgBIuUFcbVwVkqEHGs8K5AvB3oghqdnA3eGOV79799eMtLn3LOvyIlpHMSw9WUq
ags00JtMLitfNPBH3eSl
=eE4D
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- add framework for supporting PCIe devices in Endpoint mode (Kishon
Vijay Abraham I)
- use non-postable PCI config space mappings when possible (Lorenzo
Pieralisi)
- clean up and unify mmap of PCI BARs (David Woodhouse)
- export and unify Function Level Reset support (Christoph Hellwig)
- avoid FLR for Intel 82579 NICs (Sasha Neftin)
- add pci_request_irq() and pci_free_irq() helpers (Christoph Hellwig)
- short-circuit config access failures for disconnected devices (Keith
Busch)
- remove D3 sleep delay when possible (Adrian Hunter)
- freeze PME scan before suspending devices (Lukas Wunner)
- stop disabling MSI/MSI-X in pci_device_shutdown() (Prarit Bhargava)
- disable boot interrupt quirk for ASUS M2N-LR (Stefan Assmann)
- add arch-specific alignment control to improve device passthrough by
avoiding multiple BARs in a page (Yongji Xie)
- add sysfs sriov_drivers_autoprobe to control VF driver binding
(Bodong Wang)
- allow slots below PCI-to-PCIe "reverse bridges" (Bjorn Helgaas)
- fix crashes when unbinding host controllers that don't support
removal (Brian Norris)
- add driver for MicroSemi Switchtec management interface (Logan
Gunthorpe)
- add driver for Faraday Technology FTPCI100 host bridge (Linus
Walleij)
- add i.MX7D support (Andrey Smirnov)
- use generic MSI support for Aardvark (Thomas Petazzoni)
- make Rockchip driver modular (Brian Norris)
- advertise 128-byte Read Completion Boundary support for Rockchip
(Shawn Lin)
- advertise PCI_EXP_LNKSTA_SLC for Rockchip root port (Shawn Lin)
- convert atomic_t to refcount_t in HV driver (Elena Reshetova)
- add CPU IRQ affinity in HV driver (K. Y. Srinivasan)
- fix PCI bus removal in HV driver (Long Li)
- add support for ThunderX2 DMA alias topology (Jayachandran C)
- add ThunderX pass2.x 2nd node MCFG quirk (Tomasz Nowicki)
- add ITE 8893 bridge DMA alias quirk (Jarod Wilson)
- restrict Cavium ACS quirk only to CN81xx/CN83xx/CN88xx devices
(Manish Jaggi)
* tag 'pci-v4.12-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (146 commits)
PCI: Don't allow unbinding host controllers that aren't prepared
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
MAINTAINERS: Add PCI Endpoint maintainer
Documentation: PCI: Add userguide for PCI endpoint test function
tools: PCI: Add sample test script to invoke pcitest
tools: PCI: Add a userspace tool to test PCI endpoint
Documentation: misc-devices: Add Documentation for pci-endpoint-test driver
misc: Add host side PCI driver for PCI test function device
PCI: Add device IDs for DRA74x and DRA72x
dt-bindings: PCI: dra7xx: Add DT bindings to enable unaligned access
PCI: dwc: dra7xx: Workaround for errata id i870
dt-bindings: PCI: dra7xx: Add DT bindings for PCI dra7xx EP mode
PCI: dwc: dra7xx: Add EP mode support
PCI: dwc: dra7xx: Facilitate wrapper and MSI interrupts to be enabled independently
dt-bindings: PCI: Add DT bindings for PCI designware EP mode
PCI: dwc: designware: Add EP mode support
Documentation: PCI: Add binding documentation for pci-test endpoint function
ixgbe: Use pcie_flr() instead of duplicating it
IB/hfi1: Use pcie_flr() instead of duplicating it
PCI: imx6: Fix spelling mistake: "contol" -> "control"
...
Pull networking updates from David Millar:
"Here are some highlights from the 2065 networking commits that
happened this development cycle:
1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)
2) Add a generic XDP driver, so that anyone can test XDP even if they
lack a networking device whose driver has explicit XDP support
(me).
3) Sparc64 now has an eBPF JIT too (me)
4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
Starovoitov)
5) Make netfitler network namespace teardown less expensive (Florian
Westphal)
6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)
7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)
8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)
9) Multiqueue support in stmmac driver (Joao Pinto)
10) Remove TCP timewait recycling, it never really could possibly work
well in the real world and timestamp randomization really zaps any
hint of usability this feature had (Soheil Hassas Yeganeh)
11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
Aleksandrov)
12) Add socket busy poll support to epoll (Sridhar Samudrala)
13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
and several others)
14) IPSEC hw offload infrastructure (Steffen Klassert)"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
tipc: refactor function tipc_sk_recv_stream()
tipc: refactor function tipc_sk_recvmsg()
net: thunderx: Optimize page recycling for XDP
net: thunderx: Support for XDP header adjustment
net: thunderx: Add support for XDP_TX
net: thunderx: Add support for XDP_DROP
net: thunderx: Add basic XDP support
net: thunderx: Cleanup receive buffer allocation
net: thunderx: Optimize CQE_TX handling
net: thunderx: Optimize RBDR descriptor handling
net: thunderx: Support for page recycling
ipx: call ipxitf_put() in ioctl error path
net: sched: add helpers to handle extended actions
qed*: Fix issues in the ptp filter config implementation.
qede: Fix concurrency issue in PTP Tx path processing.
stmmac: Add support for SIMATIC IOT2000 platform
net: hns: fix ethtool_get_strings overflow in hns driver
tcp: fix wraparound issue in tcp_lp
bpf, arm64: fix jit branch offset related to ldimm64
bpf, arm64: implement jiting of BPF_XADD
...
Pull uaccess unification updates from Al Viro:
"This is the uaccess unification pile. It's _not_ the end of uaccess
work, but the next batch of that will go into the next cycle. This one
mostly takes copy_from_user() and friends out of arch/* and gets the
zero-padding behaviour in sync for all architectures.
Dealing with the nocache/writethrough mess is for the next cycle;
fortunately, that's x86-only. Same for cleanups in iov_iter.c (I am
sold on access_ok() in there, BTW; just not in this pile), same for
reducing __copy_... callsites, strn*... stuff, etc. - there will be a
pile about as large as this one in the next merge window.
This one sat in -next for weeks. -3KLoC"
* 'work.uaccess' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (96 commits)
HAVE_ARCH_HARDENED_USERCOPY is unconditional now
CONFIG_ARCH_HAS_RAW_COPY_USER is unconditional now
m32r: switch to RAW_COPY_USER
hexagon: switch to RAW_COPY_USER
microblaze: switch to RAW_COPY_USER
get rid of padding, switch to RAW_COPY_USER
ia64: get rid of copy_in_user()
ia64: sanitize __access_ok()
ia64: get rid of 'segment' argument of __do_{get,put}_user()
ia64: get rid of 'segment' argument of __{get,put}_user_check()
ia64: add extable.h
powerpc: get rid of zeroing, switch to RAW_COPY_USER
esas2r: don't open-code memdup_user()
alpha: fix stack smashing in old_adjtimex(2)
don't open-code kernel_setsockopt()
mips: switch to RAW_COPY_USER
mips: get rid of tail-zeroing in primitives
mips: make copy_from_user() zero tail explicitly
mips: clean and reorder the forest of macros...
mips: consolidate __invoke_... wrappers
...
Now that we eliminated the different behaviour in separately-reviewable
commits, we can switch IA64 to the generic implementation.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Tony Luck <tony.luck@intel.com>
We can declare it <linux/pci.h> even on platforms where it isn't going to
be defined. There's no need to have it littered through the various
<asm/pci.h> files.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Most of the almost-identical versions of pci_mmap_page_range() silently
ignore the 'write_combine' argument and give uncached mappings.
Yet we allow the PCIIOC_WRITE_COMBINE ioctl in /proc/bus/pci, expose the
'resourceX_wc' file in sysfs, and allow an attempted mapping to apparently
succeed.
To fix this, introduce a macro arch_can_pci_mmap_wc() which indicates
whether the platform can do a write-combining mapping. On x86 this ends up
being pat_enabled(), while the few other platforms that support it can just
set it to a literal '1'.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Conflicts were simply overlapping changes. In the net/ipv4/route.c
case the code had simply moved around a little bit and the same fix
was made in both 'net' and 'net-next'.
In the net/sched/sch_generic.c case a fix in 'net' happened at
the same time that a new argument was added to qdisc_hash_add().
Signed-off-by: David S. Miller <davem@davemloft.net>
The ia64 build generates many warnings like this:
WARNING: EXPORT symbol "empty_zero_page" [vmlinux] version generation failed, symbol will not be versioned.
Besides adding the necessary header this also requires fiddling with
some explicit .S -> .o rules.
Cc: IA64-ML <linux-ia64@vger.kernel.org>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Introduce a new getsockopt operation to retrieve the socket cookie
for a specific socket based on the socket fd. It returns a unique
non-decreasing cookie for each socket.
Tested: https://android-review.googlesource.com/#/c/358163/
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This socket option returns the NAPI ID associated with the queue on which
the last frame is received. This information can be used by the apps to
split the incoming flows among the threads based on the Rx queue on which
they are received.
If the NAPI ID actually represents a sender_cpu then the value is ignored
and 0 is returned.
Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Allows reading of SK_MEMINFO_VARS via socket option. This way an
application can get all meminfo related information in single socket
option call instead of multiple calls.
Adds helper function, sk_get_meminfo(), and uses that for both
getsockopt and sock_diag_put_meminfo().
Suggested by Eric Dumazet.
Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If an architecture uses 4level-fixup.h we don't need to do anything as
it includes 5level-fixup.h.
If an architecture uses pgtable-nop*d.h, define __ARCH_USE_5LEVEL_HACK
before inclusion of the header. It makes asm-generic code to use
5level-fixup.h.
If an architecture has 4-level paging or folds levels on its own,
include 5level-fixup.h directly.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
prefetch_stack() is defined by IA64, but not actually used anywhere anymore.
Remove it.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Update code that relied on sched.h including various MM types for them.
This will allow us to remove the <linux/mm_types.h> include from <linux/sched.h>.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Often all is needed is these small helpers, instead of compiler.h or a
full kprobes.h. This is important for asm helpers, in fact even some
asm/kprobes.h make use of these helpers... instead just keep a generic
asm file with helpers useful for asm code with the least amount of
clutter as possible.
Likewise we need now to also address what to do about this file for both
when architectures have CONFIG_HAVE_KPROBES, and when they do not. Then
for when architectures have CONFIG_HAVE_KPROBES but have disabled
CONFIG_KPROBES.
Right now most asm/kprobes.h do not have guards against CONFIG_KPROBES,
this means most architecture code cannot include asm/kprobes.h safely.
Correct this and add guards for architectures missing them.
Additionally provide architectures that not have kprobes support with
the default asm-generic solution. This lets us force asm/kprobes.h on
the header include/linux/kprobes.h always, but most importantly we can
now safely include just asm/kprobes.h on architecture code without
bringing the full kitchen sink of header files.
Two architectures already provided a guard against CONFIG_KPROBES on its
kprobes.h: sh, arch. The rest of the architectures needed gaurds added.
We avoid including any not-needed headers on asm/kprobes.h unless
kprobes have been enabled.
In a subsequent atomic change we can try now to remove compiler.h from
include/linux/kprobes.h.
During this sweep I've also identified a few architectures defining a
common macro needed for both kprobes and ftrace, that of the definition
of the breakput instruction up. Some refer to this as
BREAKPOINT_INSTRUCTION. This must be kept outside of the #ifdef
CONFIG_KPROBES guard.
[mcgrof@kernel.org: fix arm64 build]
Link: http://lkml.kernel.org/r/CAB=NE6X1WMByuARS4mZ1g9+W=LuVBnMDnh_5zyN0CLADaVh=Jw@mail.gmail.com
[sfr@canb.auug.org.au: fixup for kprobes declarations moving]
Link: http://lkml.kernel.org/r/20170214165933.13ebd4f4@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170203233139.32682-1-mcgrof@kernel.org
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes
it was possible to remove the IB DMA mapping code entirely and
switch the RDMA stack to use the core DMA mapping code. This resulted
in a nice set of cleanups, but touched the entire tree. This branch
will be submitted separately to Linus at the end of the merge window
as per normal practice for tree wide changes like this.
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJYo06oAAoJELgmozMOVy/d9Z8QALedWHdu98St1L0u2c8sxnR9
2zo/4sF5Vb9u7FpmdIX32L4SQ9s9KhPE8Qp8NtZLf9v10zlDebIRJDpXknXtKooV
CAXxX4sxBXV27/UrhbZEfXiPrmm6ccJFyIfRnMU6NlMqh2AtAsRa5AC2/RMp8oUD
Med97PFiF0o6TD22/UH1VFbRpX1zjaKyqm7a3as5sJfzNA+UGIZAQ7Euz8000DKZ
xCgVLTEwS0FmOujtBkCst7xa9TjuqR1HLOB4DdGvAhP6BHdz2yamM7Qmh9NN+NEX
0BtjsuXomtn6j6AszGC+bpipCZh3NUigcwoFAARXCYFHibBvo4DPdFeGsraFgXdy
1+KyR8CCeQG3Aly5Vwr264RFPGkGpwMj8PsBlXgQVtrlg4rriaCzOJNmIIbfdADw
ftqhxBOzReZw77aH2s+9p2ILRfcAmPqhynLvFGFo9LBvsik8LVso7YgZN0xGxwcI
IjI/XGC8UskPVsIZBIYA6sl2bYzgOjtBIHiXjRrPlW3uhduIXLrvKFfLPP/5XLAG
ehLXK+J0bfsyY9ClmlNS8oH/WdLhXAyy/KNmnj5bRRm9qg6BRJR3bsOBhZJODuoC
XgEXFfF6/7roNESWxowff7pK0rTkRg/m/Pa4VQpeO+6NWHE7kgZhL6kyIp5nKcwS
3e7mgpcwC+3XfA/6vU3F
=e0Si
-----END PGP SIGNATURE-----
Merge tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma DMA mapping updates from Doug Ledford:
"Drop IB DMA mapping code and use core DMA code instead.
Bart Van Assche noted that the ib DMA mapping code was significantly
similar enough to the core DMA mapping code that with a few changes it
was possible to remove the IB DMA mapping code entirely and switch the
RDMA stack to use the core DMA mapping code.
This resulted in a nice set of cleanups, but touched the entire tree
and has been kept separate for that reason."
* tag 'for-next-dma_ops' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (37 commits)
IB/rxe, IB/rdmavt: Use dma_virt_ops instead of duplicating it
IB/core: Remove ib_device.dma_device
nvme-rdma: Switch from dma_device to dev.parent
RDS: net: Switch from dma_device to dev.parent
IB/srpt: Modify a debug statement
IB/srp: Switch from dma_device to dev.parent
IB/iser: Switch from dma_device to dev.parent
IB/IPoIB: Switch from dma_device to dev.parent
IB/rxe: Switch from dma_device to dev.parent
IB/vmw_pvrdma: Switch from dma_device to dev.parent
IB/usnic: Switch from dma_device to dev.parent
IB/qib: Switch from dma_device to dev.parent
IB/qedr: Switch from dma_device to dev.parent
IB/ocrdma: Switch from dma_device to dev.parent
IB/nes: Remove a superfluous assignment statement
IB/mthca: Switch from dma_device to dev.parent
IB/mlx5: Switch from dma_device to dev.parent
IB/mlx4: Switch from dma_device to dev.parent
IB/i40iw: Remove a superfluous assignment statement
IB/hns: Switch from dma_device to dev.parent
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJYrFn+AAoJEOvOhAQsB9HWbDAP/i1bMxYJSnwD2D/PBvzpY2AW
uinEigaRr6kpRCOfa9FrfgxokKfosZOx5h7Se3f6O3mPwgpsU+dqbaE18Z5XSgxh
+a9+HvAv3/XNZg7SvBtBaoYDblHWJ6AJ9rN9fuKg3e8btE3rSFG147vj1atlVz1+
iRsXcCPb1p5db2+wZdsYJPI5Zwt4N0nR6cxPX4RQ6jseiVqPpt/FDtB60RYCjbID
J0cOk1VV1Jn2H1Rfl+hjNQjIPMNx3zftOLQ2usr/kwuEqeuTKZR06yLXFOT6bdXU
6JBdfL+e2kHKbaLyJGr6MCjTokaMgN3SGZJWJqHgk5Nggq5BD+2c4AOs8t6URnE0
KThGiyY+YI5C/W6kMlEozLARiMKe4IIQpx1uj2Hv+YkndntvqjCqvfdQQJKnzm0G
YWfPnsG2dysiovwEOBoBwyFVFLFzzJ1o3uyRGkCzVGaLQVzD5ktAJM6ynMOxwcIn
zSN+agzdTAD7QJIDaa1p2r5fAqy7i4xIn2+ts1s9c410fdUTB4A2QJzTywjPAdCp
IRxcLLpYDeBZ5cbhqjR677WgPtteYFTljoX+/8BOFO2PI+HjKHrxfW02WSiCS0iu
CUndrlpmuyKlIrpw7mYpDTbORcSQSiUkB1pRGT7poh2p0KKAGSo9ZrRbx+qRvPdH
AxO+ZR6Jjj5LAMk5MkRz
=RK2h
-----END PGP SIGNATURE-----
Merge tag 'extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
Pull exception table module split from Paul Gortmaker:
"Final extable.h related changes.
This completes the separation of exception table content from the
module.h header file. This is achieved with the final commit that
removes the one line back compatible change that sourced extable.h
into the module.h file.
The commits are unchanged since January, with the exception of a
couple Acks that came in for the last two commits a bit later. The
changes have been in linux-next for quite some time[1] and have got
widespread arch coverage via toolchains I have and also from
additional ones the kbuild bot has.
Maintaners of the various arch were Cc'd during the postings to
lkml[2] and informed that the intention was to take the remaining arch
specific changes and lump them together with the final two non-arch
specific changes and submit for this merge window.
The ia64 diffstat stands out and probably warrants a mention. In an
earlier review, Al Viro made a valid comment that the original header
separation of content left something to be desired, and that it get
fixed as a part of this change, hence the larger diffstat"
* tag 'extable-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (21 commits)
module.h: remove extable.h include now users have migrated
core: migrate exception table users off module.h and onto extable.h
cris: migrate exception table users off module.h and onto extable.h
hexagon: migrate exception table users off module.h and onto extable.h
microblaze: migrate exception table users off module.h and onto extable.h
unicore32: migrate exception table users off module.h and onto extable.h
score: migrate exception table users off module.h and onto extable.h
metag: migrate exception table users off module.h and onto extable.h
arc: migrate exception table users off module.h and onto extable.h
nios2: migrate exception table users off module.h and onto extable.h
sparc: migrate exception table users onto extable.h
openrisc: migrate exception table users off module.h and onto extable.h
frv: migrate exception table users off module.h and onto extable.h
sh: migrate exception table users off module.h and onto extable.h
xtensa: migrate exception table users off module.h and onto extable.h
mn10300: migrate exception table users off module.h and onto extable.h
alpha: migrate exception table users off module.h and onto extable.h
arm: migrate exception table users off module.h and onto extable.h
m32r: migrate exception table users off module.h and onto extable.h
ia64: ensure exception table search users include extable.h
...
cputime_t is now only used by two architectures:
* powerpc (when CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y)
* s390
And since the core doesn't use it anymore, we don't need any arch support
from the others. So we can remove their stub implementations.
A final cleanup would be to provide an efficient pure arch
implementation of cputime_to_nsec() for s390 and powerpc and finally
remove include/linux/cputime.h .
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-36-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This nsec based cputime_t implementation isn't used anymore. We can
remove it.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-31-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Initially, nsec based cputime_t implementation belonged to IA64. It got
exported later for CONFIG_VIRT_CPU_ACCOUNTING_GEN but now it is again
only used by IA64. So let's move it back there.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1485832191-26889-29-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
We start with a delete of a duplicate prototype in asm/exception.h
that no longer needs to exist, as it duplicates content in extable.h
and since that header is so small, there is no point trying to
avoid using it.
Then we make sure anyone using search_exception_tables directly or
via the ia64_done_with_exception macro has included extable.h
In the process, we remove an include of moduleloader.h that was
apparently not really required; it would have been fetching in
module.h and hence the previous location of the exception search
function prototypes, but we need not rely on that anymore.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Move ia64_done_with_exception out of asm/uaccess.h (which is widely
used) and into asm/exception.h (like ARM has) and then ensure the
few callers of it include this new header.
Most of the other C content in asm files is implemented in macro form.
So we do that conversion at the same time as the move.
There are two C exception prototypes that move along with the macro.
One of them will become redundant when we switch over to using the
<linux/extable.h> instead of <linux/module.h> header in a subsequent
commit.
Also relocate a couple of the automated asm --> linux uaccess
conversions to preserve the linux and asm independent grouping, since
we are in the file at that location anyway.
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: linux-ia64@vger.kernel.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Introduce a new architecture-specific get_arch_dma_ops() function
that takes a struct bus_type * argument. Add get_dma_ops() in
<linux/dma-mapping.h>.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Juergen Gross <jgross@suse.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Russell King <linux@armlinux.org.uk>
Cc: x86@kernel.org
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y accounts the cputime on
any context boundary: irq entry/exit, guest entry/exit, context switch,
etc...
Calling functions such as account_system_time(), account_user_time()
and such can be costly, especially if they are called on many fastpath
such as twice per IRQ. Those functions do more than just accounting to
kcpustat and task cputime. Depending on the config, some subsystems can
perform unpleasant multiplications and divisions, among other things.
So lets accumulate the cputime instead and delay the accounting on ticks
and context switches only.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Link: http://lkml.kernel.org/r/1483636310-6557-9-git-send-email-fweisbec@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull x86 fixes and cleanups from Thomas Gleixner:
"This set of updates contains:
- Robustification for the logical package managment. Cures the AMD
and virtualization issues.
- Put the correct start_cpu() return address on the stack of the idle
task.
- Fixups for the fallout of the nodeid <-> cpuid persistent mapping
modifciations
- Move the x86/MPX specific mm_struct member to the arch specific
mm_context where it belongs
- Cleanups for C89 struct initializers and useless function
arguments"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/floppy: Use designated initializers
x86/mpx: Move bd_addr to mm_context_t
x86/mm: Drop unused argument 'removed' from sync_global_pgds()
ACPI/NUMA: Do not map pxm to node when NUMA is turned off
x86/acpi: Use proper macro for invalid node
x86/smpboot: Prevent false positive out of bounds cpumask access warning
x86/boot/64: Push correct start_cpu() return address
x86/boot/64: Use 'push' instead of 'call' in start_cpu()
x86/smpboot: Make logical package management more robust
Now that we check for page size change early in the loop, we can
partially revert e9d55e1570 ("mm: change the interface for
__tlb_remove_page").
This simplies the code much, by removing the need to track the last
address with which we adjusted the range. We also go back to the older
way of filling the mmu_gather array, ie, we add an entry and then check
whether the gather batch is full.
Link: http://lkml.kernel.org/r/20161026084839.27299-6-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With commit e77b0852b5 ("mm/mmu_gather: track page size with mmu
gather and force flush if page size change") we added the ability to
force a tlb flush when the page size change in a mmu_gather loop. We
did that by checking for a page size change every time we added a page
to mmu_gather for lazy flush/remove. We can improve that by moving the
page size change check early and not doing it every time we add a page.
This also helps us to do tlb flush when invalidating a range covering
dax mapping. Wrt dax mapping we don't have a backing struct page and
hence we don't call tlb_remove_page, which earlier forced the tlb flush
on page size change. Moving the page size change check earlier means we
will do the same even for dax mapping.
We also avoid doing this check on architecture other than powerpc.
In a later patch we will remove page size check from tlb_remove_page().
Link: http://lkml.kernel.org/r/20161026084839.27299-5-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This add tlb_remove_hugetlb_entry similar to tlb_remove_pmd_tlb_entry.
Link: http://lkml.kernel.org/r/20161026084839.27299-4-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull locking updates from Ingo Molnar:
"The tree got pretty big in this development cycle, but the net effect
is pretty good:
115 files changed, 673 insertions(+), 1522 deletions(-)
The main changes were:
- Rework and generalize the mutex code to remove per arch mutex
primitives. (Peter Zijlstra)
- Add vCPU preemption support: add an interface to query the
preemption status of vCPUs and use it in locking primitives - this
optimizes paravirt performance. (Pan Xinhui, Juergen Gross,
Christian Borntraeger)
- Introduce cpu_relax_yield() and remov cpu_relax_lowlatency() to
clean up and improve the s390 lock yielding machinery and its core
kernel impact. (Christian Borntraeger)
- Micro-optimize mutexes some more. (Waiman Long)
- Reluctantly add the to-be-deprecated mutex_trylock_recursive()
interface on a temporary basis, to give the DRM code more time to
get rid of its locking hacks. Any other users will be NAK-ed on
sight. (We turned off the deprecation warning for the time being to
not pollute the build log.) (Peter Zijlstra)
- Improve the rtmutex code a bit, in light of recent long lived
bugs/races. (Thomas Gleixner)
- Misc fixes, cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
x86/paravirt: Fix bool return type for PVOP_CALL()
x86/paravirt: Fix native_patch()
locking/ww_mutex: Use relaxed atomics
locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked()
locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL
x86/paravirt: Optimize native pv_lock_ops.vcpu_is_preempted()
locking/mutex: Break out of expensive busy-loop on {mutex,rwsem}_spin_on_owner() when owner vCPU is preempted
locking/osq: Break out of spin-wait busy waiting loop for a preempted vCPU in osq_lock()
Documentation/virtual/kvm: Support the vCPU preemption check
x86/xen: Support the vCPU preemption check
x86/kvm: Support the vCPU preemption check
x86/kvm: Support the vCPU preemption check
kvm: Introduce kvm_write_guest_offset_cached()
locking/core, x86/paravirt: Implement vcpu_is_preempted(cpu) for KVM and Xen guests
locking/spinlocks, s390: Implement vcpu_is_preempted(cpu)
locking/core, powerpc: Implement vcpu_is_preempted(cpu)
sched/core: Introduce the vcpu_is_preempted(cpu) interface
sched/wake_q: Rename WAKE_Q to DEFINE_WAKE_Q
locking/core: Provide common cpu_relax_yield() definition
locking/mutex: Don't mark mutex_trylock_recursive() as deprecated, temporarily
...
This patch exports the sender chronograph stats via the socket
SO_TIMESTAMPING channel. Currently we can instrument how long a
particular application unit of data was queued in TCP by tracking
SOF_TIMESTAMPING_TX_SOFTWARE and SOF_TIMESTAMPING_TX_SCHED. Having
these sender chronograph stats exported simultaneously along with
these timestamps allow further breaking down the various sender
limitation. For example, a video server can tell if a particular
chunk of video on a connection takes a long time to deliver because
TCP was experiencing small receive window. It is not possible to
tell before this patch without packet traces.
To prepare these stats, the user needs to set
SOF_TIMESTAMPING_OPT_STATS and SOF_TIMESTAMPING_OPT_TSONLY flags
while requesting other SOF_TIMESTAMPING TX timestamps. When the
timestamps are available in the error queue, the stats are returned
in a separate control message of type SCM_TIMESTAMPING_OPT_STATS,
in a list of TLVs (struct nlattr) of types: TCP_NLA_BUSY_TIME,
TCP_NLA_RWND_LIMITED, TCP_NLA_SNDBUF_LIMITED. Unit is microsecond.
Signed-off-by: Francis Yan <francisyyan@gmail.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
No need to duplicate the same define everywhere. Since
the only user is stop-machine and the only provider is
s390, we can use a default implementation of cpu_relax_yield()
in sched.h.
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: kvm@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-s390 <linux-s390@vger.kernel.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: sparclinux@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1479298985-191589-1-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
As there are no users left, we can remove cpu_relax_lowlatency()
implementations from every architecture.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Cc: <linux-arch@vger.kernel.org>
Link: http://lkml.kernel.org/r/1477386195-32736-6-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
For spinning loops people do often use barrier() or cpu_relax().
For most architectures cpu_relax and barrier are the same, but on
some architectures cpu_relax can add some latency.
For example on power,sparc64 and arc, cpu_relax can shift the CPU
towards other hardware threads in an SMT environment.
On s390 cpu_relax does even more, it uses an hypercall to the
hypervisor to give up the timeslice.
In contrast to the SMT yielding this can result in larger latencies.
In some places this latency is unwanted, so another variant
"cpu_relax_lowlatency" was introduced. Before this is used in more
and more places, lets revert the logic and provide a cpu_relax_yield
that can be called in places where yielding is more important than
latency. By default this is the same as cpu_relax on all architectures.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1477386195-32736-2-git-send-email-borntraeger@de.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Its all generic atomic_long_t stuff now.
Tested-by: Jason Low <jason.low2@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull kbuild updates from Michal Marek:
- EXPORT_SYMBOL for asm source by Al Viro.
This does bring a regression, because genksyms no longer generates
checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
working on a patch to fix this.
Plus, we are talking about functions like strcpy(), which rarely
change prototypes.
- Fixes for PPC fallout of the above by Stephen Rothwell and Nick
Piggin
- fixdep speedup by Alexey Dobriyan.
- preparatory work by Nick Piggin to allow architectures to build with
-ffunction-sections, -fdata-sections and --gc-sections
- CONFIG_THIN_ARCHIVES support by Stephen Rothwell
- fix for filenames with colons in the initramfs source by me.
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
initramfs: Escape colons in depfile
ppc: there is no clear_pages to export
powerpc/64: whitelist unresolved modversions CRCs
kbuild: -ffunction-sections fix for archs with conflicting sections
kbuild: add arch specific post-link Makefile
kbuild: allow archs to select link dead code/data elimination
kbuild: allow architectures to use thin archives instead of ld -r
kbuild: Regenerate genksyms lexer
kbuild: genksyms fix for typeof handling
fixdep: faster CONFIG_ search
ia64: move exports to definitions
sparc32: debride memcpy.S a bit
[sparc] unify 32bit and 64bit string.h
sparc: move exports to definitions
ppc: move exports to definitions
arm: move exports to definitions
s390: move exports to definitions
m68k: move exports to definitions
alpha: move exports to actual definitions
x86: move exports to actual definitions
...
Pull libata updates from Tejun Heo:
- Write same support added
- Minor ahci MSIX irq handling updates
- Non-critical SCSI command translation fixes
- Controller specific changes
* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
ahci: qoriq: Revert "ahci: qoriq: Disable NCQ on ls2080a SoC"
libata: remove <asm-generic/libata-portmap.h>
libata: remove unused definitions from <asm/libata-portmap.h>
pata_at91: Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
ata: Replace BUG() with BUG_ON().
ata: sata_mv: Replacing dma_pool_alloc and memset with a single call dma_pool_zalloc.
libata: Some drives failing on SCT Write Same
ahci: use pci_alloc_irq_vectors
libata: SCT Write Same handle ATA_DFLAG_PIO
libata: SCT Write Same / DSM Trim
libata: Add support for SCT Write Same
libata: Safely overwrite attached page in WRITE SAME xlat
ahci: also use a per-port lock for the multi-MSIX case
ARM: dts: STiH407-family: Add ports-implemented property in sata nodes
ahci: st: Add ports-implemented property in support
ahci: qoriq: enable snoopable sata read and write
ahci: qoriq: adjust sata parameter
libata-scsi: fix MODE SELECT translation for Control mode page
libata-scsi: use u8 array to store mode page copy
This is based on s390 version and needed to get rid of
CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
Link: http://lkml.kernel.org/r/1473703083-8625-2-git-send-email-vgupta@synopsys.com
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull uaccess fixes from Al Viro:
"Fixes for broken uaccess primitives - mostly lack of proper zeroing
in copy_from_user()/get_user()/__get_user(), but for several
architectures there's more (broken clear_user() on frv and
strncpy_from_user() on hexagon)"
* 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
avr32: fix copy_from_user()
microblaze: fix __get_user()
microblaze: fix copy_from_user()
m32r: fix __get_user()
blackfin: fix copy_from_user()
sparc32: fix copy_from_user()
sh: fix copy_from_user()
sh64: failing __get_user() should zero
score: fix copy_from_user() and friends
score: fix __get_user/get_user
s390: get_user() should zero on failure
ppc32: fix copy_from_user()
parisc: fix copy_from_user()
openrisc: fix copy_from_user()
nios2: fix __get_user()
nios2: copy_from_user() should zero the tail of destination
mn10300: copy_from_user() should zero on access_ok() failure...
mn10300: failing __get_user() and get_user() should zero
mips: copy_from_user() must zero the destination on access_ok() failure
ARC: uaccess: get_user to zero out dest in cause of fault
...
Instead of having each caller of check_object_size() need to remember to
check for a const size parameter, move the check into check_object_size()
itself. This actually matches the original implementation in PaX, though
this commit cleans up the now-redundant builtin_const() calls in the
various architectures.
Signed-off-by: Kees Cook <keescook@chromium.org>
If CONFIG_VMAP_STACK=y is selected, kernel stacks are allocated with
__vmalloc_node_range().
Grsecurity has had a similar feature (called GRKERNSEC_KSTACKOVERFLOW=y)
for a long time.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/14c07d4fd173a5b117f51e8b939f9f4323e39899.1470907718.git.luto@kernel.org
[ Minor edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Here we have another kind of deviation from the default case -
a difference between exporting functions and non-functions.
EXPORT_DATA_SYMBOL... is really different from EXPORT_SYMBOL...
on ia64, and we need to use the right one when moving exports
from *.c where C compiler has the required information to
*.S, where we need to supply it manually.
parisc64 will be another one like that.
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup rtc-cmos,
rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after shutdown for QNAP
TS-41x
- s3c: clock fixes
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJXokhIAAoJENiigzvaE+LCZqQP+wWzintN/N1u3dKiVB7iSdwq
+S/jAXD9wW8OK9PI60/YUGRYeUXmZW9t4XYg1VKCxU9KpVC17LgOtDyXD8BufP1V
uREJEzZw9O7zCCjeHp/ICFjBkc62Net6ZDOO+ZyXPNfddpS1Xq1uUgXLZc/202UR
ID/kewu0pJRDnoxyqznWn9+8D33w/ygXs2slY2Ive0ONtjdgxGcsj2rNbb2RYn2z
OP7br3lLg7qkFh4TtXb61eh/9GYIk6wzP/CrX5l/jH4SjQnrIk5g/X/Cd1qQ/qso
JZzFoonOKvIp5Gw/+fZ9NP3YFcnkoRMv4NjZV8PAmsYLds+ibRiBcoB8u6FmiJV7
WW5uopgPkfCGN5BV3+QHwJDVe+WlgnlzaT5zPUCcP5KWusDts4fWIgzP7vrtAzf4
3OJLrgSGdBeOqWnJD21nxKUD27JOseX7D+BFtwxR4lMsXHqlHJfETpZ8gts1ZGH3
2U353j/jkZvGWmc6dMcuxOXT2K4VqpYeIIqs0IcLu6hM9crtR89zPR2Iu1AilfDW
h2NroF+Q//SgMMzWoTEG6Tn7RAc7MthgA/tRCFZF9CBMzNs988w0CTHnKsIHmjpU
UKkMeJGAC9YrPYIcqrg0oYsmLUWXc8JuZbGJBnei3BzbaMTlcwIN9qj36zfq6xWc
TMLpbWEoIsgFIZMP/hAP
=rpGB
-----END PGP SIGNATURE-----
Merge tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"RTC for 4.8
Cleanups:
- huge cleanup of rtc-generic and char/genrtc this allowed to cleanup
rtc-cmos, rtc-sh, rtc-m68k, rtc-powerpc and rtc-parisc
- move mn10300 to rtc-cmos
Subsystem:
- fix wakealarms after hibernate
- multiples fixes for rctest
- simplify implementations of .read_alarm
New drivers:
- Maxim MAX6916
Drivers:
- ds1307: fix weekday
- m41t80: add wakeup support
- pcf85063: add support for PCF85063A variant
- rv8803: extend i2c fix and other fixes
- s35390a: fix alarm reading, this fixes instant reboot after
shutdown for QNAP TS-41x
- s3c: clock fixes"
* tag 'rtc-4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (65 commits)
rtc: rv8803: Clear V1F when setting the time
rtc: rv8803: Stop the clock while setting the time
rtc: rv8803: Always apply the I²C workaround
rtc: rv8803: Fix read day of week
rtc: rv8803: Remove the check for valid time
rtc: rv8803: Kconfig: Indicate rx8900 support
rtc: asm9260: remove .owner field for driver
rtc: at91sam9: Fix missing spin_lock_init()
rtc: m41t80: add suspend handlers for alarm IRQ
rtc: m41t80: make it a real error message
rtc: pcf85063: Add support for the PCF85063A device
rtc: pcf85063: fix year range
rtc: hym8563: in .read_alarm set .tm_sec to 0 to signal minute accuracy
rtc: explicitly set tm_sec = 0 for drivers with minute accurancy
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
rtc: s3c: Remove unnecessary call to disable already disabled clock
rtc: abx80x: use devm_add_action_or_reset()
rtc: m41t80: use devm_add_action_or_reset()
rtc: fix a typo and reduce three empty lines to one
rtc: s35390a: improve two comments in .set_alarm
...
The dma-mapping core and the implementations do not change the DMA
attributes passed by pointer. Thus the pointer can point to const data.
However the attributes do not have to be a bitfield. Instead unsigned
long will do fine:
1. This is just simpler. Both in terms of reading the code and setting
attributes. Instead of initializing local attributes on the stack
and passing pointer to it to dma_set_attr(), just set the bits.
2. It brings safeness and checking for const correctness because the
attributes are passed by value.
Semantic patches for this change (at least most of them):
virtual patch
virtual context
@r@
identifier f, attrs;
@@
f(...,
- struct dma_attrs *attrs
+ unsigned long attrs
, ...)
{
...
}
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
and
// Options: --all-includes
virtual patch
virtual context
@r@
identifier f, attrs;
type t;
@@
t f(..., struct dma_attrs *attrs);
@@
identifier r.f;
@@
f(...,
- NULL
+ 0
)
Link: http://lkml.kernel.org/r/1468399300-5399-2-git-send-email-k.kozlowski@samsung.com
Signed-off-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Acked-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no>
Acked-by: Mark Salter <msalter@redhat.com> [c6x]
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> [cris]
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [drm]
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Fabien Dessenne <fabien.dessenne@st.com> [bdisp]
Reviewed-by: Marek Szyprowski <m.szyprowski@samsung.com> [vb2-core]
Acked-by: David Vrabel <david.vrabel@citrix.com> [xen]
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [xen swiotlb]
Acked-by: Joerg Roedel <jroedel@suse.de> [iommu]
Acked-by: Richard Kuo <rkuo@codeaurora.org> [hexagon]
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> [s390]
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Acked-by: Hans-Christian Noren Egtvedt <egtvedt@samfundet.no> [avr32]
Acked-by: Vineet Gupta <vgupta@synopsys.com> [arc]
Acked-by: Robin Murphy <robin.murphy@arm.com> [arm64 and dma-iommu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In general, there's no need for the "restore sigmask" flag to live in
ti->flags. alpha, ia64, microblaze, powerpc, sh, sparc (64-bit only),
tile, and x86 use essentially identical alternative implementations,
placing the flag in ti->status.
Replace those optimized implementations with an equally good common
implementation that stores it in a bitfield in struct task_struct and
drop the custom implementations.
Additional architectures can opt in by removing their
TIF_RESTORE_SIGMASK defines.
Link: http://lkml.kernel.org/r/8a14321d64a28e40adfddc90e18a96c086a6d6f9.1468522723.git.luto@kernel.org
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Cc: Rich Felker <dalias@libc.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2
- most(?) of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (125 commits)
thp: fix comments of __pmd_trans_huge_lock()
cgroup: remove unnecessary 0 check from css_from_id()
cgroup: fix idr leak for the first cgroup root
mm: memcontrol: fix documentation for compound parameter
mm: memcontrol: remove BUG_ON in uncharge_list
mm: fix build warnings in <linux/compaction.h>
mm, thp: convert from optimistic swapin collapsing to conservative
mm, thp: fix comment inconsistency for swapin readahead functions
thp: update Documentation/{vm/transhuge,filesystems/proc}.txt
shmem: split huge pages beyond i_size under memory pressure
thp: introduce CONFIG_TRANSPARENT_HUGE_PAGECACHE
khugepaged: add support of collapse for tmpfs/shmem pages
shmem: make shmem_inode_info::lock irq-safe
khugepaged: move up_read(mmap_sem) out of khugepaged_alloc_page()
thp: extract khugepaged from mm/huge_memory.c
shmem, thp: respect MADV_{NO,}HUGEPAGE for file mappings
shmem: add huge pages support
shmem: get_unmapped_area align huge page
shmem: prepare huge= mount option and sysfs knob
mm, rmap: account shmem thp pages
...
- Support for ACPI SSDT overlays allowing Secondary System
Description Tables (SSDTs) to be loaded at any time from EFI
variables or via configfs (Octavian Purdila, Mika Westerberg).
- Support for the ACPI LPI (Low-Power Idle) feature introduced in
ACPI 6.0 and allowing processor idle states to be represented in
ACPI tables in a hierarchical way (with the help of Processor
Container objects) and support for ACPI idle states management
on ARM64, based on LPI (Sudeep Holla).
- General improvements of ACPI support for NUMA and ARM64 support
for ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).
- General improvements of the ACPI table upgrade mechanism and
ARM64 support for that feature (Aleksey Makarov, Jon Masters).
- Support for the Boot Error Record Table (BERT) in APEI and
improvements of kernel messages printed by the error injection
code (Huang Ying, Borislav Petkov).
- New driver for the Intel Broxton WhiskeyCove PMIC operation
region and support for the REGS operation region on Broxton,
PMIC code cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).
- New driver for the power participant device which is part of the
Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
reorganization (Srinivas Pandruvada).
- Support for the platform-initiated graceful shutdown feature
introduced in ACPI 6.1 (Prashanth Prakash).
- ACPI button driver update related to lid input events generated
automatically on initialization and system resume that have been
problematic for some time (Lv Zheng).
- ACPI EC driver cleanups (Lv Zheng).
- Documentation of the ACPICA release automation process and the
in-kernel ACPI AML debugger (Lv Zheng).
- New blacklist entry and two fixes for the ACPI backlight driver
(Alex Hung, Arvind Yadav, Ralf Gerbig).
- Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).
- ACPI CPPC code changes to make it more robust against possible
defects in ACPI tables and new symbol definitions for PCC (Hoan
Tran).
- System reboot code modification to execute the ACPI _PTS (Prepare
To Sleep) method in addition to _TTS (Ocean He).
- ACPICA-related change to carry out lock ordering checks in ACPICA
if ACPICA debug is enabled in the kernel (Lv Zheng).
- Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki).
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iQIcBAABCAAGBQJXl8A7AAoJEILEb/54YlRxF0kQAI6mH0yan60Osu4598+VNvgv
wxOWl1TEbKd+LaJkofRZ+FPzZkQf5c/h/8Oo8Q3LEpFhjkARhhX7ThDjS5v2Nx6v
I/icQ64ynPUPrw6hGNVrmec9ofZjiAs3j3Rt2bEiae+YN6guvfhWE+kBCHo2G/nN
o4BSaxYjkphUTDSi4/5BfaocV2sl3apvwjtAj8zgGn4RD81bFFLnblynHkqJVcoN
HAfm7QTVjT01Zkv565OSZgK8CFcD8Ky2KKKBQvcIW8zQmD6IXaoTHSYSwL0SH+oK
bxUZUmWVfFWw4kDTAY9mw0QwtWz9ODTWh/WMhs3itWRRN5qHfogs99rCVYFtFufQ
ODVy4wpt4wmpzZVhyUDTTigAhznPAtCam6EpL1YeNbtyrRN4evvZVFHBZJnmhosX
zI9iLF4eqdnJZKvh+L1VFU+py8aAZpz1ZEOatNMI+xdhArbGm7v89cldzaRkJhuW
LZr+JqYQGaOZS5qSnymwJL1KfF66+2QGpzdvzJN5FNIDACoqanATbZ/Iie2ENcM+
WwCEWrGJFDmM30raBNNcvx0yHFtVkcNbOymla4paVg7i29nu88Ynw4Z6seIIP11C
DryzLFhw+3jdTg2zK/te/wkhciJ0F+iZjo6VXywSMnwatf36bpdp4r4JLUVfEo2t
8DOGKyFMLYY1zOPMK9Th
=YwbM
-----END PGP SIGNATURE-----
Merge tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI updates from Rafael Wysocki:
"The new feaures here are the support for ACPI overlays (allowing ACPI
tables to be loaded at any time from EFI variables or via configfs)
and the LPI (Low-Power Idle) support. Also notable is the ACPI-based
NUMA support for ARM64.
Apart from that we have two new drivers, for the DPTF (Dynamic Power
and Thermal Framework) power participant device and for the Intel
Broxton WhiskeyCove PMIC, some more PMIC-related changes, support for
the Boot Error Record Table (BERT) in APEI and support for
platform-initiated graceful shutdown.
Plus two new pieces of documentation and usual assorted fixes and
cleanups in quite a few places.
Specifics:
- Support for ACPI SSDT overlays allowing Secondary System
Description Tables (SSDTs) to be loaded at any time from EFI
variables or via configfs (Octavian Purdila, Mika Westerberg).
- Support for the ACPI LPI (Low-Power Idle) feature introduced in
ACPI 6.0 and allowing processor idle states to be represented in
ACPI tables in a hierarchical way (with the help of Processor
Container objects) and support for ACPI idle states management on
ARM64, based on LPI (Sudeep Holla).
- General improvements of ACPI support for NUMA and ARM64 support for
ACPI-based NUMA (Hanjun Guo, David Daney, Robert Richter).
- General improvements of the ACPI table upgrade mechanism and ARM64
support for that feature (Aleksey Makarov, Jon Masters).
- Support for the Boot Error Record Table (BERT) in APEI and
improvements of kernel messages printed by the error injection code
(Huang Ying, Borislav Petkov).
- New driver for the Intel Broxton WhiskeyCove PMIC operation region
and support for the REGS operation region on Broxton, PMIC code
cleanups (Bin Gao, Felipe Balbi, Paul Gortmaker).
- New driver for the power participant device which is part of the
Dynamic Power and Thermal Framework (DPTF) and DPTF-related code
reorganization (Srinivas Pandruvada).
- Support for the platform-initiated graceful shutdown feature
introduced in ACPI 6.1 (Prashanth Prakash).
- ACPI button driver update related to lid input events generated
automatically on initialization and system resume that have been
problematic for some time (Lv Zheng).
- ACPI EC driver cleanups (Lv Zheng).
- Documentation of the ACPICA release automation process and the
in-kernel ACPI AML debugger (Lv Zheng).
- New blacklist entry and two fixes for the ACPI backlight driver
(Alex Hung, Arvind Yadav, Ralf Gerbig).
- Cleanups of the ACPI pci_slot driver (Joe Perches, Paul Gortmaker).
- ACPI CPPC code changes to make it more robust against possible
defects in ACPI tables and new symbol definitions for PCC (Hoan
Tran).
- System reboot code modification to execute the ACPI _PTS (Prepare
To Sleep) method in addition to _TTS (Ocean He).
- ACPICA-related change to carry out lock ordering checks in ACPICA
if ACPICA debug is enabled in the kernel (Lv Zheng).
- Assorted minor fixes and cleanups (Andy Shevchenko, Baoquan He,
Bhaktipriya Shridhar, Paul Gortmaker, Rafael Wysocki)"
* tag 'acpi-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (71 commits)
ACPI: enable ACPI_PROCESSOR_IDLE on ARM64
arm64: add support for ACPI Low Power Idle(LPI)
drivers: firmware: psci: initialise idle states using ACPI LPI
cpuidle: introduce CPU_PM_CPU_IDLE_ENTER macro for ARM{32, 64}
arm64: cpuidle: drop __init section marker to arm_cpuidle_init
ACPI / processor_idle: Add support for Low Power Idle(LPI) states
ACPI / processor_idle: introduce ACPI_PROCESSOR_CSTATE
ACPI / DPTF: move int340x_thermal.c to the DPTF folder
ACPI / DPTF: Add DPTF power participant driver
ACPI / lpat: make it explicitly non-modular
ACPI / dock: make dock explicitly non-modular
ACPI / PCI: make pci_slot explicitly non-modular
ACPI / PMIC: remove modular references from non-modular code
ACPICA: Linux: Enable ACPI_MUTEX_DEBUG for Linux kernel
ACPI: Rename configfs.c to acpi_configfs.c to prevent link error
ACPI / debugger: Add AML debugger documentation
ACPI: Add documentation describing ACPICA release automation
ACPI: add support for loading SSDTs via configfs
ACPI: add support for configfs
efi / ACPI: load SSTDs from EFI variables
...
This allows an arch which needs to do special handing with respect to
different page size when flushing tlb to implement the same in mmu
gather.
Link: http://lkml.kernel.org/r/1465049193-22197-3-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This updates the generic and arch specific implementation to return true
if we need to do a tlb flush. That means if a __tlb_remove_page
indicate a flush is needed, the page we try to remove need to be tracked
and added again after the flush. We need to track it because we have
already update the pte to none and we can't just loop back.
This change is done to enable us to do a tlb_flush when we try to flush
a range that consists of different page sizes. For architectures like
ppc64, we can do a range based tlb flush and we need to track page size
for that. When we try to remove a huge page, we will force a tlb flush
and starts a new mmu gather.
[aneesh.kumar@linux.vnet.ibm.com: mm-change-the-interface-for-__tlb_remove_page-v3]
Link: http://lkml.kernel.org/r/1465049193-22197-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1464860389-29019-2-git-send-email-aneesh.kumar@linux.vnet.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull locking updates from Ingo Molnar:
"The locking tree was busier in this cycle than the usual pattern - a
couple of major projects happened to coincide.
The main changes are:
- implement the atomic_fetch_{add,sub,and,or,xor}() API natively
across all SMP architectures (Peter Zijlstra)
- add atomic_fetch_{inc/dec}() as well, using the generic primitives
(Davidlohr Bueso)
- optimize various aspects of rwsems (Jason Low, Davidlohr Bueso,
Waiman Long)
- optimize smp_cond_load_acquire() on arm64 and implement LSE based
atomic{,64}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
on arm64 (Will Deacon)
- introduce smp_acquire__after_ctrl_dep() and fix various barrier
mis-uses and bugs (Peter Zijlstra)
- after discovering ancient spin_unlock_wait() barrier bugs in its
implementation and usage, strengthen its semantics and update/fix
usage sites (Peter Zijlstra)
- optimize mutex_trylock() fastpath (Peter Zijlstra)
- ... misc fixes and cleanups"
* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (67 commits)
locking/atomic: Introduce inc/dec variants for the atomic_fetch_$op() API
locking/barriers, arch/arm64: Implement LDXR+WFE based smp_cond_load_acquire()
locking/static_keys: Fix non static symbol Sparse warning
locking/qspinlock: Use __this_cpu_dec() instead of full-blown this_cpu_dec()
locking/atomic, arch/tile: Fix tilepro build
locking/atomic, arch/m68k: Remove comment
locking/atomic, arch/arc: Fix build
locking/Documentation: Clarify limited control-dependency scope
locking/atomic, arch/rwsem: Employ atomic_long_fetch_add()
locking/atomic, arch/qrwlock: Employ atomic_fetch_add_acquire()
locking/atomic, arch/mips: Convert to _relaxed atomics
locking/atomic, arch/alpha: Convert to _relaxed atomics
locking/atomic: Remove the deprecated atomic_{set,clear}_mask() functions
locking/atomic: Remove linux/atomic.h:atomic_fetch_or()
locking/atomic: Implement atomic{,64,_long}_fetch_{add,sub,and,andnot,or,xor}{,_relaxed,_acquire,_release}()
locking/atomic: Fix atomic64_relaxed() bits
locking/atomic, arch/xtensa: Implement atomic_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/x86: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
locking/atomic, arch/sparc: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
...
We've had the thread info allocated together with the thread stack for
most architectures for a long time (since the thread_info was split off
from the task struct), but that is about to change.
But the patches that move the thread info to be off-stack (and a part of
the task struct instead) made it clear how confused the allocator and
freeing functions are.
Because the common case was that we share an allocation with the thread
stack and the thread_info, the two pointers were identical. That
identity then meant that we would have things like
ti = alloc_thread_info_node(tsk, node);
...
tsk->stack = ti;
which certainly _worked_ (since stack and thread_info have the same
value), but is rather confusing: why are we assigning a thread_info to
the stack? And if we move the thread_info away, the "confusing" code
just gets to be entirely bogus.
So remove all this confusion, and make it clear that we are doing the
stack allocation by renaming and clarifying the function names to be
about the stack. The fact that the thread_info then shares the
allocation is an implementation detail, and not really about the
allocation itself.
This is a pure renaming and type fix: we pass in the same pointer, it's
just that we clarify what the pointer means.
The ia64 code that actually only has one single allocation (for all of
task_struct, thread_info and kernel thread stack) now looks a bit odd,
but since "tsk->stack" is actually not even used there, that oddity
doesn't matter. It would be a separate thing to clean that up, I
intentionally left the ia64 changes as a pure brute-force renaming and
type change.
Acked-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Implement FETCH-OP atomic primitives, these are very similar to the
existing OP-RETURN primitives we already have, except they return the
value of the atomic variable _before_ modification.
This is especially useful for irreversible operations -- such as
bitops (because it becomes impossible to reconstruct the state prior
to modification).
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-arch@vger.kernel.org
Cc: linux-ia64@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This patch updates/fixes all spin_unlock_wait() implementations.
The update is in semantics; where it previously was only a control
dependency, we now upgrade to a full load-acquire to match the
store-release from the spin_unlock() we waited on. This ensures that
when spin_unlock_wait() returns, we're guaranteed to observe the full
critical section we waited on.
This fixes a number of spin_unlock_wait() users that (not
unreasonably) rely on this.
I also fixed a number of ticket lock versions to only wait on the
current lock holder, instead of for a full unlock, as this is
sufficient.
Furthermore; again for ticket locks; I added an smp_rmb() in between
the initial ticket load and the spin loop testing the current value
because I could not convince myself the address dependency is
sufficient, esp. if the loads are of different sizes.
I'm more than happy to remove this smp_rmb() again if people are
certain the address dependency does indeed work as expected.
Note: PPC32 will be fixed independently
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: chris@zankel.net
Cc: cmetcalf@mellanox.com
Cc: davem@davemloft.net
Cc: dhowells@redhat.com
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: linux@armlinux.org.uk
Cc: mpe@ellerman.id.au
Cc: ralf@linux-mips.org
Cc: realmz6@gmail.com
Cc: rkuo@codeaurora.org
Cc: rth@twiddle.net
Cc: schwidefsky@de.ibm.com
Cc: tony.luck@intel.com
Cc: vgupta@synopsys.com
Cc: ysato@users.sourceforge.jp
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
A while back Viro posted a number of 'interesting' mutex_is_locked()
users on IRC, one of those was RCU.
RCU seems to use mutex_is_locked() to avoid doing mutex_trylock(), the
regular load before modify pattern.
While the use isn't wrong per se, its curious in that its needed at all,
mutex_trylock() should be good enough on its own to avoid the pointless
cacheline bounces.
So fix those and remove the mutex_is_locked() (ab)use from RCU.
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Paul McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/20160601185815.GW3190@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The rwsem-xadd count has been converted to an atomic variable and the
rwsem code now directly uses atomic_long_add() and
atomic_long_add_return(), so we can remove the arch implementations of
rwsem_atomic_add() and rwsem_atomic_update().
Signed-off-by: Jason Low <jason.low2@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Terry Rudd <terry.rudd@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Waiman Long <Waiman.Long@hpe.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Convert the rwsem count variable to an atomic_long_t since we use it
as an atomic variable. This also allows us to remove the
rwsem_atomic_{add,update}() "abstraction" which would now be an unnecesary
level of indirection. In follow up patches, we also remove the
rwsem_atomic_{add,update}() definitions across the various architectures.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jason Low <jason.low2@hpe.com>
[ Build warning fixes on various architectures. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Terry Rudd <terry.rudd@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Waiman Long <Waiman.Long@hpe.com>
Link: http://lkml.kernel.org/r/1465017963-4839-2-git-send-email-jason.low2@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Nothing on these architectures ever includes the asm/mc146818rtc.h
file, the drivers that used to do this have been fixed long ago,
and the remaining users are all PC-specific.
This removes the files for good.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Alexandre Belloni <alexandre.belloni@free-electrons.com>