mirror of
https://github.com/torvalds/linux.git
synced 2024-11-12 15:11:50 +00:00
47a38e1550
22667 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Linus Torvalds
|
f5364c150a |
Merge branch 'stacking-fixes' (vfs stacking fixes from Jann)
Merge filesystem stacking fixes from Jann Horn.
* emailed patches from Jann Horn <jannh@google.com>:
sched: panic on corrupted stack end
ecryptfs: forbid opening files without mmap handler
proc: prevent stacking filesystems on top |
||
Jann Horn
|
29d6455178 |
sched: panic on corrupted stack end
Until now, hitting this BUG_ON caused a recursive oops (because oops handling involves do_exit(), which calls into the scheduler, which in turn raises an oops), which caused stuff below the stack to be overwritten until a panic happened (e.g. via an oops in interrupt context, caused by the overwritten CPU index in the thread_info). Just panic directly. Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
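The stack-end check described in the commit above can be illustrated with a small userspace sketch. This is not the kernel code: `struct fake_stack`, `fake_stack_init()`, `stack_end_corrupted()` and `schedule_check()` are hypothetical names, and `abort()` stands in for a kernel panic. Only the idea is taken from the commit message: once the canary at the end of the stack is gone, stop immediately rather than continue through an error path that would overwrite more memory.

```c
#include <stdio.h>
#include <stdlib.h>

/* Canary value; the kernel's STACK_END_MAGIC uses this constant. */
#define STACK_END_MAGIC 0x57AC6E9DUL

/* Hypothetical stand-in for a task stack: the canary sits at the lowest
 * address, so a downward-growing stack overwrites it first on overflow. */
struct fake_stack {
    unsigned long end_magic;
    unsigned char data[256];
};

void fake_stack_init(struct fake_stack *s)
{
    s->end_magic = STACK_END_MAGIC;
}

int stack_end_corrupted(const struct fake_stack *s)
{
    return s->end_magic != STACK_END_MAGIC;
}

/* Assumed analogue of the scheduler-side check: fail hard and at once,
 * instead of entering a recoverable error path that would run again on
 * the already damaged stack. */
void schedule_check(const struct fake_stack *s)
{
    if (stack_end_corrupted(s)) {
        fprintf(stderr, "corrupted stack end detected\n");
        abort();
    }
}
```

A caller would run `fake_stack_init()` once and `schedule_check()` at every simulated context switch.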
Linus Torvalds
|
60e383037b |
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar: "Two scheduler debugging fixes"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/debug: Fix 'schedstats=enable' cmdline option
sched/debug: Fix /proc/sched_debug regression |
||
Linus Torvalds
|
7fcbc230c6 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar: "A handful of tooling fixes, two PMU driver fixes and a cleanup of redundant code that addresses a security analyzer false positive"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Remove a redundant check
perf/x86/intel/uncore: Remove SBOX support for Broadwell server
perf ctf: Convert invalid chars in a string before set value
perf record: Fix crash when kptr is restricted
perf symbols: Check kptr_restrict for root
perf/x86/intel/rapl: Fix pmus free during cleanup |
||
Linus Torvalds
|
02b07bde61 |
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar: "Misc fixes:
- a file-based futex fix
- one more spin_unlock_wait() fix
- a ww-mutex deadlock detection improvement/fix
- and a raw_read_seqcount_latch() barrier fix"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Calculate the futex key based on a tail page for file-based futexes
locking/qspinlock: Fix spin_unlock_wait() some more
locking/ww_mutex: Report recursive ww_mutex locking early
locking/seqcount: Re-fix raw_read_seqcount_latch() |
||
Linus Torvalds
|
698ea54dde |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal.
2) Revert some msleep conversions in rtlwifi as these spots are in atomic context, from Larry Finger.
3) Validate that NFTA_SET_TABLE attribute is actually specified when we call nf_tables_getset(). From Phil Turnbull.
4) Don't do mdio_reset in stmmac driver with spinlock held as that can sleep, from Vincent Palatin.
5) sk_filter() does things other than run a BPF filter, so we should not elide its call just because sk->sk_filter is NULL. Fix from Eric Dumazet.
6) Fix missing backlog updates in several packet schedulers, from Cong Wang.
7) bnx2x driver should allow VLAN add/remove while the interface is down, from Michal Schmidt.
8) Several RDS/TCP race fixes from Sowmini Varadhan.
9) fq_codel scheduler doesn't return correct queue length in dumps, from Eric Dumazet.
10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from Yuchung Cheng.
11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(), from Guillaume Nault.
12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian Westphal.
13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling, from Willem de Bruijn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
vmxnet3: segCnt can be 1 for LRO packets
packet: compat support for sock_fprog
stmmac: fix parameter to dwmac4_set_umac_addr()
net/mlx5e: Fix blue flame quota logic
net/mlx5e: Use ndo_stop explicitly at shutdown flow
net/mlx5: E-Switch, always set mc_promisc for allmulti vports
net/mlx5: E-Switch, Modify node guid on vf set MAC
net/mlx5: E-Switch, Fix vport enable flow
net/mlx5: E-Switch, Use the correct error check on returned pointers
net/mlx5: E-Switch, Use the correct free() function
net/mlx5: Fix E-Switch flow steering capabilities check
net/mlx5: Fix flow steering NIC capabilities check
net/mlx5: Fix root flow table update
net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly
net/mlx5: Fix masking of reserved bits in XRCD number
net/mlx5: Fix the size of modify QP mailbox
mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()
mlxsw: spectrum: Make split flow match firmware requirements
wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
cfg80211: remove get/set antenna and tx power warnings
... |
||
Linus Torvalds
|
524a3f2ca2 |
Power management fixes for v4.7-rc3
- Fix two intel_pstate initialization issues, one of which was introduced during the 4.4 cycle (Srinivas Pandruvada).
- Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset (Catalin Marinas).
Merge tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki: "Stable-candidate fixes for the intel_pstate driver and the cpuidle core. Specifics:
- Fix two intel_pstate initialization issues, one of which was introduced during the 4.4 cycle (Srinivas Pandruvada)
- Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset (Catalin Marinas)"
* tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE |
||
Rafael J. Wysocki
|
3681196ae5 |
Merge branches 'pm-cpufreq-fixes' and 'pm-cpuidle'
* pm-cpufreq-fixes:
cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
* pm-cpuidle:
cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE |
||
Zhouyi Zhou
|
ba62bafe94 |
kernel/relay.c: fix potential memory leak
When relay_open_buf() fails in relay_open(), code will goto free_bufs, but chan is nowhere freed. Link: http://lkml.kernel.org/r/1464777927-19675-1-git-send-email-yizhouzhou@ict.ac.cn Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
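The leak described in the relay commit above follows a common pattern in goto-based error handling: a cleanup label releases the inner allocations but not the outer object. The sketch below shows the corrected shape in userspace C. It is not `kernel/relay.c`: `struct chan`, `chan_open()`, `chan_close()` and the `fail_at` failure-injection parameter are hypothetical names introduced for illustration.

```c
#include <stdlib.h>

/* Hypothetical channel owning an array of per-CPU style buffers. */
struct chan {
    size_t n_bufs;   /* number of buffers successfully allocated */
    void **bufs;
};

/* fail_at is a test hook (an assumption of this sketch): the buffer with
 * this index fails to allocate. Pass a value >= n_bufs for success. */
struct chan *chan_open(size_t n_bufs, size_t fail_at)
{
    struct chan *chan = calloc(1, sizeof(*chan));
    if (!chan)
        return NULL;

    chan->bufs = calloc(n_bufs, sizeof(*chan->bufs));
    if (!chan->bufs)
        goto free_chan;

    for (size_t i = 0; i < n_bufs; i++) {
        chan->bufs[i] = (i == fail_at) ? NULL : malloc(64);
        if (!chan->bufs[i])
            goto free_bufs;
        chan->n_bufs = i + 1;
    }
    return chan;

free_bufs:
    /* Release the buffers that were allocated before the failure. */
    for (size_t i = 0; i < chan->n_bufs; i++)
        free(chan->bufs[i]);
    free(chan->bufs);
free_chan:
    /* The step the commit message reports as missing: free the channel
     * object itself on the error path. */
    free(chan);
    return NULL;
}

void chan_close(struct chan *chan)
{
    for (size_t i = 0; i < chan->n_bufs; i++)
        free(chan->bufs[i]);
    free(chan->bufs);
    free(chan);
}
```

Ordering the labels so that each one falls through to the next keeps every failure point releasing exactly what was acquired before it.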
Mel Gorman
|
077fa7aed1 |
futex: Calculate the futex key based on a tail page for file-based futexes
Mike Galbraith reported that the LTP test case futex_wake04 was broken by commit |
||
Josh Poimboeuf
|
4698f88c06 |
sched/debug: Fix 'schedstats=enable' cmdline option
The 'schedstats=enable' option doesn't work, and also produces the
following warning during boot:
WARNING: CPU: 0 PID: 0 at /home/jpoimboe/git/linux/kernel/jump_label.c:61 static_key_slow_inc+0x8c/0xa0
static_key_slow_inc used before call to jump_label_init
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 4.7.0-rc1+ #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
0000000000000086 3ae3475a4bea95d4 ffffffff81e03da8 ffffffff8143fc83
ffffffff81e03df8 0000000000000000 ffffffff81e03de8 ffffffff810b1ffb
0000003d00000096 ffffffff823514d0 ffff88007ff197c8 0000000000000000
Call Trace:
[<ffffffff8143fc83>] dump_stack+0x85/0xc2
[<ffffffff810b1ffb>] __warn+0xcb/0xf0
[<ffffffff810b207f>] warn_slowpath_fmt+0x5f/0x80
[<ffffffff811e9c0c>] static_key_slow_inc+0x8c/0xa0
[<ffffffff810e07c6>] static_key_enable+0x16/0x40
[<ffffffff8216d633>] setup_schedstats+0x29/0x94
[<ffffffff82148a05>] unknown_bootoption+0x89/0x191
[<ffffffff810d8617>] parse_args+0x297/0x4b0
[<ffffffff82148d61>] start_kernel+0x1d8/0x4a9
[<ffffffff8214897c>] ? set_init_arg+0x55/0x55
[<ffffffff82148120>] ? early_idt_handler_array+0x120/0x120
[<ffffffff821482db>] x86_64_start_reservations+0x2f/0x31
[<ffffffff82148427>] x86_64_start_kernel+0x14a/0x16d
The problem is that it tries to update the 'sched_schedstats' static key
before jump labels have been initialized.
Changing jump_label_init() to be called earlier before
parse_early_param() wouldn't fix it: it would still fail trying to
poke_text() because mm isn't yet initialized.
Instead, just create a temporary '__sched_schedstats' variable which can
be copied to the static key later during sched_init() after jump labels
have been initialized.
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes:
|
||
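The fix described in the schedstats commit above is a deferred-initialization pattern: the early command-line parser records the request in a plain variable, and a later init step copies it into the mechanism that was not ready at parse time. A minimal userspace sketch follows. `jump_labels_ready`, `key_set()`, `late_init()` and `schedstats_enabled()` are hypothetical stand-ins for the kernel's jump-label machinery, and `early_schedstats` plays the role the commit message gives to `__sched_schedstats`.

```c
#include <stdbool.h>
#include <string.h>

static bool jump_labels_ready;     /* stand-in for "jump labels initialized" */
static bool sched_schedstats_key;  /* stand-in for the static key */
static bool early_schedstats;      /* temporary value set during early parsing */

/* Updating the key before the machinery is ready fails. This models the
 * boot warning quoted in the commit message. */
int key_set(bool on)
{
    if (!jump_labels_ready)
        return -1;
    sched_schedstats_key = on;
    return 0;
}

/* Early parameter handler: only records the request, never touches the key.
 * Returns 1 if the option value was recognized, 0 otherwise. */
int setup_schedstats(const char *str)
{
    if (!str)
        return 0;
    if (!strcmp(str, "enable")) {
        early_schedstats = true;
        return 1;
    }
    if (!strcmp(str, "disable")) {
        early_schedstats = false;
        return 1;
    }
    return 0;
}

/* Later init step: the machinery is ready, so apply the recorded value. */
void late_init(void)
{
    jump_labels_ready = true;
    key_set(early_schedstats);
}

bool schedstats_enabled(void)
{
    return sched_schedstats_key;
}
```

Calling `setup_schedstats("enable")` followed by `late_init()` leaves `schedstats_enabled()` true, while a direct `key_set(true)` before `late_init()` is rejected.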
Josh Poimboeuf
|
9c57259117 |
sched/debug: Fix /proc/sched_debug regression
Commit: |
||
Alexander Shishkin
|
62a92c8f55 |
perf/core: Remove a redundant check
There is no way to end up in _free_event() with event::pmu being NULL. The latter is initialized in event allocation path and remains set forever. In case of allocation failure, the error path doesn't use _free_event(). Having the check, however, suggests that it is possible to have a event::pmu==NULL situation in _free_event() and confuses the robots. This patch gets rid of the check. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: eranian@google.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/1465303455-26032-1-git-send-email-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Peter Zijlstra
|
2c61002271 |
locking/qspinlock: Fix spin_unlock_wait() some more
While this prior commit: |
||
Daniel Borkmann
|
5b6c1b4d46 |
bpf, trace: use READ_ONCE for retrieving file ptr
In bpf_perf_event_read() and bpf_perf_event_output(), we must use READ_ONCE() for fetching the struct file pointer, which could get updated concurrently, so we must prevent the compiler from potential refetching. We already do this with tail calls for fetching the related bpf_prog, but not so on stored perf events. Semantics for both are the same with regards to updates. Fixes: |
||
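The READ_ONCE() reasoning in the commit above can be sketched in userspace C. The macro below is a simplified stand-in for the kernel's `READ_ONCE()`, not its actual definition, and it relies on the GCC/Clang `__typeof__` extension. `struct file_sim`, `struct event_slot` and `slot_file_id()` are hypothetical. The point is that the shared pointer is loaded exactly once into a local, and only the local is used afterwards, so the compiler cannot re-fetch a value that another thread may have changed in between.

```c
#include <stddef.h>

/* Simplified stand-in: a volatile access forces exactly one load. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

struct file_sim {
    int id;
};

/* Hypothetical slot whose pointer may be replaced concurrently. */
struct event_slot {
    struct file_sim *file;
};

/* Returns the id of the file in the slot, or -1 if the slot is empty. */
int slot_file_id(struct event_slot *slot)
{
    /* One load into a local. Without it, the compiler may read slot->file
     * once for the NULL test and again for the dereference. */
    struct file_sim *file = READ_ONCE(slot->file);

    if (!file)
        return -1;
    return file->id;   /* use the local, never slot->file again */
}
```

This sketch only addresses compiler refetching. The lifetime of the object behind the pointer still needs its own protection, which the sketch does not model.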
Linus Torvalds
|
8c52b6dcdd |
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
- a few simple fixes for fallout from the recent gic-v3 changes
- a workaround for a Cavium thunderX erratum
- a bugfix for the pic32 irqchip to make external interrupts work proper
- a missing return value in the generic IPI management code
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-pic32-evic: Fix bug with external interrupts.
irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
irqchip/gic-v3: Fix quiescence check in gic_enable_redist
irqchip/gic-v3: Fix copy+paste mistakes in defines
irqchip/gic-v3: Fix ICC_SGI1R_EL1.INTID decoding mask
genirq: Fix missing return value in irq_destroy_ipi() |
||
Thomas Gleixner
|
2eec3707a3 |
irqchip updates for 4.7-rc1:
- A number of embarrassing buglets (GICv3, PIC32)
- A more substantial errata workaround for Cavium's GICv3 ITS (kept for post-rc1 due to its dependency on NUMA)
Merge tag 'irqchip-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent
Merge irqchip updates from Marc Zyngier:
- A number of embarrassing buglets (GICv3, PIC32)
- A more substantial errata workaround for Cavium's GICv3 ITS (kept for post-rc1 due to its dependency on NUMA) |
||
Chris Wilson
|
0422e83d84 |
locking/ww_mutex: Report recursive ww_mutex locking early
Recursive locking for ww_mutexes was originally conceived as an exception. However, it is heavily used by the DRM atomic modesetting code. Currently, the recursive deadlock is checked after we have queued up for a busy-spin and as we never release the lock, we spin until kicked, whereupon the deadlock is discovered and reported. A simple solution for the now common problem is to move the recursive deadlock discovery to the first action when taking the ww_mutex. Suggested-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1464293297-19777-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
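The ordering change described in the ww_mutex commit above can be shown with a toy lock. This is a single-threaded sketch under stated assumptions, not the kernel implementation: `struct acquire_ctx`, `struct ww_lock_sim` and `ww_lock_sim_acquire()` are hypothetical, and a busy lock simply returns `-EBUSY` where a real lock would queue, spin or back off. What it illustrates is that the "same context already holds this lock" test comes first, before any waiting can start.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical acquire context, one per locking transaction. */
struct acquire_ctx {
    int stamp;
};

/* Hypothetical lock that remembers which context holds it. */
struct ww_lock_sim {
    struct acquire_ctx *owner_ctx;   /* NULL when unlocked */
};

int ww_lock_sim_acquire(struct ww_lock_sim *lock, struct acquire_ctx *ctx)
{
    /* First action: detect recursive locking by the same context, so the
     * caller is told at once instead of waiting on a lock it already owns. */
    if (lock->owner_ctx == ctx)
        return -EALREADY;

    /* A real lock would wait or back off here; the sketch reports busy. */
    if (lock->owner_ctx != NULL)
        return -EBUSY;

    lock->owner_ctx = ctx;
    return 0;
}

void ww_lock_sim_release(struct ww_lock_sim *lock)
{
    lock->owner_ctx = NULL;
}
```

With the check in this position, a second acquire by the same context returns immediately, which is the behaviour the commit message asks for.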
Catalin Marinas
|
9bd616e3db |
cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE
The cpuidle_devices per-CPU variable is only defined when CPU_IDLE is
enabled. Commit
|
||
Linus Torvalds
|
6b15d6650c |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
1) Fix negative error code usage in ATM layer, from Stefan Hajnoczi.
2) If CONFIG_SYSCTL is disabled, the default TTL is not initialized properly. From Ezequiel Garcia.
3) Missing spinlock init in mvneta driver, from Gregory CLEMENT.
4) Missing unlocks in hwmb error paths, also from Gregory CLEMENT.
5) Fix deadlock on team->lock when propagating features, from Ivan Vecera.
6) Work around buffer offset hw bug in alx chips, from Feng Tang.
7) Fix double listing of SCTP entries in sctp_diag dumps, from Xin Long.
8) Various statistics bug fixes in mlx4 from Eric Dumazet.
9) Fix some randconfig build errors wrt fou ipv6 from Arnd Bergmann.
10) All of l2tp was namespace aware, but the ipv6 support code was not doing so. From Shmulik Ladkani.
11) Handle on-stack hrtimers properly in pktgen, from Guenter Roeck.
12) Propagate MAC changes properly through VLAN devices, from Mike Manning.
13) Fix memory leak in bnx2x_init_one(), from Vitaly Kuznetsov.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits)
sfc: Track RPS flow IDs per channel instead of per function
usbnet: smsc95xx: fix link detection for disabled autonegotiation
virtio_net: fix virtnet_open and virtnet_probe competing for try_fill_recv
bnx2x: avoid leaking memory on bnx2x_init_one() failures
fou: fix IPv6 Kconfig options
openvswitch: update checksum in {push,pop}_mpls
sctp: sctp_diag should dump sctp socket type
net: fec: update dirty_tx even if no skb
vlan: Propagate MAC address to VLANs
atm: iphase: off by one in rx_pkt()
atm: firestream: add more reserved strings
vxlan: Accept user specified MTU value when create new vxlan link
net: pktgen: Call destroy_hrtimer_on_stack()
timer: Export destroy_hrtimer_on_stack()
net: l2tp: Make l2tp_ip6 namespace aware
Documentation: ip-sysctl.txt: clarify secure_redirects
sfc: use flow dissector helpers for aRFS
ieee802154: fix logic error in ieee802154_llsec_parse_dev_addr
net: nps_enet: Disable interrupts before napi reschedule
net/lapb: tuse %*ph to dump buffers
... |
||
Guenter Roeck
|
c08376ac97 |
timer: Export destroy_hrtimer_on_stack()
hrtimer_init_on_stack() needs a matching call to destroy_hrtimer_on_stack(), so both need to be exported. Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Linus Torvalds
|
d102a56edb |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro: "Followups to the parallel lookup work:
- update docs
- restore killability of the places that used to take ->i_mutex killably now that we have down_write_killable() merged
- Additionally, it turns out that I missed a prerequisite for security_d_instantiate() stuff - ->getxattr() wasn't the only thing that could be called before dentry is attached to inode; with smack we needed the same treatment applied to ->setxattr() as well"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch ->setxattr() to passing dentry and inode separately
switch xattr_handler->set() to passing dentry and inode separately
restore killability of old mutex_lock_killable(&inode->i_mutex) users
add down_write_killable_nested()
update D/f/directory-locking |
||
Arnd Bergmann
|
287980e49f |
remove lots of IS_ERR_VALUE abuses
Most users of IS_ERR_VALUE() in the kernel are wrong, as they pass an 'int' into a function that takes an 'unsigned long' argument. This happens to work because the type is sign-extended on 64-bit architectures before it gets converted into an unsigned type. However, anything that passes an 'unsigned short' or 'unsigned int' argument into IS_ERR_VALUE() is guaranteed to be broken, as are 8-bit integers and types that are wider than 'unsigned long'.
Andrzej Hajda has already fixed a lot of the worst abusers that were causing actual bugs, but it would be nice to prevent any users that are not passing 'unsigned long' arguments.
This patch changes all users of IS_ERR_VALUE() that I could find on 32-bit ARM randconfig builds and x86 allmodconfig. For the moment, this doesn't change the definition of IS_ERR_VALUE() because there are probably still architecture specific users elsewhere.
Almost all the warnings I got are for files that are better off using 'if (err)' or 'if (err < 0)'. The only legitimate user I could find that we get a warning for is the (32-bit only) freescale fman driver, so I did not remove the IS_ERR_VALUE() there but changed the type to 'unsigned long'. For 9pfs, I just worked around one user whose calling conventions are so obscure that I did not dare change the behavior.
I was using this definition for testing:
    #define IS_ERR_VALUE(x) ((unsigned long*)NULL == (typeof (x)*)NULL && \
        unlikely((unsigned long long)(x) >= (unsigned long long)(typeof(x))-MAX_ERRNO))
which ends up making all 16-bit or wider types work correctly with the most plausible interpretation of what IS_ERR_VALUE() was supposed to return according to its users, but also causes a compile-time warning for any users that do not pass an 'unsigned long' argument.
I suggested this approach earlier this year, but back then we ended up deciding to just fix the users that are obviously broken. After the initial warning that caused me to get involved in the discussion (fs/gfs2/dir.c) showed up again in the mainline kernel, Linus asked me to send the whole thing again.
[ Updated the 9p parts as per Al Viro - Linus ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Andrzej Hajda <a.hajda@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.org/lkml/2016/1/7/363
Link: https://lkml.org/lkml/2016/5/27/486
Acked-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> # For nvmem part
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
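The type problem described in the IS_ERR_VALUE() commit above can be reproduced in userspace. The macro below is a simplified stand-in, not the kernel's definition, and the `check_*` helpers are hypothetical. On a platform where `unsigned long` is wider than `unsigned int` (for example LP64 Linux), a negative `int` is sign-extended and still lands in the error range, while the same bit pattern held in an `unsigned int` is zero-extended and is no longer recognized as an error.

```c
#define MAX_ERRNO 4095

/* Simplified stand-in: a value is an "error" if, viewed as unsigned long,
 * it falls within the top MAX_ERRNO values of the range. */
#define IS_ERR_VALUE(x) ((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

/* int argument: sign extension preserves the error encoding. */
int check_int(int err)
{
    return IS_ERR_VALUE(err);
}

/* unsigned int argument: zero extension loses it when long is wider. */
int check_uint(unsigned int v)
{
    return IS_ERR_VALUE(v);
}

/* unsigned long argument: the type the macro was written for. */
int check_ulong(unsigned long v)
{
    return IS_ERR_VALUE(v);
}
```

With `-22` as the error code, `check_int(-22)` and `check_ulong((unsigned long)-22)` report an error. `check_uint((unsigned int)-22)` does so only where `unsigned int` and `unsigned long` have the same width, which is why the commit message calls such users broken.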
Linus Torvalds
|
5b26fc8824 |
Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
Pull kbuild updates from Michal Marek:
- new option CONFIG_TRIM_UNUSED_KSYMS which does a two-pass build and unexports symbols which are not used in the current config [Nicolas Pitre]
- several kbuild rule cleanups [Masahiro Yamada]
- warning option adjustments for gcov etc [Arnd Bergmann]
- a few more small fixes
* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (31 commits)
kbuild: move -Wunused-const-variable to W=1 warning level
kbuild: fix if_change and friends to consider argument order
kbuild: fix adjust_autoksyms.sh for modules that need only one symbol
kbuild: fix ksym_dep_filter when multiple EXPORT_SYMBOL() on the same line
gcov: disable -Wmaybe-uninitialized warning
gcov: disable tree-loop-im to reduce stack usage
gcov: disable for COMPILE_TEST
Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
Kbuild: change CC_OPTIMIZE_FOR_SIZE definition
kbuild: forbid kernel directory to contain spaces and colons
kbuild: adjust ksym_dep_filter for some cmd_* renames
kbuild: Fix dependencies for final vmlinux link
kbuild: better abstract vmlinux sequential prerequisites
kbuild: fix call to adjust_autoksyms.sh when output directory specified
kbuild: Get rid of KBUILD_STR
kbuild: rename cmd_as_s_S to cmd_cpp_s_S
kbuild: rename cmd_cc_i_c to cmd_cpp_i_c
kbuild: drop redundant "PHONY += FORCE"
kbuild: delete unnecessary "@:"
kbuild: mark help target as PHONY
... |
||
Michal Hocko
|
7ef949d77f |
mm: oom_reaper: remove some bloat
mmput_async is currently used only from the oom_reaper which is defined only for CONFIG_MMU. We can save work_struct in mm_struct for !CONFIG_MMU. [akpm@linux-foundation.org: fix typo, per Minchan] Link: http://lkml.kernel.org/r/20160520061658.GB19172@dhcp22.suse.cz Reported-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Wenwei Tao
|
b00c52dae6 |
cgroup: remove redundant cleanup in css_create
When creating a css fails, we remove the css id and exit the percpu_ref before calling css_free_rcu_fn, but both steps are done again in css_free_work_fn, so they are redundant. The css id is the bigger problem: removing it twice is unsafe, since it may have been assigned to another css after the first removal.
tj: This was broken by two commits updating the free path without synchronizing the creation failure path. This can be easily triggered by trying to create more than 64k memory cgroups.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Fixes: |
||
Al Viro
|
887bddfa90 |
add down_write_killable_nested()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Linus Torvalds
|
f89eae4ee7 |
Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar: "Two fixes: one for a lost wakeup, the other to fix the compiler optimizing out preempt operations on ARM64 (and possibly other non-x86 architectures)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix remote wakeups
sched/preempt: Fix preempt_count manipulations |
||
Linus Torvalds
|
bdc6b758e4 |
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar: "Mostly tooling and PMU driver fixes, but also a number of late updates such as the reworking of the call-chain size limiting logic to make call-graph recording more robust, plus tooling side changes for the new 'backwards ring-buffer' extension to the perf ring-buffer"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
perf record: Read from backward ring buffer
perf record: Rename variable to make code clear
perf record: Prevent reading invalid data in record__mmap_read
perf evlist: Add API to pause/resume
perf trace: Use the ptr->name beautifier as default for "filename" args
perf trace: Use the fd->name beautifier as default for "fd" args
perf report: Add srcline_from/to branch sort keys
perf evsel: Record fd into perf_mmap
perf evsel: Add overwrite attribute and check write_backward
perf tools: Set buildid dir under symfs when --symfs is provided
perf trace: Only auto set call-graph to "dwarf" when syscalls are being traced
perf annotate: Sort list of recognised instructions
perf annotate: Fix identification of ARM blt and bls instructions
perf tools: Fix usage of max_stack sysctl
perf callchain: Stop validating callchains by the max_stack sysctl
perf trace: Fix exit_group() formatting
perf top: Use machine->kptr_restrict_warned
perf trace: Warn when trying to resolve kernel addresses with kptr_restrict=1
perf machine: Do not bail out if not managing to read ref reloc symbol
perf/x86/intel/p4: Trival indentation fix, remove space
... |
||
Linus Torvalds
|
877c057d2b |
More power management updates for v4.7-rc1
- Stable-candidate cpuidle fix to make it check the right variable when deciding whether or not to enable interrupts on the local CPU so as to avoid enabling interrupts too early in some cases if the system has both coupled and per-core idle states (Daniel Lezcano).
- Stable-candidate PM core fix to make it handle failures at the "late suspend" stage of device suspend consistently for all devices regardless of whether or not async suspend/resume is enabled for them (Rafael Wysocki).
- Cleanups in the cpufreq core, the schedutil governor and the intel_pstate driver (Rafael Wysocki, Pankaj Gupta, Viresh Kumar).
Merge tag 'pm-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki: "These are two stable-candidate fixes (PM core, cpuidle) and a bunch of cpufreq cleanups. Specifics:
- Stable-candidate cpuidle fix to make it check the right variable when deciding whether or not to enable interrupts on the local CPU so as to avoid enabling interrupts too early in some cases if the system has both coupled and per-core idle states (Daniel Lezcano).
- Stable-candidate PM core fix to make it handle failures at the "late suspend" stage of device suspend consistently for all devices regardless of whether or not async suspend/resume is enabled for them (Rafael Wysocki).
- Cleanups in the cpufreq core, the schedutil governor and the intel_pstate driver (Rafael Wysocki, Pankaj Gupta, Viresh Kumar)"
* tag 'pm-4.7-rc1-more' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / sleep: Handle failures in device_suspend_late() consistently
cpufreq: schedutil: Improve prints messages with pr_fmt
cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
cpufreq: simplified goto out in cpufreq_register_driver()
cpufreq: governor: CPUFREQ_GOV_STOP never fails
cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails
intel_pstate: Simplify conditional in intel_pstate_set_policy() |
||
Rafael J. Wysocki
|
4c2628cd75 |
Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'pm-core'
* pm-cpufreq:
cpufreq: schedutil: Improve prints messages with pr_fmt
cpufreq: simplified goto out in cpufreq_register_driver()
cpufreq: governor: CPUFREQ_GOV_STOP never fails
cpufreq: governor: CPUFREQ_GOV_POLICY_EXIT never fails
intel_pstate: Simplify conditional in intel_pstate_set_policy()
* pm-cpuidle:
cpuidle: Fix cpuidle_state_is_coupled() argument in cpuidle_enter()
* pm-core:
PM / sleep: Handle failures in device_suspend_late() consistently |
||
Peter Zijlstra
|
b7e7ade34e |
sched/core: Fix remote wakeups
Commit: |
||
Linus Torvalds
|
0e01df100b |
Fix a number of bugs, most notably a potential stale data exposure
after a crash and a potential BUG_ON crash if a file has the data journalling flag enabled while it has dirty delayed allocation blocks that haven't been written yet. Also fix a potential crash in the new project quota code and a maliciously corrupted file system.
In addition, fix some DAX-specific bugs, including when there is a transient ENOSPC situation and races between writes via direct I/O and an mmap'ed segment that could lead to lost I/O.
Finally the usual set of miscellaneous cleanups.
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o: "Fix a number of bugs, most notably a potential stale data exposure after a crash and a potential BUG_ON crash if a file has the data journalling flag enabled while it has dirty delayed allocation blocks that haven't been written yet. Also fix a potential crash in the new project quota code and a maliciously corrupted file system. In addition, fix some DAX-specific bugs, including when there is a transient ENOSPC situation and races between writes via direct I/O and an mmap'ed segment that could lead to lost I/O. Finally the usual set of miscellaneous cleanups"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
ext4: pre-zero allocated blocks for DAX IO
ext4: refactor direct IO code
ext4: fix race in transient ENOSPC detection
ext4: handle transient ENOSPC properly for DAX
dax: call get_blocks() with create == 1 for write faults to unwritten extents
ext4: remove unmeetable inconsisteny check from ext4_find_extent()
jbd2: remove excess descriptions for handle_s
ext4: remove unnecessary bio get/put
ext4: silence UBSAN in ext4_mb_init()
ext4: address UBSAN warning in mb_find_order_for_block()
ext4: fix oops on corrupted filesystem
ext4: fix check of dqget() return value in ext4_ioctl_setproject()
ext4: clean up error handling when orphan list is corrupted
ext4: fix hang when processing corrupted orphaned inode list
ext4: remove trailing \n from ext4_warning/ext4_error calls
ext4: fix races between changing inode journal mode and ext4_writepages
ext4: handle unwritten or delalloc buffers before enabling data journaling
ext4: fix jbd2 handle extension in ext4_ext_truncate_extend_restart()
ext4: do not ask jbd2 to write data for delalloc buffers
jbd2: add support for avoiding data writes during transaction commits
... |
||
Matt Redfearn
|
59fa586020 |
genirq: Fix missing return value in irq_destroy_ipi()
Commit |
||
Michal Hocko
|
598fdc1d66 |
uprobes: wait for mmap_sem for write killable
xol_add_vma needs mmap_sem for write. If the waiting task gets killed by the oom killer it would block oom_reaper from asynchronous address space reclaim and reduce the chances of timely OOM resolving. Wait for the lock in the killable mode and return with EINTR if the task got killed while waiting. Do not warn in dup_xol_work if __create_xol_area failed due to fatal signal pending because this is usually considered a kernel issue. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Michal Hocko
|
17b0573d77 |
prctl: make PR_SET_THP_DISABLE wait for mmap_sem killable
PR_SET_THP_DISABLE requires mmap_sem for write. If the waiting task gets killed by the oom killer it would block oom_reaper from asynchronous address space reclaim and reduce the chances of timely OOM resolving. Wait for the lock in the killable mode and return with EINTR if the task got killed while waiting. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Alex Thorlton <athorlton@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Michal Hocko
|
7c05126793 |
mm, fork: make dup_mmap wait for mmap_sem for write killable
dup_mmap needs to lock current's mm mmap_sem for write. If the waiting task gets killed by the oom killer it would block oom_reaper from asynchronous address space reclaim and reduce the chances of timely OOM resolving. Wait for the lock in the killable mode and return with EINTR if the task got killed while waiting. Signed-off-by: Michal Hocko <mhocko@suse.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
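The three mmap_sem changes above share one calling pattern: take the write lock in killable mode and return the error instead of waiting unkillably. The following is a minimal userspace sketch of that pattern, not kernel code. `model_rwsem`, `model_task`, `model_down_write_killable()`, `model_up_write()` and `model_update_mm()` are hypothetical stand-ins, and the model takes a single non-blocking step where the real lock would sleep.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's rw_semaphore and task_struct. */
struct model_rwsem { bool write_locked; };
struct model_task { bool fatal_signal_pending; };

/*
 * One-step model of a killable write lock: returns 0 when the lock is
 * taken and -EINTR when the lock is contended and the waiter was killed.
 * A real waiter would sleep on contention; this model reports -EAGAIN
 * instead so that it never blocks (an assumption of the sketch only).
 */
static int model_down_write_killable(struct model_rwsem *sem,
                                     const struct model_task *tsk)
{
    if (!sem->write_locked) {
        sem->write_locked = true;
        return 0;
    }
    return tsk->fatal_signal_pending ? -EINTR : -EAGAIN;
}

static void model_up_write(struct model_rwsem *sem)
{
    sem->write_locked = false;
}

/* Caller pattern: propagate the error rather than wait for the lock. */
static int model_update_mm(struct model_rwsem *mmap_sem,
                           const struct model_task *tsk, int *mm_flags)
{
    int ret = model_down_write_killable(mmap_sem, tsk);

    if (ret)
        return ret;
    *mm_flags |= 1;            /* the work that needs the write lock */
    model_up_write(mmap_sem);
    return 0;
}
```

A killed waiter leaves without touching the protected state, which is what lets the oom_reaper make progress in the commits above.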
||
Xunlei Pang
|
7a0058ec78 |
s390/kexec: consolidate crash_map/unmap_reserved_pages() and arch_kexec_protect(unprotect)_crashkres()
Commit 3f625002581b ("kexec: introduce a protection mechanism for the crashkernel reserved memory") provides a mechanism for protecting the crash kernel reserved memory that is similar to the earlier crash_map/unmap_reserved_pages() implementation, but the new one is more generic in name and cleaner in code (besides, some architectures may not be allowed to unmap the pgtable). Therefore, this patch consolidates them and uses the new arch_kexec_protect(unprotect)_crashkres() to replace the former crash_map/unmap_reserved_pages(), which by now is only used by S390. The consolidation work needs the crash memory to be mapped initially; this is done in machine_kdump_pm_init(), which runs after reserve_crashkernel(). Once the kdump kernel is loaded, the new arch_kexec_protect_crashkres() implemented for S390 will actually unmap the pgtable like before. Signed-off-by: Xunlei Pang <xlpang@redhat.com> Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Minfei Huang <mhuang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Dave Young <dyoung@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Minfei Huang
|
0eea08678e |
kexec: do a cleanup for function kexec_load
There is a lot of work to be done in function kexec_load, not only allocating structs and loading the initram, but also some miscellaneous work. To make it clearer, wrap a new function do_kexec_load which is used to allocate structs and load the initram. The pre-work will be done in kexec_load. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Minfei Huang
|
917a35605f |
kexec: make a pair of map/unmap reserved pages in error path
For some architectures, kexec must map the reserved pages and then use them when we try to start the kdump service. kexec may return directly, without unmapping the reserved pages, if it fails while starting the service. To fix it, we pair the map/unmap of the reserved pages in both the generic path and the error path. This patch only affects s390. Other architectures don't implement the interface of crash_unmap_reserved_pages and crash_map_reserved_pages. It isn't an urgent patch. The kernel can work well without any risk, although the reserved pages are not unmapped before returning in the error path. Signed-off-by: Minfei Huang <mnfhuang@gmail.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Xunlei Pang <xlpang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Xunlei Pang
|
9b492cf580 |
kexec: introduce a protection mechanism for the crashkernel reserved memory
If some kernel (module) path stamps on the crash reserved memory (already mapped by the kernel) into which the second kernel's data has been loaded, the kdump kernel will probably fail to boot when a panic happens (or even when it does not), leaving the culprit at large; this is unacceptable. The patch introduces a mechanism for detecting such cases: 1) After each crash kexec load, it simply marks the reserved memory regions read-only, since we no longer access them after that. When someone stamps on the region, the first kernel will panic and trigger the kdump. The weak arch_kexec_protect_crashkres() is introduced to do the actual protection. 2) To allow multiple loads, once 1) is done we also need to re-mark the reserved memory read-write each time a system call related to kdump is made. The weak arch_kexec_unprotect_crashkres() is introduced to undo the protection. An architecture can provide its specific implementation by overriding arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(). Signed-off-by: Xunlei Pang <xlpang@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: Dave Young <dyoung@redhat.com> Cc: Minfei Huang <mhuang@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
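The protect-after-load, unprotect-before-reload cycle described above can be illustrated in userspace with mmap() and mprotect(). This is only a sketch under stated assumptions: `struct crash_region` and the `crash_region_*()` helpers are hypothetical names that stand in for the reserved region and for the roles of arch_kexec_protect_crashkres() and arch_kexec_unprotect_crashkres(); none of this is the kernel implementation.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical userspace stand-in for the crashkernel reserved region. */
struct crash_region {
    void *base;
    size_t len;
    int writable;
};

/* Reserve the region; it starts out writable, as before the first load. */
static int crash_region_init(struct crash_region *r, size_t len)
{
    r->base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (r->base == MAP_FAILED)
        return -1;
    r->len = len;
    r->writable = 1;
    return 0;
}

/* Role of the protect hook: a stray write now faults instead of corrupting. */
static int crash_region_protect(struct crash_region *r)
{
    if (mprotect(r->base, r->len, PROT_READ))
        return -1;
    r->writable = 0;
    return 0;
}

/* Role of the unprotect hook: reopen the region for the next load. */
static int crash_region_unprotect(struct crash_region *r)
{
    if (mprotect(r->base, r->len, PROT_READ | PROT_WRITE))
        return -1;
    r->writable = 1;
    return 0;
}

/* Each load unprotects, copies the image, then protects again. */
static int crash_region_load(struct crash_region *r, const void *image,
                             size_t n)
{
    if (n > r->len || crash_region_unprotect(r))
        return -1;
    memcpy(r->base, image, n);
    return crash_region_protect(r);
}
```

Between loads the region stays readable but not writable, so in this model a stray store raises a fault at the point of corruption rather than silently damaging the loaded image.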
||
Andi Kleen
|
725fc629ff |
kernek/fork.c: allocate idle task for a CPU always on its local node
Linux preallocates the task structs of the idle tasks for all possible CPUs. This currently means they all end up on node 0. This also implies that the cache lines used for MWAIT, which are around the flags field in the task struct, are all located in node 0. We see a noticeable performance improvement on Knights Landing CPUs when the cache lines used for MWAIT are located in the local nodes of the CPUs using them. I would expect this to give a (likely slight) improvement on other systems too. The patch implements placing the idle task in the node of its CPU, by passing the right target node to copy_process(). [akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1] Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Wang Xiaoqiang
|
747800efbe |
kernel/signal.c: convert printk(KERN_<LEVEL> ...) to pr_<level>(...)
Use pr_<level> instead of printk(KERN_<LEVEL> ). Signed-off-by: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
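To illustrate why the conversion above does not change the emitted message, here is a small userspace model. It is a sketch under assumptions: `printk()` here is a hypothetical stand-in that formats into `model_log` rather than the kernel log, and the `pr_err()`/`pr_info()` macros are simplified (the kernel versions also apply `pr_fmt()`, which is omitted).

```c
#include <stdarg.h>
#include <stdio.h>

/* Log-level prefixes: a start-of-header byte followed by the level digit. */
#define KERN_SOH  "\001"
#define KERN_ERR  KERN_SOH "3"
#define KERN_INFO KERN_SOH "6"

/* Hypothetical log sink; the last formatted message lands here. */
static char model_log[256];

/* Userspace stand-in for printk(): format into model_log. */
static int printk(const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(model_log, sizeof(model_log), fmt, ap);
    va_end(ap);
    return n;
}

/* Simplified pr_<level>() helpers: paste the level prefix onto the format. */
#define pr_err(fmt, ...)  printk(KERN_ERR fmt, ##__VA_ARGS__)
#define pr_info(fmt, ...) printk(KERN_INFO fmt, ##__VA_ARGS__)
```

In this model `pr_err("bad signal %d\n", sig)` and `printk(KERN_ERR "bad signal %d\n", sig)` expand to the same call, so the conversion is purely a readability change.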
||
Oleg Nesterov
|
91c4e8ea8f |
wait: allow sys_waitid() to accept __WNOTHREAD/__WCLONE/__WALL
I see no reason why waitid() can't support other linux-specific flags allowed in sys_wait4(). In particular this change can help if we reconsider the previous change ("wait/ptrace: assume __WALL if the child is traced") which adds the "automagical" __WALL for debugger. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: <syzkaller@googlegroups.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
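From user space, the change above means that `__WALL` can be passed to waitid() directly. The sketch below assumes a kernel that includes this change (older kernels are expected to reject the flag) and a C library that exposes `__WALL` through `<sys/wait.h>`; `reap_child_with_wall()` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that exits with `code`, then reap it with waitid() using
 * WEXITED | __WALL. Returns the child's exit status, or -1 on any failure.
 */
static int reap_child_with_wall(int code)
{
    siginfo_t info = {0};
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);                       /* child: exit immediately */

    if (waitid(P_PID, (id_t)pid, &info, WEXITED | __WALL) < 0)
        return -1;                         /* e.g. flag not accepted */
    if (info.si_code != CLD_EXITED)
        return -1;
    return info.si_status;
}
```

An init-like process that already reaps with waitid() could add `__WALL` this way instead of switching to wait4().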
||
Oleg Nesterov
|
bf959931dd |
wait/ptrace: assume __WALL if the child is traced
The following program (a simplified version of one generated by syzkaller)

    #include <pthread.h>
    #include <unistd.h>
    #include <sys/ptrace.h>
    #include <stdio.h>
    #include <signal.h>

    void *thread_func(void *arg)
    {
        ptrace(PTRACE_TRACEME, 0,0,0);
        return 0;
    }

    int main(void)
    {
        pthread_t thread;

        if (fork())
            return 0;

        while (getppid() != 1)
            ;

        pthread_create(&thread, NULL, thread_func, NULL);
        pthread_join(thread, NULL);
        return 0;
    }

creates an unreapable zombie if /sbin/init doesn't use __WALL.
This is not a kernel bug, at least in a sense that everything works as expected: debugger should reap a traced sub-thread before it can reap the leader, but without __WALL/__WCLONE do_wait() ignores sub-threads.
Unfortunately, it seems that /sbin/init in most (all?) distributions doesn't use it and we have to change the kernel to avoid the problem. Note also that most init's use sys_waitid() which doesn't allow __WALL, so the necessary user-space fix is not that trivial.
This patch just adds the "ptrace" check into eligible_child(). To some degree this matches the "tsk->ptrace" in exit_notify(), ->exit_signal is mostly ignored when the tracee reports to debugger. Or WSTOPPED, the tracer doesn't need to set this flag to wait for the stopped tracee.
This obviously means the user-visible change: __WCLONE and __WALL no longer have any meaning for debugger. And I can only hope that this won't break something, but at least strace/gdb won't suffer.
We could make a more conservative change. Say, we can take __WCLONE into account, or !thread_group_leader(). But it would be nice to not complicate these historical/confusing checks.
Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Jan Kratochvil <jan.kratochvil@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Pedro Alves <palves@redhat.com> Cc: Roland McGrath <roland@hack.frob.com> Cc: <syzkaller@googlegroups.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Ralf Baechle
|
f43edca7ed |
ELF/MIPS build fix
CONFIG_MIPS32_N32=y but CONFIG_BINFMT_ELF disabled results in the following linker errors: arch/mips/built-in.o: In function `elf_core_dump': binfmt_elfn32.c:(.text+0x23dbc): undefined reference to `elf_core_extra_phdrs' binfmt_elfn32.c:(.text+0x246e4): undefined reference to `elf_core_extra_data_size' binfmt_elfn32.c:(.text+0x248d0): undefined reference to `elf_core_write_extra_phdrs' binfmt_elfn32.c:(.text+0x24ac4): undefined reference to `elf_core_write_extra_data' CONFIG_MIPS32_O32=y but CONFIG_BINFMT_ELF disabled results in the following linker errors: arch/mips/built-in.o: In function `elf_core_dump': binfmt_elfo32.c:(.text+0x28a04): undefined reference to `elf_core_extra_phdrs' binfmt_elfo32.c:(.text+0x29330): undefined reference to `elf_core_extra_data_size' binfmt_elfo32.c:(.text+0x2951c): undefined reference to `elf_core_write_extra_phdrs' binfmt_elfo32.c:(.text+0x29710): undefined reference to `elf_core_write_extra_data' This is because binfmt_elfn32 and binfmt_elfo32 are using symbols from elfcore but for these configurations elfcore will not be built. Fixed by making elfcore selectable by a separate config symbol which unlike the current mechanism can also be used from other directories than kernel/, then having each flavor of ELF that relies on elfcore.o, select it in Kconfig, including CONFIG_MIPS32_N32 and CONFIG_MIPS32_O32 which fixes this issue. Link: http://lkml.kernel.org/r/20160520141705.GA1913@linux-mips.org Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Reviewed-by: James Hogan <james.hogan@imgtec.com> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Daniel Borkmann
|
612bacad78 |
bpf, inode: disallow userns mounts
Follow-up to commit |
||
Linus Torvalds
|
7639dad93a |
Three more changes.
1) I forgot that I had another selftest to stress test the ftrace instance creation. It was actually supposed to go into the 4.6 merge window, but I never committed it. I almost forgot about it again, but noticed it was missing from your tree.
2) Soumya PN sent me a cleanup patch to not disable interrupts when taking the tasklist_lock for read, as it's unnecessary because that lock is never taken for write in irq context.
3) Newer gcc's can cause the jump in the function_graph code to the global ftrace_stub label to be a short jump instead of a long one. As that jump is dynamically converted to jump to the trace code to do function graph tracing, and that conversion expects a long jump, it can corrupt the ftrace_stub itself (it's directly after that call). One way to prevent gcc from using a short jump is to declare the ftrace_stub as a weak function, which we do here to keep gcc from optimizing too much.
Merge tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull more tracing updates from Steven Rostedt: "Three more changes.
- I forgot that I had another selftest to stress test the ftrace instance creation. It was actually supposed to go into the 4.6 merge window, but I never committed it. I almost forgot about it again, but noticed it was missing from your tree.
- Soumya PN sent me a cleanup patch to not disable interrupts when taking the tasklist_lock for read, as it's unnecessary because that lock is never taken for write in irq context.
- Newer gcc's can cause the jump in the function_graph code to the global ftrace_stub label to be a short jump instead of a long one. As that jump is dynamically converted to jump to the trace code to do function graph tracing, and that conversion expects a long jump, it can corrupt the ftrace_stub itself (it's directly after that call). One way to prevent gcc from using a short jump is to declare the ftrace_stub as a weak function, which we do here to keep gcc from optimizing too much"
* tag 'trace-v4.7-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it ftrace: Don't disable irqs when taking the tasklist_lock read_lock ftracetest: Add instance created, delete, read and enable event test |
||
Linus Torvalds
|
bd28b14591 |
x86: remove more uaccess_32.h complexity
I'm looking at trying to possibly merge the 32-bit and 64-bit versions of the x86 uaccess.h implementation, but first this needs to be cleaned up. For example, the 32-bit version of "__copy_from_user_inatomic()" is mostly the special cases for the constant size, and it's actually almost never relevant. Most users aren't actually using a constant size anyway, and the few cases that do small constant copies are better off just using __get_user() instead. So get rid of the unnecessary complexity. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
5469dc270c |
Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton: - the rest of MM - KASAN updates - procfs updates - exit, fork updates - printk updates - lib/ updates - radix-tree testsuite updates - checkpatch updates - kprobes updates - a few other misc bits * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (162 commits) samples/kprobes: print out the symbol name for the hooks samples/kprobes: add a new module parameter kprobes: add the "tls" argument for j_do_fork init/main.c: simplify initcall_blacklisted() fs/efs/super.c: fix return value checkpatch: improve --git <commit-count> shortcut checkpatch: reduce number of `git log` calls with --git checkpatch: add support to check already applied git commits checkpatch: add --list-types to show message types to show or ignore checkpatch: advertise the --fix and --fix-inplace options more checkpatch: whine about ACCESS_ONCE checkpatch: add test for keywords not starting on tabstops checkpatch: improve CONSTANT_COMPARISON test for structure members checkpatch: add PREFER_IS_ENABLED test lib/GCD.c: use binary GCD algorithm instead of Euclidean radix-tree: free up the bottom bit of exceptional entries for reuse dax: move RADIX_DAX_ definitions to dax.c radix-tree: make radix_tree_descend() more useful radix-tree: introduce radix_tree_replace_clear_tags() radix-tree: tidy up __radix_tree_create() ... |
||
Linus Torvalds
|
087afe8aaf |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes and more updates from David Miller:
1) Tunneling fixes from Tom Herbert and Alexander Duyck.
2) AF_UNIX updates some struct sock bit fields without the socket lock, whereas setsockopt() sets overlapping ones with locking. Separate out the synchronized vs. the AF_UNIX unsynchronized ones to avoid corruption. From Andrey Ryabinin.
3) Mount BPF filesystem with mount_nodev rather than mount_ns, from Eric Biederman.
4) A couple kmemdup conversions, from Muhammad Falak R Wani.
5) BPF verifier fixes from Alexei Starovoitov.
6) Don't let tunneled UDP packets get stuck in socket queues, if something goes wrong during the encapsulation just drop the packet rather than signalling an error up the call stack. From Hannes Frederic Sowa.
7) SKB ref after free in batman-adv, from Florian Westphal.
8) TCP iSCSI, ocfs2, rds, and tipc have to disable BH in their TCP callbacks since the TCP stack runs pre-emptibly now. From Eric Dumazet.
9) Fix crash in fixed_phy_add, from Rabin Vincent.
10) Fix length checks in xen-netback, from Paul Durrant.
11) Fix mixup in KEY vs KEYID macsec attributes, from Sabrina Dubroca.
12) RDS connection spamming bug fixes from Sowmini Varadhan * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (152 commits) net: suppress warnings on dev_alloc_skb uapi glibc compat: fix compilation when !__USE_MISC in glibc udp: prevent skbs lingering in tunnel socket queues bpf: teach verifier to recognize imm += ptr pattern bpf: support decreasing order in direct packet access net: usb: ch9200: use kmemdup ps3_gelic: use kmemdup net:liquidio: use kmemdup bpf: Use mount_nodev not mount_ns to mount the bpf filesystem net: cdc_ncm: update datagram size after changing mtu tuntap: correctly wake up process during uninit intel: Add support for IPv6 IP-in-IP offload ip6_gre: Do not allow segmentation offloads GRE_CSUM is enabled with FOU/GUE RDS: TCP: Avoid rds connection churn from rogue SYNs RDS: TCP: rds_tcp_accept_worker() must exit gracefully when terminating rds-tcp net: sock: move ->sk_shutdown out of bitfields. ipv6: Don't reset inner headers in ip6_tnl_xmit ip4ip6: Support for GSO/GRO ip6ip6: Support for GSO/GRO ipv6: Set features for IPv6 tunnels ... |
||
Matthew Wilcox
|
e9256efcc8 |
radix-tree: introduce radix_tree_empty
Commit
|
||
Andy Shevchenko
|
ede9c27749 |
kernel/sysctl_binary.c: use generic UUID library
UUID library provides uuid_be type and uuid_be_to_bin() function. This substitutes open coded variant by generic library calls. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com> Cc: Mimi Zohar <zohar@linux.vnet.ibm.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Petr Mladek
|
cf9b1106c8 |
printk/nmi: flush NMI messages on the system panic
In NMI context, printk() messages are stored into per-CPU buffers to avoid a possible deadlock. They are normally flushed to the main ring buffer via an IRQ work. But the work is never called when the system calls panic() in the very same NMI handler. This patch tries to flush NMI buffers before the crash dump is generated. In this case it does not risk a double release and bails out when the logbuf_lock is already taken. The aim is to get the messages into the main ring buffer when possible. It makes them more easily accessible in the vmcore. Then the patch tries to flush the buffers a second time when other CPUs are down. It might be more aggressive and reset logbuf_lock. The aim is to make the messages available for the subsequent kmsg_dump() and console_flush_on_panic() calls. The patch causes vprintk_emit() to be called even in NMI context again. But it is done via printk_deferred() so that the console handling is skipped. Consoles use internal locks and we could not prevent a deadlock easily. They are explicitly called later when the crash dump is not generated, see console_flush_on_panic(). Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.cz> Cc: Jiri Kosina <jkosina@suse.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Petr Mladek
|
427934b871 |
printk/nmi: increase the size of NMI buffer and make it configurable
Testing has shown that the backtrace sometimes does not fit into the 4kB
temporary buffer that is used in NMI context. The warnings are gone
when I double the temporary buffer size.
This patch doubles the buffer size and makes it configurable.
Note that this problem existed even in the x86-specific implementation
that was added by the commit
|
||
Petr Mladek
|
b522deabc6 |
printk/nmi: warn when some message has been lost in NMI context
We could not resize the temporary buffer in NMI context. Let's warn if a message is lost. This is rather theoretical. printk() should not be used in NMI. The only sensible use is when we want to print backtrace from all CPUs. The current buffer should be enough for this purpose. [akpm@linux-foundation.org: whitespace fixlet] Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Jan Kara <jack@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: Daniel Thompson <daniel.thompson@linaro.org> Cc: Jiri Kosina <jkosina@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: David Miller <davem@davemloft.net> Cc: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Petr Mladek
|
42a0bb3f71 |
printk/nmi: generic solution for safe printk in NMI
printk() takes some locks and cannot be used safely in NMI
context.
The chance of a deadlock is real especially when printing stacks from
all CPUs. This particular problem has been addressed on x86 by the
commit
|
||
Jiri Slaby
|
0740aa5f63 |
fork: free thread in copy_process on failure
When using this program (as root):

    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/io.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    #define ITER 1000
    #define FORKERS 15
    #define THREADS (6000/FORKERS) // 1850 is proc max

    static void fork_100_wait()
    {
        unsigned a, to_wait = 0;

        printf("\t%d forking %d\n", THREADS, getpid());

        for (a = 0; a < THREADS; a++) {
            switch (fork()) {
            case 0:
                usleep(1000);
                exit(0);
                break;
            case -1:
                break;
            default:
                to_wait++;
                break;
            }
        }

        printf("\t%d forked from %d, waiting for %d\n", THREADS, getpid(), to_wait);

        for (a = 0; a < to_wait; a++)
            wait(NULL);

        printf("\t%d waited from %d\n", THREADS, getpid());
    }

    static void run_forkers()
    {
        pid_t forkers[FORKERS];
        unsigned a;

        for (a = 0; a < FORKERS; a++) {
            switch ((forkers[a] = fork())) {
            case 0:
                fork_100_wait();
                exit(0);
                break;
            case -1:
                err(1, "DIE fork of %d'th forker", a);
                break;
            default:
                break;
            }
        }

        for (a = 0; a < FORKERS; a++)
            waitpid(forkers[a], NULL, 0);
    }

    int main()
    {
        unsigned a;
        int ret;

        ret = ioperm(10, 20, 0);
        if (ret < 0)
            err(1, "ioperm");

        for (a = 0; a < ITER; a++)
            run_forkers();

        return 0;
    }

kmemleak reports many occurrences of this leak:

    unreferenced object 0xffff8805917c8000 (size 8192):
      comm "fork-leak", pid 2932, jiffies 4295354292 (age 1871.028s)
      hex dump (first 32 bytes):
        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
        ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff  ................
      backtrace:
        [<ffffffff814cfbf5>] kmemdup+0x25/0x50
        [<ffffffff8103ab43>] copy_thread_tls+0x6c3/0x9a0
        [<ffffffff81150174>] copy_process+0x1a84/0x5790
        [<ffffffff811dc375>] wake_up_new_task+0x2d5/0x6f0
        [<ffffffff8115411d>] _do_fork+0x12d/0x820
        ...

This is due to the leakage of the memory items which should have been freed in arch/x86/kernel/process.c:exit_thread().
Make sure the memory is freed when fork fails later in copy_process. This is done by calling exit_thread with the thread to kill.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: "David S. 
Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Miao <realmz6@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> 
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Jiri Slaby
|
e64646946e |
exit_thread: accept a task parameter to be exited
We need to call exit_thread from copy_process in a fail path. So make it accept task_struct as a parameter. [v2] * s390: exit_thread_runtime_instr doesn't make sense to be called for non-current tasks. * arm: fix the comment in vfp_thread_copy * change 'me' to 'tsk' for task_struct * now we can change only archs that actually have exit_thread [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: "David S. Miller" <davem@davemloft.net> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> Cc: Aurelien Jacquiot <a-jacquiot@ti.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chen Liqin <liqin.linux@gmail.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Howells <dhowells@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jesper Nilsson <jesper.nilsson@axis.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Jonas Bonn <jonas@southpole.se> Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> Cc: Lennox Wu <lennox.wu@gmail.com> Cc: Ley Foon Tan <lftan@altera.com> Cc: Mark Salter <msalter@redhat.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mikael Starvik <starvik@axis.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Rich Felker <dalias@libc.org> Cc: 
Richard Henderson <rth@twiddle.net> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Richard Weinberger <richard@nod.at> Cc: Russell King <linux@arm.linux.org.uk> Cc: Steven Miao <realmz6@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Michal Hocko
|
ec8d7c14ea |
mm, oom_reaper: do not mmput synchronously from the oom reaper context
Tetsuo has properly noted that the mmput slow path might get blocked waiting for another party (e.g. exit_aio waits for an IO). If that happens, the oom_reaper would be put out of the way and would not be able to process the next oom victim. We should strive to make this context as reliable and as independent of other subsystems as possible. Introduce mmput_async, which will perform the slow path from an async (WQ) context. This will delay the operation, but that shouldn't be a problem because the oom_reaper has in most cases reclaimed as much of the victim's address space as possible and the remaining context shouldn't bind too much memory anymore. The only exception is when mmap_sem trylock has failed, which shouldn't happen too often. The issue is only theoretical but not impossible. Signed-off-by: Michal Hocko <mhocko@suse.com> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Alexei Starovoitov
|
1b9b69ecb3 |
bpf: teach verifier to recognize imm += ptr pattern
Humans don't write C code like:
u8 *ptr = skb->data;
int imm = 4;
imm += ptr;
but from llvm backend point of view 'imm' and 'ptr' are registers and
imm += ptr may be preferred vs ptr += imm depending which register value
will be used further in the code, while verifier can only recognize ptr += imm.
That caused small unrelated changes in the C code of the bpf program to
trigger rejection by the verifier. Therefore teach the verifier to recognize
both ptr += imm and imm += ptr.
For example:
when R6=pkt(id=0,off=0,r=62) R7=imm22
after r7 += r6 instruction
will be R6=pkt(id=0,off=0,r=62) R7=pkt(id=0,off=22,r=62)
Fixes:
|
||
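The register-state example in the message above can be sketched as a tiny user-space model. This is only an illustration under stated assumptions: `struct reg_state`, `enum reg_kind` and `sketch_add()` are hypothetical names invented here and are not the verifier's data structures; the point is only that the addition is handled symmetrically, so `imm += ptr` yields the same packet-pointer state as `ptr += imm`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified register state (not the kernel's). */
enum reg_kind { REG_IMM, REG_PKT };

struct reg_state {
	enum reg_kind kind;
	int id;    /* packet pointer id; unused for immediates */
	int off;   /* offset from packet start, or the immediate value */
	int range; /* number of bytes already proven safe to access */
};

/* Model "dst += src"; returns false when the combination is not recognised. */
static bool sketch_add(struct reg_state *dst, const struct reg_state *src)
{
	if (dst->kind == REG_PKT && src->kind == REG_IMM) {
		/* ptr += imm: the pointer keeps id and range, offset grows. */
		dst->off += src->off;
		return true;
	}
	if (dst->kind == REG_IMM && src->kind == REG_PKT) {
		/* imm += ptr: dst becomes a packet pointer with the same id
		 * and range as src, offset by the old immediate value. */
		int imm = dst->off;

		*dst = *src;
		dst->off += imm;
		return true;
	}
	if (dst->kind == REG_IMM && src->kind == REG_IMM) {
		dst->off += src->off;
		return true;
	}
	return false; /* pointer + pointer is not modelled here */
}
```

With R6=pkt(id=0,off=0,r=62) and R7=imm22, calling `sketch_add(&r7, &r6)` leaves R6 untouched and turns R7 into pkt(id=0,off=22,r=62), matching the example in the message.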
Alexei Starovoitov
|
d91b28ed42 |
bpf: support decreasing order in direct packet access
when packet headers are accessed in 'decreasing' order (like TCP port
may be fetched before the program reads IP src) the llvm may generate
the following code:
[...] // R7=pkt(id=0,off=22,r=70)
r2 = *(u32 *)(r7 +0) // good access
[...]
r7 += 40 // R7=pkt(id=0,off=62,r=70)
r8 = *(u32 *)(r7 +0) // good access
[...]
r1 = *(u32 *)(r7 -20) // this one will fail though it's within a safe range
// it's doing *(u32*)(skb->data + 42)
Fix verifier to recognize such code pattern
It also turned out that the 'off > range' condition is not a verifier bug.
It's a buggy program that may do something like:
if (ptr + 50 > data_end)
return 0;
ptr += 60;
*(u32*)ptr;
in such case emit
"invalid access to packet, off=0 size=4, R1(id=0,off=60,r=50)" error message,
so all information is available for the program author to fix the program.
Fixes:
|
||
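The bounds reasoning in the message above can be illustrated with a small stand-alone check. This is a sketch, not the verifier's code: `sketch_pkt_access_ok()` and its parameters are names made up for this example. It assumes an access is safe when the register's fixed offset plus the instruction's offset, plus the access size, stays within the range already proven against `data_end`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: reg_off is the packet pointer's tracked offset,
 * insn_off the signed offset encoded in the load instruction, size the
 * access width in bytes, and range the number of bytes proven safe.
 */
static bool sketch_pkt_access_ok(int reg_off, int insn_off, int size, int range)
{
	int off = reg_off + insn_off; /* effective offset from skb->data */

	return off >= 0 && size > 0 && off + size <= range;
}
```

For the failing load in the message, `sketch_pkt_access_ok(62, -20, 4, 70)` evaluates the access at offset 42 and accepts it, while the buggy program's access, `sketch_pkt_access_ok(60, 0, 4, 50)`, is rejected because offset 60 lies beyond the checked 50 bytes.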
Eric W. Biederman
|
e27f4a942a |
bpf: Use mount_nodev not mount_ns to mount the bpf filesystem
While reviewing the filesystems that set FS_USERNS_MOUNT I spotted the
bpf filesystem. Looking at the code I saw a broken usage of mount_ns
with current->nsproxy->mnt_ns. As the code does not acquire a
reference to the mount namespace it can not possibly be correct to
store the mount namespace on the superblock as it does.
Replace mount_ns with mount_nodev so that each mount of the bpf
filesystem returns a distinct instance, and the code is not buggy.
In discussion with Hannes Frederic Sowa it was reported that the use
of mount_ns was an attempt to have one bpf instance per mount
namespace, in an attempt to keep resources that pin resources from
hiding. That intent simply does not work, the vfs is not built to
allow that kind of behavior. Which means that the bpf filesystem
really is buggy both semantically and in its implementation as it does
not nor can it implement the original intent.
This change is userspace visible, but my experience with similar
filesystems leads me to believe nothing will break with a model of each
mount of the bpf filesystem is distinct from all others.
Fixes:
|
||
Daniel Borkmann
|
b7552e1bcc |
bpf: rather use get_random_int for randomizations
Start address randomization and blinding in BPF currently use prandom_u32(). prandom_u32() values are not exposed to unprivileged user space to my knowledge, but given other kernel facilities such as ASLR, stack canaries, etc make use of stronger get_random_int(), we better make use of it here as well given blinding requests successively new random values. get_random_int() has minimal entropy pool depletion, is not cryptographically secure, but doesn't need to be for our use cases here. Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> |
||
Soumya PN
|
6112a300c9 |
ftrace: Don't disable irqs when taking the tasklist_lock read_lock
In ftrace.c inside the function alloc_retstack_tasklist() (which will be invoked when function_graph tracing is on) the tasklist_lock is being held as reader while iterating through a list of threads. Here the lock is being held as reader with irqs disabled. The tasklist_lock is never write_locked in interrupt context so it is safe to not disable interrupts for the duration of read_lock in this block, which can be significant given the block of code iterates through all threads. Hence changing the code to call read_lock() and read_unlock() instead of read_lock_irqsave() and read_unlock_irqrestore(). A similar change was made in commits: |
||
Linus Torvalds
|
c04a588029 |
powerpc updates for 4.7
Merge tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: "Highlights: - Support for Power ISA 3.0 (Power9) Radix Tree MMU from Aneesh Kumar K.V
- Live patching support for ppc64le (also merged via livepatching.git) Various cleanups & minor fixes from: - Aaro Koskinen, Alexey Kardashevskiy, Andrew Donnellan, Aneesh Kumar K.V, Chris Smart, Daniel Axtens, Frederic Barrat, Gavin Shan, Ian Munsie, Lennart Sorensen, Madhavan Srinivasan, Mahesh Salgaonkar, Markus Elfring, Michael Ellerman, Oliver O'Halloran, Paul Gortmaker, Paul Mackerras, Rashmica Gupta, Russell Currey, Suraj Jitindar Singh, Thiago Jung Bauermann, Valentin Rothberg, Vipin K Parashar. General: - Update LMB associativity index during DLPAR add/remove from Nathan Fontenot - Fix branching to OOL handlers in relocatable kernel from Hari Bathini - Add support for userspace Power9 copy/paste from Chris Smart - Always use STRICT_MM_TYPECHECKS from Michael Ellerman - Add mask of possible MMU features from Michael Ellerman PCI: - Enable pass through of NVLink to guests from Alexey Kardashevskiy - Cleanups in preparation for powernv PCI hotplug from Gavin Shan - Don't report error in eeh_pe_reset_and_recover() from Gavin Shan - Restore initial state in eeh_pe_reset_and_recover() from Gavin Shan - Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" from Guilherme G Piccoli - Remove the dependency on EEH struct in DDW mechanism from Guilherme G Piccoli selftests: - Test cp_abort during context switch from Chris Smart - Add several tests for transactional memory support from Rashmica Gupta perf: - Add support for sampling interrupt register state from Anju T - Add support for unwinding perf-stackdump from Chandan Kumar cxl: - Configure the PSL for two CAPI ports on POWER8NVL from Philippe Bergheaud - Allow initialization on timebase sync failures from Frederic Barrat - Increase timeout for detection of AFU mmio hang from Frederic Barrat - Handle num_of_processes larger than can fit in the SPA from Ian Munsie - Ensure PSL interrupt is configured for contexts with no AFU IRQs from Ian Munsie - Add kernel API to allow a context to operate with
relocate disabled from Ian Munsie - Check periodically the coherent platform function's state from Christophe Lombard Freescale: - Updates from Scott: "Contains 86xx fixes, minor device tree fixes, an erratum workaround, and a kconfig dependency fix." * tag 'powerpc-4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (192 commits) powerpc/86xx: Fix PCI interrupt map definition powerpc/86xx: Move pci1 definition to the include file powerpc/fsl: Fix build of the dtb embedded kernel images powerpc/fsl: Fix rcpm compatible string powerpc/fsl: Remove FSL_SOC dependency from FSL_LBC powerpc/fsl-pci: Add a workaround for PCI 5 errata powerpc/fsl: Fix SPI compatible on t208xrdb and t1040rdb powerpc/powernv/npu: Add PE to PHB's list powerpc/powernv: Fix insufficient memory allocation powerpc/iommu: Remove the dependency on EEH struct in DDW mechanism Revert "powerpc/eeh: Fix crash in eeh_add_device_early() on Cell" powerpc/eeh: Drop unnecessary label in eeh_pe_change_owner() powerpc/eeh: Ignore handlers in eeh_pe_reset_and_recover() powerpc/eeh: Restore initial state in eeh_pe_reset_and_recover() powerpc/eeh: Don't report error in eeh_pe_reset_and_recover() Revert "powerpc/powernv: Exclude root bus in pnv_pci_reset_secondary_bus()" powerpc/powernv/npu: Enable NVLink pass through powerpc/powernv/npu: Rework TCE Kill handling powerpc/powernv/npu: Add set/unset window helpers powerpc/powernv/ioda2: Export debug helper pe_level_printk() ... |
||
Linus Torvalds
|
a1c28b75a9 |
Merge branch 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King: "Changes included in this pull request: - revert pxa2xx-flash back to using ioremap_cached() and switch memremap() to use arch_memremap_wb() - remove pci=firmware command line argument handling - remove unnecessary arm_dma_set_mask() implementation, the generic implementation will do for ARM - removal of the ARM kallsyms "hack" to work around mode switching veneers and vectors located below PAGE_OFFSET - tidy up build system output a little - add L2 cache power management DT bindings - remove duplicated local_irq_disable() in reboot paths - handle AMBA primecell devices better at registration time with PM domains (needed for Samsung SoCs) - ARM specific preparation to support Keystone II kexec" * 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 8567/1: cache-uniphier: activate ways for secondary CPUs ARM: 8570/2: Documentation: devicetree: Add PL310 PM bindings ARM: 8569/1: pl2x0: Add OF control of cache power management ARM: 8568/1: reboot: remove duplicated local_irq_disable() ARM: 8566/1: drivers: amba: properly handle devices with power domains ARM: provide arm_has_idmap_alias() helper ARM: kexec: remove 512MB restriction on kexec crashdump ARM: provide improved virt_to_idmap() functionality ARM: kexec: fix crashkernel= handling ARM: 8557/1: specify install, zinstall, and uinstall as PHONY targets ARM: 8562/1: suppress "include/generated/mach-types.h is up to date." ARM: 8553/1: kallsyms: remove --page-offset command line option ARM: 8552/1: kallsyms: remove special lower address limit for CONFIG_ARM ARM: 8555/1: kallsyms: ignore ARM mode switching veneers ARM: 8548/1: dma-mapping: remove arm_dma_set_mask() ARM: 8554/1: kernel: pci: remove pci=firmware command line parameter handling ARM: memremap: implement arch_memremap_wb() memremap: add arch specific hook for MEMREMAP_WB mappings mtd: pxa2xx-flash: switch back from memremap to ioremap_cached ARM: reintroduce ioremap_cached() for creating cached I/O mappings |
||
Ingo Molnar
|
21f77d231f |
perf/core improvements and fixes:
Merge tag 'perf-core-for-mingo-20160516' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: User visible changes: - Honour the kernel.perf_event_max_stack knob more precisely by not counting PERF_CONTEXT_{KERNEL,USER} when deciding when to stop adding entries to the perf_sample->ip_callchain[] array (Arnaldo Carvalho de Melo) - Fix indentation of 'stalled-backend-cycles' in 'perf stat' (Namhyung Kim) - Update runtime using 'cpu-clock' event in 'perf stat' (Namhyung Kim) - Use 'cpu-clock' for cpu targets in 'perf stat' (Namhyung Kim) - Avoid fractional digits for integer scales in 'perf stat' (Andi Kleen) - Store vdso buildid unconditionally, as it appears in callchains and we're not checking those when creating the build-id table, so we end up not being able to resolve VDSO symbols when doing analysis on a different machine than the one where recording was done, possibly of a different arch even (arm -> x86_64) (He Kuang) Infrastructure changes: - Generalize max_stack sysctl handler, will be used for configuring multiple kernel knobs related to callchains (Arnaldo Carvalho de Melo) Cleanups: - Introduce DSO__NAME_KALLSYMS and DSO__NAME_KCORE, to stop using open coded strings (Masami Hiramatsu) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> |
||
Linus Torvalds
|
a05a70db34 |
Merge branch 'akpm' (patches from Andrew)
Merge updates from Andrew Morton: - fsnotify fix - poll() timeout fix - a few scripts/ tweaks - debugobjects updates - the (small) ocfs2 queue - Minor fixes to kernel/padata.c - Maybe half of the MM queue * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (117 commits) mm, page_alloc: restore the original nodemask if the fast path allocation failed mm, page_alloc: uninline the bad page part of check_new_page() mm, page_alloc: don't duplicate code in free_pcp_prepare mm, page_alloc: defer debugging checks of pages allocated from the PCP mm, page_alloc: defer debugging checks of freed pages until a PCP drain cpuset: use static key better and convert to new API mm, page_alloc: inline pageblock lookup in page free fast paths mm, page_alloc: remove unnecessary variable from free_pcppages_bulk mm, page_alloc: pull out side effects from free_pages_check mm, page_alloc: un-inline the bad part of free_pages_check mm, page_alloc: check multiple page fields with a single branch mm, page_alloc: remove field from alloc_context mm, page_alloc: avoid looking up the first zone in a zonelist twice mm, page_alloc: shortcut watermark checks for order-0 pages mm, page_alloc: reduce cost of fair zone allocation policy retry mm, page_alloc: shorten the page allocator fast path mm, page_alloc: check once if a zone has isolated pageblocks mm, page_alloc: move __GFP_HARDWALL modifications out of the fastpath mm, page_alloc: simplify last cpupid reset mm, page_alloc: remove unnecessary initialisation from __alloc_pages_nodemask() ... |
||
Vlastimil Babka
|
002f290627 |
cpuset: use static key better and convert to new API
An important function for cpusets is cpuset_node_allowed(), which optimizes on the fact if there's a single root CPU set, it must be trivially allowed. But the check "nr_cpusets() <= 1" doesn't use the cpusets_enabled_key static key the right way where static keys eliminate branching overhead with jump labels. This patch converts it so that static key is used properly. It's also switched to the new static key API and the checking functions are converted to return bool instead of int. We also provide a new variant __cpuset_zone_allowed() which expects that the static key check was already done and they key was enabled. This is needed for get_page_from_freelist() where we want to also avoid the relatively slower check when ALLOC_CPUSET is not set in alloc_flags. The impact on the page allocator microbenchmark is less than expected but the cleanup in itself is worthwhile. 4.6.0-rc2 4.6.0-rc2 multcheck-v1r20 cpuset-v1r20 Min alloc-odr0-1 348.00 ( 0.00%) 348.00 ( 0.00%) Min alloc-odr0-2 254.00 ( 0.00%) 254.00 ( 0.00%) Min alloc-odr0-4 213.00 ( 0.00%) 213.00 ( 0.00%) Min alloc-odr0-8 186.00 ( 0.00%) 183.00 ( 1.61%) Min alloc-odr0-16 173.00 ( 0.00%) 171.00 ( 1.16%) Min alloc-odr0-32 166.00 ( 0.00%) 163.00 ( 1.81%) Min alloc-odr0-64 162.00 ( 0.00%) 159.00 ( 1.85%) Min alloc-odr0-128 160.00 ( 0.00%) 157.00 ( 1.88%) Min alloc-odr0-256 169.00 ( 0.00%) 166.00 ( 1.78%) Min alloc-odr0-512 180.00 ( 0.00%) 180.00 ( 0.00%) Min alloc-odr0-1024 188.00 ( 0.00%) 187.00 ( 0.53%) Min alloc-odr0-2048 194.00 ( 0.00%) 193.00 ( 0.52%) Min alloc-odr0-4096 199.00 ( 0.00%) 198.00 ( 0.50%) Min alloc-odr0-8192 202.00 ( 0.00%) 201.00 ( 0.50%) Min alloc-odr0-16384 203.00 ( 0.00%) 202.00 ( 0.49%) Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Acked-by: Zefan Li <lizefan@huawei.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds 
<torvalds@linux-foundation.org> |
||
Hugh Dickins
|
52b6f46bc1 |
mm: /proc/sys/vm/stat_refresh to force vmstat update
Provide /proc/sys/vm/stat_refresh to force an immediate update of per-cpu into global vmstats: useful to avoid a sleep(2) or whatever before checking counts when testing. Originally added to work around a bug which left counts stranded indefinitely on a cpu going idle (an inaccuracy magnified when small below-batch numbers represent "huge" amounts of memory), but I believe that bug is now fixed: nonetheless, this is still a useful knob. Its schedule_on_each_cpu() is probably too expensive just to fold into reading /proc/meminfo itself: give this mode 0600 to prevent abuse. Allow a write or a read to do the same: nothing to read, but "grep -h Shmem /proc/sys/vm/stat_refresh /proc/meminfo" is convenient. Oh, and since global_page_state() itself is careful to disguise any underflow as 0, hack in an "Invalid argument" and pr_warn() if a counter is negative after the refresh - this helped to fix a misaccounting of NR_ISOLATED_FILE in my migration code. But on recent kernels, I find that NR_ALLOC_BATCH and NR_PAGES_SCANNED often go negative some of the time. I have not yet worked out why, but have no evidence that it's actually harmful. Punt for the moment by just ignoring the anomaly on those. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Yang Shi <yang.shi@linaro.org> Cc: Ning Qu <quning@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Andrew Morton
|
0edaf86cf1 |
include/linux/nodemask.h: create next_node_in() helper
Lots of code does node = next_node(node, XXX); if (node == MAX_NUMNODES) node = first_node(XXX); so create next_node_in() to do this and use it in various places. [mhocko@suse.com: use next_node_in() helper] Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@kernel.org> Signed-off-by: Michal Hocko <mhocko@suse.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Joonsoo Kim <js1304@gmail.com> Cc: David Rientjes <rientjes@google.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Laura Abbott <lauraa@codeaurora.org> Cc: Hui Zhu <zhuhui@xiaomi.com> Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
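The open-coded pattern quoted in the message above, and the wrap-around helper that replaces it, can be sketched in user space over a plain 64-bit mask. This is an illustration only: the `sketch_*` functions and `SKETCH_MAX_NODES` are stand-ins invented here, not the kernel's nodemask implementation.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 64 /* stand-in for MAX_NUMNODES */

/* First set bit in mask, or SKETCH_MAX_NODES if the mask is empty. */
static int sketch_first_node(uint64_t mask)
{
	for (int n = 0; n < SKETCH_MAX_NODES; n++)
		if ((mask >> n) & 1)
			return n;
	return SKETCH_MAX_NODES;
}

/* Next set bit strictly after node, or SKETCH_MAX_NODES if none. */
static int sketch_next_node(int node, uint64_t mask)
{
	for (int n = node + 1; n < SKETCH_MAX_NODES; n++)
		if ((mask >> n) & 1)
			return n;
	return SKETCH_MAX_NODES;
}

/* Like sketch_next_node(), but wraps around to the first set bit. */
static int sketch_next_node_in(int node, uint64_t mask)
{
	int n = sketch_next_node(node, mask);

	if (n == SKETCH_MAX_NODES)
		n = sketch_first_node(mask);
	return n;
}
```

With nodes 1, 3 and 5 set in the mask, stepping from node 5 wraps back to node 1, which is the behaviour callers previously had to spell out by hand.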
Joonsoo Kim
|
0139aa7b7f |
mm: rename _count, field of the struct page, to _refcount
Many developers already know that field for reference count of the struct page is _count and atomic type. They would try to handle it directly and this could break the purpose of page reference count tracepoint. To prevent direct _count modification, this patch rename it to _refcount and add warning message on the code. After that, developer who need to handle reference count will find that field should not be accessed directly. [akpm@linux-foundation.org: fix comments, per Vlastimil] [akpm@linux-foundation.org: Documentation/vm/transhuge.txt too] [sfr@canb.auug.org.au: sync ethernet driver changes] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Sunil Goutham <sgoutham@cavium.com> Cc: Chris Metcalf <cmetcalf@mellanox.com> Cc: Manish Chopra <manish.chopra@qlogic.com> Cc: Yuval Mintz <yuval.mintz@qlogic.com> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Arnd Bergmann
|
19d795b677 |
kernel/padata.c: hide unused functions
A recent cleanup removed some exported functions that were not used anywhere, which in turn exposed the fact that some other functions in the same file are only used in some configurations. We now get a warning about them when CONFIG_HOTPLUG_CPU is disabled: kernel/padata.c:670:12: error: '__padata_remove_cpu' defined but not used [-Werror=unused-function] static int __padata_remove_cpu(struct padata_instance *pinst, int cpu) ^~~~~~~~~~~~~~~~~~~ kernel/padata.c:650:12: error: '__padata_add_cpu' defined but not used [-Werror=unused-function] static int __padata_add_cpu(struct padata_instance *pinst, int cpu) This rearranges the code so the __padata_remove_cpu/__padata_add_cpu functions are within the #ifdef that protects the code that calls them. [akpm@linux-foundation.org: coding-style fixes] Fixes: 4ba6d78c671e ("kernel/padata.c: removed unused code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Richard Cochran <rcochran@linutronix.de> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Richard Cochran
|
815613da6a |
kernel/padata.c: removed unused code
By accident I stumbled across code that has never been used. This driver has EXPORT_SYMBOL functions, and the only user of the code is pcrypt.c, but this only uses a subset of the exported symbols. According to 'git log -G', the functions, padata_set_cpumasks, padata_add_cpu, and padata_remove_cpu have never been used since they were first introduced. This patch removes the unused code. On one 64 bit build, with CRYPTO_PCRYPT built in, the text is more than 4k smaller. kbuild_hp> size $KBUILD_OUTPUT/vmlinux text data bss dec hex filename 10566658 |
||
Du, Changbin
|
b9fdac7f66 |
debugobjects: insulate non-fixup logic related to static obj from fixup callbacks
When activating a static object we need to make sure that the object is tracked in the object tracker. If it is a non-static object then the activation is illegal. In the previous implementation, each subsystem needed to take care of this in its fixup callbacks. Actually we can put it into the debugobjects core. Thus we can save duplicated code, and have *pure* fixup callbacks. To achieve this, a new callback "is_static_object" is introduced to let the type specific code decide whether an object is static or not. If yes, we take it into the object tracker, otherwise give a warning and invoke the fixup callback. This change has passed the debugobjects selftest, and I also did some testing with all debugobjects support enabled. Finally, I have a concern about the fixups: can a fixup change an object which is in an incorrect state? The 'addr' may not point to any valid object if a non-static object is not tracked. Changing such an object can then overwrite someone's memory and cause unexpected behaviour. For example, timer_fixup_activate binds the timer to the function stub_timer. Link: http://lkml.kernel.org/r/1462576157-14539-1-git-send-email-changbin.du@intel.com [changbin.du@intel.com: improve code comments where invoke the new is_static_object callback] Link: http://lkml.kernel.org/r/1462777431-8171-1-git-send-email-changbin.du@intel.com Signed-off-by: Du, Changbin <changbin.du@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Du, Changbin
|
3263d28eb5 |
rcu: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to change (debugobjects: make fixup functions return bool instead of int). Signed-off-by: Du, Changbin <changbin.du@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Du, Changbin
|
e3252464da |
timer: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to change (debugobjects: make fixup functions return bool instead of int). Signed-off-by: Du, Changbin <changbin.du@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Du, Changbin
|
02a982a6ec |
workqueue: update debugobjects fixup callbacks return type
Update the return type to use bool instead of int, corresponding to change (debugobjects: make fixup functions return bool instead of int) Signed-off-by: Du, Changbin <changbin.du@intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Josh Triplett <josh@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tejun Heo <tj@kernel.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Deepa Dinamani
|
8e4f70e218 |
time: remove timespec_add_safe()
All references to timespec_add_safe() now use timespec64_add_safe(). The plan is to replace struct timespec references with struct timespec64 throughout the kernel as timespec is not y2038 safe. Drop timespec_add_safe() and use timespec64_add_safe() for all architectures. Link: http://lkml.kernel.org/r/1461947989-21926-4-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Deepa Dinamani
|
bc2c53e5f1 |
time: add missing implementation for timespec64_add_safe()
timespec64_add_safe() has been defined in time64.h for 64 bit systems. But, 32 bit systems only have an extern function prototype defined. Provide a definition for the above function. The function will be necessary as part of y2038 changes. struct timespec is not y2038 safe. All references to timespec will be replaced by struct timespec64. The function is meant to be a replacement for timespec_add_safe(). The implementation is similar to timespec_add_safe(). Link: http://lkml.kernel.org/r/1461947989-21926-2-git-send-email-deepa.kernel@gmail.com Signed-off-by: Deepa Dinamani <deepa.kernel@gmail.com> Acked-by: John Stultz <john.stultz@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
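As a rough user-space illustration of what an overflow-safe timespec addition involves, the sketch below adds two normalized values and clamps the result instead of letting the seconds field wrap. The message above does not spell out the overflow behaviour, so the saturation to the maximum second count is an assumption made for this example, as are the names `struct sketch_timespec64` and `sketch_timespec64_add_safe()` and the restriction to non-negative, normalized inputs.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NSEC_PER_SEC 1000000000L

/* Hypothetical stand-in for a 64-bit-seconds timespec. */
struct sketch_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/*
 * Add two values; assumes tv_sec >= 0 and 0 <= tv_nsec < 1e9 for both.
 * On overflow of the seconds field the result is clamped (assumed
 * behaviour for this sketch) to { INT64_MAX, 0 }.
 */
static struct sketch_timespec64
sketch_timespec64_add_safe(struct sketch_timespec64 a, struct sketch_timespec64 b)
{
	struct sketch_timespec64 res;
	long nsec = a.tv_nsec + b.tv_nsec; /* below 2e9, fits in a 32-bit long */
	int64_t carry = 0;

	if (nsec >= SKETCH_NSEC_PER_SEC) {
		nsec -= SKETCH_NSEC_PER_SEC;
		carry = 1;
	}

	/* Check for overflow before adding, so no signed overflow occurs. */
	if (a.tv_sec > INT64_MAX - b.tv_sec - carry) {
		res.tv_sec = INT64_MAX;
		res.tv_nsec = 0;
		return res;
	}

	res.tv_sec = a.tv_sec + b.tv_sec + carry;
	res.tv_nsec = nsec;
	return res;
}
```

Checking the bound before the addition keeps the sketch free of signed-overflow undefined behaviour, which is the property a "safe" add needs regardless of how the clamped value is chosen.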
Linus Torvalds
|
07b75260eb |
Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle: "This is the main pull request for MIPS for 4.7. Here's the summary of the changes: - ATH79: Support for DTB passing using the UHI boot protocol - ATH79: Remove support for builtin DTB. - ATH79: Add zboot debug serial support. - ATH79: Add initial support for Dragino MS14 (Dragine 2), Onion Omega and DPT-Module. - ATH79: Update devicetree clock support for AR9132 and AR9331. - ATH79: Cleanup the DT code. - ATH79: Support newer SOCs in ath79_ddr_ctrl_init. - ATH79: Fix regression in PCI window initialization. - BCM47xx: Move SPROM driver to drivers/firmware/ - BCM63xx: Enable partition parser in defconfig. - BMIPS: BMIPS5000 has I cache filling from D cache - BMIPS: Add cpu-feature-overrides.h - BMIPS: Add Whirlwind support - BMIPS: Adjust mips-hpt-frequency for BCM7435 - BMIPS: Remove maxcpus from BCM97435SVMB DTS - BMIPS: Add missing 7038 L1 register cells to BCM7435 - BMIPS: Various tweaks to initialization code. - BMIPS: Enable partition parser in defconfig. - BMIPS: Cache tweaks. - BMIPS: Add UART, I2C and SATA devices to DT. - BMIPS: Add BCM6358 and BCM63268 support - BMIPS: Add device tree example for BCM6358. - BMIPS: Improve BCM6328 and BCM6368 device trees - Lantiq: Add support for device tree file from boot loader - Lantiq: Allow build with no built-in DT. - Loongson 3: Reserve 32MB for RS780E integrated GPU. - Loongson 3: Fix build error after ld-version.sh modification - Loongson 3: Move chipset ACPI code from drivers to arch. - Loongson 3: Speedup irq processing. - Loongson 3: Add basic Loongson 3A support. - Loongson 3: Set cache flush handlers to nop. - Loongson 3: Invalidate special TLBs when needed. - Loongson 3: Fast TLB refill handler. - MT7620: Fallback strategy for invalid syscfg0. 
- Netlogic: Fix CP0_EBASE redefinition warnings - Octeon: Initialization fixes - Octeon: Add DTS files for the D-Link DSR-1000N and EdgeRouter Lite - Octeon: Enable add Octeon-drivers in cavium_octeon_defconfig - Octeon: Correctly handle endian-swapped initramfs images. - Octeon: Support CN73xx, CN75xx and CN78xx. - Octeon: Remove dead code from cvmx-sysinfo. - Octeon: Extend number of supported CPUs past 32. - Octeon: Remove some code limiting NR_IRQS to 255. - Octeon: Simplify octeon_irq_ciu_gpio_set_type. - Octeon: Mark some functions __init in smp.c - Octeon: Octeon: Add Octeon III CN7xxx interface detection - PIC32: Add serial driver and bindings for it. - PIC32: Add PIC32 deadman timer driver and bindings. - PIC32: Add PIC32 clock timer driver and bindings. - Pistachio: Determine SoC revision during boot - Sibyte: Fix Kconfig dependencies of SIBYTE_BUS_WATCHER. - Sibyte: Strip redundant comments from bcm1480_regs.h. - Panic immediately if panic_on_oops is set. - module: fix incorrect IS_ERR_VALUE macro usage. - module: Make consistent use of pr_* - Remove no longer needed work_on_cpu() call. - Remove CONFIG_IPV6_PRIVACY from defconfigs. - Fix registers of non-crashing CPUs in dumps. - Handle MIPSisms in new vmcore_elf32_check_arch. - Select CONFIG_HANDLE_DOMAIN_IRQ and make it work. - Allow RIXI to be used on non-R2 or R6 cores. - Reserve nosave data for hibernation - Fix siginfo.h to use strict POSIX types. - Don't unwind user mode with EVA. - Fix watchpoint restoration - Ptrace watchpoints for R6. - Sync icache when it fills from dcache - I6400 I-cache fills from dcache. - Various MSA fixes. - Cleanup MIPS_CPU_* definitions. - Signal: Move generic copy_siginfo to signal.h - Signal: Fix uapi include in exported asm/siginfo.h - Timer fixes for sake of KVM. - XPA TLB refill fixes. - Treat perf counter feature - Update John Crispin's email address - Add PIC32 watchdog and bindings. - Handle R10000 LL/SC bug in set_pte() - cpufreq: Various fixes for Longson1. 
- R6: Fix R2 emulation. - mathemu: Cosmetic fix to ADDIUPC emulation, plenty of other small fixes - ELF: ABI and FP fixes. - Allow for relocatable kernel and use that to support KASLR. - Fix CPC_BASE_ADDR mask - Plenty fo smp-cps, CM, R6 and M6250 fixes. - Make reset_control_ops const. - Fix kernel command line handling of leading whitespace. - Cleanups to cache handling. - Add brcm, bcm6345-l1-intc device tree bindings. - Use generic clkdev.h header - Remove CLK_IS_ROOT usage. - Misc small cleanups. - CM: Fix compilation error when !MIPS_CM - oprofile: Fix a preemption issue - Detect DSP ASE v3 support:1" * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (275 commits) MIPS: pic32mzda: fix getting timer clock rate. MIPS: ath79: fix regression in PCI window initialization MIPS: ath79: make ath79_ddr_ctrl_init() compatible for newer SoCs MIPS: Fix VZ probe gas errors with binutils <2.24 MIPS: perf: Fix I6400 event numbers MIPS: DEC: Export `ioasic_ssr_lock' to modules MIPS: MSA: Fix a link error on `_init_msa_upper' with older GCC MIPS: CM: Fix compilation error when !MIPS_CM MIPS: Fix genvdso error on rebuild USB: ohci-jz4740: Remove obsolete driver MIPS: JZ4740: Probe OHCI platform device via DT MIPS: JZ4740: Qi LB60: Remove support for AVT2 variant MIPS: pistachio: Determine SoC revision during boot MIPS: BMIPS: Adjust mips-hpt-frequency for BCM7435 mips: mt7620: fallback to SDRAM when syscfg0 does not have a valid value for the memory type MIPS: Prevent "restoration" of MSA context in non-MSA kernels MIPS: cevt-r4k: Dynamically calculate min_delta_ns MIPS: malta-time: Take seconds into account MIPS: malta-time: Start GIC count before syncing to RTC MIPS: Force CPUs to lose FP context during mode switches ... |
||
Linus Torvalds
|
f4f27d0028 |
Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris: "Highlights: - A new LSM, "LoadPin", from Kees Cook is added, which allows forcing of modules and firmware to be loaded from a specific device (this is from ChromeOS, where the device as a whole is verified cryptographically via dm-verity). This is disabled by default but can be configured to be enabled by default (don't do this if you don't know what you're doing). - Keys: allow authentication data to be stored in an asymmetric key. Lots of general fixes and updates. - SELinux: add restrictions for loading of kernel modules via finit_module(). Distinguish non-init user namespace capability checks. Apply execstack check on thread stacks" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (48 commits) LSM: LoadPin: provide enablement CONFIG Yama: use atomic allocations when reporting seccomp: Fix comment typo ima: add support for creating files using the mknodat syscall ima: fix ima_inode_post_setattr vfs: forbid write access when reading a file into memory fs: fix over-zealous use of "const" selinux: apply execstack check on thread stacks selinux: distinguish non-init user namespace capability checks LSM: LoadPin for kernel file loading restrictions fs: define a string representation of the kernel_read_file_id enumeration Yama: consolidate error reporting string_helpers: add kstrdup_quotable_file string_helpers: add kstrdup_quotable_cmdline string_helpers: add kstrdup_quotable selinux: check ss_initialized before revalidating an inode label selinux: delay inode label lookup as long as possible selinux: don't revalidate an inode's label when explicitly setting it selinux: Change bool variable name to index. KEYS: Add KEYCTL_DH_COMPUTE command ... |
||
Linus Torvalds
|
2600a46ee0 |
Merge tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing updates from Steven Rostedt: "This includes two new updates for the ftrace infrastructure. - With the code for filtering events by pid changed from a list of pids to a bitmask, we can now easily implement following forks. A new tracing option, "event-fork", when set, causes tasks with pids in set_event_pid to have their child pids added to set_event_pid when they fork, so the child is traced as well. 
Note, if "event-fork" is set and a task with its pid in set_event_pid exits, its pid will be removed from set_event_pid - The addition of Tom Zanussi's hist triggers. This includes very thorough documentation on how to use the hist triggers with events. This introduces a quick and easy way to get histogram data from events and their fields. Some other cleanups and updates were added as well. For example, Masami Hiramatsu added test cases for the event trigger and hist triggers. I also sped up filtering by using a temp buffer when filters are set" * tag 'trace-v4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits) tracing: Use temp buffer when filtering events tracing: Remove TRACE_EVENT_FL_USE_CALL_FILTER logic tracing: Remove unused function trace_current_buffer_lock_reserve() tracing: Remove one use of trace_current_buffer_lock_reserve() tracing: Have trace_buffer_unlock_commit() call the _regs version with NULL tracing: Remove unused function trace_current_buffer_discard_commit() tracing: Move trace_buffer_unlock_commit{_regs}() to local header tracing: Fold filter_check_discard() into its only user tracing: Make filter_check_discard() local tracing: Move event_trigger_unlock_commit{_regs}() to local header tracing: Don't use the address of the buffer array name in copy_from_user tracing: Handle tracing_map_alloc_elts() error path correctly tracing: Add check for NULL event field when creating hist field tracing: checking for NULL instead of IS_ERR() tracing: Do not inherit event-fork option for instances tracing: Fix unsigned comparison to zero in hist trigger code kselftests/ftrace: Add a test for log2 modifier of hist trigger tracing: Add hist trigger 'log2' modifier kselftests/ftrace: Add hist trigger testcases kselftests/ftrace : Add event trigger testcases ... |
||
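The "event-fork" behaviour described in the tracing entry above (a pid bitmask, children added on fork, pids removed on exit) can be modelled with a small userspace sketch. It is not the kernel implementation; the structure, the function names and the 4096-pid limit are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PID_MAX   4096
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical model of set_event_pid as a bitmask plus the option flag. */
struct sketch_pid_filter {
	unsigned long bits[SKETCH_PID_MAX / (sizeof(unsigned long) * CHAR_BIT)];
	bool event_fork;   /* models the "event-fork" tracing option */
};

void sketch_filter_add(struct sketch_pid_filter *f, int pid)
{
	assert(pid >= 0 && pid < SKETCH_PID_MAX);
	f->bits[(size_t)pid / SKETCH_WORD_BITS] |=
		1UL << ((size_t)pid % SKETCH_WORD_BITS);
}

void sketch_filter_remove(struct sketch_pid_filter *f, int pid)
{
	assert(pid >= 0 && pid < SKETCH_PID_MAX);
	f->bits[(size_t)pid / SKETCH_WORD_BITS] &=
		~(1UL << ((size_t)pid % SKETCH_WORD_BITS));
}

/* With a bitmask, membership is a single O(1) bit test. */
bool sketch_filter_test(const struct sketch_pid_filter *f, int pid)
{
	assert(pid >= 0 && pid < SKETCH_PID_MAX);
	return (f->bits[(size_t)pid / SKETCH_WORD_BITS] >>
		((size_t)pid % SKETCH_WORD_BITS)) & 1UL;
}

/* On fork: if the option is set and the parent is traced, trace the child. */
void sketch_on_fork(struct sketch_pid_filter *f, int parent, int child)
{
	if (f->event_fork && sketch_filter_test(f, parent))
		sketch_filter_add(f, child);
}

/* On exit: if the option is set, drop the exiting pid from the filter. */
void sketch_on_exit(struct sketch_pid_filter *f, int pid)
{
	if (f->event_fork)
		sketch_filter_remove(f, pid);
}
```

The bitmask is what makes fork-following cheap: adding a child is one bit set, where a pid list would need an insertion and possibly a reallocation.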
Linus Torvalds
|
03e1aa1cbb |
Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit
Pull audit updates from Paul Moore: "Four small audit patches for 4.7. Two are simple cleanups around the audit thread management code, one adds a tty field to AUDIT_LOGIN events, and the final patch makes tty_name() usable regardless of CONFIG_TTY. Nothing controversial, and it all passes our regression test" * 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit: tty: provide tty_name() even without CONFIG_TTY audit: add tty field to LOGIN event audit: we don't need to __set_current_state(TASK_RUNNING) audit: cleanup prune_tree_thread |
||
Viresh Kumar
|
60f05e86cf |
cpufreq: schedutil: Improve prints messages with pr_fmt
Prefix print messages with KBUILD_MODNAME, i.e. 'cpufreq_schedutil: '. This helps keep formatting consistent for all the print messages of a file and makes them easy to identify in kernel logs. It's already done this way for the rest of the governors. Along with that, remove the (now) redundant bits from a print message. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> |
||
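The pr_fmt prefixing described in the cpufreq entry above relies on compile-time string literal concatenation. The sketch below imitates it in userspace: the `pr_fmt` definition has the shape used in kernel sources, while `sketch_log()` and `sketch_pr_err()` are hypothetical stand-ins for printk and pr_err(), and KBUILD_MODNAME is defined by hand here (in a kernel build the build system supplies it). The `##__VA_ARGS__` form is a GNU C extension.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>   /* strcmp(), for checking the result in a caller */

/* Defined by hand for this sketch only. */
#define KBUILD_MODNAME "cpufreq_schedutil"

/* Adjacent string literals are concatenated at compile time. */
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

/* Hypothetical logging backend: formats into a caller-supplied buffer. */
int sketch_log(char *buf, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(buf != NULL && size > 0);
	va_start(ap, fmt);
	n = vsnprintf(buf, size, fmt, ap);
	va_end(ap);
	return n;
}

/* Every message passes through pr_fmt(), so the prefix is applied once. */
#define sketch_pr_err(buf, size, fmt, ...) \
	sketch_log(buf, size, pr_fmt(fmt), ##__VA_ARGS__)
```

Because the prefix is added by the macro, individual messages no longer need to spell out the module name, which is what the commit means by removing the now redundant bits.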
Linus Torvalds
|
9e17632c0a |
Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs cleanups from Al Viro: "Assorted cleanups and fixes all over the place" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coredump: only charge written data against RLIMIT_CORE coredump: get rid of coredump_params->written ecryptfs_lookup(): try either only encrypted or plaintext name ecryptfs: avoid multiple aliases for directories bpf: reject invalid names right in ->lookup() __d_alloc(): treat NULL name as QSTR("/", 1) mtd: switch ubi_open_volume_path() to vfs_stat() mtd: switch open_mtd_by_chdev() to use of vfs_stat() |
||
Linus Torvalds
|
1eccc6e152 |
Merge tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio
Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for kernel cycle v4.7: Core infrastructural changes: - Support for natively single-ended GPIO driver stages. This means that if the hardware has registers to configure open drain or open source configuration, we use that rather than (as we did before) try to emulate it by switching the line to an input to get high impedance. 
This is also documented thoroughly in Documentation/gpio/driver.txt for those of you who did not understand one word of what I just wrote. - Start to do away with the unnecessarily complex and unintelligible ARCH_REQUIRE_GPIOLIB and ARCH_WANT_OPTIONAL_GPIOLIB, another evolutional artifact from the time when the GPIO subsystem was unmaintained. Archs can now just select GPIOLIB and be done with it, cleanups to arches will trickle in for the next kernel. Some minor archs ACKed the changes immediately so these are included in this pull request. - Advancing the use of the data pointer inside the GPIO device for storing driver data by switching the PowerPC, Super-H, Unicore and a few other subarches or subsystem drivers in ALSA SoC, Input, serial, SSB, staging etc to use it. - The initialization now reads the input/output state of the GPIO lines, so that each GPIO descriptor knows - if this callback is implemented - whether the line is input or output. This also reflects nicely in userspace "lsgpio". - It is now possible to assign GPIO producer names (line names) from the device tree. (Platform data has been supported for a while). I bet we will get a similar mechanism for ACPI one of these days. This makes it possible to get sensible producer names for e.g. GPIO rails in "lsgpio" in userspace. New drivers: - New driver for the Loongson1. - The XLP driver now supports Broadcom Vulcan ARM64. - The IT87 driver now supports IT8620 and IT8628. - The PCA953X driver now supports Galileo Gen2. Driver improvements: - MCP23S08 was switched to use the gpiolib irqchip helpers and now also supports level-triggered interrupts. - 74x164 and RCAR now support the .set_multiple() callback - AMDPT was converted to use generic GPIO. - TC3589x, TPS65218, SX150X, F7188X, MENZ127, VX855, WM831X, WM8994 support the new single ended callback for open drain and in some cases open source. - Implement the .get_direction() callback for a few more drivers like PL061, Xgene. 
Cleanups: - Paul Gortmaker combed through the drivers and de-modularized those that are not really modules. - Move the GPIO poweroff DT bindings to the power subdir where they belong. - Rename gpio-generic.c to gpio-mmio.c, which is much more to the point. That's what it is handling, nothing more, nothing less" * tag 'gpio-v4.7-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (126 commits) MIPS: do away with ARCH_[WANT_OPTIONAL|REQUIRE]_GPIOLIB gpio: zevio: make it explicitly non-modular gpio: timberdale: make it explicitly non-modular gpio: stmpe: make it explicitly non-modular gpio: sodaville: make it explicitly non-modular pinctrl: sh-pfc: Let gpio_chip.to_irq() return zero on error gpio: dwapb: Add ACPI device ID for DWAPB GPIO controller on X-Gene platforms gpio: dt-bindings: add wd,mbl-gpio bindings gpio: of: make it possible to name GPIO lines gpio: make gpiod_to_irq() return negative for NO_IRQ gpio: xgene: implement .get_direction() gpio: xgene: Enable ACPI support for X-Gene GFC GPIO driver gpio: tegra: Implement gpio_get_direction callback gpio: set up initial state from .get_direction() gpio: rename gpio-generic.c into gpio-mmio.c gpio: generic: fix GPIO_GENERIC_PLATFORM is set to module case gpio: dwapb: add gpio-signaled acpi event support gpio: dwapb: convert device node to fwnode gpio: dwapb: remove name from dwapb_port_property gpio/qoriq: select IRQ_DOMAIN ... |
||
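The open drain handling described in the GPIO entry above (use a native hardware mode when one exists, otherwise emulate by switching the line to an input for high impedance) can be sketched as a small state model. This is not gpiolib code; every type and function name below is hypothetical and only illustrates the decision.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_dir { SKETCH_DIR_INPUT, SKETCH_DIR_OUTPUT };

/* Hypothetical model of one GPIO line; not a gpiolib structure. */
struct sketch_line {
	bool has_native_open_drain; /* hardware has an open drain register */
	bool native_od_enabled;     /* that register is switched on */
	enum sketch_dir dir;        /* current direction of the pin */
	int out_level;              /* level driven when dir is output */
};

/*
 * Set the logical value of an open drain line. An open drain output may
 * actively pull the wire low but must never actively drive it high.
 */
void sketch_set_open_drain(struct sketch_line *line, int value)
{
	assert(line != NULL);

	if (line->has_native_open_drain) {
		/* Native case: let the hardware's single-ended stage do it. */
		line->native_od_enabled = true;
		line->dir = SKETCH_DIR_OUTPUT;
		line->out_level = value ? 1 : 0;
		return;
	}

	/* Emulated case: the behaviour used before native support. */
	if (value) {
		/* "High" means releasing the wire: go high impedance. */
		line->dir = SKETCH_DIR_INPUT;
	} else {
		/* "Low" means actively pulling the wire to ground. */
		line->dir = SKETCH_DIR_OUTPUT;
		line->out_level = 0;
	}
}
```

In the emulated case the pin direction flips on every transition to high, which is why a native register is preferable when the hardware offers one.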
Linus Torvalds
|
0b86c75db6 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching updates from Jiri Kosina: - removal of our own implementation of architecture-specific relocation code, leveraging existing code in the module loader to perform arch-dependent work, from Jessica Yu. The relevant patches have been acked by Rusty (for module.c) and Heiko (for s390). - live patching support for ppc64le, which is joint work of Michael Ellerman and Torsten Duwe. This is coming from a topic branch that is shared between livepatching.git and the ppc tree. - addition of livepatching documentation from Petr Mladek * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: livepatch: make object/func-walking helpers more robust livepatch: Add some basic livepatch documentation powerpc/livepatch: Add live patching support on ppc64le powerpc/livepatch: Add livepatch stack to struct thread_info powerpc/livepatch: Add livepatch header livepatch: Allow architectures to specify an alternate ftrace location ftrace: Make ftrace_location_range() global livepatch: robustify klp_register_patch() API error checking Documentation: livepatch: outline Elf format and requirements for patch modules livepatch: reuse module loader code to write relocations module: s390: keep mod_arch_specific for livepatch modules module: preserve Elf information for livepatch modules Elf: add livepatch-specific Elf constants |
||
Linus Torvalds
|
a7fd20d1c4 |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller: "Highlights: 1) Support SPI based w5100 devices, from Akinobu Mita. 2) Partial Segmentation Offload, from Alexander Duyck. 3) Add GMAC4 support to stmmac driver, from Alexandre TORGUE. 4) Allow cls_flower stats offload, from Amir Vadai. 5) Implement bpf blinding, from Daniel Borkmann. 6) Optimize _ASYNC_ bit twiddling on sockets, unless the socket is actually using FASYNC these atomics are superfluous. From Eric Dumazet. 7) Run TCP more preemptibly, also from Eric Dumazet. 8) Support LED blinking, EEPROM dumps, and rxvlan offloading in mlx5e driver, from Gal Pressman. 9) Allow creating ppp devices via rtnetlink, from Guillaume Nault. 10) Improve BPF usage documentation, from Jesper Dangaard Brouer. 11) Support tunneling offloads in qed, from Manish Chopra. 12) aRFS offloading in mlx5e, from Maor Gottlieb. 13) Add RFS and RPS support to SCTP protocol, from Marcelo Ricardo Leitner. 14) Add MSG_EOR support to TCP, this allows controlling packet coalescing on application record boundaries for more accurate socket timestamp sampling. From Martin KaFai Lau. 15) Fix alignment of 64-bit netlink attributes across the board, from Nicolas Dichtel. 16) Per-vlan stats in bridging, from Nikolay Aleksandrov. 17) Several conversions of drivers to ethtool ksettings, from Philippe Reynes. 18) Checksum neutral ILA in ipv6, from Tom Herbert. 
19) Factorize all of the various marvell dsa drivers into one, from Vivien Didelot 20) Add VF support to qed driver, from Yuval Mintz" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1649 commits) Revert "phy dp83867: Fix compilation with CONFIG_OF_MDIO=m" Revert "phy dp83867: Make rgmii parameters optional" r8169: default to 64-bit DMA on recent PCIe chips phy dp83867: Make rgmii parameters optional phy dp83867: Fix compilation with CONFIG_OF_MDIO=m bpf: arm64: remove callee-save registers use for tmp registers asix: Fix offset calculation in asix_rx_fixup() causing slow transmissions switchdev: pass pointer to fib_info instead of copy net_sched: close another race condition in tcf_mirred_release() tipc: fix nametable publication field in nl compat drivers: net: Don't print unpopulated net_device name qed: add support for dcbx. ravb: Add missing free_irq() calls to ravb_close() qed: Remove a stray tab net: ethernet: fec-mpc52xx: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fec-mpc52xx: use phydev from struct net_device bpf, doc: fix typo on bpf_asm descriptions stmmac: hardware TX COE doesn't work when force_thresh_dma_mode is set net: ethernet: fs-enet: use phy_ethtool_{get|set}_link_ksettings net: ethernet: fs-enet: use phydev from struct net_device ... |
||
Linus Torvalds
|
a4d1dbed0e |
Merge branch 'for-4.7/core' of git://git.kernel.dk/linux-block
Pull core block layer updates from Jens Axboe: "These are the core block IO changes for this merge window. Nothing earth shattering in here, it's mostly just fixes. In detail: - Fix for a long standing issue where wrong ordering in blk-mq caused order_to_size() to spew a warning. From Bart. - Async discard support from Christoph. Basically just splitting our sync interface into a submit + wait part. - Add a cleaner interface for flagging whether a device has a write back cache or not. We've previously overloaded blk_queue_flush() with this, but let's make it more explicit. Drivers cleaned up and updated in the drivers pull request. From me. - Fix for a double check for whether IO accounting is enabled or not. From Michael Callahan. - Fix for the async discard from Mike Snitzer, reinstating the early EOPNOTSUPP return if the device doesn't support discards. - Also from Mike, export bio_inc_remaining() so dm can drop its private copy of it. - From Ming Lin, add support for passing in an offset for request payloads. - Tag function export from Sagi, which will be used in NVMe in the drivers pull. - Two blktrace related fixes from Shaohua. - Propagate NOMERGE flag when making a request from a bio, also from Shaohua. - An optimization to not parse cgroup paths in blk-throttle, if we don't need to. 
From Shaohua" * 'for-4.7/core' of git://git.kernel.dk/linux-block: blk-mq: fix undefined behaviour in order_to_size() blk-throttle: don't parse cgroup path if trace isn't enabled blktrace: add missed mask name blktrace: delete garbage for message trace block: make bio_inc_remaining() interface accessible again block: reinstate early return of -EOPNOTSUPP from blkdev_issue_discard block: Minor blk_account_io_start usage cleanup block: add __blkdev_issue_discard block: remove struct bio_batch block: copy NOMERGE flag from bio to request block: add ability to flag write back caching on a device blk-mq: Export tagset iter function block: add offset in blk_add_request_payload() writeback: Fix performance regression in wb_over_bg_thresh() |
||
Linus Torvalds
|
7f427d3a60 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull parallel filesystem directory handling update from Al Viro. This is the main parallel directory work by Al that makes the vfs layer able to do lookup and readdir in parallel within a single directory. That's a big change, since this used to be all protected by the directory inode mutex. The inode mutex is replaced by an rwsem, and serialization of lookups of a single name is done by a "in-progress" dentry marker. The series begins with xattr cleanups, and then ends with switching filesystems over to actually doing the readdir in parallel (switching to the "iterate_shared()" that only takes the read lock). A more detailed explanation of the process from Al Viro: "The xattr work starts with some acl fixes, then switches ->getxattr to passing inode and dentry separately. This is the point where the things start to get tricky - that got merged into the very beginning of the -rc3-based #work.lookups, to allow untangling the security_d_instantiate() mess. The xattr work itself proceeds to switch a lot of filesystems to generic_...xattr(); no complications there. After that initial xattr work, the series then does the following: - untangle security_d_instantiate() - convert a bunch of open-coded lookup_one_len_unlocked() to calls of that thing; one such place (in overlayfs) actually yields a trivial conflict with overlayfs fixes later in the cycle - overlayfs ended up switching to a variant of lookup_one_len_unlocked() sans the permission checks. I would've dropped that commit (it gets overridden on merge from #ovl-fixes in #for-next; proper resolution is to use the variant in mainline fs/overlayfs/super.c), but I didn't want to rebase the damn thing - it was fairly late in the cycle... - some filesystems had managed to depend on lookup/lookup exclusion for *fs-internal* data structures in a way that would break if we relaxed the VFS exclusion. Fixing hadn't been hard, fortunately. 
- core of that series - parallel lookup machinery, replacing ->i_mutex with rwsem, making lookup_slow() take it only shared. At that point lookups happen in parallel; lookups on the same name wait for the in-progress one to be done with that dentry. Surprisingly little code, at that - almost all of it is in fs/dcache.c, with fs/namei.c changes limited to lookup_slow() - making it use the new primitive and actually switching to locking shared. - parallel readdir stuff - first of all, we provide the exclusion on per-struct file basis, same as we do for read() vs lseek() for regular files. That takes care of most of the needed exclusion in readdir/readdir; however, these guys are trickier than lookups, so I went for switching them one-by-one. To do that, a new method '->iterate_shared()' is added and filesystems are switched to it as they are either confirmed to be OK with shared lock on directory or fixed to be OK with that. I hope to kill the original method come next cycle (almost all in-tree filesystems are switched already), but it's still not quite finished. - several filesystems get switched to parallel readdir. The interesting part here is dealing with dcache preseeding by readdir; that needs minor adjustment to be safe with directory locked only shared. Most of the filesystems doing that got switched to in those commits. Important exception: NFS. Turns out that NFS folks, with their, er, insistence on VFS getting the fuck out of the way of the Smart Filesystem Code That Knows How And What To Lock(tm) have grown the locking of their own. They had their own homegrown rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink is the reader there). Of course, with VFS getting the fuck out of the way, as requested, the actual smarts of the smart filesystem code etc. had become exposed... - do_last/lookup_open/atomic_open cleanups. As the result, open() without O_CREAT locks the directory only shared. Including the ->atomic_open() case. 
Backmerge from #for-linus in the middle of that - atomic_open() fix got brought in. - then comes NFS switch to saner (VFS-based ;-) locking, killing the homegrown "lookup and readdir are writers" kinda-sorta rwsem. All exclusion for sillyunlink/lookup is done by the parallel lookups mechanism. Exclusion between sillyunlink and rmdir is a real rwsem now - rmdir being the writer. Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel now. - the rest of the series consists of switching a lot of filesystems to parallel readdir; in a lot of cases ->llseek() gets simplified as well. One backmerge in there (again, #for-linus - rockridge fix)" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits) ext4: switch to ->iterate_shared() hfs: switch to ->iterate_shared() hfsplus: switch to ->iterate_shared() hostfs: switch to ->iterate_shared() hpfs: switch to ->iterate_shared() hpfs: handle allocation failures in hpfs_add_pos() gfs2: switch to ->iterate_shared() f2fs: switch to ->iterate_shared() afs: switch to ->iterate_shared() befs: switch to ->iterate_shared() befs: constify stuff a bit isofs: switch to ->iterate_shared() get_acorn_filename(): deobfuscate a bit btrfs: switch to ->iterate_shared() logfs: no need to lock directory in lseek switch ecryptfs to ->iterate_shared 9p: switch to ->iterate_shared() fat: switch to ->iterate_shared() romfs, squashfs: switch to ->iterate_shared() more trivial ->iterate_shared conversions ... |
||
Linus Torvalds
|
ede40902cf |
Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner: "This update delivers: - Yet another interrupt chip driver (LPC32xx) - Core functions to handle partitioned per-cpu interrupts - Enhancements to the IPI core - Proper handling of irq type configuration - A large set of ARM GIC enhancements - The usual pile of small fixes, cleanups and enhancements" * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) irqchip/bcm2836: Use a more generic memory barrier call irqchip/bcm2836: Fix compiler warning on 64-bit build irqchip/bcm2836: Drop smp_set_ops on arm64 builds irqchip/gic: Add helper functions for GIC setup and teardown irqchip/gic: Store GIC configuration parameters irqchip/gic: Pass GIC pointer to save/restore functions irqchip/gic: Return an error if GIC initialisation fails irqchip/gic: Remove static irq_chip definition for eoimode1 irqchip/gic: Don't initialise chip if mapping IO space fails irqchip/gic: WARN if setting the interrupt type for a PPI fails irqchip/gic: Don't unnecessarily write the IRQ configuration irqchip: Mask the non-type/sense bits when translating an IRQ genirq: Ensure IRQ descriptor is valid when setting-up the IRQ irqchip/gic-v3: Configure all interrupts as non-secure Group-1 irqchip/gic-v2m: Add workaround for Broadcom NS2 GICv2m erratum irqchip/irq-alpine-msi: Don't use <asm-generic/msi.h> irqchip/mbigen: Checking for IS_ERR() instead of NULL irqchip/gic-v3: Remove inexistant register definition irqchip/gicv3-its: Don't allow devices whose ID is outside range irqchip: Add LPC32xx interrupt controller driver ... |
||
Linus Torvalds
|
91e8d0cbc9 |
Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A rather small set of patches from the timer department:

   - Some more y2038 work
   - Yet another new clocksource driver
   - The usual set of small fixes, cleanups and enhancements"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/tegra: Remove unused suspend/resume code
  clockevents/driversi/mps2: add MPS2 Timer driver
  dt-bindings: document the MPS2 timer bindings
  clocksource/drivers/mtk_timer: Add __init attribute
  clockevents/drivers/dw_apb_timer: Implement ->set_state_oneshot_stopped()
  time: Introduce do_sys_settimeofday64()
  security: Introduce security_settime64()
  clocksource: Add missing include of of.h. |
||
Linus Torvalds
|
2fe2edf85f |
Merge tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing ring-buffer fixes from Steven Rostedt:
 "Hao Qin reported an integer overflow possibility with signed and unsigned numbers in the ring-buffer code.

    https://bugzilla.kernel.org/show_bug.cgi?id=118001

  At first I did not think this was too much of an issue, because the overflow would be caught later when either too much data was allocated or it would trigger RB_WARN_ON() which shuts down the ring buffer.

  But looking closer into it, I found that the right settings could bypass the checks and crash the kernel. Luckily, this is only accessible by root.

  The first fix is to convert all the variables into long, such that we don't get into issues between 32 bit variables being assigned 64 bit ones. This fixes the RB_WARN_ON() triggering.

  The next fix is to get rid of a duplicate DIV_ROUND_UP() that, when called twice with the right value, can cause a kernel crash.

  The first DIV_ROUND_UP() is to normalize the input and it is checked against the minimum allowable value. But then DIV_ROUND_UP() is called again, which can overflow due to the (a + b - 1)/b logic. The first call upped the value, the second can overflow (with the +b part).

  The second call to DIV_ROUND_UP() came in via a second change a while ago and the code is cleaned up to remove it"

* tag 'trace-fixes-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  ring-buffer: Prevent overflow of size in ring_buffer_resize()
  ring-buffer: Use long for nr_pages to avoid overflow failures |
||
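The overflow described in the ring-buffer pull message comes from the `(a + b - 1)/b` rounding idiom: when `a` is already close to the top of its type, the `+ b - 1` term wraps before the division happens. The sketch below illustrates only that mechanism in userspace C with unsigned arithmetic, where wraparound is well defined. `PAGE_SZ`, `pages_naive()` and `pages_checked()` are hypothetical names for illustration and are not taken from `ring_buffer_resize()`; according to the message above, the kernel fix removed the duplicate call and widened the variables rather than adding a guard like this one.

```c
#include <limits.h>
#include <stdbool.h>

/* Same shape as the rounding idiom quoted above: (n + d - 1) / d. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical page size, chosen only for the illustration. */
#define PAGE_SZ 4096UL

/*
 * Naive rounding: if size is within PAGE_SZ - 1 of ULONG_MAX, the
 * addition wraps around and the result is far too small.
 */
static unsigned long pages_naive(unsigned long size)
{
    return DIV_ROUND_UP(size, PAGE_SZ);
}

/*
 * Guarded rounding: reject any size for which size + PAGE_SZ - 1
 * would not fit in an unsigned long. Returns false on rejection and
 * leaves *out untouched.
 */
static bool pages_checked(unsigned long size, unsigned long *out)
{
    if (size > ULONG_MAX - (PAGE_SZ - 1))
        return false;
    *out = DIV_ROUND_UP(size, PAGE_SZ);
    return true;
}
```

With these definitions, `pages_naive(ULONG_MAX)` wraps and yields 0 pages, while `pages_checked(ULONG_MAX, &n)` refuses the request. Rounding an already rounded value a second time is what pushes a large but valid input into the wrapping range.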
Jiri Kosina
|
be69f70e63 | Merge branches 'for-4.7/core', 'for-4.7/livepatching-doc' and 'for-4.7/livepatching-ppc64' into for-linus | ||
Al Viro
|
0e0162bb8c |
Merge branch 'ovl-fixes' into for-linus
Backmerge to resolve a conflict in ovl_lookup_real(); "ovl_lookup_real(): use lookup_one_len_unlocked()" instead, but it was too late in the cycle to rebase. |
||
Linus Torvalds
|
d57d394319 |
Merge tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
 "The majority of changes go into the cpufreq subsystem this time.

  To me, quite obviously, the biggest ticket item is the new "schedutil" governor. Interestingly enough, it's the first new cpufreq governor since the beginning of the git era (except for some out-of-the-tree ones).

  There are two main differences between it and the existing governors. First, it uses the information provided by the scheduler directly for making its decisions, so it doesn't have to track anything by itself. Second, it can invoke drivers (supporting that feature) to adjust CPU performance right away without having to spawn work items to be executed in process context or similar. Currently, the acpi-cpufreq driver is the only one supporting that mode of operation, but then it is used on a large number of systems.

  The "schedutil" governor as included here is very simple and mostly regarded as a foundation for future work on the integration of the scheduler with CPU power management (in fact, there is work in progress on top of it already). Nevertheless it works and the preliminary results obtained with it are encouraging.

  There also is some consolidation of CPU frequency management for ARM platforms that can add their machine IDs to the new stub dt-platdev driver now and that will take care of creating the requisite platform device for cpufreq-dt, so it is not necessary to do that in platform code any more. Several ARM platforms are switched over to using this generic mechanism.

  In addition to that, the intel_pstate driver is now going to respect CPU frequency limits set by the platform firmware (or a BMC) and provided via the ACPI _PPC object.

  The devfreq subsystem is getting a new "passive" governor for SoC subsystems that will depend on somebody else to manage their voltage rails and its support for Samsung Exynos SoCs is consolidated.

  The rest is support for new hardware (Intel Broxton support in intel_idle for one example), bug fixes, optimizations and cleanups in a number of places.

  Specifics:

   - New cpufreq "schedutil" governor (making decisions based on CPU utilization information provided by the scheduler and capable of switching CPU frequencies right away if the underlying driver supports that) and support for fast frequency switching in the acpi-cpufreq driver (Rafael Wysocki)

   - Consolidation of CPU frequency management on ARM platforms allowing them to get rid of some platform-specific boilerplate code if they are going to use the cpufreq-dt driver (Viresh Kumar, Finley Xiao, Marc Gonzalez)

   - Support for ACPI _PPC and CPU frequency limits in the intel_pstate driver (Srinivas Pandruvada)

   - Fixes and cleanups in the cpufreq core and generic governor code (Rafael Wysocki, Sai Gurrappadi)

   - intel_pstate driver optimizations and cleanups (Rafael Wysocki, Philippe Longepe, Chen Yu, Joe Perches)

   - cpufreq powernv driver fixes and cleanups (Akshay Adiga, Shilpasri Bhat)

   - cpufreq qoriq driver fixes and cleanups (Jia Hongtao)

   - ACPI cpufreq driver cleanups (Viresh Kumar)

   - Assorted cpufreq driver updates (Ashwin Chaugule, Geliang Tang, Javier Martinez Canillas, Paul Gortmaker, Sudeep Holla)

   - Assorted cpufreq fixes and cleanups (Joe Perches, Arnd Bergmann)

   - Fixes and cleanups in the OPP (Operating Performance Points) framework, mostly related to OPP sharing, and reorganization of OF-dependent code in it (Viresh Kumar, Arnd Bergmann, Sudeep Holla)

   - New "passive" governor for devfreq (for SoC subsystems that will rely on someone else for the management of their power resources) and consolidation of devfreq support for Exynos platforms, coding style and typo fixes for devfreq (Chanwoo Choi, MyungJoo Ham)

   - PM core fixes and cleanups, mostly to make it work better with the generic power domains (genpd) framework, and updates for that framework (Ulf Hansson, Thierry Reding, Colin Ian King)

   - Intel Broxton support for the intel_idle driver (Len Brown)

   - cpuidle core optimization and fix (Daniel Lezcano, Dave Gerlach)

   - ARM cpuidle cleanups (Jisheng Zhang)

   - Intel Kabylake support for the RAPL power capping driver (Jacob Pan)

   - AVS (Adaptive Voltage Switching) rockchip-io driver update (Heiko Stuebner)

   - Updates for the cpupower tool (Arjun Sreedharan, Colin Ian King, Mattia Dongili, Thomas Renninger)"

* tag 'pm-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (112 commits)
  intel_pstate: Clean up get_target_pstate_use_performance()
  intel_pstate: Use sample.core_avg_perf in get_avg_pstate()
  intel_pstate: Clarify average performance computation
  intel_pstate: Avoid unnecessary synchronize_sched() during initialization
  cpufreq: schedutil: Make default depend on CONFIG_SMP
  cpufreq: powernv: del_timer_sync when global and local pstate are equal
  cpufreq: powernv: Move smp_call_function_any() out of irq safe block
  intel_pstate: Clean up intel_pstate_get()
  cpufreq: schedutil: Make it depend on CONFIG_SMP
  cpufreq: governor: Fix handling of special cases in dbs_update()
  PM / OPP: Move CONFIG_OF dependent code in a separate file
  cpufreq: intel_pstate: Ignore _PPC processing under HWP
  cpufreq: arm_big_little: use generic OPP functions for {init, free}_opp_table
  PM / OPP: add non-OF versions of dev_pm_opp_{cpumask_, }remove_table
  cpufreq: tango: Use generic platdev driver
  PM / OPP: pass cpumask by reference
  cpufreq: Fix GOV_LIMITS handling for the userspace governor
  cpupower: fix potential memory leak
  PM / devfreq: style/typo fixes
  PM / devfreq: exynos: Add the detailed correlation for Exynos5422 bus
  .. |
||
Arnaldo Carvalho de Melo
|
c85b033496 |
perf core: Separate accounting of contexts and real addresses in a stack trace
The perf_sample->ip_callchain->nr value includes all the entries in the ip_callchain->ip[] array, real addresses and PERF_CONTEXT_{KERNEL,USER,etc}, while what the user expects is that what is in the kernel.perf_event_max_stack sysctl or in the upcoming per event perf_event_attr.sample_max_stack knob be honoured in terms of IP addresses in the stack trace.

So allocate a bunch of extra entries for contexts, and do the accounting via perf_callchain_entry_ctx struct members.

A new sysctl, kernel.perf_event_max_contexts_per_stack is also introduced for investigating possible bugs in the callchain implementation by some arch.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/n/tip-3b4wnqk340c4sg4gwkfdi9yk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
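The accounting split described in the perf commit above can be pictured as two independent budgets sharing one output array: one for real instruction addresses and one for context markers. The following is a simplified userspace model under stated assumptions, not the kernel's code: `struct chain_ctx`, `store_addr()`, `store_context()` and `MAX_ENTRIES` are hypothetical names, and the real `perf_callchain_entry_ctx` layout may differ.

```c
#include <stdint.h>

/* Hypothetical capacity of the output array: addresses plus markers. */
#define MAX_ENTRIES 16

/* Simplified model of a callchain with separate budgets. */
struct chain_ctx {
    uint64_t ip[MAX_ENTRIES];   /* interleaved markers and addresses */
    unsigned int total;         /* entries written to ip[] so far */
    unsigned int nr_addrs;      /* real addresses stored */
    unsigned int nr_contexts;   /* context markers stored */
    unsigned int max_addrs;     /* budget for real addresses */
    unsigned int max_contexts;  /* budget for context markers */
};

/* Store a real address; counts only against the address budget. */
static int store_addr(struct chain_ctx *c, uint64_t ip)
{
    if (c->nr_addrs >= c->max_addrs || c->total >= MAX_ENTRIES)
        return -1;
    c->ip[c->total++] = ip;
    c->nr_addrs++;
    return 0;
}

/* Store a context marker; counts only against the context budget. */
static int store_context(struct chain_ctx *c, uint64_t marker)
{
    if (c->nr_contexts >= c->max_contexts || c->total >= MAX_ENTRIES)
        return -1;
    c->ip[c->total++] = marker;
    c->nr_contexts++;
    return 0;
}
```

Because markers no longer consume the address budget, a caller that asks for N stack frames gets up to N real addresses regardless of how many context switches appear in the trace, which is the user expectation the commit message describes.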
Arnaldo Carvalho de Melo
|
3e4de4ec4c |
perf core: Add perf_callchain_store_context() helper
We need to have different helpers to account for how many contexts we have in the sample and for real addresses, so do it now as a prep patch, to ease review.

Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-q964tnyuqrxw5gld18vizs3c@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |