Many systems build/test up-to-date kernels with older libcs, and
an older glibc (2.17) lacks the definition of SOL_DCCP in
/usr/include/bits/socket.h (it was added in the 4.6 timeframe).
Adding the definition to the test program avoids a compilation
failure that gets in the way of building tools/testing/selftests/net.
The test itself will work once the definition is added; either
skipping due to DCCP not being configured in the kernel under test
or passing, so there are no other more up-to-date glibc dependencies
here it seems beyond that missing definition.
Fixes: 11fb60d108 ("selftests: net: reuseport_addr_any: add DCCP")
Signed-off-by: Alan Maguire <alan.maguire@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware offloading of the NETIF_F_HW_CSUM and NETIF_F_RXCSUM
features requires the use of Transmit Status Blocks before transmit
frame data and Receive Status Blocks before receive frame data to
carry the checksum information.
Unfortunately, these status blocks are currently only enabled when
the NETIF_F_HW_CSUM feature is enabled. As a result NETIF_F_RXCSUM
will not actually be offloaded to the hardware unless both it and
NETIF_F_HW_CSUM are enabled. Fortunately, that is the default
configuration.
This commit addresses this issue by always enabling the use of
status blocks on both transmit and receive frames. Further, it
replaces the use of a dedicated flag within the driver private
data structure with direct use of the netdev features flags.
Fixes: 8101553978 ("net: bcmgenet: use CHECKSUM_COMPLETE for NETIF_F_RXCSUM")
Signed-off-by: Doug Berger <opendmb@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
When the DP83867 PHY is strapped to enable Fast Link Drop (FLD) feature
STRAP_STS2.STRAP_ FLD (reg 0x006F bit 10), the Energy Lost Threshold for
FLD Energy Lost Mode FLD_THR_CFG.ENERGY_LOST_FLD_THR (reg 0x002e bits 2:0)
will be defaulted to 0x2. This may cause the phy link to be unstable. The
new DP83867 DM recommends to always restore ENERGY_LOST_FLD_THR to 0x1.
Hence, restore default value of FLD_THR_CFG.ENERGY_LOST_FLD_THR to 0x1 when
FLD is enabled by bootstrapping as recommended by DM.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Make sure we clean up devicetree related configuration
also when clock init fails.
Fixes: fecd4d7eef ("net: stmmac: dwmac-rk: Add integrated PHY support")
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
As the description before netdev_run_todo, we cannot call free_netdev
before rtnl_unlock, fix it by reorder the code.
This patch is a 1:1 copy of upstream slip.c commit f596c87005
("slip: not call free_netdev before rtnl_unlock in slip_open").
Reported-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Headers ionic_if.h and ionic_regs.h are licensed under three alternative
licenses and the used SPDX-License-Identifier expression makes
./scripts/spdxcheck.py complain:
drivers/net/ethernet/pensando/ionic/ionic_if.h: 1:52 Syntax error: OR
drivers/net/ethernet/pensando/ionic/ionic_regs.h: 1:52 Syntax error: OR
As OR is associative, it is irrelevant if the parentheses are put around
the first or the second OR-expression.
Simply add parentheses to make spdxcheck.py happy.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Acked-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
The port->hsr is used in the hsr_handle_frame(), which is a
callback of rx_handler.
hsr master and slaves are initialized in hsr_add_port().
This function initializes several pointers, which includes port->hsr after
registering rx_handler.
So, in the rx_handler routine, un-initialized pointer would be used.
In order to fix this, pointers should be initialized before
registering rx_handler.
Test commands:
ip netns del left
ip netns del right
modprobe -rv veth
modprobe -rv hsr
killall ping
modprobe hsr
ip netns add left
ip netns add right
ip link add veth0 type veth peer name veth1
ip link add veth2 type veth peer name veth3
ip link add veth4 type veth peer name veth5
ip link set veth1 netns left
ip link set veth3 netns right
ip link set veth4 netns left
ip link set veth5 netns right
ip link set veth0 up
ip link set veth2 up
ip link set veth0 address fc:00:00:00:00:01
ip link set veth2 address fc:00:00:00:00:02
ip netns exec left ip link set veth1 up
ip netns exec left ip link set veth4 up
ip netns exec right ip link set veth3 up
ip netns exec right ip link set veth5 up
ip link add hsr0 type hsr slave1 veth0 slave2 veth2
ip a a 192.168.100.1/24 dev hsr0
ip link set hsr0 up
ip netns exec left ip link add hsr1 type hsr slave1 veth1 slave2 veth4
ip netns exec left ip a a 192.168.100.2/24 dev hsr1
ip netns exec left ip link set hsr1 up
ip netns exec left ip n a 192.168.100.1 dev hsr1 lladdr \
fc:00:00:00:00:01 nud permanent
ip netns exec left ip n r 192.168.100.1 dev hsr1 lladdr \
fc:00:00:00:00:01 nud permanent
for i in {1..100}
do
ip netns exec left ping 192.168.100.1 &
done
ip netns exec left hping3 192.168.100.1 -2 --flood &
ip netns exec right ip link add hsr2 type hsr slave1 veth3 slave2 veth5
ip netns exec right ip a a 192.168.100.3/24 dev hsr2
ip netns exec right ip link set hsr2 up
ip netns exec right ip n a 192.168.100.1 dev hsr2 lladdr \
fc:00:00:00:00:02 nud permanent
ip netns exec right ip n r 192.168.100.1 dev hsr2 lladdr \
fc:00:00:00:00:02 nud permanent
for i in {1..100}
do
ip netns exec right ping 192.168.100.1 &
done
ip netns exec right hping3 192.168.100.1 -2 --flood &
while :
do
ip link add hsr0 type hsr slave1 veth0 slave2 veth2
ip a a 192.168.100.1/24 dev hsr0
ip link set hsr0 up
ip link del hsr0
done
Splat looks like:
[ 120.954938][ C0] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1]I
[ 120.957761][ C0] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037]
[ 120.959064][ C0] CPU: 0 PID: 1511 Comm: hping3 Not tainted 5.6.0-rc5+ #460
[ 120.960054][ C0] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 120.962261][ C0] RIP: 0010:hsr_addr_is_self+0x65/0x2a0 [hsr]
[ 120.963149][ C0] Code: 44 24 18 70 73 2f c0 48 c1 eb 03 48 8d 04 13 c7 00 f1 f1 f1 f1 c7 40 04 00 f2 f2 f2 4
[ 120.966277][ C0] RSP: 0018:ffff8880d9c09af0 EFLAGS: 00010206
[ 120.967293][ C0] RAX: 0000000000000006 RBX: 1ffff1101b38135f RCX: 0000000000000000
[ 120.968516][ C0] RDX: dffffc0000000000 RSI: ffff8880d17cb208 RDI: 0000000000000000
[ 120.969718][ C0] RBP: 0000000000000030 R08: ffffed101b3c0e3c R09: 0000000000000001
[ 120.972203][ C0] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 0000000000000000
[ 120.973379][ C0] R13: ffff8880aaf80100 R14: ffff8880aaf800f2 R15: ffff8880aaf80040
[ 120.974410][ C0] FS: 00007f58e693f740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
[ 120.979794][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 120.980773][ C0] CR2: 00007ffcb8b38f29 CR3: 00000000afe8e001 CR4: 00000000000606f0
[ 120.981945][ C0] Call Trace:
[ 120.982411][ C0] <IRQ>
[ 120.982848][ C0] ? hsr_add_node+0x8c0/0x8c0 [hsr]
[ 120.983522][ C0] ? rcu_read_lock_held+0x90/0xa0
[ 120.984159][ C0] ? rcu_read_lock_sched_held+0xc0/0xc0
[ 120.984944][ C0] hsr_handle_frame+0x1db/0x4e0 [hsr]
[ 120.985597][ C0] ? hsr_nl_nodedown+0x2b0/0x2b0 [hsr]
[ 120.986289][ C0] __netif_receive_skb_core+0x6bf/0x3170
[ 120.992513][ C0] ? check_chain_key+0x236/0x5d0
[ 120.993223][ C0] ? do_xdp_generic+0x1460/0x1460
[ 120.993875][ C0] ? register_lock_class+0x14d0/0x14d0
[ 120.994609][ C0] ? __netif_receive_skb_one_core+0x8d/0x160
[ 120.995377][ C0] __netif_receive_skb_one_core+0x8d/0x160
[ 120.996204][ C0] ? __netif_receive_skb_core+0x3170/0x3170
[ ... ]
Reported-by: syzbot+fcf5dd39282ceb27108d@syzkaller.appspotmail.com
Fixes: c5a7591172 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Luo bin says:
====================
hinic: BugFixes
Fix a number of bugs which have been present since the first commit.
The bugs fixed in these patchs are hardly exposed unless given
very specific conditions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
the minimum value of skb len that hw supports is 32 rather than 17
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
the second input parameter of wait_for_completion_timeout should
be jiffies instead of millisecond
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
add read barrier in driver code to keep from reading other fileds
in dma memory which is writable for hw until we have verified the
memory is valid for driver
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
should disable eq irq before freeing it, must clear event queue
depth in hw before freeing relevant memory to avoid illegal
memory access and update consumer idx to avoid invalid interrupt
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
it's unreliable for fw to check whether IO is stopped, so driver
wait for enough time to ensure IO process is done in hw before
freeing resources
Signed-off-by: Luo bin <luobin9@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 3f8fd02b1b ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in
the vunmap() code-path. While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.
Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.
To avoid the unnecessary work on x86-64 and to gain the performance
back, split up vmalloc_sync_all() into two functions:
* vmalloc_sync_mappings(), and
* vmalloc_sync_unmappings()
Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized. The only exception is the new call-site added in the
above mentioned commit.
Shile Zhang directed us to a report of an 80% regression in reaim
throughput.
Fixes: 3f8fd02b1b ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Reported-by: kernel test robot <oliver.sang@intel.com>
Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> [GHES]
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sachin reports [1] a crash in SLUB __slab_alloc():
BUG: Kernel NULL pointer dereference on read at 0x000073b0
Faulting instruction address: 0xc0000000003d55f4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000
CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
NIP ___slab_alloc+0x1f4/0x760
LR __slab_alloc+0x34/0x60
Call Trace:
___slab_alloc+0x334/0x760 (unreliable)
__slab_alloc+0x34/0x60
__kmalloc_node+0x110/0x490
kvmalloc_node+0x58/0x110
mem_cgroup_css_online+0x108/0x270
online_css+0x48/0xd0
cgroup_apply_control_enable+0x2ec/0x4d0
cgroup_mkdir+0x228/0x5f0
kernfs_iop_mkdir+0x90/0xf0
vfs_mkdir+0x110/0x230
do_mkdirat+0xb0/0x1a0
system_call+0x5c/0x68
This is a PowerPC platform with following NUMA topology:
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node 0 1
0: 10 40
1: 40 10
possible numa nodes: 0-31
This only happens with a mmotm patch "mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node" [2] which effectively calls
kmalloc_node for each possible node. SLUB however only allocates
kmem_cache_node on online N_NORMAL_MEMORY nodes, and relies on
node_to_mem_node to return such valid node for other nodes since commit
a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating
on memoryless node"). This is however not true in this configuration
where the _node_numa_mem_ array is not initialized for nodes 0 and 2-31,
thus it contains zeroes and get_partial() ends up accessing
non-allocated kmem_cache_node.
A related issue was reported by Bharata (originally by Ramachandran) [3]
where a similar PowerPC configuration, but with mainline kernel without
patch [2] ends up allocating large amounts of pages by kmalloc-1k
kmalloc-512. This seems to have the same underlying issue with
node_to_mem_node() not behaving as expected, and might probably also
lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL [4].
This patch should fix both issues by not relying on node_to_mem_node()
anymore and instead simply falling back to NUMA_NO_NODE, when
kmalloc_node(node) is attempted for a node that's not online, or has no
usable memory. The "usable memory" condition is also changed from
node_present_pages() to N_NORMAL_MEMORY node state, as that is exactly
the condition that SLUB uses to allocate kmem_cache_node structures.
The check in get_partial() is removed completely, as the checks in
___slab_alloc() are now sufficient to prevent get_partial() being
reached with an invalid node.
[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/
Fixes: a561ce00b0 ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Bharata B Rao <bharata@linux.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is safe to traverse mm->notifier_subscriptions->list either under
SRCU read lock or mm->notifier_subscriptions->lock using
hlist_for_each_entry_rcu(). Silence the PROVE_RCU_LIST false positives,
for example,
WARNING: suspicious RCU usage
-----------------------------
mm/mmu_notifier.c:484 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
3 locks held by libvirtd/802:
#0: ffff9321e3f58148 (&mm->mmap_sem#2){++++}, at: do_mprotect_pkey+0xe1/0x3e0
#1: ffffffff91ae6160 (mmu_notifier_invalidate_range_start){+.+.}, at: change_p4d_range+0x5fa/0x800
#2: ffffffff91ae6e08 (srcu){....}, at: __mmu_notifier_invalidate_range_start+0x178/0x460
stack backtrace:
CPU: 7 PID: 802 Comm: libvirtd Tainted: G I 5.6.0-rc6-next-20200317+ #2
Hardware name: HP ProLiant BL460c Gen8, BIOS I31 11/02/2014
Call Trace:
dump_stack+0xa4/0xfe
lockdep_rcu_suspicious+0xeb/0xf5
__mmu_notifier_invalidate_range_start+0x3ff/0x460
change_p4d_range+0x746/0x800
change_protection+0x1df/0x300
mprotect_fixup+0x245/0x3e0
do_mprotect_pkey+0x23b/0x3e0
__x64_sys_mprotect+0x51/0x70
do_syscall_64+0x91/0xae8
entry_SYSCALL_64_after_hwframe+0x49/0xb3
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Link: http://lkml.kernel.org/r/20200317175640.2047-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes possible lost wakeup introduced by commit a218cc4914.
Originally modifications to ep->wq were serialized by ep->wq.lock, but
in commit a218cc4914 ("epoll: use rwlock in order to reduce
ep_poll_callback() contention") a new rw lock was introduced in order to
relax fd event path, i.e. callers of ep_poll_callback() function.
After the change ep_modify and ep_insert (both are called on epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was using
ep->wq.lock on wqueue list modification.
The bug doesn't lead to any wqueue list corruptions, because wake up
path and list modifications were serialized by ep->wq.lock internally,
but actual waitqueue_active() check prior wake_up() call can be
reordered with modifications of ep ready list, thus wake up can be lost.
And yes, can be healed by explicit smp_mb():
list_add_tail(&epi->rdlink, &ep->rdllist);
smp_mb();
if (waitqueue_active(&ep->wq))
wake_up(&ep->wp);
But let's make it simple, thus current patch replaces ep->wq.lock with
the ep->lock for wqueue modifications, thus wake up path always observes
activeness of the wqueue correcty.
Fixes: a218cc4914 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
Reported-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Tested-by: Max Neunhoeffer <max@arangodb.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff@clearpool.io>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Jes Sorensen <jes.sorensen@gmail.com>
Cc: <stable@vger.kernel.org> [5.1+]
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Bisected-by: Max Neunhoeffer <max@arangodb.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jann has brought up a very interesting point [1]. While shared pages
are excluded from MADV_PAGEOUT normally, CoW pages can be easily
reclaimed that way. This can lead to all sorts of hard to debug
problems. E.g. performance problems outlined by Daniel [2].
There are runtime environments where there is a substantial memory
shared among security domains via CoW memory and a easy to reclaim way
of that memory, which MADV_{COLD,PAGEOUT} offers, can lead to either
performance degradation in for the parent process which might be more
privileged or even open side channel attacks.
The feasibility of the latter is not really clear to me TBH but there is
no real reason for exposure at this stage. It seems there is no real
use case to depend on reclaiming CoW memory via madvise at this stage so
it is much easier to simply disallow it and this is what this patch
does. Put it simply MADV_{PAGEOUT,COLD} can operate only on the
exclusively owned memory which is a straightforward semantic.
[1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com
[2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com
Fixes: 9c276cc65a ("mm: introduce MADV_COLD")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Daniel Colascione <dancol@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage. However, it's possible that we are being
reclaimed as a result of hitting an ancestor memory.high and should be
penalised based on that, instead.
This patch changes memory.high overage throttling to use the largest
overage in its ancestors when considering how many penalty jiffies to
charge. This makes sure that we penalise poorly behaving cgroups in the
same way regardless of at what level of the hierarchy memory.high was
breached.
Fixes: 0e4b01df86 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org> [5.4.x+]
Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 0e4b01df86 had a bunch of fixups to use the right division
method. However, it seems that after all that it still wasn't right --
div_u64 takes a 32-bit divisor.
The headroom is still large (2^32 pages), so on mundane systems you
won't hit this, but this should definitely be fixed.
Fixes: 0e4b01df86 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Down <chris@chrisdown.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Nathan Chancellor <natechancellor@gmail.com>
Cc: <stable@vger.kernel.org> [5.4.x+]
Link: http://lkml.kernel.org/r/80780887060514967d414b3cd91f9a316a16ab98.1584036142.git.chris@chrisdown.name
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In section_deactivate(), pfn_to_page() doesn't work any more after
ms->section_mem_map is resetting to NULL in SPARSEMEM|!VMEMMAP case. It
causes a hot remove failure:
kernel BUG at mm/page_alloc.c:4806!
invalid opcode: 0000 [#1] SMP PTI
CPU: 3 PID: 8 Comm: kworker/u16:0 Tainted: G W 5.5.0-next-20200205+ #340
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
RIP: 0010:free_pages+0x85/0xa0
Call Trace:
__remove_pages+0x99/0xc0
arch_remove_memory+0x23/0x4d
try_remove_memory+0xc8/0x130
__remove_memory+0xa/0x11
acpi_memory_device_remove+0x72/0x100
acpi_bus_trim+0x55/0x90
acpi_device_hotplug+0x2eb/0x3d0
acpi_hotplug_work_fn+0x1a/0x30
process_one_work+0x1a7/0x370
worker_thread+0x30/0x380
kthread+0x112/0x130
ret_from_fork+0x35/0x40
Let's move the ->section_mem_map resetting after
depopulate_section_memmap() to fix it.
[akpm@linux-foundation.org: remove unneeded initialization, per David]
Fixes: ba72b4c8cf ("mm/sparsemem: support sub-section hotplug")
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Wei Yang <richardw.yang@linux.intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200307084229.28251-2-bhe@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An eventfd monitors multiple memory thresholds of the cgroup, closes them,
the kernel deletes all events related to this eventfd. Before all events
are deleted, another eventfd monitors the memory threshold of this cgroup,
leading to a crash:
BUG: kernel NULL pointer dereference, address: 0000000000000004
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
Oops: 0002 [#1] SMP PTI
CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
Workqueue: events memcg_event_remove
RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
Call Trace:
memcg_event_remove+0x32/0x90
process_one_work+0x172/0x380
worker_thread+0x49/0x3f0
kthread+0xf8/0x130
ret_from_fork+0x35/0x40
CR2: 0000000000000004
We can reproduce this problem in the following ways:
1. We create a new cgroup subdirectory and a new eventfd, and then we
monitor multiple memory thresholds of the cgroup through this eventfd.
2. closing this eventfd, and __mem_cgroup_usage_unregister_event ()
will be called multiple times to delete all events related to this
eventfd.
The first time __mem_cgroup_usage_unregister_event() is called, the
kernel will clear all items related to this eventfd in thresholds->
primary.
Since there is currently only one eventfd, thresholds-> primary becomes
empty, so the kernel will set thresholds-> primary and hresholds-> spare
to NULL. If at this time, the user creates a new eventfd and monitor
the memory threshold of this cgroup, kernel will re-initialize
thresholds-> primary.
Then when __mem_cgroup_usage_unregister_event () is called for the
second time, because thresholds-> primary is not empty, the system will
access thresholds-> spare, but thresholds-> spare is NULL, which will
trigger a crash.
In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.
The solution is to check whether the thresholds associated with the
eventfd has been cleared when deleting the event. If so, we do nothing.
[akpm@linux-foundation.org: fix comment, per Kirill]
Fixes: 907860ed38 ("cgroups: make cftype.unregister_event() void-returning")
Signed-off-by: Chunguang Xu <brookxu@tencent.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl51dZoQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpt1GD/4qD5KahkC9cdRRcliGoYrY5CLJbEx+Hwlu
QNJKggGJKMs3f4KqD1JwGsZCppD9bPTD2qgjs8Hjkw6HCOabXx8elQBKwyhjVglu
5ogv981fmbzPau3n5gQ1llqjF6aslKpSkA9arQPgKov7gIoa2U3Gc1ZIO5Mz/X/T
J0Z4TqmjMhbjAyP9BqfVIyQyDR9WGvO4U/9XmTclKU4Rex7lT5JRiGVF0ZpqfXDQ
pkJOaqsltZVXN0J6Uy2e0qL5nkWIFhfrqjvoBG2V/ivt9zOfiPzmt9DLNl3S/QyU
TYtNvAg6wuw/DBOdDsLoHztQUWbBqUMhn6892ADc6786TwFZv0/Ytv+CqL2mxbYy
wImli5cnowWNevkNFpm3RLAMA0Oi8NiULb31AmRP23OyGSPB51JWhpnlqEylRnz2
aa8KkA+VHiuaMnsII6Caq6tXsXyNfoDPGYvy5vCIyzZXPTvvH4i7rPr0QYb6VjAa
v89mGE+Nx/eiC9FFEzPCXU5tgA6AMiMzoqXodRoSUSl7Lm+iGm4pPhUX4EIia1NG
6Jc1A4cOQTjM8mptJKPIEbAovHsVMRUext5pinQYtMB58R16ZQfpdzQY1ojJUllk
u2nWPyGswifMfpJ7daMhhYx/z6yQNy/MOqEPG8dNyc96R6OJoFGe+Wp41/9oP4Ly
OjnPSFygKA==
=q5rs
-----END PGP SIGNATURE-----
Merge tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Just two NVMe fabrics fixes that should go into 5.6"
* tag 'block-5.6-20200320' of git://git.kernel.dk/linux-block:
nvmet-tcp: set MSG_MORE only if we actually have more to send
nvme-rdma: Avoid double freeing of async event data
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl51dbQQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpiV3EADJHB2r2hTTEym5u1PbrEEVkjvdL6InU8lD
lFM7m2g6yZUncwm+aSZynHqAFY6Rd5Jk+gmYMuioi3ZxC2rs7jG1AOTpaeJYmhle
lzkjqSLtl+gdPMA9ydivk1UwILFjtZKG1JNc++tnCn3q7+eCkgnWAlq5b7idG2eF
BS0AEZP6Yz1zStTHLbHSB0StY8ovMIw0VaVQvguHLL9EBpbHmrs0cq3tipWkAyPR
2YwnXbxsJySukkwmBKxEWrGUYDze56jqJIqdFsOE0+WtGV+nk7OScPseXAaP4/+G
Vl23VNfryuZcsBUwI9tY1SzCFEXIwdXVGpCAYwQ/kU5WfvFpYaei+fXVNnL4kjR0
PfpA6XnMsZ3DzqgepmUd92sAA56ZtBxuGjqcSYlg/JwjvUHdpaZDkE2WLqkAMeUN
8A7cUw+R6XWQ2/y6ob7QvKiT/ZDR8GrYUl3EdGE3LhB1ZsvLXJDZpWipwQBzuk9R
vJJOkGst38rjsWnb+nfeLh3AsgjF14wo+2vQL4mKs24xKTIvadHsFAZjKLXZ93Wf
Vn58FaPOYIkjBidYLWb3dlO1ZR8S0803gohLkLV6adH8bCNCWxGTOR51DZLomAsb
nAUCEAJaZrOqaQAuJAFNNpS8+/da3AIF4HVd2EdZ1yFXU15y0+zIxtROjKzg+OxO
M3jC/Aet1Q==
=IMcu
-----END PGP SIGNATURE-----
Merge tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"Two different fixes in here:
- Fix for a potential NULL pointer deref for links with async or
drain marked (Pavel)
- Fix for not properly checking RLIMIT_NOFILE for async punted
operations.
This affects openat/openat2, which were added this cycle, and
accept4. I did a full audit of other cases where we might check
current->signal->rlim[] and found only RLIMIT_FSIZE for buffered
writes and fallocate. That one is fixed and queued for 5.7 and
marked stable"
* tag 'io_uring-5.6-20200320' of git://git.kernel.dk/linux-block:
io_uring: make sure accept honor rlimit nofile
io_uring: make sure openat/openat2 honor rlimit nofile
io_uring: NULL-deref for IOSQE_{ASYNC,DRAIN}
Pull turbostat updates from Len Brown:
"Update to turbostat v20.03.20.
These patches unlock the full turbostat features for some new
machines, plus a couple other minor tweaks"
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: update version
tools/power turbostat: Print cpuidle information
tools/power turbostat: Fix 32-bit capabilities warning
tools/power turbostat: Fix missing SYS_LPI counter on some Chromebooks
tools/power turbostat: Support Elkhart Lake
tools/power turbostat: Support Jasper Lake
tools/power turbostat: Support Ice Lake server
tools/power turbostat: Support Tiger Lake
tools/power turbostat: Fix gcc build warnings
tools/power turbostat: Support Cometlake
The handling of notify->work did not properly maintain notify->kref in two
cases:
1) where the work was already scheduled, another irq_set_affinity_locked()
would get the ref and (no-op-ly) schedule the work. Thus when
irq_affinity_notify() ran, it would drop the original ref but not the
additional one.
2) when cancelling the (old) work in irq_set_affinity_notifier(), if there
was outstanding work a ref had been got for it but was never put.
Fix both by checking the return values of the work handling functions
(schedule_work() for (1) and cancel_work_sync() for (2)) and put the
extra ref if the return value indicates preexisting work.
Fixes: cd7eab44e9 ("genirq: Add IRQ affinity notifiers")
Fixes: 59c39840f5 ("genirq: Prevent use-after-free and work list corruption")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ben Hutchings <ben@decadent.org.uk>
Link: https://lkml.kernel.org/r/24f5983f-2ab5-e83a-44ee-a45b5f9300f5@solarflare.com
Two fixes for bugs introduced this cycle.
One for a crash when shutting down a KVM PR guest (our original style of KVM
which doesn't use hypervisor mode).
And one fix for the recently added 32-bit KASAN_VMALLOC support.
Thanks to:
Christophe Leroy, Greg Kurz, Sean Christopherson.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAl51/+oTHG1wZUBlbGxl
cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLuRD/0fZ4kTt+dyEMt7udP8ocBBNy4uRsNr
mC6nxxSIQ9XhsD/oJJ+wey8c/+/FUYpqAEg/IlWIZg4xeBks7imESIgbA9JSQFp5
pAObxc2AiZ97Fr8OuNrGcsakl2nQwX2tge8pNWJWN6Ukh+CsdbN9evpxFDe2u9h2
acpWFO/KBCm6CD1fd9y94pBQTKRCNGXpMX32SO1dn8UB+ApesUS3YWs209sjETpw
koKY7ZE9rd2YVNuaQp5fO/C/jdBIh+dzHbmeooNo4BFJAys6RNrxEbWqjwoy8es7
OhwmtDn3aWo6HvB5SvWKKvSFOvIyXzcnAV6spCkyk0vhTckEY8tEvjgKAhTNP9nA
Gu7VHx/uEkN+GIW76SyK7XAEUgKwMqfRNbWk8YpI1dNj7zW8Z/3kh/PhKja7lnj6
C77i6FXu0xkOd1+ZL73JNof98EfdrVl2mZGDvv3bCY+ME9zkYjMUj6uKzZ5+OJvq
ALJXMvegQM1Ma8mx3Ah5HwcPqUmIhNNHhCFXtzP9Gcrg4dNNGOCzGbyomcAA6gDb
rzzhe1ErI5HPZ524cMZFHjf0d04X9AOeVk2TyyL0tWgl/gNVaYSHKdaECc2xy3/Q
Pu8T8CvGBhR2uHb74DLxZPtOUhQYdMU7lvJCq9IKLfRWi5E/rsDAVoVJu46vG3QP
cIsl2hgW/7eW0g==
=JOM7
-----END PGP SIGNATURE-----
Merge tag 'powerpc-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Two fixes for bugs introduced this cycle:
- fix a crash when shutting down a KVM PR guest (our original style
of KVM which doesn't use hypervisor mode)
- fix for the recently added 32-bit KASAN_VMALLOC support
Thanks to: Christophe Leroy, Greg Kurz, Sean Christopherson"
* tag 'powerpc-5.6-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Fix kernel crash with PR KVM
powerpc/kasan: Fix shadow memory protection with CONFIG_KASAN_VMALLOC
In rare cases retransmit logic will make a full skb copy, which will not
trigger the zeroing added in recent change
b738a185be ("tcp: ensure skb->dev is NULL before leaving TCP stack").
Cc: Eric Dumazet <edumazet@google.com>
Fixes: 75c119afe1 ("tcp: implement rb-tree based retransmit queue")
Fixes: 28f8bfd1ac ("netfilter: Support iif matches in POSTROUTING")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
The following patchset contains Netfilter fixes for net:
1) Refetch IP header pointer after pskb_may_pull() in flowtable,
from Haishuang Yan.
2) Fix memleak in flowtable offload in nf_flow_table_free(),
from Paul Blakey.
3) Set control.addr_type mask in flowtable offload, from Edward Cree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Pull NVMe fixes from Keith:
"Two late nvme fabrics fixes for 5.6: a double free with the rdma
transport, and a regression fix for tcp; please pull."
* 'nvme-5.6-rc6' of git://git.infradead.org/nvme:
nvmet-tcp: set MSG_MORE only if we actually have more to send
nvme-rdma: Avoid double freeing of async event data
We are incorrectly dropping the raid56 and raid1c34 incompat flags when
there are still raid56 and raid1c34 block groups, not when we do not any
of those anymore. The logic just got unintentionally broken after adding
the support for the raid1c34 modes.
Fix this by clear the flags only if we do not have block groups with the
respective profiles.
Fixes: 9c907446dc ("btrfs: drop incompat bit for raid1c34 after last block group is gone")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Trying to initialize a structure with "= {};" will not always clean out
all padding locations in a structure. So be explicit and call memset to
initialize everything for a number of bpf information structures that
are then copied from userspace, sometimes from smaller memory locations
than the size of the structure.
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200320162258.GA794295@kroah.com
For the bpf syscall, we are relying on the compiler to properly zero out
the bpf_attr union that we copy userspace data into. Unfortunately that
doesn't always work properly, padding and other oddities might not be
correctly zeroed, and in some tests odd things have been found when the
stack is pre-initialized to other values.
Fix this by explicitly memsetting the structure to 0 before using it.
Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Alexander Potapenko <glider@google.com>
Reported-by: Alistair Delva <adelva@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://android-review.googlesource.com/c/kernel/common/+/1235490
Link: https://lore.kernel.org/bpf/20200320094813.GA421650@kroah.com
When we send PDU data, we want to optimize the tcp stack
operation if we have more data to send. So when we set MSG_MORE
when:
- We have more fragments coming in the batch, or
- We have a more data to send in this PDU
- We don't have a data digest trailer
- We optimize with the SUCCESS flag and omit the NVMe completion
(used if sq_head pointer update is disabled)
This addresses a regression in QD=1 with SUCCESS flag optimization
as we unconditionally set MSG_MORE when we didn't actually have
more data to send.
Fixes: 7058329538 ("nvmet-tcp: implement C2HData SUCCESS optimization")
Reported-by: Mark Wunderlich <mark.wunderlich@intel.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Userspace has no way to query if SEV has been disabled with the
sev module parameter of kvm-amd.ko. Actually it has one, but it
is a hack: do ioctl(KVM_MEM_ENCRYPT_OP, NULL) and check if it
returns EFAULT. Make it a little nicer by returning zero for
SEV enabled and NULL argument, and while at it document the
ioctl arguments.
Cc: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
The WARN_ON is essentially comparing a user-provided value with 0. It is
trivial to trigger it just by passing garbage to KVM_SET_CLOCK. Guests
can break if you do so, but the same applies to every KVM_SET_* ioctl.
So, if it hurts when you do like this, just do not do it.
Reported-by: syzbot+00be5da1d75f1cc95f6b@syzkaller.appspotmail.com
Fixes: 9446e6fce0 ("KVM: x86: fix WARN_ON check of an unsigned less than zero")
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- Fix panic() when it occurs during secondary CPU startup
- Fix "kpti=off" when KASLR is enabled
- Fix howler in compat syscall table for vDSO clock_getres() fallback
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAl5zxtcQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNG/BB/9BVSLbqBdm6Op14J9zi3S8Qs7udcbo6dAr
vBkBvIl6JK4e284DSoPdCQoXp4QgExm6QEYzl2EjBYMqKCmCzng4w14ctm9FnCry
W8LNKRBaKyml7nDdT2UH1PnKB+Nh6ufv1PZQttN2e664bUl28pqC7MgJ3meJAjj8
a+lVRxIOVFKD5AwV1jfbS1Byx/w8n9Lo/C4wbswFrbHdq6puTuEZbtJiYbkxfqa3
wMXwNeIj1Xh2yVgz2gC02QLuTtLqJlPelhGHYec1hTQkmaSeNy0WvQr6t3oc6c5T
Bngzv7dM5lwXxjT82AqQhSpBUAp+MjYxnWW+hRpy+2BIEnbgGnDR
=w1Z9
-----END PGP SIGNATURE-----
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
- Fix panic() when it occurs during secondary CPU startup
- Fix "kpti=off" when KASLR is enabled
- Fix howler in compat syscall table for vDSO clock_getres() fallback
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: compat: Fix syscall number of compat_clock_getres
arm64: kpti: Fix "kpti=off" when KASLR is enabled
arm64: smp: fix crash_smp_send_stop() behaviour
arm64: smp: fix smp_send_stop() behaviour
Here are some small different driver fixes for 5.6-rc7
- binderfs fix, yet again
- slimbus new device id added
- hwtracing bugfixes for reported issues and a new device id
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTQmw8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymASwCeLhlhgP/F9W7WqG1hWzJ0Dq8mkEUAn3W7GZs6
0FrrI5ksBmjd0kTX5dbt
=vMc6
-----END PGP SIGNATURE-----
Merge tag 'char-misc-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are some small different driver fixes for 5.6-rc7:
- binderfs fix, yet again
- slimbus new device id added
- hwtracing bugfixes for reported issues and a new device id
All of these have been in linux-next with no reported issues"
* tag 'char-misc-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
intel_th: pci: Add Elkhart Lake CPU support
intel_th: Fix user-visible error codes
intel_th: msu: Fix the unexpected state warning
stm class: sys-t: Fix the use of time_after()
slimbus: ngd: add v2.1.0 compatible
binderfs: use refcount for binder control devices too
Here are a number of small staging and IIO driver fixes for 5.6-rc7
Nothing major here, just resolutions for some reported problems:
- iio bugfixes for a number of different drivers
- greybus loopback_test fixes
- wfx driver fixes
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTRmQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykkCwCfS8zLibsZtuv655yAK17C/6YghPQAoJ3iSew+
EkzEDWQ9j2NLE+lpWtKQ
=fZ2X
-----END PGP SIGNATURE-----
Merge tag 'staging-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are a number of small staging and IIO driver fixes for 5.6-rc7
Nothing major here, just resolutions for some reported problems:
- iio bugfixes for a number of different drivers
- greybus loopback_test fixes
- wfx driver fixes
All of these have been in linux-next with no reported issues"
* tag 'staging-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
staging: greybus: loopback_test: fix potential path truncations
staging: greybus: loopback_test: fix potential path truncation
staging: greybus: loopback_test: fix poll-mask build breakage
staging: wfx: fix RCU usage between hif_join() and ieee80211_bss_get_ie()
staging: wfx: fix RCU usage in wfx_join_finalize()
staging: wfx: make warning about pending frame less scary
staging: wfx: fix lines ending with a comma instead of a semicolon
staging: wfx: fix warning about freeing in-use mutex during device unregister
staging/speakup: fix get_word non-space look-ahead
iio: ping: set pa_laser_ping_cfg in of_ping_match
iio: chemical: sps30: fix missing triggered buffer dependency
iio: st_sensors: remap SMO8840 to LIS2DH12
iio: light: vcnl4000: update sampling periods for vcnl4040
iio: light: vcnl4000: update sampling periods for vcnl4200
iio: accel: adxl372: Set iio_chan BE
iio: magnetometer: ak8974: Fix negative raw values in sysfs
iio: trigger: stm32-timer: disable master mode when stopping
iio: adc: stm32-dfsdm: fix sleep in atomic context
iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
Here are some small USB fixes for 5.6-rc7. And there's a thunderbolt
driver fix thrown in for good measure as well.
These fixes are:
- new device ids for usb-serial drivers
- thunderbolt error code fix
- xhci driver fixes
- typec fixes
- cdc-acm driver fixes
- chipidea driver fix
- more USB quirks added for devices that need them.
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iGwEABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTSfQ8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yni+QCVEoBjFNoMaMgtQJ5Ogr0dYOLILwCeLRP7Ce0K
GTeo+M+qrTz4DN+Ejqo=
=xWvw
-----END PGP SIGNATURE-----
Merge tag 'usb-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 5.6-rc7. And there's a thunderbolt
driver fix thrown in for good measure as well.
These fixes are:
- new device ids for usb-serial drivers
- thunderbolt error code fix
- xhci driver fixes
- typec fixes
- cdc-acm driver fixes
- chipidea driver fix
- more USB quirks added for devices that need them.
All of these have been in linux-next with no reported issues"
* tag 'usb-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: cdc-acm: fix rounding error in TIOCSSERIAL
USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
usb: chipidea: udc: fix sleeping function called from invalid context
USB: serial: pl2303: add device-id for HP LD381
USB: serial: option: add ME910G1 ECM composition 0x110b
usb: host: xhci-plat: add a shutdown
usb: typec: ucsi: displayport: Fix a potential race during registration
usb: typec: ucsi: displayport: Fix NULL pointer dereference
USB: Disable LPM on WD19's Realtek Hub
usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
xhci: Do not open code __print_symbolic() in xhci trace events
thunderbolt: Fix error code in tb_port_is_width_supported()
Here are 3 small tty_io bugfixes for reported issues that Eric has
resolved for 5.6-rc7
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCXnTR9A8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ylsYACfZM/J2PW6TXpZfCv01feD3Rx/prkAnj+2DCI7
FeV2J/iXPTulT2eItbsJ
=xkaZ
-----END PGP SIGNATURE-----
Merge tag 'tty-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty fixes from Greg KH:
"Here are three small tty_io bugfixes for reported issues that Eric has
resolved for 5.6-rc7
All of these have been in linux-next with no reported issues"
* tag 'tty-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: fix compat TIOCGSERIAL checking wrong function ptr
tty: fix compat TIOCGSERIAL leaking uninitialized memory
tty: drop outdated comments about release_tty() locking
A few fixes covering the issues reported by syzkaller, a couple of
fixes for the MIDI decoding bug, and a few usual HD-audio quirks.
Some of them are about ALSA core stuff, but they are small fixes just
for corner cases, and nothing thrilling.
-----BEGIN PGP SIGNATURE-----
iQJCBAABCAAsFiEEIXTw5fNLNI7mMiVaLtJE4w1nLE8FAl50gKMOHHRpd2FpQHN1
c2UuZGUACgkQLtJE4w1nLE9+vRAAlPwrUSIWAmOtTPTozX9kqJmTspC5FJqXEZZK
2XW3eaaRaYePiRRt2l7cqCiZUDySmZCJF6VivAt3Hx1n4nDRubs+p6JzHtJ+Swqk
vRVZaM7GI5uUcrbVGSGteVGzXsjodaD+p811AdM3hzgzxooa5LOCkKOXGxWUXP05
tsURjTPlRXvmYzm8vlINeQKE9M9aKHt1USthVxRg9iuyXxugFfZVWn4xMCmF4lPH
+Z5B+DGbJfp8FUdYBFjMTew67mm6a760kOwm86bBcCMDbafiVtWb181ZCsnST4YG
4IwRcIr+8XnJhufeOX3tkRJu8+TJvzTVUoJ8d5kr/fNpSBFuM+rB0GcwSANm5k6d
cQKTpkKutDt7F22kk1rRkBqiD+JeCZgcOq+5rM0OcDtgT97rcvb4Ddx/ckppIpeu
w3HSsfuq+tzoJp0EF56ruFUGEnlJcNFAM0wJqiAvnUQQDfowEU3qGb5WiU2Jjgs1
LarcAoTKi3Wz/Sa6eIt+D139YqvgvZ5oEmDMdU9yLWeFvQNLlR5UqqXTGgVwGndm
GL2nanMtK3k/rWaqZHDvmtLRtj/cq5so+7/VGCrAYfp9sFYZt9y4xfmzwRkNveE/
0cDFdE1s1v3dwoZmq+/HY5gIAwn8vYATNlXG+dDDGsdwAGhtE+XoIn+WOFGZh5Dz
ClyqYbI=
=ejlI
-----END PGP SIGNATURE-----
Merge tag 'sound-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A few fixes covering the issues reported by syzkaller, a couple of
fixes for the MIDI decoding bug, and a few usual HD-audio quirks.
Some of them are about ALSA core stuff, but they are small fixes just
for corner cases, and nothing thrilling"
* tag 'sound-5.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Enable the headset of Acer N50-600 with ALC662
ALSA: hda/realtek - Enable headset mic of Acer X2660G with ALC662
ALSA: seq: oss: Fix running status after receiving sysex
ALSA: seq: virmidi: Fix running status after receiving sysex
ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
ALSA: hda/realtek: Fix pop noise on ALC225
ALSA: line6: Fix endless MIDI read loop
ALSA: pcm: oss: Avoid plugin buffer overflow
core:
- fix lease warning
i915:
- Track active elements during dequeue
- Fix failure to handle all MCR ranges
- Revert unnecessary workaround
amdgpu:
- Pageflip fix
- VCN clockgating fixes
- GPR debugfs fix for umr
- GPU reset fix
- eDP fix for MBP
- DCN2.x fix
dw-hdmi:
- fix AVI frame colorimetry
komeda:
- fix compiler warning
bochs:
- downgrade a binding failure to a warning
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJedDFEAAoJEAx081l5xIa+Z8QP/iWnU8skiMi3SPnO/ki/2y1D
ryJMMnnS7tQBW8hJ7UGCakimWq9VBuNEzOM7xYeTkKJ2vu2oT7gJkhlpscnPkPrn
zJgaWZh31PbtnusP+8PdQiG1N3EN/FKnzx9P3WMjDrG85CNm745GRrU8VMUsPCvf
W57QdX1LfQeajkNeBdqpYNDhP5Tu00AMeqq3KuR7je14juui8EOzSKGADkHpKqSw
9UqnPMRrum9CHbSBGAQ3qQcAYA4vWrSxsHQdbwd4jN4+pxDZbtZscvxn28VKRd7k
TR6YnLaXAciDRxe3gvfbEbE1zn72Lrt6R97d5ojV+VDV9sdrkN9eDYwhoYXRRIA0
sQFg29WwBRHMeI+3ucsEDdDKx5hLR+fxD6JcGDv8dZeHzTCWZ7PFFG9YtZVUmkQq
bMXWlvNzm7fQIA9DwfQt0nkDqt/Re6qbqRnd/ayOz/IzthMmIyPfIzA5d5u/b354
GZVOLaD9iimCIWMD3qkwfGgZsn0Ss+jewBQ6navKaY15ADDM0OEF9Oa/RBi97DUC
QGcpHYe6eNaMByef7/GVyDfZU1GmTAa8PepAe9x9ItRQxCICFl9fb19iyXu25ZCx
wpSNFgHKfxMigjuLe3czW3If1SbctVMVl2BDeXh5TMFXUGKXHQZa9uf8AhtuHjj4
F6D1JOZQPBfuoY1xukv+
=odvv
-----END PGP SIGNATURE-----
Merge tag 'drm-fixes-2020-03-20' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Hope you are well hiding out above the garage. A few amdgpu changes
but nothing too major. I've had a wisdom tooth out this week so
haven't been to on top of things, but all seems good.
core:
- fix lease warning
i915:
- Track active elements during dequeue
- Fix failure to handle all MCR ranges
- Revert unnecessary workaround
amdgpu:
- Pageflip fix
- VCN clockgating fixes
- GPR debugfs fix for umr
- GPU reset fix
- eDP fix for MBP
- DCN2.x fix
dw-hdmi:
- fix AVI frame colorimetry
komeda:
- fix compiler warning
bochs:
- downgrade a binding failure to a warning"
* tag 'drm-fixes-2020-03-20' of git://anongit.freedesktop.org/drm/drm:
drm/amd/display: Fix pageflip event race condition for DCN.
drm/amdgpu: fix typo for vcn2.5/jpeg2.5 idle check
drm/amdgpu: fix typo for vcn2/jpeg2 idle check
drm/amdgpu: fix typo for vcn1 idle check
drm/lease: fix WARNING in idr_destroy
drm/i915: Handle all MCR ranges
Revert "drm/i915/tgl: Add extra hdc flush workaround"
drm/i915/execlists: Track active elements during dequeue
drm/bochs: downgrade pci_request_region failure from error to warning
drm/amd/display: Add link_rate quirk for Apple 15" MBP 2017
drm/amdgpu: add fbdev suspend/resume on gpu reset
drm/amd/amdgpu: Fix GPR read from debugfs (v2)
drm/amd/display: fix typos for dcn20_funcs and dcn21_funcs struct
drm/komeda: mark PM functions as __maybe_unused
drm/bridge: dw-hdmi: fix AVI frame colorimetry
Just like commit 4022e7af86, this fixes the fact that
IORING_OP_ACCEPT ends up using get_unused_fd_flags(), which checks
current->signal->rlim[] for limits.
Add an extra argument to __sys_accept4_file() that allows us to pass
in the proper nofile limit, and grab it at request prep time.
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Dmitry reports that a test case shows that io_uring isn't honoring a
modified rlimit nofile setting. get_unused_fd_flags() checks the task
signal->rlimi[] for the limits. As this isn't easily inheritable,
provide a __get_unused_fd_flags() that takes the value instead. Then we
can grab it when the request is prepared (from the original task), and
pass that in when we do the async part part of the open.
Reported-by: Dmitry Kadashev <dkadashev@gmail.com>
Tested-by: Dmitry Kadashev <dkadashev@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Building an arm64 defconfig with clang's integrated assembler, this error
occurs:
<instantiation>:2:2: error: unrecognized instruction mnemonic
_ASM_EXTABLE 9999b, 9f
^
arch/arm64/mm/cache.S:50:1: note: while in macro instantiation
user_alt 9f, "dc cvau, x4", "dc civac, x4", 0
^
While GNU as seems fine with case-sensitive macro instantiations, clang
doesn't, so use the actual macro name (_asm_extable) as in the rest of
the file.
Also checked that the generated assembly matches the GCC output.
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Fixes: 290622efc7 ("arm64: fix "dc cvau" cache operation on errata-affected core")
Link: https://github.com/ClangBuiltLinux/linux/issues/924
Signed-off-by: Ilie Halip <ilie.halip@gmail.com>
Signed-off-by: Will Deacon <will@kernel.org>
skb->rbnode is sharing three skb fields : next, prev, dev
When a packet is sent, TCP keeps the original skb (master)
in a rtx queue, which was converted to rbtree a while back.
__tcp_transmit_skb() is responsible to clone the master skb,
and add the TCP header to the clone before sending it
to network layer.
skb_clone() already clears skb->next and skb->prev, but copies
the master oskb->dev into the clone.
We need to clear skb->dev, otherwise lower layers could interpret
the value as a pointer to a netdev.
This old bug surfaced recently when commit 28f8bfd1ac
("netfilter: Support iif matches in POSTROUTING") was merged.
Before this netfilter commit, skb->dev value was ignored and
changed before reaching dev_queue_xmit()
Fixes: 75c119afe1 ("tcp: implement rb-tree based retransmit queue")
Fixes: 28f8bfd1ac ("netfilter: Support iif matches in POSTROUTING")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Martin Zaharinov <micron10@gmail.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>