When it is to cleanup net namespace, rds_tcp_exit_net() will call
rds_tcp_kill_sock(), if t_sock is NULL, it will not call
rds_conn_destroy(), rds_conn_path_destroy() and rds_tcp_conn_free() to free
connection, and the worker cp_conn_w is not stopped, afterwards the net is freed in
net_drop_ns(); While cp_conn_w rds_connect_worker() will call rds_tcp_conn_path_connect()
and reference 'net' which has already been freed.
In rds_tcp_conn_path_connect(), rds_tcp_set_callbacks() will set t_sock = sock before
sock->ops->connect, but if connect() is failed, it will call
rds_tcp_restore_callbacks() and set t_sock = NULL, if connect is always
failed, rds_connect_worker() will try to reconnect all the time, so
rds_tcp_kill_sock() will never to cancel worker cp_conn_w and free the
connections.
Therefore, the condition !tc->t_sock is not needed if it is going to do
cleanup_net->rds_tcp_exit_net->rds_tcp_kill_sock, because tc->t_sock is always
NULL, and there is on other path to cancel cp_conn_w and free
connection. So this patch is to fix this.
rds_tcp_kill_sock():
...
if (net != c_net || !tc->t_sock)
...
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
==================================================================
BUG: KASAN: use-after-free in inet_create+0xbcc/0xd28
net/ipv4/af_inet.c:340
Read of size 4 at addr ffff8003496a4684 by task kworker/u8:4/3721
CPU: 3 PID: 3721 Comm: kworker/u8:4 Not tainted 5.1.0 #11
Hardware name: linux,dummy-virt (DT)
Workqueue: krdsd rds_connect_worker
Call trace:
dump_backtrace+0x0/0x3c0 arch/arm64/kernel/time.c:53
show_stack+0x28/0x38 arch/arm64/kernel/traps.c:152
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x120/0x188 lib/dump_stack.c:113
print_address_description+0x68/0x278 mm/kasan/report.c:253
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x21c/0x348 mm/kasan/report.c:409
__asan_report_load4_noabort+0x30/0x40 mm/kasan/report.c:429
inet_create+0xbcc/0xd28 net/ipv4/af_inet.c:340
__sock_create+0x4f8/0x770 net/socket.c:1276
sock_create_kern+0x50/0x68 net/socket.c:1322
rds_tcp_conn_path_connect+0x2b4/0x690 net/rds/tcp_connect.c:114
rds_connect_worker+0x108/0x1d0 net/rds/threads.c:175
process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
kthread+0x2f0/0x378 kernel/kthread.c:255
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
Allocated by task 687:
save_stack mm/kasan/kasan.c:448 [inline]
set_track mm/kasan/kasan.c:460 [inline]
kasan_kmalloc+0xd4/0x180 mm/kasan/kasan.c:553
kasan_slab_alloc+0x14/0x20 mm/kasan/kasan.c:490
slab_post_alloc_hook mm/slab.h:444 [inline]
slab_alloc_node mm/slub.c:2705 [inline]
slab_alloc mm/slub.c:2713 [inline]
kmem_cache_alloc+0x14c/0x388 mm/slub.c:2718
kmem_cache_zalloc include/linux/slab.h:697 [inline]
net_alloc net/core/net_namespace.c:384 [inline]
copy_net_ns+0xc4/0x2d0 net/core/net_namespace.c:424
create_new_namespaces+0x300/0x658 kernel/nsproxy.c:107
unshare_nsproxy_namespaces+0xa0/0x198 kernel/nsproxy.c:206
ksys_unshare+0x340/0x628 kernel/fork.c:2577
__do_sys_unshare kernel/fork.c:2645 [inline]
__se_sys_unshare kernel/fork.c:2643 [inline]
__arm64_sys_unshare+0x38/0x58 kernel/fork.c:2643
__invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
invoke_syscall arch/arm64/kernel/syscall.c:47 [inline]
el0_svc_common+0x168/0x390 arch/arm64/kernel/syscall.c:83
el0_svc_handler+0x60/0xd0 arch/arm64/kernel/syscall.c:129
el0_svc+0x8/0xc arch/arm64/kernel/entry.S:960
Freed by task 264:
save_stack mm/kasan/kasan.c:448 [inline]
set_track mm/kasan/kasan.c:460 [inline]
__kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:521
kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:528
slab_free_hook mm/slub.c:1370 [inline]
slab_free_freelist_hook mm/slub.c:1397 [inline]
slab_free mm/slub.c:2952 [inline]
kmem_cache_free+0xb8/0x3a8 mm/slub.c:2968
net_free net/core/net_namespace.c:400 [inline]
net_drop_ns.part.6+0x78/0x90 net/core/net_namespace.c:407
net_drop_ns net/core/net_namespace.c:406 [inline]
cleanup_net+0x53c/0x6d8 net/core/net_namespace.c:569
process_one_work+0x6e8/0x1700 kernel/workqueue.c:2153
worker_thread+0x3b0/0xdd0 kernel/workqueue.c:2296
kthread+0x2f0/0x378 kernel/kthread.c:255
ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1117
The buggy address belongs to the object at ffff8003496a3f80
which belongs to the cache net_namespace of size 7872
The buggy address is located 1796 bytes inside of
7872-byte region [ffff8003496a3f80, ffff8003496a5e40)
The buggy address belongs to the page:
page:ffff7e000d25a800 count:1 mapcount:0 mapping:ffff80036ce4b000
index:0x0 compound_mapcount: 0
flags: 0xffffe0000008100(slab|head)
raw: 0ffffe0000008100 dead000000000100 dead000000000200 ffff80036ce4b000
raw: 0000000000000000 0000000080040004 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff8003496a4580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8003496a4680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8003496a4700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8003496a4780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Fixes: 467fa15356ac("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski says:
====================
nfp: fix retcode and disable netpoll on representors
This series avoids a potential crash on nfp representor devices
when netpoll is in use. If transmitting the frame through underlying
vNIC fails we'd return an error code (by passing on error code from
__dev_queue_xmit()) and cause double free in netpoll code.
Fix the error code and disable netpoll on reprs altogether.
IRQ-safety of locking the queues and calling __dev_queue_xmit()
is questionable.
Big thanks to John Hurley for debugging and narrowing down
the trace log after I gave up! :)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
NFP reprs are software device on top of the PF's vNIC.
The comment above __dev_queue_xmit() sayeth:
When calling this method, interrupts MUST be enabled. This is because
the BH enable code must have IRQs enabled so that it will not deadlock.
For netconsole we can't guarantee IRQ state, let's just
disable netpoll on representors to be on the safe side.
When the initial implementation of NFP reprs was added by the
commit 5de73ee467 ("nfp: general representor implementation")
.ndo_poll_controller was required for netpoll to be enabled.
Fixes: ac3d9dd034 ("netpoll: make ndo_poll_controller() optional")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
dev_queue_xmit() may return error codes as well as netdev_tx_t,
and it always consumes the skb. Make sure we always return a
correct netdev_tx_t value.
Fixes: eadfa4c3be ("nfp: add stats and xmit helpers for representors")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: John Hurley <john.hurley@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some PHYs will use the 2500BaseX PHY_INTERFACE_MODE when being linked
with a partner using 2.5GBaseT.
Since we can't autonegotiate this speed between the MAC and the PHY, we
need to have the proper comphy support enabled, to make sure we can
safely advertise 2.5G and 1G in BaseT and be able to switch between both
corresponding PHY interface modes. This is now possible since comphy
support was added to this driver.
This commit adds the 2500BaseT mode to the list of supported modes when
using 2500BaseX, and was tested on a setup with an Armada385 and a
88E2010 PHY, both with and without the comphy node in the DT.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
net_hash_mix() currently uses kernel address of a struct net,
and is used in many places that could be used to reveal this
address to a patient attacker, thus defeating KASLR, for
the typical case (initial net namespace, &init_net is
not dynamically allocated)
I believe the original implementation tried to avoid spending
too many cycles in this function, but security comes first.
Also provide entropy regardless of CONFIG_NET_NS.
Fixes: 0b4419162a ("netns: introduce the net_hash_mix "salt" for hashes")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Amit Klein <aksecurity@gmail.com>
Reported-by: Benny Pinkas <benny@pinkas.net>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add flow_dissect for qca tagged packet to get the right hash.
Signed-off-by: Xiaofei Shen <xiaofeis@codeaurora.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Vinod Koul <vkoul@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for fine-grain timeout support to conntrack action.
The new OVS_CT_ATTR_TIMEOUT attribute of the conntrack action
specifies a timeout to be associated with this connection.
If no timeout is specified, it acts as is, that is the default
timeout for the connection will be automatically applied.
Example usage:
$ nfct timeout add timeout_1 inet tcp syn_sent 100 established 200
$ ovs-ofctl add-flow br0 in_port=1,ip,tcp,action=ct(commit,timeout=timeout_1)
CC: Pravin Shelar <pshelar@ovn.org>
CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch exports nf_ct_set_timeout() and nf_ct_destroy_timeout().
The two functions are derived from xt_ct_destroy_timeout() and
xt_ct_set_timeout() in xt_CT.c, and moved to nf_conntrack_timeout.c
without any functional change.
It would be useful for other users (i.e. OVS) that utilizes the
finer-grain conntrack timeout feature.
CC: Pablo Neira Ayuso <pablo@netfilter.org>
CC: Pravin Shelar <pshelar@ovn.org>
Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Fixes 2019-03-26
This series contains updates to igb, ixgbe, i40e and fm10k.
Jake fixes an issue with PTP in i40e where a previous commit resulted
in a regression where the driver would interpret small negative
adjustments as large positive additions, resulting in incorrect
behavior.
Arvind Sankar fixes an issue in igb where a previous commit would cause
a warning in the PCI pm core and resulted in pci_pm_runtime_suspend
would not call pci_save_state or pci_finish_runtime_suspend.
Ivan Vecera fixes MDIO bus registration with ixgbe, where the driver was
ignoring errors returned when registering and would leave the pointer in
a NULL state which triggered a BUG when un-registering.
Stefan Assmann fixes the check for Wake-On-LAN for i40e, which only
supports magic packet.
Yue Haibing fixes a potential NULL pointer de-reference in fm10k by
adding a simple check if the value is NULL.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann says:
====================
s390/qeth: updates 2019-03-28
please apply the following patchset to net-next. This reworks the control
IO code in qeth so that we no longer need to poll for cmd completion,
and refactors the IDX setup code to also use this improved IO path.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This converts the IDX code to use qeth_send_control_data(), replacing
a bunch of duplicated IO code and unbounded waits. It also allows the
IDX sequence to benefit from the improved timeout & notify
infrastructure, so that we can eliminate the DOWN -> ACTIVATING -> UP
transition in the channel state machine.
The patch looks rather big, but most of it is a straight-forward
conversion of the old IDX cmd setup & callbacks to the new model.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
To avoid concurrency issues, some parts of the cmd setup are delayed
until qeth_send_control_data() holds the IO channel's irq_pending
"lock". Rather than hard-coding those setup steps for each cmd type,
have the cmd provide a callback. This will make it easier to also issue
IDX commands via qeth_send_control_data().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As trivial cleanup before adding more users to qeth_notify_reply(),
move the setup of reply->rc from the caller into the helper.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Current code makes it look like qeth_send_control_data_cb() is some
sort of default callback for all cmds. But in practice, it is only used
for half of the cmd buffers we issue.
Reduce the confusion by only setting this callback for cmds that
actually want it, and while at it give the callback a name that matches
the established naming scheme.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All callers are running in process context now, so we can safely sleep
in qeth_send_control_data() while waiting for a cmd to complete.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All users of the lock are running in process context now.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The inet6addr_chain is atomic. So instead of starting the cmd IO for
SETIP / DELIP straight from the notifier callback, run it from a
workqueue. This is the last step towards removal of cmd IO completion
polling.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Extract a little helper, so that high-level callers can manipulate the
IP table without worrying about the locking. This will make it easier
to convert the code to a different locking primitive later on.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The L2 and L3 .ndo_set_rx_mode callbacks maintain an address cache
to decide which addresses have changed since the last modeset.
When the card is set offline, qeth_l?_stop_card() drains this cache.
This happens only after 1) the net_device has been detached, and
2) any pending RX modeset has completed. Consequently we can access the
cache lock-free.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
.ndo_set_rx_mode gets called in process context, but while holding the
addr_list spinlock. Which means we currently can't sleep while
re-programming the HW, and need to poll for IO completion. That's bad,
in particular since receiving the cmd response can fail silently and
we're then polling until the timeout hits.
As a first step towards eliminating the IO completion polling, run the
RX modeset from a work element and only take the addr_list lock while
updating the RX mode address cache.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
===================
net: call for phys_port_name into devlink directly if possible
phys_port_name may be assembled by a helper in devlink. It is currently
the case only for mlxsw driver. Benefit from the get_devlink_port ndo
and call into devlink directly from dev_get_phys_port_name(). That saves
the trip to the driver, simplifies the code and makes it similar to
recently introduced ethtool-devlink compat helpers.
Move bnxt, partly nfp and dsa to let devlink core generate the name too.
===================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently if the driver registers devlink port instance, it should set
the devlink port attributes as well. Then the devlink core is able to
obtain physical port name itself, no need for driver to implement
the ndo. Once all drivers will implement devlink port registration,
this ndo should be removed. This warning guides new
drivers to do things as they should be done.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If nn->port is defined it means that devlink_port has been registered
for this port as well. Devlink core is handling the port name
formatting.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since each non-legacy slave has its own devlink port instance
correctly set, rely on devlink core to generate correct phys port name.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order for devlink compat functions to work, implement
ndo_get_devlink_port. Legacy slaves does not have devlink port instances
created for themselves.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rely on the previously introduced fallback and let the core
call devlink in order to get the physical port name.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order for devlink compat functions to work, implement
ndo_get_devlink_port. Legacy slaves does not have devlink port instances
created for themselves.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rely on the previously introduced fallback and let the core call
devlink directly in order to get the physical port name.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In order for devlink compat functions to work, implement
ndo_get_devlink_port.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce devlink_compat_phys_port_name_get() helper that
gets the physical port name for specified netdevice
according to devlink port attributes.
Call this helper from dev_get_phys_port_name()
in case ndo_get_phys_port_name is not defined.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Follow-up patch is going to need a devlink port instance according to
a netdev. Devlink port instance should be always available when devlink
is used. So change the recently introduced ndo_get_devlink to
ndo_get_devlink_port. With that, adjust the wrapper for the only
user to get devlink pointer.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the init/fini flow and register devlink port instance before
netdev. Now it is needed for correct behavior of phys_port_name
generation, but in general it makes sense to register devlink port
first.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Perf fails to parse uncore event alias, for example:
# perf stat -e unc_m_clockticks -a --no-merge sleep 1
event syntax error: 'unc_m_clockticks'
\___ parser error
Current code assumes that the event alias is from one specific PMU.
To find the PMU, perf strcmps the PMU name of event alias with the real
PMU name on the system.
However, the uncore event alias may be from multiple PMUs with common
prefix. The PMU name of uncore event alias is the common prefix.
For example, UNC_M_CLOCKTICKS is clock event for iMC, which include 6
PMUs with the same prefix "uncore_imc" on a skylake server.
The real PMU names on the system for iMC are uncore_imc_0 ...
uncore_imc_5.
The strncmp is used to only check the common prefix for uncore event
alias.
With the patch:
# perf stat -e unc_m_clockticks -a --no-merge sleep 1
Performance counter stats for 'system wide':
723,594,722 unc_m_clockticks [uncore_imc_5]
724,001,954 unc_m_clockticks [uncore_imc_3]
724,042,655 unc_m_clockticks [uncore_imc_1]
724,161,001 unc_m_clockticks [uncore_imc_4]
724,293,713 unc_m_clockticks [uncore_imc_2]
724,340,901 unc_m_clockticks [uncore_imc_0]
1.002090060 seconds time elapsed
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: ea1fa48c05 ("perf stat: Handle different PMU names with common prefix")
Link: http://lkml.kernel.org/r/1552672814-156173-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Unlike python2, python3 strings are not compatible with byte strings.
That results in disassembly not working for the branches reports. Fixup
those places overlooked in the port to python3.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: beda0e725e ("perf script python: Add Python3 support to exported-sql-viewer.py")
Link: http://lkml.kernel.org/r/20190327072826.19168-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
smatch complaint:
arch/mips/sgi-ip27/ip27-irq.c:123 shutdown_bridge_irq()
warn: variable dereferenced before check 'hd' (see line 121)
Fix it by removing local variable and use hd->pin directly.
Fixes: 69a07a41d9 ("MIPS: SGI-IP27: rework HUB interrupts")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
KGDB_call_nmi_hook is called by other cpu through smp call.
MIPS smp call is processed in ipi irq handler and regs is saved in
handle_int.
So kgdb_call_nmi_hook get regs by get_irq_regs and regs will be passed
to kgdb_cpu_enter.
Signed-off-by: Chong Qiao <qiaochong@loongson.cn>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Daniel Thompson <daniel.thompson@linaro.org>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: James Hogan <jhogan@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christophe Leroy <christophe.leroy@c-s.fr>
Cc: linux-mips@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: QiaoChong <qiaochong@loongson.cn>
- Fix THP handling in the presence of pre-existing PTEs
- Honor request for PTE mappings even when THPs are available
- GICv4 performance improvement
- Take the srcu lock when writing to guest-controlled ITS data structures
- Reset the virtual PMU in preemptible context
- Various cleanups
-----BEGIN PGP SIGNATURE-----
iQJJBAABCgAzFiEEn9UcU+C1Yxj9lZw9I9DQutE9ekMFAlycy1MVHG1hcmMuenlu
Z2llckBhcm0uY29tAAoJECPQ0LrRPXpDlJkQAK0mjfSOyZpeYX48XqE2Vnmr0HH2
bfwh63jv2apR4frlT9XiSV4xUEhM8n6mblnSSuXZlykR74ZxzrCb0QKROTe67jAQ
HxJrK+9JuKZE4VTKtpx+JIyr377NosY/cx2E87Agdof/+L2tqfjKAnI6MLsVIuca
H3F6SH2YVhcPcf0+6nKPUF5wacKlsDZfmrq3Hgk1MbgXG5UmFdjBFlcTUOFSumkK
E4hD93VHiQUDKvJL/qRN3p940c0lVOcDxK9Gu1AhmosvdLYEZJgtZeLBvysDH6Xl
xbq5EU5q/Gks/DEQqallaOiocREc3mZCUlF+pC1sLUHalEClEvr47hlXCLHz9RzA
yV0wW7Foo0oZb+zQyswrCZbAhpO7Vuqd00ghSgV6n6hTS2O7t9syiSe3QCvgCkoy
YZ+Eogx66m68bvbBqFrLwGHUlg+XSpw8ZZ5sh6d6iCVjSVp7ptbmxMgxuMQoJgVT
jl7WbXWYdZ2f3gYlKhtN+3IGxFIAUvFeSxFxJYnwWQPEcR6gY/lBCbwGTL5WT96F
K12V6egTau3QrG+hP+DYPwzWNPcC5oNlayOKNPlTuiwzp2PsSdBAMpEN4Kadl8m8
cMvvRhrcpmrTZdLLQATU9O9A75M1si1a9zaLuC1icYZwYeV593BnqvBBRwS9r/1S
ccg/6EkNpWZby3yd
=mpQX
-----END PGP SIGNATURE-----
Merge tag 'kvmarm-fixes-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master
KVM/ARM fixes for 5.1
- Fix THP handling in the presence of pre-existing PTEs
- Honor request for PTE mappings even when THPs are available
- GICv4 performance improvement
- Take the srcu lock when writing to guest-controlled ITS data structures
- Reset the virtual PMU in preemptible context
- Various cleanups
pyside version 1 fails to handle python3 large integers in some cases,
resulting in Qt getting into a never-ending loop. This affects:
samples Table
samples_view Table
All branches Report
Selected branches Report
Add workarounds for those cases.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Fixes: beda0e725e ("perf script python: Add Python3 support to exported-sql-viewer.py")
Link: http://lkml.kernel.org/r/20190327072826.19168-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Since commit 1fb87b8e95 ("perf machine: Don't search for active kernel
start in __machine__create_kernel_maps"), the __machine__create_kernel_maps()
just create a map what start and end are both zero. Though the address will be
updated later, the order of map in the rbtree may be incorrect.
The commit ee05d21791 ("perf machine: Set main kernel end address properly")
fixed the logic in machine__create_kernel_maps(), but it's still wrong in
function machine__process_kernel_mmap_event().
To reproduce this issue, we need an environment which the module address
is before the kernel text segment. I tested it on an aarch64 machine with
kernel 4.19.25:
[root@localhost hulk]# grep _stext /proc/kallsyms
ffff000008081000 T _stext
[root@localhost hulk]# grep _etext /proc/kallsyms
ffff000009780000 R _etext
[root@localhost hulk]# tail /proc/modules
hisi_sas_v2_hw 77824 0 - Live 0xffff00000191d000
nvme_core 126976 7 nvme, Live 0xffff0000018b6000
mdio 20480 1 ixgbe, Live 0xffff0000018ab000
hisi_sas_main 106496 1 hisi_sas_v2_hw, Live 0xffff000001861000
hns_mdio 20480 2 - Live 0xffff000001822000
hnae 28672 3 hns_dsaf,hns_enet_drv, Live 0xffff000001815000
dm_mirror 40960 0 - Live 0xffff000001804000
dm_region_hash 32768 1 dm_mirror, Live 0xffff0000017f5000
dm_log 32768 2 dm_mirror,dm_region_hash, Live 0xffff0000017e7000
dm_mod 315392 17 dm_mirror,dm_log, Live 0xffff000001780000
[root@localhost hulk]#
Before fix:
[root@localhost bin]# perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost bin]# perf buildid-list -i perf.data
4c4e46c971ca935f781e603a09b52a92e8bdfee8 [vdso]
[root@localhost bin]# perf buildid-list -i perf.data -H
0000000000000000000000000000000000000000 /proc/kcore
[root@localhost bin]#
After fix:
[root@localhost tools]# ./perf/perf record sleep 3
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (9 samples) ]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data
28a6c690262896dbd1b5e1011ed81623e6db0610 [kernel.kallsyms]
106c14ce6e4acea3453e484dc604d66666f08a2f [vdso]
[root@localhost tools]# ./perf/perf buildid-list -i perf.data -H
28a6c690262896dbd1b5e1011ed81623e6db0610 /proc/kcore
Signed-off-by: Wei Li <liwei391@huawei.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Kim Phillips <kim.phillips@arm.com>
Cc: Li Bin <huawei.libin@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190228092003.34071-1-liwei391@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes in:
2b57ecd020 ("KVM: PPC: Book3S: Add count cache flush parameters to kvmppc_get_cpu_char()")
That don't cause any changes in the tools.
This silences this perf build warning:
Warning: Kernel ABI header at 'tools/arch/powerpc/include/uapi/asm/kvm.h' differs from latest version at 'arch/powerpc/include/uapi/asm/kvm.h'
diff -u tools/arch/powerpc/include/uapi/asm/kvm.h arch/powerpc/include/uapi/asm/kvm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Link: https://lkml.kernel.org/n/tip-4pb7ywp9536hub2pnj4hu6i4@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To pick up the changes introduced in the following csets:
2b188cc1bb ("Add io_uring IO interface")
edafccee56 ("io_uring: add support for pre-mapped user IO buffers")
3eb39f4793 ("signal: add pidfd_send_signal() syscall")
This makes 'perf trace' to become aware of these new syscalls, so that
one can use them like 'perf trace -e ui_uring*,*signal' to do a system
wide strace-like session looking at those syscalls, for instance.
For example:
# perf trace -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla
Summary of events:
io_uring-cp (383), 1208866 events, 100.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
-------------- ------ -------- ------ ------- ------- ------
io_uring_enter 605780 2955.615 0.000 0.005 33.804 1.94%
openat 4 459.446 0.004 114.861 459.435 100.00%
munmap 4 0.073 0.009 0.018 0.042 44.03%
mmap 10 0.054 0.002 0.005 0.026 43.24%
brk 28 0.038 0.001 0.001 0.003 7.51%
io_uring_setup 1 0.030 0.030 0.030 0.030 0.00%
mprotect 4 0.014 0.002 0.004 0.005 14.32%
close 5 0.012 0.001 0.002 0.004 28.87%
fstat 3 0.006 0.001 0.002 0.003 35.83%
read 4 0.004 0.001 0.001 0.002 13.58%
access 1 0.003 0.003 0.003 0.003 0.00%
lseek 3 0.002 0.001 0.001 0.001 9.00%
arch_prctl 2 0.002 0.001 0.001 0.001 0.69%
execve 1 0.000 0.000 0.000 0.000 0.00%
#
# perf trace -e io_uring* -s io_uring-cp ~acme/isos/RHEL-x86_64-dvd1.iso ~/bla
Summary of events:
io_uring-cp (390), 1191250 events, 100.0%
syscall calls total min avg max stddev
(msec) (msec) (msec) (msec) (%)
-------------- ------ -------- ------ ------ ------ ------
io_uring_enter 597093 2706.060 0.001 0.005 14.761 1.10%
io_uring_setup 1 0.038 0.038 0.038 0.038 0.00%
#
More work needed to make the tools/perf/examples/bpf/augmented_raw_syscalls.c
BPF program to copy the 'struct io_uring_params' arguments to perf's ring
buffer so that 'perf trace' can use the BTF info put in place by pahole's
conversion of the kernel DWARF and then auto-beautify those arguments.
This patch produces the expected change in the generated syscalls table
for x86_64:
--- /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c.before 2019-03-26 13:37:46.679057774 -0300
+++ /tmp/build/perf/arch/x86/include/generated/asm/syscalls_64.c 2019-03-26 13:38:12.755990383 -0300
@@ -334,5 +334,9 @@ static const char *syscalltbl_x86_64[] =
[332] = "statx",
[333] = "io_pgetevents",
[334] = "rseq",
+ [424] = "pidfd_send_signal",
+ [425] = "io_uring_setup",
+ [426] = "io_uring_enter",
+ [427] = "io_uring_register",
};
-#define SYSCALLTBL_x86_64_MAX_ID 334
+#define SYSCALLTBL_x86_64_MAX_ID 427
This silences these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/unistd.h' differs from latest version at 'include/uapi/asm-generic/unistd.h'
diff -u tools/include/uapi/asm-generic/unistd.h include/uapi/asm-generic/unistd.h
Warning: Kernel ABI header at 'tools/perf/arch/x86/entry/syscalls/syscall_64.tbl' differs from latest version at 'arch/x86/entry/syscalls/syscall_64.tbl'
diff -u tools/perf/arch/x86/entry/syscalls/syscall_64.tbl arch/x86/entry/syscalls/syscall_64.tbl
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lkml.kernel.org/n/tip-p0ars3otuc52x5iznf21shhw@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
e46c2e99f6 ("drm/i915: Expose RPCS (SSEU) configuration to userspace (Gen11 only)")
That don't cause changes in the generated perf binaries.
To silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h'
diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://lkml.kernel.org/n/tip-h6bspm1nomjnpr90333rrx7q@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes from:
52f6490940 ("x86: Add TSX Force Abort CPUID/MSR")
That don't cause any changes in the generated perf binaries.
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/n/tip-zv8kw8vnb1zppflncpwfsv2w@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To get the changes in:
ab3948f58f ("mm/memfd: add an F_SEAL_FUTURE_WRITE seal to memfd")
And silence this perf build warning:
Warning: Kernel ABI header at 'tools/include/uapi/linux/fcntl.h' differs from latest version at 'include/uapi/linux/fcntl.h'
diff -u tools/include/uapi/linux/fcntl.h include/uapi/linux/fcntl.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-lvfx5cgf0xzmdi9mcjva1ttl@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To deal with the move of some defines from asm-generic/mmap-common.h to
linux/mman.h done in:
746c9398f5 ("arch: move common mmap flags to linux/mman.h")
The generated mmap_flags array stays the same:
$ tools/perf/trace/beauty/mmap_flags.sh
static const char *mmap_flags[] = {
[ilog2(0x40) + 1] = "32BIT",
[ilog2(0x01) + 1] = "SHARED",
[ilog2(0x02) + 1] = "PRIVATE",
[ilog2(0x10) + 1] = "FIXED",
[ilog2(0x20) + 1] = "ANONYMOUS",
[ilog2(0x100000) + 1] = "FIXED_NOREPLACE",
[ilog2(0x0100) + 1] = "GROWSDOWN",
[ilog2(0x0800) + 1] = "DENYWRITE",
[ilog2(0x1000) + 1] = "EXECUTABLE",
[ilog2(0x2000) + 1] = "LOCKED",
[ilog2(0x4000) + 1] = "NORESERVE",
[ilog2(0x8000) + 1] = "POPULATE",
[ilog2(0x10000) + 1] = "NONBLOCK",
[ilog2(0x20000) + 1] = "STACK",
[ilog2(0x40000) + 1] = "HUGETLB",
[ilog2(0x80000) + 1] = "SYNC",
};
$
And to have the system's sys/mman.h find the definition of MAP_SHARED
and MAP_PRIVATE, make sure they are defined in the tools/ mman-common.h
in a way that keeps it the same as the kernel's, need for keeping the
Android's NDK cross build working.
This silences these perf build warnings:
Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h'
diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h
Warning: Kernel ABI header at 'tools/include/uapi/linux/mman.h' differs from latest version at 'include/uapi/linux/mman.h'
diff -u tools/include/uapi/linux/mman.h include/uapi/linux/mman.h
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/n/tip-h80ycpc6pedg9s5z2rwpy6ws@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>