Commit Graph

244 Commits

Author SHA1 Message Date
Ido Schimmel
a9b5d871ab netdevsim: Only use sampling truncation length when valid
When the sampling truncation length is invalid (zero), pass the length
of the packet. Without the fix, no payload is reported to user space
when the truncation length is zero.

Fixes: a8700c3dd0 ("netdevsim: Add dummy psample implementation")
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-22 13:23:56 -07:00
Qiheng Lin
be107538c5 netdevsim: remove unneeded semicolon
Eliminate the following coccicheck warning:
 drivers/net/netdevsim/fib.c:569:2-3: Unneeded semicolon

Signed-off-by: Qiheng Lin <linqiheng@huawei.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-04-06 16:27:33 -07:00
Jakub Kicinski
0d7f76dc11 netdevsim: add FEC settings support
Add support for ethtool FEC and some ethtool error injection.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-31 14:15:23 -07:00
Wei Yongjun
20fd4f421c netdevsim: switch to memdup_user_nul()
Use memdup_user_nul() helper instead of open-coding to
simplify the code.

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-24 16:26:58 -07:00
Ido Schimmel
a8700c3dd0 netdevsim: Add dummy psample implementation
Allow netdevsim to report "sampled" packets to the psample module by
periodically generating packets from a work queue. The behavior can be
enabled / disabled (default) and the various meta data attributes can be
controlled via debugfs knobs.

This implementation enables both testing of the psample module with all
the optional attributes as well as development of user space
applications on top of psample such as hsflowd and a Wireshark dissector
for psample generic netlink packets.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-14 15:00:43 -07:00
Ido Schimmel
c6385c0b67 netdevsim: Allow reporting activity on nexthop buckets
A key component of the resilient hashing algorithm is the hash buckets'
activity. If a bucket is active, it will not be populated with a new
nexthop in order not to break existing flows. Therefore, in order to
easily and thoroughly test the algorithm, we need to be in full control
over the reported activity.

Add a debugfs interface that allows user space to have netdevsim report
a nexthop bucket within a resilient nexthop group as active. For
example:

 # echo 10 23 > /sys/kernel/debug/netdevsim/netdevsim10/fib/nexthop_bucket_activity

Will mark bucket 23 in nexthop group 10 as active.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
d8eaa4faca netdevsim: Add support for resilient nexthop groups
Allow resilient nexthop groups to be programmed and account their
occupancy according to their number of buckets. The nexthop group itself
as well as its buckets are marked with hardware flags (i.e.,
'RTNH_F_TRAP').

Replacement of a single nexthop bucket can fail using the following
debugfs knob:

 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
 N
 # echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_nexthop_bucket_replace
 Y

Replacement of a resilient nexthop group can fail using the following
debugfs knob:

 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
 N
 # echo 1 > /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
 # cat /sys/kernel/debug/netdevsim/netdevsim10/fib/fail_res_nexthop_group_replace
 Y

This enables testing of various error paths.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Ido Schimmel
40ff83711f netdevsim: Create a helper for setting nexthop hardware flags
Instead of calling nexthop_set_hw_flags(), call a helper. It will be
used to also set nexthop bucket flags in a subsequent patch.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Petr Machata
86927c9c4d netdevsim: fib: Introduce a lock to guard nexthop hashtable
Currently netdevsim relies on RTNL to maintain exclusivity in accessing the
nexthop hash table. However, bucket notification may be called without RTNL
having been held. Instead, introduce a custom lock to guard the table.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-12 17:44:10 -08:00
Jiapeng Chong
c53d21af67 netdevsim: fib: Remove redundant code
Fix the following coccicheck warnings:

./drivers/net/netdevsim/fib.c:874:5-8: Unneeded variable: "err". Return
"0" on line 889.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-11 14:32:48 -08:00
Hillf Danton
863a42b289 netdevsim: init u64 stats for 32bit hardware
Init the u64 stats in order to avoid the lockdep prints on the 32bit
hardware like

 INFO: trying to register non-static key.
 the code is fine but needs lockdep annotation.
 turning off the locking correctness validator.
 CPU: 0 PID: 4695 Comm: syz-executor.0 Not tainted 5.11.0-rc5-syzkaller #0
 Hardware name: ARM-Versatile Express
 Backtrace:
 [<826fc5b8>] (dump_backtrace) from [<826fc82c>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
 [<826fc814>] (show_stack) from [<8270d1f8>] (__dump_stack lib/dump_stack.c:79 [inline])
 [<826fc814>] (show_stack) from [<8270d1f8>] (dump_stack+0xa8/0xc8 lib/dump_stack.c:120)
 [<8270d150>] (dump_stack) from [<802bf9c0>] (assign_lock_key kernel/locking/lockdep.c:935 [inline])
 [<8270d150>] (dump_stack) from [<802bf9c0>] (register_lock_class+0xabc/0xb68 kernel/locking/lockdep.c:1247)
 [<802bef04>] (register_lock_class) from [<802baa2c>] (__lock_acquire+0x84/0x32d4 kernel/locking/lockdep.c:4711)
 [<802ba9a8>] (__lock_acquire) from [<802be840>] (lock_acquire.part.0+0xf0/0x554 kernel/locking/lockdep.c:5442)
 [<802be750>] (lock_acquire.part.0) from [<802bed10>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5415)
 [<802beca4>] (lock_acquire) from [<81560548>] (seqcount_lockdep_reader_access include/linux/seqlock.h:103 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (__u64_stats_fetch_begin include/linux/u64_stats_sync.h:164 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (u64_stats_fetch_begin include/linux/u64_stats_sync.h:175 [inline])
 [<802beca4>] (lock_acquire) from [<81560548>] (nsim_get_stats64+0xdc/0xf0 drivers/net/netdevsim/netdev.c:70)
 [<8156046c>] (nsim_get_stats64) from [<81e2efa0>] (dev_get_stats+0x44/0xd0 net/core/dev.c:10405)
 [<81e2ef5c>] (dev_get_stats) from [<81e53204>] (rtnl_fill_stats+0x38/0x120 net/core/rtnetlink.c:1211)
 [<81e531cc>] (rtnl_fill_stats) from [<81e59d58>] (rtnl_fill_ifinfo+0x6d4/0x148c net/core/rtnetlink.c:1783)
 [<81e59684>] (rtnl_fill_ifinfo) from [<81e5ceb4>] (rtmsg_ifinfo_build_skb+0x9c/0x108 net/core/rtnetlink.c:3798)
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3830 [inline])
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo_event net/core/rtnetlink.c:3821 [inline])
 [<81e5ce18>] (rtmsg_ifinfo_build_skb) from [<81e5d0ac>] (rtmsg_ifinfo+0x44/0x70 net/core/rtnetlink.c:3839)
 [<81e5d068>] (rtmsg_ifinfo) from [<81e45c2c>] (register_netdevice+0x664/0x68c net/core/dev.c:10103)
 [<81e455c8>] (register_netdevice) from [<815608bc>] (nsim_create+0xf8/0x124 drivers/net/netdevsim/netdev.c:317)
 [<815607c4>] (nsim_create) from [<81561184>] (__nsim_dev_port_add+0x108/0x188 drivers/net/netdevsim/dev.c:941)
 [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_port_add_all drivers/net/netdevsim/dev.c:990 [inline])
 [<8156107c>] (__nsim_dev_port_add) from [<815620d8>] (nsim_dev_probe+0x5cc/0x750 drivers/net/netdevsim/dev.c:1119)
 [<81561b0c>] (nsim_dev_probe) from [<815661dc>] (nsim_bus_probe+0x10/0x14 drivers/net/netdevsim/bus.c:287)
 [<815661cc>] (nsim_bus_probe) from [<811724c0>] (really_probe+0x100/0x50c drivers/base/dd.c:554)
 [<811723c0>] (really_probe) from [<811729c4>] (driver_probe_device+0xf8/0x1c8 drivers/base/dd.c:740)
 [<811728cc>] (driver_probe_device) from [<81172fe4>] (__device_attach_driver+0x8c/0xf0 drivers/base/dd.c:846)
 [<81172f58>] (__device_attach_driver) from [<8116fee0>] (bus_for_each_drv+0x88/0xd8 drivers/base/bus.c:431)
 [<8116fe58>] (bus_for_each_drv) from [<81172c6c>] (__device_attach+0xdc/0x1d0 drivers/base/dd.c:914)
 [<81172b90>] (__device_attach) from [<8117305c>] (device_initial_probe+0x14/0x18 drivers/base/dd.c:961)
 [<81173048>] (device_initial_probe) from [<81171358>] (bus_probe_device+0x90/0x98 drivers/base/bus.c:491)
 [<811712c8>] (bus_probe_device) from [<8116e77c>] (device_add+0x320/0x824 drivers/base/core.c:3109)
 [<8116e45c>] (device_add) from [<8116ec9c>] (device_register+0x1c/0x20 drivers/base/core.c:3182)
 [<8116ec80>] (device_register) from [<81566710>] (nsim_bus_dev_new drivers/net/netdevsim/bus.c:336 [inline])
 [<8116ec80>] (device_register) from [<81566710>] (new_device_store+0x178/0x208 drivers/net/netdevsim/bus.c:215)
 [<81566598>] (new_device_store) from [<8116fcb4>] (bus_attr_store+0x2c/0x38 drivers/base/bus.c:122)
 [<8116fc88>] (bus_attr_store) from [<805b4b8c>] (sysfs_kf_write+0x48/0x54 fs/sysfs/file.c:139)
 [<805b4b44>] (sysfs_kf_write) from [<805b3c90>] (kernfs_fop_write_iter+0x128/0x1ec fs/kernfs/file.c:296)
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (call_write_iter include/linux/fs.h:1901 [inline])
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (new_sync_write fs/read_write.c:518 [inline])
 [<805b3b68>] (kernfs_fop_write_iter) from [<804d22fc>] (vfs_write+0x3dc/0x57c fs/read_write.c:605)
 [<804d1f20>] (vfs_write) from [<804d2604>] (ksys_write+0x68/0xec fs/read_write.c:658)
 [<804d259c>] (ksys_write) from [<804d2698>] (__do_sys_write fs/read_write.c:670 [inline])
 [<804d259c>] (ksys_write) from [<804d2698>] (sys_write+0x10/0x14 fs/read_write.c:667)
 [<804d2688>] (sys_write) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)

Fixes: 83c9e13aa3 ("netdevsim: add software driver for testing offloads")
Reported-by: syzbot+e74a6857f2d0efe3ad81@syzkaller.appspotmail.com
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Hillf Danton <hdanton@sina.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-04 14:36:26 -08:00
Amit Cohen
134c753242 netdevsim: fib: Add debugfs to debug route offload failure
Add "fail_route_offload" flag to disallow offloading routes.
It is needed to test "offload failed" notifications.

Create the flag as part of nsim_fib_create() under fib directory and set
it to false by default.

When FIB_EVENT_ENTRY_{REPLACE, APPEND} are triggered and
"fail_route_offload" value is true, set the appropriate hardware flag to
make the kernel emit RTM_NEWROUTE notification with RTM_F_OFFLOAD_FAILED
flag.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Ido Schimmel
f57ab5b75f netdevsim: dev: Initialize FIB module after debugfs
Initialize the dummy FIB offload module after debugfs, so that the FIB
module could create its own directory there.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
484a4dfb75 netdevsim: fib: Do not warn if route was not found for several events
The next patch will add the ability to fail route offload controlled by
debugfs variable called "fail_route_offload".

If we vetoed the addition, we might get a delete or append notification
for a route we do not have. Therefore, do not warn if route was not found.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
0c5fcf9e24 IPv6: Add "offload failed" indication to routes
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv6 routes, so that
users will have better visibility into the offload process.

'struct fib6_info' is extended with new field that indicates if route
offload failed. Note that the new field is added using unused bit and
therefore there is no need to increase struct size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
36c5100e85 IPv4: Add "offload failed" indication to routes
After installing a route to the kernel, user space receives an
acknowledgment, which means the route was installed in the kernel, but not
necessarily in hardware.

The asynchronous nature of route installation in hardware can lead to a
routing daemon advertising a route before it was actually installed in
hardware. This can result in packet loss or mis-routed packets until the
route is installed in hardware.

To avoid such cases, previous patch set added the ability to emit
RTM_NEWROUTE notifications whenever RTM_F_OFFLOAD/RTM_F_TRAP flags
are changed, this behavior is controlled by sysctl.

With the above mentioned behavior, it is possible to know from user-space
if the route was offloaded, but if the offload fails there is no indication
to user-space. Following a failure, a routing daemon will wait indefinitely
for a notification that will never come.

This patch adds an "offload_failed" indication to IPv4 routes, so that
users will have better visibility into the offload process.

'struct fib_alias', and 'struct fib_rt_info' are extended with new field
that indicates if route offload failed. Note that the new field is added
using unused bit and therefore there is no need to increase structs size.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-02-08 16:47:03 -08:00
Amit Cohen
efc42879ec net: Do not call fib6_info_hw_flags_set() when IPv6 is disabled
With the next patch mlxsw and netdevsim will fail in compilation if
CONFIG_IPV6 is disabled.

Do not call fib6_info_hw_flags_set() when IPv6 is disabled.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:59 -08:00
Amit Cohen
fbaca8f895 net: Pass 'net' struct as first argument to fib6_info_hw_flags_set()
The next patch will emit notification when hardware flags are changed,
in case that fib_notify_on_flag_change sysctl is set to 1.

To know sysctl values, net struct is needed.
This change is consistent with the IPv4 version, which gets 'net' struct
as its first argument.

Currently, the only callers of this function are mlxsw and netdevsim.
Patch the callers to pass net.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:59 -08:00
Amit Cohen
0ae3eb7b46 netdevsim: fib: Perform the route programming in a non-atomic context
Currently, netdevsim implements dummy FIB offload and marks notified
routes with RTM_F_TRAP flag. netdevsim does not defer route notifications
to a work queue because it does not need to program any hardware.

Given that netdevsim's purpose is to both give an example implementation
and allow developers to test their code, align netdevsim to a "real"
hardware device driver like mlxsw and have it also perform the route
"programming" in a non-atomic context.

It will be used to test route flags notifications which will be added in
the next patches.

The following changes are needed when route handling is performed in WQ:
- Handle the accounting in the main context, to be able to return an
  error for adding route when all the routes are used.
  For FIB_EVENT_ENTRY_REPLACE increase the counter before scheduling
  the delayed work, and in case that this event replaces an existing route,
  decrease the counter as part of the delayed work.

- For IPv6, cannot use fen6_info->rt->fib6_siblings list because it
  might be changed during handling the delayed work.
  Save an array with the nexthops as part of fib6_event struct, and take
  a reference for each nexthop to prevent them from being freed while
  event is queued.

- Change GFP_ATOMIC allocations to GFP_KERNEL.

- Use single work item that is handling a list of ordered routes.
  Handling routes must be processed in the order they were submitted to
  avoid logical errors that could lead to unexpected failures.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:58 -08:00
Amit Cohen
9e635a21ca netdevsim: fib: Convert the current occupancy to an atomic variable
When route is added/deleted, the appropriate counter is increased/decreased
to maintain number of routes.

User can limit the number of routes and then according to the appropriate
counter, adding more routes than the limitation is forbidden.

Currently, there is one lock which protects hashtable, list and accounting.

Handling the counters will be performed from both atomic context and
non-atomic context, while the hashtable and the list will be used only from
non-atomic context and therefore will be protected by a separate lock.

Protect accounting by using an atomic variable, so lock is not needed.

v2:
* Use atomic64_sub() in nsim_nexthop_account()'s error path

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-02 17:45:58 -08:00
Ido Schimmel
09ad6becf5 nexthop: Use enum to encode notification type
Currently there are only two types of in-kernel nexthop notification.
The two are distinguished by the 'is_grp' boolean field in 'struct
nh_notifier_info'.

As more notification types are introduced for more next-hop group types, a
boolean is not an easily extensible interface. Instead, convert it to an
enum.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: Petr Machata <petrm@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28 20:49:52 -08:00
Jakub Kicinski
30bfce1094 net: remove ndo_udp_tunnel_* callbacks
All UDP tunnel port management is now routed via udp_tunnel_nic
infra directly. Remove the old callbacks.

Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-07 12:53:29 -08:00
Jakub Kicinski
46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Toke Høiland-Jørgensen
e4ff5aa469 netdevsim: Add debugfs toggle to reject BPF programs in verifier
This adds a new debugfs toggle ('bpf_bind_verifier_accept') that can be
used to make netdevsim reject BPF programs from being accepted by the
verifier. If this toggle (which defaults to true) is set to false,
nsim_bpf_verify_insn() will return EOPNOTSUPP on the last
instruction (after outputting the 'Hello from netdevsim' verifier message).

This makes it possible to check the verification callback in the driver
from test_offload.py in selftests, since the verifier now clears the
verifier log on a successful load, hiding the message from the driver.

Fixes: 6f8a57ccf8 ("bpf: Make verifier log more relevant by default")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225964.110217.12584017165318065332.stgit@toke.dk
2020-12-09 16:27:42 +01:00
Toke Høiland-Jørgensen
998f172962 xdp: Remove the xdp_attachment_flags_ok() callback
Since commit 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF
programs in net_device"), the XDP program attachment info is now maintained
in the core code. This interacts badly with the xdp_attachment_flags_ok()
check that prevents unloading an XDP program with different load flags than
it was loaded with. In practice, two kinds of failures are seen:

- An XDP program loaded without specifying a mode (and which then ends up
  in driver mode) cannot be unloaded if the program mode is specified on
  unload.

- The dev_xdp_uninstall() hook always calls the driver callback with the
  mode set to the type of the program but an empty flags argument, which
  means the flags_ok() check prevents the program from being removed,
  leading to bpf prog reference leaks.

The original reason this check was added was to avoid ambiguity when
multiple programs were loaded. With the way the checks are done in the core
now, this is quite simple to enforce in the core code, so let's add a check
there and get rid of the xdp_attachment_flags_ok() callback entirely.

Fixes: 7f0a838254 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk
2020-12-09 16:27:42 +01:00
Antonio Cardace
a7fc6db099 netdevsim: support ethtool ring and coalesce settings
Add ethtool ring and coalesce settings support for testing.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:54 -08:00
Antonio Cardace
77f9591b21 netdevsim: move ethtool pause params in separate struct
This will help the refactoring in the next commit
when coalesce and ring settings are added.

Signed-off-by: Antonio Cardace <acardace@redhat.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-20 12:51:54 -08:00
Jacob Keller
52cc5f3a16 devlink: move flash end and begin to core devlink
When performing a flash update via devlink, device drivers may inform
user space of status updates via
devlink_flash_update_(begin|end|timeout|status)_notify functions.

It is expected that drivers do not send any status notifications unless
they send a begin and end message. If a driver sends a status
notification without sending the appropriate end notification upon
finishing (regardless of success or failure), the current implementation
of the devlink userspace program can get stuck endlessly waiting for the
end notification that will never come.

The current ice driver implementation may send such a status message
without the appropriate end notification in rare cases.

Fixing the ice driver is relatively simple: we just need to send the
begin_notify at the start of the function and always send an end_notify
no matter how the function exits.

Rather than assuming driver authors will always get this right in the
future, lets just fix the API so that it is not possible to get wrong.
Make devlink_flash_update_begin_notify and
devlink_flash_update_end_notify static, and call them in devlink.c core
code. Always send the begin_notify just before calling the driver's
flash_update routine. Always send the end_notify just after the routine
returns regardless of success or failure.

Doing this makes the status notification easier to use from the driver,
as it no longer needs to worry about catching failures and cleaning up
by calling devlink_flash_update_end_notify. It is now no longer possible
to do the wrong thing in this regard. We also save a couple of lines of
code in each driver.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 21:41:02 -08:00
Jakub Kicinski
56495a2442 Merge https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-19 19:08:46 -08:00
Taehee Yoo
a5bbcbf290 netdevsim: set .owner to THIS_MODULE
If THIS_MODULE is not set, the module would be removed while debugfs is
being used.
It eventually makes kernel panic.

Fixes: 82c93a87bf ("netdevsim: implement couple of testing devlink health reporters")
Fixes: 424be63ad8 ("netdevsim: add UDP tunnel port offload support")
Fixes: 4418f862d6 ("netdevsim: implement support for devlink region and snapshots")
Fixes: d3cbb907ae ("netdevsim: add ACL trap reporting cookie as a metadata")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201115103041.30701-1-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-17 14:08:59 -08:00
Ido Schimmel
66e58bf070 netdevsim: Allow programming routes with nexthop objects
Previous patches added the ability to program nexthop objects.
Therefore, no longer forbid the programming of routes that point to such
objects.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:50 -08:00
Ido Schimmel
8fa84742d6 netdevsim: Add dummy implementation for nexthop offload
Implement dummy nexthop "offload" in the driver by storing currently
"programmed" nexthops in a hash table. Each nexthop in the hash table is
marked with "trap" indication and increments the nexthops resource
occupancy.

This will later allow us to test the nexthop offload API on top of
netdevsim.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:50 -08:00
Ido Schimmel
35266255d6 netdevsim: Add devlink resource for nexthops
The Spectrum ASIC has a dedicated table where nexthops (i.e., adjacency
entries) are populated. The size of this table can be controlled via
devlink-resource.

Add such a resource to netdevsim so that its occupancy will reflect the
number of nexthop objects currently programmed to the device.

By limiting the size of the resource, error paths could be exercised and
tested.

Example output:

# devlink resource show netdevsim/netdevsim10
netdevsim/netdevsim10:
  name IPv4 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
    resources:
      name fib size unlimited occ 4 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
      name fib-rules size unlimited occ 3 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
  name IPv6 size unlimited unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
    resources:
      name fib size unlimited occ 1 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
      name fib-rules size unlimited occ 2 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none
  name nexthops size unlimited occ 0 unit entry size_min 0 size_max unlimited size_gran 1 dpipe_tables none

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-11-06 11:28:50 -08:00
Moshe Shemesh
dc64cc7c63 devlink: Add devlink reload limit option
Add reload limit to demand restrictions on reload actions.
Reload limits supported:
no_reset: No reset allowed, no down time allowed, no link flap and no
          configuration is lost.

By default reload limit is unspecified and so no constraints on reload
actions are required.

Some combinations of action and limit are invalid. For example, driver
can not reinitialize its entities without any downtime.

The no_reset reload limit will have usecase in this patchset to
implement restricted fw_activate on mlx5.

Have the uapi parameter of reload limit ready for future support of
multiselection.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09 12:06:52 -07:00
Moshe Shemesh
ccdf07219d devlink: Add reload action option to devlink reload command
Add devlink reload action to allow the user to request a specific reload
action. The action parameter is optional, if not specified then devlink
driver re-init action is used (backward compatible).
Note that when required to do firmware activation some drivers may need
to reload the driver. On the other hand some drivers may need to reset
the firmware to reinitialize the driver entities. Therefore, the devlink
reload command returns the actions which were actually performed.
Reload actions supported are:
driver_reinit: driver entities re-initialization, applying devlink-param
               and devlink-resource values.
fw_activate: firmware activate.

command examples:
$devlink dev reload pci/0000:82:00.0 action driver_reinit
reload_actions_performed:
  driver_reinit

$devlink dev reload pci/0000:82:00.0 action fw_activate
reload_actions_performed:
  driver_reinit fw_activate

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-10-09 12:06:52 -07:00
Jakub Kicinski
dda751731e netdevsim: support the static IANA VXLAN port flag
Allow setting UDP_TUNNEL_NIC_INFO_STATIC_IANA_VXLAN.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28 12:50:12 -07:00
Jakub Kicinski
dc9c07537c netdevsim: shared UDP tunnel port table support
Add the ability to simulate a device with a shared UDP tunnel port
table.

Try to reject the configurations and actions which are not supported
by the core, so we don't get syzcaller etc. warning reports.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28 12:50:12 -07:00
Jakub Kicinski
860e9d328f netdevsim: add warnings on unexpected UDP tunnel port errors
We should never see a removal of a port which is not in the table
or adding a port to an occupied entry in the table. To make sure
such errors don't escape the checks in the test script add a
warning/kernel spat.

Error injection will not trigger those, nor should it ever put
us in a bad state.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28 12:50:12 -07:00
Jakub Kicinski
090bc03bc9 netdevsim: fix duplicated debugfs directory
The "ethtool" debugfs directory holds per-netdev knobs, so move
it from the device instance directory to the port directory.

This fixes the following warning when creating multiple ports:

 debugfs: Directory 'ethtool' with parent 'netdevsim1' already present!

Fixes: ff1f7c17fb ("netdevsim: add pause frame stats")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-26 14:19:08 -07:00
Jacob Keller
cbb58368fb netdevsim: add support for flash_update overwrite mask
The devlink interface recently gained support for a new "overwrite mask"
parameter that allows specifying how various sub-sections of a flash
component are modified when updating.

Add support for this to netdevsim, to enable easily testing the
interface. Make the allowed overwrite mask values controllable via
a debugfs parameter. This enables testing a flow where the driver
rejects an unsupportable overwrite mask.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25 17:20:57 -07:00
Jacob Keller
bc75c054f0 devlink: convert flash_update to use params structure
The devlink core recently gained support for checking whether the driver
supports a flash_update parameter, via `supported_flash_update_params`.
However, parameters are specified as function arguments. Adding a new
parameter still requires modifying the signature of the .flash_update
callback in all drivers.

Convert the .flash_update function to take a new `struct
devlink_flash_update_params` instead. By using this structure, and the
`supported_flash_update_params` bit field, a new parameter to
flash_update can be added without requiring modification to existing
drivers.

As before, all parameters except file_name will require driver opt-in.
Because file_name is a necessary field to for the flash_update to make
sense, no "SUPPORTED" bitflag is provided and it is always considered
valid. All future additional parameters will require a new bit in the
supported_flash_update_params bitfield.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25 17:20:57 -07:00
Jacob Keller
22ec3d232f devlink: check flash_update parameter support in net core
When implementing .flash_update, drivers which do not support
per-component update are manually checking the component parameter to
verify that it is NULL. Without this check, the driver might accept an
update request with a component specified even though it will not honor
such a request.

Instead of having each driver check this, move the logic into
net/core/devlink.c, and use a new `supported_flash_update_params` field
in the devlink_ops. Drivers which will support per-component update must
now specify this by setting DEVLINK_SUPPORT_FLASH_UPDATE_COMPONENT in
the supported_flash_update_params in their devlink_ops.

This helps ensure that drivers do not forget to check for a NULL
component if they do not support per-component update. This also enables
a slightly better error message by enabling the core stack to set the
netlink bad attribute message to indicate precisely the unsupported
attribute in the message.

Going forward, any new additional parameter to flash update will require
a bit in the supported_flash_update_params bitfield.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Bin Luo <luobin9@huawei.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Ido Schimmel <idosch@mellanox.com>
Cc: Danielle Ratson <danieller@mellanox.com>
Cc: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-25 17:20:57 -07:00
Andrew Lunn
d4602a9f47 net: devlink: region: Pass the region ops to the snapshot function
Pass the region to be snapshotted to the function performing the
snapshot. This allows one function to operate on numerous regions.

v4:
Add missing kerneldoc for ICE

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 18:17:45 -07:00
Shannon Nelson
b311b001de netdevsim: devlink flash timeout message
Add a simple devlink flash timeout message to exercise
the message mechanism.

Signed-off-by: Shannon Nelson <snelson@pensando.io>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-18 13:54:23 -07:00
Jakub Kicinski
ff1f7c17fb netdevsim: add pause frame stats
Add minimal ethtool interface for testing ethtool pause stats.

v2: add missing static on nsim_ethtool_ops

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-15 13:26:28 -07:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and its variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings when it is the case.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
David S. Miller
2e7199bd77 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-08-04

The following pull-request contains BPF updates for your *net-next* tree.

We've added 73 non-merge commits during the last 9 day(s) which contain
a total of 135 files changed, 4603 insertions(+), 1013 deletions(-).

The main changes are:

1) Implement bpf_link support for XDP. Also add LINK_DETACH operation for the BPF
   syscall allowing processes with BPF link FD to force-detach, from Andrii Nakryiko.

2) Add BPF iterator for map elements and to iterate all BPF programs for efficient
   in-kernel inspection, from Yonghong Song and Alexei Starovoitov.

3) Separate bpf_get_{stack,stackid}() helpers for perf events in BPF to avoid
   unwinder errors, from Song Liu.

4) Allow cgroup local storage map to be shared between programs on the same
   cgroup. Also extend BPF selftests with coverage, from YiFei Zhu.

5) Add BPF exception tables to ARM64 JIT in order to be able to JIT BPF_PROBE_MEM
   load instructions, from Jean-Philippe Brucker.

6) Follow-up fixes on BPF socket lookup in combination with reuseport group
   handling. Also add related BPF selftests, from Jakub Sitnicki.

7) Allow to use socket storage in BPF_PROG_TYPE_CGROUP_SOCK-typed programs for
   socket create/release as well as bind functions, from Stanislav Fomichev.

8) Fix an info leak in xsk_getsockopt() when retrieving XDP stats via old struct
   xdp_statistics, from Peilin Ye.

9) Fix PT_REGS_RC{,_CORE}() macros in libbpf for MIPS arch, from Jerry Crunchtime.

10) Extend BPF kernel test infra with skb->family and skb->{local,remote}_ip{4,6}
    fields and allow user space to specify skb->dev via ifindex, from Dmitry Yakunin.

11) Fix a bpftool segfault due to missing program type name and make it more robust
    to prevent them in future gaps, from Quentin Monnet.

12) Consolidate cgroup helper functions across selftests and fix a v6 localhost
    resolver issue, from John Fastabend.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:27:40 -07:00
Ido Schimmel
c88e11e047 devlink: Pass extack when setting trap's action and group's parameters
A later patch will refuse to set the action of certain traps in mlxsw
and also to change the policer binding of certain groups. Pass extack so
that failure could be communicated clearly to user space.

Reviewed-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-03 18:06:46 -07:00
Andrii Nakryiko
e8407fdeb9 bpf, xdp: Remove XDP_QUERY_PROG and XDP_QUERY_PROG_HW XDP commands
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.

This patch removes all the implementations of those commands in kernel, along
the xdp_attachment_query().

This patch was compile-tested on allyesconfig.

Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
2020-07-25 20:37:02 -07:00
David S. Miller
a57066b1a0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The UDP reuseport conflict was a little bit tricky.

The net-next code, via bpf-next, extracted the reuseport handling
into a helper so that the BPF sk lookup code could invoke it.

At the same time, the logic for reuseport handling of unconnected
sockets changed via commit efc6b6f6c3
which changed the logic to carry on the reuseport result into the
rest of the lookup loop if we do not return immediately.

This requires moving the reuseport_has_conns() logic into the callers.

While we are here, get rid of inline directives as they do not belong
in foo.c files.

The other changes were cases of more straightforward overlapping
modifications.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-25 17:49:04 -07:00
Taehee Yoo
2c9d8e01f0 netdevsim: fix unbalaced locking in nsim_create()
In the nsim_create(), rtnl_lock() is called before nsim_bpf_init().
If nsim_bpf_init() is failed, rtnl_unlock() should be called,
but it isn't called.
So, unbalanced locking would occur.

Fixes: e05b2d141f ("netdevsim: move netdev creation/destruction to dev probe")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-21 16:00:43 -07:00
Jakub Kicinski
424be63ad8 netdevsim: add UDP tunnel port offload support
Add UDP tunnel port handlers to our fake driver so we can test
the core infra.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-10 13:54:00 -07:00
Danielle Ratson
71ad8d55f8 devlink: Replace devlink_port_attrs_set parameters with a struct
Currently, devlink_port_attrs_set accepts a long list of parameters,
that most of them are devlink port's attributes.

Use the devlink_port_attrs struct to replace the relevant parameters.

Signed-off-by: Danielle Ratson <danieller@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-07-09 13:15:29 -07:00
Ido Schimmel
1897936744 netdevsim: Register control traps
Register two control traps with devlink. The existing selftest at
tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh iterates
over all registered traps and checks that the action of non-drop traps
cannot be changed. Up until now only exception traps were tested, now
control traps will be tested as well.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:49:23 -07:00
Ido Schimmel
85176f19f5 netdevsim: Move layer 3 exceptions to exceptions trap group
The layer 3 exceptions are still subject to the same trap policer, so
nothing changes, but user space can choose to assign a different one.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-06-01 11:49:23 -07:00
Ido Schimmel
be43224fc0 netdevsim: Ensure policer drop counter always increases
In case the policer drop counter is retrieved when the jiffies value is
a multiple of 64, the counter will not be incremented.

This randomly breaks a selftest [1] the reads the counter twice and
checks that it was incremented:

```
TEST: Trap policer                                                  [FAIL]
	Policer drop counter was not incremented
```

Fix by always incrementing the counter by 1.

[1] tools/testing/selftests/drivers/net/netdevsim/devlink_trap.sh

Fixes: ad188458d0 ("netdevsim: Add devlink-trap policer support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-05-22 16:05:42 -07:00
Gustavo A. R. Silva
3902baf9ab netdevsim: dev: Fix memory leak in nsim_dev_take_snapshot_write
In case memory resources for dummy_data were allocated, release them
before return.

Addresses-Coverity-ID: 1491997 ("Resource leak")
Fixes: 7ef19d3b1d ("devlink: report error once U32_MAX snapshot ids have been used")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 20:14:22 -07:00
Ido Schimmel
0dc8249a2e netdevsim: Add support for setting of packet trap group parameters
Add a dummy callback to set trap group parameters. Return an error when
the 'fail_trap_group_set' debugfs file is set in order to exercise error
paths and verify that error is propagated to user space when should.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:58 -07:00
Ido Schimmel
f9f54392d2 devlink: Add packet trap group parameters support
Packet trap groups are used to aggregate logically related packet traps.
Currently, these groups allow user space to batch operations such as
setting the trap action of all member traps.

In order to prevent the CPU from being overwhelmed by too many trapped
packets, it is desirable to bind a packet trap policer to these groups.
For example, to limit all the packets that encountered an exception
during routing to 10Kpps.

Allow device drivers to bind default packet trap policers to packet trap
groups when the latter are registered with devlink.

The next patch will enable user space to change this default binding.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:58 -07:00
Ido Schimmel
ad188458d0 netdevsim: Add devlink-trap policer support
Register three dummy packet trap policers with devlink and implement
callbacks to change their parameters and read their counters.

This will be used later on in the series to test the devlink-trap
policer infrastructure.

v2:
* Remove check about burst size being a power of 2 and instead add a
  debugfs knob to fail the operation
* Provide max/min rate/burst size when registering policers and remove
  the validity checks from nsim_dev_devlink_trap_policer_set()

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 17:54:58 -07:00
Eran Ben Elisha
ba7d16c779 devlink: Implicitly set auto recover flag when registering health reporter
When health reporter is registered to devlink, devlink will implicitly set
auto recover if and only if the reporter has a recover method. No reason
to explicitly get the auto recover flag from the driver.

Remove this flag from all drivers that called
devlink_health_reporter_create.

All existing health reporters set auto recovery to true if they have a
recover method.

Yet, administrator can unset auto recover via netlink command as prior to
this patch.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:17:34 -07:00
Eran Ben Elisha
c7f0d4c898 netdevsim: Change dummy reporter auto recover default
Health reporters should be registered with auto recover set to true.
Align dummy reporter behaviour with that, as in later patch the option to
set auto recover behaviour will be removed.

In addition, align netdevsim selftest to the new default value.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-30 11:17:34 -07:00
Jacob Keller
3fe0fd531a netdevsim: support taking immediate snapshot via devlink
Implement the .snapshot region operation for the dummy data region. This
enables a region snapshot to be taken upon request via the new
DEVLINK_CMD_REGION_SNAPSHOT command.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Jacob Keller
12102436ac devlink: track snapshot id usage count using an xarray
Each snapshot created for a devlink region must have an id. These ids
are supposed to be unique per "event" that caused the snapshot to be
created. Drivers call devlink_region_snapshot_id_get to obtain a new id
to use for a new event trigger. The id values are tracked per devlink,
so that the same id number can be used if a triggering event creates
multiple snapshots on different regions.

There is no mechanism for snapshot ids to ever be reused. Introduce an
xarray to store the count of how many snapshots are using a given id,
replacing the snapshot_id field previously used for picking the next id.

The devlink_region_snapshot_id_get() function will use xa_alloc to
insert an initial value of 1 value at an available slot between 0 and
U32_MAX.

The new __devlink_snapshot_id_increment() and
__devlink_snapshot_id_decrement() functions will be used to track how
many snapshots currently use an id.

Drivers must now call devlink_snapshot_id_put() in order to release
their reference of the snapshot id after adding region snapshots.

By tracking the total number of snapshots using a given id, it is
possible for the decrement() function to erase the id from the xarray
when it is not in use.

With this method, a snapshot id can become reused again once all
snapshots that referred to it have been deleted via
DEVLINK_CMD_REGION_DEL, and the driver has finished adding snapshots.

This work also paves the way to introduce a mechanism for userspace to
request a snapshot.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Jacob Keller
7ef19d3b1d devlink: report error once U32_MAX snapshot ids have been used
The devlink_snapshot_id_get() function returns a snapshot id. The
snapshot id is a u32, so there is no way to indicate an error code.

A future change is going to possibly add additional cases where this
function could fail. Refactor the function to return the snapshot id in
an argument, so that it can return zero or an error value.

This ensures that snapshot ids cannot be confused with error values, and
aids in the future refactor of snapshot id allocation management.

Because there is no current way to release previously used snapshot ids,
add a simple check ensuring that an error is reported in case the
snapshot_id would over flow.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Jacob Keller
a0a09f6bb2 devlink: convert snapshot destructor callback to region op
It does not makes sense that two snapshots for a given region would use
different destructors. Simplify snapshot creation by adding
a .destructor op for regions.

This operation will replace the data_destructor for the snapshot
creation, and makes snapshot creation easier.

Noticed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
Jacob Keller
e893768179 devlink: prepare to support region operations
Modify the devlink region code in preparation for adding new operations
on regions.

Create a devlink_region_ops structure, and move the name pointer from
within the devlink_region structure into the ops structure (similar to
the devlink_health_reporter_ops).

This prepares the regions to enable support of additional operations in
the future such as requesting snapshots, or accessing the region
directly without a snapshot.

In order to re-use the constant strings in the mlx4 driver their
declaration must be changed to 'const char * const' to ensure the
compiler realizes that both the data and the pointer cannot change.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-26 19:39:26 -07:00
David S. Miller
9fb16955fb Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Overlapping header include additions in macsec.c

A bug fix in 'net' overlapping with the removal of 'version'
string in ena_netdev.c

Overlapping test additions in selftests Makefile

Overlapping PCI ID table adjustments in iwlwifi driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-25 18:58:11 -07:00
Ido Schimmel
107f167894 devlink: Only pass packet trap group identifier in trap structure
Packet trap groups are now explicitly registered by drivers and not
implicitly registered when the packet traps are registered. Therefore,
there is no need to encode entire group structure the trap is associated
with inside the trap structure.

Instead, only pass the group identifier. Refer to it as initial group
identifier, as future patches will allow user space to move traps
between groups.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 21:40:40 -07:00
Ido Schimmel
b29545d840 netdevsim: Explicitly register packet trap groups
Use the previously added API to explicitly register / unregister
supported packet trap groups. This is in preparation for future patches
that will enable drivers to pass additional group attributes, such as
associated policer identifier.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-23 21:40:40 -07:00
Takashi Iwai
2da222f612 net: netdevsim: Use scnprintf() for avoiding potential buffer overflow
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit.  Fix it by replacing with scnprintf().

Cc: "David S . Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-03-15 17:06:22 -07:00
Jiri Pirko
d3cbb907ae netdevsim: add ACL trap reporting cookie as a metadata
Add new trap ACL which reports flow action cookie in a metadata. Allow
used to setup the cookie using debugfs file.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:55 -08:00
Jiri Pirko
5a2e106c74 devlink: extend devlink_trap_report() to accept cookie and pass
Add cookie argument to devlink_trap_report() allowing driver to pass in
the user cookie. Pass on the cookie down to drop monitor code.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-25 11:05:54 -08:00
kbuild test robot
34611e6996 netdevsim: fix ptr_ret.cocci warnings
drivers/net/netdevsim/dev.c:937:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Fixes: 6556ff32f1 ("netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs")
CC: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-02-05 13:55:32 +01:00
Taehee Yoo
245311637f netdevsim: remove unused sdev code
sdev.c code is merged into dev.c and is not used anymore.
it would be removed.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-03 15:32:20 -08:00
Taehee Yoo
83cf4213ba netdevsim: use __GFP_NOWARN to avoid memalloc warning
vfnum buffer size and binary_len buffer size is received by user-space.
So, this buffer size could be too large. If so, kmalloc will internally
print a warning message.
This warning message is actually not necessary for the netdevsim module.
So, this patch adds __GFP_NOWARN.

Test commands:
    modprobe netdevsim
    echo 1 > /sys/bus/netdevsim/new_device
    echo 1000000000 > /sys/devices/netdevsim1/sriov_numvfs

Splat looks like:
[  357.847266][ T1000] WARNING: CPU: 0 PID: 1000 at mm/page_alloc.c:4738 __alloc_pages_nodemask+0x2f3/0x740
[  357.850273][ T1000] Modules linked in: netdevsim veth openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrx
[  357.852989][ T1000] CPU: 0 PID: 1000 Comm: bash Tainted: G    B             5.5.0-rc5+ #270
[  357.854334][ T1000] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  357.855703][ T1000] RIP: 0010:__alloc_pages_nodemask+0x2f3/0x740
[  357.856669][ T1000] Code: 64 fe ff ff 65 48 8b 04 25 c0 0f 02 00 48 05 f0 12 00 00 41 be 01 00 00 00 49 89 47 0
[  357.860272][ T1000] RSP: 0018:ffff8880b7f47bd8 EFLAGS: 00010246
[  357.861009][ T1000] RAX: ffffed1016fe8f80 RBX: 1ffff11016fe8fae RCX: 0000000000000000
[  357.861843][ T1000] RDX: 0000000000000000 RSI: 0000000000000017 RDI: 0000000000000000
[  357.862661][ T1000] RBP: 0000000000040dc0 R08: 1ffff11016fe8f67 R09: dffffc0000000000
[  357.863509][ T1000] R10: ffff8880b7f47d68 R11: fffffbfff2798180 R12: 1ffff11016fe8f80
[  357.864355][ T1000] R13: 0000000000000017 R14: 0000000000000017 R15: ffff8880c2038d68
[  357.865178][ T1000] FS:  00007fd9a5b8c740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000
[  357.866248][ T1000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  357.867531][ T1000] CR2: 000055ce01ba8100 CR3: 00000000b7dbe005 CR4: 00000000000606f0
[  357.868972][ T1000] Call Trace:
[  357.869423][ T1000]  ? lock_contended+0xcd0/0xcd0
[  357.870001][ T1000]  ? __alloc_pages_slowpath+0x21d0/0x21d0
[  357.870673][ T1000]  ? _kstrtoull+0x76/0x160
[  357.871148][ T1000]  ? alloc_pages_current+0xc1/0x1a0
[  357.871704][ T1000]  kmalloc_order+0x22/0x80
[  357.872184][ T1000]  kmalloc_order_trace+0x1d/0x140
[  357.872733][ T1000]  __kmalloc+0x302/0x3a0
[  357.873204][ T1000]  nsim_bus_dev_numvfs_store+0x1ab/0x260 [netdevsim]
[  357.873919][ T1000]  ? kernfs_get_active+0x12c/0x180
[  357.874459][ T1000]  ? new_device_store+0x450/0x450 [netdevsim]
[  357.875111][ T1000]  ? kernfs_get_parent+0x70/0x70
[  357.875632][ T1000]  ? sysfs_file_ops+0x160/0x160
[  357.876152][ T1000]  kernfs_fop_write+0x276/0x410
[  357.876680][ T1000]  ? __sb_start_write+0x1ba/0x2e0
[  357.877225][ T1000]  vfs_write+0x197/0x4a0
[  357.877671][ T1000]  ksys_write+0x141/0x1d0
[ ... ]

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: 7957922056 ("netdevsim: add SR-IOV functionality")
Fixes: 82c93a87bf ("netdevsim: implement couple of testing devlink health reporters")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-03 15:32:20 -08:00
Taehee Yoo
6556ff32f1 netdevsim: use IS_ERR instead of IS_ERR_OR_NULL for debugfs
Debugfs APIs return valid pointer or error pointer. it doesn't return NULL.
So, using IS_ERR is enough, not using IS_ERR_OR_NULL.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-03 15:32:20 -08:00
Taehee Yoo
6fb8852b12 netdevsim: fix stack-out-of-bounds in nsim_dev_debugfs_init()
When netdevsim dev is being created, a debugfs directory is created.
The variable "dev_ddir_name" is 16bytes device name pointer and device
name is "netdevsim<dev id>".
The maximum dev id length is 10.
So, 16bytes for device name isn't enough.

Test commands:
    modprobe netdevsim
    echo "1000000000 0" > /sys/bus/netdevsim/new_device

Splat looks like:
[  249.622710][  T900] BUG: KASAN: stack-out-of-bounds in number+0x824/0x880
[  249.623658][  T900] Write of size 1 at addr ffff88804c527988 by task bash/900
[  249.624521][  T900]
[  249.624830][  T900] CPU: 1 PID: 900 Comm: bash Not tainted 5.5.0+ #322
[  249.625691][  T900] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[  249.626712][  T900] Call Trace:
[  249.627103][  T900]  dump_stack+0x96/0xdb
[  249.627639][  T900]  ? number+0x824/0x880
[  249.628173][  T900]  print_address_description.constprop.5+0x1be/0x360
[  249.629022][  T900]  ? number+0x824/0x880
[  249.629569][  T900]  ? number+0x824/0x880
[  249.630105][  T900]  __kasan_report+0x12a/0x170
[  249.630717][  T900]  ? number+0x824/0x880
[  249.631201][  T900]  kasan_report+0xe/0x20
[  249.631723][  T900]  number+0x824/0x880
[  249.632235][  T900]  ? put_dec+0xa0/0xa0
[  249.632716][  T900]  ? rcu_read_lock_sched_held+0x90/0xc0
[  249.633392][  T900]  vsnprintf+0x63c/0x10b0
[  249.633983][  T900]  ? pointer+0x5b0/0x5b0
[  249.634543][  T900]  ? mark_lock+0x11d/0xc40
[  249.635200][  T900]  sprintf+0x9b/0xd0
[  249.635750][  T900]  ? scnprintf+0xe0/0xe0
[  249.636370][  T900]  nsim_dev_probe+0x63c/0xbf0 [netdevsim]
[ ... ]

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Fixes: ab1d0cc004 ("netdevsim: change debugfs tree topology")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-03 15:32:20 -08:00
Taehee Yoo
8526ad9646 netdevsim: fix panic in nsim_dev_take_snapshot_write()
nsim_dev_take_snapshot_write() uses nsim_dev and nsim_dev->dummy_region.
So, during this function, these data shouldn't be removed.
But there is no protecting stuff in this function.

There are two similar cases.
1. reload case
reload could be called during nsim_dev_take_snapshot_write().
When reload is being executed, nsim_dev_reload_down() is called and it
calls nsim_dev_reload_destroy(). nsim_dev_reload_destroy() calls
devlink_region_destroy() to destroy nsim_dev->dummy_region.
So, during nsim_dev_take_snapshot_write(), nsim_dev->dummy_region()
would be removed.
At this point, snapshot_write() would access freed pointer.
In order to fix this case, take_snapshot file will be removed before
devlink_region_destroy().
The take_snapshot file will be re-created by ->reload_up().

2. del_device_store case
del_device_store() also could call nsim_dev_reload_destroy()
during nsim_dev_take_snapshot_write(). If so, panic would occur.
This problem is actually the same problem with the first case.
So, this problem will be fixed by the first case's solution.

Test commands:
    modprobe netdevsim
    while :
    do
        echo 1 > /sys/bus/netdevsim/new_device &
        echo 1 > /sys/bus/netdevsim/del_device &
	devlink dev reload netdevsim/netdevsim1 &
	echo 1 > /sys/kernel/debug/netdevsim/netdevsim1/take_snapshot &
    done

Splat looks like:
[   45.564513][  T975] general protection fault, probably for non-canonical address 0xdffffc000000003a: 0000 [#1] SMP DEI
[   45.566131][  T975] KASAN: null-ptr-deref in range [0x00000000000001d0-0x00000000000001d7]
[   45.566135][  T975] CPU: 1 PID: 975 Comm: bash Not tainted 5.5.0+ #322
[   45.569020][  T975] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   45.569026][  T975] RIP: 0010:__mutex_lock+0x10a/0x14b0
[   45.570518][  T975] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[   45.570522][  T975] RSP: 0018:ffff888046ccfbf0 EFLAGS: 00010206
[   45.572305][  T975] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   45.572308][  T975] RDX: 000000000000003a RSI: ffffffffac926440 RDI: 00000000000001d0
[   45.576843][  T975] RBP: ffff888046ccfd70 R08: ffffffffab610645 R09: 0000000000000000
[   45.576847][  T975] R10: ffff888046ccfd90 R11: ffffed100d6360ad R12: 0000000000000000
[   45.578471][  T975] R13: dffffc0000000000 R14: ffffffffae1976c0 R15: 0000000000000168
[   45.578475][  T975] FS:  00007f614d6e7740(0000) GS:ffff88806c400000(0000) knlGS:0000000000000000
[   45.581492][  T975] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   45.582942][  T975] CR2: 00005618677d1cf0 CR3: 000000005fb9c002 CR4: 00000000000606e0
[   45.584543][  T975] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   45.586633][  T975] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   45.589889][  T975] Call Trace:
[   45.591445][  T975]  ? devlink_region_snapshot_create+0x55/0x4a0
[   45.601250][  T975]  ? mutex_lock_io_nested+0x1380/0x1380
[   45.602817][  T975]  ? mutex_lock_io_nested+0x1380/0x1380
[   45.603875][  T975]  ? mark_held_locks+0xa5/0xe0
[   45.604769][  T975]  ? _raw_spin_unlock_irqrestore+0x2d/0x50
[   45.606147][  T975]  ? __mutex_unlock_slowpath+0xd0/0x670
[   45.607723][  T975]  ? crng_backtrack_protect+0x80/0x80
[   45.613530][  T975]  ? wait_for_completion+0x390/0x390
[   45.615152][  T975]  ? devlink_region_snapshot_create+0x55/0x4a0
[   45.616834][  T975]  devlink_region_snapshot_create+0x55/0x4a0
[ ... ]

Fixes: 4418f862d6 ("netdevsim: implement support for devlink region and snapshots")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-03 15:32:20 -08:00
Taehee Yoo
6ab63366e1 netdevsim: disable devlink reload when resources are being used
devlink reload destroys resources and allocates resources again.
So, when devices and ports resources are being used, devlink reload
function should not be executed. In order to avoid this race, a new
lock is added and new_port() and del_port() call devlink_reload_disable()
and devlink_reload_enable().

Thread0                      Thread1
{new/del}_port()             {new/del}_port()
devlink_reload_disable()
                             devlink_reload_disable()
devlink_reload_enable()
                             //here
                             devlink_reload_enable()

Before Thread1's devlink_reload_enable(), the devlink is already allowed
to execute reload because Thread0 allows it. devlink reload disable/enable
variable type is bool. So the above case would exist.
So, disable/enable should be executed atomically.
In order to do that, a new lock is used.

Test commands:
    modprobe netdevsim
    echo 1 > /sys/bus/netdevsim/new_device
    while :
    do
        echo 1 > /sys/devices/netdevsim1/new_port &
        echo 1 > /sys/devices/netdevsim1/del_port &
        devlink dev reload netdevsim/netdevsim1 &
    done

Splat looks like:
[   23.342145][  T932] DEBUG_LOCKS_WARN_ON(mutex_is_locked(lock))
[   23.342159][  T932] WARNING: CPU: 0 PID: 932 at kernel/locking/mutex-debug.c:103 mutex_destroy+0xc7/0xf0
[   23.344182][  T932] Modules linked in: netdevsim openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_dx
[   23.346485][  T932] CPU: 0 PID: 932 Comm: devlink Not tainted 5.5.0+ #322
[   23.347696][  T932] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   23.348893][  T932] RIP: 0010:mutex_destroy+0xc7/0xf0
[   23.349505][  T932] Code: e0 07 83 c0 03 38 d0 7c 04 84 d2 75 2e 8b 05 00 ac b0 02 85 c0 75 8b 48 c7 c6 00 5e 07 96 40
[   23.351887][  T932] RSP: 0018:ffff88806208f810 EFLAGS: 00010286
[   23.353963][  T932] RAX: dffffc0000000008 RBX: ffff888067f6f2c0 RCX: ffffffff942c4bd4
[   23.355222][  T932] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff96dac5b4
[   23.356169][  T932] RBP: ffff888067f6f000 R08: fffffbfff2d235a5 R09: fffffbfff2d235a5
[   23.357160][  T932] R10: 0000000000000001 R11: fffffbfff2d235a4 R12: ffff888067f6f208
[   23.358288][  T932] R13: ffff88806208fa70 R14: ffff888067f6f000 R15: ffff888069ce3800
[   23.359307][  T932] FS:  00007fe2a3876740(0000) GS:ffff88806c000000(0000) knlGS:0000000000000000
[   23.360473][  T932] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.361319][  T932] CR2: 00005561357aa000 CR3: 000000005227a006 CR4: 00000000000606f0
[   23.362323][  T932] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.363417][  T932] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.364414][  T932] Call Trace:
[   23.364828][  T932]  nsim_dev_reload_destroy+0x77/0xb0 [netdevsim]
[   23.365655][  T932]  nsim_dev_reload_down+0x84/0xb0 [netdevsim]
[   23.366433][  T932]  devlink_reload+0xb1/0x350
[   23.367010][  T932]  genl_rcv_msg+0x580/0xe90

[ ...]

[   23.531729][ T1305] kernel BUG at lib/list_debug.c:53!
[   23.532523][ T1305] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[   23.533467][ T1305] CPU: 2 PID: 1305 Comm: bash Tainted: G        W         5.5.0+ #322
[   23.534962][ T1305] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   23.536503][ T1305] RIP: 0010:__list_del_entry_valid+0xe6/0x150
[   23.538346][ T1305] Code: 89 ea 48 c7 c7 00 73 1e 96 e8 df f7 4c ff 0f 0b 48 c7 c7 60 73 1e 96 e8 d1 f7 4c ff 0f 0b 44
[   23.541068][ T1305] RSP: 0018:ffff888047c27b58 EFLAGS: 00010282
[   23.542001][ T1305] RAX: 0000000000000054 RBX: ffff888067f6f318 RCX: 0000000000000000
[   23.543051][ T1305] RDX: 0000000000000054 RSI: 0000000000000008 RDI: ffffed1008f84f61
[   23.544072][ T1305] RBP: ffff88804aa0fca0 R08: ffffed100d940539 R09: ffffed100d940539
[   23.545085][ T1305] R10: 0000000000000001 R11: ffffed100d940538 R12: ffff888047c27cb0
[   23.546422][ T1305] R13: ffff88806208b840 R14: ffffffff981976c0 R15: ffff888067f6f2c0
[   23.547406][ T1305] FS:  00007f76c0431740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[   23.548527][ T1305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   23.549389][ T1305] CR2: 00007f5048f1a2f8 CR3: 000000004b310006 CR4: 00000000000606e0
[   23.550636][ T1305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   23.551578][ T1305] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   23.552597][ T1305] Call Trace:
[   23.553004][ T1305]  mutex_remove_waiter+0x101/0x520
[   23.553646][ T1305]  __mutex_lock+0xac7/0x14b0
[   23.554218][ T1305]  ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.554908][ T1305]  ? mutex_lock_io_nested+0x1380/0x1380
[   23.555570][ T1305]  ? _parse_integer+0xf0/0xf0
[   23.556043][ T1305]  ? kstrtouint+0x86/0x110
[   23.556504][ T1305]  ? nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.557133][ T1305]  nsim_dev_port_del+0x4e/0x140 [netdevsim]
[   23.558024][ T1305]  del_port_store+0xcc/0xf0 [netdevsim]
[ ... ]

Fixes: 75ba029f3c ("netdevsim: implement proper devlink reload")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-03 15:32:20 -08:00
Taehee Yoo
f5cd21605e netdevsim: fix using uninitialized resources
When module is being initialized, __init() calls bus_register() and
driver_register().
These functions internally create various resources and sysfs files.
The sysfs files are used for basic operations(add/del device).
/sys/bus/netdevsim/new_device
/sys/bus/netdevsim/del_device

These sysfs files use netdevsim resources, they are mostly allocated
and initialized in ->probe() function, which is nsim_dev_probe().
But, sysfs files could be executed before ->probe() is finished.
So, accessing uninitialized data would occur.

Another problem is very similar.
/sys/bus/netdevsim/new_device internally creates sysfs files.
/sys/devices/netdevsim<id>/new_port
/sys/devices/netdevsim<id>/del_port

These sysfs files also use netdevsim resources, they are mostly allocated
and initialized in creating device routine, which is nsim_bus_dev_new().
But they also could be executed before nsim_bus_dev_new() is finished.
So, accessing uninitialized data would occur.

To fix these problems, this patch adds flags, which means whether the
operation is finished or not.
The flag variable 'nsim_bus_enable' means whether netdevsim bus was
initialized or not.
This is protected by nsim_bus_dev_list_lock.
The flag variable 'nsim_bus_dev->init' means whether nsim_bus_dev was
initialized or not.
This could be used in {new/del}_port_store() with no lock.

Test commands:
    #SHELL1
    modprobe netdevsim
    while :
    do
        echo "1 1" > /sys/bus/netdevsim/new_device
        echo "1 1" > /sys/bus/netdevsim/del_device
    done

    #SHELL2
    while :
    do
        echo 1 > /sys/devices/netdevsim1/new_port
        echo 1 > /sys/devices/netdevsim1/del_port
    done

Splat looks like:
[   47.508954][ T1008] general protection fault, probably for non-canonical address 0xdffffc0000000021: 0000 I
[   47.510793][ T1008] KASAN: null-ptr-deref in range [0x0000000000000108-0x000000000000010f]
[   47.511963][ T1008] CPU: 2 PID: 1008 Comm: bash Not tainted 5.5.0+ #322
[   47.512823][ T1008] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   47.514041][ T1008] RIP: 0010:__mutex_lock+0x10a/0x14b0
[   47.514699][ T1008] Code: 08 84 d2 0f 85 7f 12 00 00 44 8b 0d 10 23 65 02 45 85 c9 75 29 49 8d 7f 68 48 b8 00 00 00 0f
[   47.517163][ T1008] RSP: 0018:ffff888059b4fbb0 EFLAGS: 00010206
[   47.517802][ T1008] RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
[   47.518941][ T1008] RDX: 0000000000000021 RSI: ffffffff85926440 RDI: 0000000000000108
[   47.519732][ T1008] RBP: ffff888059b4fd30 R08: ffffffffc073fad0 R09: 0000000000000000
[   47.520729][ T1008] R10: ffff888059b4fd50 R11: ffff88804bb38040 R12: 0000000000000000
[   47.521702][ T1008] R13: dffffc0000000000 R14: ffffffff871976c0 R15: 00000000000000a0
[   47.522760][ T1008] FS:  00007fd4be05a740(0000) GS:ffff88806c800000(0000) knlGS:0000000000000000
[   47.523877][ T1008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   47.524627][ T1008] CR2: 0000561c82b69cf0 CR3: 0000000065dd6004 CR4: 00000000000606e0
[   47.527662][ T1008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   47.528604][ T1008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   47.529531][ T1008] Call Trace:
[   47.529874][ T1008]  ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.530470][ T1008]  ? mutex_lock_io_nested+0x1380/0x1380
[   47.531018][ T1008]  ? _kstrtoull+0x76/0x160
[   47.531449][ T1008]  ? _parse_integer+0xf0/0xf0
[   47.531874][ T1008]  ? kernfs_fop_write+0x1cf/0x410
[   47.532330][ T1008]  ? sysfs_file_ops+0x160/0x160
[   47.532773][ T1008]  ? kstrtouint+0x86/0x110
[   47.533168][ T1008]  ? nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.533721][ T1008]  nsim_dev_port_add+0x50/0x150 [netdevsim]
[   47.534336][ T1008]  ? sysfs_file_ops+0x160/0x160
[   47.534858][ T1008]  new_port_store+0x99/0xb0 [netdevsim]
[   47.535439][ T1008]  ? del_port_store+0xb0/0xb0 [netdevsim]
[   47.536035][ T1008]  ? sysfs_file_ops+0x112/0x160
[   47.536544][ T1008]  ? sysfs_kf_write+0x3b/0x180
[   47.537029][ T1008]  kernfs_fop_write+0x276/0x410
[   47.537548][ T1008]  ? __sb_start_write+0x215/0x2e0
[   47.538110][ T1008]  vfs_write+0x197/0x4a0
[ ... ]

Fixes: f9d9db47d3 ("netdevsim: add bus attributes to add new and delete devices")
Fixes: 794b2c05ca ("netdevsim: extend device attrs to support port addition and deletion")
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-02-03 15:32:20 -08:00
David S. Miller
b3f7e3f23a Merge ra.kernel.org:/pub/scm/linux/kernel/git/netdev/net 2020-01-19 22:10:04 +01:00
Eric Dumazet
41cdc74104 netdevsim: fix nsim_fib6_rt_create() error path
It seems nsim_fib6_rt_create() intent was to return
either a valid pointer or an embedded error code.

BUG: unable to handle page fault for address: fffffffffffffff4
PGD 9870067 P4D 9870067 PUD 9872067 PMD 0
Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 22851 Comm: syz-executor.1 Not tainted 5.5.0-rc5-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:jhash2 include/linux/jhash.h:125 [inline]
RIP: 0010:rhashtable_jhash2+0x76/0x2c0 lib/rhashtable.c:963
Code: b9 00 00 00 00 00 fc ff df 48 c1 e8 03 0f b6 14 08 4c 89 f0 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 30 02 00 00 49 8d 7e 04 <41> 8b 06 48 be 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 0f b6
RSP: 0018:ffffc90016127190 EFLAGS: 00010246
RAX: 0000000000000007 RBX: 00000000dfb3ab49 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: ffffffff839ba7c8 RDI: fffffffffffffff8
RBP: ffffc900161271c0 R08: ffff8880951f8640 R09: ffffed1015d0703d
R10: ffffed1015d0703c R11: ffff8880ae8381e3 R12: 00000000dfb3ab49
R13: 00000000dfb3ab49 R14: fffffffffffffff4 R15: 0000000000000007
FS:  00007f40bfbc6700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffffffffffff4 CR3: 0000000093660000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 rht_key_get_hash include/linux/rhashtable.h:133 [inline]
 rht_key_hashfn include/linux/rhashtable.h:159 [inline]
 rht_head_hashfn include/linux/rhashtable.h:174 [inline]
 __rhashtable_insert_fast.constprop.0+0xe15/0x1180 include/linux/rhashtable.h:723
 rhashtable_insert_fast include/linux/rhashtable.h:832 [inline]
 nsim_fib6_rt_add drivers/net/netdevsim/fib.c:603 [inline]
 nsim_fib6_rt_insert drivers/net/netdevsim/fib.c:658 [inline]
 nsim_fib6_event drivers/net/netdevsim/fib.c:719 [inline]
 nsim_fib_event drivers/net/netdevsim/fib.c:744 [inline]
 nsim_fib_event_nb+0x1b16/0x2600 drivers/net/netdevsim/fib.c:772
 notifier_call_chain+0xc2/0x230 kernel/notifier.c:83
 __atomic_notifier_call_chain+0xa6/0x1a0 kernel/notifier.c:173
 atomic_notifier_call_chain+0x2e/0x40 kernel/notifier.c:183
 call_fib_notifiers+0x173/0x2a0 net/core/fib_notifier.c:35
 call_fib6_notifiers+0x4b/0x60 net/ipv6/fib6_notifier.c:22
 call_fib6_entry_notifiers+0xfb/0x150 net/ipv6/ip6_fib.c:399
 fib6_add_rt2node net/ipv6/ip6_fib.c:1216 [inline]
 fib6_add+0x20cd/0x3ec0 net/ipv6/ip6_fib.c:1471
 __ip6_ins_rt+0x54/0x80 net/ipv6/route.c:1315
 ip6_ins_rt+0x96/0xd0 net/ipv6/route.c:1325
 __ipv6_dev_ac_inc+0x76f/0xb20 net/ipv6/anycast.c:324
 ipv6_sock_ac_join+0x4c1/0x790 net/ipv6/anycast.c:139
 do_ipv6_setsockopt.isra.0+0x3908/0x4290 net/ipv6/ipv6_sockglue.c:670
 ipv6_setsockopt+0xff/0x180 net/ipv6/ipv6_sockglue.c:944
 udpv6_setsockopt+0x68/0xb0 net/ipv6/udp.c:1564
 sock_common_setsockopt+0x94/0xd0 net/core/sock.c:3149
 __sys_setsockopt+0x261/0x4c0 net/socket.c:2130
 __do_sys_setsockopt net/socket.c:2146 [inline]
 __se_sys_setsockopt net/socket.c:2143 [inline]
 __x64_sys_setsockopt+0xbe/0x150 net/socket.c:2143
 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x45aff9

Fixes: 48bb9eb47b ("netdevsim: fib: Add dummy implementation for FIB offload")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-17 11:00:57 +01:00
Ido Schimmel
48bb9eb47b netdevsim: fib: Add dummy implementation for FIB offload
Implement dummy IPv4 and IPv6 FIB "offload" in the driver by storing
currently "programmed" routes in a hash table. Each route in the hash
table is marked with "trap" indication. The indication is cleared when
the route is replaced or when the netdevsim instance is deleted.

This will later allow us to test the route offload API on top of
netdevsim.

v2:
* Convert to new fib_alias_hw_flags_set() interface

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-14 18:53:35 -08:00
Jacob Keller
b0efcae5e1 devlink: correct misspelling of snapshot
The function to obtain a unique snapshot id was mistakenly typo'd as
devlink_region_shapshot_id_get. Fix this typo by renaming the function
and all of its users.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-11 14:30:24 -08:00
Jacob Keller
04e4272cb3 devlink: rename and expand devlink-trap-netdevsim.rst
Rename the trap-specific netdevimsim.rst file, and expand it to include
documentation of all the devlink features currently implemented by the
netdevsim driver code.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-10 17:07:00 -08:00
Jacob Keller
f4bdd71036 devlink: move devlink documentation to subfolder
Combine the documentation for devlink into a subfolder, and provide an
index.rst file that can be used to generally describe devlink.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-01-10 17:06:59 -08:00
Ido Schimmel
caafb2509f ipv6: Remove old route notifications and convert listeners
Now that mlxsw is converted to use the new FIB notifications it is
possible to delete the old ones and use the new replace / append /
delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-24 22:37:30 -08:00
Ido Schimmel
446f739104 ipv4: Remove old route notifications and convert listeners
Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.

This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-12-16 16:14:43 -08:00
Aya Levin
8fdcd8fb21 netdevsim: Update dummy reporter's devlink binary interface
Update dummy reporter's output to use updated devlink interface of
binary fmsg pair.

Signed-off-by: Aya Levin <ayal@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-12 11:25:44 -08:00
Jiri Pirko
a0c76345e3 devlink: disallow reload operation during device cleanup
There is a race between driver code that does setup/cleanup of device
and devlink reload operation that in some drivers works with the same
code. Use after free could we easily obtained by running:

while true; do
        echo 10 > /sys/bus/netdevsim/new_device
        devlink dev reload netdevsim/netdevsim10 &
        echo 10 > /sys/bus/netdevsim/del_device
done

Fix this by enabling reload only after setup of device is complete and
disabling it at the beginning of the cleanup process.

Reported-by: Ido Schimmel <idosch@mellanox.com>
Fixes: 2d8dc5bbf4 ("devlink: Add support for reload")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-08 14:11:47 -08:00
Jakub Kicinski
bfcccfe78b netdevsim: drop code duplicated by a merge
Looks like the port adding loop makes a re-appearance on net-next
after net was merged back into it (even though it doesn't feature
in the merge diff).

The ports are already added in nsim_dev_create() so when we try
to add them again get EEXIST, and see:

netdevsim: probe of netdevsim0 failed with error -17

in the logs. When we remove the loop again the nsim_dev_probe()
and nsim_dev_remove() become a wrapper of nsim_dev_create() and
nsim_dev_destroy(). Remove this layer of indirection.

Fixes: d31e95585c ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-06 09:59:57 -08:00
David S. Miller
d31e95585c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
The only slightly tricky merge conflict was the netdevsim because the
mutex locking fix overlapped a lot of driver reload reorganization.

The rest were (relatively) trivial in nature.

Signed-off-by: David S. Miller <davem@davemloft.net>
2019-11-02 13:54:56 -07:00
Ido Schimmel
6d6f0383b6 netdevsim: Fix use-after-free during device dismantle
Commit da58f90f11 ("netdevsim: Add devlink-trap support") added
delayed work to netdevsim that periodically iterates over the registered
netdevsim ports and reports various packet traps via devlink.

While the delayed work takes the 'port_list_lock' mutex to protect
against concurrent addition / deletion of ports, during device creation
/ dismantle ports are added / deleted without this lock, which can
result in a use-after-free [1].

Fix this by making sure that the ports list is always modified under the
lock.

[1]
[   59.205543] ==================================================================
[   59.207748] BUG: KASAN: use-after-free in nsim_dev_trap_report_work+0xa67/0xad0
[   59.210247] Read of size 8 at addr ffff8883cbdd3398 by task kworker/3:1/38
[   59.212584]
[   59.213148] CPU: 3 PID: 38 Comm: kworker/3:1 Not tainted 5.4.0-rc3-custom-16119-ge6abb5f0261e #2013
[   59.215896] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20180724_192412-buildhw-07.phx2.fedoraproject.org-1.fc29 04/01/2014
[   59.218384] Workqueue: events nsim_dev_trap_report_work
[   59.219428] Call Trace:
[   59.219924]  dump_stack+0xa9/0x10e
[   59.220623]  print_address_description.constprop.4+0x21/0x340
[   59.221976]  ? vprintk_func+0x66/0x240
[   59.222752]  __kasan_report.cold.8+0x78/0x91
[   59.223602]  ? nsim_dev_trap_report_work+0xa67/0xad0
[   59.224603]  kasan_report+0xe/0x20
[   59.225296]  nsim_dev_trap_report_work+0xa67/0xad0
[   59.226435]  ? rcu_read_lock_sched_held+0xaf/0xe0
[   59.227512]  ? trace_event_raw_event_rcu_quiescent_state_report+0x360/0x360
[   59.228851]  process_one_work+0x98f/0x1760
[   59.229684]  ? pwq_dec_nr_in_flight+0x330/0x330
[   59.230656]  worker_thread+0x91/0xc40
[   59.231587]  ? process_one_work+0x1760/0x1760
[   59.232451]  kthread+0x34a/0x410
[   59.233104]  ? __kthread_queue_delayed_work+0x240/0x240
[   59.234141]  ret_from_fork+0x3a/0x50
[   59.234982]
[   59.235371] Allocated by task 187:
[   59.236189]  save_stack+0x19/0x80
[   59.236853]  __kasan_kmalloc.constprop.5+0xc1/0xd0
[   59.237822]  kmem_cache_alloc_trace+0x14c/0x380
[   59.238769]  __nsim_dev_port_add+0xaf/0x5c0
[   59.239627]  nsim_dev_probe+0x4fc/0x1140
[   59.240550]  really_probe+0x264/0xc00
[   59.241418]  driver_probe_device+0x208/0x2e0
[   59.242255]  __device_attach_driver+0x215/0x2d0
[   59.243150]  bus_for_each_drv+0x154/0x1d0
[   59.243944]  __device_attach+0x1ba/0x2b0
[   59.244923]  bus_probe_device+0x1dd/0x290
[   59.245805]  device_add+0xbac/0x1550
[   59.246528]  new_device_store+0x1f4/0x400
[   59.247306]  bus_attr_store+0x7b/0xa0
[   59.248047]  sysfs_kf_write+0x10f/0x170
[   59.248941]  kernfs_fop_write+0x283/0x430
[   59.249843]  __vfs_write+0x81/0x100
[   59.250546]  vfs_write+0x1ce/0x510
[   59.251190]  ksys_write+0x104/0x200
[   59.251873]  do_syscall_64+0xa4/0x4e0
[   59.252642]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   59.253837]
[   59.254203] Freed by task 187:
[   59.254811]  save_stack+0x19/0x80
[   59.255463]  __kasan_slab_free+0x125/0x170
[   59.256265]  kfree+0x100/0x440
[   59.256870]  nsim_dev_remove+0x98/0x100
[   59.257651]  nsim_bus_remove+0x16/0x20
[   59.258382]  device_release_driver_internal+0x20b/0x4d0
[   59.259588]  bus_remove_device+0x2e9/0x5a0
[   59.260551]  device_del+0x410/0xad0
[   59.263777]  device_unregister+0x26/0xc0
[   59.264616]  nsim_bus_dev_del+0x16/0x60
[   59.265381]  del_device_store+0x2d6/0x3c0
[   59.266295]  bus_attr_store+0x7b/0xa0
[   59.267192]  sysfs_kf_write+0x10f/0x170
[   59.267960]  kernfs_fop_write+0x283/0x430
[   59.268800]  __vfs_write+0x81/0x100
[   59.269551]  vfs_write+0x1ce/0x510
[   59.270252]  ksys_write+0x104/0x200
[   59.270910]  do_syscall_64+0xa4/0x4e0
[   59.271680]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[   59.272812]
[   59.273211] The buggy address belongs to the object at ffff8883cbdd3200
[   59.273211]  which belongs to the cache kmalloc-512 of size 512
[   59.275838] The buggy address is located 408 bytes inside of
[   59.275838]  512-byte region [ffff8883cbdd3200, ffff8883cbdd3400)
[   59.278151] The buggy address belongs to the page:
[   59.279215] page:ffffea000f2f7400 refcount:1 mapcount:0 mapping:ffff8883ecc0ce00 index:0x0 compound_mapcount: 0
[   59.281449] flags: 0x200000000010200(slab|head)
[   59.282356] raw: 0200000000010200 ffffea000f2f3a08 ffffea000f2fd608 ffff8883ecc0ce00
[   59.283949] raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
[   59.285608] page dumped because: kasan: bad access detected
[   59.286981]
[   59.287337] Memory state around the buggy address:
[   59.288310]  ffff8883cbdd3280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   59.289763]  ffff8883cbdd3300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   59.291452] >ffff8883cbdd3380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[   59.292945]                             ^
[   59.293815]  ffff8883cbdd3400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   59.295220]  ffff8883cbdd3480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   59.296872] ==================================================================

Fixes: da58f90f11 ("netdevsim: Add devlink-trap support")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: syzbot+9ed8f68ab30761f3678e@syzkaller.appspotmail.com
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-31 12:35:14 -07:00
YueHaibing
33902b4a42 netdevsim: Fix error handling in nsim_fib_init and nsim_fib_exit
In nsim_fib_init(), if register_fib_notifier failed, nsim_fib_net_ops
should be unregistered before return.

In nsim_fib_exit(), unregister_fib_notifier should be called before
nsim_fib_net_ops be unregistered, otherwise may cause use-after-free:

BUG: KASAN: use-after-free in nsim_fib_event_nb+0x342/0x570 [netdevsim]
Read of size 8 at addr ffff8881daaf4388 by task kworker/0:3/3499

CPU: 0 PID: 3499 Comm: kworker/0:3 Not tainted 5.3.0-rc7+ #30
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
Workqueue: ipv6_addrconf addrconf_dad_work [ipv6]
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0xa9/0x10e lib/dump_stack.c:113
 print_address_description+0x65/0x380 mm/kasan/report.c:351
 __kasan_report+0x149/0x18d mm/kasan/report.c:482
 kasan_report+0xe/0x20 mm/kasan/common.c:618
 nsim_fib_event_nb+0x342/0x570 [netdevsim]
 notifier_call_chain+0x52/0xf0 kernel/notifier.c:95
 __atomic_notifier_call_chain+0x78/0x140 kernel/notifier.c:185
 call_fib_notifiers+0x30/0x60 net/core/fib_notifier.c:30
 call_fib6_entry_notifiers+0xc1/0x100 [ipv6]
 fib6_add+0x92e/0x1b10 [ipv6]
 __ip6_ins_rt+0x40/0x60 [ipv6]
 ip6_ins_rt+0x84/0xb0 [ipv6]
 __ipv6_ifa_notify+0x4b6/0x550 [ipv6]
 ipv6_ifa_notify+0xa5/0x180 [ipv6]
 addrconf_dad_completed+0xca/0x640 [ipv6]
 addrconf_dad_work+0x296/0x960 [ipv6]
 process_one_work+0x5c0/0xc00 kernel/workqueue.c:2269
 worker_thread+0x5c/0x670 kernel/workqueue.c:2415
 kthread+0x1d7/0x200 kernel/kthread.c:255
 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352

Allocated by task 3388:
 save_stack+0x19/0x80 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:493
 kmalloc include/linux/slab.h:557 [inline]
 kzalloc include/linux/slab.h:748 [inline]
 ops_init+0xa9/0x220 net/core/net_namespace.c:127
 __register_pernet_operations net/core/net_namespace.c:1135 [inline]
 register_pernet_operations+0x1d4/0x420 net/core/net_namespace.c:1212
 register_pernet_subsys+0x24/0x40 net/core/net_namespace.c:1253
 nsim_fib_init+0x12/0x70 [netdevsim]
 veth_get_link_ksettings+0x2b/0x50 [veth]
 do_one_initcall+0xd4/0x454 init/main.c:939
 do_init_module+0xe0/0x330 kernel/module.c:3490
 load_module+0x3c2f/0x4620 kernel/module.c:3841
 __do_sys_finit_module+0x163/0x190 kernel/module.c:3931
 do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Freed by task 3534:
 save_stack+0x19/0x80 mm/kasan/common.c:69
 set_track mm/kasan/common.c:77 [inline]
 __kasan_slab_free+0x130/0x180 mm/kasan/common.c:455
 slab_free_hook mm/slub.c:1423 [inline]
 slab_free_freelist_hook mm/slub.c:1474 [inline]
 slab_free mm/slub.c:3016 [inline]
 kfree+0xe9/0x2d0 mm/slub.c:3957
 ops_free net/core/net_namespace.c:151 [inline]
 ops_free_list.part.7+0x156/0x220 net/core/net_namespace.c:184
 ops_free_list net/core/net_namespace.c:182 [inline]
 __unregister_pernet_operations net/core/net_namespace.c:1165 [inline]
 unregister_pernet_operations+0x221/0x2a0 net/core/net_namespace.c:1224
 unregister_pernet_subsys+0x1d/0x30 net/core/net_namespace.c:1271
 nsim_fib_exit+0x11/0x20 [netdevsim]
 nsim_module_exit+0x16/0x21 [netdevsim]
 __do_sys_delete_module kernel/module.c:1015 [inline]
 __se_sys_delete_module kernel/module.c:958 [inline]
 __x64_sys_delete_module+0x244/0x330 kernel/module.c:958
 do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Reported-by: Hulk Robot <hulkci@huawei.com>
Fixes: 59c84b9fcf ("netdevsim: Restore per-network namespace accounting for fib entries")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-13 11:30:14 -07:00
Jiri Pirko
82c93a87bf netdevsim: implement couple of testing devlink health reporters
Implement "empty" and "dummy" reporters. The first one is really simple
and does nothing. The other one has debugfs files to trigger breakage
and it is able to do recovery. The ops also implement dummy fmsg
content.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-11 21:02:30 -07:00
Colin Ian King
f9867b51d2 netdevsim: fix spelling mistake "forbidded" -> "forbid"
There is a spelling mistake in a NL_SET_ERR_MSG_MOD message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
2019-10-08 11:47:54 -07:00
Jiri Pirko
8e23cc0319 netdevsim: implement devlink dev_info op
Do simple dev_info devlink operation implementation which only fills up
the driver name.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-07 09:11:07 -04:00
Jiri Pirko
155ddfc5e5 netdevsim: add couple of debugfs bools to debug devlink reload
Add flag to disallow reload and another one that causes reload to
always fail.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-06 18:28:42 +02:00
Jiri Pirko
7b60027bbc netdevsim: create devlink and netdev instances in namespace
When user does create new netdevsim instance using sysfs bus file,
create the devlink instance and related netdev instance in the namespace
of the caller.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-10-05 16:34:15 -07:00