Add the set of info versions reported by bnxt_en driver, including
a description of what the version represents, and what modes (fixed,
running, stored) it reports.
v2: Use fw.psid.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Display the following information via devlink info command:
- Driver name
- Board id
- Board revision
- Board Serial number
- Board FW version
- FW parameter set version
- FW App version
- FW management version
- FW RoCE version
Standard output example:
$ devlink dev info pci/0000:3b:00.0
pci/0000:3b:00.0:
driver bnxt_en
serial_number 00-10-18-FF-FE-AD-05-00
versions:
fixed:
asic.id D802
asic.rev 1
running:
fw 216.1.124.0
fw.psid 0.0.0
fw.app 216.1.122.0
fw.mgmt 864.0.32.0
fw.roce 216.1.15.0
[ This version has incorporated changes suggested by Jakub Kicinski to
use generic devlink version tags. ]
v2: Use fw.psid
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add definition and documentation for the new generic info "fw.roce".
v2: Remove board.nvm_cfg since fw.psid is similar.
Cc: Jiri Pirko <jiri@mellanox.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rename switch_id to dsn, which is more meaningful, so that a follow-up
patch can use it to display the device serial number via the
devlink info command.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds status notification to devlink flash update
while flashing is in progress.
$ devlink dev flash pci/0000:05:00.0 file 103.pkg
Preparing to flash
Flashing done
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Latest kernels get the phys_port_name via devlink, if
ndo_get_phys_port_name is not defined. To provide the phys_port_name
correctly, register devlink before registering netdev.
Also call devlink_port_type_eth_set() after registering netdev as
devlink port updates the netdev structure and notifies user.
Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows the driver to register a devlink port and use port features.
Also register params only if the firmware spec version is at least 0x10600,
which supports reading/setting numbered variables in NVRAM.
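A minimal sketch of the version gate described above. The packing of the
spec version as major << 16 | minor << 8 | update (so that 0x10600 reads as
1.6.0) is an assumption, and SPEC_CODE() and params_supported() are
hypothetical names used only for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing: major << 16 | minor << 8 | update, so 0x10600 is 1.6.0. */
#define SPEC_CODE(maj, min, upd) \
	(((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(upd))

/* Minimum firmware spec version quoted in the commit message. */
#define MIN_PARAMS_SPEC_CODE 0x10600u

/* Hypothetical helper: true when devlink params should be registered. */
static bool params_supported(uint32_t spec_code)
{
	return spec_code >= MIN_PARAMS_SPEC_CODE;
}
```

In this model devlink itself would always be registered, and only the
params registration would be guarded by params_supported().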
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Define bnxt_dl_params_register() and bnxt_dl_params_unregister()
functions and move params register/unregister code to these newly
defined functions. This patch is in preparation to register
devlink irrespective of firmware spec. version in the next patch.
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The hardware bug has been fixed on B0 and newer chips, so disable the
workaround on these chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the only time we check and remove expired filters is
when we are inserting new filters.
Improve the aRFS expiry handling by adding code to do the above
work periodically.
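A user-space sketch of the idea, not the driver code: the same sweep that
used to run only on insert can be called from a periodic tick. The
flow_filter structure and the idle-timeout test are assumptions made for
illustration; the driver decides expiry by its own means.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one ntuple filter entry. */
struct flow_filter {
	bool in_use;
	unsigned long last_seen;	/* time the flow was last observed */
};

/*
 * Free every filter idle for longer than max_idle and return how many
 * were freed. Calling this from a periodic tick, and not only from the
 * insert path, keeps stale filters from lingering when no new flows
 * arrive.
 */
static size_t expire_filters(struct flow_filter *tbl, size_t n,
			     unsigned long now, unsigned long max_idle)
{
	size_t freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].in_use && now - tbl[i].last_seen > max_idle) {
			tbl[i].in_use = false;
			freed++;
		}
	}
	return freed;
}
```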
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_rx_flow_steer(), if the dissected packet is a fragment, do not
proceed to create the ntuple filter and return error instead. Otherwise
we would create a filter with 0 source and destination ports because
the dissected ports would not be available for fragments.
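The check can be sketched as follows. This is a simplified user-space
model: struct dissected_flow and steer_flow() are hypothetical, and the
specific error code returned is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the dissected flow keys. */
struct dissected_flow {
	bool is_fragment;
	uint16_t src_port;
	uint16_t dst_port;
};

/*
 * Refuse to create a filter for fragments: their L4 ports are not
 * available, so a filter built from them would match ports 0/0.
 */
static int steer_flow(const struct dissected_flow *f, int *filters_created)
{
	if (f->is_fragment)
		return -EPROTONOSUPPORT;	/* assumed error code */

	(*filters_created)++;
	return 0;
}
```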
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
575XX (P5) chips have the same UDP RSS hashing capability as P4 chips,
so we can enable it on P5 chips.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The dev_port is meant to distinguish the network ports belonging to
the same PCI function. Our devices only have one network port
associated with each PCI function and so we should not set it for
correctness.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the 2nd parameter fw_dflt is not set, we are calling bnxt_probe_phy()
after the firmware has reset. There is no need to query the current
PHY settings from firmware as these settings may be different from
the ethtool settings that the driver will re-establish later. So
return earlier in bnxt_probe_phy() to save one firmware call.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnxt_update_phy_setting(), ethtool_get_link_ksettings() and
bnxt_disable_an_for_lpbk(), we inconsistently use netif_carrier_ok()
to determine link. Instead, we should use bp->link_info.link_up
which has the true link state. The netif_carrier state may be off
during self-test and while the device is being reset and may not always
reflect the true link state.
By always using bp->link_info.link_up, the code is now more
consistent and more correct. Some unnecessary link toggles are
now prevented with this patch.
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubecek says:
====================
ethtool netlink interface, part 2
This shorter series adds support for getting and setting of wake-on-lan
settings and message mask (originally message level). Together with the
code already in net-next, this will allow full implementation of
"ethtool <dev>" and "ethtool -s <dev> ...".
Older versions of the ethtool netlink series allowed getting WoL settings
by unprivileged users and only filtered out the password but this was
a source of controversy so for now, ETHTOOL_MSG_WOL_GET request always
requires CAP_NET_ADMIN as ETHTOOL_GWOL ioctl request does.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of
a device are modified using ETHTOOL_MSG_WOL_SET netlink message or
ETHTOOL_SWOL ioctl request.
As notifications can be received by anyone, do not include SecureOn(tm)
password in notification messages.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement WOL_SET netlink request to set wake-on-lan settings. This is
equivalent to ETHTOOL_SWOL ioctl request.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement WOL_GET request to get wake-on-lan settings for a device,
traditionally available via ETHTOOL_GWOL ioctl request.
As part of the implementation, provide symbolic names for wake-on-lan
modes as ETH_SS_WOL_MODES string set.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Send ETHTOOL_MSG_DEBUG_NTF notification message whenever the debugging
message mask for a device is modified using ETHTOOL_MSG_DEBUG_SET netlink
message or ETHTOOL_SMSGLVL ioctl request.
The notification message has the same format as reply to DEBUG_GET request.
As with other ethtool notifications, netlink requests only trigger the
notification if the mask is actually changed, while ioctl requests trigger
it whenever the request results in calling the ethtool_ops handler.
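The rule above can be written down as a small decision function. This is
only a sketch; the enum and should_notify() are hypothetical names, not
part of the ethtool code.

```c
#include <stdbool.h>
#include <stdint.h>

enum request_source { SRC_NETLINK, SRC_IOCTL };

/*
 * Netlink: notify only when the mask really changed.
 * ioctl:   notify whenever the ethtool_ops handler was called,
 *          even if the value written equals the old one.
 */
static bool should_notify(enum request_source src, uint32_t old_mask,
			  uint32_t new_mask, bool handler_called)
{
	if (src == SRC_NETLINK)
		return old_mask != new_mask;
	return handler_called;
}
```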
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement DEBUG_SET netlink request to set debugging settings for a device.
At the moment, only message mask corresponding to message level as set by
ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement DEBUG_GET request to get debugging settings for a device. At the
moment, only message mask corresponding to message level as reported by
ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in
ioctl interface but almost all drivers interpret it as a bit mask.)
As part of the implementation, provide symbolic names for message mask bits
as ETH_SS_MSG_CLASSES string set.
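To illustrate the "level is really a bit mask" point, here is a sketch that
turns a mask into symbolic names. Only the first four bits are listed, the
name table is illustrative, and format_msg_mask() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative names for the lowest message mask bits (bit 0 upwards). */
static const char *const msg_class_names[] = {
	"drv", "probe", "link", "timer",
};

/* Write a space-separated list of the set bits' names into buf. */
static void format_msg_mask(uint32_t mask, char *buf, size_t len)
{
	size_t n = sizeof(msg_class_names) / sizeof(msg_class_names[0]);

	if (len == 0)
		return;
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		if (!(mask & (1u << i)))
			continue;
		if (buf[0] != '\0')
			strncat(buf, " ", len - strlen(buf) - 1);
		strncat(buf, msg_class_names[i], len - strlen(buf) - 1);
	}
}
```

With this table, a mask of 0x5 has bits 0 and 2 set and formats as
"drv link".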
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix missing or incorrect function argument and struct member descriptions.
Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
Signed-off-by: David S. Miller <davem@davemloft.net>
* pm-core:
PM-runtime: add tracepoints for usage_count changes
* powercap:
powercap/intel_rapl: add support for JasperLake
x86/cpu: Add Jasper Lake to Intel family
powercap/intel_rapl: add support for TigerLake Mobile
* pm-opp:
opp: Replace list_kref with a local counter
opp: Free static OPPs on errors while adding them
* pm-avs:
power: avs: qcom-cpr: remove duplicated include from qcom-cpr.c
power: avs: fix uninitialized error return on failed cpr_read_fuse_uV() call
power: avs: qcom-cpr: make cpr_get_opp_hz_for_req() static
power: avs: qcom-cpr: remove set but unused variable
power: avs: qcom-cpr: make sure that regmap is available
power: avs: qcom-cpr: fix unsigned expression compared with zero
power: avs: qcom-cpr: fix invalid printk specifier in debug print
power: avs: Add support for CPR (Core Power Reduction)
dt-bindings: power: avs: Add support for CPR (Core Power Reduction)
* pm-misc:
mailmap: Add entry for <rjw@sisk.pl>
* pm-cpufreq:
cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG
cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount
cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether"
cpufreq: s3c: fix unbalances of cpufreq policy refcount
cpufreq: imx-cpufreq-dt: Add i.MX8MP support
cpufreq: Use imx-cpufreq-dt for i.MX8MP's speed grading
cpufreq: tegra186: convert to devm_platform_ioremap_resource
cpufreq: kirkwood: convert to devm_platform_ioremap_resource
cpufreq: CPPC: put ACPI table after using it
cpufreq : CPPC: Break out if HiSilicon CPPC workaround is matched
* pm-sleep:
PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior
PM: hibernate: fix spelling mistake "shapshot" -> "snapshot"
PM: hibernate: Add more logging on hibernation failure
PM: hibernate: improve arithmetic division in preallocate_highmem_fraction()
PM: wakeup: Show statistics for deleted wakeup sources again
PM: sleep: Switch to rtc_time64_to_tm()/rtc_tm_to_time64()
* pm-cpuidle: (27 commits)
intel_idle: Clean up irtl_2_usec()
intel_idle: Move 3 functions closer to their callers
intel_idle: Annotate initialization code and data structures
intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit()
intel_idle: Rearrange intel_idle_cpuidle_driver_init()
intel_idle: Clean up NULL pointer check in intel_idle_init()
intel_idle: Fold intel_idle_probe() into intel_idle_init()
intel_idle: Eliminate __setup_broadcast_timer()
cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings
cpuidle: sysfs: fix warnings when compiling with W=1
cpuidle: coupled: fix warnings when compiling with W=1
Documentation: admin-guide: PM: Add intel_idle document
cpuidle: arm: Enable compile testing for some of drivers
cpuidle: Drop unused cpuidle_driver_ref/unref() functions
intel_idle: Use ACPI _CST on server systems
intel_idle: Add module parameter to prevent ACPI _CST from being used
intel_idle: Allow ACPI _CST to be used for selected known processors
cpuidle: Allow idle states to be disabled by default
intel_idle: Use ACPI _CST for processor models without C-state tables
intel_idle: Refactor intel_idle_cpuidle_driver_init()
...
Merge tag 'wireless-drivers-next-2020-01-26' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for v5.6
Second set of patches for v5.6. Nothing special standing out, smaller
new features and fixes all over.
Major changes:
ar5523
* add support for SMCWUSBT-G2 USB device
iwlwifi
* support new versions of the FTM FW APIs
* support new version of the beacon template FW API
* print some extra information when the driver is loaded
rtw88
* support wowlan feature for 8822c
* add support for WIPHY_WOWLAN_NET_DETECT
brcmfmac
* add initial support for monitor mode
qtnfmac
* add module parameter to enable DFS offloading in firmware
* add support for STA HE rates
* add support for TWT responder and spatial reuse
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yoshiki Komachi says:
====================
When I tried a test based on the selftest program for BPF flow dissector
(test_flow_dissector.sh), I observed an unexpected result, as shown below:
$ tc filter add dev lo parent ffff: protocol ip pref 1337 flower ip_proto \
udp src_port 8-10 action drop
$ tools/testing/selftests/bpf/test_flow_dissector -i 4 -f 9 -F
inner.dest4: 127.0.0.1
inner.source4: 127.0.0.3
pkts: tx=10 rx=10
The last rx is the number of received packets. I expected rx=0 in this
test (i.e., all received packets should have been dropped), but the packets
were accepted instead.
Although the previous commit 8ffb055bea ("cls_flower: Fix the behavior
using port ranges with hw-offload") added a new flag and field for filtering
based on port ranges with hw-offload, it was not applied to the BPF flow
dissector. As a result, the BPF flow dissector currently stores data extracted
from packets in the wrong field, the one used for exact match, whenever packets
are classified by filters based on port ranges. Packets therefore never match
the rules in such cases, because the flow dissector generates incorrect flow keys.
This series fixes the issue by replacing the incorrect flag and field with the
new ones in the BPF flow dissector, and adds a test for filtering based on
specified port ranges to the existing selftest program.
Changes in v2:
- set key_ports to NULL at the top of __skb_flow_bpf_to_target()
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add a simple test to make sure that a filter based on specified port
range classifies packets correctly.
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200117070533.402240-3-komachi.yoshiki@gmail.com
This patch applies the new flag (FLOW_DISSECTOR_KEY_PORTS_RANGE) and
field (tp_range) to the BPF flow dissector so that it generates appropriate
flow keys when packets are classified by specified port ranges.
Fixes: 8ffb055bea ("cls_flower: Fix the behavior using port ranges with hw-offload")
Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/20200117070533.402240-2-komachi.yoshiki@gmail.com
Johan Hedberg says:
====================
pull request: bluetooth-next 2020-01-26
Here's (probably) the last bluetooth-next pull request for the 5.6 kernel.
- Initial pieces of Bluetooth 5.2 Isochronous Channels support
- mgmt: Various cleanups and a new Set Blocked Keys command
- btusb: Added support for 04ca:3021 QCA_ROME device
- hci_qca: Multiple fixes & cleanups
- hci_bcm: Fixes & improved device tree support
- Fixed attempts to create duplicate debugfs entries
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
'alloc_etherdev_mqs()' expects first 'tx', then 'rx'. The argument order
here looks reversed.
Reorder the arguments passed to 'alloc_etherdev_mqs()' to match the
expected order.
In fact, this is a no-op because both XGENE_NUM_[RT]X_RING are 8.
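A small stand-in shows why the order matters in general and why it does not
here. fake_alloc_etherdev_mqs() and struct fake_netdev are hypothetical;
only the parameter order (private size, TX queues, RX queues) mirrors
alloc_etherdev_mqs().

```c
/* Hypothetical stand-in recording what the allocator was asked for. */
struct fake_netdev {
	unsigned int num_tx_queues;
	unsigned int num_rx_queues;
};

/* Same parameter order as alloc_etherdev_mqs(): priv size, TX, then RX. */
static struct fake_netdev fake_alloc_etherdev_mqs(int sizeof_priv,
						  unsigned int txqs,
						  unsigned int rxqs)
{
	struct fake_netdev dev = {
		.num_tx_queues = txqs,
		.num_rx_queues = rxqs,
	};

	(void)sizeof_priv;	/* unused in this model */
	return dev;
}
```

Passing (8, 8) yields the same result in either order, while swapping
unequal counts such as (4, 2) would silently exchange the TX and RX queue
numbers.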
Fixes: 107dec2749 ("drivers: net: xgene: Add support for multiple queues")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now that we depend on call_rcu() and synchronize_rcu() to also wait
for preempt-disabled regions to complete, the RCU read-side critical
section in __dev_map_flush() is no longer required, except in a few
special cases in drivers that need it for other reasons.
These originally ensured the map reference was safe while a map was
also being free'd. And additionally that bpf program updates via
ndo_bpf did not happen while flush updates were in flight. But flush
by new rules can only be called from preempt-disabled NAPI context.
The synchronize_rcu() from the map free path and the call_rcu() from the
delete path will ensure the reference there is safe. So let's remove
the rcu_read_lock()/rcu_read_unlock() pair to avoid any confusion
around how this is being protected.
If the rcu_read_lock was required it would mean errors in the above
logic and the original patch would also be wrong.
Now that we have done the above, we put the rcu_read_lock() in the driver
code where it is needed, in a driver-dependent way. I think this
helps readability of the code, so we know where and why we are
taking read locks. Most drivers will not need rcu_read_lock() here,
and XDP drivers already have rcu_read_lock() in their code paths for
reading XDP programs on the RX side, so this makes it symmetric: we no
longer have half of the RCU critical sections defined in the driver
and the other half in devmap.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/1580084042-11598-4-git-send-email-john.fastabend@gmail.com
virtio_net currently relies on rcu critical section to access the xdp
program in its xdp_xmit handler. However, the pointer to the xdp program
is only used to do a NULL pointer comparison to determine if xdp is
enabled or not.
Use rcu_access_pointer() instead of rcu_dereference() to reflect this.
Then later, when we drop the rcu_read critical section, virtio_net will
not need any special handling.
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/1580084042-11598-3-git-send-email-john.fastabend@gmail.com
Now that we rely on synchronize_rcu() and call_rcu() waiting for
preempt-disabled regions (NAPI) to exit, let's update the comments
to reflect this.
Fixes: 0536b85239 ("xdp: Simplify devmap cleanup")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Song Liu <songliubraving@fb.com>
Link: https://lore.kernel.org/bpf/1580084042-11598-2-git-send-email-john.fastabend@gmail.com
Defaults for min_mtu and max_mtu are set by ether_setup(), which is
called from devm_alloc_etherdev(). Let rtl_jumbo_max() return a positive
value only if jumbo packets are actually supported. This also allows us
to remove the constant Jumbo_1K, which is a little misleading anyway.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
An 'alloc_etherdev()' call is not balanced by a corresponding
'free_netdev()' call in one error handling path.
Slightly reorder the error handling code to catch the missed case.
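The usual shape of such a fix is a single unwind label that every failure
after the allocation jumps to. The sketch below is a user-space model with
hypothetical names (fake_alloc(), fake_free(), port_create()); a counter
stands in for leak detection.

```c
#include <errno.h>
#include <stdlib.h>

static int live_allocs;	/* number of allocations not yet freed */

static void *fake_alloc(void)
{
	void *p = malloc(16);

	if (p)
		live_allocs++;
	return p;
}

static void fake_free(void *p)
{
	if (p) {
		free(p);
		live_allocs--;
	}
}

/* fail_at selects which later setup step fails (0 = none). */
static int port_create(int fail_at, void **out)
{
	void *dev;
	int err;

	dev = fake_alloc();
	if (!dev)
		return -ENOMEM;

	if (fail_at == 1) {		/* first setup step fails */
		err = -EINVAL;
		goto err_free_dev;
	}
	if (fail_at == 2) {		/* second setup step fails */
		err = -EIO;
		goto err_free_dev;
	}

	*out = dev;
	return 0;

err_free_dev:
	fake_free(dev);			/* every error path releases dev */
	return err;
}
```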
Fixes: c100e47caa ("mlxsw: minimal: Add ethtool support")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
DSA sets up a switch tree little by little. Every switch of the N
members of the tree calls dsa_register_switch, and (N - 1) will just
touch the dst->ports list with their ports and quickly exit. Only the
last switch that calls dsa_register_switch will find all DSA links
complete in dsa_tree_setup_routing_table, and not return zero as a
result but instead go ahead and set up the entire DSA switch tree
(practically on behalf of the other switches too).
The trouble is that the (N - 1) switches don't clean up after themselves
after they get an error such as EPROBE_DEFER. Their footprint left in
dst->ports by dsa_switch_touch_ports is still there. And switch N, the
one responsible for actually setting up the tree, is going to work with
those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and
ds->dev might get freed by the device driver.
Consider a 2-switch tree and the following calling order:
- Switch 1 calls dsa_register_switch
  - Calls dsa_switch_touch_ports, populates dst->ports
  - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits.
- Switch 2 calls dsa_register_switch
  - Calls dsa_switch_touch_ports, populates dst->ports
  - Probe doesn't get deferred, so it goes ahead.
  - Calls dsa_tree_setup_routing_table, which returns "complete == true"
    due to Switch 1 having called dsa_switch_touch_ports before.
  - Because the DSA links are complete, it calls dsa_tree_setup_switches
    now.
  - dsa_tree_setup_switches iterates through dst->ports, initializing
    the Switch 1 ds structure (invalid) and the Switch 2 ds structure
    (valid).
  - Undefined behavior (use after free, sometimes NULL pointers, etc).
Real example below (debugging prints added by me, as well as guards
against NULL pointers):
[ 5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00)
[ 6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000)
[ 6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800)
[ 7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800)
The solution is to recognize that the functions that call
dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side
effects, and therefore one should clean up their side effects on error
path. The cleanup of dst->ports was taken from dsa_switch_remove and
moved into a dedicated dsa_switch_release_ports function, which should
really be per-switch (free only the members of dst->ports that are also
members of ds, instead of all switch ports).
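A per-switch release can be sketched on a plain singly linked list. This is
a user-space model, not the DSA code: struct sw, struct port, add_port()
and release_ports() are hypothetical stand-ins for ds, dp and the
dst->ports list.

```c
#include <stdlib.h>

struct sw {
	int id;
};

struct port {
	struct sw *ds;		/* owning switch */
	struct port *next;
};

/* Add a port owned by ds to the shared tree-wide list. */
static int add_port(struct port **head, struct sw *ds)
{
	struct port *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->ds = ds;
	p->next = *head;
	*head = p;
	return 0;
}

/* Free only the ports that belong to ds; leave other switches' ports. */
static size_t release_ports(struct port **head, const struct sw *ds)
{
	struct port **pp = head;
	size_t freed = 0;

	while (*pp) {
		struct port *p = *pp;

		if (p->ds == ds) {
			*pp = p->next;	/* unlink before freeing */
			free(p);
			freed++;
		} else {
			pp = &p->next;
		}
	}
	return freed;
}
```

Calling release_ports() on the error path of a deferred switch leaves no
entries pointing at that switch for a later caller to trip over.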
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
All usage of this function was removed three years ago, and the
function was marked as deprecated:
a52ad514fd ("net: deprecate eth_change_mtu, remove usage")
So I think we can remove it now.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Bianconi says:
====================
XDP fixes for socionext driver
Fix possible use-after-free in XDP rx path
Fix rx statistics accounting if no bpf program is attached
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix xdp_result initialization in netsec_process_rx in order to not
increase rx counters if there is no bpf program attached to the xdp hook
and napi_gro_receive returns GRO_DROP.
Fixes: ba2b232108 ("net: netsec: add XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a possible use-after-free in netsec_process_rx that can occur if
the first packet is sent to the normal networking stack and the
following one is dropped by the bpf program attached to the xdp hook.
Fix the issue by defining the skb pointer inside the 'budget' loop.
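The effect of the pointer's scope can be shown with a small model. The
names below are hypothetical and the "delivery" is only counted, so nothing
is actually freed; the point is that a pointer declared outside the loop
keeps the previous packet's value when the current packet is dropped.

```c
#include <stddef.h>

enum xdp_verdict { VERDICT_PASS, VERDICT_DROP };

#define RX_RING 16

static int rx_buffers[RX_RING];

/* Problematic shape: skb outlives one iteration of the budget loop. */
static int process_rx_stale(const enum xdp_verdict *v, int budget)
{
	int *skb = NULL;
	int delivered = 0;

	for (int i = 0; i < budget && i < RX_RING; i++) {
		if (v[i] == VERDICT_PASS)
			skb = &rx_buffers[i];
		/* common tail: hands over whatever skb points at */
		if (skb)
			delivered++;
	}
	return delivered;
}

/* Fixed shape: skb starts out NULL for every packet. */
static int process_rx_fixed(const enum xdp_verdict *v, int budget)
{
	int delivered = 0;

	for (int i = 0; i < budget && i < RX_RING; i++) {
		int *skb = NULL;

		if (v[i] == VERDICT_PASS)
			skb = &rx_buffers[i];
		if (skb)
			delivered++;
	}
	return delivered;
}
```

For a PASS followed by a DROP, the stale variant hands over the first
buffer twice, while the fixed variant hands it over once.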
Fixes: ba2b232108 ("net: netsec: add XDP support")
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko says:
====================
net: allow per-net notifier to follow netdev into namespace
Currently we have a per-net notifier, which delivers only the
notifications relevant to a particular network namespace. That is enough
for drivers whose netdevs are local to a particular namespace (cannot
move elsewhere).
However if netdev can change namespace, per-net notifier cannot be used.
Introduce dev_net variant that is basically per-net notifier with an
extension that re-registers the per-net notifier upon netdev namespace
change. Basically the per-net notifier follows the netdev into
namespace.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Register the dev_net notifier and allow the per-net notifier to follow
the device into different namespace.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce dev_net variants of netdev notifier register/unregister functions
and allow per-net notifier to follow the netdevice into the namespace it is
moved to.
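Conceptually, "following" means unregistering from the old namespace and
registering in the new one when the device moves. The sketch below is a
user-space model with hypothetical types and helpers; it does not use the
kernel notifier API.

```c
#include <stddef.h>

struct netns {
	int id;
	int nr_notifiers;	/* notifiers currently registered here */
};

/* Hypothetical notifier that remembers which namespace it lives in. */
struct dev_net_notifier {
	struct netns *ns;
};

static void notifier_register(struct dev_net_notifier *nb, struct netns *ns)
{
	nb->ns = ns;
	ns->nr_notifiers++;
}

static void notifier_unregister(struct dev_net_notifier *nb)
{
	if (nb->ns) {
		nb->ns->nr_notifiers--;
		nb->ns = NULL;
	}
}

/* Called when the device changes namespace: re-home the notifier. */
static void dev_change_netns(struct dev_net_notifier *nb, struct netns *new_ns)
{
	if (nb->ns == new_ns)
		return;
	notifier_unregister(nb);
	notifier_register(nb, new_ns);
}
```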
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Push the code which is done under rtnl lock in net notifier register and
unregister function into separate helpers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function does the same thing as the existing code, so call
call_netdevice_unregister_net_notifiers() instead of duplicating the code.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
reuseport_grow() does not need to initialize the more_reuse->max_socks
again. It is already initialized in __reuseport_alloc().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
Steffen Klassert says:
====================
Support fraglist GRO/GSO
This patchset adds support to do GRO/GSO by chaining packets
of the same flow at the SKB frag_list pointer. This avoids
the overhead to merge payloads into one big packet, and
on the other end, if GSO is needed it avoids the overhead
of splitting the big packet back to the native form.
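The chaining idea can be sketched in user space as follows. struct pkt and
both helpers are hypothetical and far simpler than an SKB; the point is
that aggregation only links packets and segmentation only unlinks them, so
no payload is copied in either direction.

```c
#include <stddef.h>

struct pkt {
	size_t len;		/* this packet's own payload length */
	struct pkt *next;	/* next packet in the chain */
	struct pkt *frag_list;	/* head only: first chained packet */
	struct pkt *last;	/* head only: tail of the chain */
	size_t total_len;	/* head only: length of the whole flow */
};

/* "GRO": append p to head's chain; the caller set head->total_len = head->len. */
static void gro_chain(struct pkt *head, struct pkt *p)
{
	p->next = NULL;
	if (!head->frag_list)
		head->frag_list = p;
	else
		head->last->next = p;
	head->last = p;
	head->total_len += p->len;
}

/*
 * "GSO": hand back the original packets in order. The caller must
 * provide room in out[] for the head plus every chained packet.
 */
static size_t gso_unchain(struct pkt *head, struct pkt **out, size_t max)
{
	struct pkt *p = head->frag_list;
	size_t n = 0;

	if (max == 0)
		return 0;
	out[n++] = head;
	while (p && n < max) {
		struct pkt *nx = p->next;

		p->next = NULL;
		out[n++] = p;
		p = nx;
	}
	head->frag_list = NULL;
	head->last = NULL;
	head->total_len = head->len;
	return n;
}
```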
Patch 1 adds netdev feature flags to enable fraglist GRO,
this implements one of the configuration options discussed
at netconf 2019.
Patch 2 adds a netdev software feature set that defaults to off
and assigns the new fraglist GRO feature flag to it.
Patch 3 adds the core infrastructure to do fraglist GRO/GSO.
Patch 4 enables UDP to use fraglist GRO/GSO if configured.
I have only meaningful forwarding performance measurements.
I did some tests for the local receive path with netperf and iperf,
but in this case the sender that generates the packets is the
bottleneck. So the benchmarks are not that meaningful for the
receive path.
Paolo Abeni did some benchmarks of the local receive path for the
RFC v2 version of this patchset, results can be found here:
https://www.spinics.net/lists/netdev/msg551158.html
I used my IPsec forwarding test setup for the performance measurements:
   ------------         ------------
-->| router 1 |-------->| router 2 |--
|  ------------         ------------  |
|                                     |
|       --------------------          |
--------|Spirent Testcenter|<----------
        --------------------
net-next (September 7th 2019):
Single stream UDP frame size 1460 Bytes: 1.161.000 fps (13.5 Gbps).
----------------------------------------------------------------------
net-next (September 7th 2019) + standard UDP GRO/GSO (not implemented
in this patchset):
Single stream UDP frame size 1460 Bytes: 1.801.000 fps (21 Gbps).
----------------------------------------------------------------------
net-next (September 7th 2019) + fraglist UDP GRO/GSO:
Single stream UDP frame size 1460 Bytes: 2.860.000 fps (33.4 Gbps).
=======================================================================
net-next (January 23rd 2020):
Single stream UDP frame size 1460 Bytes: 919.000 fps (10.73 Gbps).
----------------------------------------------------------------------
net-next (January 23rd 2020) + fraglist UDP GRO/GSO:
Single stream UDP frame size 1460 Bytes: 2.430.000 fps (28.38 Gbps).
-----------------------------------------------------------------------
Changes from RFC v1:
- Add IPv6 support.
- Split patchset to enable UDP GRO by default before adding
fraglist GRO support.
- Mark fraglist GRO packets as CHECKSUM_NONE.
- Take a refcount on the first segment skb when doing fraglist
segmentation. With this we can use the same error handling
path as with standard segmentation.
Changes from RFC v2:
- Add a netdev feature flag to configure listified GRO.
- Fix UDP GRO enabling for IPv6.
- Fix a rcu_read_lock() imbalance.
- Fix error path in skb_segment_list().
Changes from RFC v3:
- Rename NETIF_F_GRO_LIST to NETIF_F_GRO_FRAGLIST and add
NETIF_F_GSO_FRAGLIST.
- Move introduction of SKB_GSO_FRAGLIST to patch 2.
- Use udpv6_encap_needed_key instead of udp_encap_needed_key in IPv6.
- Move some misplaced code from patch 5 to patch 1 where it belongs.
Changes from RFC v4:
- Drop the 'UDP: enable GRO by default' patch for now. Standard UDP GRO
is not changed with this patchset.
- Rebase to net-next current.
Changes from v1 (December 18th):
- Do a full __copy_skb_header instead of trying to find the really
needed subset of header fields. This can be done later.
- Mark all fraglist GRO packets with CHECKSUM_UNNECESSARY.
- Rebase to net-next current.
Changes from v2 (January 24th):
- Do the CHECKSUM_UNNECESSARY setting from IPv4 for IPv6 too.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>