Commit Graph

15962 Commits

Author SHA1 Message Date
Talal Ahmad
f1a456f8f3 net: avoid double accounting for pure zerocopy skbs
Track skbs with only zerocopy data and avoid charging them to kernel
memory to correctly account the memory utilization for msg_zerocopy.
All of the data in such skbs is held in user pages which are already
accounted to user. Before this change, they are charged again in
kernel in __zerocopy_sg_from_iter. The charging in kernel is
excessive because data is not being copied into skb frags. This
excessive charging can lead to kernel going into memory pressure
state which impacts all sockets in the system adversely. Mark pure
zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove
charge/uncharge for data in such skbs.

Initially, an skb is marked pure zerocopy when it is empty and in
zerocopy path. skb can then change from a pure zerocopy skb to mixed
data skb (zerocopy and copy data) if it is at tail of write queue and
there is room available in it and non-zerocopy data is being sent in
the next sendmsg call. At this time sk_mem_charge is done for the pure
zerocopied data and the pure zerocopy flag is unmarked. We found that
this happens very rarely on workloads that pass MSG_ZEROCOPY.

A pure zerocopy skb can later be coalesced into normal skb if they are
next to each other in queue but this patch prevents coalescing from
happening. This avoids complexity of charging when skb downgrades from
pure zerocopy to mixed. This is also rare.

In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge
for SKB_TRUESIZE(MAX_TCP_HEADER) is done for sk_mem_charge in
tcp_skb_entail for an skb without data.

Testing with the msg_zerocopy.c benchmark between two hosts(100G nics)
with zerocopy showed that before this patch the 'sock' variable in
memory.stat for cgroup2 that tracks sum of sk_forward_alloc,
sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this
change it is 0. This is due to no charge to sk_forward_alloc for
zerocopy data and shows memory utilization for kernel is lowered.

Signed-off-by: Talal Ahmad <talalahmad@google.com>
Acked-by: Arjun Roy <arjunroy@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01 16:33:27 -07:00
Talal Ahmad
03271f3a35 tcp: rename sk_wmem_free_skb
sk_wmem_free_skb() is only used by TCP.

Rename it to make this clear, and move its declaration to
include/net/tcp.h

Signed-off-by: Talal Ahmad <talalahmad@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Arjun Roy <arjunroy@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-11-01 16:33:27 -07:00
Taehee Yoo
b75f7095d4 amt: add mld report message handler
In the previous patch, igmp report handler was added.
That handler can be used for mld too.
So, it uses that common code to parse mld report message.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:36:09 +00:00
Taehee Yoo
bc54e49c14 amt: add multicast(IGMP) report message handler
amt 'Relay' interface manages multicast groups(igmp/mld) and sources.
In order to manage, it should have the function to parse igmp/mld
report messages. So, this adds the logic for parsing igmp report messages
and saves them on their own data structure.

   struct amt_group_node means one group(igmp/mld).
   struct amt_source_node means one source.

The same source can't exist in the same group.
The same group can exist in the same tunnel because it manages
the host address too.

The group information is used when forwarding multicast data.
If there are no groups in the specific tunnel, Relay doesn't forward it.

Although Relay manages sources, it doesn't support the source filtering
feature. Because the reason to manage sources is just that in order
to manage group more correctly.

In the next patch, MLD part will be added.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:36:08 +00:00
Taehee Yoo
cbc21dc1cf amt: add data plane of amt interface
Before forwarding multicast traffic, the amt interface establishes between
gateway and relay. In order to establish, amt defined some message type
and those message flow looks like the below.

                      Gateway                  Relay
                      -------                  -----
                         :        Request        :
                     [1] |           N           |
                         |---------------------->|
                         |    Membership Query   | [2]
                         |    N,MAC,gADDR,gPORT  |
                         |<======================|
                     [3] |   Membership Update   |
                         |   ({G:INCLUDE({S})})  |
                         |======================>|
                         |                       |
    ---------------------:-----------------------:---------------------
   |                     |                       |                     |
   |                     |    *Multicast Data    |  *IP Packet(S,G)    |
   |                     |      gADDR,gPORT      |<-----------------() |
   |    *IP Packet(S,G)  |<======================|                     |
   | ()<-----------------|                       |                     |
   |                     |                       |                     |
    ---------------------:-----------------------:---------------------
                         ~                       ~
                         ~        Request        ~
                     [4] |           N'          |
                         |---------------------->|
                         |   Membership Query    | [5]
                         | N',MAC',gADDR',gPORT' |
                         |<======================|
                     [6] |                       |
                         |       Teardown        |
                         |   N,MAC,gADDR,gPORT   |
                         |---------------------->|
                         |                       | [7]
                         |   Membership Update   |
                         |  ({G:INCLUDE({S})})   |
                         |======================>|
                         |                       |
    ---------------------:-----------------------:---------------------
   |                     |                       |                     |
   |                     |    *Multicast Data    |  *IP Packet(S,G)    |
   |                     |     gADDR',gPORT'     |<-----------------() |
   |    *IP Packet (S,G) |<======================|                     |
   | ()<-----------------|                       |                     |
   |                     |                       |                     |
    ---------------------:-----------------------:---------------------
                         |                       |
                         :                       :

1. Discovery
 - Sent by Gateway to Relay
 - To find Relay unique ip address
2. Advertisement
 - Sent by Relay to Gateway
 - Contains the unique IP address
3. Request
 - Sent by Gateway to Relay
 - Solicit to receive 'Query' message.
4. Query
 - Sent by Relay to Gateway
 - Contains General Query message.
5. Update
 - Sent by  Gateway to Relay
 - Contains report message.
6. Multicast Data
 - Sent by Relay to Gateway
 - encapsulated multicast traffic.
7. Teardown
 - Not supported at this time.

Except for the Teardown message, it supports all messages.

In the next patch, IGMP/MLD logic will be added.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:36:08 +00:00
Taehee Yoo
b9022b53ad amt: add control plane of amt interface
It adds definitions and control plane code for AMT.
this is very similar to udp tunneling interfaces such as gtp, vxlan, etc.
In the next patch, data plane code will be added.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:36:08 +00:00
Jakub Kicinski
1af0a0948e ethtool: don't drop the rtnl_lock half way thru the ioctl
devlink compat code needs to drop rtnl_lock to take
devlink->lock to ensure correct lock ordering.

This is problematic because we're not strictly guaranteed
that the netdev will not disappear after we re-lock.
It may open a possibility of nested ->begin / ->complete
calls.

Instead of calling into devlink under rtnl_lock take
a ref on the devlink instance and make the call after
we've dropped rtnl_lock.

We (continue to) assume that netdevs have an implicit
reference on the devlink returned from ndo_get_devlink_port

Note that ndo_get_devlink_port will now get called
under rtnl_lock. That should be fine since none of
the drivers seem to be taking serious locks inside
ndo_get_devlink_port.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:26:07 +00:00
Jakub Kicinski
46db1b77cd devlink: expose get/put functions
Allow those who hold implicit reference on a devlink instance
to try to take a full ref on it. This will be used from netdev
code which has an implicit ref because of driver call ordering.

Note that after recent changes devlink_unregister() may happen
before netdev unregister, but devlink_free() should still happen
after, so we are safe to try, but we can't just refcount_inc()
and assume it's not zero.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:26:07 +00:00
Marek Behún
c07c6e8eb4 net: dsa: populate supported_interfaces member
Add a new DSA switch operation, phylink_get_interfaces, which should
fill in which PHY_INTERFACE_MODE_* are supported by given port.

Use this before phylink_create() to fill phylinks supported_interfaces
member, allowing phylink to determine which PHY_INTERFACE_MODEs are
supported.

Signed-off-by: Marek Behún <kabel@kernel.org>
[tweaked patch and description to add more complete support -- rmk]
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 13:06:32 +00:00
David S. Miller
894d084434 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Use array_size() in ebtables, from Gustavo A. R. Silva.

2) Attach IPS_ASSURED to internal UDP stream state, reported by
   Maciej Zenczykowski.

3) Add NFT_META_IFTYPE to match on the interface type either
   from ingress or egress.

4) Generalize pktinfo->tprot_set to flags field.

5) Allow to match on inner headers / payload data.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-11-01 12:59:58 +00:00
Pablo Neira Ayuso
c46b38dc87 netfilter: nft_payload: support for inner header matching / mangling
Allow to match and mangle on inner headers / payload data after the
transport header. There is a new field in the pktinfo structure that
stores the inner header offset which is calculated only when requested.
Only TCP and UDP supported at this stage.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-01 09:31:03 +01:00
Pablo Neira Ayuso
b5bdc6f9c2 netfilter: nf_tables: convert pktinfo->tprot_set to flags field
Generalize boolean field to store more flags on the pktinfo structure.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2021-11-01 09:30:20 +01:00
Luiz Augusto von Dentz
182ee45da0 Bluetooth: hci_sync: Rework hci_suspend_notifier
This makes hci_suspend_notifier use the hci_*_sync which can be
executed synchronously which is allowed in the suspend_notifier and
simplifies a lot of the handling since the status of each command can
be checked inline so no other work need to be scheduled thus can be
performed without using of a state machine.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:52:00 +02:00
Luiz Augusto von Dentz
d0b137062b Bluetooth: hci_sync: Rework init stages
This moves the init stages to use the hci_sync infra and in addition
to that have the stages as function tables so it is easier to change
the command sequence.

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:52:00 +02:00
Brian Gix
3244845c63 Bluetooth: hci_sync: Convert MGMT_OP_SSP
mgmt-tester paths:
Set SSP on - Success 2
Set Device ID - SSP off and Power on

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:52:00 +02:00
Brian Gix
6f6ff38a1e Bluetooth: hci_sync: Convert MGMT_OP_SET_LOCAL_NAME
Uses existing *_sync functions, but made hci_update_name_sync
non-static.

mgmt-test paths:
Set Advertising on - Local name 1
Set Advertising on - Name + Appear 1
Set Local Name - Success 1
Set Local Name - Success 2
Set Local Name - Success 3
Add Advertising - Success (Empty ScRsp)
Add Advertising - Success (Complete name)
Add Advertising - Success (Shortened name)
Add Advertising - Success (Short name)
Add Advertising - Success (Name + data)
Add Advertising - Invalid Params (Name + data)
Add Advertising - Success (Name+data+appear)
Read Ext Controller Info 3
Read Ext Controller Info 4
Read Ext Controller Info 5
Add Ext Advertising - Success (Empty ScRsp)
Add Ext Advertising - Success (Complete name)
Add Ext Advertising - Success (Shortened name)
Add Ext Advertising - Success (Short name)
Add Ext Advertising - Success (Name + data)
Add Ext Advertising - Invalid Params (Name + data)
Add Ext Advertising - Success (Name+data+appear)

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:52:00 +02:00
Brian Gix
f892244b05 Bluetooth: hci_sync: Convert MGMT_OP_READ_LOCAL_OOB_DATA
New functions:
  hci_read_local_oob_data_sync

This function requires all of the data from the cmd cmplt event
to be passed up to the caller via the skb.

mgmt-tester paths:
Read Local OOB Data - Not powered
Read Local OOB Data - Legacy pairing
Read Local OOB Data - Success SSP
Read Local OOB Data - Success SC

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Brian Gix
d81a494c43 Bluetooth: hci_sync: Convert MGMT_OP_SET_LE
Uses:
  hci_disable_advertising_sync
  hci_remove_ext_adv_instance_sync
  hci_write_le_host_supported_sync
  hci_setup_ext_adv_instance_sync
  hci_update_scan_rsp_data_sync

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Brian Gix
5a75013746 Bluetooth: hci_sync: Convert MGMT_OP_GET_CLOCK_INFO
Synchronous version of MGMT_OP_GET_CLOCK_INFO.

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Brian Gix
2f2eb0c9de Bluetooth: hci_sync: Convert MGMT_OP_SET_SECURE_CONN
Synchronous version of MGMT_OP_SET_SECURE_CONN.

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Brian Gix
47db6b4299 Bluetooth: hci_sync: Convert MGMT_OP_GET_CONN_INFO
Synchronous version of MGMT_OP_GET_CONN_INFO

Implements:
  hci_read_rssi_sync
  hci_read_tx_power_sync

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Brian Gix
451d95a98c Bluetooth: hci_sync: Enable synch'd set_bredr
Uses previously written:
  hci_write_fast_connectable_sync
  hci_update_scan_sync
  hci_update_adv_data_sync

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Brian Gix
353a0249c3 Bluetooth: hci_sync: Convert MGMT_OP_SET_FAST_CONNECTABLE
This creates a synchronized Write Fast Connectable call and attaches it
to the MGMT_OP_SET_FAST_CONNECTABLE management opcode.

Signed-off-by: Brian Gix <brian.gix@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Luiz Augusto von Dentz
abfeea476c Bluetooth: hci_sync: Convert MGMT_OP_START_DISCOVERY
This make use of hci_cmd_sync_queue for MGMT_OP_START_DISCOVERY,
MGMT_OP_START_SERVICE_DISCOVERY and MGMT_OP_STOP_DISCOVERY to use
hci_cmd_sync_queue so they no longer depend on hdev->discov_update work
to send any commands.

Tested with:

tools/mgmt-tester -s "Start Discovery"

Test Summary
------------
Start Discovery - Not powered 1                      Passed
Start Discovery - Invalid parameters 1               Passed
Start Discovery - Not supported 1                    Passed
Start Discovery - Success 1                          Passed
Start Discovery - Success 2                          Passed
Start Discovery - Power Off 1                        Passed
Start Discovery BREDR LE - (Ext Scan Enable)         Passed
Start Discovery LE - (Ext Scan Enable)               Passed
Start Discovery LE - (Ext Scan Param)                Passed
Start Discovery - (2m, Scan Param)                   Passed
Start Discovery - (coded, Scan Param)                Passed
Start Discovery - (1m, 2m, coded, Scan Param)        Passed
LL Privacy - Start Discovery 1 (Disable RL)          Passed
LL Privacy - Start Discovery 2 (Disable RL)          Passed
Total: 14, Passed: 14 (100.0%), Failed: 0, Not Run: 0

tools/mgmt-tester -s "Start Service"

Test Summary
------------
Start Service Discovery - Not powered 1              Passed
Start Service Discovery - Invalid parameters 1       Passed
Start Service Discovery - Not supported 1            Passed
Start Service Discovery - Success 1                  Passed
Start Service Discovery - Success 2                  Passed
Total: 5, Passed: 5 (100.0%), Failed: 0, Not Run: 0

tools/mgmt-tester -s "Stop Discovery"

Test Summary
------------
Stop Discovery - Success 1                           Passed
Stop Discovery - BR/EDR (Inquiry) Success 1          Passed
Stop Discovery - Rejected 1                          Passed
Stop Discovery - Invalid parameters 1                Passed
Stop Discovery - (Ext Scan Disable)                  Passed
Total: 5, Passed: 5 (100.0%), Failed: 0, Not Run: 0

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:59 +02:00
Luiz Augusto von Dentz
cf75ad8b41 Bluetooth: hci_sync: Convert MGMT_SET_POWERED
This make use of hci_cmd_sync_queue when MGMT_SET_POWERED is used so all
commands are run within hdev->cmd_sync_work instead of
hdev->power_on_work and hdev->power_off_work.

In addition to that the power on sequence now takes into account if
local IRK needs to be programmed in the resolving list.

Tested with:

tools/mgmt-tester -s "Set powered"

Test Summary
------------
Set powered on - Success                             Passed
Set powered on - Invalid parameters 1                Passed
Set powered on - Invalid parameters 2                Passed
Set powered on - Invalid parameters 3                Passed
Set powered on - Invalid index                       Passed
Set powered on - Privacy and Advertising             Passed
Set powered off - Success                            Passed
Set powered off - Class of Device                    Passed
Set powered off - Invalid parameters 1               Passed
Set powered off - Invalid parameters 2               Passed
Set powered off - Invalid parameters 3               Passed
Total: 11, Passed: 11 (100.0%), Failed: 0, Not Run: 0

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:58 +02:00
Luiz Augusto von Dentz
ad383c2c65 Bluetooth: hci_sync: Enable advertising when LL privacy is enabled
This enables advertising when LL privacy is enabled and changes the
command sequence when resolving list is updated to also account for when
advertising is enabled using the following sequence:

If there are devices to scan:

Disable Scanning -> Update Accept List ->
use_ll_privacy((Disable Advertising) -> Disable Resolving List ->
Update Resolving List -> Enable Resolving List -> (Enable Advertising)) ->
Enable Scanning

Otherwise:

Disable Scanning

Errors during the Update Accept List stage are handled gracefully by
restoring any previous state (e.g. advertising) and disabling the use of
accept list as either accept list or resolving list could not be
updated.

Tested with:

mgmt-tester -s "LL Privacy"

Test Summary
------------
LL Privacy - Add Device 1 (Add to WL)                Passed
LL Privacy - Add Device 2 (Add to RL)                Passed
LL Privacy - Add Device 3 (Enable RL)                Passed
LL Privacy - Add Device 4 (2 Devices to WL)          Passed
LL Privacy - Add Device 5 (2 Devices to RL)          Passed
LL Privacy - Add Device 6 (RL is full)               Passed
LL Privacy - Add Device 7 (WL is full)               Passed
LL Privacy - Add Device 8 (Disable Adv)              Passed
LL Privacy - Add Device 9 (Multi Adv)                Passed
LL Privacy - Add Device 10 (Multi Dev and Multi Adv) Passed
LL Privacy - Remove Device 1 (Remove from WL)        Passed
LL Privacy - Remove Device 2 (Remove from RL)        Passed
LL Privacy - Remove Device 3 (Disable RL)            Passed
LL Privacy - Remove Device 4 (Disable Adv)           Passed
LL Privacy - Remove Device 5 (Multi Adv)             Passed
LL Privacy - Start Discovery 1 (Disable RL)          Passed
LL Privacy - Start Discovery 2 (Disable RL)          Passed
Total: 18, Passed: 18 (100.0%), Failed: 0, Not Run: 0

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:58 +02:00
Luiz Augusto von Dentz
e8907f7654 Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 3
This make use of hci_cmd_sync_queue for the following MGMT commands:

    Add Device
    Remove Device

Tested with:

mgmt-tester -s "Add Device"

Test Summary
------------
Add Device - Invalid Params 1                        Passed
Add Device - Invalid Params 2                        Passed
Add Device - Invalid Params 3                        Passed
Add Device - Invalid Params 4                        Passed
Add Device - Success 1                               Passed
Add Device - Success 2                               Passed
Add Device - Success 3                               Passed
Add Device - Success 4                               Passed
Add Device - Success 5                               Passed
Add Device - Success 6 - Add to whitelist            Passed
Add Device - Success 7 - Add to resolv list          Passed
Add Device - Success 8 - Enable resolv list          Passed
Total: 12, Passed: 12 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.209 seconds

mgmt-tester -s "Remove Device"

Test Summary
------------
Remove Device - Invalid Params 1                     Passed
Remove Device - Invalid Params 2                     Passed
Remove Device - Invalid Params 3                     Passed
Remove Device - Success 1                            Passed
Remove Device - Success 2                            Passed
Remove Device - Success 3                            Passed
Remove Device - Success 4                            Passed
Remove Device - Success 5                            Passed
Remove Device - Success 6 - All Devices              Passed
Remove Device - Success 7 - Remove from whitelist    Passed
Remove Device - Success 8 - Remove from resolv list  Passed
Total: 11, Passed: 11 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 4.26 seconds

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:58 +02:00
Luiz Augusto von Dentz
cba6b75871 Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 2
This make use of hci_cmd_sync_queue for the following MGMT commands:

Add Advertising
Remove Advertising
Add Extended Advertising Parameters
Add Extended Advertising Data

mgmt-tester -s "Add Advertising"

Test Summary
------------
Add Advertising - Failure: LE off                    Passed
Add Advertising - Invalid Params 1 (AD too long)     Passed
Add Advertising - Invalid Params 2 (Malformed len)   Passed
Add Advertising - Invalid Params 3 (Malformed len)   Passed
Add Advertising - Invalid Params 4 (Malformed len)   Passed
Add Advertising - Invalid Params 5 (AD too long)     Passed
Add Advertising - Invalid Params 6 (ScRsp too long)  Passed
Add Advertising - Invalid Params 7 (Malformed len)   Passed
Add Advertising - Invalid Params 8 (Malformed len)   Passed
Add Advertising - Invalid Params 9 (Malformed len)   Passed
Add Advertising - Invalid Params 10 (ScRsp too long) Passed
Add Advertising - Rejected (Timeout, !Powered)       Passed
Add Advertising - Success 1 (Powered, Add Adv Inst)  Passed
Add Advertising - Success 2 (!Powered, Add Adv Inst) Passed
Add Advertising - Success 3 (!Powered, Adv Enable)   Passed
Add Advertising - Success 4 (Set Adv on override)    Passed
Add Advertising - Success 5 (Set Adv off override)   Passed
Add Advertising - Success 6 (Scan Rsp Dta, Adv ok)   Passed
Add Advertising - Success 7 (Scan Rsp Dta, Scan ok)  Passed
Add Advertising - Success 8 (Connectable Flag)       Passed
Add Advertising - Success 9 (General Discov Flag)    Passed
Add Advertising - Success 10 (Limited Discov Flag)   Passed
Add Advertising - Success 11 (Managed Flags)         Passed
Add Advertising - Success 12 (TX Power Flag)         Passed
Add Advertising - Success 13 (ADV_SCAN_IND)          Passed
Add Advertising - Success 14 (ADV_NONCONN_IND)       Passed
Add Advertising - Success 15 (ADV_IND)               Passed
Add Advertising - Success 16 (Connectable -> on)     Passed
Add Advertising - Success 17 (Connectable -> off)    Passed
Add Advertising - Success 18 (Power -> off, Remove)  Passed
Add Advertising - Success 19 (Power -> off, Keep)    Passed
Add Advertising - Success 20 (Add Adv override)      Passed
Add Advertising - Success 21 (Timeout expires)       Passed
Add Advertising - Success 22 (LE -> off, Remove)     Passed
Add Advertising - Success (Empty ScRsp)              Passed
Add Advertising - Success (ScRsp only)               Passed
Add Advertising - Invalid Params (ScRsp too long)    Passed
Add Advertising - Success (ScRsp appear)             Passed
Add Advertising - Invalid Params (ScRsp appear long) Passed
Add Advertising - Success (Appear is null)           Passed
Add Advertising - Success (Name is null)             Passed
Add Advertising - Success (Complete name)            Passed
Add Advertising - Success (Shortened name)           Passed
Add Advertising - Success (Short name)               Passed
Add Advertising - Success (Name + data)              Passed
Add Advertising - Invalid Params (Name + data)       Passed
Add Advertising - Success (Name+data+appear)         Passed
Total: 47, Passed: 47 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 2.17 seconds

mgmt-tester -s "Remove Advertising"

Test Summary
------------
Remove Advertising - Invalid Params 1                Passed
Remove Advertising - Success 1                       Passed
Remove Advertising - Success 2                       Passed
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.0585 seconds

mgmt-tester -s "Ext Adv MGMT Params"

Test Summary:
------------
Ext Adv MGMT Params - Unpowered                      Passed
Ext Adv MGMT Params - Invalid parameters             Passed
Ext Adv MGMT Params - Success                        Passed
Ext Adv MGMT Params - (5.0) Success                  Passed
Total: 4, Passed: 4 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.0746 seconds

mgmt-tester -s "Ext Adv MGMT -"

Test Summary
------------
Ext Adv MGMT - Data set without Params               Passed
Ext Adv MGMT - AD Data (5.0) Invalid parameters      Passed
Ext Adv MGMT - AD Data (5.0) Success                 Passed
Ext Adv MGMT - AD Scan Response (5.0) Success        Passed
Total: 4, Passed: 4 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.0805 seconds

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:58 +02:00
Luiz Augusto von Dentz
161510ccf9 Bluetooth: hci_sync: Make use of hci_cmd_sync_queue set 1
This make use of hci_cmd_sync_queue for the following MGMT commands:

Set Device Class
Set Device ID
Add UUID
Remove UUID

tools/mgmt-tester -s "Set Device Class"

Test Summary
------------
Set Device Class - Success 1                         Passed
Set Device Class - Success 2                         Passed
Set Device Class - Invalid parameters 1              Passed
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.0599 seconds

tools/mgmt-tester -s "Set Device ID"

Test Summary
------------
Set Device ID - Success 1                            Passed
Set Device ID - Success 2                            Passed
Set Device ID - Disable                              Passed
Set Device ID - Power off and Power on               Passed
Set Device ID - SSP off and Power on                 Passed
Set Device ID - Invalid Parameter                    Passed
Total: 6, Passed: 6 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.107 seconds

tools/mgmt-tester -s "Add UUID"

Test Summary
------------
Add UUID - UUID-16 1                                 Passed
Add UUID - UUID-16 multiple 1                        Passed
Add UUID - UUID-16 partial 1                         Passed
Add UUID - UUID-32 1                                 Passed
Add UUID - UUID-32 multiple 1                        Passed
Add UUID - UUID-32 partial 1                         Passed
Add UUID - UUID-128 1                                Passed
Add UUID - UUID-128 multiple 1                       Passed
Add UUID - UUID-128 partial 1                        Passed
Add UUID - UUID mix                                  Passed
Total: 10, Passed: 10 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.198 seconds

tools/mgmt-tester -s "Remove UUID"

Test Summary
------------
Remove UUID - Success 1                              Passed
Remove UUID - All UUID - Success 2                   Passed
Remove UUID - Power Off - Success 3                  Passed
Remove UUID - Power Off and On - Success 4           Passed
Remove UUID - Not Exist - Invalid Params 1           Passed
Total: 5, Passed: 5 (100.0%), Failed: 0, Not Run: 0
Overall execution time: 0.0908 seconds

Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:58 +02:00
Marcel Holtmann
6a98e3836f Bluetooth: Add helper for serialized HCI command execution
The usage of __hci_cmd_sync() within the hdev->setup() callback allows for
a nice and simple serialized execution of HCI commands. More importantly
it allows for result processing before issueing the next command.

With the current usage of hci_req_run() it is possible to batch up
commands and execute them, but it is impossible to react to their
results or errors.

This is an attempt to generalize the hdev->setup() handling and provide
a simple way of running multiple HCI commands from a single function
context.

There are multiple struct work that are decdicated to certain tasks
already used right now. It is add a lot of bloat to hci_dev struct and
extra handling code. So it might be possible to put all of these behind
a common HCI command infrastructure and just execute the HCI commands
from the same work context in a serialized fashion.

For example updating the white list and resolving list can be done now
without having to know the list size ahead of time. Also preparing for
suspend or resume shouldn't require a state machine anymore. There are
other tasks that should be simplified as well.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2021-10-29 16:51:58 +02:00
Yoshiki Komachi
6de6e46d27 cls_flower: Fix inability to match GRE/IPIP packets
When a packet of a new flow arrives in openvswitch kernel module, it dissects
the packet and passes the extracted flow key to ovs-vswtichd daemon. If hw-
offload configuration is enabled, the daemon creates a new TC flower entry to
bypass openvswitch kernel module for the flow (TC flower can also offload flows
to NICs but this time that does not matter).

In this processing flow, I found the following issue in cases of GRE/IPIP
packets.

When ovs_flow_key_extract() in openvswitch module parses a packet of a new
GRE (or IPIP) flow received on non-tunneling vports, it extracts information
of the outer IP header for ip_proto/src_ip/dst_ip match keys.

This means ovs-vswitchd creates a TC flower entry with IP protocol/addresses
match keys whose values are those of the outer IP header. OTOH, TC flower,
which uses flow_dissector (different parser from openvswitch module), extracts
information of the inner IP header.

The following flow is an example to describe the issue in more detail.

   <----------- Outer IP -----------------> <---------- Inner IP ---------->
  +----------+--------------+--------------+----------+----------+----------+
  | ip_proto | src_ip       | dst_ip       | ip_proto | src_ip   | dst_ip   |
  | 47 (GRE) | 192.168.10.1 | 192.168.10.2 | 6 (TCP)  | 10.0.0.1 | 10.0.0.2 |
  +----------+--------------+--------------+----------+----------+----------+

In this case, TC flower entry and extracted information are shown as below:

  - ovs-vswitchd creates TC flower entry with:
      - ip_proto: 47
      - src_ip: 192.168.10.1
      - dst_ip: 192.168.10.2

  - TC flower extracts below for IP header matches:
      - ip_proto: 6
      - src_ip: 10.0.0.1
      - dst_ip: 10.0.0.2

Thus, GRE or IPIP packets never match the TC flower entry, as each
dissector behaves differently.

IMHO, the behavior of TC flower (flow dissector) does not look correct,
as ip_proto/src_ip/dst_ip in TC flower match means the outermost IP
header information except for GRE/IPIP cases. This patch adds a new
flow_dissector flag FLOW_DISSECTOR_F_STOP_BEFORE_ENCAP which skips
dissection of the encapsulated inner GRE/IPIP header in TC flower
classifier.

Signed-off-by: Yoshiki Komachi <komachi.yoshiki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 14:03:22 +01:00
Jeremy Kerr
67737c4572 mctp: Pass flow data & flow release events to drivers
Now that we have an extension for MCTP data in skbs, populate the flow
when a key has been created for the packet, and add a device driver
operation to inform of flow destruction.

Includes a fix for a warning with test builds:
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 13:23:51 +01:00
Jeremy Kerr
78476d315e mctp: Add flow extension to skb
This change adds a new skb extension for MCTP, to represent a
request/response flow.

The intention is to use this in a later change to allow i2c controllers
to correctly configure a multiplexer over a flow.

Since we have a cleanup function in the core path (if an extension is
present), we'll need to make CONFIG_MCTP a bool, rather than a tristate.

Includes a fix for a build warning with clang:
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 13:23:51 +01:00
Xin Long
cc4665ca64 sctp: subtract sctphdr len in sctp_transport_pl_hlen
sctp_transport_pl_hlen() is called to calculate the outer header length
for PL. However, as the Figure in rfc8899#section-4.4:

   Any additional
     headers         .--- MPS -----.
            |        |             |
            v        v             v
     +------------------------------+
     | IP | ** | PL | protocol data |
     +------------------------------+

                <----- PLPMTU ----->
     <---------- PMTU -------------->

Outer header are IP + Any additional headers, which doesn't include
Packetization Layer itself header, namely sctphdr, whereas sctphdr
is counted by __sctp_mtu_payload().

The incorrect calculation caused the link pathmtu to be set larger
than expected by t->pl.pmtu + sctp_transport_pl_hlen(). This patch
is to fix it by subtracting sctphdr len in sctp_transport_pl_hlen().

Fixes: d9e2e410ae ("sctp: add the constants/variables and states and some APIs for transport")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 12:21:23 +01:00
Xin Long
c6ea04ea69 sctp: reset probe_timer in sctp_transport_pl_update
sctp_transport_pl_update() is called when transport update its dst and
pathmtu, instead of stopping the PLPMTUD probe timer, PLPMTUD should
start over and reset the probe timer. Otherwise, the PLPMTUD service
would stop.

Fixes: 92548ec2f1 ("sctp: add the probe timer in transport for PLPMTUD")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-29 12:21:23 +01:00
Jakub Kicinski
7df621a3ee Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  7b50ecfcc6 ("net: Rename ->stream_memory_read to ->sock_is_readable")
  4c1e34c0db ("vsock: Enable y2038 safe timeval for timeout")

drivers/net/ethernet/marvell/octeontx2/af/rvu_debugfs.c
  0daa55d033 ("octeontx2-af: cn10k: debugfs for dumping LMTST map table")
  e77bcdd1f6 ("octeontx2-af: Display all enabled PF VF rsrc_alloc entries.")

Adjacent code addition in both cases, keep both.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 10:43:58 -07:00
Davide Caratti
f7cc8890f3 mptcp: fix corrupt receiver key in MPC + data + checksum
using packetdrill it's possible to observe that the receiver key contains
random values when clients transmit MP_CAPABLE with data and checksum (as
specified in RFC8684 §3.1). Fix the layout of mptcp_out_options, to avoid
using the skb extension copy when writing the MP_CAPABLE sub-option.

Fixes: d7b2690837 ("mptcp: shrink mptcp_out_options struct")
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/233
Reported-by: Poorva Sonparote <psonparo@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Link: https://lore.kernel.org/r/20211027203855.264600-1-mathew.j.martineau@linux.intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-28 08:19:06 -07:00
Daniel Jordan
da353fac65 net/tls: Fix flipped sign in tls_err_abort() calls
sk->sk_err appears to expect a positive value, a convention that ktls
doesn't always follow and that leads to memory corruption in other code.
For instance,

    [kworker]
    tls_encrypt_done(..., err=<negative error from crypto request>)
      tls_err_abort(.., err)
        sk->sk_err = err;

    [task]
    splice_from_pipe_feed
      ...
        tls_sw_do_sendpage
          if (sk->sk_err) {
            ret = -sk->sk_err;  // ret is positive

    splice_from_pipe_feed (continued)
      ret = actor(...)  // ret is still positive and interpreted as bytes
                        // written, resulting in underflow of buf->len and
                        // sd->len, leading to huge buf->offset and bogus
                        // addresses computed in later calls to actor()

Fix all tls_err_abort() callers to pass a negative error code
consistently and centralize the error-prone sign flip there, throwing in
a warning to catch future misuse and uninlining the function so it
really does only warn once.

Cc: stable@vger.kernel.org
Fixes: c46234ebb4 ("tls: RX path for ktls")
Reported-by: syzbot+b187b77c8474f9648fae@syzkaller.appspotmail.com
Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 14:41:20 +01:00
Eric Dumazet
a406290af0 net: cleanup __sk_stream_memory_free()
We now have INDIRECT_CALL_INET_1() macro, no need to use #ifdef CONFIG_INET

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 14:40:09 +01:00
Eric Dumazet
27728ba80f tcp: cleanup tcp_remove_empty_skb() use
All tcp_remove_empty_skb() callers now use tcp_write_queue_tail()
for the skb argument, we can therefore factorize code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-28 12:44:38 +01:00
Paolo Abeni
292e6077b0 net: introduce sk_forward_alloc_get()
A later patch will change the MPTCP memory accounting schema
in such a way that MPTCP sockets will encode the total amount of
forward allocated memory in two separate fields (one for tx and
one for rx).

MPTCP sockets will use their own helper to provide the accurate
amount of fwd allocated memory.

To allow the above, this patch adds a new, optional, sk method to
fetch the fwd memory, wrap the call in a new helper and use it
where it is appropriate.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 18:20:29 -07:00
Paolo Abeni
5823fc96d7 tcp: define macros for a couple reclaim thresholds
A following patch is going to implement a similar reclaim schema
for the MPTCP protocol, with different locking.

Let's define a couple of macros for the used thresholds, so
that the latter code will be more easily maintainable.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 18:20:29 -07:00
Seth Forshee
267463823a net: sch: eliminate unnecessary RCU waits in mini_qdisc_pair_swap()
Currently rcu_barrier() is used to ensure that no readers of the
inactive mini_Qdisc buffer remain before it is reused. This waits for
any pending RCU callbacks to complete, when all that is actually
required is to wait for one RCU grace period to elapse after the buffer
was made inactive. This means that using rcu_barrier() may result in
unnecessary waits.

To improve this, store the current RCU state when a buffer is made
inactive and use poll_state_synchronize_rcu() to check whether a full
grace period has elapsed before reusing it. If a full grace period has
not elapsed, wait for a grace period to elapse, and in the non-RT case
use synchronize_rcu_expedited() to hasten it.

Since this approach eliminates the RCU callback it is no longer
necessary to synchronize_rcu() in the tp_head==NULL case. However, the
RCU state should still be saved for the previously active buffer.

Before this change I would typically see mini_qdisc_pair_swap() take
tens of milliseconds to complete. After this change it typcially
finishes in less than 1 ms, and often it takes just a few microseconds.

Thanks to Paul for walking me through the options for improving this.

Cc: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Seth Forshee <sforshee@digitalocean.com>
Link: https://lore.kernel.org/r/20211026130700.121189-1-seth@forshee.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 17:09:26 -07:00
Jakub Kicinski
afe8ca110c Merge tag 'mac80211-for-net-2021-10-27' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:

====================
Two fixes:
 * bridge vs. 4-addr mode check was wrong
 * management frame registrations locking was
   wrong, causing list corruption/crashes
====================

Link: https://lore.kernel.org/r/20211027143756.91711-1-johannes@sipsolutions.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-27 08:13:15 -07:00
Vladimir Oltean
716a30a97a net: switchdev: merge switchdev_handle_fdb_{add,del}_to_device
To reduce code churn, the same patch makes multiple changes, since they
all touch the same lines:

1. The implementations for these two are identical, just with different
   function pointers. Reduce duplications and name the function pointers
   "mod_cb" instead of "add_cb" and "del_cb". Pass the event as argument.

2. Drop the "const" attribute from "orig_dev". If the driver needs to
   check whether orig_dev belongs to itself and then
   call_switchdev_notifiers(orig_dev, SWITCHDEV_FDB_OFFLOADED), it
   can't, because call_switchdev_notifiers takes a non-const struct
   net_device *.

Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-27 14:54:02 +01:00
Jakub Kicinski
440ffcdd9d Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:

====================
pull-request: bpf 2021-10-26

We've added 12 non-merge commits during the last 7 day(s) which contain
a total of 23 files changed, 118 insertions(+), 98 deletions(-).

The main changes are:

1) Fix potential race window in BPF tail call compatibility check, from Toke Høiland-Jørgensen.

2) Fix memory leak in cgroup fs due to missing cgroup_bpf_offline(), from Quanyang Wang.

3) Fix file descriptor reference counting in generic_map_update_batch(), from Xu Kuohai.

4) Fix bpf_jit_limit knob to the max supported limit by the arch's JIT, from Lorenz Bauer.

5) Fix BPF sockmap ->poll callbacks for UDP and AF_UNIX sockets, from Cong Wang and Yucong Sun.

6) Fix BPF sockmap concurrency issue in TCP on non-blocking sendmsg calls, from Liu Jian.

7) Fix build failure of INODE_STORAGE and TASK_STORAGE maps on !CONFIG_NET, from Tejun Heo.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix potential race in tail call compatibility check
  bpf: Move BPF_MAP_TYPE for INODE_STORAGE and TASK_STORAGE outside of CONFIG_NET
  selftests/bpf: Use recv_timeout() instead of retries
  net: Implement ->sock_is_readable() for UDP and AF_UNIX
  skmsg: Extract and reuse sk_msg_is_readable()
  net: Rename ->stream_memory_read to ->sock_is_readable
  tcp_bpf: Fix one concurrency problem in the tcp_bpf_send_verdict function
  cgroup: Fix memory leak caused by missing cgroup_bpf_offline
  bpf: Fix error usage of map_fd and fdget() in generic_map_update_batch()
  bpf: Prevent increasing bpf_jit_limit above max
  bpf: Define bpf_jit_alloc_exec_limit for arm64 JIT
  bpf: Define bpf_jit_alloc_exec_limit for riscv JIT
====================

Link: https://lore.kernel.org/r/20211026201920.11296-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-10-26 14:38:55 -07:00
Cong Wang
7b50ecfcc6 net: Rename ->stream_memory_read to ->sock_is_readable
The proto ops ->stream_memory_read() is currently only used
by TCP to check whether psock queue is empty or not. We need
to rename it before reusing it for non-TCP protocols, and
adjust the exsiting users accordingly.

Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20211008203306.37525-2-xiyou.wangcong@gmail.com
2021-10-26 12:29:33 -07:00
Jeremy Kerr
99ce45d5e7 mctp: Implement extended addressing
This change allows an extended address struct - struct sockaddr_mctp_ext
- to be passed to sendmsg/recvmsg. This allows userspace to specify
output ifindex and physical address information (for sendmsg) or receive
the input ifindex/physaddr for incoming messages (for recvmsg). This is
typically used by userspace for MCTP address discovery and assignment
operations.

The extended addressing facility is conditional on a new sockopt:
MCTP_OPT_ADDR_EXT; userspace must explicitly enable addressing before
the kernel will consume/populate the extended address data.

Includes a fix for an uninitialised var:
Reported-by: kernel test robot <lkp@intel.com>

Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:58:45 +01:00
Eric Dumazet
f8dd3b8d70 tcp: rename sk_stream_alloc_skb
sk_stream_alloc_skb() is only used by TCP.

Rename it to make this clear, and move its declaration
to include/net/tcp.h

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 14:45:11 +01:00
Eric Dumazet
d18785e213 net: annotate data-race in neigh_output()
neigh_output() reads n->nud_state and hh->hh_len locklessly.

This is fine, but we need to add annotations and document this.

We evaluate skip_cache first to avoid reading these fields
if the cache has to by bypassed.

syzbot report:

BUG: KCSAN: data-race in __neigh_event_send / ip_finish_output2

write to 0xffff88810798a885 of 1 bytes by interrupt on cpu 1:
 __neigh_event_send+0x40d/0xac0 net/core/neighbour.c:1128
 neigh_event_send include/net/neighbour.h:444 [inline]
 neigh_resolve_output+0x104/0x410 net/core/neighbour.c:1476
 neigh_output include/net/neighbour.h:510 [inline]
 ip_finish_output2+0x80a/0xaa0 net/ipv4/ip_output.c:221
 ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423
 dst_output include/net/dst.h:450 [inline]
 ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126
 __ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525
 ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539
 __tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405
 tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
 tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline]
 tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064
 tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079
 tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline]
 tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626
 tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642
 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421
 expire_timers+0x135/0x240 kernel/time/timer.c:1466
 __run_timers+0x368/0x430 kernel/time/timer.c:1734
 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747
 __do_softirq+0x12c/0x26e kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu kernel/softirq.c:636 [inline]
 irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648
 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097
 asm_sysvec_apic_timer_interrupt+0x12/0x20
 native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
 arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
 acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline]
 acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline]
 acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688
 cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237
 cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351
 call_cpuidle kernel/sched/idle.c:158 [inline]
 cpuidle_idle_call kernel/sched/idle.c:239 [inline]
 do_idle+0x1a3/0x250 kernel/sched/idle.c:306
 cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403
 secondary_startup_64_no_verify+0xb1/0xbb

read to 0xffff88810798a885 of 1 bytes by interrupt on cpu 0:
 neigh_output include/net/neighbour.h:507 [inline]
 ip_finish_output2+0x79a/0xaa0 net/ipv4/ip_output.c:221
 ip_finish_output+0x3b5/0x510 net/ipv4/ip_output.c:309
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip_output+0xf3/0x1a0 net/ipv4/ip_output.c:423
 dst_output include/net/dst.h:450 [inline]
 ip_local_out+0x164/0x220 net/ipv4/ip_output.c:126
 __ip_queue_xmit+0x9d3/0xa20 net/ipv4/ip_output.c:525
 ip_queue_xmit+0x34/0x40 net/ipv4/ip_output.c:539
 __tcp_transmit_skb+0x142a/0x1a00 net/ipv4/tcp_output.c:1405
 tcp_transmit_skb net/ipv4/tcp_output.c:1423 [inline]
 tcp_xmit_probe_skb net/ipv4/tcp_output.c:4011 [inline]
 tcp_write_wakeup+0x4a9/0x810 net/ipv4/tcp_output.c:4064
 tcp_send_probe0+0x2c/0x2b0 net/ipv4/tcp_output.c:4079
 tcp_probe_timer net/ipv4/tcp_timer.c:398 [inline]
 tcp_write_timer_handler+0x394/0x520 net/ipv4/tcp_timer.c:626
 tcp_write_timer+0xb9/0x180 net/ipv4/tcp_timer.c:642
 call_timer_fn+0x2e/0x1d0 kernel/time/timer.c:1421
 expire_timers+0x135/0x240 kernel/time/timer.c:1466
 __run_timers+0x368/0x430 kernel/time/timer.c:1734
 run_timer_softirq+0x19/0x30 kernel/time/timer.c:1747
 __do_softirq+0x12c/0x26e kernel/softirq.c:558
 invoke_softirq kernel/softirq.c:432 [inline]
 __irq_exit_rcu kernel/softirq.c:636 [inline]
 irq_exit_rcu+0x4e/0xa0 kernel/softirq.c:648
 sysvec_apic_timer_interrupt+0x69/0x80 arch/x86/kernel/apic/apic.c:1097
 asm_sysvec_apic_timer_interrupt+0x12/0x20
 native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline]
 arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline]
 acpi_safe_halt drivers/acpi/processor_idle.c:109 [inline]
 acpi_idle_do_entry drivers/acpi/processor_idle.c:553 [inline]
 acpi_idle_enter+0x258/0x2e0 drivers/acpi/processor_idle.c:688
 cpuidle_enter_state+0x2b4/0x760 drivers/cpuidle/cpuidle.c:237
 cpuidle_enter+0x3c/0x60 drivers/cpuidle/cpuidle.c:351
 call_cpuidle kernel/sched/idle.c:158 [inline]
 cpuidle_idle_call kernel/sched/idle.c:239 [inline]
 do_idle+0x1a3/0x250 kernel/sched/idle.c:306
 cpu_startup_entry+0x15/0x20 kernel/sched/idle.c:403
 rest_init+0xee/0x100 init/main.c:734
 arch_call_rest_init+0xa/0xb
 start_kernel+0x5e4/0x669 init/main.c:1142
 secondary_startup_64_no_verify+0xb1/0xbb

value changed: 0x20 -> 0x01

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-10-26 13:44:18 +01:00